Since this article is quite long, the remaining parts are in my other blog posts!
- Part 1: Introduction to Hadoop and installation
- Part 2: HDFS
- Part 3: MapReduce
6. Website KPI analysis system based on web log data processing
Reference material
- Link: https://pan.baidu.com/s/1sn9uRWi3Rhl4GL4g04Tv5w  Extraction code: zidg
6.1 Project development workflow
6.2 Project tasks
6.2.1 Merging small files
Notes:
- Data crawled from the web usually arrives as many small files. If each small file were analyzed by MapReduce on its own, every file would become its own input split, which puts a lot of unnecessary pressure on the job. So, depending on the requirements, it is worth doing a simple merge of the files first.
Code:
FirstStep.java
package com.atSchool.WebLog;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
* 第一步:
* 合并小文件
*/
public class FirstStep {
private InputStream inputStream; // 输入流,用于读入数据
private OutputStream outputStream; // 输出流,用于写出数据
private String localPath; // 本地路径,指向需要合并的小文件
private String hdfsPath; // hdfs路径,指向合并后输出结果的路径
public FirstStep(String localPath, String hdfsPath) {
this.localPath = localPath;
this.hdfsPath = hdfsPath;
}
/**
* 合并步骤:
* 1.根据用户定义的参数设置本地目录和HDFS的目标文件
* 2.创建一个输出流写入到HDFS文件
* 3.遍历本地目录中的每个文件,打开文件,并读取文件内容,将文件的内容写到HDFS文件中。
*/
private void startWrite(File file) throws IOException {
if (file.isFile()) { // 如果是一个文件
int len = 0;
byte[] buffer = new byte[1024];
inputStream = new BufferedInputStream(new FileInputStream(file));
while ((len = inputStream.read(buffer)) > 0) { // keep reading while bytes are returned
outputStream.write(buffer, 0, len); // write out to HDFS
}
inputStream.close(); // close this file's stream before moving on to the next file
} else if (file.isDirectory()) { // 如果是一个目录
File[] listFiles = file.listFiles();
for (File file2 : listFiles) {
startWrite(file2);
}
}
}
/**
* 关闭输入输出流
* @throws IOException
*/
private void close() throws IOException {
if (inputStream != null) {
inputStream.close();
}
if (outputStream != null) {
outputStream.close();
}
}
/**
* 开始合并
* @throws IOException
*/
private void start() throws IOException {
File file = new File(localPath); // 本地文件
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
FileSystem fs = FileSystem.get(conf);
// 在指定路径下打开HDFS的输出流
outputStream = fs.create(new Path(hdfsPath));
// 开始将本地的文件写入到hdfs中
startWrite(file);
System.out.println("合并成功!");
// 关闭所有的输入输出流
close();
}
// 测试
public static void main(String[] args) throws IOException {
new FirstStep("C:\\Users\\USER_WU\\Desktop\\测试", "/合并小文件测试.txt").start();
}
}
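A quick way to confirm the merge really landed on HDFS is to ask for the file's status afterwards. A minimal sketch (the NameNode address and output path are the ones from the example above; adjust them to your environment):
package com.atSchool.WebLog;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Prints the size of the merged file so it can be compared against the total size of the local files.
public class MergeCheck {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		FileSystem fs = FileSystem.get(conf);
		FileStatus status = fs.getFileStatus(new Path("/合并小文件测试.txt"));
		System.out.println("Merged file size: " + status.getLen() + " bytes");
		fs.close();
	}
}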
6.2.2 Computing the website KPI metrics
Notes:
- KPI: Key Performance Indicator
- The KPIs give a picture of how the website is doing
- Two of them are computed here:
PV (PageView): page view count per page
IP: unique-IP visit count per page
1. Page view statistics
Code:
LogEntity.java
package com.atSchool.WebLog;
/**
* 日志解析类
*/
public class LogEntity {
private String date; // 请求的日期
private String time; // 请求的时间
private String c_ip; // 访问用户的 IP 地址或者用户使用的代理服务器 IP 地址
private String cs_username; // 用户名,由于通常用户没有进行注册,故一般都为占位符“-”
private String s_ip; // 客户端访问网站的IP 地址
private String s_port; // 客户端访问网站的端口号
private String cs_method; // 访问者的请求命令,常见的方法有三种,分别是 GET、POST 和 HEAD
private String cs_uri_stem; // 访问者请求的资源,即相对于服务器上根目录的途径
private String cs_uri_query; // query string of the requested URI (the part after "?"); usually the placeholder "-" when there is no query
private String sc_status; // 服务器返回的状态代码。一般而言,以2开头的状态代码表示成功,以3开头表示由于各种不同的原因用户请求被重定向到了其他位置,以4开头表示用户端存在某种错误,以5开头表示服务器遇到了某个错误;
private String cs_User_Agent; // 附加信息,包括浏览器类型、操作系统等
private boolean isValid; // 判断值是否有效
public LogEntity() {
}
public LogEntity(String line) {
String[] split = line.split(" +");
// skip empty lines and '#' header/comment lines, and require exactly 11 fields
if (!line.isEmpty() && (line.charAt(0) != '#') && (split.length == 11)) {
this.date = split[0];
this.time = split[1];
this.c_ip = split[2];
this.cs_username = split[3];
this.s_ip = split[4];
this.s_port = split[5];
this.cs_method = split[6];
this.cs_uri_stem = split[7];
this.cs_uri_query = split[8];
this.sc_status = split[9];
this.cs_User_Agent = split[10];
this.isValid = true;
} else {
this.isValid = false;
}
}
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
public String getTime() {
return time;
}
public void setTime(String time) {
this.time = time;
}
public String getC_ip() {
return c_ip;
}
public void setC_ip(String c_ip) {
this.c_ip = c_ip;
}
public String getCs_username() {
return cs_username;
}
public void setCs_username(String cs_username) {
this.cs_username = cs_username;
}
public String getS_ip() {
return s_ip;
}
public void setS_ip(String s_ip) {
this.s_ip = s_ip;
}
public String getS_port() {
return s_port;
}
public void setS_port(String s_port) {
this.s_port = s_port;
}
public String getCs_method() {
return cs_method;
}
public void setCs_method(String cs_method) {
this.cs_method = cs_method;
}
public String getCs_uri_stem() {
return cs_uri_stem;
}
public void setCs_uri_stem(String cs_uri_stem) {
this.cs_uri_stem = cs_uri_stem;
}
public String getCs_uri_query() {
return cs_uri_query;
}
public void setCs_uri_query(String cs_uri_query) {
this.cs_uri_query = cs_uri_query;
}
public String getSc_status() {
return sc_status;
}
public void setSc_status(String sc_status) {
this.sc_status = sc_status;
}
public String getCs_User_Agent() {
return cs_User_Agent;
}
public void setCs_User_Agent(String cs_User_Agent) {
this.cs_User_Agent = cs_User_Agent;
}
public boolean isValid() {
return isValid;
}
public void setValid(boolean isValid) {
this.isValid = isValid;
}
@Override
public String toString() {
return "date=" + date + ", time=" + time + ", c_ip=" + c_ip + ", cs_username=" + cs_username + ", s_ip=" + s_ip
+ ", s_port=" + s_port + ", cs_method=" + cs_method + ", cs_uri_stem=" + cs_uri_stem + ", cs_uri_query="
+ cs_uri_query + ", sc_status=" + sc_status + ", cs_User_Agent=" + cs_User_Agent + ", isValid="
+ isValid;
}
}
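To make the 11-field layout concrete, the snippet below feeds a hand-written sample line into the parser. The line is hypothetical, but follows the IIS/W3C order the class expects (date, time, c-ip, cs-username, s-ip, s-port, cs-method, cs-uri-stem, cs-uri-query, sc-status, cs(User-Agent)); note that IIS encodes spaces inside the User-Agent as '+', so that value stays a single token:
package com.atSchool.WebLog;
// Quick sanity check for LogEntity; the log line below is made up purely for illustration.
public class LogEntityTest {
	public static void main(String[] args) {
		String sample = "2004-04-14 00:00:07 172.16.74.253 - 192.168.232.129 80 GET /index.asp - 200 Mozilla/4.0+(compatible;+MSIE+6.0)";
		LogEntity entity = new LogEntity(sample);
		System.out.println(entity.isValid()); // true: 11 fields and not a '#' header line
		System.out.println(entity);
	}
}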
WebVisitsNumsJob.java / WebVisitsNumsMapper.java / WebVisitsNumsReduce.java
package com.atSchool.WebLog;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.atSchool.utils.HDFSUtils;
/**
* 页面访问量
*/
public class WebVisitsNumsJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new WebVisitsNumsJob(), args);
}
@Override
public int run(String[] args) throws Exception {
// 获取Job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
Job job = Job.getInstance(configuration);
// 设置需要运行的任务
job.setJarByClass(WebVisitsNumsJob.class);
// 告诉job Map和Reduce在哪
job.setMapperClass(WebVisitsNumsMapper.class);
job.setReducerClass(WebVisitsNumsReduce.class);
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// 告诉job Reduce输出的key和value的数据类型的是什么
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 告诉job输入和输出的路径
FileInputFormat.addInputPath(job, new Path("/web.log"));
/**
* 因为输出的文件不允许存在,所以需要处理一下
*/
FileSystem fileSystem = HDFSUtils.getFileSystem();
Path path = new Path("/MapReduceOut");
if (fileSystem.exists(path)) {
fileSystem.delete(path, true);
System.out.println("删除成功");
}
FileOutputFormat.setOutputPath(job, path);
// 提交任务
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
class WebVisitsNumsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text outKey = new Text();
private IntWritable outValue = new IntWritable(1);
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
LogEntity logEntity = new LogEntity(value.toString());
// 由于静态资源不算页面访问量,所以得进行过滤
if (logEntity.isValid() == true) {
String cs_uri = logEntity.getCs_uri_stem();
if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
outKey.set(cs_uri);
context.write(outKey, outValue);
}
}
}
}
class WebVisitsNumsReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable outValue = new IntWritable(1);
@Override
protected void reduce(Text key, Iterable<IntWritable> value,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable intWritable : value) {
sum += intWritable.get();
}
outValue.set(sum);
context.write(key, outValue);
}
}
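The job above (and the later ones) uses a small helper, com.atSchool.utils.HDFSUtils, that is not listed in this post. A minimal sketch of what its getFileSystem() presumably looks like, assuming it simply wraps FileSystem.get with the project's NameNode address:
package com.atSchool.utils;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
// Helper used by the jobs in this post to obtain a FileSystem handle for the cluster.
public class HDFSUtils {
	public static FileSystem getFileSystem() throws IOException {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		return FileSystem.get(conf);
	}
}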
2. Writing the page view statistics to MySQL
Code:
LogWritable.java
package com.atSchool.WebLog;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapred.lib.db.DBWritable;
/**
* 输出到MySQL对应的日志类
*/
public class LogWritable implements DBWritable {
private String uri;
private Integer nums;
public LogWritable() {
}
public LogWritable(String line) {
String[] split = line.split("\t");
if (split.length == 2) {
this.uri = split[0];
this.nums = Integer.valueOf(split[1]);
}
}
public String getUri() {
return uri;
}
public void setUri(String uri) {
this.uri = uri;
}
public Integer getNums() {
return nums;
}
public void setNums(Integer nums) {
this.nums = nums;
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setString(1, uri);
statement.setInt(2, nums);
}
@Override
public void readFields(ResultSet resultSet) throws SQLException {
this.uri = resultSet.getString(1);
this.nums = resultSet.getInt(2);
}
@Override
public String toString() {
return "uri=" + uri + ", nums=" + nums;
}
}
MRToMysqlMapper.java / MRToMysqlJob.java
package com.atSchool.WebLog;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* 读取HDFS的文件输出到MySQL中
*/
class MRToMysqlMapper extends Mapper<LongWritable, Text, LogWritable, NullWritable> {
@Override
protected void map(LongWritable key, Text value,
Mapper<LongWritable, Text, LogWritable, NullWritable>.Context context)
throws IOException, InterruptedException {
LogWritable logWritable = new LogWritable(value.toString());
context.write(logWritable, NullWritable.get());
}
}
public class MRToMysqlJob extends Configured implements Tool {
private String className = "com.mysql.cj.jdbc.Driver";
private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private String user = "root";
private String password = "password";
public static void main(String[] args) throws Exception {
ToolRunner.run(new MRToMysqlJob(), args);
}
@Override
public int run(String[] args) throws Exception {
/**
* 获取job:一个工作对象
*/
// 创建一个 配置 对象
Configuration configuration = new Configuration();
// 设置 name属性 的值。
// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
// 名称将在配置前进行修剪。
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
// 在configuration中设置数据库访问相关字段。
DBConfiguration.configureDB(configuration, className, url, user, password);
// 根据配置文件创建一个job
Job job = Job.getInstance(configuration);
/**
* 设置job
*/
/**
* setOutput(Job job, String tableName, String... fieldNames) throws IOException
* 用适当的输出设置初始化作业的缩减部分
* 参数:
* job:The job
* tableName:要插入数据的表
* fieldNames:表中的字段名。
*/
DBOutputFormat.setOutput(job, "webkpi", new String[] { "uri", "nums" });
// 通过查找给定类的来源来设置Jar。
job.setJarByClass(MRToMysqlJob.class);
// 给 job 设置 Map和Reduce
job.setMapperClass(MRToMysqlMapper.class);
job.setNumReduceTasks(0); // no reduce phase is used here, so the number of reduce tasks is set to 0
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(LogWritable.class);
job.setMapOutputValueClass(NullWritable.class);
// 给 job 设置InputFormat
// InputFormat:描述 Map-Reduce job 的输入规范
// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
/**
* 设置输入路径
*/
FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");
// 将job提交到集群并等待它完成。
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
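DBOutputFormat.setOutput(job, "webkpi", ...) assumes that a webkpi table with uri and nums columns already exists in the webkpi database. The post does not show its schema; the sketch below is only a guess that matches the job's output, created through plain JDBC:
package com.atSchool.WebLog;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
// Assumed schema for the target table; the column types are a guess that fits the "uri \t nums" output lines.
public class CreateWebKpiTable {
	public static void main(String[] args) throws Exception {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection conn = DriverManager.getConnection(url, "root", "password");
				Statement st = conn.createStatement()) {
			st.execute("CREATE TABLE IF NOT EXISTS webkpi (uri VARCHAR(255) NOT NULL, nums INT NOT NULL)");
		}
	}
}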
3. Unique-IP visit statistics per page
Notes:
- Unique-IP count: for each page, the number of distinct IPs that visited it (the same IP requesting the same page more than once counts only once)
Code:
LogEntity.java (the same parser as in 6.2.2, duplicated into this package)
package com.atSchool.WebLog.AloneIP;
/**
* 日志解析类
*/
public class LogEntity {
private String date; // 请求的日期
private String time; // 请求的时间
private String c_ip; // 访问用户的 IP 地址或者用户使用的代理服务器 IP 地址
private String cs_username; // 用户名,由于通常用户没有进行注册,故一般都为占位符“-”
private String s_ip; // 客户端访问网站的IP 地址
private String s_port; // 客户端访问网站的端口号
private String cs_method; // 访问者的请求命令,常见的方法有三种,分别是 GET、POST 和 HEAD
private String cs_uri_stem; // 访问者请求的资源,即相对于服务器上根目录的途径
private String cs_uri_query; // query string of the requested URI (the part after "?"); usually the placeholder "-" when there is no query
private String sc_status; // 服务器返回的状态代码。一般而言,以2开头的状态代码表示成功,以3开头表示由于各种不同的原因用户请求被重定向到了其他位置,以4开头表示用户端存在某种错误,以5开头表示服务器遇到了某个错误;
private String cs_User_Agent; // 附加信息,包括浏览器类型、操作系统等
private boolean isValid; // 判断值是否有效
public LogEntity() {
}
public LogEntity(String line) {
String[] split = line.split(" +");
// skip empty lines and '#' header/comment lines, and require exactly 11 fields
if (!line.isEmpty() && (line.charAt(0) != '#') && (split.length == 11)) {
this.date = split[0];
this.time = split[1];
this.c_ip = split[2];
this.cs_username = split[3];
this.s_ip = split[4];
this.s_port = split[5];
this.cs_method = split[6];
this.cs_uri_stem = split[7];
this.cs_uri_query = split[8];
this.sc_status = split[9];
this.cs_User_Agent = split[10];
this.isValid = true;
} else {
this.isValid = false;
}
}
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
public String getTime() {
return time;
}
public void setTime(String time) {
this.time = time;
}
public String getC_ip() {
return c_ip;
}
public void setC_ip(String c_ip) {
this.c_ip = c_ip;
}
public String getCs_username() {
return cs_username;
}
public void setCs_username(String cs_username) {
this.cs_username = cs_username;
}
public String getS_ip() {
return s_ip;
}
public void setS_ip(String s_ip) {
this.s_ip = s_ip;
}
public String getS_port() {
return s_port;
}
public void setS_port(String s_port) {
this.s_port = s_port;
}
public String getCs_method() {
return cs_method;
}
public void setCs_method(String cs_method) {
this.cs_method = cs_method;
}
public String getCs_uri_stem() {
return cs_uri_stem;
}
public void setCs_uri_stem(String cs_uri_stem) {
this.cs_uri_stem = cs_uri_stem;
}
public String getCs_uri_query() {
return cs_uri_query;
}
public void setCs_uri_query(String cs_uri_query) {
this.cs_uri_query = cs_uri_query;
}
public String getSc_status() {
return sc_status;
}
public void setSc_status(String sc_status) {
this.sc_status = sc_status;
}
public String getCs_User_Agent() {
return cs_User_Agent;
}
public void setCs_User_Agent(String cs_User_Agent) {
this.cs_User_Agent = cs_User_Agent;
}
public boolean isValid() {
return isValid;
}
public void setValid(boolean isValid) {
this.isValid = isValid;
}
@Override
public String toString() {
return "date=" + date + ", time=" + time + ", c_ip=" + c_ip + ", cs_username=" + cs_username + ", s_ip=" + s_ip
+ ", s_port=" + s_port + ", cs_method=" + cs_method + ", cs_uri_stem=" + cs_uri_stem + ", cs_uri_query="
+ cs_uri_query + ", sc_status=" + sc_status + ", cs_User_Agent=" + cs_User_Agent + ", isValid="
+ isValid;
}
}
AloneIPVisitsNumsMapper.java / AloneIPVisitsNumsCombiner.java
package com.atSchool.WebLog.AloneIP;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
/**
 * Unique-IP mapper: emits "ip \t uri" for each valid page request
 *
 * Sample mapper output:
* mapper-out:172.17.40.35 /news/newsweb/call_news_top.asp
* mapper-out:172.16.74.253 /index.asp
* mapper-out:172.16.74.253 /news/newsweb/call_news_top.asp
* mapper-out:172.16.94.47 /index.asp
* mapper-out:172.16.80.33 /index.asp
*/
public class AloneIPVisitsNumsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
private Text outKey = new Text();
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, NullWritable>.Context context)
throws IOException, InterruptedException {
LogEntity logEntity = new LogEntity(value.toString());
if (logEntity.isValid() == true) {
String cs_uri = logEntity.getCs_uri_stem();
String c_ip = logEntity.getC_ip();
// 由于静态资源不算页面访问量,所以得进行过滤
if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
// 注意:由于相同的IP+相同的uri只能算一个访问记录,所以这里得传`IP+uri`
outKey.set(c_ip + "\t" + cs_uri);
context.write(outKey, NullWritable.get());
// System.out.println("mapper-out:" + outKey.toString() + "\t");
}
}
}
}
/**
 * Combiner: re-emits each key once. Identical "ip \t uri" pairs collapse into a single key,
 * so this deduplicates on the map side and reduces the data shipped to the reducer.
 *
 * Sample combiner output:
* combiner-out:219.132.7.2 /index.asp
* combiner-out:219.132.7.2 /info_pub/zsjy/zsxx/jj.htm
* combiner-out:219.132.7.2 /news/newsweb/call_news_top.asp
* combiner-out:219.132.83.251 /index.asp
* combiner-out:219.132.83.251 /info_pub/zsjy/zsxx/01yishu.htm
* combiner-out:219.132.83.251 /info_pub/zsjy/zsxx/pic/zsxx.htm
* combiner-out:219.132.83.251 /news/newsweb/call_news_top.asp
* combiner-out:219.132.83.251 /pop/newyear.htm
*/
class AloneIPVisitsNumsCombiner extends Reducer<Text, NullWritable, Text, NullWritable> {
@Override
protected void reduce(Text key, Iterable<NullWritable> value,
Reducer<Text, NullWritable, Text, NullWritable>.Context context) throws IOException, InterruptedException {
context.write(key, NullWritable.get());
// System.out.println("combiner-out:" + key.toString() + "\t");
}
}
AloneIPVisitsNumsReduce.java / StringSameCount.java
package com.atSchool.WebLog.AloneIP;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/**
 * Keys are grouped by the framework (and the combiner already trimmed duplicates on the map
 * side), so each distinct "ip \t uri" pair reaches reduce() once. The keys are collected into
 * a set; in cleanup() the uri part is split off and the number of distinct IPs per uri is
 * counted and written to the output file.
*/
public class AloneIPVisitsNumsReduce extends Reducer<Text, NullWritable, Text, IntWritable> {
// 存储传过来的值
Set<String> s = new HashSet<String>();
@Override
protected void reduce(Text key, Iterable<NullWritable> value,
Reducer<Text, NullWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
// 直接将传过来的值装到HashSet集合中
// 无序(存储顺序和读取顺序不同),不包含重复元素的集合。
s.add(key.toString());
}
@Override
protected void cleanup(Reducer<Text, NullWritable, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
StringSameCount stringSameCount = new StringSameCount();
for (String string : s) {
String[] split = string.split("\t");
stringSameCount.hashInsert(split[1]); // 将uri放入到stringSameCount中,进行统计
}
HashMap<String, Integer> map = stringSameCount.getHashMap(); // 获取统计好后的结果
Set<String> keySet = map.keySet();
for (String key : keySet) {
Integer value = map.get(key); // 根据key获取value
context.write(new Text(key), new IntWritable(value));
// System.out.println("reduce-out:" + key + "\t" + value);
}
}
}
// 用来处理数据的类
class StringSameCount {
private HashMap<String, Integer> map;
private int counter; // 计数器
public StringSameCount() {
map = new HashMap<String, Integer>();
}
// 判断是否有重复的key
public void hashInsert(String string) {
if (map.containsKey(string)) { // 判断map中是否有相同的key
counter = (Integer) map.get(string); // 根据key获取值
map.put(string, ++counter); // 值+1
} else { // 如果没有则key为string,value为1
map.put(string, 1);
}
}
public HashMap<String, Integer> getHashMap() {
return map;
}
}
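StringSameCount can be tried on its own, without Hadoop. A tiny check (the URIs below are just examples):
package com.atSchool.WebLog.AloneIP;
// /index.asp is inserted twice, so its count should come out as 2.
public class StringSameCountTest {
	public static void main(String[] args) {
		StringSameCount counter = new StringSameCount();
		counter.hashInsert("/index.asp");
		counter.hashInsert("/news/newsweb/call_news_top.asp");
		counter.hashInsert("/index.asp");
		System.out.println(counter.getHashMap()); // e.g. {/index.asp=2, /news/newsweb/call_news_top.asp=1}; HashMap order may vary
	}
}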
AloneIPVisitsNumsJob.java
package com.atSchool.WebLog.AloneIP;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.atSchool.utils.HDFSUtils;
public class AloneIPVisitsNumsJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new AloneIPVisitsNumsJob(), args);
}
@Override
public int run(String[] args) throws Exception {
// 获取Job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
Job job = Job.getInstance(configuration);
// 设置需要运行的任务
job.setJarByClass(AloneIPVisitsNumsJob.class);
// 告诉job Map和Reduce在哪
job.setMapperClass(AloneIPVisitsNumsMapper.class);
job.setReducerClass(AloneIPVisitsNumsReduce.class);
job.setCombinerClass(AloneIPVisitsNumsCombiner.class);
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(NullWritable.class);
// 告诉job Reduce输出的key和value的数据类型的是什么
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 告诉job输入和输出的路径
FileInputFormat.addInputPath(job, new Path("/web.log"));
/**
* 因为输出的文件不允许存在,所以需要处理一下
*/
FileSystem fileSystem = HDFSUtils.getFileSystem();
Path path = new Path("/MapReduceOut");
if (fileSystem.exists(path)) {
fileSystem.delete(path, true);
System.out.println("删除成功");
}
FileOutputFormat.setOutputPath(job, path);
// 提交任务
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
4. Writing the unique-IP statistics to MySQL
AloneIpWritable.java
package com.atSchool.WebLog.AloneIP;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapred.lib.db.DBWritable;
/**
* 独立Ip访问量输出到MySQL对应的日志类
*/
public class AloneIpWritable implements DBWritable {
private String uri;
private Integer count;
public AloneIpWritable() {
}
public AloneIpWritable(String line) {
String[] split = line.split("\t");
this.uri = split[0];
this.count = Integer.valueOf(split[1]);
}
public String getUri() {
return uri;
}
public void setUri(String uri) {
this.uri = uri;
}
public Integer getCounter() {
return count;
}
public void setCounter(Integer counter) {
this.count = counter;
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setString(1, this.uri);
statement.setInt(2, this.count);
}
@Override
public void readFields(ResultSet resultSet) throws SQLException {
this.uri = resultSet.getString(1);
this.count = resultSet.getInt(2);
}
@Override
public String toString() {
return "uri=" + uri + ", counter=" + count;
}
}
MRToMysqlMapper.java / MRToMysqlJob.java
package com.atSchool.WebLog.AloneIP;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* 独立Ip访问量
* 读取HDFS的文件输出到MySQL中
*/
class MRToMysqlMapper extends Mapper<LongWritable, Text, AloneIpWritable, NullWritable> {
@Override
protected void map(LongWritable key, Text value,
Mapper<LongWritable, Text, AloneIpWritable, NullWritable>.Context context)
throws IOException, InterruptedException {
AloneIpWritable logWritable = new AloneIpWritable(value.toString());
context.write(logWritable, NullWritable.get());
}
}
public class MRToMysqlJob extends Configured implements Tool {
private String className = "com.mysql.cj.jdbc.Driver";
private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private String user = "root";
private String password = "password";
public static void main(String[] args) throws Exception {
ToolRunner.run(new MRToMysqlJob(), args);
}
@Override
public int run(String[] args) throws Exception {
/**
* 获取job:一个工作对象
*/
// 创建一个 配置 对象
Configuration configuration = new Configuration();
// 设置 name属性 的值。
// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
// 名称将在配置前进行修剪。
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
// 在configuration中设置数据库访问相关字段。
DBConfiguration.configureDB(configuration, className, url, user, password);
// 根据配置文件创建一个job
Job job = Job.getInstance(configuration);
/**
* 设置job
*/
/**
* setOutput(Job job, String tableName, String... fieldNames) throws IOException
* 用适当的输出设置初始化作业的缩减部分
* 参数:
* job:The job
* tableName:要插入数据的表
* fieldNames:表中的字段名。
*/
DBOutputFormat.setOutput(job, "alone_ip_kpi", new String[] { "uri", "count" });
// 通过查找给定类的来源来设置Jar。
job.setJarByClass(MRToMysqlJob.class);
// 给 job 设置 Map和Reduce
job.setMapperClass(MRToMysqlMapper.class);
job.setNumReduceTasks(0); // no reduce phase is used here, so the number of reduce tasks is set to 0
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(AloneIpWritable.class);
job.setMapOutputValueClass(NullWritable.class);
// 给 job 设置InputFormat
// InputFormat:描述 Map-Reduce job 的输入规范
// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
/**
* 设置输入路径
*/
FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");
// 将job提交到集群并等待它完成。
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
6.2.3 Displaying the page view top 5 on a web page
1. A note on the MVC pattern
M: model, the business model, provides the data
V: view, the user interface, displays the data
C: controller, dispatches incoming requests
2. Creating the web project
The MapReduce jobs have already produced the results and written them into MySQL; to show the data on a web page, a new web project is needed:
- New > Dynamic Web Project
- Set Dynamic web module version to 3.0
- Keep clicking next, tick Generate web.xml deployment descriptor at the last step, then click finish
3. Project structure
4. Code
1. Entity class
package com.atschool.Entity;
/**
* webkpi表对应的实体类
*/
public class webkpi {
private String uri;
private Integer nums;
public webkpi() {
}
public webkpi(String uri, Integer nums) {
this.uri = uri;
this.nums = nums;
}
public String getUri() {
return uri;
}
public void setUri(String uri) {
this.uri = uri;
}
public Integer getNums() {
return nums;
}
public void setNums(Integer nums) {
this.nums = nums;
}
@Override
public String toString() {
return "uri=" + uri + ", nums=" + nums;
}
}
2. Utility class
package com.atschool.DBUtils;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
/**
* 数据库工具类
*/
public class DBUtils {
private static String className = "com.mysql.cj.jdbc.Driver";
private static String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private static String user = "root";
private static String password = "password";
// 获取连接
public static Connection getConnection() {
Connection connection = null;
try {
// 1.加载驱动
Class.forName(className);
// 2.建立连接(Connection)
if (connection == null) {
connection = DriverManager.getConnection(url, user, password);
}
} catch (ClassNotFoundException | SQLException e) {
e.printStackTrace();
}
return connection;
}
// 释放连接
public static void close(Connection connection, PreparedStatement preparedStatement, ResultSet resultSet) {
try {
if (connection != null) {
connection.close();
}
if (preparedStatement != null) {
preparedStatement.close();
}
if (resultSet != null) {
resultSet.close();
}
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
/**
* 对数据库进行增、删、改操作,直接调用即可,不需要连接数据库和释放资源。
*
* @param sql 需要执行的sql语句
* @param args 包含占位符信息的一个数组
*/
public static void action(String sql, Object... args) {
Connection conn = null;
PreparedStatement ps = null;
try {
// 连接数据库
conn = DBUtils.getConnection();
// 预编译sql语句,返回一个PrepareStatem实例
ps = conn.prepareStatement(sql);
// 填充占位符
for (int num = 0; num < args.length; num++) {
ps.setObject(num + 1, args[num]);
/**
 * Note:
 * JDBC placeholder indexes are 1-based (setObject(1, ...) fills the first "?"),
 * while the args array is 0-based, hence the num + 1 above.
*/
}
// 执行sql语句
ps.execute();
} catch (SQLException throwables) {
throwables.printStackTrace();
}
// 释放资源
DBUtils.close(conn, ps, null);
}
}
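A short example of how the action() helper is meant to be called: the SQL contains placeholders and the varargs fill them in order. The INSERT and its values below are purely illustrative:
package com.atschool.DBUtils;
// Each "?" is filled from the varargs, left to right (setObject(1, ...) gets args[0], and so on).
public class DBUtilsDemo {
	public static void main(String[] args) {
		DBUtils.action("INSERT INTO webkpi (uri, nums) VALUES (?, ?)", "/index.asp", 100);
	}
}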
3. DAO layer
package com.atschool.Dao;
import java.util.List;
/**
* DAO:
* Data Access Object访问数据信息的类和接口,包括了对数据的CRUD(Create、Retrival、Update、 Delete),而不包含任何业务相关的信息。
* 有时也称作:BaseDAO
* 作用:
* 为了实现功能的模块化,更有利于代码的维护和升级。
*
* uri的dao层
*/
public interface UriDao<T> {
// 查询页面访问量前五
List<T> queryTop();
}
package com.atschool.Dao.Impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.webkpi;
public class UriDaoImpl implements UriDao<webkpi> {
@Override
public List<webkpi> queryTop() {
String sql = "SELECT RIGHT(uri,12) AS uri,nums FROM `webkpi` ORDER BY nums DESC LIMIT 5";
Connection connection = DBUtils.getConnection();
PreparedStatement ps = null;
ResultSet resultSet = null;
ArrayList<webkpi> arrayList = null;
try {
// 预编译sql语句,返回一个PrepareStatem实例
ps = connection.prepareStatement(sql);
// 执行sql语句得到结果
resultSet = ps.executeQuery();
arrayList = new ArrayList<>();
while (resultSet.next()) {
String uri = resultSet.getString("uri");
int nums = resultSet.getInt("nums");
arrayList.add(new webkpi(uri, nums));
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
// 释放资源
DBUtils.close(connection, ps, resultSet);
}
return arrayList;
}
// 测试
public static void main(String[] args) {
List<webkpi> queryTop5 = new UriDaoImpl().queryTop();
for (webkpi webkpi : queryTop5) {
System.out.println(webkpi);
}
}
}
4. Controller layer
package com.atschool.Controller;
import java.io.IOException;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.UriDaoImpl;
import com.atschool.Entity.webkpi;
/**
* Servlet implementation class UriTopServlete
*/
@WebServlet("/UriTopServlete")
public class UriTopServlete extends HttpServlet {
private static final long serialVersionUID = 1L;
private UriDaoImpl uriDaoImpl = new UriDaoImpl();
// service 不管是什么请求都会接收
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// 获取数据
List<webkpi> queryTop5 = uriDaoImpl.queryTop();
// 转成json数据
String jsonString = JSONArray.toJSONString(queryTop5);
resp.getWriter().write(jsonString);
}
}
5. Page
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>网站访问量统计</title>
<!-- 引入 echarts.js -->
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
<script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
</head>
<body>
<!-- 为ECharts准备一个具备大小(宽高)的Dom -->
<div id="main" style="width: 700px; height: 400px;"></div>
<script type="text/javascript">
// dom加载后就会执行
var xData = new Array();
var yData = new Array();
$(function() {
$.ajax({
//请求方式
type : "POST",
//请求地址
url : "UriTopServlete", // 这里最好写全,以便页面换位置了,也可以访问
//规定返回的数据类型
dataType : "json",
// 由于不能让图表在没有获取到数据之前就显示出来,所以设置为同步操作
async : false,
//请求成功
success : function(result) {
console.log(result); // 打印到浏览器控制台
// 对获取到的数据进行解析
for (var i = 0; i < result.length; i++) {
xData.push(result[i].uri);
yData.push(result[i].nums);
}
},
//请求失败,包含具体的错误信息
error : function(e) {
console.log(e.status);
console.log(e.responseText);
}
});
// 基于准备好的dom,初始化echarts实例
var myChart = echarts.init(document.getElementById('main'));
// 指定图表的配置项和数据
var option = {
title : {
text : '网站访问量'
},
// tooltip shown when hovering over the chart
tooltip : {},
legend : {
data : [ '访问量' ]
},
// x轴
xAxis : {
data : xData
},
// y轴
yAxis : {},
series : [ {
name : '访问量',
type : 'bar', // 图的类型,bar-柱状/条形图,line-线图,pie-饼状图
data : yData
} ]
};
// 使用刚指定的配置项和数据显示图表。
myChart.setOption(option);
});
</script>
</body>
</html>
6.2.4 Displaying the unique-IP visit top 10 on a web page
Notes:
- The page view top 5 display is already in place, so only a few additions on top of it are needed
1. Entity class
// The data shape is nearly the same, so the webkpi entity class from before is reused
2. Utility class
// Already shown above, not repeated here
3. DAO layer
package com.atschool.Dao.Impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.webkpi;
public class AloneIpDaoImpl implements UriDao<webkpi> {
@Override
public List<webkpi> queryTop() {
String sql = "SELECT * FROM alone_ip_kpi ORDER BY count DESC LIMIT 10";
Connection connection = DBUtils.getConnection();
PreparedStatement ps = null;
ResultSet resultSet = null;
ArrayList<webkpi> arrayList = null;
try {
// 预编译sql语句,返回一个PrepareStatem实例
ps = connection.prepareStatement(sql);
// 执行sql语句得到结果
resultSet = ps.executeQuery();
arrayList = new ArrayList<>();
while (resultSet.next()) {
String uri = resultSet.getString("uri");
int count = resultSet.getInt("count");
arrayList.add(new webkpi(uri, count));
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
// 释放资源
DBUtils.close(connection, ps, resultSet);
}
return arrayList;
}
// 测试
public static void main(String[] args) {
List<webkpi> queryTop10 = new AloneIpDaoImpl().queryTop();
for (webkpi webkpi : queryTop10) {
System.out.println(webkpi);
}
}
}
4. Controller layer
package com.atschool.Controller;
import java.io.IOException;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.AloneIpDaoImpl;
import com.atschool.Entity.webkpi;
/**
* Servlet implementation class AloneIpTopServlet
*/
@WebServlet("/AloneIpTopServlet")
public class AloneIpTopServlet extends HttpServlet {
private static final long serialVersionUID = 1L;
private AloneIpDaoImpl aloneIpDaoImpl = new AloneIpDaoImpl();
// service 不管是什么请求都会接收
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// 获取数据
List<webkpi> queryTop10 = aloneIpDaoImpl.queryTop();
// 转成json数据
String jsonString = JSONArray.toJSONString(queryTop10);
resp.getWriter().write(jsonString);
}
}
5. Page
// Only the index.html page written earlier needs a small change; this time try a pie chart
6.2.5 Daily most-visited page
1. Counting with MapReduce (the job below counts visits per date and page; the single most-visited page of each day is picked later with SQL when the table is queried)
package com.atSchool.WebLog.EveryDayTop1;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;
public class EveryDayTopJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new EveryDayTopJob(), args);
}
@Override
public int run(String[] args) throws Exception {
// 获取Job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
Job job = Job.getInstance(configuration);
// 设置需要运行的任务
job.setJarByClass(EveryDayTopJob.class);
// 告诉job Map和Reduce在哪
job.setMapperClass(EveryDayTopMapper.class);
job.setReducerClass(EveryDayTopReduce.class);
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// 告诉job Reduce输出的key和value的数据类型的是什么
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 告诉job输入和输出的路径
FileInputFormat.addInputPath(job, new Path("/web.log"));
/**
* 因为输出的文件不允许存在,所以需要处理一下
*/
FileSystem fileSystem = HDFSUtils.getFileSystem();
Path path = new Path("/MapReduceOut");
if (fileSystem.exists(path)) {
fileSystem.delete(path, true);
System.out.println("删除成功");
}
FileOutputFormat.setOutputPath(job, path);
// 提交任务
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
class EveryDayTopMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text outKey = new Text();
private IntWritable outValue = new IntWritable(1);
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
LogEntity logEntity = new LogEntity(value.toString());
if (logEntity.isValid() == true) {
String date = logEntity.getDate();
String cs_uri = logEntity.getCs_uri_stem();
// 由于静态资源不算页面访问量,所以得进行过滤
if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
outKey.set(date + "\t" + cs_uri);
context.write(outKey, outValue);
// System.out.println("mapper-out:" + outKey.toString() + "\t");
}
}
}
}
class EveryDayTopReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable outValue = new IntWritable();
@Override
protected void reduce(Text key, Iterable<IntWritable> value,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable intWritable : value) {
sum += intWritable.get();
}
outValue.set(sum);
context.write(key, outValue);
}
}
2. Writing the results out to MySQL
package com.atSchool.WebLog.EveryDayTop1;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapred.lib.db.DBWritable;
/**
* 每天最高访问量输出到MySQL对应的日志类
*/
public class EveryDayTopWritable implements DBWritable {
private String date;
private String uri;
private Integer count;
public EveryDayTopWritable() {
}
public EveryDayTopWritable(String line) {
String[] split = line.split("\t");
this.date = split[0];
this.uri = split[1];
this.count = Integer.valueOf(split[2]);
}
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
public String getUri() {
return uri;
}
public void setUri(String uri) {
this.uri = uri;
}
public Integer getCounter() {
return count;
}
public void setCounter(Integer counter) {
this.count = counter;
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setString(1, this.date);
statement.setString(2, this.uri);
statement.setInt(3, this.count);
}
@Override
public void readFields(ResultSet resultSet) throws SQLException {
this.date = resultSet.getString(1);
this.uri = resultSet.getString(2);
this.count = resultSet.getInt(3);
}
@Override
public String toString() {
return "date=" + date + "uri=" + uri + ", counter=" + count;
}
}
package com.atSchool.WebLog.EveryDayTop1;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* 每天最高访问量
* 读取HDFS的文件输出到MySQL中
*/
class MRToMysqlMapper extends Mapper<LongWritable, Text, EveryDayTopWritable, NullWritable> {
@Override
protected void map(LongWritable key, Text value,
Mapper<LongWritable, Text, EveryDayTopWritable, NullWritable>.Context context)
throws IOException, InterruptedException {
EveryDayTopWritable logWritable = new EveryDayTopWritable(value.toString());
context.write(logWritable, NullWritable.get());
}
}
public class MRToMysqlJob extends Configured implements Tool {
private String className = "com.mysql.cj.jdbc.Driver";
private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private String user = "root";
private String password = "password";
public static void main(String[] args) throws Exception {
ToolRunner.run(new MRToMysqlJob(), args);
}
@Override
public int run(String[] args) throws Exception {
/**
* 获取job:一个工作对象
*/
// 创建一个 配置 对象
Configuration configuration = new Configuration();
// 设置 name属性 的值。
// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
// 名称将在配置前进行修剪。
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
// 在configuration中设置数据库访问相关字段。
DBConfiguration.configureDB(configuration, className, url, user, password);
// 根据配置文件创建一个job
Job job = Job.getInstance(configuration);
/**
* 设置job
*/
/**
* setOutput(Job job, String tableName, String... fieldNames) throws IOException
* 用适当的输出设置初始化作业的缩减部分
* 参数:
* job:The job
* tableName:要插入数据的表
* fieldNames:表中的字段名。
*/
DBOutputFormat.setOutput(job, "everyday_top", new String[] { "date","uri", "count" });
// 通过查找给定类的来源来设置Jar。
job.setJarByClass(MRToMysqlJob.class);
// 给 job 设置 Map和Reduce
job.setMapperClass(MRToMysqlMapper.class);
job.setNumReduceTasks(0); // no reduce phase is used here, so the number of reduce tasks is set to 0
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(EveryDayTopWritable.class);
job.setMapOutputValueClass(NullWritable.class);
// 给 job 设置InputFormat
// InputFormat:描述 Map-Reduce job 的输入规范
// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
/**
* 设置输入路径
*/
FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");
// 将job提交到集群并等待它完成。
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
3. Displaying it on the page
As before, only a few classes need to be added
1. Entity class
package com.atschool.Entity;
/**
* everyday_top表对应的实体类
*/
public class everyday_top {
private String date;
private String uri;
private Integer count;
public everyday_top() {
}
public everyday_top(String date, String uri, Integer count) {
this.date = date;
this.uri = uri;
this.count = count;
}
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
public String getUri() {
return uri;
}
public void setUri(String uri) {
this.uri = uri;
}
public Integer getCount() {
return count;
}
public void setCount(Integer count) {
this.count = count;
}
@Override
public String toString() {
return "date=" + date +", uri=" + uri + ", count=" + count;
}
}
2. DAO layer
package com.atschool.Dao.Impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.everyday_top;
public class EverydayTopImpl implements UriDao<everyday_top> {
@Override
public List<everyday_top> queryTop() {
String sql = "SELECT date,uri,MAX(count) AS count FROM everyday_top GROUP BY date";
Connection connection = DBUtils.getConnection();
PreparedStatement ps = null;
ResultSet resultSet = null;
ArrayList<everyday_top> arrayList = null;
try {
// 预编译sql语句,返回一个PrepareStatem实例
ps = connection.prepareStatement(sql);
// 执行sql语句得到结果
resultSet = ps.executeQuery();
arrayList = new ArrayList<>();
while (resultSet.next()) {
String date = resultSet.getString("date");
String uri = resultSet.getString("uri");
int count = resultSet.getInt("count");
arrayList.add(new everyday_top(date, uri, count));
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
// 释放资源
DBUtils.close(connection, ps, resultSet);
}
return arrayList;
}
// 测试
public static void main(String[] args) {
List<everyday_top> queryTop5 = new EverydayTopImpl().queryTop();
for (everyday_top webkpi : queryTop5) {
System.out.println(webkpi);
}
}
}
3. Controller layer
package com.atschool.Controller;
import java.io.IOException;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.EverydayTopImpl;
import com.atschool.Entity.everyday_top;
@WebServlet("/EveryDayTopServlete")
public class EveryDayTopServlete extends HttpServlet {
private static final long serialVersionUID = 1L;
private EverydayTopImpl everydayTopImpl = new EverydayTopImpl();
// service 不管是什么请求都会接收
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// 获取数据
List<everyday_top> queryTop5 = everydayTopImpl.queryTop();
// 转成json数据
String jsonString = JSONArray.toJSONString(queryTop5);
resp.getWriter().write(jsonString);
}
}
4. Page
// Same as the earlier pages; a line chart should look even better here
6.2.6 Hourly page views (PV, page view)
Notes:
- Count the page views for each hour of the day (24 buckets)
1. Counting the per-hour page views with MR
package com.atSchool.WebLog.Time;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;
public class TimeJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new TimeJob(), args);
}
@Override
public int run(String[] args) throws Exception {
// 获取Job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
Job job = Job.getInstance(configuration);
// 设置需要运行的任务
job.setJarByClass(TimeJob.class);
// 告诉job Map和Reduce在哪
job.setMapperClass(TimeMapper.class);
job.setReducerClass(TimeReduce.class);
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// 告诉job Reduce输出的key和value的数据类型的是什么
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 告诉job输入和输出的路径
FileInputFormat.addInputPath(job, new Path("/web.log"));
/**
* 因为输出的文件不允许存在,所以需要处理一下
*/
FileSystem fileSystem = HDFSUtils.getFileSystem();
Path path = new Path("/MapReduceOut");
if (fileSystem.exists(path)) {
fileSystem.delete(path, true);
System.out.println("删除成功");
}
FileOutputFormat.setOutputPath(job, path);
// 提交任务
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
class TimeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text outKey = new Text();
private IntWritable outValue = new IntWritable(1);
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
LogEntity logEntity = new LogEntity(value.toString());
if (logEntity.isValid() == true) {
String time = logEntity.getTime().substring(0, 2) + ":00";
String cs_uri = logEntity.getCs_uri_stem();
// 由于静态资源不算页面访问量,所以得进行过滤
if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
outKey.set(time);
context.write(outKey, outValue);
// System.out.println("mapper-out:" + outKey.toString() + "\t");
}
}
}
}
class TimeReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> value,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable intWritable : value) {
sum += intWritable.get();
}
context.write(key, new IntWritable(sum));
}
}
2. Loading the results into MySQL
- Create the table (a minimal schema sketch follows this list)
- Create the serialization (DBWritable) class
- Write the data out with MR
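The CREATE TABLE statement for time_count is not shown in the post. Following the same pattern as the webkpi sketch in 6.2.2, a minimal guess that matches the columns used by DBOutputFormat.setOutput below (visit_time, nums):
package com.atSchool.WebLog.Time;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
// Assumed schema for the time_count table in the webkpi database; the column types are a guess.
public class CreateTimeCountTable {
	public static void main(String[] args) throws Exception {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection conn = DriverManager.getConnection(url, "root", "password");
				Statement st = conn.createStatement()) {
			st.execute("CREATE TABLE IF NOT EXISTS time_count (visit_time VARCHAR(8) NOT NULL, nums INT NOT NULL)");
		}
	}
}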
package com.atSchool.WebLog.Time;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapred.lib.db.DBWritable;
/**
* 按小时统计访问量 输出到MySQL对应的日志类
*/
public class TimeWritable implements DBWritable {
private String visit_time;
private int nums;
public TimeWritable() {
}
public TimeWritable(String line) {
String[] split = line.split("\t");
this.visit_time = split[0];
this.nums = Integer.valueOf(split[1]);
}
public String getVisit_time() {
return visit_time;
}
public void setVisit_time(String visit_time) {
this.visit_time = visit_time;
}
public int getNums() {
return nums;
}
public void setNums(int nums) {
this.nums = nums;
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setString(1, visit_time);
statement.setInt(2, nums);
}
@Override
public void readFields(ResultSet resultSet) throws SQLException {
this.visit_time = resultSet.getString(1);
this.nums = resultSet.getInt(2);
}
@Override
public String toString() {
return "visit_time=" + visit_time + ", nums=" + nums;
}
}
package com.atSchool.WebLog.Time;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* 按小时统计访问量
* 读取HDFS的文件输出到MySQL中
*/
class MRToMysqlMapper extends Mapper<LongWritable, Text, TimeWritable, NullWritable> {
@Override
protected void map(LongWritable key, Text value,
Mapper<LongWritable, Text, TimeWritable, NullWritable>.Context context)
throws IOException, InterruptedException {
TimeWritable logWritable = new TimeWritable(value.toString());
context.write(logWritable, NullWritable.get());
}
}
public class MRToMysqlJob extends Configured implements Tool {
private String className = "com.mysql.cj.jdbc.Driver";
private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private String user = "root";
private String password = "password";
public static void main(String[] args) throws Exception {
ToolRunner.run(new MRToMysqlJob(), args);
}
@Override
public int run(String[] args) throws Exception {
/**
* 获取job:一个工作对象
*/
// 创建一个 配置 对象
Configuration configuration = new Configuration();
// 设置 name属性 的值。
// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
// 名称将在配置前进行修剪。
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
// 在configuration中设置数据库访问相关字段。
DBConfiguration.configureDB(configuration, className, url, user, password);
// 根据配置文件创建一个job
Job job = Job.getInstance(configuration);
/**
* 设置job
*/
/**
* setOutput(Job job, String tableName, String... fieldNames) throws IOException
* 用适当的输出设置初始化作业的缩减部分
* 参数:
* job:The job
* tableName:要插入数据的表
* fieldNames:表中的字段名。
*/
DBOutputFormat.setOutput(job, "time_count", new String[] { "visit_time","nums" });
// 通过查找给定类的来源来设置Jar。
job.setJarByClass(MRToMysqlJob.class);
// 给 job 设置 Map和Reduce
job.setMapperClass(MRToMysqlMapper.class);
job.setNumReduceTasks(0); // no reduce phase is used here, so the number of reduce tasks is set to 0
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(TimeWritable.class);
job.setMapOutputValueClass(NullWritable.class);
// 给 job 设置InputFormat
// InputFormat:描述 Map-Reduce job 的输入规范
// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
/**
* 设置输入路径
*/
FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");
// 将job提交到集群并等待它完成。
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
3. Displaying it on the page
1. Entity class
package com.atschool.Entity;
public class time_count {
private String visit_time;
private int nums;
public time_count() {
}
public time_count(String visit_time, int nums) {
this.visit_time = visit_time;
this.nums = nums;
}
public String getVisit_time() {
return visit_time;
}
public void setVisit_time(String visit_time) {
this.visit_time = visit_time;
}
public int getNums() {
return nums;
}
public void setNums(int nums) {
this.nums = nums;
}
@Override
public String toString() {
return "visit_time=" + visit_time + ", nums=" + nums;
}
}
2. DAO layer
package com.atschool.Dao.Impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.time_count;
public class TimeCountDaoImpl implements UriDao<time_count> {
@Override
public List<time_count> queryTop() {
String sql = "SELECT * FROM `time_count`";
Connection connection = DBUtils.getConnection();
PreparedStatement ps = null;
ResultSet resultSet = null;
ArrayList<time_count> arrayList = null;
try {
// 预编译sql语句,返回一个PrepareStatem实例
ps = connection.prepareStatement(sql);
// 执行sql语句得到结果
resultSet = ps.executeQuery();
arrayList = new ArrayList<>();
while (resultSet.next()) {
String visit_time = resultSet.getString("visit_time");
int nums = resultSet.getInt("nums");
arrayList.add(new time_count(visit_time, nums));
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
// 释放资源
DBUtils.close(connection, ps, resultSet);
}
return arrayList;
}
// 测试
public static void main(String[] args) {
List<time_count> queryTop5 = new TimeCountDaoImpl().queryTop();
for (time_count time_count : queryTop5) {
System.out.println(time_count);
}
}
}
3. Controller layer
package com.atschool.Controller;
import java.io.IOException;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.TimeCountDaoImpl;
import com.atschool.Entity.time_count;
@WebServlet("/TimeCountServlete")
public class TimeCountServlete extends HttpServlet {
private static final long serialVersionUID = 1L;
private TimeCountDaoImpl timeCountDaoImpl = new TimeCountDaoImpl();
// service 不管是什么请求都会接收
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// 获取数据
List<time_count> queryTop = timeCountDaoImpl.queryTop();
// 转成json数据
String jsonString = JSONArray.toJSONString(queryTop);
resp.getWriter().write(jsonString);
}
}
4. Page
// Same as before; a line chart works well here too
6.2.7 Statistics on visitors' user agents (devices)
Notes:
- Count which agents/devices visitors use to request the pages
1. MR job: extract the agent from each access record and count identical ones
package com.atSchool.WebLog.EquipmentPV;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;
public class EquipmentPvJob extends Configured implements Tool {
public static void main(String[] args) throws Exception {
ToolRunner.run(new EquipmentPvJob(), args);
}
@Override
public int run(String[] args) throws Exception {
// 获取Job
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
Job job = Job.getInstance(configuration);
// 设置需要运行的任务
job.setJarByClass(EquipmentPvJob.class);
// 告诉job Map和Reduce在哪
job.setMapperClass(EquipmentPvMapper.class);
job.setReducerClass(EquipmentPvReduce.class);
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// 告诉job Reduce输出的key和value的数据类型的是什么
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 告诉job输入和输出的路径
FileInputFormat.addInputPath(job, new Path("/web.log"));
/**
* 因为输出的文件不允许存在,所以需要处理一下
*/
FileSystem fileSystem = HDFSUtils.getFileSystem();
Path path = new Path("/MapReduceOut");
if (fileSystem.exists(path)) {
fileSystem.delete(path, true);
System.out.println("删除成功");
}
FileOutputFormat.setOutputPath(job, path);
// 提交任务
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
class EquipmentPvMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private Text outKey = new Text();
private IntWritable outValue = new IntWritable(1);
@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
LogEntity logEntity = new LogEntity(value.toString());
if (logEntity.isValid()) {
String cs_User_Agent = logEntity.getCs_User_Agent();
String cs_uri = logEntity.getCs_uri_stem();
// 由于静态资源不算页面访问量,所以得进行过滤
if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
String[] split = cs_User_Agent.split("/");
if (!split[0].equals("-")) {
outKey.set(split[0]);
context.write(outKey, outValue);
}
}
}
}
}
class EquipmentPvReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> value,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable intWritable : value) {
sum += intWritable.get();
}
context.write(key, new IntWritable(sum));
}
}
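To make the mapper's key concrete: splitting the user-agent field on `/` and keeping the first token reduces each record to its browser/device family. A tiny standalone check (the sample user-agent value below is made up purely for illustration):
public class UserAgentKeyDemo {
    public static void main(String[] args) {
        // Hypothetical IIS-style user-agent value (spaces encoded as '+'), for illustration only
        String cs_User_Agent = "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)";
        String[] split = cs_User_Agent.split("/");
        // The mapper emits the first token as the key
        System.out.println(split[0]); // prints: Mozilla
    }
}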
2. MapReduce job: load the statistics into MySQL
package com.atSchool.WebLog.EquipmentPV;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.mapred.lib.db.DBWritable;
/**
* 统计用户的访问设备 输出到MySQL对应的日志类
*/
public class EquipmentWritable implements DBWritable {
private String user_agent;
private int nums;
public EquipmentWritable() {
}
public EquipmentWritable(String line) {
String[] split = line.split("\t");
this.user_agent = split[0];
this.nums = Integer.valueOf(split[1]);
}
public String getuser_agent() {
return user_agent;
}
public void setuser_agent(String user_agent) {
this.user_agent = user_agent;
}
public int getNums() {
return nums;
}
public void setNums(int nums) {
this.nums = nums;
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setString(1, user_agent);
statement.setInt(2, nums);
}
@Override
public void readFields(ResultSet resultSet) throws SQLException {
this.user_agent = resultSet.getString(1);
this.nums = resultSet.getInt(2);
}
@Override
public String toString() {
return "user_agent=" + user_agent + ", nums=" + nums;
}
}
package com.atSchool.WebLog.EquipmentPV;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
/**
* 统计用户的访问设备
* 读取HDFS的文件输出到MySQL中
*/
class MRToMysqlMapper extends Mapper<LongWritable, Text, EquipmentWritable, NullWritable> {
@Override
protected void map(LongWritable key, Text value,
Mapper<LongWritable, Text, EquipmentWritable, NullWritable>.Context context)
throws IOException, InterruptedException {
EquipmentWritable logWritable = new EquipmentWritable(value.toString());
context.write(logWritable, NullWritable.get());
}
}
public class MRToMysqlJob extends Configured implements Tool {
private String className = "com.mysql.cj.jdbc.Driver";
private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
private String user = "root";
private String password = "password";
public static void main(String[] args) throws Exception {
ToolRunner.run(new MRToMysqlJob(), args);
}
@Override
public int run(String[] args) throws Exception {
/**
* 获取job:一个工作对象
*/
// 创建一个 配置 对象
Configuration configuration = new Configuration();
// 设置 name属性 的值。
// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
// 名称将在配置前进行修剪。
configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
// 在configuration中设置数据库访问相关字段。
DBConfiguration.configureDB(configuration, className, url, user, password);
// 根据配置文件创建一个job
Job job = Job.getInstance(configuration);
/**
* 设置job
*/
/**
 * DBOutputFormat.setOutput(Job job, String tableName, String... fieldNames)
 * configures the output side of the job so that records are written into the given table.
 * Parameters:
 *   job        - the job
 *   tableName  - the table the records are inserted into
 *   fieldNames - the column names of that table
 */
DBOutputFormat.setOutput(job, "equipment_pv", new String[] { "user_agent","nums" });
// 通过查找给定类的来源来设置Jar。
job.setJarByClass(MRToMysqlJob.class);
// 给 job 设置 Map和Reduce
job.setMapperClass(MRToMysqlMapper.class);
job.setNumReduceTasks(0); // no reduce phase is needed here, so set the number of reduce tasks to 0
// 告诉job Map输出的key和value的数据类型的是什么
job.setMapOutputKeyClass(EquipmentWritable.class);
job.setMapOutputValueClass(NullWritable.class);
// Set the input and output formats for the job:
// TextInputFormat reads the plain-text result of the previous job from HDFS,
// DBOutputFormat writes each record into the configured SQL table
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
// Set the input path: the part file produced by the previous statistics job
FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");
// 将job提交到集群并等待它完成。
boolean waitForCompletion = job.waitForCompletion(true);
System.out.println(waitForCompletion ? "执行成功" : "执行失败");
return 0;
}
}
3. Display the results on the page
1. Entity class
package com.atschool.Entity;
public class equipment_pv {
private String user_agent;
private int nums;
public equipment_pv() {
}
public equipment_pv(String user_agent, int nums) {
this.user_agent = user_agent;
this.nums = nums;
}
public String getUser_agent() {
return user_agent;
}
public void setUser_agent(String user_agent) {
this.user_agent = user_agent;
}
public int getNums() {
return nums;
}
public void setNums(int nums) {
this.nums = nums;
}
@Override
public String toString() {
return "user_agent=" + user_agent + ", nums=" + nums;
}
}
2. DAO layer
package com.atschool.Dao.Impl;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.equipment_pv;
public class EquipmentPvDaoImpl implements UriDao<equipment_pv> {
@Override
public List<equipment_pv> queryTop() {
String sql = "SELECT * FROM `equipment_pv` ORDER BY nums DESC LIMIT 5";
Connection connection = DBUtils.getConnection();
PreparedStatement ps = null;
ResultSet resultSet = null;
ArrayList<equipment_pv> arrayList = null;
try {
// Precompile the SQL statement and get a PreparedStatement instance
ps = connection.prepareStatement(sql);
// 执行sql语句得到结果
resultSet = ps.executeQuery();
arrayList = new ArrayList<>();
while (resultSet.next()) {
String user_agent = resultSet.getString("user_agent");
int nums = resultSet.getInt("nums");
arrayList.add(new equipment_pv(user_agent, nums));
}
} catch (SQLException e) {
e.printStackTrace();
} finally {
// 释放资源
DBUtils.close(connection, ps, resultSet);
}
return arrayList;
}
// 测试
public static void main(String[] args) {
List<equipment_pv> queryTop5 = new EquipmentPvDaoImpl().queryTop();
for (equipment_pv equipment_pv : queryTop5) {
System.out.println(equipment_pv);
}
}
}
3. Controller layer
package com.atschool.Controller;
import java.io.IOException;
import java.util.List;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.EquipmentPvDaoImpl;
import com.atschool.Entity.equipment_pv;
@WebServlet("/EquipmentPvServlet")
public class EquipmentPvServlete extends HttpServlet {
private static final long serialVersionUID = 1L;
private EquipmentPvDaoImpl equipmentPvDaoImpl = new EquipmentPvDaoImpl();
// service 不管是什么请求都会接收
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
// 获取数据
List<equipment_pv> queryTop5 = equipmentPvDaoImpl.queryTop();
// 转成json数据
String jsonString = JSONArray.toJSONString(queryTop5);
resp.getWriter().write(jsonString);
}
}
4. Page
// Same as the previous sections; a pie chart works best here (see customer5.js in section 7 below).
7. Using a web page template
Tips:
- Don't paste the entire template into the project at once. Start by copying only the required `css/fonts/js/img` folders and `index.html`, then integrate from the main page outwards: keep what you need, delete the rest, and gradually bring the other pages into your project as your requirements grow.
- When modifying a page, don't hunt for things blindly. Use the browser's developer tools to find a keyword near the spot you want to change, then locate it in the page source with `Ctrl+F`.
- If the technique you want to use differs from what the template uses, wrap your own `js` in a separate file (keeping the JS apart from the page makes it easier to modify and maintain) and reference it with a `<script>` tag at the end of the page, for example the ECharts charts used earlier:
1. customer5.js
/**
* 浏览器终端统计Top5
*/
// dom加载后就会执行
var array = new Array();
$(function() {
$.ajax({
// 请求方式
type : "POST",
// 请求地址
url : "http://localhost:8080/WebKpi/EquipmentPvServlet",
// 规定返回的数据类型
dataType : "json",
// 由于不能让图表在没有获取到数据之前就显示出来,所以设置为同步操作
async : false,
// 请求成功
success : function(result) {
console.log(result); // 打印到浏览器控制台
// 对获取到的数据进行解析
for (var i = 0; i < result.length; i++) {
var object = new Object();
object.value=result[i].nums;
object.name=result[i].user_agent;
array.push(object);
}
},
// 请求失败,包含具体的错误信息
error : function(e) {
console.log(e.status);
console.log(e.responseText);
}
});
// 基于准备好的dom,初始化echarts实例
var myChart = echarts.init(document.getElementById('main'));
// 指定图表的配置项和数据
var option = {
series : [ {
name : '数量',
type : 'pie', // 图的类型,bar-柱状/条形图,line-线图,pie-饼状图
radius: '55%',
data : array
} ]
};
// 使用刚指定的配置项和数据显示图表。
myChart.setOption(option);
});
2. In the page
(excerpt)
... ...
<!-- jquery============================================ -->
<script src="js/vendor/jquery-1.12.4.min.js"></script>
<!-- customer5: top-5 browser/terminal statistics -->
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
<script src="js2/customer5.js"></script>
... ...
- Once the server-side code is written, don't integrate it with the page all in one go. First open the `Servlet` URL in a browser to check that it actually returns data; it is best to make sure every step works correctly before wiring everything together.
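Besides opening the URL in a browser, a small throwaway client can print the raw JSON the Servlet returns. This is only a sketch; the host and context path below are assumptions, adjust them to your own deployment.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
/**
 * Throwaway check: request the servlet and print the raw JSON it returns.
 */
public class ServletSmokeTest {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/WebKpi/EquipmentPvServlet");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Expect a JSON array such as: [{"nums":1234,"user_agent":"Mozilla"}, ...]
                System.out.println(line);
            }
        } finally {
            connection.disconnect();
        }
    }
}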