由于文章太长,其余部分在我的其他几篇博客中!

  • 第一部分:Hadoop介绍及安装
  • 第二部分:HDFS
  • 第三部分:MapReduce


6、基于Web日志数据处理的网站KPI分析系统项目

分析资料

6.1 项目开发流程

(图:项目开发流程)

6.2 项目任务

6.2.1 合并小文件

说明:

  • 由于从网络上采集下来的数据往往不止一个文件,如果对每个小文件都单独交给MapReduce分析,就会被切分成同样多的小数据块,给MapReduce带来很大压力,所以在分析之前有必要先按需求做一次简单的文件合并。

代码:

FirstStep.java
package com.atSchool.WebLog;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * 第一步:
 * 		合并小文件
 */
public class FirstStep {
	private InputStream inputStream; // 输入流,用于读入数据
	private OutputStream outputStream; // 输出流,用于写出数据
	private String localPath; // 本地路径,指向需要合并的小文件
	private String hdfsPath; // hdfs路径,指向合并后输出结果的路径

	public FirstStep(String localPath, String hdfsPath) {
		this.localPath = localPath;
		this.hdfsPath = hdfsPath;
	}

	/**
	 * 合并步骤:
	 * 	1.根据用户定义的参数设置本地目录和HDFS的目标文件
	 * 	2.创建一个输出流写入到HDFS文件
	 * 	3.遍历本地目录中的每个文件,打开文件,并读取文件内容,将文件的内容写到HDFS文件中。
	 */
	private void startWrite(File file) throws IOException {
		if (file.isFile()) { // 如果是一个文件
			int len = 0;
			byte[] buffer = new byte[1024];
			inputStream = new BufferedInputStream(new FileInputStream(file));
			while ((len = inputStream.read(buffer)) > 0) { // 只要实际读取的字节数>0就一直读取
				outputStream.write(buffer, 0, len); // 写出到hdfs上
			}
			inputStream.close(); // 每个小文件读完后及时关闭,避免只有最后一个输入流被关闭造成句柄泄漏
		} else if (file.isDirectory()) { // 如果是一个目录
			File[] listFiles = file.listFiles();
			for (File file2 : listFiles) {
				startWrite(file2);
			}
		}
	}

	/**
	 * 关闭输入输出流
	 * @throws IOException
	 */
	private void close() throws IOException {
		if (inputStream != null) {
			inputStream.close();
		}
		if (outputStream != null) {
			outputStream.close();
		}
	}

	/**
	 * 开始合并
	 * @throws IOException 
	 */
	private void start() throws IOException {
		File file = new File(localPath); // 本地文件
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		FileSystem fs = FileSystem.get(conf);

		// 在指定路径下打开HDFS的输出流
		outputStream = fs.create(new Path(hdfsPath));

		// 开始将本地的文件写入到hdfs中
		startWrite(file);

		System.out.println("合并成功!");

		// 关闭所有的输入输出流
		close();
	}

	// 测试
	public static void main(String[] args) throws IOException {
		new FirstStep("C:\\Users\\USER_WU\\Desktop\\测试", "/合并小文件测试.txt").start();
	}
}

6.2.2 完成网站KPI指标的统计

说明:

  • KPI:关键业绩指标
  • 我们可以通过KPI看出这个网站的经营状况
  • 这里我们统计两个指标:PV(PageView),即页面访问量;以及页面独立IP的访问量
1、页面访问量统计

代码:

LogEntity.java
package com.atSchool.WebLog;

/**
 * 日志解析类
 */
public class LogEntity {
	private String date; // 请求的日期
	private String time; // 请求的时间
	private String c_ip; // 访问用户的 IP 地址或者用户使用的代理服务器 IP 地址
	private String cs_username; // 用户名,由于通常用户没有进行注册,故一般都为占位符“-”
	private String s_ip; // 客户端访问网站的IP 地址
	private String s_port; // 客户端访问网站的端口号
	private String cs_method; // 访问者的请求命令,常见的方法有三种,分别是 GET、POST 和 HEAD
	private String cs_uri_stem; // 访问者请求的资源,即相对于服务器根目录的路径
	private String cs_uri_query; // 请求的查询字符串(URI中“?”后面的部分),没有参数时为占位符“-”
	private String sc_status; // 服务器返回的状态代码。一般而言,以2开头的状态代码表示成功,以3开头表示由于各种不同的原因用户请求被重定向到了其他位置,以4开头表示用户端存在某种错误,以5开头表示服务器遇到了某个错误;
	private String cs_User_Agent; // 附加信息,包括浏览器类型、操作系统等
	private boolean isValid; // 判断值是否有效

	public LogEntity() {
	}

	public LogEntity(String line) {
		String[] split = line.split(" +");
		if ((line.charAt(0) != '#') && (split.length == 11)) {
			this.date = split[0];
			this.time = split[1];
			this.c_ip = split[2];
			this.cs_username = split[3];
			this.s_ip = split[4];
			this.s_port = split[5];
			this.cs_method = split[6];
			this.cs_uri_stem = split[7];
			this.cs_uri_query = split[8];
			this.sc_status = split[9];
			this.cs_User_Agent = split[10];
			this.isValid = true;
		} else {
			this.isValid = false;
		}
	}

	public String getDate() {
		return date;
	}

	public void setDate(String date) {
		this.date = date;
	}

	public String getTime() {
		return time;
	}

	public void setTime(String time) {
		this.time = time;
	}

	public String getC_ip() {
		return c_ip;
	}

	public void setC_ip(String c_ip) {
		this.c_ip = c_ip;
	}

	public String getCs_username() {
		return cs_username;
	}

	public void setCs_username(String cs_username) {
		this.cs_username = cs_username;
	}

	public String getS_ip() {
		return s_ip;
	}

	public void setS_ip(String s_ip) {
		this.s_ip = s_ip;
	}

	public String getS_port() {
		return s_port;
	}

	public void setS_port(String s_port) {
		this.s_port = s_port;
	}

	public String getCs_method() {
		return cs_method;
	}

	public void setCs_method(String cs_method) {
		this.cs_method = cs_method;
	}

	public String getCs_uri_stem() {
		return cs_uri_stem;
	}

	public void setCs_uri_stem(String cs_uri_stem) {
		this.cs_uri_stem = cs_uri_stem;
	}

	public String getCs_uri_query() {
		return cs_uri_query;
	}

	public void setCs_uri_query(String cs_uri_query) {
		this.cs_uri_query = cs_uri_query;
	}

	public String getSc_status() {
		return sc_status;
	}

	public void setSc_status(String sc_status) {
		this.sc_status = sc_status;
	}

	public String getCs_User_Agent() {
		return cs_User_Agent;
	}

	public void setCs_User_Agent(String cs_User_Agent) {
		this.cs_User_Agent = cs_User_Agent;
	}

	public boolean isValid() {
		return isValid;
	}

	public void setValid(boolean isValid) {
		this.isValid = isValid;
	}

	@Override
	public String toString() {
		return "date=" + date + ", time=" + time + ", c_ip=" + c_ip + ", cs_username=" + cs_username + ", s_ip=" + s_ip
				+ ", s_port=" + s_port + ", cs_method=" + cs_method + ", cs_uri_stem=" + cs_uri_stem + ", cs_uri_query="
				+ cs_uri_query + ", sc_status=" + sc_status + ", cs_User_Agent=" + cs_User_Agent + ", isValid="
				+ isValid;
	}
}
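下面是LogEntity的一个使用示意:用一行假设的日志(各字段取值仅为举例,格式为11个以空格分隔的字段,User-Agent中的空格按IIS日志的习惯用“+”连接)验证解析是否正确:

LogEntityDemo.java
package com.atSchool.WebLog;

/**
 * LogEntity使用示意(日志内容为假设的示例数据)
 */
public class LogEntityDemo {
	public static void main(String[] args) {
		String line = "2021-05-01 08:30:15 172.16.74.253 - 192.168.232.129 80 "
				+ "GET /index.asp - 200 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)";
		LogEntity entity = new LogEntity(line);

		// 有效行:isValid为true,可以取出各个字段
		System.out.println(entity.isValid()); // true
		System.out.println(entity.getC_ip()); // 172.16.74.253
		System.out.println(entity.getCs_uri_stem()); // /index.asp

		// 以#开头的说明行或字段数不等于11的行会被判定为无效
		System.out.println(new LogEntity("#Fields: date time ...").isValid()); // false
	}
}
后面PV、独立IP等统计的Mapper都是先用这个类把一行日志解析成字段,再按需过滤和输出。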
WebVisitsNumsJob.java / WebVisitsNumsMapper.java / WebVisitsNumsReduce.java
package com.atSchool.WebLog;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.atSchool.utils.HDFSUtils;

/**
 * 页面访问量
 */
public class WebVisitsNumsJob extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		ToolRunner.run(new WebVisitsNumsJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		// 获取Job
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		Job job = Job.getInstance(configuration);

		// 设置需要运行的任务
		job.setJarByClass(WebVisitsNumsJob.class);

		// 告诉job Map和Reduce在哪
		job.setMapperClass(WebVisitsNumsMapper.class);
		job.setReducerClass(WebVisitsNumsReduce.class);

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// 告诉job Reduce输出的key和value的数据类型的是什么
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 告诉job输入和输出的路径
		FileInputFormat.addInputPath(job, new Path("/web.log"));
		/**
		 * 因为输出的文件不允许存在,所以需要处理一下
		 */
		FileSystem fileSystem = HDFSUtils.getFileSystem();
		Path path = new Path("/MapReduceOut");
		if (fileSystem.exists(path)) {
			fileSystem.delete(path, true);
			System.out.println("删除成功");
		}
		FileOutputFormat.setOutputPath(job, path);

		// 提交任务
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}

class WebVisitsNumsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private Text outKey = new Text();
	private IntWritable outValue = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		LogEntity logEntity = new LogEntity(value.toString());

		// 由于静态资源不算页面访问量,所以得进行过滤
		if (logEntity.isValid() == true) {
			String cs_uri = logEntity.getCs_uri_stem();
			if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
				outKey.set(cs_uri);
				context.write(outKey, outValue);
			}
		}
	}
}

class WebVisitsNumsReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	private IntWritable outValue = new IntWritable(1);

	@Override
	protected void reduce(Text key, Iterable<IntWritable> value,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable intWritable : value) {
			sum += intWritable.get();
		}
		outValue.set(sum);
		context.write(key, outValue);
	}
}
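上面Job里用到的com.atSchool.utils.HDFSUtils工具类本文没有贴出,下面是按照文中用法推测的一个最小示意实现(只提供getFileSystem()方法,HDFS地址与前文保持一致),仅供参考:

HDFSUtils.java
package com.atSchool.utils;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

/**
 * HDFS工具类的最小示意实现(按文中用法推测)
 */
public class HDFSUtils {
	public static FileSystem getFileSystem() throws IOException {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		return FileSystem.get(conf);
	}
}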
2、页面访问量统计数据写到MySQL数据库中

代码:

LogWritable.java
package com.atSchool.WebLog;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.mapred.lib.db.DBWritable;

/**
 * 输出到MySQL对应的日志类
 */
public class LogWritable implements DBWritable {
	private String uri;
	private Integer nums;

	public LogWritable() {
	}

	public LogWritable(String line) {
		String[] split = line.split("\t");
		if (split.length == 2) {
			this.uri = split[0];
			this.nums = Integer.valueOf(split[1]);
		}
	}

	public String getUri() {
		return uri;
	}

	public void setUri(String uri) {
		this.uri = uri;
	}

	public Integer getNums() {
		return nums;
	}

	public void setNums(Integer nums) {
		this.nums = nums;
	}

	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setString(1, uri);
		statement.setInt(2, nums);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.uri = resultSet.getString(1);
		this.nums = resultSet.getInt(2);
	}

	@Override
	public String toString() {
		return "uri=" + uri + ", nums=" + nums;
	}
}
MRToMysqlMapper.java / MRToMysqlJob.java
package com.atSchool.WebLog;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * 读取HDFS的文件输出到MySQL中
 */
class MRToMysqlMapper extends Mapper<LongWritable, Text, LogWritable, NullWritable> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, LogWritable, NullWritable>.Context context)
			throws IOException, InterruptedException {
		LogWritable logWritable = new LogWritable(value.toString());
		context.write(logWritable, NullWritable.get());
	}
}

public class MRToMysqlJob extends Configured implements Tool {
	private String className = "com.mysql.cj.jdbc.Driver";
	private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private String user = "root";
	private String password = "password";

	public static void main(String[] args) throws Exception {
		ToolRunner.run(new MRToMysqlJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		/**
		 * 获取job:一个工作对象
		 */
		// 创建一个 配置 对象
		Configuration configuration = new Configuration();

		// 设置 name属性 的值。
		// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
		// 名称将在配置前进行修剪。
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");

		// 在configuration中设置数据库访问相关字段。
		DBConfiguration.configureDB(configuration, className, url, user, password);

		// 根据配置文件创建一个job
		Job job = Job.getInstance(configuration);

		/**
		 * 设置job
		 */
		/**
		 * setOutput(Job job, String tableName, String... fieldNames) throws IOException
		 * 用适当的输出设置初始化作业的缩减部分
		 * 参数:
		 * job:The job
		 * tableName:要插入数据的表
		 * fieldNames:表中的字段名。
		 */
		DBOutputFormat.setOutput(job, "webkpi", new String[] { "uri", "nums" });

		// 通过查找给定类的来源来设置Jar。
		job.setJarByClass(MRToMysqlJob.class);

		// 给 job 设置 Map和Reduce
		job.setMapperClass(MRToMysqlMapper.class);
		job.setNumReduceTasks(0); // 这里因为没有用到reduce,所以设置为0

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(LogWritable.class);
		job.setMapOutputValueClass(NullWritable.class);

		// 给 job 设置InputFormat
		// InputFormat:描述 Map-Reduce job 的输入规范
		// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);

		/**
		 * 设置输入路径
		 */
		FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");

		// 将job提交到集群并等待它完成。
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}
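DBOutputFormat只负责往表里插入数据,目标表需要事先在MySQL中建好。下面是一段示意性的建表代码,表名、字段名与上面setOutput中的webkpi、uri、nums对应,字段类型是假设的,可按实际数据调整:

CreateWebKpiTable.java
package com.atSchool.WebLog;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * 示意:运行MRToMysqlJob之前先建好webkpi表(字段类型为假设值)
 */
public class CreateWebKpiTable {
	public static void main(String[] args) throws SQLException {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection connection = DriverManager.getConnection(url, "root", "password");
				Statement statement = connection.createStatement()) {
			statement.execute("CREATE TABLE IF NOT EXISTS webkpi ("
					+ "uri VARCHAR(255) NOT NULL,"
					+ "nums INT NOT NULL)");
		}
	}
}
后面用到的alone_ip_kpi、everyday_top、time_count等表也可以按同样的方式先建好,字段名与各自setOutput中的参数保持一致即可。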
3、页面独立IP的访问量统计

说明:

  • 独立IP访问量,即统计每个页面被多少个不同的IP访问过:同一个IP对同一页面的多次访问只记一次

代码:

LogEntity.java
package com.atSchool.WebLog.AloneIP;

/**
 * 日志解析类
 */
public class LogEntity {
	private String date; // 请求的日期
	private String time; // 请求的时间
	private String c_ip; // 访问用户的 IP 地址或者用户使用的代理服务器 IP 地址
	private String cs_username; // 用户名,由于通常用户没有进行注册,故一般都为占位符“-”
	private String s_ip; // 客户端访问网站的IP 地址
	private String s_port; // 客户端访问网站的端口号
	private String cs_method; // 访问者的请求命令,常见的方法有三种,分别是 GET、POST 和 HEAD
	private String cs_uri_stem; // 访问者请求的资源,即相对于服务器根目录的路径
	private String cs_uri_query; // 请求的查询字符串(URI中“?”后面的部分),没有参数时为占位符“-”
	private String sc_status; // 服务器返回的状态代码。一般而言,以2开头的状态代码表示成功,以3开头表示由于各种不同的原因用户请求被重定向到了其他位置,以4开头表示用户端存在某种错误,以5开头表示服务器遇到了某个错误;
	private String cs_User_Agent; // 附加信息,包括浏览器类型、操作系统等
	private boolean isValid; // 判断值是否有效

	public LogEntity() {
	}

	public LogEntity(String line) {
		String[] split = line.split(" +");
		if ((line.charAt(0) != '#') && (split.length == 11)) {
			this.date = split[0];
			this.time = split[1];
			this.c_ip = split[2];
			this.cs_username = split[3];
			this.s_ip = split[4];
			this.s_port = split[5];
			this.cs_method = split[6];
			this.cs_uri_stem = split[7];
			this.cs_uri_query = split[8];
			this.sc_status = split[9];
			this.cs_User_Agent = split[10];
			this.isValid = true;
		} else {
			this.isValid = false;
		}
	}

	public String getDate() {
		return date;
	}

	public void setDate(String date) {
		this.date = date;
	}

	public String getTime() {
		return time;
	}

	public void setTime(String time) {
		this.time = time;
	}

	public String getC_ip() {
		return c_ip;
	}

	public void setC_ip(String c_ip) {
		this.c_ip = c_ip;
	}

	public String getCs_username() {
		return cs_username;
	}

	public void setCs_username(String cs_username) {
		this.cs_username = cs_username;
	}

	public String getS_ip() {
		return s_ip;
	}

	public void setS_ip(String s_ip) {
		this.s_ip = s_ip;
	}

	public String getS_port() {
		return s_port;
	}

	public void setS_port(String s_port) {
		this.s_port = s_port;
	}

	public String getCs_method() {
		return cs_method;
	}

	public void setCs_method(String cs_method) {
		this.cs_method = cs_method;
	}

	public String getCs_uri_stem() {
		return cs_uri_stem;
	}

	public void setCs_uri_stem(String cs_uri_stem) {
		this.cs_uri_stem = cs_uri_stem;
	}

	public String getCs_uri_query() {
		return cs_uri_query;
	}

	public void setCs_uri_query(String cs_uri_query) {
		this.cs_uri_query = cs_uri_query;
	}

	public String getSc_status() {
		return sc_status;
	}

	public void setSc_status(String sc_status) {
		this.sc_status = sc_status;
	}

	public String getCs_User_Agent() {
		return cs_User_Agent;
	}

	public void setCs_User_Agent(String cs_User_Agent) {
		this.cs_User_Agent = cs_User_Agent;
	}

	public boolean isValid() {
		return isValid;
	}

	public void setValid(boolean isValid) {
		this.isValid = isValid;
	}

	@Override
	public String toString() {
		return "date=" + date + ", time=" + time + ", c_ip=" + c_ip + ", cs_username=" + cs_username + ", s_ip=" + s_ip
				+ ", s_port=" + s_port + ", cs_method=" + cs_method + ", cs_uri_stem=" + cs_uri_stem + ", cs_uri_query="
				+ cs_uri_query + ", sc_status=" + sc_status + ", cs_User_Agent=" + cs_User_Agent + ", isValid="
				+ isValid;
	}
}
AloneIPVisitsNumsMapper.java / AloneIPVisitsNumsCombiner.java
package com.atSchool.WebLog.AloneIP;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * 页面访问量
 * 
 * 输出格式:
 * 	mapper-out:172.17.40.35	/news/newsweb/call_news_top.asp	
 * 	mapper-out:172.16.74.253	/index.asp	
 * 	mapper-out:172.16.74.253	/news/newsweb/call_news_top.asp	
 * 	mapper-out:172.16.94.47	/index.asp	
 * 	mapper-out:172.16.80.33	/index.asp
 */
public class AloneIPVisitsNumsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
	private Text outKey = new Text();

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, NullWritable>.Context context)
			throws IOException, InterruptedException {
		LogEntity logEntity = new LogEntity(value.toString());

		if (logEntity.isValid() == true) {
			String cs_uri = logEntity.getCs_uri_stem();
			String c_ip = logEntity.getC_ip();
			// 由于静态资源不算页面访问量,所以得进行过滤
			if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
				// 注意:由于相同的IP+相同的uri只能算一个访问记录,所以这里得传`IP+uri`
				outKey.set(c_ip + "\t" + cs_uri);
				context.write(outKey, NullWritable.get());
//				System.out.println("mapper-out:" + outKey.toString() + "\t");
			}
		}
	}
}

/**
 * combiner:对map端输出的相同key(IP+uri)先做一次合并去重,每个key只输出一次,以减轻reduce的压力
 * 
 * 输出格式:
 * 	combiner-out:219.132.7.2	/index.asp	
 * combiner-out:219.132.7.2	/info_pub/zsjy/zsxx/jj.htm	
 * combiner-out:219.132.7.2	/news/newsweb/call_news_top.asp	
 * combiner-out:219.132.83.251	/index.asp	
 * combiner-out:219.132.83.251	/info_pub/zsjy/zsxx/01yishu.htm	
 * combiner-out:219.132.83.251	/info_pub/zsjy/zsxx/pic/zsxx.htm	
 * combiner-out:219.132.83.251	/news/newsweb/call_news_top.asp	
 * combiner-out:219.132.83.251	/pop/newyear.htm
 */
class AloneIPVisitsNumsCombiner extends Reducer<Text, NullWritable, Text, NullWritable> {
	@Override
	protected void reduce(Text key, Iterable<NullWritable> value,
			Reducer<Text, NullWritable, Text, NullWritable>.Context context) throws IOException, InterruptedException {
		context.write(key, NullWritable.get());
//		System.out.println("combiner-out:" + key.toString() + "\t");
	}
}
AloneIPVisitsNumsReduce.java / StringSameCount.java
package com.atSchool.WebLog.AloneIP;

import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * 由于combiner提前进行了合并,所以传过来的都是不相同的
 * 对数据进行分割,统计所有相同的uri
 * 最终输出到文件中
 */
public class AloneIPVisitsNumsReduce extends Reducer<Text, NullWritable, Text, IntWritable> {
	// 存储传过来的值
	Set<String> s = new HashSet<String>();

	@Override
	protected void reduce(Text key, Iterable<NullWritable> value,
			Reducer<Text, NullWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		// 直接将传过来的值装到HashSet集合中
		// 无序(存储顺序和读取顺序不同),不包含重复元素的集合。
		s.add(key.toString());
	}

	@Override
	protected void cleanup(Reducer<Text, NullWritable, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		StringSameCount stringSameCount = new StringSameCount();
		for (String string : s) {
			String[] split = string.split("\t");
			stringSameCount.hashInsert(split[1]); // 将uri放入到stringSameCount中,进行统计
		}

		HashMap<String, Integer> map = stringSameCount.getHashMap(); // 获取统计好后的结果
		Set<String> keySet = map.keySet();
		for (String key : keySet) {
			Integer value = map.get(key); // 根据key获取value
			context.write(new Text(key), new IntWritable(value));
//			System.out.println("reduce-out:" + key + "\t" + value);
		}
	}
}

// 用来处理数据的类
class StringSameCount {
	private HashMap<String, Integer> map;
	private int counter; // 计数器

	public StringSameCount() {
		map = new HashMap<String, Integer>();
	}

	// 判断是否有重复的key
	public void hashInsert(String string) {
		if (map.containsKey(string)) { // 判断map中是否有相同的key
			counter = (Integer) map.get(string); // 根据key获取值
			map.put(string, ++counter); // 值+1
		} else { // 如果没有则key为string,value为1
			map.put(string, 1);
		}
	}

	public HashMap<String, Integer> getHashMap() {
		return map;
	}
}
AloneIPVisitsNumsJob.java
package com.atSchool.WebLog.AloneIP;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.atSchool.utils.HDFSUtils;

public class AloneIPVisitsNumsJob extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		ToolRunner.run(new AloneIPVisitsNumsJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		// 获取Job
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		Job job = Job.getInstance(configuration);

		// 设置需要运行的任务
		job.setJarByClass(AloneIPVisitsNumsJob.class);

		// 告诉job Map和Reduce在哪
		job.setMapperClass(AloneIPVisitsNumsMapper.class);
		job.setReducerClass(AloneIPVisitsNumsReduce.class);
		
		job.setCombinerClass(AloneIPVisitsNumsCombiner.class);

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(NullWritable.class);

		// 告诉job Reduce输出的key和value的数据类型的是什么
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 告诉job输入和输出的路径
		FileInputFormat.addInputPath(job, new Path("/web.log"));
		/**
		 * 因为输出的文件不允许存在,所以需要处理一下
		 */
		FileSystem fileSystem = HDFSUtils.getFileSystem();
		Path path = new Path("/MapReduceOut");
		if (fileSystem.exists(path)) {
			fileSystem.delete(path, true);
			System.out.println("删除成功");
		}
		FileOutputFormat.setOutputPath(job, path);

		// 提交任务
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}
4、页面独立IP的访问量统计写到MySQL中
AloneIpWritable.java
package com.atSchool.WebLog.AloneIP;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.mapred.lib.db.DBWritable;

/**
 * 独立Ip访问量输出到MySQL对应的日志类
 */
public class AloneIpWritable implements DBWritable {
	private String uri;
	private Integer count;

	public AloneIpWritable() {
	}

	public AloneIpWritable(String line) {
		String[] split = line.split("\t");
		this.uri = split[0];
		this.count = Integer.valueOf(split[1]);
	}

	public String getUri() {
		return uri;
	}

	public void setUri(String uri) {
		this.uri = uri;
	}

	public Integer getCounter() {
		return count;
	}

	public void setCounter(Integer counter) {
		this.count = counter;
	}

	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setString(1, this.uri);
		statement.setInt(2, this.count);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.uri = resultSet.getString(1);
		this.count = resultSet.getInt(2);
	}

	@Override
	public String toString() {
		return "uri=" + uri + ", counter=" + count;
	}
}
MRToMysqlMapper.java / MRToMysqlJob.java
package com.atSchool.WebLog.AloneIP;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * 独立Ip访问量
 * 读取HDFS的文件输出到MySQL中
 */
class MRToMysqlMapper extends Mapper<LongWritable, Text, AloneIpWritable, NullWritable> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, AloneIpWritable, NullWritable>.Context context)
			throws IOException, InterruptedException {
		AloneIpWritable logWritable = new AloneIpWritable(value.toString());
		context.write(logWritable, NullWritable.get());
	}
}

public class MRToMysqlJob extends Configured implements Tool {
	private String className = "com.mysql.cj.jdbc.Driver";
	private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private String user = "root";
	private String password = "password";

	public static void main(String[] args) throws Exception {
		ToolRunner.run(new MRToMysqlJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		/**
		 * 获取job:一个工作对象
		 */
		// 创建一个 配置 对象
		Configuration configuration = new Configuration();

		// 设置 name属性 的值。
		// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
		// 名称将在配置前进行修剪。
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");

		// 在configuration中设置数据库访问相关字段。
		DBConfiguration.configureDB(configuration, className, url, user, password);

		// 根据配置文件创建一个job
		Job job = Job.getInstance(configuration);

		/**
		 * 设置job
		 */
		/**
		 * setOutput(Job job, String tableName, String... fieldNames) throws IOException
		 * 用适当的输出设置初始化作业的缩减部分
		 * 参数:
		 * job:The job
		 * tableName:要插入数据的表
		 * fieldNames:表中的字段名。
		 */
		DBOutputFormat.setOutput(job, "alone_ip_kpi", new String[] { "uri", "count" });

		// 通过查找给定类的来源来设置Jar。
		job.setJarByClass(MRToMysqlJob.class);

		// 给 job 设置 Map和Reduce
		job.setMapperClass(MRToMysqlMapper.class);
		job.setNumReduceTasks(0); // 这里因为没有用到reduce,所以设置为0

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(AloneIpWritable.class);
		job.setMapOutputValueClass(NullWritable.class);

		// 给 job 设置InputFormat
		// InputFormat:描述 Map-Reduce job 的输入规范
		// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);

		/**
		 * 设置输入路径
		 */
		FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");

		// 将job提交到集群并等待它完成。
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}

6.2.3 在网页上显示 网页访问量统计的top5

1、MVC框架说明

M:module,业务模型,用于提供数据

V:view,视图、用户界面,用于显示数据

C:controller,控制器、分发,用于分发请求

2、新建Web项目

之前已经使用MapReduce分析出了结果,并将数据写入到了MySQL中,现在如果要将数据显示到网页上就得新建一个web项目:

  1. 新建一个web项目-Dynamic Web Project
  2. Dynamic web module version设置为3.0即可
  3. 一直next,最后勾选Generate web.xml deployment descriptor然后finish即可。
3、项目结构

(图:Web项目目录结构)

4、代码
1、实体类
package com.atschool.Entity;

/**
 * webkpi表对应的实体类
 */
public class webkpi {
	private String uri;
	private Integer nums;

	public webkpi() {
	}

	public webkpi(String uri, Integer nums) {
		this.uri = uri;
		this.nums = nums;
	}

	public String getUri() {
		return uri;
	}

	public void setUri(String uri) {
		this.uri = uri;
	}

	public Integer getNums() {
		return nums;
	}

	public void setNums(Integer nums) {
		this.nums = nums;
	}

	@Override
	public String toString() {
		return "uri=" + uri + ", nums=" + nums;
	}
}
2、工具类
package com.atschool.DBUtils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * 数据库工具类
 */
public class DBUtils {
	private static String className = "com.mysql.cj.jdbc.Driver";
	private static String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private static String user = "root";
	private static String password = "password";

	// 获取连接
	public static Connection getConnection() {
		Connection connection = null;
		try {
			// 1.加载驱动
			Class.forName(className);
			// 2.建立连接(Connection)
			if (connection == null) {
				connection = DriverManager.getConnection(url, user, password);
			}
		} catch (ClassNotFoundException | SQLException e) {
			e.printStackTrace();
		}
		return connection;
	}

	// 释放连接
	public static void close(Connection connection, PreparedStatement preparedStatement, ResultSet resultSet) {
		try {
			if (connection != null) {
				connection.close();
			}
			if (preparedStatement != null) {
				preparedStatement.close();
			}
			if (resultSet != null) {
				resultSet.close();
			}
		} catch (SQLException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}
	
	/**
	 * 对数据库进行增、删、改操作,直接调用即可,不需要连接数据库和释放资源。
	 *
	 * @param sql  需要执行的sql语句
	 * @param args 包含占位符信息的一个数组
	 */
	public static void action(String sql, Object... args) {
		Connection conn = null;
		PreparedStatement ps = null;
		try {
			// 连接数据库
			conn = DBUtils.getConnection();

			// 预编译sql语句,返回一个PrepareStatem实例
			ps = conn.prepareStatement(sql);

			// 填充占位符
			for (int num = 0; num < args.length; num++) {
				ps.setObject(num + 1, args[num]);
				/**
				 * 注意:
				 *      PreparedStatement的占位符下标从1开始,
				 *      而args数组的下标从0开始,所以这里用num + 1来对应。
				 */
			}

			// 执行sql语句
			ps.execute();
		} catch (SQLException throwables) {
			throwables.printStackTrace();
		}

		// 释放资源
		DBUtils.close(conn, ps, null);
	}
}
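下面是DBUtils.action的一个简单使用示意(假设webkpi表已经存在,插入的数据只是举例),可以用来验证数据库连接和工具类是否可用:

DBUtilsDemo.java
package com.atschool.DBUtils;

/**
 * DBUtils.action使用示意(插入的数据仅为举例)
 */
public class DBUtilsDemo {
	public static void main(String[] args) {
		// 占位符从1开始编号,后面的可变参数会按顺序依次填充
		DBUtils.action("INSERT INTO webkpi(uri, nums) VALUES(?, ?)", "/index.asp", 100);
	}
}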
3、Dao层
package com.atschool.Dao;

import java.util.List;

/**
 * DAO:
 * 		Data Access Object访问数据信息的类和接口,包括了对数据的CRUD(Create、Retrival、Update、 Delete),而不包含任何业务相关的信息。
 * 		有时也称作:BaseDAO
 * 作用:
 * 		为了实现功能的模块化,更有利于代码的维护和升级。 
 * 
 * uri的dao层
 */
public interface UriDao<T> {
	// 查询页面访问量前五
	List<T> queryTop();
}
package com.atschool.Dao.Impl;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.webkpi;

public class UriDaoImpl implements UriDao<webkpi> {

	@Override
	public List<webkpi> queryTop() {
		String sql = "SELECT RIGHT(uri,12) AS uri,nums FROM `webkpi` ORDER BY nums DESC LIMIT 5";
		Connection connection = DBUtils.getConnection();

		PreparedStatement ps = null;
		ResultSet resultSet = null;
		ArrayList<webkpi> arrayList = null;
		try {
			// 预编译sql语句,返回一个PrepareStatem实例
			ps = connection.prepareStatement(sql);

			// 执行sql语句得到结果
			resultSet = ps.executeQuery();

			arrayList = new ArrayList<>();
			while (resultSet.next()) {
				String uri = resultSet.getString("uri");
				int nums = resultSet.getInt("nums");
				arrayList.add(new webkpi(uri, nums));
			}
		} catch (SQLException e) {
			e.printStackTrace();
		} finally {
			// 释放资源
			DBUtils.close(connection, ps, resultSet);
		}

		return arrayList;
	}

	// 测试
	public static void main(String[] args) {
		List<webkpi> queryTop5 = new UriDaoImpl().queryTop();
		for (webkpi webkpi : queryTop5) {
			System.out.println(webkpi);
		}
	}
}
4、控制层
package com.atschool.Controller;

import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.UriDaoImpl;
import com.atschool.Entity.webkpi;

/**
 * Servlet implementation class UriTopServlete
 */
@WebServlet("/UriTopServlete")
public class UriTopServlete extends HttpServlet {
	private static final long serialVersionUID = 1L;
	private UriDaoImpl uriDaoImpl = new UriDaoImpl();

	// service 不管是什么请求都会接收
	@Override
	protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
		// 获取数据
		List<webkpi> queryTop5 = uriDaoImpl.queryTop();
		
		// 转成json数据
		String jsonString = JSONArray.toJSONString(queryTop5);
		
		resp.getWriter().write(jsonString);
	}
}
5、页面
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>网站访问量统计</title>
<!-- 引入 echarts.js -->
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
<script src="https://cdn.bootcdn.net/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
</head>
<body>
	<!-- 为ECharts准备一个具备大小(宽高)的Dom -->
	<div id="main" style="width: 700px; height: 400px;"></div>
	<script type="text/javascript">
		// dom加载后就会执行
		var xData = new Array();
		var yData = new Array();
		$(function() {
			$.ajax({
				//请求方式
				type : "POST",
				//请求地址
				url : "UriTopServlete",	// 这里最好写全,以便页面换位置了,也可以访问
				//规定返回的数据类型
				dataType : "json",
				// 由于不能让图表在没有获取到数据之前就显示出来,所以设置为同步操作
				async : false,
				//请求成功
				success : function(result) {
					console.log(result); // 打印到浏览器控制台

					// 对获取到的数据进行解析
					for (var i = 0; i < result.length; i++) {
						xData.push(result[i].uri);
						yData.push(result[i].nums);
					}
				},
				//请求失败,包含具体的错误信息
				error : function(e) {
					console.log(e.status);
					console.log(e.responseText);
				}
			});

			// 基于准备好的dom,初始化echarts实例
			var myChart = echarts.init(document.getElementById('main'));

			// 指定图表的配置项和数据
			var option = {
				title : {
					text : '网站访问量'
				},
				// 工具栏
				tooltip : {},
				legend : {
					data : [ '访问量' ]
				},
				// x轴
				xAxis : {
					data : xData
				},
				// y轴
				yAxis : {},
				series : [ {
					name : '访问量',
					type : 'bar',	// 图的类型,bar-柱状/条形图,line-线图,pie-饼状图
					data : yData
				} ]
			};

			// 使用刚指定的配置项和数据显示图表。
			myChart.setOption(option);

		});
	</script>
</body>
</html>

6.2.4 在网页上显示 独立Ip访问量统计的Top10

说明:

  • 前面实现了网页访问量Top5的显示,所以这里只需要在其基础上添上几笔
1、实体类
// 由于数据差别不是很大,所以用的还是前面的webkpi类
2、工具类
// 前面写了这里就不写了
3、Dao层
package com.atschool.Dao.Impl;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.webkpi;

public class AloneIpDaoImpl implements UriDao<webkpi> {
	@Override
	public List<webkpi> queryTop() {
		String sql = "SELECT * FROM alone_ip_kpi ORDER BY count DESC LIMIT 10";
		Connection connection = DBUtils.getConnection();

		PreparedStatement ps = null;
		ResultSet resultSet = null;
		ArrayList<webkpi> arrayList = null;
		try {
			// 预编译sql语句,返回一个PrepareStatem实例
			ps = connection.prepareStatement(sql);

			// 执行sql语句得到结果
			resultSet = ps.executeQuery();

			arrayList = new ArrayList<>();
			while (resultSet.next()) {
				String uri = resultSet.getString("uri");
				int count = resultSet.getInt("count");
				arrayList.add(new webkpi(uri, count));
			}
		} catch (SQLException e) {
			e.printStackTrace();
		} finally {
			// 释放资源
			DBUtils.close(connection, ps, resultSet);
		}

		return arrayList;
	}

	// 测试
	public static void main(String[] args) {
		List<webkpi> queryTop10 = new AloneIpDaoImpl().queryTop();
		for (webkpi webkpi : queryTop10) {
			System.out.println(webkpi);
		}
	}
}
4、控制层
package com.atschool.Controller;

import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.AloneIpDaoImpl;
import com.atschool.Entity.webkpi;

/**
 * Servlet implementation class AloneIpTopServlet
 */
@WebServlet("/AloneIpTopServlet")
public class AloneIpTopServlet extends HttpServlet {
	private static final long serialVersionUID = 1L;
	private AloneIpDaoImpl aloneIpDaoImpl = new AloneIpDaoImpl();

	// service 不管是什么请求都会接收
	@Override
	protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
		// 获取数据
		List<webkpi> queryTop10 = aloneIpDaoImpl.queryTop();

		// 转成json数据
		String jsonString = JSONArray.toJSONString(queryTop10);

		resp.getWriter().write(jsonString);
	}
}
5、页面
// 这里只需要把前面写的 index.html 页面稍作修改:把请求地址换成 AloneIpTopServlet,再把图表类型换成饼状图看看效果

6.2.5 每天最高访问量

1、MapReduce统计
package com.atSchool.WebLog.EveryDayTop1;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;

public class EveryDayTopJob extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		ToolRunner.run(new EveryDayTopJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		// 获取Job
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		Job job = Job.getInstance(configuration);

		// 设置需要运行的任务
		job.setJarByClass(EveryDayTopJob.class);

		// 告诉job Map和Reduce在哪
		job.setMapperClass(EveryDayTopMapper.class);
		job.setReducerClass(EveryDayTopReduce.class);

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// 告诉job Reduce输出的key和value的数据类型的是什么
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 告诉job输入和输出的路径
		FileInputFormat.addInputPath(job, new Path("/web.log"));
		/**
		 * 因为输出的文件不允许存在,所以需要处理一下
		 */
		FileSystem fileSystem = HDFSUtils.getFileSystem();
		Path path = new Path("/MapReduceOut");
		if (fileSystem.exists(path)) {
			fileSystem.delete(path, true);
			System.out.println("删除成功");
		}
		FileOutputFormat.setOutputPath(job, path);

		// 提交任务
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}

class EveryDayTopMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private Text outKey = new Text();
	private IntWritable outValue = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		LogEntity logEntity = new LogEntity(value.toString());

		if (logEntity.isValid() == true) {
			String date = logEntity.getDate();
			String cs_uri = logEntity.getCs_uri_stem();
			// 由于静态资源不算页面访问量,所以得进行过滤
			if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
				outKey.set(date + "\t" + cs_uri);
				context.write(outKey, outValue);
				// System.out.println("mapper-out:" + outKey.toString() + "\t");
			}
		}
	}
}

class EveryDayTopReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	private IntWritable outValue = new IntWritable();

	@Override
	protected void reduce(Text key, Iterable<IntWritable> value,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable intWritable : value) {
			sum += intWritable.get();
		}
		outValue.set(sum);
		context.write(key, outValue);
	}
}
2、写出到MySQL中
package com.atSchool.WebLog.EveryDayTop1;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.mapred.lib.db.DBWritable;

/**
 * 每天最高访问量输出到MySQL对应的日志类
 */
public class EveryDayTopWritable implements DBWritable {
	private String date;
	private String uri;
	private Integer count;

	public EveryDayTopWritable() {
	}

	public EveryDayTopWritable(String line) {
		String[] split = line.split("\t");
		this.date = split[0];
		this.uri = split[1];
		this.count = Integer.valueOf(split[2]);
	}

	public String getDate() {
		return date;
	}

	public void setDate(String date) {
		this.date = date;
	}

	public String getUri() {
		return uri;
	}

	public void setUri(String uri) {
		this.uri = uri;
	}

	public Integer getCounter() {
		return count;
	}

	public void setCounter(Integer counter) {
		this.count = counter;
	}

	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setString(1, this.date);
		statement.setString(2, this.uri);
		statement.setInt(3, this.count);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.date = resultSet.getString(1);
		this.uri = resultSet.getString(2);
		this.count = resultSet.getInt(3);
	}

	@Override
	public String toString() {
		return "date=" + date + "uri=" + uri + ", counter=" + count;
	}
}
package com.atSchool.WebLog.EveryDayTop1;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * 每天最高访问量
 * 读取HDFS的文件输出到MySQL中
 */
class MRToMysqlMapper extends Mapper<LongWritable, Text, EveryDayTopWritable, NullWritable> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, EveryDayTopWritable, NullWritable>.Context context)
			throws IOException, InterruptedException {
		EveryDayTopWritable logWritable = new EveryDayTopWritable(value.toString());
		context.write(logWritable, NullWritable.get());
	}
}

public class MRToMysqlJob extends Configured implements Tool {
	private String className = "com.mysql.cj.jdbc.Driver";
	private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private String user = "root";
	private String password = "password";

	public static void main(String[] args) throws Exception {
		ToolRunner.run(new MRToMysqlJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		/**
		 * 获取job:一个工作对象
		 */
		// 创建一个 配置 对象
		Configuration configuration = new Configuration();

		// 设置 name属性 的值。
		// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
		// 名称将在配置前进行修剪。
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");

		// 在configuration中设置数据库访问相关字段。
		DBConfiguration.configureDB(configuration, className, url, user, password);

		// 根据配置文件创建一个job
		Job job = Job.getInstance(configuration);

		/**
		 * 设置job
		 */
		/**
		 * setOutput(Job job, String tableName, String... fieldNames) throws IOException
		 * 用适当的输出设置初始化作业的缩减部分
		 * 参数:
		 * job:The job
		 * tableName:要插入数据的表
		 * fieldNames:表中的字段名。
		 */
		DBOutputFormat.setOutput(job, "everyday_top", new String[] { "date","uri", "count" });

		// 通过查找给定类的来源来设置Jar。
		job.setJarByClass(MRToMysqlJob.class);

		// 给 job 设置 Map和Reduce
		job.setMapperClass(MRToMysqlMapper.class);
		job.setNumReduceTasks(0); // 这里因为没有用到reduce,所以设置为0

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(EveryDayTopWritable.class);
		job.setMapOutputValueClass(NullWritable.class);

		// 给 job 设置InputFormat
		// InputFormat:描述 Map-Reduce job 的输入规范
		// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);

		/**
		 * 设置输入路径
		 */
		FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");

		// 将job提交到集群并等待它完成。
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}
3、显示到页面中

这里和前面一样,只需要添加一些类就可以了

1、实体类
package com.atschool.Entity;

/**
 * everyday_top表对应的实体类
 */
public class everyday_top {
	private String date;
	private String uri;
	private Integer count;

	public everyday_top() {
	}

	public everyday_top(String date, String uri, Integer count) {
		this.date = date;
		this.uri = uri;
		this.count = count;
	}

	public String getDate() {
		return date;
	}

	public void setDate(String date) {
		this.date = date;
	}

	public String getUri() {
		return uri;
	}

	public void setUri(String uri) {
		this.uri = uri;
	}

	public Integer getCount() {
		return count;
	}

	public void setCount(Integer count) {
		this.count = count;
	}

	@Override
	public String toString() {
		return "date=" + date +", uri=" + uri + ", count=" + count;
	}
}
2、dao层
package com.atschool.Dao.Impl;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.everyday_top;

public class EverydayTopImpl implements UriDao<everyday_top> {
	@Override
	public List<everyday_top> queryTop() {
		String sql = "SELECT date,uri,MAX(count) AS count FROM everyday_top GROUP BY date";
		Connection connection = DBUtils.getConnection();

		PreparedStatement ps = null;
		ResultSet resultSet = null;
		ArrayList<everyday_top> arrayList = null;
		try {
			// 预编译sql语句,返回一个PrepareStatem实例
			ps = connection.prepareStatement(sql);

			// 执行sql语句得到结果
			resultSet = ps.executeQuery();

			arrayList = new ArrayList<>();
			while (resultSet.next()) {
				String date = resultSet.getString("date");
				String uri = resultSet.getString("uri");
				int count = resultSet.getInt("count");
				arrayList.add(new everyday_top(date, uri, count));
			}
		} catch (SQLException e) {
			e.printStackTrace();
		} finally {
			// 释放资源
			DBUtils.close(connection, ps, resultSet);
		}

		return arrayList;
	}

	// 测试
	public static void main(String[] args) {
		List<everyday_top> queryTop5 = new EverydayTopImpl().queryTop();
		for (everyday_top webkpi : queryTop5) {
			System.out.println(webkpi);
		}
	}
}
3、控制层
package com.atschool.Controller;

import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.EverydayTopImpl;
import com.atschool.Entity.everyday_top;

@WebServlet("/EveryDayTopServlete")
public class EveryDayTopServlete extends HttpServlet {
	private static final long serialVersionUID = 1L;
	private EverydayTopImpl everydayTopImpl = new EverydayTopImpl();

	// service 不管是什么请求都会接收
	@Override
	protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
		// 获取数据
		List<everyday_top> queryTop5 = everydayTopImpl.queryTop();

		// 转成json数据
		String jsonString = JSONArray.toJSONString(queryTop5);

		resp.getWriter().write(jsonString);
	}
}
4、页面
// 这里和前面的页面一样,这里使用折线图效果应该会更好

6.2.6 统计用户每小时的页面访问量(PV,page view)

说明:

  • 统计一天24小时内每个小时的页面访问量
1、MR统计每个小时的访问量
package com.atSchool.WebLog.Time;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;

public class TimeJob extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		ToolRunner.run(new TimeJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		// 获取Job
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		Job job = Job.getInstance(configuration);

		// 设置需要运行的任务
		job.setJarByClass(TimeJob.class);

		// 告诉job Map和Reduce在哪
		job.setMapperClass(TimeMapper.class);
		job.setReducerClass(TimeReduce.class);

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// 告诉job Reduce输出的key和value的数据类型的是什么
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 告诉job输入和输出的路径
		FileInputFormat.addInputPath(job, new Path("/web.log"));
		/**
		 * 因为输出的文件不允许存在,所以需要处理一下
		 */
		FileSystem fileSystem = HDFSUtils.getFileSystem();
		Path path = new Path("/MapReduceOut");
		if (fileSystem.exists(path)) {
			fileSystem.delete(path, true);
			System.out.println("删除成功");
		}
		FileOutputFormat.setOutputPath(job, path);

		// 提交任务
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}

class TimeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private Text outKey = new Text();
	private IntWritable outValue = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		LogEntity logEntity = new LogEntity(value.toString());

		if (logEntity.isValid() == true) {
			String time = logEntity.getTime().substring(0, 2) + ":00";
			String cs_uri = logEntity.getCs_uri_stem();
			// 由于静态资源不算页面访问量,所以得进行过滤
			if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
				outKey.set(time);
				context.write(outKey, outValue);
				// System.out.println("mapper-out:" + outKey.toString() + "\t");
			}
		}
	}
}

class TimeReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> value,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable intWritable : value) {
			sum += intWritable.get();
		}
		context.write(key, new IntWritable(sum));
	}
}
2、录入到MySQL中
  • 建表(建表语句可参考下面的示意代码)
  • 创建序列化类
  • MR写出数据
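其中“建表”一步可以参考下面的示意代码,表名、字段名与后面DBOutputFormat.setOutput中的time_count、visit_time、nums对应,字段类型是假设的:

CreateTimeCountTable.java
package com.atSchool.WebLog.Time;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * 示意:建time_count表(字段类型为假设值)
 */
public class CreateTimeCountTable {
	public static void main(String[] args) throws SQLException {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection connection = DriverManager.getConnection(url, "root", "password");
				Statement statement = connection.createStatement()) {
			statement.execute("CREATE TABLE IF NOT EXISTS time_count ("
					+ "visit_time VARCHAR(8) NOT NULL,"
					+ "nums INT NOT NULL)");
		}
	}
}
“创建序列化类”和“MR写出数据”两步对应的代码如下: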
package com.atSchool.WebLog.Time;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.mapred.lib.db.DBWritable;

/**
 * 按小时统计访问量 输出到MySQL对应的日志类
 */
public class TimeWritable implements DBWritable {
	private String visit_time;
	private int nums;

	public TimeWritable() {
	}

	public TimeWritable(String line) {
		String[] split = line.split("\t");
		this.visit_time = split[0];
		this.nums = Integer.valueOf(split[1]);
	}

	public String getVisit_time() {
		return visit_time;
	}

	public void setVisit_time(String visit_time) {
		this.visit_time = visit_time;
	}

	public int getNums() {
		return nums;
	}

	public void setNums(int nums) {
		this.nums = nums;
	}

	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setString(1, visit_time);
		statement.setInt(2, nums);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.visit_time = resultSet.getString(1);
		this.nums = resultSet.getInt(2);
	}

	@Override
	public String toString() {
		return "visit_time=" + visit_time + ", nums=" + nums;
	}
}
package com.atSchool.WebLog.Time;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * 按小时统计访问量
 * 读取HDFS的文件输出到MySQL中
 */
class MRToMysqlMapper extends Mapper<LongWritable, Text, TimeWritable, NullWritable> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, TimeWritable, NullWritable>.Context context)
			throws IOException, InterruptedException {
		TimeWritable logWritable = new TimeWritable(value.toString());
		context.write(logWritable, NullWritable.get());
	}
}

public class MRToMysqlJob extends Configured implements Tool {
	private String className = "com.mysql.cj.jdbc.Driver";
	private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private String user = "root";
	private String password = "password";

	public static void main(String[] args) throws Exception {
		ToolRunner.run(new MRToMysqlJob(), args); // run是静态方法,直接通过类名调用
	}

	@Override
	public int run(String[] args) throws Exception {
		/**
		 * 获取job:一个工作对象
		 */
		// 创建一个 配置 对象
		Configuration configuration = new Configuration();

		// 设置 name属性 的值。
		// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
		// 名称将在配置前进行修剪。
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");

		// 在configuration中设置数据库访问相关字段。
		DBConfiguration.configureDB(configuration, className, url, user, password);

		// 根据配置文件创建一个job
		Job job = Job.getInstance(configuration);

		/**
		 * 设置job
		 */
		/**
		 * setOutput(Job job, String tableName, String... fieldNames) throws IOException
		 * 用适当的输出设置初始化作业的缩减部分
		 * 参数:
		 * job:The job
		 * tableName:要插入数据的表
		 * fieldNames:表中的字段名。
		 */
		DBOutputFormat.setOutput(job, "time_count", new String[] { "visit_time","nums" });

		// 通过查找给定类的来源来设置Jar。
		job.setJarByClass(MRToMysqlJob.class);

		// 给 job 设置 Map和Reduce
		job.setMapperClass(MRToMysqlMapper.class);
		job.setNumReduceTasks(0); // 这里因为没有用到reduce,所以设置为0

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(TimeWritable.class);
		job.setMapOutputValueClass(NullWritable.class);

		// 给 job 设置InputFormat
		// InputFormat:描述 Map-Reduce job 的输入规范
		// DBInputFormat:从一个SQL表中读取输入数据的输入格式。
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);

		/**
		 * 设置输入路径
		 */
		FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");

		// 将job提交到集群并等待它完成。
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}
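Before this job runs, the webkpi database must already contain a time_count table whose columns match the field names passed to DBOutputFormat.setOutput. The DDL is not part of this write-up, so the following is only a minimal sketch: the table and column names come from the job above, while the column types, class name, and package are assumptions. It can be run once from a plain Java main:

package com.atSchool.WebLog.Time;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/**
 * One-off helper (assumed schema) that creates the time_count table
 * expected by DBOutputFormat.setOutput(job, "time_count", ...).
 */
public class CreateTimeCountTable {
	public static void main(String[] args) throws Exception {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection conn = DriverManager.getConnection(url, "root", "password");
				Statement st = conn.createStatement()) {
			// Column names must match the fieldNames array used in the job above
			st.executeUpdate("CREATE TABLE IF NOT EXISTS time_count ("
					+ "visit_time VARCHAR(20) NOT NULL, " // e.g. an hour bucket such as "08"
					+ "nums INT NOT NULL)");
			System.out.println("time_count table is ready");
		}
	}
}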
3. Display the results on a page
1. Entity class
package com.atschool.Entity;

public class time_count {
	private String visit_time;
	private int nums;

	public time_count() {
	}

	public time_count(String visit_time, int nums) {
		this.visit_time = visit_time;
		this.nums = nums;
	}

	public String getVisit_time() {
		return visit_time;
	}

	public void setVisit_time(String visit_time) {
		this.visit_time = visit_time;
	}

	public int getNums() {
		return nums;
	}

	public void setNums(int nums) {
		this.nums = nums;
	}

	@Override
	public String toString() {
		return "visit_time=" + visit_time + ", nums=" + nums;
	}
}
2. DAO layer
package com.atschool.Dao.Impl;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.time_count;

public class TimeCountDaoImpl implements UriDao<time_count> {

	@Override
	public List<time_count> queryTop() {
		String sql = "SELECT * FROM `time_count`";
		Connection connection = DBUtils.getConnection();

		PreparedStatement ps = null;
		ResultSet resultSet = null;
		ArrayList<time_count> arrayList = null;
		try {
			// 预编译sql语句,返回一个PrepareStatem实例
			ps = connection.prepareStatement(sql);

			// 执行sql语句得到结果
			resultSet = ps.executeQuery();

			arrayList = new ArrayList<>();
			while (resultSet.next()) {
				String visit_time = resultSet.getString("visit_time");
				int nums = resultSet.getInt("nums");
				arrayList.add(new time_count(visit_time, nums));
			}
		} catch (SQLException e) {
			e.printStackTrace();
		} finally {
			// 释放资源
			DBUtils.close(connection, ps, resultSet);
		}

		return arrayList;
	}

	// 测试
	public static void main(String[] args) {
		List<time_count> queryTop5 = new TimeCountDaoImpl().queryTop();
		for (time_count time_count : queryTop5) {
			System.out.println(time_count);
		}
	}
}
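TimeCountDaoImpl (and the other DAO implementations) rely on a DBUtils helper from com.atschool.DBUtils that is defined earlier in the project and not repeated here. If you are following along without it, a minimal version consistent with how it is called above (getConnection() plus close(connection, ps, resultSet)) might look like the sketch below; the connection settings are the same assumptions used for the MR jobs.

package com.atschool.DBUtils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DBUtils {
	private static final String URL = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private static final String USER = "root";
	private static final String PASSWORD = "password";

	// Open a new connection for each caller; returns null if the database is unreachable
	public static Connection getConnection() {
		try {
			return DriverManager.getConnection(URL, USER, PASSWORD);
		} catch (SQLException e) {
			e.printStackTrace();
			return null;
		}
	}

	// Close the three JDBC resources in reverse order, ignoring nulls
	public static void close(Connection connection, PreparedStatement ps, ResultSet resultSet) {
		try {
			if (resultSet != null) resultSet.close();
			if (ps != null) ps.close();
			if (connection != null) connection.close();
		} catch (SQLException e) {
			e.printStackTrace();
		}
	}
}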
3. Controller layer
package com.atschool.Controller;

import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.TimeCountDaoImpl;
import com.atschool.Entity.time_count;

@WebServlet("/TimeCountServlete")
public class TimeCountServlet extends HttpServlet {
	private static final long serialVersionUID = 1L;
	private TimeCountDaoImpl timeCountDaoImpl = new TimeCountDaoImpl();

	// service 不管是什么请求都会接收
	@Override
	protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
		// 获取数据
		List<time_count> queryTop = timeCountDaoImpl.queryTop();
		
		// 转成json数据
		String jsonString = JSONArray.toJSONString(queryTop);
		
		resp.getWriter().write(jsonString);
	}
}
4. Page
// Same as before; for hourly traffic a line chart gives a better result

6.2.7 Counting users' access devices

Notes:

  • Count the devices (user agents) that visitors use when accessing the site's pages
1. MR job: extract the device from each access record and count how many times each device appears
package com.atSchool.WebLog.EquipmentPV;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.atSchool.WebLog.AloneIP.LogEntity;
import com.atSchool.utils.HDFSUtils;

public class EquipmentPvJob extends Configured implements Tool {
	public static void main(String[] args) throws Exception {
		// ToolRunner.run(...) is static; pass the real command-line args instead of null
		ToolRunner.run(new EquipmentPvJob(), args);
	}

	@Override
	public int run(String[] args) throws Exception {
		// 获取Job
		Configuration configuration = new Configuration();
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");
		Job job = Job.getInstance(configuration);

		// 设置需要运行的任务
		job.setJarByClass(EquipmentPvJob.class);

		// 告诉job Map和Reduce在哪
		job.setMapperClass(EquipmentPvMapper.class);
		job.setReducerClass(EquipmentPvReduce.class);

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// 告诉job Reduce输出的key和value的数据类型的是什么
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 告诉job输入和输出的路径
		FileInputFormat.addInputPath(job, new Path("/web.log"));
		/**
		 * 因为输出的文件不允许存在,所以需要处理一下
		 */
		FileSystem fileSystem = HDFSUtils.getFileSystem();
		Path path = new Path("/MapReduceOut");
		if (fileSystem.exists(path)) {
			fileSystem.delete(path, true);
			System.out.println("删除成功");
		}
		FileOutputFormat.setOutputPath(job, path);

		// 提交任务
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}

class EquipmentPvMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private Text outKey = new Text();
	private IntWritable outValue = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		LogEntity logEntity = new LogEntity(value.toString());

		if (logEntity.isValid()) {
			String cs_User_Agent = logEntity.getCs_User_Agent();
			String cs_uri = logEntity.getCs_uri_stem();
			// Static resources do not count as page views, so only keep page requests
			if (cs_uri.endsWith(".asp") || cs_uri.endsWith(".htm")) {
				String[] split = cs_User_Agent.split("/");
				if (!split[0].equals("-")) {
					outKey.set(split[0]);
					context.write(outKey, outValue);
				}
			}
		}
	}
}

class EquipmentPvReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> value,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable intWritable : value) {
			sum += intWritable.get();
		}
		context.write(key, new IntWritable(sum));
	}
}
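A note on the map output key: the mapper groups records by split[0] of cs_User_Agent, i.e. everything before the first "/". The snippet below uses a made-up User-Agent value of the kind this log format records, just to show what actually becomes the key:

public class UserAgentSplitDemo {
	public static void main(String[] args) {
		// Hypothetical cs_User_Agent value from an IIS-style log line
		String csUserAgent = "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)";
		String device = csUserAgent.split("/")[0];
		System.out.println(device); // prints "Mozilla"
	}
}

Because most desktop browsers identify themselves as Mozilla, this grouping is fairly coarse; parsing the full agent string would give finer-grained device statistics, but that refinement is outside the scope of this step.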
2. MR job: load the results into MySQL
package com.atSchool.WebLog.EquipmentPV;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.mapreduce.lib.db.DBWritable; // must be the mapreduce (not mapred) DBWritable so it works with DBOutputFormat

/**
 * 统计用户的访问设备 输出到MySQL对应的日志类
 */
public class EquipmentWritable implements DBWritable {
	private String user_agent;
	private int nums;

	public EquipmentWritable() {
	}

	public EquipmentWritable(String line) {
		String[] split = line.split("\t");
		this.user_agent = split[0];
		this.nums = Integer.valueOf(split[1]);
	}

	public String getuser_agent() {
		return user_agent;
	}

	public void setuser_agent(String user_agent) {
		this.user_agent = user_agent;
	}

	public int getNums() {
		return nums;
	}

	public void setNums(int nums) {
		this.nums = nums;
	}

	@Override
	public void write(PreparedStatement statement) throws SQLException {
		statement.setString(1, user_agent);
		statement.setInt(2, nums);
	}

	@Override
	public void readFields(ResultSet resultSet) throws SQLException {
		this.user_agent = resultSet.getString(1);
		this.nums = resultSet.getInt(2);
	}

	@Override
	public String toString() {
		return "user_agent=" + user_agent + ", nums=" + nums;
	}
}
package com.atSchool.WebLog.EquipmentPV;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * 统计用户的访问设备
 * 读取HDFS的文件输出到MySQL中
 */
class MRToMysqlMapper extends Mapper<LongWritable, Text, EquipmentWritable, NullWritable> {
	@Override
	protected void map(LongWritable key, Text value,
			Mapper<LongWritable, Text, EquipmentWritable, NullWritable>.Context context)
			throws IOException, InterruptedException {
		EquipmentWritable logWritable = new EquipmentWritable(value.toString());
		context.write(logWritable, NullWritable.get());
	}
}

public class MRToMysqlJob extends Configured implements Tool {
	private String className = "com.mysql.cj.jdbc.Driver";
	private String url = "jdbc:mysql://127.0.0.1:3306/webkpi?&charactercEncoding=utf-8&useSSL=false&serverTimezone=UTC";
	private String user = "root";
	private String password = "password";

	public static void main(String[] args) throws Exception {
		// ToolRunner.run(...) is static; pass the real command-line args instead of null
		ToolRunner.run(new MRToMysqlJob(), args);
	}

	@Override
	public int run(String[] args) throws Exception {
		/**
		 * 获取job:一个工作对象
		 */
		// 创建一个 配置 对象
		Configuration configuration = new Configuration();

		// 设置 name属性 的值。
		// 如果名称已弃用或有一个弃用的名称与之关联,它会将值设置为两个名称。
		// 名称将在配置前进行修剪。
		configuration.set("fs.defaultFS", "hdfs://192.168.232.129:9000");

		// 在configuration中设置数据库访问相关字段。
		DBConfiguration.configureDB(configuration, className, url, user, password);

		// 根据配置文件创建一个job
		Job job = Job.getInstance(configuration);

		/**
		 * 设置job
		 */
		/**
		 * setOutput(Job job, String tableName, String... fieldNames) throws IOException
		 * 用适当的输出设置初始化作业的缩减部分
		 * 参数:
		 * job:The job
		 * tableName:要插入数据的表
		 * fieldNames:表中的字段名。
		 */
		DBOutputFormat.setOutput(job, "equipment_pv", new String[] { "user_agent","nums" });

		// 通过查找给定类的来源来设置Jar。
		job.setJarByClass(MRToMysqlJob.class);

		// 给 job 设置 Map和Reduce
		job.setMapperClass(MRToMysqlMapper.class);
		job.setNumReduceTasks(0); // set to 0 because this job has no reduce phase

		// 告诉job Map输出的key和value的数据类型的是什么
		job.setMapOutputKeyClass(EquipmentWritable.class);
		job.setMapOutputValueClass(NullWritable.class);

		// Set the job's InputFormat and OutputFormat:
		// TextInputFormat reads the previous job's output file from HDFS,
		// DBOutputFormat writes each record into the MySQL table configured above.
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);

		/**
		 * 设置输入路径
		 */
		FileInputFormat.setInputPaths(job, "/MapReduceOut/part-r-00000");

		// 将job提交到集群并等待它完成。
		boolean waitForCompletion = job.waitForCompletion(true);
		System.out.println(waitForCompletion ? "执行成功" : "执行失败");
		return 0;
	}
}
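As with time_count, the equipment_pv table must exist in the webkpi database before this job runs. Again the DDL is not shown in the original write-up, so this is only a minimal sketch: the table and column names come from the job above, while the column types and the helper class itself are assumptions.

package com.atSchool.WebLog.EquipmentPV;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/**
 * One-off helper (assumed schema) that creates the equipment_pv table
 * expected by DBOutputFormat.setOutput(job, "equipment_pv", ...).
 */
public class CreateEquipmentPvTable {
	public static void main(String[] args) throws Exception {
		String url = "jdbc:mysql://127.0.0.1:3306/webkpi?characterEncoding=utf-8&useSSL=false&serverTimezone=UTC";
		try (Connection conn = DriverManager.getConnection(url, "root", "password");
				Statement st = conn.createStatement()) {
			// Column names must match the fieldNames array used in the job above
			st.executeUpdate("CREATE TABLE IF NOT EXISTS equipment_pv ("
					+ "user_agent VARCHAR(255) NOT NULL, "
					+ "nums INT NOT NULL)");
			System.out.println("equipment_pv table is ready");
		}
	}
}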
3. Display the results on a page
1. Entity class
package com.atschool.Entity;

public class equipment_pv {
	private String user_agent;
	private int nums;

	public equipment_pv() {
	}

	public equipment_pv(String user_agent, int nums) {
		this.user_agent = user_agent;
		this.nums = nums;
	}

	public String getUser_agent() {
		return user_agent;
	}

	public void setUser_agent(String user_agent) {
		this.user_agent = user_agent;
	}

	public int getNums() {
		return nums;
	}

	public void setNums(int nums) {
		this.nums = nums;
	}

	@Override
	public String toString() {
		return "user_agent=" + user_agent + ", nums=" + nums;
	}
}
2. DAO layer
package com.atschool.Dao.Impl;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.atschool.DBUtils.DBUtils;
import com.atschool.Dao.UriDao;
import com.atschool.Entity.equipment_pv;

public class EquipmentPvDaoImpl implements UriDao<equipment_pv> {

	@Override
	public List<equipment_pv> queryTop() {
		String sql = "SELECT * FROM `equipment_pv` ORDER BY nums DESC LIMIT 5";
		Connection connection = DBUtils.getConnection();

		PreparedStatement ps = null;
		ResultSet resultSet = null;
		ArrayList<equipment_pv> arrayList = null;
		try {
			// 预编译sql语句,返回一个PrepareStatem实例
			ps = connection.prepareStatement(sql);

			// 执行sql语句得到结果
			resultSet = ps.executeQuery();

			arrayList = new ArrayList<>();
			while (resultSet.next()) {
				String visit_time = resultSet.getString("user_agent");
				int nums = resultSet.getInt("nums");
				arrayList.add(new equipment_pv(visit_time, nums));
			}
		} catch (SQLException e) {
			e.printStackTrace();
		} finally {
			// 释放资源
			DBUtils.close(connection, ps, resultSet);
		}

		return arrayList;
	}

	// 测试
	public static void main(String[] args) {
		List<equipment_pv> queryTop5 = new EquipmentPvDaoImpl().queryTop();
		for (equipment_pv equipment_pv : queryTop5) {
			System.out.println(equipment_pv);
		}
	}
}
3. Controller layer
package com.atschool.Controller;

import java.io.IOException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.alibaba.fastjson.JSONArray;
import com.atschool.Dao.Impl.EquipmentPvDaoImpl;
import com.atschool.Entity.equipment_pv;

@WebServlet("/EquipmentPvServlet")
public class EquipmentPvServlet extends HttpServlet {
	private static final long serialVersionUID = 1L;
	private EquipmentPvDaoImpl equipmentPvDaoImpl = new EquipmentPvDaoImpl();

	// service 不管是什么请求都会接收
	@Override
	protected void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
		// 获取数据
		List<equipment_pv> queryTop5 = equipmentPvDaoImpl.queryTop();
		
		// 转成json数据
		String jsonString = JSONArray.toJSONString(queryTop5);
		
		resp.getWriter().write(jsonString);
	}
}
4. Page
// Same as before; a pie chart works best for showing device share

7. Using a web page template

Tips:

  • Do not paste the whole template into the project at once. Start by copying only the required css/fonts/js/img assets, then copy index.html and integrate from the home page, keeping what you need and deleting the rest. After that, copy the other pages into the project one by one as you need them.
  • When modifying a page, do not search for the right spot blindly. Use the browser's developer tools to find a keyword near the element you want to change, then jump to it in the page source with Ctrl+F.
  • If the technique you want to use differs from what the template uses, put your own JavaScript in a separate file (keeping JS and markup apart makes the page easier to modify and maintain) and reference it with a <script> tag at the end of the page, as was done for the ECharts charts shown earlier.
1. customer5.js
/**
 * 浏览器终端统计Top5
 */
// dom加载后就会执行
var array = new Array();
$(function() {
	$.ajax({
		// 请求方式
		type : "POST",
		// 请求地址
		url : "http://localhost:8080/WebKpi/EquipmentPvServlet",
		// 规定返回的数据类型
		dataType : "json",
		// 由于不能让图表在没有获取到数据之前就显示出来,所以设置为同步操作
		async : false,
		// 请求成功
		success : function(result) {
			console.log(result); // 打印到浏览器控制台

			// 对获取到的数据进行解析
			for (var i = 0; i < result.length; i++) {
				var object = new Object();
				object.value=result[i].nums;
				object.name=result[i].user_agent;
				array.push(object);
			}
		},
		// 请求失败,包含具体的错误信息
		error : function(e) {
			console.log(e.status);
			console.log(e.responseText);
		}
	});

	// 基于准备好的dom,初始化echarts实例
	var myChart = echarts.init(document.getElementById('main'));

	// 指定图表的配置项和数据
	var option = {
		series : [ {
			name : '数量',
			type : 'pie', // 图的类型,bar-柱状/条形图,line-线图,pie-饼状图
			radius: '55%',
			data : array
		} ]
	};

	// 使用刚指定的配置项和数据显示图表。
	myChart.setOption(option);

});

2. In the page (excerpt)

... ...

<!-- jquery============================================ -->
<script src="js/vendor/jquery-1.12.4.min.js"></script>

<!-- customer5: browser/device Top 5 -->
<script src="https://cdn.staticfile.org/echarts/4.3.0/echarts.min.js"></script>
<script src="js2/customer5.js"></script>

... ...
  • After the server-side code is written, do not integrate it with the page all at once. First request the Servlet in a browser to confirm that it actually returns data, and only integrate after every step has been verified. A quick command-line check is sketched below.
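For example, a tiny HTTP client can do that check without opening a browser. The sketch below requests the same address that customer5.js uses; the host and context path (/WebKpi) are assumptions carried over from that script, so adjust them if your deployment differs.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ServletSmokeTest {
	public static void main(String[] args) throws Exception {
		// Same address that customer5.js requests
		URL url = new URL("http://localhost:8080/WebKpi/EquipmentPvServlet");
		HttpURLConnection conn = (HttpURLConnection) url.openConnection();
		conn.setRequestMethod("GET");
		try (BufferedReader reader = new BufferedReader(
				new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
			String line;
			while ((line = reader.readLine()) != null) {
				// Expect a JSON array such as [{"nums":123,"user_agent":"Mozilla"}, ...]
				System.out.println(line);
			}
		}
	}
}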