Table of Contents
I. Requirements Analysis
II. MySQL Table Design
III. Environment Setup
IV. Implementation
ct.analysis
ct.analysis.io
ct.analysis.kv
ct.analysis.mapper
ct.analysis.reducer
ct.analysis.tool
In ct.common.util
Create the ct.cache project
V. Run and Test
Our data has now been fully collected into the HBase cluster. The next step is to analyze it and compute the statistics we are interested in. Note that we do not necessarily use one MapReduce job per business metric; when the situation allows, a single MapReduce job can compute several metrics at once. When to use which approach is discussed later.
The workflow of the analysis module is shown in Figure 3.
Business metrics:
a) Per user and per day: number of outgoing (caller) calls and total call duration.
b) Per user and per month: number of calls and total call duration.
c) Intimacy between users (call count and call duration reflect how close two users are).
I. Requirements Analysis
Based on the requirements, the table structures below (section II) were designed. Using MapReduce, we need to compute, for each time range (year, month, day), the total number of calls and the total call duration of every phone number within that range.
Approach:
a) A dimension is a particular angle or view of the data. Here we aggregate calls along the time dimension; for example, to cover all months and all days of 2017, the dimension can roughly be described as "2017-*-*".
b) The Mapper groups the data by the different dimensions and sends it to the Reducer.
c) The Reducer receives the data grouped per dimension, aggregates it, and emits the result.
d) According to the business requirements, the Reducer output is written to the target store through a custom OutputFormat.
Data input: HBase
Data output: MySQL
Source data structure in HBase:
Field | Example & Description |
rowkey | hashregion_call1_datetime_call2_flag_duration, e.g. 01_15837312345_20170527081033_13766889900_1_0180 |
family | Column family f1: caller-side data; column family f2: callee-side data |
call1 | First phone number |
call2 | Second phone number |
date_time | Time the call was established, e.g. 20171017081520 |
date_time_ts | date_time as a timestamp |
duration | Call duration (in seconds) |
flag | Marks whether call1 is the caller or the callee (call1's role and call2's role are mutually exclusive) |
a) With the target known, check whether the data we already have can support it;
b) Based on the target data structure, design the MySQL tables and create them;
c) Think about which functional modules the code needs, and create a package for each module;
d) Data is always described from some dimension (point of view), so build dimension classes. For example, using the combination of "year" and "phone number" as the aggregation key lets us compute that phone number's totals for that year;
e) Implement a custom OutputFormat that writes the output to MySQL;
f) Create the necessary utility classes.
II. MySQL Table Design
The analysis results are stored in MySQL so that the web tier can query and display them.
1) Table 7: db_telecom.tb_contacts
Stores users' phone numbers and contact names.
Table 7 db_telecom.tb_contacts
Column | Description | Type |
id | Auto-increment primary key | int(11) NOT NULL |
telephone | Phone number | varchar(255) NOT NULL |
name | Contact name | varchar(255) NOT NULL |
2) Table 8: db_telecom.tb_call
Stores the total call count and total call duration for a given time dimension.
Table 8 db_telecom.tb_call
Column | Description | Type |
id_date_contact | Composite key (contact dimension id, date dimension id) | varchar(255) NOT NULL |
id_date_dimension | Date dimension id | int(11) NOT NULL |
id_contact | Contact id of the queried user | int(11) NOT NULL |
call_sum | Total number of calls | int(11) NOT NULL DEFAULT 0 |
call_duration_sum | Total call duration | int(11) NOT NULL DEFAULT 0 |
3) Table 9: db_telecom.tb_dimension_date
Stores the date-dimension data.
Table 9 db_telecom.tb_dimension_date
Column | Description | Type |
id | Auto-increment primary key | int(11) NOT NULL |
year | Year of the call record | int(11) NOT NULL |
month | Month of the call record; -1 when the statistics are aggregated by year | int(11) NOT NULL |
day | Day of the call record; -1 when the statistics are aggregated by month | int(11) NOT NULL |
4) Table 10: db_telecom.tb_intimacy
Stores the intimacy results between users (used in the exercise).
Table 10 db_telecom.tb_intimacy
Column | Description | Type |
id | Auto-increment primary key | int(11) NOT NULL |
intimacy_rank | Intimacy rank among contacts | int(11) NOT NULL |
id_contact1 | Contact 1, the queried user | int(11) NOT NULL |
id_contact2 | Contact 2, a contact of contact 1 | int(11) NOT NULL |
call_count | Number of calls between the two contacts | int(11) NOT NULL DEFAULT 0 |
call_duration_count | Total call duration between the two contacts | int(11) NOT NULL DEFAULT 0 |
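The tables above can be created up front. The following is a minimal sketch that creates them through JDBC; the DDL strings follow the column definitions above, while the connection URL and credentials are placeholders to adapt to your environment. Note that the job code later in this section reads and writes tables named ct_user, ct_date and ct_call, so adjust the DDL to whichever naming convention your environment actually uses.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTables {
    public static void main(String[] args) throws Exception {
        // connection parameters are placeholders; adapt to your environment
        String url = "jdbc:mysql://hadoop01:3306/db_telecom?useUnicode=true&characterEncoding=UTF-8";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             Statement stmt = conn.createStatement()) {
            // date dimension (Table 9)
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS tb_dimension_date ("
                    + "id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,"
                    + "year INT(11) NOT NULL, month INT(11) NOT NULL, day INT(11) NOT NULL)");
            // contacts (Table 7)
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS tb_contacts ("
                    + "id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,"
                    + "telephone VARCHAR(255) NOT NULL, name VARCHAR(255) NOT NULL)");
            // call statistics (Table 8)
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS tb_call ("
                    + "id_date_contact VARCHAR(255) NOT NULL PRIMARY KEY,"
                    + "id_date_dimension INT(11) NOT NULL, id_contact INT(11) NOT NULL,"
                    + "call_sum INT(11) NOT NULL DEFAULT 0, call_duration_sum INT(11) NOT NULL DEFAULT 0)");
            // intimacy results (Table 10)
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS tb_intimacy ("
                    + "id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,"
                    + "intimacy_rank INT(11) NOT NULL, id_contact1 INT(11) NOT NULL, id_contact2 INT(11) NOT NULL,"
                    + "call_count INT(11) NOT NULL DEFAULT 0, call_duration_count INT(11) NOT NULL DEFAULT 0)");
        }
    }
}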
III. Environment Setup
1) Create a new module: ct_analysis
pom.xml configuration:
<dependencies>
<dependency>
<groupId>org.example</groupId>
<artifactId>ct-common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>3.1.0</version>
</dependency>
</dependencies>
2) Create the package structure. Root package: com.atguigu. The packages and their sub-packages are listed in Table 11.
Table 11
analysis
analysis.converter
analysis.converter.impl
analysis.format
analysis.kv
analysis.kv.base
analysis.kv.impl
analysis.mapper
analysis.reducer
analysis.runner
constants
utils
3) Class list
Table 12
Class | Description |
CountDurationMapper | Mapper for the analysis job, extends TableMapper |
CountDurationReducer | Reducer for the analysis job, extends Reducer |
CountDurationRunner | Driver class that assembles the job |
MySQLOutputFormat | Custom OutputFormat that writes to MySQL |
BaseDimension | Base class for dimensions (keys) |
BaseValue | Base class for values |
ComDimension | Composite dimension: date dimension + contact dimension |
ContactDimension | Contact dimension |
DateDimension | Date dimension |
CountDurationValue | Wrapper for call count and call duration |
JDBCUtil | Utility class for MySQL connections |
JDBCCacheBean | Singleton JDBC Connection |
IConverter | Conversion interface: given a dimension object, returns the corresponding database primary-key id |
DimensionConverter | IConverter implementation that performs the actual dimension-to-id conversion |
LRUCache | Caches known dimension ids to reduce the number of MySQL operations and improve efficiency |
Constants | Constants class |
IV. Implementation
ct.analysis
Create AnalysisData
import org.apache.hadoop.util.ToolRunner;

public class AnalysisData {
public static void main(String[] args) throws Exception {
// int result = ToolRunner.run(new AnalysisTextTool(), args);
int result = ToolRunner.run(new AnalysisBeanTool(), args);
// exit with the job's status code
System.exit(result);
}
}
ct.analysis.io
Create MySQLBeanOutputFormat
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
/**
* @Author
* @Date 2021/2/26 20:55
* @Version 1.0
* OutputFormat that writes analysis results to MySQL
*/
public class MySQLBeanOutputFormat extends OutputFormat<AnalysisKey, AnalysisValue> {
protected static class MySQLRecordWriter extends RecordWriter<AnalysisKey, AnalysisValue> {
private Connection connection=null;
Map<String,Integer> userMap=new HashMap<>();
Map<String,Integer> dateMap=new HashMap<>();
public MySQLRecordWriter(){
//acquire resources: MySQL connection and lookup maps
connection= JDBCUtil.getConnection();
PreparedStatement pstat = null;
ResultSet rs =null;
try {
//load the user and date lookup tables from MySQL
String queryUserSql="select id,tel from ct_user";
pstat=connection.prepareStatement(queryUserSql);
rs=pstat.executeQuery();
while (rs.next()){
Integer id = rs.getInt(1);
String tel=rs.getString(2);
userMap.put(tel,id);
}
rs.close();
String queryDateSql="select id,year,month,day from ct_date";
pstat=connection.prepareStatement(queryDateSql);
rs=pstat.executeQuery();
while (rs.next()){
Integer id = rs.getInt(1);
String year=rs.getString(2);
String month=rs.getString(3);
String day=rs.getString(4);
if (month.length()==1){
month="0"+month;
}
if (day.length()==1){
day="0"+day;
}
dateMap.put(year+month+day,id);
}
}catch (Exception e){
e.printStackTrace();
}finally {
if(pstat!=null){
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
if (rs!=null){
try {
rs.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
//write one aggregated record to MySQL
@Override
public void write(AnalysisKey key, AnalysisValue value) throws IOException, InterruptedException {
PreparedStatement pstat=null;
try {
String insertSQL="insert into ct_call(telid,dateid,sumcall,sumduration) values(?,?,?,?)";
pstat=connection.prepareStatement(insertSQL);
pstat.setInt(1,userMap.get(key.getTel()));
pstat.setInt(2,dateMap.get(key.getDate()));
pstat.setInt(3,Integer.parseInt(value.getSumCall()));
pstat.setInt(4,Integer.parseInt(value.getSumDuration()));
pstat.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}finally {
if (pstat!=null){
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
//release resources
@Override
public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
if (connection!=null){
try {
connection.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
@Override
public RecordWriter<AnalysisKey, AnalysisValue> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
return new MySQLRecordWriter();
}
@Override
public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {
}
private FileOutputCommitter committer = null;
public static Path getOutputPath(JobContext job) {
String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
return name == null ? null : new Path(name);
}
@Override
public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
if (this.committer == null) {
Path output = getOutputPath(taskAttemptContext);
this.committer = new FileOutputCommitter(output, taskAttemptContext);
}
return this.committer;
}
}
Create MySQLRedisTextOutputFormat
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import redis.clients.jedis.Jedis;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
/**
* @Author
* @Date 2021/2/26 20:55
* @Version 1.0
* OutputFormat that writes analysis results to MySQL, using Redis for the id lookups
*/
public class MySQLRedisTextOutputFormat extends OutputFormat<Text,Text> {
protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {
private Connection connection=null;
Jedis jedis=null;
public MySQLRecordWriter(){
//acquire resources: MySQL connection and Redis client
connection= JDBCUtil.getConnection();
//assign the field directly; declaring a local "Jedis jedis = ..." here would leave the field null and fail in write()
jedis = new Jedis("192.168.86.101", 6379);
jedis.auth("123456");
}
//write one aggregated record to MySQL
@Override
public void write(Text key, Text value) throws IOException, InterruptedException {
String[] values=value.toString().split("_");
String sumCall=values[0];
String sumDuration=values[1];
PreparedStatement pstat=null;
try {
String insertSQL="insert into ct_call(telid,dateid,sumcall,duration) values(?,?,?,?)";
pstat=connection.prepareStatement(insertSQL);
String k = key.toString();
String[] ks=k.split("_");
String tel=ks[0];
String date=ks[1];
pstat.setInt(1,Integer.parseInt(jedis.hget("ct_user",tel)));
pstat.setInt(2,Integer.parseInt(jedis.hget("ct_date",date)));
pstat.setInt(3,Integer.parseInt(sumCall));
pstat.setInt(4,Integer.parseInt(sumDuration));
pstat.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}finally {
if (pstat!=null){
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
//release resources
@Override
public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
if (connection!=null){
try {
connection.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
if (jedis!=null){
jedis.close();
}
}
}
@Override
public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
return new MySQLRecordWriter();
}
@Override
public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {
}
private FileOutputCommitter committer = null;
public static Path getOutputPath(JobContext job) {
String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
return name == null ? null : new Path(name);
}
@Override
public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
if (this.committer == null) {
Path output = getOutputPath(taskAttemptContext);
this.committer = new FileOutputCommitter(output, taskAttemptContext);
}
return this.committer;
}
}
Create MySQLTextOutputFormat
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
/**
* OutputFormat that writes analysis results to MySQL
*/
public class MySQLTextOutputFormat extends OutputFormat<Text, Text> {
protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {
private Connection connection = null;
Map<String, Integer> userMap = new HashMap<String, Integer>();
Map<String, Integer> dateMap = new HashMap<String, Integer>();
public MySQLRecordWriter() {
// acquire resources: MySQL connection and lookup maps
connection = JDBCUtil.getConnection();
PreparedStatement pstat = null;
ResultSet rs = null;
try {
String queryUserSql = "select id, tel from ct_user";
pstat = connection.prepareStatement(queryUserSql);
rs = pstat.executeQuery();
while ( rs.next() ) {
Integer id = rs.getInt(1);
String tel = rs.getString(2);
userMap.put(tel, id);
}
rs.close();
String queryDateSql = "select id, year, month, day from ct_date";
pstat = connection.prepareStatement(queryDateSql);
rs = pstat.executeQuery();
while ( rs.next() ) {
Integer id = rs.getInt(1);
String year = rs.getString(2);
String month = rs.getString(3);
if ( month.length() == 1 ) {
month = "0" + month;
}
String day = rs.getString(4);
if ( day.length() == 1 ) {
day = "0" + day;
}
dateMap.put(year + month + day, id);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if ( rs != null ) {
try {
rs.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
if ( pstat != null ) {
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
/**
* Write one aggregated record to MySQL
* @param key
* @param value
* @throws IOException
* @throws InterruptedException
*/
public void write(Text key, Text value) throws IOException, InterruptedException {
String[] values = value.toString().split("_");
String sumCall = values[0];
String sumDuration = values[1];
PreparedStatement pstat = null;
try {
String insertSQL = "insert into ct_call ( telid, dateid, sumcall, sumduration ) values ( ?, ?, ?, ? )";
pstat = connection.prepareStatement(insertSQL);
String k = key.toString();
String[] ks = k.split("_");
String tel = ks[0];
String date = ks[1];
pstat.setInt(1, userMap.get(tel));
pstat.setInt(2, dateMap.get(date));
pstat.setInt(3, Integer.parseInt(sumCall) );
pstat.setInt(4, Integer.parseInt(sumDuration));
pstat.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
} finally {
if ( pstat != null ) {
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
/**
* Release resources
* @param context
* @throws IOException
* @throws InterruptedException
*/
public void close(TaskAttemptContext context) throws IOException, InterruptedException {
if ( connection != null ) {
try {
connection.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}
public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
return new MySQLRecordWriter();
}
public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
}
private FileOutputCommitter committer = null;
public static Path getOutputPath(JobContext job) {
String name = job.getConfiguration().get(FileOutputFormat.OUTDIR);
return name == null ? null: new Path(name);
}
public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
if (committer == null) {
Path output = getOutputPath(context);
committer = new FileOutputCommitter(output, context);
}
return committer;
}
}
ct.analysis.kv
Create AnalysisKey
import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
/**
* Custom composite key (tel + date) for the analysis job
*/
public class AnalysisKey implements WritableComparable<AnalysisKey> {
public AnalysisKey() {
}
public AnalysisKey( String tel, String date ) {
this.tel = tel;
this.date = date;
}
private String tel;
private String date;
public String getTel() {
return tel;
}
public void setTel(String tel) {
this.tel = tel;
}
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
/**
* Compare by tel, then by date
* @param key
* @return
*/
public int compareTo(AnalysisKey key) {
int result = tel.compareTo(key.getTel());
if ( result == 0 ) {
result = date.compareTo(key.getDate());
}
return result;
}
/**
* Serialize the key fields
* @param out
* @throws IOException
*/
public void write(DataOutput out) throws IOException {
out.writeUTF(tel);
out.writeUTF(date);
}
/**
* Deserialize the key fields
* @param in
* @throws IOException
*/
public void readFields(DataInput in) throws IOException {
tel = in.readUTF();
date = in.readUTF();
}
}
Create AnalysisValue
import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
public class AnalysisValue implements Writable {
private String sumCall;
private String sumDuration;
public String getSumCall() {
return sumCall;
}
public void setSumCall(String sumCall) {
this.sumCall = sumCall;
}
public String getSumDuration() {
return sumDuration;
}
public void setSumDuration(String sumDuration) {
this.sumDuration = sumDuration;
}
public AnalysisValue(){}
public AnalysisValue(String sumCall, String sumDuration){
this.sumCall=sumCall;
this.sumDuration=sumDuration;
}
//serialize the value fields
@Override
public void write(DataOutput dataOutput) throws IOException {
dataOutput.writeUTF(sumCall);
dataOutput.writeUTF(sumDuration);
}
//deserialize the value fields
@Override
public void readFields(DataInput dataInput) throws IOException {
sumCall=dataInput.readUTF();
sumDuration=dataInput.readUTF();
}
}
ct.analysis.mapper
Create AnalysisBeanMapper
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import java.io.IOException;
public class AnalysisBeanMapper extends TableMapper<AnalysisKey, Text> {
@Override
protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
// 5_19154926260_20180802160747_16574556259_0054_1
String rowkey= Bytes.toString(key.get());
String[] values=rowkey.split("_");
String call1=values[1];
String call2=values[3];
String callTime=values[2];
String duration=values[4];
String year=callTime.substring(0,4);
String month=callTime.substring(0,6);
String day=callTime.substring(0,8);
//caller - year
context.write(new AnalysisKey(call1,year),new Text(duration));
//caller - month
context.write(new AnalysisKey(call1,month),new Text(duration));
//caller - day
context.write(new AnalysisKey(call1,day),new Text(duration));
//callee - year
context.write(new AnalysisKey(call2,year),new Text(duration));
//callee - month
context.write(new AnalysisKey(call2,month),new Text(duration));
//callee - day
context.write(new AnalysisKey(call2,day),new Text(duration));
}
}
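Note that the date keys are prefixes of callTime rather than isolated fields: substring(0,4) is the year, substring(0,6) is year+month, and substring(0,8) is year+month+day, so every HBase row is emitted under three time granularities for each of the two phone numbers. A tiny illustrative snippet (value taken from the example rowkey in the comment above):

public class DateKeyDemo {
    public static void main(String[] args) {
        String callTime = "20180802160747";
        System.out.println(callTime.substring(0, 4)); // 2018     -> year key
        System.out.println(callTime.substring(0, 6)); // 201808   -> month key
        System.out.println(callTime.substring(0, 8)); // 20180802 -> day key
    }
}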
Create AnalysisTextMapper
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
/**
* @program: IntelliJ IDEA
* @description: Mapper for analysis data
* @author:
* @create: 2022-10-23 10:36
*/
public class AnalysisTextMapper extends TableMapper<Text, Text> {
@Override
protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
String rowkey = Bytes.toString(key.get());
// 5_19154926260_20180802160747_16574556259_0054_1
String[] values = rowkey.split("_");
String call1 = values[1];
String call2 = values[3];
String calltime = values[2];
String duration = values[4];
String year = calltime.substring(0, 4);
String month = calltime.substring(0, 6);
String date = calltime.substring(0, 8);
// caller - year
context.write(new Text(call1 + "_" + year), new Text(duration));
// caller - month
context.write(new Text(call1 + "_" + month), new Text(duration));
// caller - day
context.write(new Text(call1 + "_" + date), new Text(duration));
// callee - year
context.write(new Text(call2 + "_" + year), new Text(duration));
// callee - month
context.write(new Text(call2 + "_" + month), new Text(duration));
// callee - day
context.write(new Text(call2 + "_" + date), new Text(duration));
}
}
ct.analysis.reducer
Create AnalysisBeanReducer
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
/**
* @Author
* @Date 2021/2/26 19:18
* @Version 1.0
* Reducer for the analysis job
*/
public class AnalysisBeanReducer extends Reducer<AnalysisKey, Text, AnalysisKey, AnalysisValue> {
@Override
protected void reduce(AnalysisKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
int sumCall = 0;
int sumDuration = 0;
for (Text value : values) {
int duration = Integer.parseInt(value.toString());
sumDuration = sumDuration + duration;
sumCall++;
}
context.write(key, new AnalysisValue(""+sumCall,""+sumDuration));
}
}
Create AnalysisTextReducer
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
/**
* Reducer for the analysis job
*/
public class AnalysisTextReducer extends Reducer<Text, Text, Text, Text >{
@Override
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
int call_sum = 0;
int call_duration_sum = 0;
for (Text value : values) {
int duration = Integer.parseInt(value.toString());
call_duration_sum = call_duration_sum + duration;
call_sum++;
}
context.write(key, new Text(call_sum + "_" + call_duration_sum));
}
}
ct.analysis.tool
Create AnalysisBeanTool
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;
/**
* @Author
* @Date 2021/2/26 16:51
* @Version 1.0
* Driver (Tool) for the analysis job
*/
public class AnalysisBeanTool implements Tool {
@Override
public int run(String[] strings) throws Exception {
Job job = Job.getInstance();
job.setJarByClass(AnalysisBeanTool.class);
Scan scan =new Scan();
//scan only the caller column family (half of the data) to avoid double counting
scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));
//mapper
TableMapReduceUtil.initTableMapperJob(
Names.TABLE.getValue(),
scan,
AnalysisBeanMapper.class,
AnalysisKey.class,
Text.class,
job
);
//reducer
job.setReducerClass(AnalysisBeanReducer.class);
job.setOutputKeyClass(AnalysisKey.class);
job.setOutputValueClass(AnalysisValue.class);
//outputformat
job.setOutputFormatClass(MySQLBeanOutputFormat.class);
boolean flg = job.waitForCompletion(true);
if (flg){
return JobStatus.State.SUCCEEDED.getValue();
}else {
return JobStatus.State.FAILED.getValue();
}
}
@Override
public void setConf(Configuration configuration) {
}
@Override
public Configuration getConf() {
return null;
}
}
Create AnalysisTextTool
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;
/**
* @Author
* @Date 2021/2/26 16:51
* @Version 1.0
* Driver (Tool) for the analysis job
*/
public class AnalysisTextTool implements Tool {
@Override
public int run(String[] strings) throws Exception {
Job job = Job.getInstance();
job.setJarByClass(AnalysisTextTool.class);
Scan scan =new Scan();
//scan only the caller column family (half of the data) to avoid double counting
scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));
//mapper
TableMapReduceUtil.initTableMapperJob(
Names.TABLE.getValue(),
scan,
AnalysisTextMapper.class,
Text.class,
Text.class,
job
);
//reducer
job.setReducerClass(AnalysisTextReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
//outputformat
job.setOutputFormatClass(MySQLTextOutputFormat.class); // or MySQLRedisTextOutputFormat if the Redis cache (Bootstrap) has been loaded
boolean flg = job.waitForCompletion(true);
if (flg){
return JobStatus.State.SUCCEEDED.getValue();
}else {
return JobStatus.State.FAILED.getValue();
}
// return 0;
}
@Override
public void setConf(Configuration configuration) {
}
@Override
public Configuration getConf() {
return null;
}
}
In ct.common.util
Modify JDBCUtil
import java.sql.Connection;
import java.sql.DriverManager;
public class JDBCUtil {
private static final String MYSQL_DRIVER_CLASS = "com.mysql.jdbc.Driver";
private static final String MYSQL_URL = "jdbc:mysql://hadoop01:3306/ct221023?useUnicode=true&characterEncoding=UTF-8";
private static final String MYSQL_USERNAME = "root";
private static final String MYSQL_PASSWORD = "123456";
public static Connection getConnection(){
Connection conn=null;
try{
Class.forName(MYSQL_DRIVER_CLASS);
conn= DriverManager.getConnection(MYSQL_URL,MYSQL_USERNAME,MYSQL_PASSWORD);
}catch (Exception e){
e.printStackTrace();
}
return conn;
}
}
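A quick usage sketch of JDBCUtil (illustrative only; the query assumes the ct_user lookup table used elsewhere in this section, and JDBCUtil is imported from its package in ct-common). Using try-with-resources is a more compact alternative to the explicit finally blocks in the OutputFormat classes above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JDBCUtilDemo {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the connection, statement and result set automatically
        try (Connection conn = JDBCUtil.getConnection();
             PreparedStatement pstat = conn.prepareStatement("select id, tel from ct_user");
             ResultSet rs = pstat.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " -> " + rs.getString(2));
            }
        }
    }
}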
Create the ct.cache project
pom.xml
<dependencies>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.46</version>
</dependency>
<dependency>
<groupId>org.example</groupId>
<artifactId>ct-common</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
</dependencies>
Create Bootstrap
package ct.cache;
// JDBCUtil comes from the ct-common module; import it according to its package there
import redis.clients.jedis.Jedis;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
/**
* Bootstrap the cache client: load the user and date lookup data from MySQL into Redis
*/
public class Bootstrap {
public static void main(String[] args) {
// read the lookup data from MySQL
Map<String, Integer> userMap = new HashMap<String, Integer>();
Map<String, Integer> dateMap = new HashMap<String, Integer>();
// read the user and date data
Connection connection = null;
PreparedStatement pstat = null;
ResultSet rs = null;
try {
connection = JDBCUtil.getConnection();
String queryUserSql = "select id, tel from ct_user";
pstat = connection.prepareStatement(queryUserSql);
rs = pstat.executeQuery();
while ( rs.next() ) {
Integer id = rs.getInt(1);
String tel = rs.getString(2);
userMap.put(tel, id);
}
rs.close();
String queryDateSql = "select id, year, month, day from ct_date";
pstat = connection.prepareStatement(queryDateSql);
rs = pstat.executeQuery();
while ( rs.next() ) {
Integer id = rs.getInt(1);
String year = rs.getString(2);
String month = rs.getString(3);
if ( month.length() == 1 ) {
month = "0" + month;
}
String day = rs.getString(4);
if ( day.length() == 1 ) {
day = "0" + day;
}
dateMap.put(year + month + day, id);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
if ( rs != null ) {
try {
rs.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
if ( pstat != null ) {
try {
pstat.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
if ( connection != null ) {
try {
connection.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
// System.out.println(userMap.size());
// System.out.println(dateMap.size());
// write the lookup maps into Redis hashes
Jedis jedis = new Jedis("192.168.86.101", 6379);
jedis.auth("123456");
Iterator<String> keyIterator = userMap.keySet().iterator();
while ( keyIterator.hasNext() ) {
String key = keyIterator.next();
Integer value = userMap.get(key);
jedis.hset("ct_user", key, "" + value);
}
keyIterator = dateMap.keySet().iterator();
while ( keyIterator.hasNext() ) {
String key = keyIterator.next();
Integer value = dateMap.get(key);
jedis.hset("ct_date", key, "" + value);
}
}
}
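After running Bootstrap, it can be useful to confirm that the hashes really landed in Redis before running the job that depends on them (MySQLRedisTextOutputFormat). A minimal check, assuming the same host, port and password as above:

import redis.clients.jedis.Jedis;

public class CacheCheck {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("192.168.86.101", 6379);
        jedis.auth("123456");
        // the number of cached entries should match the MySQL row counts
        System.out.println("ct_user entries: " + jedis.hlen("ct_user"));
        System.out.println("ct_date entries: " + jedis.hlen("ct_date"));
        jedis.close();
    }
}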
V. Run and Test
1) Copy the MySQL JDBC driver jar into the lib directory under the Hadoop installation root.
2) Submit the job, for example with hadoop jar <analysis-jar> <driver-class> (the jar and class names depend on your build).
Check the results in MySQL.
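One way to spot-check is a small query through the same JDBCUtil (the ct_call table name matches the insert statements in the OutputFormat classes above; treat this as an illustrative sketch):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResultCheck {
    public static void main(String[] args) throws Exception {
        // JDBCUtil is the utility class shown earlier; import it according to your package layout
        try (Connection conn = JDBCUtil.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from ct_call")) {
            if (rs.next()) {
                System.out.println("rows in ct_call: " + rs.getInt(1));
            }
        }
    }
}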