Contents

I. Requirements Analysis

II. MySQL Table Design

III. Environment Setup

IV. Implementation

  ct.analysis

  ct.analysis.io

  ct.analysis.kv

  ct.analysis.mapper

  ct.analysis.reducer

  ct.analysis.tool

  In ct.common.util

  Create the ct.cache module

V. Run and Test


Our data has now been fully collected into the HBase cluster. In this part we analyze that data and compute the statistics we care about. Note that during analysis we do not necessarily map one business metric to one MapReduce job; where the scenario allows, a single MapReduce job computes several metrics at once. We will discuss when to use which approach as we go.

The analysis module workflow is shown in Figure 3:

[Figure 3: analysis module workflow]

Business metrics:

a) Per-user daily outgoing call count and call duration.

b) Per-user monthly call count and call duration.

c) Intimacy between users (call count and call duration indicate how close two users are).

I. Requirements Analysis

Based on these goals we design the table structures described below (see section II). Using MapReduce, we need to compute, for each time range (year, month, day), the total call count and total call duration of every phone number within that range.

Approach:

a) A dimension is an angle or view on the data. We aggregate calls along the time dimension; for example, to cover every month and every day of 2017, the dimension can be described as "2017-*-*".

b) The Mapper groups the data by each dimension and hands it to the Reducer.

c) The Reducer receives the data aggregated per dimension, sums it up, and emits the results.

d) A custom OutputFormat writes the Reducer output to MySQL, as required by the business.

Data input: HBase

Data output: MySQL
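To make the time dimension concrete, here is a tiny standalone sketch (illustration only, not project code) of how one call timestamp expands into year, month and day dimension keys; the mappers in section IV use exactly this prefix trick:

public class DimensionDemo {
    public static void main(String[] args) {
        String callTime = "20170527081033";      // yyyyMMddHHmmss
        String year  = callTime.substring(0, 4); // "2017"     -> yearly dimension
        String month = callTime.substring(0, 6); // "201705"   -> monthly dimension
        String day   = callTime.substring(0, 8); // "20170527" -> daily dimension
        // The same record is emitted once per dimension key, so the reducer
        // can sum call counts and durations per year, per month and per day.
        System.out.println(year + " / " + month + " / " + day);
    }
}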

HBase source data structure:

Field        | Example & notes
rowkey       | hashregion_call1_datetime_call2_flag_duration, e.g. 01_15837312345_20170527081033_13766889900_1_0180
family       | column family f1 stores caller-side info; f2 stores callee-side info
call1        | the first phone number
call2        | the second phone number
date_time    | time the call was established, e.g. 20171017081520
date_time_ts | date_time as a timestamp
duration     | call duration (in seconds)
flag         | marks whether call1 is the caller or the callee (call1's role and call2's role are mutually exclusive)
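A minimal sketch (illustration only) of how a rowkey in the layout above splits into its components. Note that the example rowkeys in the mapper code in section IV have duration at index 4 and flag at index 5, so adjust the indexes to whatever your rowkey builder actually produces:

public class RowkeyParseDemo {
    public static void main(String[] args) {
        String rowkey = "01_15837312345_20170527081033_13766889900_1_0180";
        String[] parts = rowkey.split("_");
        String call1    = parts[1]; // caller number
        String dateTime = parts[2]; // yyyyMMddHHmmss
        String call2    = parts[3]; // callee number
        String flag     = parts[4]; // "1": call1 is the caller
        String duration = parts[5]; // call duration in seconds
        System.out.println(call1 + " -> " + call2 + " at " + dateTime
                + ", " + Integer.parseInt(duration) + "s (flag=" + flag + ")");
    }
}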

a) Given the goal, check whether the data we already have can support it;

b) Design the MySQL tables for the target data and create them;

c) Decide which functional modules the code needs and create a package for each;

d) Data is always described from some dimension (viewpoint), so build dimension classes. For example, aggregating on the combination of "year" and "phone number" as the key yields that phone number's totals for the whole year;

e) Write a custom OutputFormat to output the data to MySQL;

f) Create the supporting utility classes.

II. MySQL Table Design

We store the analysis results in MySQL so that the web tier can query and display them.

1) Table 7: db_telecom.tb_contacts

Stores each user's phone number and contact name.

Table 7 db_telecom.tb_contacts

Column    | Notes                      | Type
id        | auto-increment primary key | int(11) NOT NULL
telephone | phone number               | varchar(255) NOT NULL
name      | contact name               | varchar(255) NOT NULL

2) Table 8: db_telecom.tb_call

Stores the total call count and total call duration for a given time dimension.

Table 8 db_telecom.tb_call

Column            | Notes                                                            | Type
id_date_contact   | composite primary key (contact dimension id, date dimension id) | varchar(255) NOT NULL
id_date_dimension | date dimension id                                                | int(11) NOT NULL
id_contact        | contact dimension id of the queried user                         | int(11) NOT NULL
call_sum          | total call count                                                 | int(11) NOT NULL DEFAULT 0
call_duration_sum | total call duration                                              | int(11) NOT NULL DEFAULT 0

3) Table 9: db_telecom.tb_dimension_date

Stores the date dimension data.

Table 9 db_telecom.tb_dimension_date

Column | Notes                                                  | Type
id     | auto-increment primary key                             | int(11) NOT NULL
year   | year of the call record                                | int(11) NOT NULL
month  | month of the call record; -1 when aggregating by year  | int(11) NOT NULL
day    | day of the call record; -1 when aggregating by month   | int(11) NOT NULL

4) Table 10: db_telecom.tb_intimacy

Stores the user-relationship results (used in the exercise).

Table 10 db_telecom.tb_intimacy

Column              | Notes                                         | Type
id                  | auto-increment primary key                    | int(11) NOT NULL
intimacy_rank       | friend-intimacy ranking                       | int(11) NOT NULL
id_contact1         | contact 1, the queried user                   | int(11) NOT NULL
id_contact2         | contact 2, a friend of contact 1              | int(11) NOT NULL
call_count          | call count between the two contacts           | int(11) NOT NULL DEFAULT 0
call_duration_count | total call duration between the two contacts  | int(11) NOT NULL DEFAULT 0
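Note: the code in section IV actually works against simplified variants of these tables, named ct_user(id, tel), ct_date(id, year, month, day) and ct_call(telid, dateid, sumcall, sumduration). The DDL below is a minimal sketch inferred from the SELECT and INSERT statements in that code, not an authoritative schema; it executes through the project's JDBCUtil (shown later):

import java.sql.Connection;
import java.sql.Statement;

public class CreateAnalysisTables {
    public static void main(String[] args) throws Exception {
        try (Connection conn = JDBCUtil.getConnection();
             Statement stmt = conn.createStatement()) {
            // user dimension: id <-> telephone number
            stmt.execute("create table if not exists ct_user ("
                    + "id int primary key auto_increment, tel varchar(20) not null)");
            // date dimension: id <-> (year, month, day)
            stmt.execute("create table if not exists ct_date ("
                    + "id int primary key auto_increment, "
                    + "year varchar(4), month varchar(2), day varchar(2))");
            // analysis results: call count and duration per user per date dimension
            stmt.execute("create table if not exists ct_call ("
                    + "telid int, dateid int, "
                    + "sumcall int not null default 0, sumduration int not null default 0)");
        }
    }
}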

III. Environment Setup

1) Create a new module: ct_analysis

pom.xml configuration:

<dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>ct-common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>3.1.0</version>
        </dependency>

    </dependencies>

2) Create the package structure under the root package com.atguigu (indentation below indicates the nesting level):

Table 11

analysis
  analysis.converter
    analysis.converter.impl
  analysis.format
  analysis.kv
    analysis.kv.base
    analysis.kv.impl
  analysis.mapper
  analysis.reducer
  analysis.runner
constants
utils

3) Class list

Table 12

Class                | Notes
CountDurationMapper  | Mapper for the analysis job, extends TableMapper
CountDurationReducer | Reducer for the analysis job, extends Reducer
CountDurationRunner  | driver class that assembles the Job
MySQLOutputFormat    | custom OutputFormat that writes to MySQL
BaseDimension        | base class for dimension keys
BaseValue            | base class for values
ComDimension         | composite dimension: date dimension + contact dimension
ContactDimension     | contact dimension
DateDimension        | date dimension
CountDurationValue   | wraps call count and call duration
JDBCUtil             | utility class for MySQL connections
JDBCCacheBean        | singleton JDBC Connection
IConverter           | interface that maps a dimension object to its database primary-key id
DimensionConverter   | IConverter implementation that performs the actual dimension-to-id lookup
LRUCache             | caches known dimension ids to reduce MySQL round-trips
Constants            | constants class
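Table 12 lists an LRUCache whose source does not appear in this section. For orientation, here is a common minimal sketch built on java.util.LinkedHashMap in access-order mode (an illustrative assumption, not the project's actual implementation):

import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least-recently-used entry once the cache exceeds its capacity.
public class LRUCache extends LinkedHashMap<String, Integer> {
    private final int capacity;

    public LRUCache(int capacity) {
        super(16, 0.75f, true); // true = access order, which is what makes it LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > capacity;
    }
}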

IV. Implementation

ct.analysis

Create AnalysisData:

import org.apache.hadoop.util.ToolRunner;

public class AnalysisData {
    public static void main(String[] args) throws Exception {
//        int result = ToolRunner.run(new AnalysisTextTool(), args);
        int result = ToolRunner.run(new AnalysisBeanTool(), args);
        System.exit(result);
    }
}

ct.analysis.io

Create MySQLBeanOutputFormat:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * @Author 
 * @Date 2021/2/26 20:55
 * @Version 1.0
 * Formats and writes analysis results to MySQL
 */
public class MySQLBeanOutputFormat extends OutputFormat<AnalysisKey, AnalysisValue> {

    protected static class MySQLRecordWriter extends RecordWriter<AnalysisKey, AnalysisValue> {

        private Connection connection=null;
        Map<String,Integer> userMap=new HashMap<>();
        Map<String,Integer> dateMap=new HashMap<>();

        public MySQLRecordWriter(){
            //acquire resources
            connection= JDBCUtil.getConnection();
            PreparedStatement pstat = null;
            ResultSet rs =null;
            try {
                //load the user and date dimension data
                String queryUserSql="select id,tel from ct_user";
                pstat=connection.prepareStatement(queryUserSql);
                rs=pstat.executeQuery();
                while (rs.next()){
                    Integer id = rs.getInt(1);
                    String tel=rs.getString(2);
                    userMap.put(tel,id);
                }
                rs.close();

                String queryDateSql="select id,year,month,day from ct_date";
                pstat=connection.prepareStatement(queryDateSql);
                rs=pstat.executeQuery();
                while (rs.next()){
                    Integer id = rs.getInt(1);
                    String year=rs.getString(2);
                    String month=rs.getString(3);
                    String day=rs.getString(4);
                    if (month.length()==1){
                        month="0"+month;
                    }
                    if (day.length()==1){
                        day="0"+day;
                    }
                    dateMap.put(year+month+day,id);
                }
            }catch (Exception e){
                e.printStackTrace();
            }finally {
                if(pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
                if (rs!=null){
                    try {
                        rs.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }



        }

        //write one record
        @Override
        public void write(AnalysisKey key, AnalysisValue value) throws IOException, InterruptedException {

            PreparedStatement pstat=null;
            try {
                String insertSQL="insert into ct_call(telid,dateid,sumcall,sumduration) values(?,?,?,?)";
                pstat=connection.prepareStatement(insertSQL);

                pstat.setInt(1,userMap.get(key.getTel()));
                pstat.setInt(2,dateMap.get(key.getDate()));
                pstat.setInt(3,Integer.parseInt(value.getSumCall()));
                pstat.setInt(4,Integer.parseInt(value.getSumDuration()));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }finally {

                if (pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }


            }
        }

        //release resources
        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            if (connection!=null){
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }

        }

    }

    @Override
    public RecordWriter<AnalysisKey, AnalysisValue> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    @Override
    public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {

    }


    private FileOutputCommitter committer = null;

    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
        return name == null ? null : new Path(name);
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        if (this.committer == null) {
            Path output = getOutputPath(taskAttemptContext);
            this.committer = new FileOutputCommitter(output, taskAttemptContext);
        }

        return this.committer;
    }
}

Create MySQLRedisTextOutputFormat:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import redis.clients.jedis.Jedis;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * @Author 
 * @Date 2021/2/26 20:55
 * @Version 1.0
 * Formats and writes analysis results to MySQL (dimension ids read from redis)
 */
public class MySQLRedisTextOutputFormat extends OutputFormat<Text,Text> {

    protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {

        private Connection connection=null;
        Jedis jedis=null;

        public MySQLRecordWriter(){
            //acquire resources
            connection= JDBCUtil.getConnection();
            jedis = new Jedis("192.168.86.101", 6379); // assign the field, not a new local variable
            jedis.auth("123456");
        }

        //write one record
        @Override
        public void write(Text key, Text value) throws IOException, InterruptedException {

            String[] values=value.toString().split("_");
            String sumCall=values[0];
            String sumDuration=values[1];
            PreparedStatement pstat=null;
            try {
                String insertSQL="insert into ct_call(telid,dateid,sumcall,sumduration) values(?,?,?,?)";
                pstat=connection.prepareStatement(insertSQL);
                String k = key.toString();
                String[] ks=k.split("_");

                String tel=ks[0];
                String date=ks[1];

                pstat.setInt(1,Integer.parseInt(jedis.hget("ct_user",tel)));
                pstat.setInt(2,Integer.parseInt(jedis.hget("ct_date",date)));
                pstat.setInt(3,Integer.parseInt(sumCall));
                pstat.setInt(4,Integer.parseInt(sumDuration));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }finally {
                if (pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }


            }
        }

        //release resources
        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            if (connection!=null){
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if (jedis!=null){
                jedis.close();
            }

        }

    }

    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    @Override
    public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {

    }


    private FileOutputCommitter committer = null;

    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
        return name == null ? null : new Path(name);
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        if (this.committer == null) {
            Path output = getOutputPath(taskAttemptContext);
            this.committer = new FileOutputCommitter(output, taskAttemptContext);
        }

        return this.committer;
    }
}

Create MySQLTextOutputFormat:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * Formats and writes analysis results to MySQL
 */
public class MySQLTextOutputFormat extends OutputFormat<Text, Text> {

    protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {

        private Connection connection = null;
        Map<String, Integer> userMap = new HashMap<String, Integer>();
        Map<String, Integer> dateMap = new HashMap<String, Integer>();

        public MySQLRecordWriter() {
            // acquire resources
            connection = JDBCUtil.getConnection();
            PreparedStatement pstat = null;
            ResultSet rs = null;

            try {

                String queryUserSql = "select id, tel from ct_user";
                pstat = connection.prepareStatement(queryUserSql);
                rs = pstat.executeQuery();
                while ( rs.next() ) {
                    Integer id = rs.getInt(1);
                    String tel = rs.getString(2);
                    userMap.put(tel, id);
                }

                rs.close();

                String queryDateSql = "select id, year, month, day from ct_date";
                pstat = connection.prepareStatement(queryDateSql);
                rs = pstat.executeQuery();
                while ( rs.next() ) {
                    Integer id = rs.getInt(1);
                    String year = rs.getString(2);
                    String month = rs.getString(3);
                    if ( month.length() == 1 ) {
                        month = "0" + month;
                    }
                    String day = rs.getString(4);
                    if ( day.length() == 1 ) {
                        day = "0" + day;
                    }
                    dateMap.put(year + month + day, id);
                }

            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if ( rs != null ) {
                    try {
                        rs.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
                if ( pstat != null ) {
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * Write one record
         * @param key
         * @param value
         * @throws IOException
         * @throws InterruptedException
         */
        public void write(Text key, Text value) throws IOException, InterruptedException {

            String[] values = value.toString().split("_");
            String sumCall = values[0];
            String sumDuration = values[1];

            PreparedStatement pstat = null;
            try {
                String insertSQL = "insert into ct_call ( telid, dateid, sumcall, sumduration ) values ( ?, ?, ?, ? )";
                pstat = connection.prepareStatement(insertSQL);

                String k = key.toString();
                String[] ks = k.split("_");

                String tel = ks[0];
                String date = ks[1];

                pstat.setInt(1, userMap.get(tel));
                pstat.setInt(2, dateMap.get(date));
                pstat.setInt(3, Integer.parseInt(sumCall) );
                pstat.setInt(4, Integer.parseInt(sumDuration));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            } finally {
                if ( pstat != null ) {
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * Release resources
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            if ( connection != null ) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {

    }
    private FileOutputCommitter committer = null;
    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get(FileOutputFormat.OUTDIR);
        return name == null ? null: new Path(name);
    }
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
        if (committer == null) {
            Path output = getOutputPath(context);
            committer = new FileOutputCommitter(output, context);
        }
        return committer;
    }
}

ct.analysis.kv

Create AnalysisKey:

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Custom key for the analysis data
 */
public class AnalysisKey implements WritableComparable<AnalysisKey> {

    public AnalysisKey() {

    }

    public AnalysisKey( String tel, String date ) {
        this.tel = tel;
        this.date = date;
    }

    private String tel;
    private String date;

    public String getTel() {
        return tel;
    }

    public void setTel(String tel) {
        this.tel = tel;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    /**
     * Compare by tel, then by date
     * @param key
     * @return
     */
    public int compareTo(AnalysisKey key) {
        int result = tel.compareTo(key.getTel());

        if ( result == 0 ) {
            result = date.compareTo(key.getDate());
        }

        return result;
    }

    /**
     * Serialize
     * @param out
     * @throws IOException
     */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(tel);
        out.writeUTF(date);
    }

    /**
     * Deserialize
     * @param in
     * @throws IOException
     */
    public void readFields(DataInput in) throws IOException {
        tel = in.readUTF();
        date = in.readUTF();
    }
}

Create AnalysisValue:

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class AnalysisValue implements Writable {

    private String sumCall;
    private String sumDuration;

    public String getSumCall() {
        return sumCall;
    }

    public void setSumCall(String sumCall) {
        this.sumCall = sumCall;
    }

    public String getSumDuration() {
        return sumDuration;
    }

    public void setSumDuration(String sumDuration) {
        this.sumDuration = sumDuration;
    }

    public AnalysisValue(){}

    public AnalysisValue(String sumCall, String sumDuration){
        this.sumCall=sumCall;
        this.sumDuration=sumDuration;
    }

    //serialize
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(sumCall);
        dataOutput.writeUTF(sumDuration);
    }

    //deserialize
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        sumCall=dataInput.readUTF();
        sumDuration=dataInput.readUTF();
    }
}
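As a quick sanity check (test scaffolding, not part of the project), the two Writable types above should round-trip through Hadoop's DataOutput/DataInput symmetrically, since fields must be read back in exactly the order they were written:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTripDemo {
    public static void main(String[] args) throws IOException {
        // serialize a key and a value into a byte buffer
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        new AnalysisKey("15837312345", "201705").write(out); // key: tel + date
        new AnalysisValue("3", "420").write(out);            // value: 3 calls, 420 s

        // deserialize in the same order
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        AnalysisKey key = new AnalysisKey();
        key.readFields(in);
        AnalysisValue value = new AnalysisValue();
        value.readFields(in);
        System.out.println(key.getTel() + "_" + key.getDate()
                + " -> " + value.getSumCall() + "_" + value.getSumDuration());
    }
}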

ct.analysis.mapper

Create AnalysisBeanMapper:

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

import java.io.IOException;

public class AnalysisBeanMapper extends TableMapper<AnalysisKey, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {

        // example rowkey: 5_19154926260_20180802160747_16574556259_0054_1 (index 4 = duration, index 5 = flag)
        String rowkey= Bytes.toString(key.get());
        String[] values=rowkey.split("_");

        String call1=values[1];
        String call2=values[3];
        String callTime=values[2];
        String duration=values[4];

        String year=callTime.substring(0,4);   // e.g. "2018"
        String month=callTime.substring(0,6);  // e.g. "201808" (year+month prefix)
        String day=callTime.substring(0,8);    // e.g. "20180802" (full date prefix)

        //caller - year
        context.write(new AnalysisKey(call1,year),new Text(duration));
        //caller - month
        context.write(new AnalysisKey(call1,month),new Text(duration));
        //caller - day
        context.write(new AnalysisKey(call1,day),new Text(duration));

        //callee - year
        context.write(new AnalysisKey(call2,year),new Text(duration));
        //callee - month
        context.write(new AnalysisKey(call2,month),new Text(duration));
        //callee - day
        context.write(new AnalysisKey(call2,day),new Text(duration));

    }
}

Create AnalysisTextMapper:

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
/**
 * @program: IntelliJ IDEA
 * @description: Mapper for the analysis data
 * @author:
 * @create: 2022-10-23 10:36
 */
public class AnalysisTextMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {

        String rowkey = Bytes.toString(key.get());
        // example rowkey: 5_19154926260_20180802160747_16574556259_0054_1

        String[] values = rowkey.split("_");

        String call1 = values[1];
        String call2 = values[3];
        String calltime = values[2];
        String duration = values[4];

        String year = calltime.substring(0, 4);  // e.g. "2018"
        String month = calltime.substring(0, 6); // e.g. "201808"
        String date = calltime.substring(0, 8);  // e.g. "20180802"

        // caller - year
        context.write(new Text(call1 + "_" + year), new Text(duration));
        // caller - month
        context.write(new Text(call1 + "_" + month), new Text(duration));
        // caller - day
        context.write(new Text(call1 + "_" + date), new Text(duration));

        // callee - year
        context.write(new Text(call2 + "_" + year), new Text(duration));
        // callee - month
        context.write(new Text(call2 + "_" + month), new Text(duration));
        // callee - day
        context.write(new Text(call2 + "_" + date), new Text(duration));
    }
}

ct.analysis.reducer

Create AnalysisBeanReducer:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * @Author
 * @Date 2021/2/26 19:18
 * @Version 1.0
 * Reducer that aggregates the analysis data
 */
public class AnalysisBeanReducer extends Reducer<AnalysisKey, Text, AnalysisKey, AnalysisValue> {
    @Override
    protected void reduce(AnalysisKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        int sumCall = 0;
        int sumDuration = 0;
        for (Text value : values) {
            int duration = Integer.parseInt(value.toString());
            sumDuration = sumDuration + duration;

            sumCall++;
        }

        context.write(key, new AnalysisValue(""+sumCall,""+sumDuration));
    }
}

Create AnalysisTextReducer:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Reducer that aggregates the analysis data
 */
public class AnalysisTextReducer extends Reducer<Text, Text, Text, Text >{
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {


        int call_sum = 0;
        int call_duration_sum = 0;

        for (Text value : values) {
            int duration = Integer.parseInt(value.toString());
            call_duration_sum = call_duration_sum + duration;

            call_sum++;
        }

        context.write(key, new Text(call_sum + "_" + call_duration_sum));
    }
}

ct.analysis.tool

Create AnalysisBeanTool:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;

/**
 * @Author 
 * @Date 2021/2/26 16:51
 * @Version 1.0
 * Tool class that assembles the analysis job (AnalysisKey/AnalysisValue version)
 */
public class AnalysisBeanTool implements Tool {
    @Override
    public int run(String[] strings) throws Exception {

        Job job = Job.getInstance();
        job.setJarByClass(AnalysisBeanTool.class);

        Scan scan =new Scan();
        //scan only the caller column family (half the data) to avoid counting calls twice
        scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));

        //mapper
        TableMapReduceUtil.initTableMapperJob(
                Names.TABLE.getValue(),
                scan,
                AnalysisBeanMapper.class,
                AnalysisKey.class,
                Text.class,
                job
        );

        //reducer
        job.setReducerClass(AnalysisBeanReducer.class);
        job.setOutputKeyClass(AnalysisKey.class);
        job.setOutputValueClass(AnalysisValue.class);

        //outputformat
        job.setOutputFormatClass(MySQLBeanOutputFormat.class);
        boolean flg = job.waitForCompletion(true);

        if (flg){
            return JobStatus.State.SUCCEEDED.getValue();
        }else {
            return JobStatus.State.FAILED.getValue();
        }

    }

    @Override
    public void setConf(Configuration configuration) {

    }

    @Override
    public Configuration getConf() {
        return null;
    }
}

Create AnalysisTextTool:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;

/**
 * @Author 
 * @Date 2021/2/26 16:51
 * @Version 1.0
 * Tool class that assembles the analysis job (Text version)
 */
public class AnalysisTextTool implements Tool {
    @Override
    public int run(String[] strings) throws Exception {

        Job job = Job.getInstance();
        job.setJarByClass(AnalysisTextTool.class);

        Scan scan =new Scan();
        //scan only the caller column family (half the data) to avoid counting calls twice
        scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));

        //mapper
        TableMapReduceUtil.initTableMapperJob(
                Names.TABLE.getValue(),
                scan,
                AnalysisTextMapper.class,
                Text.class,
                Text.class,
                job
        );

        //reducer
        job.setReducerClass(AnalysisTextReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        //outputformat (use MySQLRedisTextOutputFormat instead when dimension ids are served from redis)
        job.setOutputFormatClass(MySQLTextOutputFormat.class);
        boolean flg = job.waitForCompletion(true);

        if (flg){
            return JobStatus.State.SUCCEEDED.getValue();
        }else {
            return JobStatus.State.FAILED.getValue();
        }

    }

    @Override
    public void setConf(Configuration configuration) {

    }

    @Override
    public Configuration getConf() {
        return null;
    }
}

In ct.common.util

Modify JDBCUtil:

import java.sql.Connection;
import java.sql.DriverManager;

public class JDBCUtil {

    private static final String MYSQL_DRIVER_CLASS = "com.mysql.jdbc.Driver";
    private static final String MYSQL_URL = "jdbc:mysql://hadoop01:3306/ct221023?useUnicode=true&characterEncoding=UTF-8";
    private static final String MYSQL_USERNAME = "root";
    private static final String MYSQL_PASSWORD = "123456";

    public static Connection getConnection(){
        Connection conn=null;
        try{
            Class.forName(MYSQL_DRIVER_CLASS);
            conn= DriverManager.getConnection(MYSQL_URL,MYSQL_USERNAME,MYSQL_PASSWORD);
        }catch (Exception e){
            e.printStackTrace();
        }
        return conn;

    }
}
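A small connectivity smoke test for JDBCUtil (a sketch, assuming the ct221023 database in the URL above already exists):

import java.sql.Connection;
import java.sql.SQLException;

public class JDBCUtilSmokeTest {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = JDBCUtil.getConnection()) {
            // isValid pings the server with the given timeout (seconds)
            System.out.println("MySQL reachable: " + (conn != null && conn.isValid(3)));
        }
    }
}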

Create the ct.cache module

pom.xml:

<dependencies>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>3.1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.46</version>
        </dependency>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>ct-common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
    </dependencies>

Create Bootstrap:

package ct.cache;

import redis.clients.jedis.Jedis;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Cache bootstrap client: loads the dimension data from MySQL into redis
 */
public class Bootstrap {
    public static void main(String[] args) {

        // maps loaded from MySQL
        Map<String, Integer> userMap = new HashMap<String, Integer>();
        Map<String, Integer> dateMap = new HashMap<String, Integer>();

        // load the user and date dimension data

        Connection connection = null;
        PreparedStatement pstat = null;
        ResultSet rs = null;

        try {

            connection = JDBCUtil.getConnection();

            String queryUserSql = "select id, tel from ct_user";
            pstat = connection.prepareStatement(queryUserSql);
            rs = pstat.executeQuery();
            while ( rs.next() ) {
                Integer id = rs.getInt(1);
                String tel = rs.getString(2);
                userMap.put(tel, id);
            }

            rs.close();

            String queryDateSql = "select id, year, month, day from ct_date";
            pstat = connection.prepareStatement(queryDateSql);
            rs = pstat.executeQuery();
            while ( rs.next() ) {
                Integer id = rs.getInt(1);
                String year = rs.getString(2);
                String month = rs.getString(3);
                if ( month.length() == 1 ) {
                    month = "0" + month;
                }
                String day = rs.getString(4);
                if ( day.length() == 1 ) {
                    day = "0" + day;
                }
                dateMap.put(year + month + day, id);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if ( rs != null ) {
                try {
                    rs.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if ( pstat != null ) {
                try {
                    pstat.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if ( connection != null ) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }

//        System.out.println(userMap.size());
//        System.out.println(dateMap.size());

        // store the dimension ids in redis
        Jedis jedis = new Jedis("192.168.86.101", 6379);
        jedis.auth("123456");

        Iterator<String> keyIterator = userMap.keySet().iterator();
        while ( keyIterator.hasNext() ) {
            String key = keyIterator.next();
            Integer value = userMap.get(key);
            jedis.hset("ct_user", key, "" + value);
        }

        keyIterator = dateMap.keySet().iterator();
        while ( keyIterator.hasNext() ) {
            String key = keyIterator.next();
            Integer value = dateMap.get(key);
            jedis.hset("ct_date", key, "" + value);
        }

    }
}
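After Bootstrap has run, a quick way to confirm the hashes reached redis (a sketch; the sample phone number below is hypothetical, substitute one that exists in your ct_user table):

import redis.clients.jedis.Jedis;

public class CacheCheck {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("192.168.86.101", 6379);
        jedis.auth("123456");
        // hlen returns the number of fields in each hash
        System.out.println("cached users: " + jedis.hlen("ct_user"));
        System.out.println("cached dates: " + jedis.hlen("ct_date"));
        // hget looks up a single dimension id (hypothetical sample number)
        System.out.println("id of 15837312345: " + jedis.hget("ct_user", "15837312345"));
        jedis.close();
    }
}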


V. Run and Test

1) Copy the MySQL driver jar into the lib directory under the Hadoop installation root:

$ cp -a /opt/module/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./lib

2) Submit the job:

$ bin/yarn jar ~/call/jars/ct_analysis-1.0-SNAPSHOT.jar com.atguigu.analysis.runner.CountDurationRunner -libjars ./lib/*

Then check the results in MySQL.
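A minimal sketch for spot-checking the output (column names follow the ct_call INSERT statements used above):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResultCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = JDBCUtil.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select telid, dateid, sumcall, sumduration from ct_call limit 10")) {
            // print the first few aggregated rows: user id, date id, call count, total duration
            while (rs.next()) {
                System.out.println(rs.getInt("telid") + "\t" + rs.getInt("dateid")
                        + "\t" + rs.getInt("sumcall") + "\t" + rs.getInt("sumduration"));
            }
        }
    }
}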