I. Background

The real-time extraction system is currently not very stable: problems with the databases, Kafka, the network and so on cause discrepancies in the extracted data.

II. Options

Option 1: monitor the number of records collected into Kafka against the number Spark Streaming consumes from Kafka (this cannot guarantee that the source and target databases hold the same number of rows);

Option 2: monitor the source database's count for the current day up to a given time against the count in the target database's current-day partition up to the same time;

Option 3: monitor the source database's total count up to a given time against the target database's count after the incremental data has been merged (the data volume involved is large);

III. Design

Option 2 is adopted.

1. Create two configuration tables: a database config table and a table config table

  • Database config: source database IP, port, database name, username and password (encrypted); this normally points at the read replica (slave);
  • Table config: source table name, target table name, database config id, primary key, incremental time column, and a flag indicating whether monitoring is enabled (see the model class sketched below);
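
For reference, a minimal sketch of the config model that the scheduled task works with. The field names are taken from the getters and setters used in the task code further down, and the joined table names come from its SQL; the Lombok @Data annotation is an assumption (the real entity may declare its accessors explicitly and carry more columns):

package com.aaaa.realtimeappmonitor.jobstatetrack.model;

import lombok.Data;

// Sketch: one table-config row joined with its source-database config
@Data
public class TableDbConfigInfo {
    // database config (integration_src_db_info)
    private Long srcDbId;
    private String srcDbName;
    private String srcDbType;        // mysql / postgresql / oracle
    private String srcDbIpSlave;     // counts are taken against the read replica
    private String srcDbPortSlave;
    private String slaveDbUsername;
    private String slaveDbPasswd;    // stored AES-encrypted, decrypted via AESUtil
    // table config (integration_job_info)
    private String sourceSchemaName;
    private String sourceTableName;
    private String targetSchemaName;
    private String targetTableName;
    private String rowKey;           // primary key column, used for count(distinct ...) on the target
    private String aliasCreateTime;  // create-time column used for the time filter
    private String aliasUpdateTime;
}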

2. Build an encryption/decryption utility class based on the AES algorithm;

3. Build a scheduled task that will:

  • connect to the config database via JDBC and query the tables that need monitoring;
  • run the per-table checks concurrently on a thread pool;
  • count the rows in the source database;
  • count the rows in the target Hive table;
  • compare the two counts and call the alert API if they differ.

IV. Implementation

AES encryption/decryption utility:

package com.aaaa.realtimeappmonitor.jobstatetrack.util;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import java.security.Key;
import java.security.SecureRandom;

/**
 * Description: AES encryption/decryption utility
 *
 * @author Bob
 * @date 2020/4/26
 **/
public class AESUtil {

    public static final String CIPHER_ALGORITHM = "AES"; // transformation passed to Cipher.getInstance; alternatives: DES, DESede

    public static Key getKey(String strKey) {
        try {
            if (strKey == null) {
                strKey = "";
            }
            // Derive a deterministic 128-bit AES key by seeding SHA1PRNG with the passphrase
            KeyGenerator _generator = KeyGenerator.getInstance("AES");
            SecureRandom secureRandom = SecureRandom.getInstance("SHA1PRNG");
            secureRandom.setSeed(strKey.getBytes(StandardCharsets.UTF_8));
            _generator.init(128, secureRandom);
            return _generator.generateKey();
        } catch (Exception e) {
            throw new RuntimeException("Failed to initialise the AES key", e);
        }
    }


    public static String encrypt(String data, String key) throws Exception {
        SecureRandom sr = new SecureRandom();
        Key secureKey = getKey(key);
        Cipher cipher = Cipher.getInstance(CIPHER_ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, secureKey, sr);
        byte[] bt = cipher.doFinal(data.getBytes(StandardCharsets.UTF_8));
        // Base64-encode the ciphertext so it can be stored as text in the config table
        return Base64.getEncoder().encodeToString(bt);
    }


    public static String decrypt(String message, String key) {
        try {
            SecureRandom sr = new SecureRandom();
            Cipher cipher = Cipher.getInstance(CIPHER_ALGORITHM);
            Key secureKey = getKey(key);
            cipher.init(Cipher.DECRYPT_MODE, secureKey, sr);
            // MIME decoder tolerates line breaks in values that were stored with line-wrapped Base64
            byte[] res = Base64.getMimeDecoder().decode(message);
            res = cipher.doFinal(res);
            return new String(res, StandardCharsets.UTF_8);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

}
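
A quick round-trip check of the utility; the key and plaintext below are made-up placeholder values:

public class AESUtilDemo {
    public static void main(String[] args) throws Exception {
        String key = "demo-monitor-key";   // placeholder key; the real key comes from configuration
        String plain = "db_password_123";  // placeholder plaintext
        String cipherText = AESUtil.encrypt(plain, key);
        System.out.println("encrypted: " + cipherText);
        // Decrypting with the same key should return the original plaintext
        System.out.println("round trip ok: " + plain.equals(AESUtil.decrypt(cipherText, key)));
    }
}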

Scheduled task:

package com.aaaa.realtimeappmonitor.jobstatetrack.task;

import com.google.common.collect.Lists;
import com.aaaa.realtimeappmonitor.jobstatetrack.model.TableDbConfigInfo;
import com.aaaa.realtimeappmonitor.jobstatetrack.util.AESUtil;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestClientException;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.util.UriComponentsBuilder;

import java.net.URI;
import java.sql.*;
import java.text.SimpleDateFormat;
import java.util.*;
import java.util.Date;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Description: monitors row counts between the source database and the landing (target) database.
 * Implemented approach (Option 2 in the design above): count the source rows for the current day up to
 * a given hour and compare them with the target table's current-day partition up to the same hour.
 * Alternative: count all source rows up to a given time and compare with the merged target table.
 * Note: only create_time can be used for the filter; adding update_time makes the counts practically impossible to match.
 *
 * @author Bob
 * @date 2020/4/23
 **/
@Slf4j
@Component
public class StatisticMonitorTask {

    @Value("${data.validation.monitor.driver}")
    private String driver;
    @Value("${data.validation.monitor.url}")
    private String url;
    @Value("${data.validation.monitor.username}")
    private String username;
    @Value("${data.validation.monitor.password}")
    private String password;
    @Value("${data.validation.monitor.schema}")
    private String schema;
    @Value("${data.validation.monitor.target.driver}")
    private String targetDriver;
    @Value("${data.validation.monitor.target.url}")
    private String targetUrl;
    @Value("${data.validation.monitor.target.username}")
    private String targetUsername;
    @Value("${data.validation.monitor.target.password}")
    private String targetPassword;
    @Value("${data.validation.monitor.aes.key}")
    private String AES_KEY;
    @Value("${data.validation.monitor.work.no}")
    private String workNo;
    private static final int CORE_POOL_SIZE = 5;
    private static final int MAXIMUM_POOL_SIZE = 5;
    private static final long KEEP_ALIVE_TIME = 0L;

    @Scheduled(cron = "${data.validation.monitor.cron}")
    public void statisticMonitorTask() {
        log.info("Start timing task:monitor statistic...");
        Date date = new Date();
        SimpleDateFormat sdf1 = new SimpleDateFormat("yyyyMMdd");
        SimpleDateFormat sdf2 = new SimpleDateFormat("yyyyMMddHH");
        SimpleDateFormat sdf3 = new SimpleDateFormat("HH");
        String day = sdf1.format(date);
        String dayHour = sdf2.format(date);
        String hour = sdf3.format(date);
        // At hour 00 compare the previous day's full-day counts; otherwise compare counts up to the previous full hour
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        if ("00".equals(hour)) {
            calendar.add(Calendar.DAY_OF_YEAR, -1);
            day = sdf1.format(calendar.getTime());
        } else {
            calendar.add(Calendar.HOUR_OF_DAY, -1);
            dayHour = sdf2.format(calendar.getTime());
        }
        List<TableDbConfigInfo> monitorTables = getMonitorTables();
        // Process the monitored tables concurrently on a thread pool
        ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(CORE_POOL_SIZE, MAXIMUM_POOL_SIZE, KEEP_ALIVE_TIME, TimeUnit.MILLISECONDS, new LinkedBlockingDeque<Runnable>());
        for (TableDbConfigInfo info : monitorTables) {
            try {
                threadPoolExecutor.execute(new CountCheck(info, day, dayHour, hour));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        threadPoolExecutor.shutdown();
        log.info("End timing task:monitor statistic...");
    }

    /**
     * @description Query the tables that need to be monitored
     * @author Bob
     * @date 2020/4/28
     */
    public List<TableDbConfigInfo> getMonitorTables() {
        // Query the monitoring configuration: table config joined with its source database config
        Connection conn = null;
        PreparedStatement preparedStatement = null;
        ResultSet rs = null;
        List<TableDbConfigInfo> list = Lists.newArrayList();
        String sql = "select a.source_schema_name as sourceSchemaName,a.source_table_name as sourceTableName,a.integration_db_alias as integrationDbAlias,a.row_key as rowKey,a.integration_sys_name as integrationSysName,\n" +
                "a.target_schema_name as targetSchemaName,a.target_table_name as targetTableName,a.src_db_id as srcDbId,a.presto_view_name as prestoViewName,a.alias_create_time as aliasCreateTime,a.alias_update_time as aliasUpdateTime,\n" +
                "b.src_db_name as srcDbName,b.src_db_type as srcDbType,b.src_db_ip_master as srcDbIpMaster,b.src_db_port_master as srcDbPortMaster,\n" +
                "b.src_db_ip_slave as srcDbIpSlave,b.src_db_port_slave as srcDbPortSlave,b.integration_db_alias as integrationDbAlias,b.integration_sys_name as integrationSysName,\n" +
                "b.master_db_username as masterDbUsername,b.master_db_passwd as masterDbPasswd,b.slave_db_username as slaveDbUsername,b.slave_db_passwd as slaveDbPasswd\n" +
                "from " + schema + ".integration_job_info a\n" +
                "join " + schema + ".integration_src_db_info b\n" +
                "on a.src_db_id=b.id\n" +
                "where a.is_monitor='1';";
        try {
            conn = getConn(driver, url, username, password);
            preparedStatement = conn.prepareStatement(sql);
            rs = preparedStatement.executeQuery();
            while (rs.next()) {
                TableDbConfigInfo info = new TableDbConfigInfo();
                info.setSrcDbName(rs.getString("srcDbName"));
                info.setSrcDbType(rs.getString("srcDbType"));
                info.setSrcDbIpSlave(rs.getString("srcDbIpSlave"));
                info.setSrcDbPortSlave(rs.getString("srcDbPortSlave"));
                info.setSlaveDbUsername(rs.getString("slaveDbUsername"));
                info.setSlaveDbPasswd(rs.getString("slaveDbPasswd"));
                info.setSourceSchemaName(rs.getString("sourceSchemaName"));
                info.setSourceTableName(rs.getString("sourceTableName"));
                info.setTargetSchemaName(rs.getString("targetSchemaName"));
                info.setTargetTableName(rs.getString("targetTableName"));
                info.setSrcDbId(rs.getLong("srcDbId"));
                info.setRowKey(rs.getString("rowKey"));
                info.setAliasCreateTime(rs.getString("aliasCreateTime"));
                info.setAliasUpdateTime(rs.getString("aliasUpdateTime"));
                list.add(info);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            closeConn(rs, preparedStatement, conn);
        }
        return list;
    }

    class CountCheck implements Runnable {
        private TableDbConfigInfo info;
        private String day;
        private String dayHour;
        private String hour;

        public CountCheck(TableDbConfigInfo info, String day, String dayHour, String hour) {
            this.info = info;
            this.day = day;
            this.dayHour = dayHour;
            this.hour = hour;
        }

        @Override
        public void run() {
            // Count rows in the source table (s1)
            Long sCount = countSource(info, day, dayHour, hour);
            // Count rows in the target table (s2)
            Long tCount = countTarget(info, day, dayHour, hour);
            StringBuffer sb = new StringBuffer();
            // Compare s1 and s2; alert on any difference
            if (sCount.compareTo(tCount) != 0) {
                log.warn("Source and target counts differ. Source table: " + info.getSourceTableName() + ", count: " + sCount + "; target table: " + info.getTargetTableName() + ", count: " + tCount);
                sb.append("Source table: ").append(info.getSourceSchemaName()).append(".").append(info.getSourceTableName()).append(", count: ").append(sCount).append("; target table: ").append(info.getTargetSchemaName()).append(".").append(info.getTargetTableName()).append(", count: ").append(tCount).append("; difference: ").append(Math.abs(sCount - tCount)).append("\n");
                // Call the alert API
                send("Real-time data integration alert: counts do not match", sb.toString());
            }
            }
        }
    }

    /**
     * @description Count rows in the source database
     * @author Bob
     * @date 2020/4/28
     */
    public Long countSource(TableDbConfigInfo info, String day, String dayHour, String hour) {
        log.info("-------------countSource table:" + info.getSourceSchemaName() + "." + info.getSourceTableName() + " start---------------");
        Long sCount = 0L;
        Connection conn = null;
        PreparedStatement preparedStatement = null;
        ResultSet rs = null;
        String sourceDriver = null;
        String sourceUrl = null;
        String srcDbIpSlave = info.getSrcDbIpSlave();
        String srcDbPortSlave = info.getSrcDbPortSlave();
        String srcDbName = info.getSrcDbName();
        String sourceUsername = info.getSlaveDbUsername();
        String sourcePassword = AESUtil.decrypt(info.getSlaveDbPasswd(), AES_KEY);
        String srcDbType = info.getSrcDbType();
        String sourceTableName = info.getSourceTableName();
        String aliasCreateTime = info.getAliasCreateTime();
        String aliasUpdateTime = info.getAliasUpdateTime();
        StringBuffer sb = new StringBuffer();
        sb.append("select count(1) as cn from ");
        if (info.getSourceSchemaName() != null && !"".equals(info.getSourceSchemaName())) {
            sb.append(info.getSourceSchemaName()).append(".");
        }
        sb.append(sourceTableName);
        sb.append(" where 1=1 ");
        if ("postgresql".equalsIgnoreCase(srcDbType)) {
            sourceDriver = "org.postgresql.Driver";
            sourceUrl = "jdbc:postgresql://" + srcDbIpSlave + ":" + srcDbPortSlave + "/" + srcDbName;
            if (aliasCreateTime != null) {
                if ("00".equals(hour)) {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyymmdd')").append("='").append(day).append("'");
                } else {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyymmddhh24')").append("='").append(dayHour).append("'");
                }
            }
        } else if ("mysql".equalsIgnoreCase(srcDbType)) {
            sourceDriver = "com.mysql.jdbc.Driver";
            sourceUrl = "jdbc:mysql://" + srcDbIpSlave + ":" + srcDbPortSlave + "/" + srcDbName;
            if (aliasCreateTime != null) {
                if ("00".equals(hour)) {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyyMMdd')").append("='").append(day).append("'");
                } else {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyyMMddHH')").append("='").append(dayHour).append("'");
                }
            }
        } else if ("oracle".equalsIgnoreCase(srcDbType)) {
            sourceDriver = "oracle.jdbc.driver.OracleDriver";
            sourceUrl = "jdbc:oracle:thin:@" + srcDbIpSlave + ":" + srcDbPortSlave + "/" + srcDbName;
            if (aliasCreateTime != null) {
                if ("00".equals(hour)) {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyyMMdd')").append("='").append(day).append("'");
                } else {
                    sb.append(" and to_char(").append(aliasCreateTime).append(",'yyyyMMddHH')").append("='").append(dayHour).append("'");
                }
            }
        }
        log.info(sb.toString());
        try {
            conn = getConn(sourceDriver, sourceUrl, sourceUsername, sourcePassword);
            preparedStatement = conn.prepareStatement(sb.toString());
            rs = preparedStatement.executeQuery();
            while (rs.next()) {
                sCount = rs.getLong("cn");
            }
        } catch (Exception e) {
            send("实时数据集成监控告警:源数据库连接异常", e.getMessage());
            e.printStackTrace();
        } finally {
            closeConn(rs, preparedStatement, conn);
        }
        log.info("-------------countSource table:" + info.getSourceSchemaName() + "." + info.getSourceTableName() + ",sCount=" + sCount + " end---------------");
        return sCount;
    }

    /**
     * @description Count rows in the target (Hive) database
     * @author Bob
     * @date 2020/4/28
     */
    public Long countTarget(TableDbConfigInfo info, String day, String dayHour, String hour) {
        log.info("-------------countTarget table:" + info.getTargetSchemaName() + "." + info.getTargetTableName() + " start---------------");
        Long tCount = 0L;
        Connection conn = null;
        PreparedStatement preparedStatement = null;
        ResultSet rs = null;
        StringBuffer sb = new StringBuffer();
        sb.append("select count(distinct ").append(info.getRowKey()).append(") as cn from ");
        if (info.getTargetSchemaName() != null && !"".equals(info.getTargetSchemaName())) {
            sb.append(info.getTargetSchemaName()).append(".");
        }
        sb.append(info.getTargetTableName());
        sb.append(" where dt='").append(day).append("'");
        if (info.getAliasCreateTime() != null) {
            if ("00".equals(hour)) {
                sb.append(" and from_unixtime(unix_timestamp(").append(info.getAliasCreateTime()).append("),'yyyyMMdd')").append("='").append(day).append("'");
            } else {
                sb.append(" and from_unixtime(unix_timestamp(").append(info.getAliasCreateTime()).append("),'yyyyMMddHH')").append("='").append(dayHour).append("'");
            }
        }
        log.info(sb.toString());
        try {
            conn = getConn(targetDriver, targetUrl, targetUsername, targetPassword);
            preparedStatement = conn.prepareStatement(sb.toString());
            rs = preparedStatement.executeQuery();
            while (rs.next()) {
                tCount = rs.getLong("cn");
            }
        } catch (Exception e) {
            send("实时数据集成监控告警:目标数据库连接异常", e.getMessage());
            e.printStackTrace();
        } finally {
            closeConn(rs, preparedStatement, conn);
        }
        log.info("-------------countTarget table:" + info.getTargetSchemaName() + "." + info.getTargetTableName() + ",tCount=" + tCount + " end---------------");
        return tCount;
    }

    /**
     * @description Send an alert message
     * @author Bob
     * @date 2020/5/7
     */
    public void send(String title, String message) {
        RestTemplate restTemplate = new RestTemplate();
        // Build and encode the alert URL: title/message may contain spaces, line breaks or non-ASCII text
        URI uri = UriComponentsBuilder.fromHttpUrl("http://aaaa.net/message/qy/send/")
                .queryParam("work_no", workNo)
                .queryParam("title", title)
                .queryParam("message", message)
                .build()
                .encode()
                .toUri();
        try {
            restTemplate.getForObject(uri, String.class);
        } catch (RestClientException e) {
            e.printStackTrace();
        }
    }

    /**
     * @description Open a JDBC connection
     * @author Bob
     * @date 2020/4/28
     */
    public Connection getConn(String driver, String url, String username, String password) {
        Connection conn = null;
        try {
            Class.forName(driver);
            conn = DriverManager.getConnection(url, username, password);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return conn;
    }

    /**
     * @description Close the JDBC resources
     * @author Bob
     * @date 2020/4/28
     */
    public void closeConn(ResultSet rs, PreparedStatement preparedStatement, Connection conn) {
        if (rs != null) {
            try {
                rs.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        if (preparedStatement != null) {
            try {
                preparedStatement.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        if (conn != null) {
            try {
                conn.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
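
For reference, the @Value placeholders used by the task map onto configuration keys like the following. The key names are exactly those referenced above; every value shown (hosts, credentials, driver classes, cron expression) is a placeholder example, and reaching the target via the HiveServer2 JDBC driver is an assumption:

# Config database holding integration_job_info / integration_src_db_info (placeholder values)
data.validation.monitor.driver=com.mysql.jdbc.Driver
data.validation.monitor.url=jdbc:mysql://config-db-host:3306/integration
data.validation.monitor.username=monitor_user
data.validation.monitor.password=monitor_pwd
data.validation.monitor.schema=integration

# Target Hive, reached over HiveServer2 JDBC (placeholder values)
data.validation.monitor.target.driver=org.apache.hive.jdbc.HiveDriver
data.validation.monitor.target.url=jdbc:hive2://hive-host:10000/default
data.validation.monitor.target.username=hive_user
data.validation.monitor.target.password=hive_pwd

# AES key used by AESUtil, alert recipient, and schedule (top of every hour)
data.validation.monitor.aes.key=demo-aes-key
data.validation.monitor.work.no=00000
data.validation.monitor.cron=0 0 * * * *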

V. Open Issues

1. Querying the target through Hive is slow, so once many tables are monitored the checks start to queue up; Presto is being considered as a replacement;

2. Monitoring currently runs only once an hour, so detection latency is high;

3. Database and table configurations have to be maintained by hand;

4. Enabling too many monitoring tasks may bring the monitoring system itself down; this still needs to be verified;

5. One serious issue: rows in the source database can be updated at any time after creation, while the target keeps every historical version, so before merging the target count is always greater than or equal to the source count; this needed optimising. ==> Optimised: the update_time condition was dropped and only create_time is kept; in principle, unless rows are deleted in the source, the two counts should match.
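
On issue 1, a minimal sketch of what the target-side count could look like if it went through Presto instead of Hive. This is an assumption about a future setup, not part of the current implementation; the driver class is the one shipped with prestodb's presto-jdbc artifact, and the coordinator address, user and table names are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PrestoCountSketch {
    public static void main(String[] args) throws Exception {
        // Presto reads the same Hive tables through its hive catalog, but answers count queries much faster
        Class.forName("com.facebook.presto.jdbc.PrestoDriver");
        String url = "jdbc:presto://presto-host:8080/hive/default"; // placeholder coordinator address
        try (Connection conn = DriverManager.getConnection(url, "monitor_user", null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select count(distinct id) as cn from ods.demo_table where dt = '20200507'")) {
            if (rs.next()) {
                System.out.println("target count = " + rs.getLong("cn"));
            }
        }
    }
}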