Server software installation:
Part 1: Install Kafka
----------------
0. Choose three hosts on which to install Kafka
1. Prepare ZooKeeper (zk)
2. Install the JDK
3. Untar kafka_2.11-2.2.0.tgz
4. Configure the environment variables
Contents of /etc/profile:
export KAFKA_HOME=/opt/kafka
export PATH=$PATH:$KAFKA_HOME/bin
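After editing, reload /etc/profile so the variables take effect in the current shell:
$>source /etc/profile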
5. Configure Kafka
[kafka/config/server.properties]
...
broker.id=0
...
listeners=PLAINTEXT://:9092
...
log.dirs=/opt/kafka/kafka-logs
...
zookeeper.connect=server2:2181,server3:2181,server4:2181
6. Distribute server.properties to the other hosts and set broker.id in each copy to 0, 1, and 2 respectively (a scripted sketch follows)
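A minimal sketch of step 6, assuming passwordless ssh and that server1 keeps broker.id=0 while server3 and server4 (the other two brokers used in step 7) get 1 and 2; host names follow this document, adjust to yours:
$>scp /opt/kafka/config/server.properties server3:/opt/kafka/config/
$>scp /opt/kafka/config/server.properties server4:/opt/kafka/config/
$>ssh server3 "sed -i 's/^broker.id=.*/broker.id=1/' /opt/kafka/config/server.properties"
$>ssh server4 "sed -i 's/^broker.id=.*/broker.id=2/' /opt/kafka/config/server.properties"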
7. Start the Kafka servers
a) Start zk first
b) Start Kafka
[server1 server3 server4]
$>bin/kafka-server-start.sh config/server.properties
c) Verify that the Kafka server is up
$>netstat -anop | grep 9092
8. Create a topic
$>bin/kafka-topics.sh --create --zookeeper server3:2181 --replication-factor 3 --partitions 3 --topic test
9. List topics (the zk quorum is server2/3/4 per zookeeper.connect above, so point the tool at one of those)
$>bin/kafka-topics.sh --list --zookeeper server3:2181
10. Start a console producer
$>bin/kafka-console-producer.sh --broker-list server3:9092 --topic test
11. Start a console consumer (in Kafka 2.2 the console consumer no longer accepts --zookeeper; use --bootstrap-server only)
$>bin/kafka-console-consumer.sh --bootstrap-server server3:9092 --topic test --from-beginning
12. Type helloworld in the producer console; the consumer should receive the message
13. Delete the topic
$>bin/kafka-topics.sh --delete --topic test --zookeeper server3:2181
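Note: deletion only takes effect if delete.topic.enable=true in server.properties; it defaults to true in Kafka 2.2, but on older versions the topic is merely marked for deletion.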
Part 2: Install Flume
-------------------------------------
1. Download apache-flume-1.7.0-bin.tar.gz
2. Untar it into the installation directory
3. Configure the environment variables
Contents of /etc/profile:
export FLUME_HOME=/opt/flume
export PATH=$PATH:$FLUME_HOME/bin
4. Verify that Flume was installed successfully
$>flume-ng version
Part 3: Configure Flume to pull MySQL incremental data and sink it to Kafka. Note: start the zk cluster and the Kafka cluster first.
------------------------------------
1. Download the two jars mysql-connector-java-5.1.48.jar and flume-ng-sql-source-json-1.0.jar and upload them to FLUME_HOME/lib/
2. Copy the template config file flume-conf.properties.template to mysql_kafka_test.conf with the following contents:
# Data source
sync.sources= s-1
# Data channel
sync.channels= c-1
# Data destinations. Failover is configured here: per the priorities below, k-2 (priority 10) is active first, and k-1 takes over after k-2 dies
sync.sinks= k-1 k-2
# The key to configuring failover: a sink group is required
sync.sinkgroups= g-1
sync.sinkgroups.g-1.sinks= k-1 k-2
# The processor type is failover
sync.sinkgroups.g-1.processor.type= failover
# Priorities: the larger the number, the higher the priority; every sink must have a distinct priority
sync.sinkgroups.g-1.processor.priority.k-1= 5
sync.sinkgroups.g-1.processor.priority.k-2= 10
# Back-off ceiling for a failed sink, in milliseconds; set to 10 seconds here, adjust faster or slower to suit your situation
sync.sinkgroups.g-1.processor.maxpenalty= 10000
########## Channel definition
# The volume is modest, so events go straight into memory; JDBC, Kafka, or file-backed channels are also options
sync.channels.c-1.type= memory
# Maximum number of events the channel queue can hold
sync.channels.c-1.capacity= 100000
# Maximum size of the putList/takeList queues. Per transaction a sink grabs batchSize events from the channel into this queue, so this value is best kept smaller than capacity and larger than the sink's batch size.
# Official definition: The maximum number of events the channel will take from a source or give to a sink per transaction.
sync.channels.c-1.transactionCapacity= 1000
sync.channels.c-1.byteCapacityBufferPercentage= 20
### byteCapacity defaults to 80% of the JVM's maximum available memory and can be left unset
#sync.channels.c-1.byteCapacity = 800000
######### sql source #################
# Channel used by source s-1; it must match the sinks' channel, otherwise nothing flows
sync.sources.s-1.channels=c-1
#########For each one of the sources, the type is defined
sync.sources.s-1.type= org.keedio.flume.source.SQLSource
######### Hibernate database connection properties
sync.sources.s-1.hibernate.connection.url= jdbc:mysql://10.22.20.70:3306/avatar
sync.sources.s-1.hibernate.connection.user= root
sync.sources.s-1.hibernate.connection.password= root
sync.sources.s-1.hibernate.connection.autocommit= true
sync.sources.s-1.hibernate.dialect= org.hibernate.dialect.MySQL5Dialect
sync.sources.s-1.hibernate.connection.driver_class= com.mysql.jdbc.Driver
sync.sources.s-1.run.query.delay=10000
sync.sources.s-1.status.file.path= /opt/flume/record-datas/avatar/test_status
# For how the status file is used, see the startup notes above
sync.sources.s-1.status.file.name= sqlSource.status
########Custom query
sync.sources.s-1.start.from= 0
# The SQL is customizable, but note: incremental loading only works against the id field, i.e. the primary-key column (this is the tested default), and the primary key must be included in the SELECT,
# because without it Flume cannot record where the last query stopped. $@$ stands for the last queried value of the incremental column, recorded in the status file under test_status
sync.sources.s-1.custom.query= select a.id, a.cusId, b.flag, b.num from a_test a inner join a_test1 b on a.id=b.id where a.id > $@$ order by a.id asc
sync.sources.s-1.batch.size= 100
sync.sources.s-1.max.rows= 100
sync.sources.s-1.hibernate.connection.provider_class= org.hibernate.connection.C3P0ConnectionProvider
sync.sources.s-1.hibernate.c3p0.min_size=5
sync.sources.s-1.hibernate.c3p0.max_size=20
######### sinks 1
# Channel used by sink k-1; it must match the source's channel, otherwise no data can be fetched
sync.sinks.k-1.channel= c-1
sync.sinks.k-1.type= org.apache.flume.sink.kafka.KafkaSink
sync.sinks.k-1.kafka.topic= avatar_test
sync.sinks.k-1.kafka.bootstrap.servers= server3:9092
sync.sinks.k-1.kafka.producer.acks= 1
# Number of events handled per batch
sync.sinks.k-1.kafka.flumeBatchSize = 100
######### sinks 2
# Channel used by sink k-2; it must match the source's channel, otherwise no data can be fetched
sync.sinks.k-2.channel= c-1
sync.sinks.k-2.type= org.apache.flume.sink.kafka.KafkaSink
sync.sinks.k-2.kafka.topic= avatar_test
sync.sinks.k-2.kafka.bootstrap.servers= server4:9092
sync.sinks.k-2.kafka.producer.acks= 1
sync.sinks.k-2.kafka.flumeBatchSize = 100
3. Note: create the SQL query status directory /opt/flume/record-datas/avatar/test_status beforehand
4. Start Flume; under FLUME_HOME run the following command:
flume-ng agent -c conf -f conf/mysql_kafka_test.conf -n sync -Dflume.root.logger=INFO,console
5. Start a Kafka console consumer to watch the MySQL incremental data (again with --bootstrap-server only):
kafka-console-consumer.sh --bootstrap-server server3:9092 --topic avatar_test --from-beginning
6. You can also do this in code: use the Kafka client to create a Java consumer that receives the data, and the Hadoop client to write it to HDFS in real time, as sketched below
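A minimal sketch of step 6, assuming the kafka-clients (2.x) and hadoop-client jars are on the classpath. The broker and topic come from the config above; the NameNode address, consumer group id, and output path are illustrative placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "server3:9092");          // broker from the config above
        props.put("group.id", "avatar_hdfs_writer");             // hypothetical group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://server1:8020");         // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);
        // Append to a single file for simplicity; a production job would roll
        // files by time or size, like Flume's HDFS sink does.
        Path out = new Path("/user/test/avatar_test.data");      // hypothetical output path
        FSDataOutputStream stream = fs.exists(out) ? fs.append(out) : fs.create(out);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("avatar_test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    stream.write((record.value() + "\n").getBytes("UTF-8"));
                }
                if (!records.isEmpty()) {
                    stream.hflush();   // make the appended data visible to HDFS readers
                }
            }
        } finally {
            stream.close();
        }
    }
}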
Part 4: Config file: Flume reads from Kafka and sinks to HDFS ---- kafka_hdfs.conf
a1.sources= r1
a1.sinks= k1
a1.channels= c1
a1.sources.r1.type= org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize= 5000
a1.sources.r1.batchDurationMillis= 2000
a1.sources.r1.kafka.bootstrap.servers= server3:9092
a1.sources.r1.kafka.topics= test
a1.sources.r1.kafka.consumer.group.id= g1
a1.sinks.k1.type= hdfs
a1.sinks.k1.hdfs.path= /user/test/flume/%y/%m/%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix= events-
a1.sinks.k1.hdfs.round= true
a1.sinks.k1.hdfs.roundValue= 20
a1.sinks.k1.hdfs.roundUnit= second
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=10
a1.sinks.k1.hdfs.rollCount=3
a1.channels.c1.type=memory
a1.sources.r1.channels= c1
a1.sinks.k1.channel= c1
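To run the Part 4 agent, start it the same way as in Part 3, pointing -n at agent a1:
$>flume-ng agent -c conf -f conf/kafka_hdfs.conf -n a1 -Dflume.root.logger=INFO,console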
Part 5: Config file: Flume tails a log file in real time and sinks to HDFS
a1.sources= r1
a1.sinks= k1
a1.channels= c1
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /home/server/test.log
a1.sinks.k1.type= hdfs
a1.sinks.k1.hdfs.path= /user/test/flume/%y/%m/%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix= events-
a1.sinks.k1.hdfs.round= true
a1.sinks.k1.hdfs.roundValue= 20
a1.sinks.k1.hdfs.roundUnit= second
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=10
a1.sinks.k1.hdfs.rollCount=3
a1.channels.c1.type=memory
a1.sources.r1.channels= c1
a1.sinks.k1.channel= c1
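Started the same way, assuming the file is saved as conf/exec_hdfs.conf (the name is illustrative; this section does not fix one):
$>flume-ng agent -c conf -f conf/exec_hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Note that the exec source offers no delivery guarantee: if the agent dies, lines tail has read but not yet put on the channel are lost.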
Part 6: Config file: Flume watches a spooling directory and sinks to HDFS
a1.sources= r1
a1.sinks= k1
a1.channels= c1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/server/spool
a1.sources.r1.fileHeader=true
a1.sinks.k1.type= hdfs
a1.sinks.k1.hdfs.path= /user/test/flume/%y/%m/%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix= events-
a1.sinks.k1.hdfs.round= true
a1.sinks.k1.hdfs.roundValue= 20
a1.sinks.k1.hdfs.roundUnit= second
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=10
a1.sinks.k1.hdfs.rollCount=3
a1.channels.c1.type=memory
a1.sources.r1.channels= c1
a1.sinks.k1.channel= c1
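Started the same way, again with an illustrative file name:
$>flume-ng agent -c conf -f conf/spooldir_hdfs.conf -n a1 -Dflume.root.logger=INFO,console
The spooling-directory source requires that files dropped into /home/server/spool be complete and never modified afterwards; Flume renames each processed file with a .COMPLETED suffix.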