Flume: Reading Data from a Port into Kafka

Reposted

mob6454cc749e02 2024-09-12 08:46:45


Flume's architecture consists of these components:

source
interceptor
selector
channel
sink
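The following minimal Python model only illustrates the order in which an event passes through these components. It is not Flume's (Java) API. The names Event, replicating_select and run_pipeline are hypothetical. The selector mimics Flume's default replicating selector, which delivers every event to every channel the source is connected to.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterable, List, Optional


@dataclass
class Event:
    """A Flume-style event: a header map plus a byte body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# An interceptor returns the (possibly modified) event, or None to drop it.
Interceptor = Callable[[Event], Optional[Event]]


def replicating_select(event: Event, channels: List[Deque[Event]]) -> List[Deque[Event]]:
    """Replicating selector (hypothetical model): every channel receives the event."""
    return channels


def run_pipeline(lines: Iterable[str], interceptors: List[Interceptor],
                 channels: List[Deque[Event]], sink: Callable[[Event], None]) -> None:
    for line in lines:
        # Source: each raw input line becomes one event.
        event: Optional[Event] = Event(body=line.encode("utf-8"))
        # Interceptors run in order; any of them may drop the event.
        for interceptor in interceptors:
            event = interceptor(event)
            if event is None:
                break
        if event is None:
            continue
        # Selector: decide which channels receive the event.
        for channel in replicating_select(event, channels):
            channel.append(event)
    # Sink: drain one channel (a sink reads from exactly one channel).
    while channels and channels[0]:
        sink(channels[0].popleft())
```

For example, an interceptor that drops empty lines, two channels and a list as the "sink" show that dropped events never reach any channel, while kept events land in both.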


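Agent configuration

Basic concepts and usage of an agent. Configuring an agent takes three steps:

Define the names of the sources, channels, and sinks
Configure the sources, channels, and sinks
Connect the sources, channels, and sinks

Example: collect log data sent to port 44444 on a given host.

Edit the configuration file: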

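# Name the components of the agent
# a1 is the agent name
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe and configure the source r1
# r1 is of type netcat: it listens on a given port and receives the data sent to it.
a1.sources.r1.type = netcat
# Address r1 binds to (hostname or IP of this machine)
a1.sources.r1.bind = node01
# Port r1 listens on
a1.sources.r1.port = 44444
# Describe and configure the channel; this example buffers events in memory
# c1 is a memory channel
a1.channels.c1.type = memory
# c1 holds at most 1000 events
a1.channels.c1.capacity = 1000
# At most 100 events are handled per transaction, i.e. per batch taken from the source or delivered to the sink
a1.channels.c1.transactionCapacity = 100
# Describe and configure the sink k1
# k1 is of type logger: it logs each event and is typically used for testing or debugging.
a1.sinks.k1.type = logger

# Wire the source, channel, and sink together
# A source can feed several channels, hence "channels"
a1.sources.r1.channels = c1
# A sink reads from exactly one channel, hence "channel"
a1.sinks.k1.channel = c1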
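Every component must be declared before it is wired, so a mistyped name leaves part of the pipeline unconnected. As a rough illustration of the connection rules in the comments above (a source may list several channels, a sink exactly one), here is a small Python checker. The functions parse_properties and check_wiring are hypothetical helpers. The parser only handles simple key = value lines, not the full Java properties syntax that Flume actually reads.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(text: str, agent: str) -> List[str]:
    """Return wiring problems for `agent`; an empty list means the wiring looks fine."""
    props = parse_properties(text)

    def names(key: str) -> List[str]:
        # Values such as "c1 c2" are space-separated lists of component names.
        return props.get(key, "").split()

    channels = names(f"{agent}.channels")
    problems: List[str] = []
    # Each source must feed one or more declared channels.
    for src in names(f"{agent}.sources"):
        targets = names(f"{agent}.sources.{src}.channels")
        if not targets:
            problems.append(f"source {src} is not connected to any channel")
        problems += [f"source {src} uses undeclared channel {ch}"
                     for ch in targets if ch not in channels]
    # Each sink must read from exactly one declared channel.
    for snk in names(f"{agent}.sinks"):
        target = names(f"{agent}.sinks.{snk}.channel")
        if len(target) != 1:
            problems.append(f"sink {snk} must name exactly one channel")
        elif target[0] not in channels:
            problems.append(f"sink {snk} uses undeclared channel {target[0]}")
    return problems
```

Running check_wiring on the configuration above with agent "a1" should report no problems.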

Start the Flume agent:

flume-ng agent -n a1 -c /usr/local/software/flume/conf -f /opt/flumeconf/netcat-memory-logger.conf -Dflume.root.logger=INFO,console

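Parameters in the command above:

-n a1: name of the agent to start; it must match the prefix used in the configuration file (a1)
-c /usr/local/software/flume/conf: directory containing Flume's own configuration files (such as flume-env.sh and log4j.properties)
-f /opt/flumeconf/netcat-memory-logger.conf: configuration file that describes the collection pipeline
-Dflume.root.logger=INFO,console: log at INFO level and print the log to the console, which is where the logger sink's output appears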
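Once the agent is running, any program that opens a TCP connection and writes newline-terminated text can feed the netcat source. A common choice is `nc node01 44444`. As a hedged sketch, the Python client below does the same with the standard socket module. The function name send_lines is hypothetical, and the default host and port simply mirror the example configuration. Each line sent should become one event.

```python
import socket
from typing import Iterable


def send_lines(lines: Iterable[str], host: str = "node01", port: int = 44444,
               timeout: float = 5.0) -> int:
    """Send each line, newline-terminated, to a netcat source; return how many were sent."""
    count = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for line in lines:
            # Strip any existing newline so every line ends with exactly one "\n".
            sock.sendall(line.rstrip("\n").encode("utf-8") + b"\n")
            count += 1
    return count
```

Calling send_lines(["hello flume"]) while the agent runs should make the logger sink print that event on the console.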

Common source types and their uses

netcat source: listens on a specified port.

Parameter	Default	Description
type	-	netcat
bind	-	IP address or hostname
port	-	port to listen on
channels	-	channel(s) to connect to

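avro source: listens on a specified port and receives events sent with the Avro protocol, typically from another agent's avro sink.

Parameter	Default	Description
type	-	avro
bind	-	IP address or hostname
port	-	port to listen on
channels	-	channel(s) to connect to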

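spooldir source: monitors a directory for new files. Each file is read only once. After it has been read, it is renamed with fileSuffix and is not read again, even if its content changes.

Parameter	Default	Description
type	-	spooldir
spoolDir	-	directory to monitor
fileHeader	false	whether to add the file's absolute path to the event header
batchSize	100	number of events written to the channel per batch
fileSuffix	.COMPLETED	suffix appended to a file once it has been fully read
channels	-	channel(s) to connect to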
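The read-once behaviour can be sketched in Python as follows. This is a simplified model, not Flume's implementation. drain_spool_dir is a hypothetical name; it reads each file line by line and then renames it with the .COMPLETED suffix so that later passes skip it. Flume itself expects files to be immutable once they are placed in the spool directory. If a file is written to while it is being processed, the source reports an error.

```python
import os
from typing import Callable, List


def drain_spool_dir(spool_dir: str, handle_line: Callable[[str], None],
                    suffix: str = ".COMPLETED") -> List[str]:
    """Read every not-yet-completed file once, then rename it with `suffix`.

    Returns the names of the files processed in this pass.
    """
    processed: List[str] = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if name.endswith(suffix) or not os.path.isfile(path):
            continue  # already completed, or not a regular file
        with open(path, encoding="utf-8") as f:
            for line in f:
                handle_line(line.rstrip("\n"))  # one line -> one event
        # The rename marks the file as done, so later passes never re-read it.
        os.rename(path, path + suffix)
        processed.append(name)
    return processed
```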

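exec source: runs a given command and reads its output. A common choice is tail -F <file>, which picks up every line appended to the file.

Parameter	Default	Description
type	-	exec
command	-	command to run (e.g. tail -F /opt/flumedata/log.01)
channels	-	channel(s) to connect to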
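Conceptually, the exec source turns each line the command writes to standard output into one event body. The hypothetical Python function below sketches that idea with subprocess. It reads until the command exits, whereas a long-running command such as tail -F would stream indefinitely. Note that the exec source does not record how far it has read, unlike the taildir source.

```python
import subprocess
from typing import List


def run_exec_source(command: List[str]) -> List[bytes]:
    """Run `command` and return one event body per line of its stdout."""
    bodies: List[bytes] = []
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            bodies.append(line.rstrip(b"\r\n"))  # drop the line terminator
    return bodies
```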

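taildir source: monitors new or modified files in the specified directories in real time. The read position (tail offset) of each file is recorded in a JSON file.

Parameter	Default	Description
type	-	TAILDIR
filegroups	-	space-separated list of file group names
filegroups.<filegroupName>	-	absolute path of the files in the group (a regular expression may be used for the file name)
positionFile	~/.flume/taildir_position.json	path of the JSON position file
channels	-	channel(s) to connect to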
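The position file is what lets the taildir source resume after a restart without re-reading data. The Python sketch below models that idea under simplifying assumptions. The helpers load_positions, save_positions and tail_once are hypothetical, and offsets are keyed by file path only. Flume's real position file also stores each file's inode.

```python
import json
import os
from typing import Dict, List, Tuple


def load_positions(position_file: str) -> Dict[str, int]:
    """Return {file path: byte offset already read}; empty if no position file exists yet."""
    if not os.path.exists(position_file):
        return {}
    with open(position_file, encoding="utf-8") as f:
        return {entry["file"]: entry["pos"] for entry in json.load(f)}


def save_positions(position_file: str, positions: Dict[str, int]) -> None:
    """Write the offsets back as a JSON array of {"file", "pos"} objects."""
    entries = [{"file": path, "pos": pos} for path, pos in sorted(positions.items())]
    with open(position_file, "w", encoding="utf-8") as f:
        json.dump(entries, f)


def tail_once(paths: List[str], position_file: str) -> List[Tuple[str, str]]:
    """Return (path, line) pairs for complete lines appended since the last call."""
    positions = load_positions(position_file)
    events: List[Tuple[str, str]] = []
    for path in paths:
        with open(path, "rb") as f:
            f.seek(positions.get(path, 0))
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # EOF, or a trailing line that is still being written
                events.append((path, line.rstrip(b"\n").decode("utf-8")))
                positions[path] = f.tell()
    save_positions(position_file, positions)
    return events
```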

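http source: turns HTTP requests sent to the given host and port into events.

Parameter	Default	Description
type	-	http
bind	-	IP address or hostname
port	-	port to listen on
channels	-	channel(s) to connect to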
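With its default handler (JSONHandler), the http source accepts a JSON array in which each element is an event with "headers" and "body" fields. The Python sketch below builds such a payload and POSTs it with urllib. The function names are hypothetical, and the URL is whatever host and port you configured, e.g. http://node01:<port>.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_payload(bodies: List[str], headers: Optional[Dict[str, str]] = None) -> bytes:
    """Encode events as a JSON array of {"headers": ..., "body": ...} objects."""
    events = [{"headers": dict(headers or {}), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(url: str, bodies: List[str],
                headers: Optional[Dict[str, str]] = None) -> int:
    """POST the events to an http source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(bodies, headers),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```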

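kafka source: consumes data from Kafka.

Parameter	Default	Description
type	-	org.apache.flume.source.kafka.KafkaSource
batchSize	1000	maximum number of events written to the channel in one batch
kafka.bootstrap.servers	-	list of Kafka brokers
kafka.topics	-	comma-separated list of topics to consume
kafka.topics.regex	-	regular expression selecting the topics to subscribe to (takes precedence over kafka.topics)
kafka.consumer.group.id	flume	Kafka consumer group ID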

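Common channel types:

memory channel: stores events in memory.

Parameter	Default	Description
type	-	memory
capacity	100	maximum number of events held in the channel
transactionCapacity	100	maximum number of events handled per transaction
keep-alive	3	timeout (in seconds) for adding an event to or removing an event from the channel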
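The interplay of capacity and transactionCapacity can be modelled with a bounded queue. In this hypothetical Python class (MemoryChannelSketch, not Flume's MemoryChannel), a source commits a batch atomically or has it rejected when the channel lacks room, and a sink takes at most transactionCapacity events per transaction. It also rejects a transactionCapacity larger than capacity, a constraint the real memory channel imposes as well. Rollback and the keep-alive wait are omitted.

```python
from collections import deque
from typing import Deque, List


class ChannelFullError(Exception):
    """Raised when a batch does not fit; the source would have to retry later."""


class MemoryChannelSketch:
    def __init__(self, capacity: int = 100, transaction_capacity: int = 100) -> None:
        if transaction_capacity > capacity:
            raise ValueError("transactionCapacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._queue: Deque[str] = deque()

    def put_batch(self, events: List[str]) -> None:
        """Source side: commit the whole batch, or reject it entirely."""
        if len(events) > self.transaction_capacity:
            raise ValueError("batch larger than transactionCapacity")
        if len(self._queue) + len(events) > self.capacity:
            raise ChannelFullError("not enough free space in the channel")
        self._queue.extend(events)

    def take_batch(self) -> List[str]:
        """Sink side: take up to transactionCapacity events in one transaction."""
        n = min(self.transaction_capacity, len(self._queue))
        return [self._queue.popleft() for _ in range(n)]
```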

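file channel: stores events on disk.

Parameter	Default	Description
type	-	file
checkpointDir	~/.flume/file-channel/checkpoint	directory where checkpoint files are stored
dataDirs	~/.flume/file-channel/data	comma-separated list of directories where event data is stored on disk
capacity	1000000	maximum number of events in the channel
transactionCapacity	10000	maximum number of events handled per transaction
keep-alive	3	timeout (in seconds) for adding or removing an event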

Common sink types:

logger sink: logs events at INFO level; used for testing.

Parameter	Default	Description
type	-	logger
channel	-	channel to connect to

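file_roll sink: stores the received events in files on the local disk.

Parameter	Default	Description
type	-	file_roll
sink.directory	-	directory where the files are stored
channel	-	channel to connect to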

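avro sink: sends events to another agent, typically to that agent's avro source.

Parameter	Default	Description
type	-	avro
hostname	-	IP address or hostname of the receiving agent
port	-	port of the receiving agent
channel	-	channel to connect to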

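Common interceptors:

Timestamp interceptor: inserts the current time (in milliseconds) into the event header.

Parameter	Default	Description
type	-	timestamp
preserveExisting	false	if true, an existing timestamp header is kept instead of being replaced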

a1.sources = r1
# Configure a timestamp interceptor named ts
a1.sources.r1.interceptors = ts
a1.sources.r1.interceptors.ts.type = timestamp
a1.sources.r1.interceptors.ts.preserveExisting = false
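The effect of preserveExisting can be shown with a small Python model of the interceptor. The function timestamp_intercept is hypothetical and works on a plain header dictionary. The now_ms parameter exists only to make the output deterministic in tests. The header name "timestamp" matches what the interceptor writes by default.

```python
import time
from typing import Dict, Optional


def timestamp_intercept(headers: Dict[str, str], preserve_existing: bool = False,
                        now_ms: Optional[int] = None) -> Dict[str, str]:
    """Return a copy of `headers` with a 'timestamp' header in epoch milliseconds."""
    result = dict(headers)
    if preserve_existing and "timestamp" in result:
        return result  # keep the value that is already there
    millis = now_ms if now_ms is not None else int(time.time() * 1000)
    result["timestamp"] = str(millis)
    return result
```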

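Regex filter interceptor: drops events that do not match a regular expression, or alternatively keeps only those that do not match.

Parameter	Default	Description
type	-	REGEX_FILTER
regex	.*	regular expression matched against the event body; the default .* matches any characters except "\n"
excludeEvents	false	by default, matching events are kept; if true, matching events are dropped and non-matching events are kept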

a1.sources = r1
# Configure an interceptor that keeps only events containing 'spark' or 'hadoop'
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = REGEX_FILTER
a1.sources.r1.interceptors.i1.regex = (spark)|(hadoop)
a1.sources.r1.interceptors.i1.excludeEvents = false
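The filtering rule can be sketched in Python. regex_filter is a hypothetical function that applies the pattern anywhere in each body (a search rather than a full-string match) and inverts the decision when exclude_events is true. Python's re syntax is close to, but not identical with, the Java regular expressions Flume uses.

```python
import re
from typing import List


def regex_filter(bodies: List[str], regex: str = ".*",
                 exclude_events: bool = False) -> List[str]:
    """Keep bodies matching `regex`, or drop them when exclude_events is True."""
    pattern = re.compile(regex)
    # A body survives when "matches" differs from "exclude_events".
    return [b for b in bodies if bool(pattern.search(b)) != exclude_events]
```

With the configuration above, only the first two of these sample lines would pass.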

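Static interceptor: inserts a fixed key/value pair into the event header.

Parameter	Default	Description
type	-	static
preserveExisting	true	if true, an existing header with the same key keeps its value
key	key	name of the header key to create
value	value	value of the header to create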

a1.sources = r1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = city
a1.sources.r1.interceptors.i1.value = cs
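With the configuration above, every event leaving r1 carries the header city=cs, unless an earlier component already set city and preserveExisting keeps it. A minimal Python model of that rule follows. static_intercept is a hypothetical name, and its defaults mirror the table.

```python
from typing import Dict


def static_intercept(headers: Dict[str, str], key: str = "key", value: str = "value",
                     preserve_existing: bool = True) -> Dict[str, str]:
    """Return a copy of `headers` with the fixed key/value pair applied."""
    result = dict(headers)
    if preserve_existing and key in result:
        return result  # an existing header with this key wins
    result[key] = value
    return result
```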

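Host interceptor: inserts the IP address or hostname of the machine running the agent into the event header.

Parameter	Default	Description
type	-	host
preserveExisting	false	if true, an existing host header keeps its value
useIP	true	if true, insert the IP address; if false, insert the hostname
hostHeader	host	name of the header key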

a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
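Roughly, the host interceptor looks up the local machine once and stamps that value on every event. The Python sketch below approximates this with the socket module. host_intercept is a hypothetical function, and resolving the hostname with gethostbyname may yield a different address than Flume's Java lookup on multi-homed machines.

```python
import socket
from typing import Dict


def host_intercept(headers: Dict[str, str], use_ip: bool = True,
                   host_header: str = "host",
                   preserve_existing: bool = False) -> Dict[str, str]:
    """Return a copy of `headers` with this machine's IP address or hostname added."""
    result = dict(headers)
    if preserve_existing and host_header in result:
        return result
    hostname = socket.gethostname()
    result[host_header] = socket.gethostbyname(hostname) if use_ip else hostname
    return result
```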

I hope this article is helpful. Thanks to my Flume teacher, Mr. Xiang.

