Unpack the archive

tar -zxvf apache-flume-1.7.0-bin.tar.gz
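
Optionally move the unpacked directory to its final location; /home/hadoop is an assumption here, chosen to match the paths used in the later cases:

mv apache-flume-1.7.0-bin /home/hadoop/
cd /home/hadoop/apache-flume-1.7.0-bin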


Edit the flume-env.sh configuration file; the main change is setting the JAVA_HOME variable.
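
The distribution ships only a template for this file, so copy it first (assuming the standard Flume 1.7.0 conf layout):

cp conf/flume-env.sh.template conf/flume-env.sh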


# Environment variables can be set here.
export JAVA_HOME=/usr/java/jdk1.8.0_91


Verify the installation


$ ./bin/flume-ng version
Flume 1.7.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523

If you see the output above, Flume is installed correctly.


Case 1: Getting started (single-node configuration)

Create the agent configuration file

# File: case1_example.conf
# Contents:
# case1_example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


# Start command

./bin/flume-ng agent -c conf -f conf/case1_example.conf -n a1 -Dflume.root.logger=INFO,console


# Startup option notes

-c conf                             set the configuration directory to conf
-f conf/case1_example.conf          use conf/case1_example.conf as the agent configuration file
-n a1                               run the agent named a1; this must match the agent name in case1_example.conf
-Dflume.root.logger=INFO,console    set the root logger to INFO level and write its output to the console
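
The same command with the long-form option names that flume-ng also accepts (shown as an equivalent; verify against flume-ng help on your install):

./bin/flume-ng agent --conf conf --conf-file conf/case1_example.conf --name a1 -Dflume.root.logger=INFO,console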


# Test from another terminal (install telnet if needed: yum install -y telnet)

# telnet 127.0.0.1 44444
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
hello world!
OK


# Check the console output in the agent's terminal

2017-02-09 11:34:36,369 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
2017-02-09 11:40:50,462 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)]
Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          hello world!. }


Case 2: Avro

avro-client can send a given file to Flume; the Avro source receives it over the Avro RPC mechanism.

Create the agent configuration file

# File: case2_avro.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# Start Flume agent a1

./bin/flume-ng agent -c . -f conf/case2_avro.conf -n a1 -Dflume.root.logger=INFO,console


# Create the file to send


echo "hello world" > log.10

# Send the file with avro-client

./bin/flume-ng avro-client -c . -H localhost -p 4141 -F log.10
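
If no -F file is given, avro-client reads events from standard input instead (hedged from the avro-client usage text; verify on your version):

echo "avro stdin test" | ./bin/flume-ng avro-client -c . -H localhost -p 4141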

# Check the console output in the agent's terminal

sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64                hello world }



Case 3: Exec

The exec source runs a given command and uses its output as the data source. If you use tail, make sure the file is large enough (or keeps growing), otherwise you will see no output.

Create the agent configuration file


# File: case3_exec.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/apache-flume-1.7.0-bin/log.10
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
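
The exec source can also restart the command if it exits; the optional properties below are hedged from the Flume 1.7 user guide, with illustrative values:

# Optional: restart tail if it dies, waiting 10 s between attempts
a1.sources.r1.restart = true
a1.sources.r1.restartThrottle = 10000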


# Start Flume agent a1

./bin/flume-ng agent -c . -f conf/case3_exec.conf -n a1 -Dflume.root.logger=INFO,console


# Generate plenty of content in the file


for i in {1..100};do echo "exec test$i" >> log.10;echo $i;done


# Check the console output in the agent's terminal

17/02/09 14:30:33 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 65 73 74 37 34                exec test74 }

...
...
...

17/02/09 14:30:35 INFO sink.LoggerSink: Event: { headers:{} body: 65 78 65 63 20 74 65 73 74 31 30 30 exec test100 }


Case 4: Spool

The spooling directory source watches a configured directory for newly added files and reads the data out of them. Two caveats:

    1) A file must not be opened for editing after it has been copied into the spool directory.

    2) The spool directory must not contain subdirectories.

Create the agent configuration file

# File: case4_spool.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/logs/flumeSpool
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
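
Two optional properties control what happens to finished files; hedged from the 1.7 user guide, with the assumed defaults shown:

# Optional: rename suffix (default .COMPLETED) or delete finished files outright
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.deletePolicy = never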


# Start Flume agent a1


./bin/flume-ng agent -c . -f conf/case4_spool.conf -n a1 -Dflume.root.logger=INFO,console


# Drop a file into the /home/hadoop/logs/flumeSpool directory


echo "spool test1" > /home/hadoop/logs/flumeSpool/spool_text.log


# Check the console output in the agent's terminal


17/02/09 14:55:31 INFO sink.LoggerSink: Event: { headers:{file=/home/hadoop/logs/flumeSpool/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31                spool test1 }
17/02/09 14:55:31 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
17/02/09 14:55:31 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/logs/flumeSpool/spool_text.log to /home/hadoop/logs/flumeSpool/spool_text.log.COMPLETED

After the data in spool_text.log has been read, the file is renamed spool_text.log.COMPLETED.


Case 5: Syslogtcp

The syslogtcp source listens on a TCP port and uses incoming syslog messages as its data source.

Create the agent configuration file

# File: case5_syslog.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# Start Flume agent a1

./bin/flume-ng agent -c . -f conf/case5_syslog.conf -n a1 -Dflume.root.logger=INFO,console


# Generate a test syslog message (install nc if needed: yum install -y nc)

echo "hello idoall.org syslog" | nc localhost 5140


# Check the console output in the agent's terminal

17/02/09 15:20:11 WARN source.SyslogUtils: Event created from Invalid Syslog data.
17/02/09 15:20:16 INFO sink.LoggerSink: Event: { headers:{Severity=0, Facility=0, flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
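
The WARN line appears because the message carries no syslog priority header, so Flume marks the event flume.syslog.status=Invalid but still delivers the body. Prefixing a PRI field such as <13> (facility 1, severity 5, since 1 × 8 + 5 = 13) yields a well-formed message; this variant is illustrative:

echo "<13>hello idoall.org syslog" | nc localhost 5140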


Case 6: Syslogudp

Create the agent configuration file

# File: case6_syslogudp.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogudp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# Start Flume agent a1

./bin/flume-ng agent -c . -f conf/case6_syslogudp.conf -n a1 -Dflume.root.logger=INFO,console


# Generate a test syslog message

echo "<37>hello via syslogudp" | nc -u localhost 5140


# Check the console output in the agent's terminal

2013-05-27 23:39:10,755 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{Severity=5, Facility=4} body: 68 65 6C 6C 6F 20 76 69 61 20 73 79 73 6C 6F 67 hello via syslogudp }
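
The headers follow from the <37> prefix: a syslog priority encodes facility × 8 + severity, and 37 = 4 × 8 + 5, hence Facility=4 and Severity=5 in the event.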



Case 7: HTTP source with JSONHandler

Create the agent configuration file

# File: case7_httppost.conf
# Contents:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# Start Flume agent a1

./bin/flume-ng agent -c . -f conf/case7_httppost.conf -n a1 -Dflume.root.logger=INFO,console


# Send a JSON-formatted POST request


curl -X POST -d '[{ "headers" :{"namenode" : "namenode.example.com","datanode" : "random_datanode.example.com"},"body" : "really_random_body"}]' http://localhost:5140
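
Because the JSON handler takes an array, a single POST can carry multiple events; the header names and bodies below are made-up illustrative values:

curl -X POST -d '[{"headers":{"batch":"1"},"body":"event one"},{"headers":{"batch":"1"},"body":"event two"}]' http://localhost:5140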


# Check the console output in the agent's terminal


17/02/09 17:16:51 INFO sink.LoggerSink: Event: { headers:{namenode=namenode.example.com, datanode=random_datanode.example.com} body: 72 65 61 6C 6C 79 5F 72 61 6E 64 6F 6D 5F 62 6F really_random_bo }