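I. Introduction to Kafka
Kafka is often described as a next-generation distributed publish-subscribe messaging system. It is an open-source project of the Apache Software Foundation (ASF), a non-profit organization whose other open-source projects include HTTP Server, Hadoop, ActiveMQ, and Tomcat. Comparable messaging systems include RabbitMQ, ActiveMQ, and ZeroMQ. Kafka's main advantages are that it is distributed by design and that, combined with ZooKeeper, it can be scaled out dynamically.
Compared with traditional messaging systems, Apache Kafka differs in the following ways:
1) It is designed as a distributed system and is easy to scale out;
2) It provides high throughput for both publishing and subscribing;
3) It supports multiple subscribers and automatically rebalances consumers on failure;
4) It persists messages to disk, so it can be used for batch consumption such as ETL as well as for real-time applications.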
Installation environment:
The three servers have the following IP addresses:
IP:192.168.56.11
IP:192.168.56.12
IP:192.168.56.13
Configure the hosts file on each of the three servers:
[root@localhost ~]# cat /etc/hosts
- 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
- ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
- 192.168.56.11 linux-host1.exmaple.com
- 192.168.56.12 linux-host2.exmaple.com
- 192.168.56.13 linux-host3.exmaple.com
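The three cluster entries above could be added to each server idempotently with a small helper like the one below. This is only a sketch: the function name add_host_entry is made up for illustration, and the demo writes to a temporary file rather than to /etc/hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): append "IP NAME" to a
# hosts-style file only if NAME is not already present in it.
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # -q: quiet, -w: match whole words only, -F: treat NAME as a fixed string
  if ! grep -qwF -- "$name" "$file"; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Demo against a scratch file; point it at /etc/hosts only after reviewing it.
demo_hosts=$(mktemp)
add_host_entry "$demo_hosts" 192.168.56.11 linux-host1.exmaple.com
add_host_entry "$demo_hosts" 192.168.56.12 linux-host2.exmaple.com
add_host_entry "$demo_hosts" 192.168.56.13 linux-host3.exmaple.com
# Repeating an entry is a no-op because the name is already in the file.
add_host_entry "$demo_hosts" 192.168.56.11 linux-host1.exmaple.com
cat "$demo_hosts"
rm -f "$demo_hosts"
```

Running the same calls a second time leaves the file unchanged, which makes the step safe to repeat on all three servers.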
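1.1: Download, install, and verify ZooKeeper:
1.1.1: Kafka download page:
http://kafka.apache.org/downloads.html
1.1.2: ZooKeeper download page:
http://zookeeper.apache.org/releases.html
1.1.3: Install ZooKeeper:
ZooKeeper cluster behavior: the cluster stays available to clients as long as more than half of its ZooKeeper servers are working normally. With 2 servers, the failure of either one makes the cluster unavailable, because the 1 remaining server is not more than half of 2. With 3 servers, losing 1 leaves 2, which is more than half of 3, so the cluster keeps running; losing a second leaves only 1 and the cluster becomes unavailable. With 4 servers, losing 1 is fine, but losing 2 leaves 2, which is not more than half of 4. So a 3-server cluster and a 4-server cluster both become unavailable once 2 servers are lost, and the same holds for 5 and 6 servers, 7 and 8, and so on. This is why clusters are usually built with an odd number of servers.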
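The majority rule described above reduces to simple integer arithmetic. The sketch below prints the quorum size and the number of tolerated failures for clusters of 1 to 8 servers; the function names quorum_size and tolerated_failures are illustrative and not part of ZooKeeper.

```shell
#!/usr/bin/env bash
# Smallest number of live servers that is strictly more than half of N.
quorum_size() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Number of servers that can fail while a strict majority is still alive.
tolerated_failures() {
  local n=$1
  echo $(( n - (n / 2 + 1) ))
}

for n in 1 2 3 4 5 6 7 8; do
  printf 'servers=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

The output shows that 3 and 4 servers both tolerate one failure, and 5 and 6 both tolerate two, so the extra even-numbered server adds no fault tolerance.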
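After downloading, upload the installation files to /usr/local/src on each server, then perform the following steps on each one.
1.1.3.1: Server1 configuration:
1) Install JDK 1.8 (the JDK is required on all 3 machines)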
[root@linux-host1 ~]# cd /usr/local/src/
[root@linux-host1 src]# wget -c https://mirrors.yangxingzhen.com/jdk/jdk-8u144-linux-x64.tar.gz
[root@linux-host1 src]# tar zxf jdk-8u144-linux-x64.tar.gz -C /usr/local
2) Configure the environment variables by adding the following lines
[root@linux-host1 src]# vim /etc/profile
- export JAVA_HOME=/usr/local/jdk1.8.0_144
- export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
3) Run source /etc/profile to apply the changes
[root@linux-host1 src]# source /etc/profile
[root@linux-host1 src]# java -version
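4) Install ZooKeeper
1. Download the ZooKeeper package
[root@linux-host1 ~]# wget -c https://mirrors.yangxingzhen.com/zookeeper/zookeeper-3.4.10.tar.gz
2. Extract, install, and configure ZooKeeper
[root@linux-host1 ~]# tar zxf zookeeper-3.4.10.tar.gz
[root@linux-host1 ~]# mv zookeeper-3.4.10 /usr/local/zookeeper
[root@linux-host1 ~]# cd /usr/local/zookeeper/
3. Create the directory for snapshots:
[root@linux-host1 zookeeper]# mkdir -p data
4. Create the directory for transaction logs:
[root@linux-host1 zookeeper]# mkdir -p logs
[Note]: If dataLogDir is not configured, the transaction logs are also written to the data directory. This can seriously hurt ZooKeeper's performance, because under high throughput ZooKeeper produces a large volume of transaction logs and snapshots.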
[root@linux-host1 zookeeper]# cd conf/
[root@linux-host1 conf]# cp zoo_sample.cfg zoo.cfg
[root@linux-host1 conf]# vim zoo.cfg
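#Configuration contents
- #Length of a single tick in milliseconds, the basic time unit for heartbeats between servers and between clients and servers
- tickTime=2000
- #Maximum number of ticks a follower may take to connect to and sync with the leader at startup
- initLimit=10
- #Maximum number of ticks allowed between a request and its response between the leader and a follower
- syncLimit=5
- #Port on which ZooKeeper listens for client connections
- clientPort=2181
- #Directory for data files (snapshots)
- dataDir=/usr/local/zookeeper/data
- #Directory for transaction logs
- dataLogDir=/usr/local/zookeeper/logs
- #ZooKeeper cluster: 2888 is the port followers use to connect to the leader for data sync, 3888 is the leader election port
- #server.<id>=<server IP>:<leader/follower data sync port>:<leader election port>
- server.1=192.168.56.11:2888:3888
- server.2=192.168.56.12:2888:3888
- server.3=192.168.56.13:2888:3888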
[root@linux-host1 conf]# echo "1" > /usr/local/zookeeper/data/myid
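The number written to myid has to match the N of the server.N line that carries this machine's address. The sketch below looks that number up from zoo.cfg instead of typing it by hand; myid_for_ip is a hypothetical helper name, and the demo runs against a temporary copy of the three server lines rather than the real file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the N from the "server.N=HOST:port:port" line
# in a zoo.cfg-style file whose HOST equals the given address.
myid_for_ip() {
  local cfg=$1 ip=$2
  awk -v ip="$ip" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")            # kv[1] = "server.N", kv[2] = "HOST:port:port"
      split(kv[2], addr, ":")       # addr[1] = HOST
      sub(/^server\./, "", kv[1])   # kv[1] = "N"
      if (addr[1] == ip) { print kv[1]; exit }
    }' "$cfg"
}

# Demo with the server lines from this article, written to a scratch file.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
server.1=192.168.56.11:2888:3888
server.2=192.168.56.12:2888:3888
server.3=192.168.56.13:2888:3888
EOF
for ip in 192.168.56.11 192.168.56.12 192.168.56.13; do
  echo "$ip -> myid $(myid_for_ip "$demo_cfg" "$ip")"
done
rm -f "$demo_cfg"
```

On a real node the result could be redirected into /usr/local/zookeeper/data/myid, assuming the address passed in is the one used in zoo.cfg.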
1.1.3.2: Server2 configuration:
1) Install JDK 1.8 (the JDK is required on all 3 machines)
[root@linux-host2 ~]# cd /usr/local/src/
[root@linux-host2 src]# wget -c https://mirrors.yangxingzhen.com/jdk/jdk-8u144-linux-x64.tar.gz
[root@linux-host2 src]# tar zxf jdk-8u144-linux-x64.tar.gz -C /usr/local
2) Configure the environment variables by adding the following lines
[root@linux-host2 src]# vim /etc/profile
- export JAVA_HOME=/usr/local/jdk1.8.0_144
- export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
3) Run source /etc/profile to apply the changes
[root@linux-host2 src]# source /etc/profile
[root@linux-host2 src]# java -version
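4) Install ZooKeeper
1. Download the ZooKeeper package
[root@linux-host2 ~]# wget -c https://mirrors.yangxingzhen.com/zookeeper/zookeeper-3.4.10.tar.gz
2. Extract, install, and configure ZooKeeper
[root@linux-host2 ~]# tar zxf zookeeper-3.4.10.tar.gz
[root@linux-host2 ~]# mv zookeeper-3.4.10 /usr/local/zookeeper
[root@linux-host2 ~]# cd /usr/local/zookeeper/
3. Create the directory for snapshots:
[root@linux-host2 zookeeper]# mkdir -p data
4. Create the directory for transaction logs:
[root@linux-host2 zookeeper]# mkdir -p logs
[Note]: If dataLogDir is not configured, the transaction logs are also written to the data directory. This can seriously hurt ZooKeeper's performance, because under high throughput ZooKeeper produces a large volume of transaction logs and snapshots.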
[root@linux-host2 zookeeper]# cd conf/
[root@linux-host2 conf]# cp zoo_sample.cfg zoo.cfg
[root@linux-host2 conf]# vim zoo.cfg
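#Configuration contents
- #Length of a single tick in milliseconds, the basic time unit for heartbeats between servers and between clients and servers
- tickTime=2000
- #Maximum number of ticks a follower may take to connect to and sync with the leader at startup
- initLimit=10
- #Maximum number of ticks allowed between a request and its response between the leader and a follower
- syncLimit=5
- #Port on which ZooKeeper listens for client connections
- clientPort=2181
- #Directory for data files (snapshots)
- dataDir=/usr/local/zookeeper/data
- #Directory for transaction logs
- dataLogDir=/usr/local/zookeeper/logs
- #ZooKeeper cluster: 2888 is the port followers use to connect to the leader for data sync, 3888 is the leader election port
- #server.<id>=<server IP>:<leader/follower data sync port>:<leader election port>
- server.1=192.168.56.11:2888:3888
- server.2=192.168.56.12:2888:3888
- server.3=192.168.56.13:2888:3888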
[root@linux-host2 conf]# echo "2" > /usr/local/zookeeper/data/myid
1.1.3.3: Server3 configuration:
1) Install JDK 1.8 (the JDK is required on all 3 machines)
[root@linux-host3 ~]# cd /usr/local/src/
[root@linux-host3 src]# wget -c https://mirrors.yangxingzhen.com/jdk/jdk-8u144-linux-x64.tar.gz
[root@linux-host3 src]# tar zxf jdk-8u144-linux-x64.tar.gz -C /usr/local
2) Configure the environment variables by adding the following lines
[root@linux-host3 src]# vim /etc/profile
- export JAVA_HOME=/usr/local/jdk1.8.0_144
- export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
3) Run source /etc/profile to apply the changes
[root@linux-host3 src]# source /etc/profile
[root@linux-host3 src]# java -version
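4) Install ZooKeeper
1. Download the ZooKeeper package
[root@linux-host3 ~]# wget -c https://mirrors.yangxingzhen.com/zookeeper/zookeeper-3.4.10.tar.gz
2. Extract, install, and configure ZooKeeper
[root@linux-host3 ~]# tar zxf zookeeper-3.4.10.tar.gz
[root@linux-host3 ~]# mv zookeeper-3.4.10 /usr/local/zookeeper
[root@linux-host3 ~]# cd /usr/local/zookeeper/
3. Create the directory for snapshots:
[root@linux-host3 zookeeper]# mkdir -p data
4. Create the directory for transaction logs:
[root@linux-host3 zookeeper]# mkdir -p logs
[Note]: If dataLogDir is not configured, the transaction logs are also written to the data directory. This can seriously hurt ZooKeeper's performance, because under high throughput ZooKeeper produces a large volume of transaction logs and snapshots.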
[root@linux-host3 zookeeper]# cd conf/
[root@linux-host3 conf]# cp zoo_sample.cfg zoo.cfg
[root@linux-host3 conf]# vim zoo.cfg
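#Configuration contents
- #Length of a single tick in milliseconds, the basic time unit for heartbeats between servers and between clients and servers
- tickTime=2000
- #Maximum number of ticks a follower may take to connect to and sync with the leader at startup
- initLimit=10
- #Maximum number of ticks allowed between a request and its response between the leader and a follower
- syncLimit=5
- #Port on which ZooKeeper listens for client connections
- clientPort=2181
- #Directory for data files (snapshots)
- dataDir=/usr/local/zookeeper/data
- #Directory for transaction logs
- dataLogDir=/usr/local/zookeeper/logs
- #ZooKeeper cluster: 2888 is the port followers use to connect to the leader for data sync, 3888 is the leader election port
- #server.<id>=<server IP>:<leader/follower data sync port>:<leader election port>
- server.1=192.168.56.11:2888:3888
- server.2=192.168.56.12:2888:3888
- server.3=192.168.56.13:2888:3888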
[root@linux-host3 conf]# echo "3" > /usr/local/zookeeper/data/myid
1.1.3.4: Start ZooKeeper on each server:
[root@linux-host1 ~]# /usr/local/zookeeper/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@linux-host2 src]# /usr/local/zookeeper/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@linux-host3 src]# /usr/local/zookeeper/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
1.1.3.5: Check the status of each ZooKeeper node:
[root@linux-host1 ~]# /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[root@linux-host2 ~]# /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[root@linux-host3 ~]# /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
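A healthy cluster should report exactly one leader, as in the output above. The sketch below extracts the Mode value from zkServer.sh status output and checks that condition; zk_mode and exactly_one_leader are hypothetical helper names, and the demo feeds in the Mode lines shown above instead of contacting the servers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ZooKeeper).

# Read `zkServer.sh status` output on stdin and print the value after "Mode: ".
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2; exit }'
}

# Given one mode per argument, succeed only if exactly one of them is "leader".
exactly_one_leader() {
  local leaders=0 mode
  for mode in "$@"; do
    if [ "$mode" = "leader" ]; then
      leaders=$((leaders + 1))
    fi
  done
  [ "$leaders" -eq 1 ]
}

# Demo using the status output recorded in this article.
m1=$(printf 'ZooKeeper JMX enabled by default\nMode: follower\n' | zk_mode)
m2=$(printf 'ZooKeeper JMX enabled by default\nMode: leader\n' | zk_mode)
m3=$(printf 'ZooKeeper JMX enabled by default\nMode: follower\n' | zk_mode)
echo "linux-host1=$m1 linux-host2=$m2 linux-host3=$m3"
if exactly_one_leader "$m1" "$m2" "$m3"; then
  echo "cluster has exactly one leader"
else
  echo "unexpected number of leaders"
fi
```

On a live node, the real output could be piped in with /usr/local/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode.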
1.1.3.6: Basic ZooKeeper commands:
#Connect to any node and create data:
[root@linux-host3 src]# /usr/local/zookeeper/bin/zkCli.sh -server 192.168.56.11:2181
[zk: 192.168.56.11:2181(CONNECTED) 3] create /test "hello"
#Verify the data on another ZooKeeper node:
[root@linux-host2 src]# /usr/local/zookeeper/bin/zkCli.sh -server 192.168.56.12:2181
[zk: 192.168.56.12:2181(CONNECTED) 0] get /test
hello
cZxid = 0x100000004
ctime = Fri Dec 15 11:14:07 CST 2017
mZxid = 0x100000004
mtime = Fri Dec 15 11:14:07 CST 2017
pZxid = 0x100000004
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0
1.2: Install and test Kafka:
1.2.1: Install Kafka on Server1:
[root@linux-host1 src]# wget -c https://archive.apache.org/dist/kafka/2.0.1/kafka_2.12-2.0.1.tgz
[root@linux-host1 src]# tar xf kafka_2.12-2.0.1.tgz
[root@linux-host1 src]# mv kafka_2.12-2.0.1 /usr/local/kafka
[root@linux-host1 src]# vim /usr/local/kafka/config/server.properties
- broker.id=1
- listeners=PLAINTEXT://192.168.56.11:9092
- #Keep log data for the specified number of hours
- log.retention.hours=24
- #Addresses of all ZooKeeper nodes
- zookeeper.connect=192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181
1.2.2: Install Kafka on Server2:
[root@linux-host2 src]# wget -c https://archive.apache.org/dist/kafka/2.0.1/kafka_2.12-2.0.1.tgz
[root@linux-host2 src]# tar xf kafka_2.12-2.0.1.tgz
[root@linux-host2 src]# mv kafka_2.12-2.0.1 /usr/local/kafka
[root@linux-host2 src]# vim /usr/local/kafka/config/server.properties
- broker.id=2
- listeners=PLAINTEXT://192.168.56.12:9092
- zookeeper.connect=192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181
1.2.3: Install Kafka on Server3:
[root@linux-host3 src]# wget -c https://archive.apache.org/dist/kafka/2.0.1/kafka_2.12-2.0.1.tgz
[root@linux-host3 src]# tar xf kafka_2.12-2.0.1.tgz
[root@linux-host3 src]# mv kafka_2.12-2.0.1 /usr/local/kafka
[root@linux-host3 src]# vim /usr/local/kafka/config/server.properties
- broker.id=3
- listeners=PLAINTEXT://192.168.56.13:9092
- zookeeper.connect=192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181
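Across the three brokers only broker.id and listeners differ, while zookeeper.connect is identical. The sketch below generates those per-broker lines from a list of addresses; zk_connect_string and broker_overrides are hypothetical helper names, and the ports 2181 and 9092 are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for producing the per-broker settings shown above.

# Join the given addresses into "ip1:port,ip2:port,..." for zookeeper.connect.
zk_connect_string() {
  local port=$1; shift
  local out="" ip
  for ip in "$@"; do
    out+="${out:+,}${ip}:${port}"
  done
  echo "$out"
}

# Print the three settings that this article edits in server.properties.
broker_overrides() {
  local id=$1 ip=$2 zk=$3
  printf 'broker.id=%s\n' "$id"
  printf 'listeners=PLAINTEXT://%s:9092\n' "$ip"
  printf 'zookeeper.connect=%s\n' "$zk"
}

zk=$(zk_connect_string 2181 192.168.56.11 192.168.56.12 192.168.56.13)
id=1
for ip in 192.168.56.11 192.168.56.12 192.168.56.13; do
  echo "# settings for broker $id"
  broker_overrides "$id" "$ip" "$zk"
  id=$((id + 1))
done
```

The sketch only prints the lines; merging them into /usr/local/kafka/config/server.properties is left as a manual step, as in the article.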
1.2.4: Start Kafka on each server:
1.2.4.1: Start Kafka on Server1:
[root@linux-host1 src]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties #start as a daemon
1.2.4.2: Start Kafka on Server2:
[root@linux-host2 src]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
1.2.4.3: Start Kafka on Server3:
[root@linux-host3 src]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
#/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties & #started this way, Kafka is stopped when the shell session is closed
1.2.5: Test Kafka:
1.2.5.1: Verify the processes:
[root@linux-host1 ~]# jps
10578 QuorumPeerMain
11572 Jps
11369 Kafka
[root@linux-host2 ~]# jps
2752 QuorumPeerMain
8229 Kafka
8383 Jps
[root@linux-host3 ~]# jps
12626 Kafka
2661 QuorumPeerMain
12750 Jps
1.2.5.2: Test creating a topic:
Create a topic named logstashtest with 3 partitions and a replication factor of 3:
Run this on any Kafka server:
[root@linux-host2 ~]# /usr/local/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181 --partitions 3 --replication-factor 3 --topic logstashtest
Created topic "logstashtest".
1.2.5.3: Test describing a topic:
This can be run on any Kafka server:
[root@linux-host3 ~]# /usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper 192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181 --topic logstashtest
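Status notes: logstashtest has three partitions, numbered 0, 1, and 2. The leader of partition 0 is broker 3 (its broker.id). Partition 0 has three replicas, all of which are in the ISR (in-sync replicas, meaning they are eligible to be elected leader).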
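The replica and ISR lists in the describe output can also be compared mechanically. The sketch below prints the partitions whose ISR is shorter than their replica list; under_replicated is a hypothetical helper name, and the sample input is an assumed illustration of the per-partition line layout (Partition, Leader, Replicas, Isr), not output captured from this cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print the number of every partition that has fewer ISR members than replicas.
under_replicated() {
  awk '
    /Partition:/ {
      part = ""; replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr = $(i + 1)
      }
      nr = split(replicas, r, ",")   # number of assigned replicas
      ni = split(isr, s, ",")        # number of in-sync replicas
      if (ni < nr) print part
    }'
}

# Assumed sample input: partition 2 has lost broker 3 from its ISR.
under_replicated <<'EOF'
Topic:logstashtest	PartitionCount:3	ReplicationFactor:3	Configs:
	Topic: logstashtest	Partition: 0	Leader: 3	Replicas: 3,1,2	Isr: 3,1,2
	Topic: logstashtest	Partition: 1	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3
	Topic: logstashtest	Partition: 2	Leader: 2	Replicas: 2,3,1	Isr: 2,1
EOF
```

No output means every replica of every partition is in sync, which is the state the status notes describe.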
1.2.5.4: Delete a topic:
[root@linux-host3 ~]# /usr/local/kafka/bin/kafka-topics.sh --delete --zookeeper 192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181 --topic logstashtest
Topic logstashtest is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
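As the note in the output says, the deletion only takes effect when delete.topic.enable is true on the brokers. The sketch below reads a key from a properties file so the setting can be checked before deleting; prop_value is a hypothetical helper name, and the demo uses a temporary file rather than the real server.properties.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a Java-style properties
# file, ignoring commented-out lines. Returns non-zero if KEY is not set.
prop_value() {
  local file=$1 key=$2
  awk -v key="$key" '
    index($0, key "=") == 1 { value = substr($0, length(key) + 2); found = 1 }
    END { if (!found) exit 1; print value }' "$file"
}

# Demo against a scratch file in which the setting is only present as a comment.
demo_props=$(mktemp)
printf '#delete.topic.enable=true\nbroker.id=1\n' > "$demo_props"
echo "broker.id = $(prop_value "$demo_props" broker.id)"
prop_value "$demo_props" delete.topic.enable \
  || echo "delete.topic.enable is not set explicitly, so the broker default applies"
rm -f "$demo_props"
```

On a broker the same call could be pointed at /usr/local/kafka/config/server.properties. When the key is absent, the effective value depends on the default of the installed Kafka version.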
1.2.5.5: List all topics:
[root@linux-host1 ~]# /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181
__consumer_offsets
nginx-accesslog-5612
system-log-5612
1.2.6: Test sending messages with the Kafka command-line tools:
1.2.6.1: Create a topic:
[root@linux-host3 ~]# /usr/local/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.56.11:2181,192.168.56.12:2181,192.168.56.13:2181 --partitions 3 --replication-factor 3 --topic messagetest
Created topic "messagetest".
1.2.6.2: Send messages:
[root@linux-host2 ~]# /usr/local/kafka/bin/kafka-console-producer.sh --broker-list 192.168.56.11:9092,192.168.56.12:9092,192.168.56.13:9092 --topic messagetest
>hello
>kafka
>logstash
>ss
>oo
1.2.6.3: Test receiving the data on the other Kafka servers:
#Server1:
[root@linux-host1 ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.56.11:9092,192.168.56.12:9092,192.168.56.13:9092 --topic messagetest --from-beginning
#Server2:
#Server3:
1.2.7: Test writing data to Kafka with Logstash:
1.2.7.1: Edit the Logstash configuration file:
[root@linux-host3 ~]# vim /etc/logstash/conf.d/logstash-to-kafka.conf
- input {
- stdin {}
- }
- output {
- kafka {
- topic_id => "hello"
- bootstrap_servers => "192.168.56.11:9092"
- batch_size => 5
- }
- stdout {
- codec => rubydebug
- }
- }
1.2.7.2: Verify that Kafka received the data from Logstash:
[root@linux-host1 ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.56.11:9092,192.168.56.12:9092,192.168.56.13:9092 --topic hello --from-beginning
2017-12-15T14:33:00.684Z linux-host3.exmaple.com hello
2017-12-15T14:33:31.127Z linux-host3.exmaple.com test