Table of Contents
- I. Installing a Kafka cluster
- 1. Preface
- 2. I usually install under the hadoop user
- 3. Copy Kafka to the other two machines
- 4. Modify broker.id in server.properties on all three machines
- 5. Start the ZooKeeper cluster first, then run the following on all three machines
- 6. Create a message topic
- 7. View the created topic
- 8. Built-in stress testing
- 9. Start a producer and write messages to the topic created above
- 10. In a new window, start a consumer
- 11. Check the Kafka version number
- II. Quick single-node install of kafka_2.11-2.2.1
- 1. Preface
- 2. Start Kafka
- III. Deleting a topic
- 1. Preparation
- 2. View the topic in the ZooKeeper client
- 3. Restart the Kafka cluster and perform the deletion
I. Installing a Kafka cluster
1. Preface:
We will build the Kafka cluster on three machines:
192.168.4.142 h40
192.168.4.143 h41
192.168.4.144 h42
kafka_2.10-0.8.2.0 download: http://mirror.bit.edu.cn/apache/kafka/0.8.2.0/kafka_2.10-0.8.2.0.tgz
The version I installed is fairly old; you can also download the newer 0.10.1.0: http://mirror.bit.edu.cn/apache/kafka/0.10.1.0/kafka_2.10-0.10.1.0.tgz
Note: 0.10.1.0 installs the same way as the old 0.8 release. The differences are that 0.10.1.0 prints more configuration to the console at startup, adds the files cleaner-offset-checkpoint and meta.properties under the default /tmp/kafka-logs directory, and in the ZooKeeper client adds a seqid node under /kafka/brokers and a clients node under /kafka/config.
2. I usually install under the hadoop user:
Create the hadoop group and user (on all machines):
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop
[hadoop@h40 ~]$ tar -zxvf kafka_2.10-0.8.2.0.tgz
[hadoop@h40 ~]$ cd kafka_2.10-0.8.2.0
Edit the configuration file kafka_2.10-0.8.2.0/config/server.properties and set: zookeeper.connect=h40:2181,h41:2181,h42:2181/kafka
By default Kafka uses ZooKeeper's root path /, so all of Kafka's znodes end up scattered directly under the root. If other applications share the same ZooKeeper cluster, browsing the data becomes confusing, so it is strongly recommended to specify a chroot path, directly in the zookeeper.connect setting as above.
You then need to create the /kafka path in ZooKeeper by hand (although I later found that it is also created automatically if you skip this). Connect to any ZooKeeper server:
[hadoop@h40 ~]$ cd zookeeper-3.4.5/
[hadoop@h40 zookeeper-3.4.5]$ bin/zkCli.sh
# Create the chroot path in ZooKeeper:
[zk: localhost:2181(CONNECTED) 0] create /kafka ''
From then on, every command that connects to the Kafka cluster (via the --zookeeper option) must use a connection string that includes the chroot path, as you will see below.
Note: the second argument to create is the znode's initial data, so '' simply creates /kafka with empty data; see "ZooKeeper study notes 1" and "ZooKeeper in practice 4: client scripts" for more on the ZooKeeper client commands.
3. Copy Kafka to the other two machines:
[hadoop@h40 ~]$ scp -r kafka_2.10-0.8.2.0/ h41:/home/hadoop/
[hadoop@h40 ~]$ scp -r kafka_2.10-0.8.2.0/ h42:/home/hadoop/
4. Modify broker.id in server.properties on all three machines:
Each broker's id must be unique across the cluster, so this setting has to be adjusted per machine. (On a single machine you can simulate a distributed Kafka cluster by running multiple broker processes, but the broker ids must still be unique, and you also need to change some directory settings.)
[hadoop@h40 ~]$ vi kafka_2.10-0.8.2.0/config/server.properties
broker.id=0
[hadoop@h41 ~]$ vi kafka_2.10-0.8.2.0/config/server.properties
broker.id=1
[hadoop@h42 ~]$ vi kafka_2.10-0.8.2.0/config/server.properties
broker.id=2
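The per-host edit can also be scripted with sed. The sketch below demonstrates it on a throwaway copy of the file; the ID value is a placeholder, run it with the right id on each host:

```shell
# Set a unique broker.id with sed instead of editing by hand.
# Demonstrated on a local throwaway copy; on h41 you would target
# kafka_2.10-0.8.2.0/config/server.properties with ID=1, on h42 with ID=2.
demo=/tmp/server.properties.demo
printf 'broker.id=0\nlog.dirs=/tmp/kafka-logs\n' > "$demo"

ID=1   # placeholder: the id for this host
sed -i "s/^broker\.id=.*/broker.id=${ID}/" "$demo"
grep '^broker.id=' "$demo"   # prints: broker.id=1
```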
5. Start the ZooKeeper cluster first, then run the following command on all three machines:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-server-start.sh config/server.properties
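Running kafka-server-start.sh in the foreground ties up the terminal; a common pattern is to background it with nohup and wait for the "started" line to appear in its log. The helper below is a sketch (the log-line pattern matches what 0.8-era brokers print and may need adjusting for other versions); it is demonstrated against a fake log file so the snippet is self-contained:

```shell
# Wait until a broker log reports a successful start
# (pattern assumed from 0.8-era broker logs).
wait_for_started() {
  local logfile=$1 timeout=${2:-30} i
  for i in $(seq "$timeout"); do
    grep -q 'started (kafka.server.KafkaServer)' "$logfile" 2>/dev/null && return 0
    sleep 1
  done
  return 1
}

# On a real node you would run:
#   nohup bin/kafka-server-start.sh config/server.properties > kafka.log 2>&1 &
#   wait_for_started kafka.log && echo "broker up"
# Demo against a fake log line:
echo '[2017-06-20 16:35:00] INFO [Kafka Server 0], started (kafka.server.KafkaServer)' > /tmp/kafka-demo.log
wait_for_started /tmp/kafka-demo.log 5 && echo "broker up"
```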
6. Create a message topic:
Check the logs or the process state to make sure the Kafka cluster started successfully. We create a topic named test with 5 partitions and a replication factor of 3:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --create --zookeeper h40:2181,h41:2181,h42:2181/kafka --replication-factor 3 --partitions 5 --topic test
Created topic "test".
# Specifying a single ZooKeeper host also works, for example:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --create --zookeeper h40:2181/kafka --replication-factor 2 --partitions 2 --topic hui
Created topic "hui".
7. View the created topic with the following command:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --describe --zookeeper h40:2181,h41:2181,h42:2181/kafka --topic test
Topic:test PartitionCount:5 ReplicationFactor:3 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: test Partition: 1 Leader: 0 Replicas: 1,0,2 Isr: 0,2,1
Topic: test Partition: 2 Leader: 2 Replicas: 2,1,0 Isr: 2,0,1
Topic: test Partition: 3 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
Topic: test Partition: 4 Leader: 2 Replicas: 1,2,0 Isr: 2,0,1
Note: Partition, Leader, Replicas, and Isr above mean the following:
- Partition: the partition number
- Leader: the node responsible for all reads and writes of that partition
- Replicas: the list of nodes that replicate the partition's log
- Isr: the "in-sync" replicas, the subset of Replicas that is currently alive and eligible to become leader
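One thing worth watching in this output is whether each partition's leader is its preferred (first-listed) replica; after broker restarts, leadership tends to drift. A small awk sketch over --describe-style rows (the sample data and field positions below are taken from the output above) flags such partitions:

```shell
# Flag partitions whose current leader is not the preferred (first) replica.
# Input mimics `kafka-topics.sh --describe` rows from the output above.
describe='Topic: test Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: test Partition: 1 Leader: 0 Replicas: 1,0,2 Isr: 0,2,1
Topic: test Partition: 2 Leader: 2 Replicas: 2,1,0 Isr: 2,0,1'

echo "$describe" | awk '/Partition:/ {
  split($8, r, ",")                       # r[1] is the preferred replica
  if ($6 != r[1])
    print "partition " $4 ": leader " $6 " but preferred " r[1]
}'
# prints: partition 1: leader 0 but preferred 1
```

On a live cluster you would pipe the real bin/kafka-topics.sh --describe output into the same awk; 0.8 also ships a kafka-preferred-replica-election.sh tool for moving leadership back.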
8. Built-in stress testing:
Note: I ran this on a single-node Kafka install; I will try it on the cluster when I have time.
The following uses the bundled perf tools to benchmark a topic: 500,000 messages of 1000 bytes each, batch size 1000, topic test, 4 threads. First create the topic: bin/kafka-topics.sh --create --zookeeper h153:2181 --replication-factor 1 --partitions 2 --topic test
# Producer test:
[hadoop@h153 kafka_2.10-0.8.2.0]$ bin/kafka-producer-perf-test.sh --messages 500000 --message-size 1000 --batch-size 1000 --topics test --threads 4 --broker-list h153:9092
start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
2017-09-27 01:33:54:610, 2017-09-27 01:34:10:381, 0, 1000, 1000, 476.84, 30.2351, 500000, 31703.7601
# Consumer test:
[hadoop@h153 kafka_2.10-0.8.2.0]$ bin/kafka-consumer-perf-test.sh --zookeeper h153 --messages 500000 --topic test --threads 4
start.time, end.time, fetch.size, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2017-09-27 01:36:41:066, 2017-09-27 01:36:46:068, 1048576, 0.0000, 0.0000, 0, 0.0000
Note: in newer versions both the commands and the output have changed (the examples below are from Kafka 2.11-1.1.0).
See also: "Kafka stress testing (producer and consumer benchmarks)", "Kafka stress testing (bundled scripts, single node)", and http://muxiulin.cn/archives/435
./kafka-producer-perf-test.sh --topic test_perf --num-records 100000 --record-size 1000 --throughput 2000 --producer-props bootstrap.servers=localhost:9092
# Output:
records sent, 1202.4 records/sec (1.15 MB/sec), 1678.8 ms avg latency, 2080.0 max latency.
records sent, 2771.8 records/sec (2.64 MB/sec), 1300.4 ms avg latency, 2344.0 max latency.
records sent, 2061.6 records/sec (1.97 MB/sec), 17.1 ms avg latency, 188.0 max latency.
records sent, 1976.6 records/sec (1.89 MB/sec), 10.0 ms avg latency, 177.0 max latency.
records sent, 2025.2 records/sec (1.93 MB/sec), 15.4 ms avg latency, 253.0 max latency.
records sent, 2000.8 records/sec (1.91 MB/sec), 6.1 ms avg latency, 163.0 max latency.
records sent, 1929.7 records/sec (1.84 MB/sec), 3.7 ms avg latency, 128.0 max latency.
records sent, 2072.0 records/sec (1.98 MB/sec), 14.1 ms avg latency, 163.0 max latency.
records sent, 2001.6 records/sec (1.91 MB/sec), 4.5 ms avg latency, 116.0 max latency.
records sent, 1997.602877 records/sec (1.91 MB/sec), 290.41 ms avg latency, 2344.00 ms max latency, 2 ms 50th, 1992 ms 95th, 2177 ms 99th, 2292 ms 99.9th.
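The last line is the overall summary; when scripting benchmarks it is handy to pull the average throughput out of it. A small awk sketch against that summary line (stored in a variable here so the snippet is self-contained):

```shell
# Extract the average records/sec from the producer perf-test summary line above.
summary='records sent, 1997.602877 records/sec (1.91 MB/sec), 290.41 ms avg latency, 2344.00 ms max latency, 2 ms 50th, 1992 ms 95th, 2177 ms 99th, 2292 ms 99.9th.'
echo "$summary" | awk -F', ' '{ split($2, a, " "); print a[1] " records/sec" }'
# prints: 1997.602877 records/sec
```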
./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 100000 --threads 1
# Output:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2018-12-06 05:50:41:276, 2018-12-06 05:50:45:281, 95.3674, 23.8121, 100000, 24968.7890, 78, 3927, 24.2851, 25464.7313
Note: the /kafka suffix in the command-line argument --zookeeper h40:2181,h41:2181,h42:2181/kafka must match the zookeeper.connect=h40:2181,h41:2181,h42:2181/kafka setting in server.properties. At first I had zookeeper.connect=h40:2181,h41:2181,h42:2181 in server.properties but appended /kafka on the command line, which produced this error:
Error while executing topic command org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
at kafka.utils.ZkUtils$.getChildren(ZkUtils.scala:468)
at kafka.utils.ZkUtils$.getSortedBrokerList(ZkUtils.scala:78)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:170)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:93)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:55)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/ids
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99)
at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 8 more
9. Start a producer and write messages to the topic we created above:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-console-producer.sh --broker-list h40:9092,h41:9092,h42:9092 --topic test
10. In a new window, start a consumer:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-console-consumer.sh --zookeeper h40:2181,h41:2181,h42:2181/kafka --topic test --from-beginning
# Note: --from-beginning reads all existing messages in the topic from the start; without it, only newly produced messages are shown
Note: the command above is for version 0.8; newer versions no longer support this consumer command.
The new consumer command is:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-console-consumer.sh --bootstrap-server h40:9092 --topic test --from-beginning
Note: --bootstrap-server takes a broker address, not a ZooKeeper chroot path, and the port it uses is configured by the listeners parameter in Kafka's config/server.properties. The differences between --broker-list, --bootstrap-server, and --zookeeper are worth reading up on separately.
11. Check the Kafka version number:
Kafka has no dedicated command that prints its version, so how do you find out which version is installed? Go to the Kafka installation directory and run:
find ./libs/ -name \*kafka_\* | head -1 | grep -o '\kafka[^\n]*'
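The jar name that comes back encodes both the Scala build version and the Kafka version; a small sed sketch splits them apart (the example jar name is just an illustration, substitute whatever the find above returns):

```shell
# Split a kafka jar name into its Scala build version and Kafka version.
jar=kafka_2.10-0.8.2.0.jar   # example; use the name returned by the find command above
echo "$jar" | sed -E 's/^kafka_([0-9.]+)-([0-9.]+)\.jar$/scala \1, kafka \2/'
# prints: scala 2.10, kafka 0.8.2.0
```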
II. Quick single-node install of kafka_2.11-2.2.1
1. Preface:
Kafka requires a JDK first, preferably 1.8 or later, and it needs a running ZooKeeper. If you don't have a ZooKeeper installation, you can use the one that ships packaged and preconfigured with Kafka; for a quick install that is what I use here. Download and unpack the Kafka tarball (the configuration files need essentially no changes), then start ZooKeeper from the Kafka installation directory.
Run in the foreground:
bin/zookeeper-server-start.sh config/zookeeper.properties
Run in the background:
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > zookeeper-run.log 2>&1 &
2. Start Kafka:
Run in the foreground:
bin/kafka-server-start.sh config/server.properties
Run in the background:
nohup bin/kafka-server-start.sh config/server.properties > kafka-run.log 2>&1 &
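When both processes are backgrounded, it helps to wait until the broker port actually accepts connections before creating topics. A sketch using bash's /dev/tcp redirection (a bash-only feature; the hosts and ports in the comments are the defaults assumed above):

```shell
# Wait until a TCP port accepts connections (bash-only /dev/tcp redirection).
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for i in $(seq "$timeout"); do
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# After the nohup commands above you would run:
#   wait_for_port localhost 2181 && echo "zookeeper up"
#   wait_for_port localhost 9092 && echo "kafka up"
```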
III. Deleting a topic
1. Preparation:
On all three machines in the Kafka cluster, edit server.properties and add the following setting: delete.topic.enable=true
After running the create command bin/kafka-topics.sh --create --zookeeper h40:2181/kafka --replication-factor 2 --partitions 2 --topic hui
you can watch what changes in each node's /tmp/kafka-logs directory (set by log.dirs in server.properties, default "/tmp/kafka-logs"; different brokers do not necessarily hold the same partitions, so check every broker):
[root@h40 ~]# ll /tmp/kafka-logs/
total 16
drwxrwxr-x 2 hadoop hadoop 4096 Jun 20 16:35 hui-0
drwxrwxr-x 2 hadoop hadoop 4096 Jun 20 16:35 hui-1
-rw-rw-r-- 1 hadoop hadoop 20 Jun 20 16:35 recovery-point-offset-checkpoint
-rw-rw-r-- 1 hadoop hadoop 20 Jun 20 16:35 replication-offset-checkpoint
[root@h41 ~]# ll /tmp/kafka-logs/
total 12
drwxrwxr-x 2 hadoop hadoop 4096 Jun 20 16:35 hui-1
-rw-rw-r-- 1 hadoop hadoop 12 Jun 20 16:36 recovery-point-offset-checkpoint
-rw-rw-r-- 1 hadoop hadoop 12 Jun 20 16:36 replication-offset-checkpoint
[root@h42 ~]# ll /tmp/kafka-logs/
total 12
drwxrwxr-x 2 hadoop hadoop 4096 Jun 20 16:36 hui-0
-rw-rw-r-- 1 hadoop hadoop 12 Jun 20 16:36 recovery-point-offset-checkpoint
-rw-rw-r-- 1 hadoop hadoop 12 Jun 20 16:37 replication-offset-checkpoint
2. View the topic in the ZooKeeper client:
[hadoop@h40 ~]$ cd zookeeper-3.4.5/
[hadoop@h40 zookeeper-3.4.5]$ bin/zkCli.sh
# Find the znodes the topic lives under:
[zk: localhost:2181(CONNECTED) 17] ls /kafka/brokers/topics
[hui]
[zk: localhost:2181(CONNECTED) 51] ls /kafka/config/topics
[hui]
View the topics:
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --list --zookeeper h40:2181/kafka
hui
3. Restart the Kafka cluster, then perform the deletion:
[hadoop@h40 kafka_2.10-0.8.2.0]$ ./bin/kafka-topics.sh --delete --zookeeper h40:2181/kafka --topic hui
Running the list command again shows that the topic has been deleted.
If the server.properties that Kafka loaded at startup did not contain delete.topic.enable=true, the delete is not a real delete; the topic is merely marked: marked for deletion
[hadoop@h40 kafka_2.10-0.8.2.0]$ ./bin/kafka-topics.sh --delete --zookeeper h40:2181/kafka --topic hui
Topic hui is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
[hadoop@h40 kafka_2.10-0.8.2.0]$ bin/kafka-topics.sh --list --zookeeper h40:2181/kafka
hui - marked for deletion
Note: topics marked for deletion can be found in the ZooKeeper client with: ls /admin/delete_topics/<topic name>; if you delete the znode there, the marked-for-deletion flag disappears.
Note: a second way of deleting a topic is often described online: leave delete.topic.enable=true out of server.properties, delete the topic's directories under the Kafka storage directory (/tmp/kafka-logs), and then remove the topic in the ZooKeeper client under /kafka/brokers/topics with: rmr /kafka/brokers/topics/<topic name>
Doing this did delete the topic for me, but when I then created an identical topic again (re-running bin/kafka-topics.sh --create --zookeeper h40:2181/kafka --replication-factor 2 --partitions 2 --topic hui; creating a topic with a different name did not trigger the problem), no corresponding topic directory appeared under /tmp/kafka-logs. The topic could still be produced to and consumed from normally, but the console of the running Kafka process printed this error:
[2017-06-20 19:03:25,799] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
java.io.FileNotFoundException: /tmp/kafka-logs/hui-0/00000000000000000000.index (No such file or directory)
With version 0.10.1.0 the console instead loops on this error from the start:
[2017-06-20 16:13:20,327] ERROR [KafkaApi-0] Error when handling request Name: FetchRequest; Version: 3; CorrelationId: 179; ClientId: ReplicaFetcherThread-0-0; ReplicaId: 2; MaxWait: 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: ([hui,1],PartitionFetchInfo(0,1048576)) (kafka.server.KafkaApis)
kafka.common.NotAssignedReplicaException: Leader 0 failed to record follower 2's position 0 since the replica is not recognized to be one of the assigned replicas 0,1 for partition [hui,1].
Fix: after restarting the Kafka cluster the error stops, everything works normally, and the topic's directory is created under /tmp/kafka-logs as expected.
The second method works, but it is fiddly. Editing the configuration file is simple and reliable, so there is no reason to take the long way around.
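For completeness, the manual second method described above can be sketched as follows; the snippet demonstrates the filesystem step on a throwaway directory rather than a live /tmp/kafka-logs, and the ZooKeeper step is shown only as a comment (run it in zkCli.sh on a real cluster):

```shell
# Manual topic deletion, step by step (demonstrated on a throwaway directory).
LOGDIR=$(mktemp -d)                       # stand-in for /tmp/kafka-logs on each broker
mkdir -p "$LOGDIR/hui-0" "$LOGDIR/hui-1"  # fake partition directories for topic "hui"

# Step 1: on every broker, remove the topic's partition directories:
rm -rf "$LOGDIR"/hui-*

# Step 2: in the ZooKeeper client, remove the topic's znodes (not runnable here):
#   rmr /kafka/brokers/topics/hui
#   rmr /kafka/config/topics/hui

ls "$LOGDIR"   # prints nothing: the partition directories are gone
```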