- Kafka Overview
- Official site: Apache Kafka
- Traditionally Kafka was seen as a message queue; as it evolved, it became usable as a stream-processing platform.
- The mainstream stream-processing engines are still Spark, Flink, Storm, etc.
- Kafka can handle data in real time.
- Kafka offers very high throughput and, like Hadoop, can be built on inexpensive commodity machines.
- Kafka stores data in a distributed, replicated, fault-tolerant cluster. Messages are persisted to disk, so data loss is not a concern.
- Kafka integrates with streaming data sources, and can also be used in offline processing scenarios.
- Kafka Core Terminology (important)
- 1) Official site: Apache Kafka
- 2) Five core APIs: Producer API, Consumer API, Admin API, Kafka Streams API, Kafka Connect API
- 3) Broker:
- A single Kafka server node; it handles read and write requests for messages and stores the data.
- A Kafka cluster consists of multiple brokers.
- 4) Topic:
- In Kafka, data for different business domains is stored in different topics.
- Keeping each category of messages in its own topic makes downstream processing clearer and easier.
- 5) Partition:
- A topic can be split into multiple partitions.
- Partitions can be added later to scale a topic out.
- The partitions of one topic are distributed across multiple brokers.
- Each partition is ordered internally, but a topic as a whole (across its partitions) is not necessarily ordered.
- Each partition replica resides on a single broker, and one broker can host many partitions.
- Each partition can be replicated; the replication factor is specified when the topic is created.
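The per-partition ordering guarantee above can be illustrated with a toy model (plain Python lists standing in for partition logs, not real Kafka):

```python
# Sketch: a topic as a list of per-partition logs. Order is guaranteed
# only within a single partition, not across the whole topic.
topic = [[] for _ in range(3)]  # a topic with 3 partitions

def append(partition: int, message: str) -> int:
    """Append a message to one partition's log; return its offset there."""
    topic[partition].append(message)
    return len(topic[partition]) - 1  # offsets grow independently per partition

# Messages sent to different partitions interleave arbitrarily...
append(0, "a1"); append(1, "b1"); append(0, "a2"); append(2, "c1")

# ...but within each partition the append order is preserved.
assert topic[0] == ["a1", "a2"]
assert topic[1] == ["b1"]
```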
- 6) Producer:
- A client that publishes messages to Kafka brokers.
- A message can be routed, according to some rule (for example, by key), to a specific partition of the topic.
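A minimal sketch of such a routing rule, similar in spirit to Kafka's key-based default partitioner (Kafka actually hashes the key bytes with murmur2; the byte sum below is a hypothetical stand-in):

```python
# Sketch: deterministic key-based routing. The same key always lands
# in the same partition, so per-key ordering is preserved.
def choose_partition(key: str, num_partitions: int) -> int:
    # Stand-in hash: sum of the key's bytes (NOT Kafka's real murmur2).
    return sum(key.encode()) % num_partitions

p = choose_partition("user-42", 3)
assert choose_partition("user-42", 3) == p  # stable for a given key
assert 0 <= p < 3
```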
- 7) Consumer:
- A client that fetches messages from Kafka brokers.
- Each consumer maintains its own read position (offset) in the data.
- Every consumer belongs to a consumer group.
- Within one consumer group consuming a topic, each message is consumed only once (by one consumer in the group).
- Therefore different consumer groups consuming the same topic do not affect each other.
- 8) Every message written to a partition gets its own offset.
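The independence of consumer groups can be sketched as two groups tracking separate offsets over the same partition log (a toy model, not the real group protocol):

```python
# Sketch: two consumer groups reading the same partition independently.
# Each group commits its own offset, so their progress never interferes.
log = ["m0", "m1", "m2"]                 # one partition's messages
offsets = {"group-a": 0, "group-b": 0}   # committed offset per group

def poll(group: str):
    """Return the next unread message for this group and advance its offset."""
    pos = offsets[group]
    if pos >= len(log):
        return None                      # nothing new for this group
    offsets[group] = pos + 1
    return log[pos]

assert poll("group-a") == "m0"
assert poll("group-a") == "m1"
assert poll("group-b") == "m0"           # group-b starts from its own offset
```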
- Kafka Single-Broker Deployment
- 1) Download: Apache Kafka
- Version: 2.5.0
- Binary build: kafka_2.12-2.5.0.tgz (built for Scala 2.12)
- 2) Put the tarball in the software directory
[hadoop@spark000 software]$ ll
-rw-rw-r-- 1 hadoop hadoop 61604633 Apr 16 2020 kafka_2.12-2.5.0.tgz
- 3) Extract it into the app directory
- bin: scripts
- config: configuration files
- libs: dependency jars
- logs: log files
[hadoop@spark000 app]$ ll
drwxr-xr-x 7 hadoop hadoop 101 Jul 14 2020 kafka_2.12-2.5.0
- 4) Configuration
- Set KAFKA_HOME and add its bin directory to the PATH via environment variables:
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ vi ~/.bash_profile
export KAFKA_HOME=/home/hadoop/app/kafka_2.12-2.5.0
export PATH=$KAFKA_HOME/bin:$PATH
[hadoop@spark000 bin]$ source ~/.bash_profile
[hadoop@spark000 bin]$ echo $KAFKA_HOME
/home/hadoop/app/kafka_2.12-2.5.0
- Edit server.properties
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ vi server.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs
# root directory for all kafka znodes.
zookeeper.connect=spark000:2181
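The three settings above are plain key=value pairs. A minimal parser for this style of file (a sketch; real Java properties files support escaping rules this does not handle):

```python
# Sketch: read key=value pairs from server.properties-style text,
# skipping comments and blank lines.
def parse_properties(text: str) -> dict:
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # ignore comments and blank lines
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

conf = parse_properties("""
# The id of the broker.
broker.id=0
log.dirs=/home/hadoop/app/tmp/kafka-logs
zookeeper.connect=spark000:2181
""")
assert conf["broker.id"] == "0"
assert conf["zookeeper.connect"] == "spark000:2181"
```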
- 5) Start the services
- Switch to the ZooKeeper directory and start it first:
[hadoop@spark000 config]$ cd $ZK_HOME
[hadoop@spark000 zookeeper-3.4.5-cdh5.16.2]$ cd bin
[hadoop@spark000 bin]$ ./zkServer.sh start //6840 QuorumPeerMain
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@spark000 bin]$ jps
6887 Jps
6840 QuorumPeerMain
- Start the Kafka server in the foreground (not recommended):
[hadoop@spark000 bin]$ cd $KAFKA_HOME
[hadoop@spark000 kafka_2.12-2.5.0]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0
[hadoop@spark000 kafka_2.12-2.5.0]$ bin/kafka-server-start.sh config/server.properties
[hadoop@spark000 ~]$ jps
6840 QuorumPeerMain
7019 Kafka
7518 Jps
- Start the Kafka server as a daemon (recommended):
[hadoop@spark000 kafka_2.12-2.5.0]$ bin/kafka-server-start.sh -daemon config/server.properties
[hadoop@spark000 ~]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
[hadoop@spark000 kafka_2.12-2.5.0]$ jps
7986 Jps
6840 QuorumPeerMain
7914 Kafka
- Create a topic
[hadoop@spark000 ~]$ kafka-topics.sh --create --bootstrap-server spark000:9092 --replication-factor 1 --partitions 1 --topic testzhang
Created topic testzhang.
- List all topics
- When learning, it is best to use one version throughout: command-line options differ between Kafka versions.
[hadoop@spark000 ~]$ kafka-topics.sh --list --bootstrap-server spark000:9092
- Check offsets
[hadoop@spark000 bin]$ kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 127.0.0.1:9092 --topic access-topic-prod --time -1
access-topic-prod:0:2995
[hadoop@spark000 bin]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/bin
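The GetOffsetShell output above is one line per partition in the form topic:partition:latest-offset (--time -1 requests the latest offset). A small parser for that format:

```python
# Sketch: parse a GetOffsetShell output line "topic:partition:offset".
# Kafka topic names cannot contain ':', so splitting from the right is safe.
def parse_offsets(line: str):
    topic, partition, offset = line.rsplit(":", 2)
    return topic, int(partition), int(offset)

assert parse_offsets("access-topic-prod:0:2995") == ("access-topic-prod", 0, 2995)
```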
- Next, produce data with one producer and consume it with one consumer.
- Start the producer:
- kafka-console-producer.sh is a console producer;
- --bootstrap-server spark000:9092 is the Kafka address it connects to;
- --topic testzhang is the topic the data is produced into.
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092 --topic testzhang
>
- Start the consumer
- --from-beginning reads the data from the start of the topic.
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092 --topic testzhang --from-beginning
- Kafka Multi-Broker Deployment
- 1) A single broker is started with one server.properties, so to run several brokers, just supply several server.properties files.
- 2) Copy the configuration file:
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ cp server.properties server-zhang0.properties
[hadoop@spark000 config]$ cp server.properties server-zhang1.properties
[hadoop@spark000 config]$ cp server.properties server-zhang2.properties
- 3) Edit the copies so the brokers do not conflict:
- give each a unique broker.id
- change the Kafka listener port
- change the log directory
[hadoop@spark000 config]$ vi server-zhang0.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-0
[hadoop@spark000 config]$ vi server-zhang1.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9093
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-1
[hadoop@spark000 config]$ vi server-zhang2.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9094
# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-2
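The three files above differ in only three values, which can be sketched as a template (ports and paths are the ones used in this walkthrough):

```python
# Sketch: generate the per-broker overrides shown above, varying only
# broker.id, the listener port, and the log directory.
def broker_config(i: int) -> str:
    return (
        f"broker.id={i}\n"
        f"listeners=PLAINTEXT://:{9092 + i}\n"
        f"log.dirs=/home/hadoop/app/tmp/kafka-logs-{i}\n"
    )

for i in range(3):
    assert f"broker.id={i}" in broker_config(i)
assert "PLAINTEXT://:9093" in broker_config(1)
```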
- 4) Start the brokers
- Start ZooKeeper first, then:
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang0.properties
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang1.properties
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang2.properties
- 5) Create a topic with 3 replicas
[hadoop@spark000 config]$ kafka-topics.sh --create --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --replication-factor 3 --partitions 1 --topic zhang-replicated-topic
Created topic zhang-replicated-topic.
- 6) Describe the topic via each broker
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9092 --topic zhang-replicated-topic
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9093 --topic zhang-replicated-topic
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
- 7) See how the data is stored on disk
- Each partition directory is named topic-partitionNumber:
[hadoop@spark000 ~]$ cd app/tmp/kafka-logs-0
[hadoop@spark000 kafka-logs-0]$ ls
zhang-replicated-topic-0
[hadoop@spark000 tmp]$ cd kafka-logs-1
[hadoop@spark000 kafka-logs-1]$ ls
zhang-replicated-topic-0
[hadoop@spark000 tmp]$ cd kafka-logs-2
[hadoop@spark000 kafka-logs-2]$ ls
zhang-replicated-topic-0
- 8) Use the cluster
- Producer: write messages to the topic
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092 --topic zhang-replicated-topic
>test
>pk
>kafka
>spark
>zhangjieqiong
>
- Consumer: receive the messages
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092 --from-beginning --topic zhang-replicated-topic
test
pk
kafka
spark
zhangjieqiong
- Multi-Broker Fault-Tolerance Test (cluster)
- 1) Check the topic state
- Replicas are 0, 1, 2; Isr lists the replicas currently in sync (here all of 0, 1, 2).
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 0 Replicas: 1,0,2 Isr: 0,1,2
- 2) Start a producer
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
>
- 3) Start a consumer
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
- 4) Type data into the producer to test.
- 5) Check the running processes
[hadoop@spark000 ~]$ jps -m
7569 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang1.properties
6786 QuorumPeerMain /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
8771 ConsoleProducer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
10259 Jps -m
9589 ConsoleConsumer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
7990 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang2.properties
7149 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang0.properties
- 6) Kill the process of broker 2 (server-zhang2, pid 7990)
[hadoop@spark000 ~]$ kill -9 7990
[hadoop@spark000 ~]$ jps -m
7569 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang1.properties
6786 QuorumPeerMain /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
10290 Jps -m
8771 ConsoleProducer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
9589 ConsoleConsumer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
7149 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang0.properties
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 0,1
- 7) Type more data into the producer to keep testing. (It is still received normally.)
- 8) Test: stop broker 0 as well
- After broker 0 is killed, only broker 1 remains in the ISR. The admin client logs connection warnings while it retries the dead brokers, but the describe output below shows that broker 1 is still the leader and still serving the topic.
[hadoop@spark000 ~]$ kill -9 7149
[hadoop@spark000 ~]$ jps
7569 Kafka
6786 QuorumPeerMain
8771 ConsoleProducer
9589 ConsoleConsumer
10730 Jps
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
[10:51:57,310] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,315] WARN [AdminClient clientId=adminclient-1] Connection to node -3 (spark000/192.168.131.66:9094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
(... the same warning repeats for node 0 on port 9092 while the client retries ...)
Topic: zhang-replicated-topic PartitionCount: 1 ReplicationFactor: 3 Configs: segment.bytes=1073741824
Topic: zhang-replicated-topic Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1
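The failover behaviour observed in steps 6) to 8) can be sketched as a toy model (real Kafka elects leaders through the controller; this only mirrors the ISR shrinking and the leader surviving):

```python
# Sketch: when a broker dies it drops out of the ISR; if it was the
# leader, a new leader is promoted from the remaining in-sync replicas.
replicas = [1, 0, 2]
isr = [1, 0, 2]
leader = 1

def kill(broker: int) -> None:
    global leader, isr
    isr = [b for b in isr if b != broker]
    if broker == leader and isr:
        leader = isr[0]  # promote the first surviving in-sync replica

kill(2)                              # a follower dies: ISR shrinks, leader unchanged
assert leader == 1 and isr == [1, 0]
kill(0)                              # another follower dies: leader 1 still serves
assert leader == 1 and isr == [1]
```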