• Kafka Overview
  • Official site: Apache Kafka (https://kafka.apache.org)
  • Traditionally, Kafka has been seen as a message-queue tool; as it developed, it became usable as a stream-processing platform as well.
  • The mainstream stream-processing engines, however, are still Spark, Flink, Storm, and the like.
  • Kafka can process data in real time.
  • Kafka's throughput is very high, and, like Hadoop, it can be built on inexpensive commodity machines.
  • Kafka stores data with a distributed, replicated, fault-tolerant cluster mechanism. Data is persisted to disk, so there is no need to worry about losing it.
  • Kafka integrates with streaming data, and it can also be used in offline processing scenarios.

 

  • Kafka Core Terminology (important)
  • 1) Official site: Apache Kafka
  • 2) Five core APIs: Producer API, Consumer API, Admin API, Kafka Streams API, Kafka Connect API


 

  • 3) Broker
  • A single Kafka server node; it handles read and write requests for messages and stores the data.
  • A Kafka cluster consists of multiple brokers.
  • 4) Topic
  • In Kafka, data is organized into different topics according to business domain.
  • That is, messages of different categories live in different topics, which keeps things clear and makes downstream processing easier.
  • 5) Partition
  • A topic can be divided into multiple partitions.
  • More partitions can be added later to scale a topic out.
  • The partitions of a topic are stored distributed across multiple brokers.
  • Each partition is internally ordered, but a topic as a whole (spanning multiple partitions) is not necessarily ordered.
  • A partition lives on one broker, and one broker can manage multiple partitions.
  • Every partition can have replicas; the replication factor is specified when the topic is created (see the sketch after this list).
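  • As a quick illustration (a minimal sketch against the single-broker setup described below; the topic name ordering-demo is made up for this example): keyed messages always land in the same partition, which is how per-partition ordering is exploited in practice.
# Create a topic with 3 partitions (replication factor 1 on a single broker).
kafka-topics.sh --create --bootstrap-server spark000:9092 \
  --partitions 3 --replication-factor 1 --topic ordering-demo

# parse.key/key.separator make the console producer treat everything before ':'
# as the record key; records sharing a key hash to the same partition, so their
# relative order is preserved there.
kafka-console-producer.sh --bootstrap-server spark000:9092 --topic ordering-demo \
  --property parse.key=true --property key.separator=: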

 


 

  • 6) Producer: the message producer
  • A client that sends messages to a Kafka broker.
  • It can route a given message to a specific partition of a topic according to some rule (for example, by key).
  • 7) Consumer: the message consumer
  • A client that fetches messages from a Kafka broker.
  • Each consumer maintains its own offset, i.e. the position up to which it has read (these offsets can be inspected with the tool sketched after this list).
  • Every consumer belongs to a consumer group.
  • When the consumers of one consumer group consume a topic, each message in that topic is delivered to the group only once.
  • Therefore, different consumer groups consuming the same topic do not affect each other.
  • 8) Every message written to a partition has its own offset.
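  • Committed offsets can be inspected per consumer group with kafka-consumer-groups.sh (a sketch; the group name my-group is hypothetical):
# List the consumer groups known to the cluster.
kafka-consumer-groups.sh --bootstrap-server spark000:9092 --list

# Per partition: the committed offset (CURRENT-OFFSET), the end of the log
# (LOG-END-OFFSET), and how far the group is behind (LAG).
kafka-consumer-groups.sh --bootstrap-server spark000:9092 --describe --group my-group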

 

  • Kafka Single-Broker Deployment
  • 1) Download: Apache Kafka
  • Major version: 2.5.0
  • Scala build: 2.12 (kafka_2.12-2.5.0.tgz)
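  • One way to fetch it (a sketch; the URL follows the Apache archive layout for this release, so verify it before use):
wget https://archive.apache.org/dist/kafka/2.5.0/kafka_2.12-2.5.0.tgz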
  • 2) Put the tarball into the software directory
[hadoop@spark000 software]$ ll
-rw-rw-r-- 1 hadoop hadoop  61604633 Apr 16  2020 kafka_2.12-2.5.0.tgz
  • 3) Extract it into the app directory
  • bin/: scripts
  • config/: configuration files
  • libs/: dependency jars
  • logs/: log files
[hadoop@spark000 app]$ ll
drwxr-xr-x  7 hadoop hadoop  101 Jul 14  2020 kafka_2.12-2.5.0
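  • For completeness, the extraction command that produces this layout would be something like (a sketch using this environment's paths):
tar -xzf ~/software/kafka_2.12-2.5.0.tgz -C ~/app/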
  • 4) Configuration
  • Set KAFKA_HOME in the environment and add its bin directory to the PATH:
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ vi ~/.bash_profile
export KAFKA_HOME=/home/hadoop/app/kafka_2.12-2.5.0
export PATH=$KAFKA_HOME/bin:$PATH
[hadoop@spark000 bin]$ source ~/.bash_profile
[hadoop@spark000 bin]$ echo $KAFKA_HOME
/home/hadoop/app/kafka_2.12-2.5.0
  • Edit server.properties; the settings that matter here are the broker id, the data directory (log.dirs), and the ZooKeeper connection string:
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ vi server.properties
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs

# root directory for all kafka znodes.
zookeeper.connect=spark000:2181
  • 5) Startup
  • Kafka depends on ZooKeeper, so switch to the ZooKeeper directory and start it first
[hadoop@spark000 config]$ cd $ZK_HOME
[hadoop@spark000 zookeeper-3.4.5-cdh5.16.2]$ cd bin
[hadoop@spark000 bin]$ ./zkServer.sh start         //6840 QuorumPeerMain
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@spark000 bin]$ jps
6887 Jps
6840 QuorumPeerMain
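  • Before starting Kafka, you can confirm ZooKeeper is actually serving (a quick check):
# Prints the server's mode (standalone here) when ZooKeeper is up.
$ZK_HOME/bin/zkServer.sh status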
  • Start the Kafka server (in the foreground; not recommended, since it ties up the terminal)
[hadoop@spark000 bin]$ cd $KAFKA_HOME
[hadoop@spark000 kafka_2.12-2.5.0]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0
[hadoop@spark000 kafka_2.12-2.5.0]$ bin/kafka-server-start.sh config/server.properties
[hadoop@spark000 ~]$ jps
6840 QuorumPeerMain
7019 Kafka
7518 Jps
  • Start the Kafka server as a daemon (recommended); either invocation below works, since bin is on the PATH
[hadoop@spark000 kafka_2.12-2.5.0]$ bin/kafka-server-start.sh -daemon config/server.properties
[hadoop@spark000 ~]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
[hadoop@spark000 kafka_2.12-2.5.0]$ jps
7986 Jps
6840 QuorumPeerMain
7914 Kafka
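  • With -daemon, broker output goes to log files instead of the console; to confirm a clean start, watch the server log (assuming the default log location under $KAFKA_HOME):
# A successful start ends with a "started (kafka.server.KafkaServer)" line.
tail -f $KAFKA_HOME/logs/server.log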
  • Create a topic
[hadoop@spark000 ~]$ kafka-topics.sh --create --bootstrap-server spark000:9092 --replication-factor 1 --partitions 1 --topic testzhang
Created topic testzhang.
  • List all topics
  • While learning, stick to a single Kafka version throughout, because command-line flags differ across versions (older releases, for example, addressed topics via --zookeeper rather than --bootstrap-server).
[hadoop@spark000 ~]$ kafka-topics.sh --list --bootstrap-server spark000:9092

  • Check offsets: GetOffsetShell reports offsets per partition in topic:partition:offset form; --time -1 asks for the latest offset (and --time -2 would return the earliest).

[hadoop@spark000 bin]$ kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 127.0.0.1:9092 --topic access-topic-prod --time -1
access-topic-prod:0:2995
[hadoop@spark000 bin]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/bin
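  • The number of messages currently retained in a partition is therefore the latest offset minus the earliest (a sketch with the same tool):
# --time -2 asks for the earliest retained offset instead of the latest.
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 127.0.0.1:9092 \
  --topic access-topic-prod --time -2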

  • Next, use a producer to produce data and a consumer to consume it.
  • Start the producer
  • Here we use the console producer, kafka-console-producer.sh,
  • connect to Kafka via --bootstrap-server spark000:9092,
  • and send the data into a specific topic via --topic testzhang.
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092 --topic testzhang
>
  • Start the consumer
  • --from-beginning makes it read the topic from the start; without it, only messages produced after the consumer attaches are shown.
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092 --topic testzhang --from-beginning

 

  • Kafka Multi-Broker Deployment
  • 1) The single-broker startup took server.properties as an argument, so running multiple brokers just means supplying several properties files.
  • 2) Copy the configuration file three times
[hadoop@spark000 config]$ pwd
/home/hadoop/app/kafka_2.12-2.5.0/config
[hadoop@spark000 config]$ cp server.properties server-zhang0.properties
[hadoop@spark000 config]$ cp server.properties server-zhang1.properties
[hadoop@spark000 config]$ cp server.properties server-zhang2.properties
  • 3) Edit each copy so the brokers do not conflict:
  • give each broker a unique id,
  • change the Kafka listener port,
  • and change the log directory.
[hadoop@spark000 config]$ vi server-zhang0.properties

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-0
[hadoop@spark000 config]$ vi server-zhang1.properties

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9093

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-1
[hadoop@spark000 config]$ vi server-zhang2.properties

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=2

# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9094

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/app/tmp/kafka-logs-2
  • 4) Startup
  • Make sure ZooKeeper is started first
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang0.properties    // broker 0
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang1.properties    // broker 1
[hadoop@spark000 config]$ kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang2.properties    // broker 2
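  • A quick sanity check that all three brokers are up and listening (a sketch; ss is assumed to be available, netstat -lnt works too):
# One Kafka JVM per properties file...
jps -m | grep Kafka
# ...and one listening port per broker.
ss -lnt | grep -E ':(9092|9093|9094)'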
  • 5) Create a topic with 3 replicas
[hadoop@spark000 config]$ kafka-topics.sh --create --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --replication-factor 3 --partitions 1 --topic zhang-replicated-topic
Created topic zhang-replicated-topic.
  • 6) Inspect the topic's state via each broker. Topic metadata is cluster-wide, so querying any of the three brokers returns the same answer: Leader is the broker currently serving reads and writes for the partition, Replicas lists the brokers that hold a copy, and Isr (in-sync replicas) lists the copies currently caught up with the leader.
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9092 --topic zhang-replicated-topic
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,0,2    Isr: 1,0,2
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9093 --topic zhang-replicated-topic
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,0,2    Isr: 1,0,2
[hadoop@spark000 config]$ kafka-topics.sh --describe --bootstrap-server spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,0,2    Isr: 1,0,2
  • 7) See how the data is stored on disk
  • Each partition gets a directory named topic + partition number; since the replication factor is 3, the same partition directory appears under every broker's log directory (see the listings below, and the sketch after them).
[hadoop@spark000 ~]$ cd app/tmp/kafka-logs-0
[hadoop@spark000 kafka-logs-0]$ ls
zhang-replicated-topic-0

[hadoop@spark000 tmp]$ cd kafka-logs-1
[hadoop@spark000 kafka-logs-1]$ ls
zhang-replicated-topic-0

[hadoop@spark000 tmp]$ cd kafka-logs-2
[hadoop@spark000 kafka-logs-2]$ ls
zhang-replicated-topic-0
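  • Inside each partition directory sit the actual data files; in Kafka 2.5 you would typically find the following (a sketch):
ls zhang-replicated-topic-0
# 00000000000000000000.log        message data (the log segment itself)
# 00000000000000000000.index      offset index into that segment
# 00000000000000000000.timeindex  timestamp index
# leader-epoch-checkpoint         leader-epoch history used during recovery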
  • 8) Usage
  • Producer: write messages into the topic
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092 --topic zhang-replicated-topic
>test
>pk
>kafka
>spark
>zhangjieqiong
>
  • Consumer: receive the messages
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092 --from-beginning --topic zhang-replicated-topic
test
pk
kafka
spark
zhangjieqiong

 

  • Multi-Broker Fault-Tolerance Test (cluster)
  • 1) Check the topic's state
  • Replicas are brokers 1, 0 and 2; Isr shows that 0, 1 and 2 are all currently in sync.
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 0    Replicas: 1,0,2    Isr: 0,1,2
  • 2) Start the producer
[hadoop@spark000 ~]$ kafka-console-producer.sh --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
>
  • 3) Start the consumer
[hadoop@spark000 ~]$ kafka-console-consumer.sh --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
  • 4) Type some data into the producer to verify the pipeline.
  • 5) Check the running processes
[hadoop@spark000 ~]$ jps -m
7569 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang1.properties
6786 QuorumPeerMain /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
8771 ConsoleProducer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
10259 Jps -m
9589 ConsoleConsumer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
7990 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang2.properties
7149 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang0.properties
  • 6) Kill the broker 2 process (pid 7990, the one started from server-zhang2.properties)
[hadoop@spark000 ~]$ kill -9 7990
[hadoop@spark000 ~]$ jps -m
7569 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang1.properties
6786 QuorumPeerMain /home/hadoop/app/zookeeper-3.4.5-cdh5.16.2/bin/../conf/zoo.cfg
10290 Jps -m
8771 ConsoleProducer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
9589 ConsoleConsumer --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
7149 Kafka /home/hadoop/app/kafka_2.12-2.5.0/config/server-zhang0.properties
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,0,2    Isr: 0,1
  • 7) Type more data into the producer and continue the test. (Data is still received normally.)
  • 8) Next, test stopping broker 0, the partition's leader at the start of the test.
  • We found that once broker 0 stopped, the console clients no longer received data. One plausible cause (an assumption, not verified here): the shipped server.properties defaults offsets.topic.replication.factor to 1, so the consumer group's coordinator partition in __consumer_offsets can itself be lost when brokers die. Note that the describe below still gets through via a surviving broker and reports leader 1, with the Isr shrunk to just 1.
[hadoop@spark000 ~]$ kill -9 7149
[hadoop@spark000 ~]$ jps
7569 Kafka
6786 QuorumPeerMain
8771 ConsoleProducer
9589 ConsoleConsumer
10730 Jps
[hadoop@spark000 ~]$ kafka-topics.sh --describe --bootstrap-server spark000:9092,spark000:9093,spark000:9094 --topic zhang-replicated-topic
[10:51:57,310] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,315] WARN [AdminClient clientId=adminclient-1] Connection to node -3 (spark000/192.168.131.66:9094) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,488] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,545] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,661] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:57,872] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:58,266] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:51:59,207] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:52:00,109] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:52:01,037] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[10:52:02,216] WARN [AdminClient clientId=adminclient-1] Connection to node 0 (spark000/192.168.131.66:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Topic: zhang-replicated-topic    PartitionCount: 1    ReplicationFactor: 3    Configs: segment.bytes=1073741824
    Topic: zhang-replicated-topic    Partition: 0    Leader: 1    Replicas: 1,0,2    Isr: 1
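  • To wrap up, the killed brokers can be restarted with the same commands as in step 4 (a sketch); once they catch back up, the Isr grows again:
# Restart brokers 0 and 2 from their own properties files.
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang0.properties
kafka-server-start.sh -daemon $KAFKA_HOME/config/server-zhang2.properties

# After the replicas re-sync, Isr should return to the full set 1,0,2.
kafka-topics.sh --describe --bootstrap-server spark000:9093 --topic zhang-replicated-topic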