Kafka overview:

    Kafka is a distributed, publish/subscribe-based message queue, used mainly in real-time big-data processing.

Message queue overview:

    Typical application scenario of a traditional message queue (image from the 尚硅谷 Kafka course material on Bilibili)

(figure: traditional message queue application scenario)

 

     Benefits of using a message queue:

1. Decoupling: the processing on either side can be extended or modified independently, as long as both sides honor the same interface contract.

2. Recoverability: when one component of the system fails, the rest of the system is not dragged down. A message queue lowers the coupling between processes, so even if a process that handles messages crashes, the messages already in the queue can still be processed once the system recovers.

3. Buffering: helps control and smooth the rate at which data flows through the system, absorbing mismatches between how fast messages are produced and how fast they are consumed.

4. Flexibility & peak-handling capacity: an application has to keep working when traffic surges, yet such bursts are rare. Keeping dedicated resources on standby just to absorb occasional peaks would be an enormous waste. With a message queue, the key components can ride out sudden pressure instead of collapsing completely under a burst of requests beyond their capacity.

5. Asynchronous communication: in many cases a user neither wants nor needs to process a message immediately. A message queue provides an asynchronous mechanism: a message can be put into the queue without being processed right away, as many messages as needed can be enqueued, and they can be processed later when required.

    The two messaging models:

1. Point-to-point (one-to-one; the consumer actively pulls data, and a message is deleted once it has been received)

    A producer sends messages to a queue, and a consumer takes messages out of the queue and consumes them. Once a message has been consumed it is no longer stored in the queue, so a consumer can never consume a message that has already been consumed. A queue may have multiple consumers, but any given message can be consumed by only one of them.

(figure: point-to-point model)

 

 2. Publish/subscribe (one-to-many; messages are not deleted after consumers consume them)

     Producers (publishers) publish messages to a topic, and multiple consumers (subscribers) consume them. Unlike the point-to-point model, a message published to a topic is received by every subscriber (a short sketch illustrating this with consumer groups follows the figure below).

(figure: publish/subscribe model)
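
In Kafka both models are expressed through consumer groups: consumers in different groups each receive every message (publish/subscribe behavior), while consumers that share a group divide the topic's partitions among themselves (queue-like behavior). A minimal sketch, assuming a reachable broker at <broker>:9092 and an existing topic named demo (both are placeholders, not values from the deployment described later):

# Two consumers in different groups: each one receives every message on the topic
kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic demo --group group-a
kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic demo --group group-b

# Two consumers in the same group: the topic's partitions (and thus the messages) are split between them
kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic demo --group group-c
kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic demo --group group-c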

 Kafka basic architecture

(figure: Kafka basic architecture)

 

1. Producer: the message producer, i.e. the client that sends messages to the Kafka brokers.

2. Consumer: the message consumer, i.e. the client that fetches messages from the Kafka brokers.

3. Consumer group: a group of multiple consumers. Each consumer in a group is responsible for different partitions, a partition can be consumed by only one consumer within the group, and consumer groups do not affect one another. Every consumer belongs to some consumer group, so the consumer group is the logical subscriber.

4. Broker: one Kafka server is one broker. A cluster consists of multiple brokers, and one broker can host multiple topics.

5. Topic: can be thought of as a queue or subject; producers and consumers both work against a topic.

6. Partition: for scalability, a very large topic can be spread across multiple brokers (Kafka servers). A topic can be split into multiple partitions, each of which is an ordered queue.

7. Replica: so that partition data on a node is not lost when that node fails, and Kafka can keep working, Kafka provides a replication mechanism: each partition of a topic has several replicas, one leader and several followers.

8. Leader: the "primary" among a partition's replicas. Producers send data to, and consumers read data from, the leader.

9. Follower: a "secondary" among a partition's replicas. It continuously replicates data from the leader and stays in sync with it; when the leader fails, one of the followers becomes the new leader. (A hypothetical example showing these roles follows this list.)
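
The partition / replica / leader / follower layout of a topic can be inspected with kafka-topics.sh --describe. A minimal sketch with made-up values (a topic named example with 3 partitions and 2 replicas on a 3-broker cluster); the output below is illustrative, not taken from the deployment in this article:

kafka-topics.sh --describe --zookeeper <zookeeper>:2181 --topic example
Topic: example    PartitionCount: 3    ReplicationFactor: 2    Configs:
    Topic: example    Partition: 0    Leader: 1    Replicas: 1,2    Isr: 1,2
    Topic: example    Partition: 1    Leader: 2    Replicas: 2,3    Isr: 2,3
    Topic: example    Partition: 2    Leader: 3    Replicas: 3,1    Isr: 3,1

The "Leader" column is the broker ID that holds each partition's leader replica; the other IDs under "Replicas" are the followers, and "Isr" lists the replicas currently in sync with the leader.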

 

Deploying a Kafka cluster in a K8S cluster:

Cluster plan:

master: 192.168.11.113

node1: 192.168.11.106

node2: 192.168.11.116

 

All of the following operations are performed on master:

1. Create a namespace for Kafka.

[root@master ~]# kubectl create namespace kafka

  Check the namespace that was just created

[root@master ~]# kubectl get namespace kafka
NAME    STATUS   AGE
kafka   Active   23h

2. Install the ZooKeeper cluster; Kafka relies on ZooKeeper for service registration and discovery.

     2.1 Create the YAML file for the ZooKeeper Services

[root@master ~]# cd yaml/kafka/
[root@master kafka]# vim zookeeper-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: zoo1
  labels:
    app: zookeeper-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-1
---
apiVersion: v1
kind: Service
metadata:
  name: zoo2
  labels:
    app: zookeeper-2
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-2
---
apiVersion: v1
kind: Service
metadata:
  name: zoo3
  labels:
    app: zookeeper-3
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-3

  Create the ZooKeeper Services with kubectl

[root@master kafka]# kubectl apply -f zookeeper-svc.yaml

  Check the ZooKeeper Services

[root@master kafka]# kubectl get svc -n kafka
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
zoo1              ClusterIP   10.1.125.242   <none>        2181/TCP,2888/TCP,3888/TCP   21h
zoo2              ClusterIP   10.1.211.184   <none>        2181/TCP,2888/TCP,3888/TCP   21h
zoo3              ClusterIP   10.1.85.246    <none>        2181/TCP,2888/TCP,3888/TCP   21h

  2.2 Create the zookeeper-deployment.yaml file for the ZooKeeper Deployments

[root@master kafka]# vim zookeeper-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper-1
      name: zookeeper-1
  template:
    metadata:
      labels:
        app: zookeeper-1
        name: zookeeper-1
    spec:
      containers:
      - name: zoo1
        image: zookeeper
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2181
        env:
        - name: ZOO_MY_ID
          value: "1"
        - name: ZOO_SERVERS
          value: "server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment-2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper-2
      name: zookeeper-2
  template:
    metadata:
      labels:
        app: zookeeper-2
        name: zookeeper-2
    spec:
      containers:
      - name: zoo2
        image: zookeeper
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2181
        env:
        - name: ZOO_MY_ID
          value: "2"
        - name: ZOO_SERVERS
          value: "server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment-3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper-3
      name: zookeeper-3
  template:
    metadata:
      labels:
        app: zookeeper-3
        name: zookeeper-3
    spec:
      containers:
      - name: zoo3
        image: zookeeper
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2181
        env:
        - name: ZOO_MY_ID
          value: "3"
        - name: ZOO_SERVERS
          value: "server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181"

  Create the ZooKeeper Deployments

kubectl apply -f zookeeper-deployment.yaml

  Check the status of the ZooKeeper Pods

[root@master kafka]# kubectl get pod -n kafka
NAME                                      READY   STATUS    RESTARTS   AGE
zookeeper-deployment-1-66cb7dcb-j4zcr     1/1     Running   1          20h
zookeeper-deployment-2-56fc4644c7-q6nx4   1/1     Running   1          20h
zookeeper-deployment-3-7c7789cf98-9m5kn   1/1     Running   1          20h
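
Optionally, you can check that the three ZooKeeper nodes have actually formed a quorum by asking each one for its role. A minimal sketch, assuming the official zookeeper image puts zkServer.sh on the PATH (the pod name comes from the listing above; your suffixes will differ):

[root@master kafka]# kubectl exec -it zookeeper-deployment-1-66cb7dcb-j4zcr -n kafka -- zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Mode: follower

Repeat for the other two pods: exactly one of them should report "Mode: leader".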

3. Deploy Kafka

    3.1 Create the YAML file for the Kafka Services

vim kafka-svc.yaml

apiVersion: v1
kind: Service
metadata:
  name: kafka-service-1
  labels:
    app: kafka-service-1
spec:
  type: NodePort
  ports:
  - port: 9092
    name: kafka-service-1
    targetPort: 9092
    nodePort: 30901
    protocol: TCP
  selector:
    app: kafka-service-1
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-service-2
  labels:
    app: kafka-service-2
spec:
  type: NodePort
  ports:
  - port: 9092
    name: kafka-service-2
    targetPort: 9092
    nodePort: 30902
    protocol: TCP
  selector:
    app: kafka-service-2
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-service-3
  labels:
    app: kafka-service-3
spec:
  type: NodePort
  ports:
  - port: 9092
    name: kafka-service-3
    targetPort: 9092
    nodePort: 30903
    protocol: TCP
  selector:
    app: kafka-service-3

  Create the Kafka Services with kubectl

[root@master kafka]# kubectl apply -f kafka-svc.yaml

  Check the Services

[root@master kafka]# kubectl get svc -n kafka
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
kafka-service-1   NodePort    10.1.246.238   <none>        9092:30901/TCP               24h
kafka-service-2   NodePort    10.1.214.60    <none>        9092:30902/TCP               24h
kafka-service-3   NodePort    10.1.252.232   <none>        9092:30903/TCP               24h
zoo1              ClusterIP   10.1.125.242   <none>        2181/TCP,2888/TCP,3888/TCP   24h
zoo2              ClusterIP   10.1.211.184   <none>        2181/TCP,2888/TCP,3888/TCP   24h
zoo3              ClusterIP   10.1.85.246    <none>        2181/TCP,2888/TCP,3888/TCP   24h

  3.2 Create the kafka-deployment.yaml file

[root@master kafka]# vim kafka-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment-1
spec:
  replicas: 1
  selector:
    matchLabels:
      name: kafka-service-1
  template:
    metadata:
      labels:
        name: kafka-service-1
        app: kafka-service-1
    spec:
      containers:
      - name: kafka-1
        image: wurstmeister/kafka
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: 10.1.246.238 
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181,zoo2:2181,zoo3:2181
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_CREATE_TOPICS
          value: mytopic:2:1
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30901
        - name: KAFKA_LISTENERS
          value: PLAINTEXT://0.0.0.0:9092
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment-2
spec:
  replicas: 1
  selector:
    matchLabels:
      name: kafka-service-2
  template:
    metadata:
      labels:
        name: kafka-service-2
        app: kafka-service-2
    spec:
      containers:
      - name: kafka-2
        image: wurstmeister/kafka
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: 10.1.214.60 
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181,zoo2:2181,zoo3:2181
        - name: KAFKA_BROKER_ID
          value: "2"
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30902
        - name: KAFKA_LISTENERS
          value: PLAINTEXT://0.0.0.0:9092
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment-3
spec:
  replicas: 1
  selector:
    matchLabels:
      name: kafka-service-3
  template:
    metadata:
      labels:
        name: kafka-service-3
        app: kafka-service-3
    spec:
      containers:
      - name: kafka-3
        image: wurstmeister/kafka
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: 10.1.252.232 
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181,zoo2:2181,zoo3:2181
        - name: KAFKA_BROKER_ID
          value: "3"
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30903
        - name: KAFKA_LISTENERS
          value: PLAINTEXT://0.0.0.0:9092

  Notes:

1.  KAFKA_ADVERTISED_HOST_NAME corresponds to the cluster IP of the matching Kafka Service, which you can look up with: kubectl get svc -n kafka

[root@master kafka]# kubectl get svc -n kafka
NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
kafka-service-1   NodePort    10.1.246.238   <none>        9092:30901/TCP               24h
kafka-service-2   NodePort    10.1.214.60    <none>        9092:30902/TCP               24h
kafka-service-3   NodePort    10.1.252.232   <none>        9092:30903/TCP               24h

2. KAFKA_ADVERTISED_LISTENERS is the advertised listener address: use the host IP of master, node1 or node2 plus the NodePort exposed by K8S (in the 30000-32767 range), as in the excerpt below; a reachability check follows the excerpt.

metadata:
  name: kafka-deployment-1
      ………………
          - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30901   
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment-2
      ………………
          - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30902
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment-3
      ………………
          - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://192.168.11.113:30903
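
To check that the advertised listeners really are reachable from outside the cluster, you can ask a broker for its metadata from any machine that can reach the node IPs. A minimal sketch using kcat (formerly kafkacat), which is assumed to be installed on that external machine and is not part of this deployment:

# List brokers, topics and partitions as seen through the NodePort listener
kcat -b 192.168.11.113:30901 -L

The returned metadata should show the three brokers with their advertised addresses (192.168.11.113:30901/30902/30903). If a broker comes back with an address the client cannot reach, the corresponding KAFKA_ADVERTISED_LISTENERS value needs to be corrected.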

  Create the Kafka brokers with kubectl

[root@master kafka]# kubectl apply -f kafka-deployment.yaml

  Check the status of the Kafka Pods

[root@master kafka]# kubectl get pod -n kafka
NAME                                      READY   STATUS    RESTARTS   AGE
kafka-deployment-1-6d8d6c9bd4-vj9tf       1/1     Running   1          25h
kafka-deployment-2-6c94d5cf5b-cghvc       1/1     Running   1          25h
kafka-deployment-3-566cf5994b-mz4mh       1/1     Running   1          25h
zookeeper-deployment-1-66cb7dcb-j4zcr     1/1     Running   1          25h
zookeeper-deployment-2-56fc4644c7-q6nx4   1/1     Running   1          25h
zookeeper-deployment-3-7c7789cf98-9m5kn   1/1     Running   1          25h
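
You can also look at a broker's log to confirm it started with the expected broker ID; a minimal sketch (the exact log wording may vary between Kafka versions):

[root@master kafka]# kubectl logs kafka-deployment-1-6d8d6c9bd4-vj9tf -n kafka | grep -i "started"
... INFO [KafkaServer id=1] started (kafka.server.KafkaServer)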

Testing from the command line

      You can exec into any of the Pods and operate Kafka from its command line.

Here we exec into kafka-deployment-1-6d8d6c9bd4-vj9tf to act as the message producer:

[root@master ~]# kubectl exec -it kafka-deployment-1-6d8d6c9bd4-vj9tf -n kafka -- /bin/bash
bash-4.4#

  Exec into kafka-deployment-3-566cf5994b-mz4mh to act as the message consumer:

[root@master kafka]# kubectl exec -it kafka-deployment-3-566cf5994b-mz4mh -n kafka -- /bin/bash
bash-4.4#

  Once inside the container, change to Kafka's bin directory (the same for both the producer and the consumer):

bash-4.4# cd /opt/kafka/bin/

  On the producer side, create the test topic and get ready to send messages.

The general command format is:

kafka-console-producer.sh --broker-list <kafka-svc1-clusterIP>:9092,<kafka-svc2-clusterIP>:9092,<kafka-svc3-clusterIP>:9092 --topic test

 The VMs here are not very powerful, so only one address is used below; you can also list several addresses when testing.

bash-4.4# kafka-console-producer.sh --broker-list 10.1.246.238:9092 --topic test
>

  Open the consumer container's terminal and receive messages as the consumer:

bin/kafka-console-consumer.sh --bootstrap-server <any kafka-svc-clusterIP>:9092 --topic test --from-beginning

  Again, only one address is used here

bash-4.4# kafka-console-consumer.sh --bootstrap-server 10.1.252.232:9092 --topic test --from-beginning

  Then, in the producer container, type the messages you want to send

bash-4.4# kafka-console-producer.sh --broker-list 10.1.246.238:9092 --topic test
>野猪套天下第一
>test
>!@#$
>123
>

  After sending, the messages appear in the consumer container (type something on the producer side and check whether the consumer receives it; if it does, everything is working).

bash-4.4# kafka-console-consumer.sh --bootstrap-server 10.1.252.232:9092 --topic test --from-beginning
野猪套天下第一
test
!@#$
123

  Finally, you can also list all topics with the following command:

kafka-topics.sh --list --zookeeper [zookeeper-svc-clusterIP]:2181


bash-4.4# kafka-topics.sh --list --zookeeper zoo1:2181
__consumer_offsets
mytopic
test                    # the test topic we created


Common commands and paths:

# Exec into a container
kubectl exec -it kafka-deployment-1-xxxxxxxxxxx -n kafka -- /bin/bash
cd /opt/kafka
 
# List topics
bin/kafka-topics.sh --list --zookeeper <any zookeeper-svc-clusterIP>:2181

# Create a topic manually
bin/kafka-topics.sh --create --zookeeper <zookeeper-svc1-clusterIP>:2181,<zookeeper-svc2-clusterIP>:2181,<zookeeper-svc3-clusterIP>:2181 --topic test --partitions 3 --replication-factor 1

# Produce (CTRL+D to finish writing)
bin/kafka-console-producer.sh --broker-list <kafka-svc1-clusterIP>:9092,<kafka-svc2-clusterIP>:9092,<kafka-svc3-clusterIP>:9092 --topic test

# Consume (CTRL+C to stop reading)
bin/kafka-console-consumer.sh --bootstrap-server <any kafka-svc-clusterIP>:9092 --topic test --from-beginning
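
Another command that often helps when debugging consumers is kafka-consumer-groups.sh, which also lives under /opt/kafka/bin. A minimal sketch; the group name my-group is a placeholder, not a group created in this article:

# List all consumer groups known to the cluster
bin/kafka-consumer-groups.sh --bootstrap-server <any kafka-svc-clusterIP>:9092 --list

# Show partition assignment, current offsets and lag for one group
bin/kafka-consumer-groups.sh --bootstrap-server <any kafka-svc-clusterIP>:9092 --describe --group my-group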

  

Note: if Kafka is used from inside the cluster, clients can connect with a Service cluster IP, or with the in-cluster DNS service name, plus the corresponding port.

            If Kafka is used from outside the cluster, clients can connect with the IP address of any host in the cluster plus the mapped NodePort (in the 30000-32767 range).
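
For example, both styles of address work with the console producer; a minimal sketch, assuming the default cluster domain cluster.local:

# Inside the cluster: Service DNS name plus the Service port
kafka-console-producer.sh --broker-list kafka-service-1.kafka.svc.cluster.local:9092 --topic test

# Outside the cluster: any node IP plus the mapped NodePort
kafka-console-producer.sh --broker-list 192.168.11.113:30901 --topic test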

 

You can also use Kuboard to visualize the cluster conveniently, for example:

(figure: the Kafka cluster as shown in the Kuboard dashboard)

 

 

 

# At this point the Kafka cluster has been deployed inside the K8S cluster without storage; no persistence is configured.