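I had just installed and started three Kafka brokers on the cluster. After packaging the code and uploading it to the cluster, the job ran but never consumed any data.
Debugging locally in IDEA showed that the program was stuck in the following method, looping forever.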

    /**
     * Block until the coordinator for this group is known and is ready to receive requests.
     * Waits until we have a connection to the server-side GroupCoordinator.
     */
    public void ensureCoordinatorReady() {
        while (coordinatorUnknown()) { // the GroupCoordinator is not known yet
            RequestFuture<Void> future = sendGroupCoordinatorRequest(); // send the lookup request
            client.poll(future); // block until the asynchronous request completes
            if (future.failed()) {
                if (future.isRetriable())
                    client.awaitMetadataUpdate();
                else
                    throw future.exception();
            } else if (coordinator != null && client.connectionFailed(coordinator)) {
                // we found the coordinator, but the connection has failed, so mark
                // it dead and backoff before retrying discovery
                coordinatorDead();
                time.sleep(retryBackoffMs); // wait a while, then retry
            }

        }
    }
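The loop above only exits when the coordinator becomes known or a non-retriable error is thrown, which is why a coordinator that can never be reached shows up as an endless loop. As a rough illustration of the same retry-and-backoff pattern with an upper bound on the wait, here is a minimal sketch. It is not Kafka code: `BoundedRetry`, `retryUntil`, and the timeout values are hypothetical names and numbers chosen for the example.

```java
import java.util.function.BooleanSupplier;

class BoundedRetry {
    /**
     * Calls attempt repeatedly, sleeping backoffMs between calls, until it
     * returns true or timeoutMs has elapsed. Returns true on success and
     * false if the deadline passed (or the thread was interrupted) first.
     */
    static boolean retryUntil(BooleanSupplier attempt, long timeoutMs, long backoffMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up instead of looping forever
            }
            try {
                Thread.sleep(backoffMs); // back off before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a coordinator lookup that never succeeds.
        boolean found = retryUntil(() -> false, 200, 50);
        System.out.println("coordinator found: " + found);
    }
}
```

With a bound like this the caller gets a clear failure it can log, rather than a job that silently consumes nothing.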

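Roughly, the flow is:

  • The consumer locates the broker that acts as coordinator for its group (the broker that leads the group's partition of __consumer_offsets)
  • The consumers in the group then send join requests to the coordinator
  • One consumer ends up as the group leader, and the other consumers become followers
  • The leader computes the partition assignment and sends it to the coordinator
  • The followers fetch the partition assignment from the coordinator

The problem was in the first step: the consumer could not connect to the server-side GroupCoordinator, so the program waited indefinitely.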
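To see why a broken __consumer_offsets topic blocks the first step, it helps to know how a group is mapped to a coordinator. As far as I understand the broker code, the group id is hashed to one partition of __consumer_offsets, and the leader of that partition is the group's coordinator. The sketch below reproduces that mapping under this assumption; `CoordinatorPartition` and the group id `my-group` are hypothetical, and 50 is the partition count of this cluster's __consumer_offsets topic.

```java
class CoordinatorPartition {
    // Math.abs(Integer.MIN_VALUE) is still negative, so map that one value to 0.
    static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // Partition of __consumer_offsets that a group id hashes to (assumed rule).
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        return abs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        String groupId = "my-group"; // hypothetical group id
        int partition = partitionFor(groupId, 50);
        System.out.println(groupId + " -> __consumer_offsets partition " + partition);
    }
}
```

Whatever partition a group lands on, its coordinator is that partition's leader, so if every partition reports a leader that is not running, no group can find a coordinator.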
I then checked the __consumer_offsets topic: all 50 partitions were on the broker with id 152.

bin/kafka-topics.sh --describe --zookeeper localhost:2182 --topic __consumer_offsets
Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1    Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets    Partition: 0    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 1    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 2    Leader: 152    Replicas: 152   Isr:152
    Topic: __consumer_offsets    Partition: 3    Leader: 152   
......

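However, the cluster has no broker with id 152. Kafka nodes had been added to and removed from this cluster in the past, so my initial conclusion was that 152 was an old broker: it had been removed and new nodes added afterwards, but the data in ZooKeeper was never updated.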
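The check I did by eye can also be automated. The following is a minimal sketch that scans the text printed by `kafka-topics.sh --describe` and reports every partition whose leader is not in a given set of live broker ids. `StaleLeaderCheck` and `staleLeaders` are hypothetical names, the regular expression assumes the output layout shown above, and the live ids 420, 421, and 422 are the three brokers currently running in this cluster.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleLeaderCheck {
    // Matches per-partition lines such as "Partition: 0    Leader: 152".
    // The header line ("PartitionCount:50 ...") does not match.
    private static final Pattern LINE =
            Pattern.compile("Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)");

    /** Returns partition -> leader id for each partition led by a broker not in liveBrokers. */
    static Map<Integer, Integer> staleLeaders(List<String> describeLines, Set<Integer> liveBrokers) {
        Map<Integer, Integer> stale = new TreeMap<>();
        for (String line : describeLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                int partition = Integer.parseInt(m.group(1));
                int leader = Integer.parseInt(m.group(2));
                if (!liveBrokers.contains(leader)) {
                    stale.put(partition, leader);
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Topic:__consumer_offsets    PartitionCount:50    ReplicationFactor:1",
                "    Topic: __consumer_offsets    Partition: 0    Leader: 152    Replicas: 152   Isr:152",
                "    Topic: __consumer_offsets    Partition: 1    Leader: 152    Replicas: 152   Isr:152");
        Set<Integer> live = new HashSet<>(Arrays.asList(420, 421, 422));
        System.out.println("partitions with a stale leader: " + staleLeaders(lines, live));
    }
}
```

Any non-empty result means some partitions are led by a broker id that is not running.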
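So I shut down the brokers, opened the ZooKeeper client, and deleted the __consumer_offsets node under brokers/topics, then restarted the brokers. Note that at this point __consumer_offsets has not yet been recreated in ZooKeeper; it is only created again once a consumer is started.
Checking __consumer_offsets again afterwards, the partitions were now evenly distributed across the three brokers.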

 bin/kafka-topics.sh --zookeeper localhost:2182 --describe --topic __consumer_offsets
Topic:__consumer_offsets	PartitionCount:50	ReplicationFactor:3	Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
	Topic: __consumer_offsets	Partition: 0	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 1	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 2	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 3	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 4	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 5	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 6	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 7	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 8	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 9	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 10	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 11	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 12	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 13	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 14	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 15	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 16	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 17	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 18	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 19	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 20	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 21	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 22	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 23	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 24	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 25	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 26	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 27	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 28	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 29	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 30	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 31	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 32	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 33	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 34	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 35	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 36	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 37	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 38	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 39	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 40	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 41	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 42	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 43	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 44	Leader: 422	Replicas: 422,420,421	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 45	Leader: 420	Replicas: 420,422,421	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 46	Leader: 421	Replicas: 421,420,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 47	Leader: 422	Replicas: 422,421,420	Isr: 422,420,421
	Topic: __consumer_offsets	Partition: 48	Leader: 420	Replicas: 420,421,422	Isr: 420,422,421
	Topic: __consumer_offsets	Partition: 49	Leader: 421	Replicas: 421,422,420	Isr: 422,420,421

After restarting the program at this point, it consumed data normally. Problem solved.
