Kafka Source Code Analysis: Sender

Sender implements the Runnable interface and runs as a background thread that sends requests to the cluster. The thread issues metadata requests to refresh its view of the cluster and then sends produce requests to the appropriate nodes.
Its key flow is described below.

1.1 The run method

Sender implements the Runnable interface; its run method is shown below.

@Override
    public void run() {
        log.debug("Starting Kafka producer I/O thread.");

        // main loop, runs until close is called
        // 1. Loop until close is called
        while (running) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

        // okay we stopped accepting requests but there may still be
        // requests in the transaction manager, accumulator or waiting for acknowledgment,
        // wait until these are completed.
        // 2. After close is called (and it is not a forced close), keep calling runOnce while the accumulator
        // still has undrained data, the client still has in-flight requests, or transactional requests are pending
        while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }

        // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
        // 3. Abort any incomplete transaction and keep calling runOnce until it completes
        while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
            if (!transactionManager.isCompleting()) {
                log.info("Aborting incomplete transaction due to shutdown");
                transactionManager.beginAbort();
            }
            try {
                runOnce();
            } catch (Exception e) {
                log.error("Uncaught error in kafka producer I/O thread: ", e);
            }
        }
        // 4. On a forced close, close the transaction manager directly and abort incomplete batches
        if (forceClose) {
            // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
            // the futures.
            if (transactionManager != null) {
                log.debug("Aborting incomplete transactional requests due to forced shutdown");
                transactionManager.close();
            }
            log.debug("Aborting incomplete batches due to forced shutdown");
            this.accumulator.abortIncompleteBatches();
        }
        try {
            this.client.close();
        } catch (Exception e) {
            log.error("Failed to close network client", e);
        }
    }

The run method breaks down into the four steps above. In the normal case (regular message processing, or a non-forced close while messages are still outstanding), runOnce is called to continue processing; on a forced close, any remaining data is not processed further, and incomplete transactions and batches are aborted before the client is closed directly.
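
As a reference for how the two flags driving this shutdown sequence are toggled, below is a simplified sketch of the Sender's two close paths (abbreviated from the same class): initiateClose is used for a normal close, forceClose for a forced one.

void initiateClose() {
    // Close the accumulator first so that records already buffered can be drained
    // before the main loop exits
    this.accumulator.close();
    this.running = false;
    this.wakeup();
}

void forceClose() {
    // Mark the close as forced so step 4 above aborts whatever is still incomplete
    this.forceClose = true;
    initiateClose();
}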

1.2 runOnce

The runOnce method implements a single iteration of the send loop.

void runOnce() {
     if (transactionManager != null) {
         try {
             transactionManager.maybeResolveSequences();

             // do not continue sending if the transaction manager is in a failed state
             if (transactionManager.hasFatalError()) {
                 RuntimeException lastError = transactionManager.lastError();
                 if (lastError != null) // 1. Exception check
                     maybeAbortBatches(lastError);
                 client.poll(retryBackoffMs, time.milliseconds()); // 2. Poll blocks until a response arrives or the timeout expires
                 return;
             }

             // Check whether we need a new producerId. If so, we will enqueue an InitProducerId
             // request which will be sent below
             // 3. Kafka implements idempotence via the producerId
             transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();

             if (maybeSendAndPollTransactionalRequest()) { // 4. A transactional request may have been sent; if so, return
                 return;
             }
         } catch (AuthenticationException e) {
             // This is already logged as error, but propagated here to perform any clean ups.
             log.trace("Authentication exception while processing transactional request", e);
             transactionManager.authenticationFailed(e);
         }
     }

     long currentTimeMs = time.milliseconds();
     long pollTimeout = sendProducerData(currentTimeMs); // 5. Send the produce data
     client.poll(pollTimeout, currentTimeMs); // 6. Poll blocks until a response arrives or the timeout expires
 }

The method starts with an error check: if the transactionManager is in a fatal error state, the client polls once and the method returns immediately. Otherwise maybeSendAndPollTransactionalRequest is called; it returns true when a transactional request is in flight or a FindCoordinator request has been enqueued, in which case runOnce returns, and otherwise execution falls through to sendProducerData to send the data.

1.3 The maybeSendAndPollTransactionalRequest method

The maybeSendAndPollTransactionalRequest method can be roughly divided into two steps:

  1. Check whether the transactionManager has a request in flight; if so, poll and wait for it to return.
  2. Handle the FindCoordinatorRequest. I am not yet entirely clear on this part; a simplified sketch follows below.
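
The following is only a rough, simplified outline of this method based on my current reading of the client source; the coordinator lookup, node readiness checks and error handling are elided.

private boolean maybeSendAndPollTransactionalRequest() {
    // 1. If a transactional request is already in flight, just poll and wait for it
    if (transactionManager.hasInFlightRequest()) {
        client.poll(retryBackoffMs, time.milliseconds());
        return true;
    }

    // 2. Ask the transaction manager for the next request to send
    //    (e.g. InitProducerId, AddPartitionsToTxn, EndTxn)
    TransactionManager.TxnRequestHandler nextRequestHandler =
            transactionManager.nextRequest(accumulator.hasIncomplete());
    if (nextRequestHandler == null)
        return false; // nothing to do; fall through to sendProducerData

    // 3. If the target coordinator is not yet known, enqueue a FindCoordinator lookup;
    //    otherwise send the transactional request to the coordinator and poll for the response
    //    (coordinator lookup, awaitNodeReady, client.send and client.poll elided here)
    return true;
}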

1.4 The sendProducerData method

The sendProducerData method mainly performs the preparation work before sending data, including fetching the cluster metadata and packaging the data to be sent.

private long sendProducerData(long now) {
       Cluster cluster = metadata.fetch(); // 1. Fetch the cluster view from the metadata
       // get the list of partitions with data ready to send
       RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

       // if there are any partitions whose leaders are not known yet, force metadata update
       if (!result.unknownLeaderTopics.isEmpty()) { // 2. If any partition's leader is unknown, force a metadata update
           // The set of topics with unknown leader contains topics with leader election pending as well as
           // topics which may have expired. Add the topic again to metadata to ensure it is included
           // and request metadata update, since there are messages to send to the topic.
           for (String topic : result.unknownLeaderTopics)
               this.metadata.add(topic, now);

           log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
               result.unknownLeaderTopics);
           this.metadata.requestUpdate();
       }

       // remove any nodes we aren't ready to send to
       Iterator<Node> iter = result.readyNodes.iterator(); // 3. Remove (via the iterator) any node we are not ready to send to
       long notReadyTimeout = Long.MAX_VALUE;
       while (iter.hasNext()) {
           Node node = iter.next();
           if (!this.client.ready(node, now)) {
               iter.remove();
               notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
           }
       }

       // create produce requests
       // 4. Drain the messages to be sent from the accumulator
       Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
       addToInflightBatches(batches);
       if (guaranteeMessageOrder) { // 5. Preserve send ordering
           // Mute all the partitions drained
           for (List<ProducerBatch> batchList : batches.values()) {
               for (ProducerBatch batch : batchList)
                   this.accumulator.mutePartition(batch.topicPartition);
           }
       }
       accumulator.resetNextBatchExpiryTime();
       List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
       List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
       expiredBatches.addAll(expiredInflightBatches);
       ... ...
       sensors.updateProduceRequestMetrics(batches);
       long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
       pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
       pollTimeout = Math.max(pollTimeout, 0);
       if (!result.readyNodes.isEmpty()) {
           log.trace("Nodes with data ready to send: {}", result.readyNodes);
           pollTimeout = 0;
       }
       sendProduceRequests(batches, now); // 6. Perform the actual send
       return pollTimeout;
   }

The code is fairly long, but the logic is clear: it first checks the state of the cluster, then drains the batches to be sent, and finally hands them to sendProduceRequests for the actual send. The accumulator used here is also an important component; its data structure and implementation will be analyzed later.
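
One detail worth noting about step 5: guaranteeMessageOrder is true when the producer is configured with max.in.flight.requests.per.connection = 1, in which case each drained partition is muted until its in-flight batch completes, so at most one batch per partition is on the wire at a time. A minimal configuration example is shown below (the bootstrap address is just a placeholder).

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// With at most one in-flight request per connection, the Sender mutes each partition
// while a batch for it is in flight, preserving per-partition send order
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);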

1.5 sendProduceRequests

This method is short: it loops over the collated map and sends each node's batch data.

private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
     for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
         sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
 }

1.6 sendProduceRequest

sendProduceRequest first does some simple processing of the batch data: the ProducerBatches, originally held in a List, are reorganized into a Map keyed by each batch's TopicPartition. It then defines the request and its completion callback, and finally hands the request to the client to be sent.

private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
      if (batches.isEmpty())
          return;

      final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());

      // find the minimum magic version used when creating the record sets
      byte minUsedMagic = apiVersions.maxUsableProduceMagic();
      for (ProducerBatch batch : batches) {
          if (batch.magic() < minUsedMagic)
              minUsedMagic = batch.magic();
      }
      ProduceRequestData.TopicProduceDataCollection tpd = new ProduceRequestData.TopicProduceDataCollection();
       // 1. Reorganize the batch data: down-convert if needed and group by topic and partition
      for (ProducerBatch batch : batches) {
          TopicPartition tp = batch.topicPartition;
          MemoryRecords records = batch.records();

          // down convert if necessary to the minimum magic used. In general, there can be a delay between the time
          // that the producer starts building the batch and the time that we send the request, and we may have
          // chosen the message format based on out-dated metadata. In the worst case, we optimistically chose to use
          // the new message format, but found that the broker didn't support it, so we need to down-convert on the
          // client before sending. This is intended to handle edge cases around cluster upgrades where brokers may
          // not all support the same message format version. For example, if a partition migrates from a broker
          // which is supporting the new magic version to one which doesn't, then we will need to convert.
          if (!records.hasMatchingMagic(minUsedMagic))
              records = batch.records().downConvert(minUsedMagic, 0, time).records();
          ProduceRequestData.TopicProduceData tpData = tpd.find(tp.topic());
          if (tpData == null) {
              tpData = new ProduceRequestData.TopicProduceData().setName(tp.topic());
              tpd.add(tpData);
          }
          tpData.partitionData().add(new ProduceRequestData.PartitionProduceData()
                  .setIndex(tp.partition())
                  .setRecords(records));
          recordsByPartition.put(tp, batch);
      }
      String transactionalId = null;
      if (transactionManager != null && transactionManager.isTransactional()) {
          transactionalId = transactionManager.transactionalId();
      }

      ProduceRequest.Builder requestBuilder = ProduceRequest.forMagic(minUsedMagic,
              new ProduceRequestData()
                      .setAcks(acks)
                      .setTimeoutMs(timeout)
                      .setTransactionalId(transactionalId)
                      .setTopicData(tpd));
      // 2. Create the completion callback for the request
      RequestCompletionHandler callback = response -> handleProduceResponse(response, recordsByPartition, time.milliseconds());

      String nodeId = Integer.toString(destination);
      // 3. Build the client request
      ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
              requestTimeoutMs, callback);
      // 4. Actually send the request via the client
      client.send(clientRequest, now);
      log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
  }
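
For completeness, here is a rough outline of what the callback registered in step 2 does once the response comes back. This is heavily simplified: the real handleProduceResponse also handles version mismatches, correlates responses by partition and records metrics.

private void handleProduceResponse(ClientResponse response,
                                   Map<TopicPartition, ProducerBatch> recordsByPartition,
                                   long now) {
    if (response.wasDisconnected()) {
        // the connection dropped: complete every batch of this request with a retriable
        // network error so it can be retried or failed back to the user
    } else if (response.hasResponse()) {
        // normal case (acks != 0): walk the ProduceResponse, look up each partition's
        // ProducerBatch in recordsByPartition and complete it with its error code and offset
    } else {
        // acks == 0: the broker sends no response, so every batch is completed as successful
    }
}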