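Kafka Source Code Analysis: Sender
Sender implements the Runnable interface and runs as a background thread that sends requests to the cluster. The thread issues metadata requests to refresh its view of the cluster, then sends produce requests to the appropriate nodes.
The key steps are walked through below.
1.1 The run method
The run method proceeds as follows: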
@Override
public void run() {
log.debug("Starting Kafka producer I/O thread.");
// main loop, runs until close is called
// 1. Keep looping; exit once close has been called
while (running) {
try {
runOnce();
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}
log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");
// okay we stopped accepting requests but there may still be
// requests in the transaction manager, accumulator or waiting for acknowledgment,
// wait until these are completed.
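// 2. After close is called, if the close is not forced and the accumulator still holds data, the client still has in-flight requests, or transactional requests are pending, keep calling runOnce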
while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
try {
runOnce();
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}
// Abort the transaction if any commit or abort didn't go through the transaction manager's queue
// 3. Abort the ongoing transaction, then call runOnce
while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
if (!transactionManager.isCompleting()) {
log.info("Aborting incomplete transaction due to shutdown");
transactionManager.beginAbort();
}
try {
runOnce();
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}
// 4. On a forced close, call the transaction manager's close method directly.
if (forceClose) {
// We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
// the futures.
if (transactionManager != null) {
log.debug("Aborting incomplete transactional requests due to forced shutdown");
transactionManager.close();
}
log.debug("Aborting incomplete batches due to forced shutdown");
this.accumulator.abortIncompleteBatches();
}
try {
this.client.close();
} catch (Exception e) {
log.error("Failed to close network client", e);
}
}
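The run method consists of the four steps marked above. In the normal case (records are being processed, or a non-forced close finds records still unprocessed) it keeps calling runOnce to do the next round of work. After a forced close it does not process the remaining data: it aborts the incomplete batches and closes the client directly.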
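The difference between a graceful and a forced close can be illustrated with a small model. This is a minimal sketch, not Kafka code: the class SenderShutdownSketch, its queue of strings, and its counters are hypothetical stand-ins for the accumulator and the network client. Only the shape of the loops and the running/forceClose flags mirror the run method.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the Sender shutdown phases; not the real Kafka class.
final class SenderShutdownSketch implements Runnable {
    // Stand-in for the accumulator: records waiting to be sent.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private volatile boolean forceClose = false;
    private volatile int sent = 0;    // written only by the thread executing run()
    private volatile int aborted = 0; // written only by the thread executing run()

    void enqueue(String record) { pending.add(record); }

    // Graceful close: stop the main loop, but let pending records drain.
    void initiateClose() { running = false; }

    // Forced close: stop the main loop and skip the drain phase.
    void forceClose() { forceClose = true; running = false; }

    // One iteration: wait briefly for a record (a stand-in for client.poll) and "send" it.
    private void runOnce() {
        try {
            String record = pending.poll(10, TimeUnit.MILLISECONDS);
            if (record != null) sent++;
        } catch (InterruptedException e) {
            // Assumption of this sketch: treat an interrupt like a forced close.
            Thread.currentThread().interrupt();
            forceClose = true;
            running = false;
        }
    }

    @Override
    public void run() {
        // Phase 1: main loop until close is requested.
        while (running) runOnce();
        // Phase 2: graceful close keeps sending until nothing is pending.
        while (!forceClose && !pending.isEmpty()) runOnce();
        // Phase 3: forced close aborts whatever is left.
        if (forceClose) {
            aborted += pending.size();
            pending.clear();
        }
    }

    int sent() { return sent; }
    int aborted() { return aborted; }
}
```

Calling initiateClose() before run() on an instance holding two records sends both, whereas calling forceClose() first sends none and counts both as aborted.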
1.2 runOnce
The runOnce method performs a single iteration of the send process.
void runOnce() {
if (transactionManager != null) {
try {
transactionManager.maybeResolveSequences();
// do not continue sending if the transaction manager is in a failed state
if (transactionManager.hasFatalError()) {
RuntimeException lastError = transactionManager.lastError();
if (lastError != null) // 1. Error check
maybeAbortBatches(lastError);
client.poll(retryBackoffMs, time.milliseconds()); // 2. The client blocks until response data arrives or the timeout expires.
return;
}
// Check whether we need a new producerId. If so, we will enqueue an InitProducerId
// request which will be sent below
// 3. Kafka uses the producerId to implement idempotence
transactionManager.bumpIdempotentEpochAndResetIdIfNeeded();
if (maybeSendAndPollTransactionalRequest()) { // 3. A transactional request may have been sent
return;
}
} catch (AuthenticationException e) {
// This is already logged as error, but propagated here to perform any clean ups.
log.trace("Authentication exception while processing transactional request", e);
transactionManager.authenticationFailed(e);
}
}
long currentTimeMs = time.milliseconds();
long pollTimeout = sendProducerData(currentTimeMs); // 4. Send the data
client.poll(pollTimeout, currentTimeMs); // 5. The client blocks until response data arrives or the timeout expires.
}
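The method starts with an error check: if the transactionManager has a fatal error, the client calls poll and the method returns immediately. Otherwise it calls maybeSendAndPollTransactionalRequest, which checks whether a transactional request is in flight or a FindCoordinator request has been enqueued; if so, runOnce returns. If not, execution reaches step 4, sendProducerData, which sends the data.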
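The branching in runOnce can be condensed into a small decision function. This is only a sketch of the control flow: the class RunOnceFlowSketch, the Step enum, and the three boolean parameters are hypothetical names introduced for illustration, and the real method performs I/O at each branch rather than returning a label.

```java
// Hypothetical summary of which path one runOnce iteration takes; not Kafka code.
final class RunOnceFlowSketch {
    enum Step {
        POLL_AFTER_FATAL_ERROR,           // fatal transaction error: poll, then return
        POLL_AFTER_TRANSACTIONAL_REQUEST, // a transactional request was handled: return
        SEND_PRODUCER_DATA                // normal path: sendProducerData, then poll
    }

    static Step nextStep(boolean hasTransactionManager,
                         boolean hasFatalError,
                         boolean transactionalRequestHandled) {
        if (hasTransactionManager) {
            // The fatal-error check comes first and short-circuits everything else.
            if (hasFatalError) return Step.POLL_AFTER_FATAL_ERROR;
            // Mirrors the early return when maybeSendAndPollTransactionalRequest() is true.
            if (transactionalRequestHandled) return Step.POLL_AFTER_TRANSACTIONAL_REQUEST;
        }
        // Without a transaction manager, the transactional checks are skipped entirely.
        return Step.SEND_PRODUCER_DATA;
    }
}
```

A producer without a transaction manager always takes the SEND_PRODUCER_DATA path, which matches the null check at the top of runOnce.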
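1.3 The maybeSendAndPollTransactionalRequest method
maybeSendAndPollTransactionalRequest has roughly two parts:
- Check whether the transactionManager has a request in flight; if so, call poll and wait for the response.
- Handle the FindCoordinatorRequest. This part is not yet clear to me.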
1.4 The sendProducerData method
sendProducerData mainly does the preparation before sending: it fetches the cluster information and packages the data to be sent.
private long sendProducerData(long now) {
Cluster cluster = metadata.fetch(); // 1. Fetch the cluster information from the metadata
// get the list of partitions with data ready to send
RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
// if there are any partitions whose leaders are not known yet, force metadata update
if (!result.unknownLeaderTopics.isEmpty()) { // 2. If any partition's leader is unknown, force a metadata update
// The set of topics with unknown leader contains topics with leader election pending as well as
// topics which may have expired. Add the topic again to metadata to ensure it is included
// and request metadata update, since there are messages to send to the topic.
for (String topic : result.unknownLeaderTopics)
this.metadata.add(topic, now);
log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
result.unknownLeaderTopics);
this.metadata.requestUpdate();
}
// remove any nodes we aren't ready to send to
Iterator<Node> iter = result.readyNodes.iterator(); // 3. Remove the nodes we are not ready to send to (done through the iterator)
long notReadyTimeout = Long.MAX_VALUE;
while (iter.hasNext()) {
Node node = iter.next();
if (!this.client.ready(node, now)) {
iter.remove();
notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
}
}
// create produce requests
// 4. Drain the messages to send from the accumulator
Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
addToInflightBatches(batches);
if (guaranteeMessageOrder) { // 5. Guarantee send order
// Mute all the partitions drained
for (List<ProducerBatch> batchList : batches.values()) {
for (ProducerBatch batch : batchList)
this.accumulator.mutePartition(batch.topicPartition);
}
}
accumulator.resetNextBatchExpiryTime();
List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
expiredBatches.addAll(expiredInflightBatches);
... ...
sensors.updateProduceRequestMetrics(batches);
long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
pollTimeout = Math.max(pollTimeout, 0);
if (!result.readyNodes.isEmpty()) {
log.trace("Nodes with data ready to send: {}", result.readyNodes);
pollTimeout = 0;
}
sendProduceRequests(batches, now); // 6. Actually send the data
return pollTimeout;
}
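The code is long, but the logic is clear. It first checks the state of the cluster, then obtains the batches to send, and hands them to sendProduceRequests to continue the send. The accumulator also plays an important role here; its data structures and implementation will be analyzed later.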
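Two pieces of sendProducerData are easy to isolate: removing not-ready nodes through an iterator, and computing the poll timeout. The following is a minimal sketch under simplifying assumptions: the class PollTimeoutSketch is hypothetical, nodes are plain integer ids, and a map of per-node delays stands in for the client.ready and client.pollDelayMs calls. The arithmetic follows the Math.min and Math.max lines in the listing.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of two steps in sendProducerData; not Kafka code.
final class PollTimeoutSketch {

    // Removes every node that has an entry in notReadyDelayMs (meaning "not ready")
    // and returns the smallest delay among the removed nodes, or Long.MAX_VALUE
    // if nothing was removed. Removal goes through the iterator so the set can be
    // modified safely while it is being traversed.
    static long removeNotReady(Set<Integer> readyNodes, Map<Integer, Long> notReadyDelayMs) {
        long notReadyTimeout = Long.MAX_VALUE;
        Iterator<Integer> iter = readyNodes.iterator();
        while (iter.hasNext()) {
            Integer node = iter.next();
            Long delay = notReadyDelayMs.get(node);
            if (delay != null) {
                iter.remove();
                notReadyTimeout = Math.min(notReadyTimeout, delay);
            }
        }
        return notReadyTimeout;
    }

    // Mirrors the timeout computation: take the earliest of the next ready check,
    // the not-ready delay, and the next batch expiry; clamp at zero; and do not
    // block at all when some node already has data ready to send.
    static long pollTimeout(long nextReadyCheckDelayMs, long notReadyTimeout,
                            long nextExpiryTimeMs, long now, boolean anyNodeReady) {
        long timeout = Math.min(nextReadyCheckDelayMs, notReadyTimeout);
        timeout = Math.min(timeout, nextExpiryTimeMs - now);
        timeout = Math.max(timeout, 0);
        return anyNodeReady ? 0 : timeout;
    }
}
```

The zero timeout when a node is ready means the following poll does not wait, so the loop can come back quickly and send more data.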
1.5 sendProduceRequests
This method is short: a for loop sends each node's batch data.
private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
}
1.6 sendProduceRequest
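sendProduceRequest first does some light processing of the batch data. The ProducerBatch objects arrive in a List; the method takes the TopicPartition out of each ProducerBatch and stores the pair in a Map. It then defines the request and its completion callback, and sends the request through the client.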
private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
if (batches.isEmpty())
return;
final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());
// find the minimum magic version used when creating the record sets
byte minUsedMagic = apiVersions.maxUsableProduceMagic();
for (ProducerBatch batch : batches) {
if (batch.magic() < minUsedMagic)
minUsedMagic = batch.magic();
}
ProduceRequestData.TopicProduceDataCollection tpd = new ProduceRequestData.TopicProduceDataCollection();
// 1. Process the data
for (ProducerBatch batch : batches) {
TopicPartition tp = batch.topicPartition;
MemoryRecords records = batch.records();
// down convert if necessary to the minimum magic used. In general, there can be a delay between the time
// that the producer starts building the batch and the time that we send the request, and we may have
// chosen the message format based on out-dated metadata. In the worst case, we optimistically chose to use
// the new message format, but found that the broker didn't support it, so we need to down-convert on the
// client before sending. This is intended to handle edge cases around cluster upgrades where brokers may
// not all support the same message format version. For example, if a partition migrates from a broker
// which is supporting the new magic version to one which doesn't, then we will need to convert.
if (!records.hasMatchingMagic(minUsedMagic))
records = batch.records().downConvert(minUsedMagic, 0, time).records();
ProduceRequestData.TopicProduceData tpData = tpd.find(tp.topic());
if (tpData == null) {
tpData = new ProduceRequestData.TopicProduceData().setName(tp.topic());
tpd.add(tpData);
}
tpData.partitionData().add(new ProduceRequestData.PartitionProduceData()
.setIndex(tp.partition())
.setRecords(records));
recordsByPartition.put(tp, batch);
}
String transactionalId = null;
if (transactionManager != null && transactionManager.isTransactional()) {
transactionalId = transactionManager.transactionalId();
}
ProduceRequest.Builder requestBuilder = ProduceRequest.forMagic(minUsedMagic,
new ProduceRequestData()
.setAcks(acks)
.setTimeoutMs(timeout)
.setTransactionalId(transactionalId)
.setTopicData(tpd));
// 2. Create the completion callback
RequestCompletionHandler callback = response -> handleProduceResponse(response, recordsByPartition, time.milliseconds());
String nodeId = Integer.toString(destination);
// 3. Build the client request
ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
requestTimeoutMs, callback);
// 4. Actually send the data through the client
client.send(clientRequest, now);
log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
}
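The data-processing step can be sketched in isolation: find the lowest magic value among the batches, then group partitions under their topic, much as the loop above fills the TopicProduceDataCollection. This is an illustrative model only. The class ProduceRequestGroupingSketch and its nested Batch type are hypothetical, and plain maps and lists stand in for the Kafka request data classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the grouping done in sendProduceRequest; not Kafka code.
final class ProduceRequestGroupingSketch {

    // Stand-in for ProducerBatch: only the fields the grouping needs.
    static final class Batch {
        final String topic;
        final int partition;
        final byte magic;

        Batch(String topic, int partition, byte magic) {
            this.topic = topic;
            this.partition = partition;
            this.magic = magic;
        }
    }

    // Starts from the highest usable magic and lowers it to the smallest
    // magic found in any batch, like the first loop in the listing.
    static byte minMagic(byte maxUsableMagic, List<Batch> batches) {
        byte min = maxUsableMagic;
        for (Batch batch : batches) {
            if (batch.magic < min) min = batch.magic;
        }
        return min;
    }

    // Groups partition indexes under their topic, creating the topic entry
    // on first use (the counterpart of the tpd.find / tpd.add pair).
    static Map<String, List<Integer>> groupByTopic(List<Batch> batches) {
        Map<String, List<Integer>> byTopic = new LinkedHashMap<>();
        for (Batch batch : batches) {
            byTopic.computeIfAbsent(batch.topic, t -> new ArrayList<>()).add(batch.partition);
        }
        return byTopic;
    }
}
```

Using a LinkedHashMap keeps topics in first-seen order, which is a choice made for this sketch so the result is easy to inspect; the original code makes no such promise.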