写入__consumer_offsets
之前介绍的 storeOffsets 方法调用它写入已提交位移消息。而本篇介绍的 storeGroup 方法调用它写入消费者组注册消息,即向 Coordinator 注册消费者组。
def storeGroup(group: GroupMetadata,
groupAssignment: Map[String, Array[Byte]],
responseCallback: Errors => Unit): Unit = {
// 判断当前Broker是否是该消费者组的Coordinator
getMagic(partitionFor(group.groupId)) match {
case Some(magicValue) => // 如果当前Broker是Coordinator
// We always use CREATE_TIME, like the producer. The conversion to LOG_APPEND_TIME (if necessary) happens automatically.
val timestampType = TimestampType.CREATE_TIME
val timestamp = time.milliseconds()
// 构建注册消息的Key
val key = GroupMetadataManager.groupMetadataKey(group.groupId)
// 构建注册消息的Value
val value = GroupMetadataManager.groupMetadataValue(group, groupAssignment, interBrokerProtocolVersion)
// 使用 Key 和 Value 构建待写入消息集合。这里的消息集合类是 MemoryRecords。
val records = {
val buffer = ByteBuffer.allocate(AbstractRecords.estimateSizeInBytes(magicValue, compressionType,
Seq(new SimpleRecord(timestamp, key, value)).asJava))
val builder = MemoryRecords.builder(buffer, magicValue, compressionType, timestampType, 0L)
builder.append(timestamp, key, value)
builder.build()
}
// 计算要写入的目标分区
val groupMetadataPartition = new TopicPartition(Topic.GROUP_METADATA_TOPIC_NAME, partitionFor(group.groupId))
val groupMetadataRecords = Map(groupMetadataPartition -> records)
val generationId = group.generationId
// set the callback function to insert the created group into cache after log append completed
// 就是当消息被写入到位移主题后,填充 Cache。
def putCacheCallback(responseStatus: Map[TopicPartition, PartitionResponse]): Unit = {
// the append response should only contain the topics partition
if (responseStatus.size != 1 || !responseStatus.contains(groupMetadataPartition))
throw new IllegalStateException("Append status %s should only have one partition %s"
.format(responseStatus, groupMetadataPartition))
// construct the error status in the propagated assignment response in the cache
val status = responseStatus(groupMetadataPartition)
val responseError = if (status.error == Errors.NONE) {
Errors.NONE
} else {
debug(s"Metadata from group ${group.groupId} with generation $generationId failed when appending to log " +
s"due to ${status.error.exceptionName}")
// transform the log append error code to the corresponding the commit status error code
status.error match {
case Errors.UNKNOWN_TOPIC_OR_PARTITION
| Errors.NOT_ENOUGH_REPLICAS
| Errors.NOT_ENOUGH_REPLICAS_AFTER_APPEND =>
Errors.COORDINATOR_NOT_AVAILABLE
case Errors.NOT_LEADER_FOR_PARTITION
| Errors.KAFKA_STORAGE_ERROR =>
Errors.NOT_COORDINATOR
case Errors.REQUEST_TIMED_OUT =>
Errors.REBALANCE_IN_PROGRESS
case Errors.MESSAGE_TOO_LARGE
| Errors.RECORD_LIST_TOO_LARGE
| Errors.INVALID_FETCH_SIZE =>
error(s"Appending metadata message for group ${group.groupId} generation $generationId failed due to " +
s"${status.error.exceptionName}, returning UNKNOWN error code to the client")
Errors.UNKNOWN_SERVER_ERROR
case other =>
error(s"Appending metadata message for group ${group.groupId} generation $generationId failed " +
s"due to unexpected error: ${status.error.exceptionName}")
other
}
}
responseCallback(responseError)
}
// 向位移主题写入消息
// 该方法就是调用 ReplicaManager 的 appendRecords 方法,将消息写入到位移主题中。
appendForGroup(group, groupMetadataRecords, putCacheCallback)
case None => // 如果当前Broker不是Coordinator
// 返回NOT_COORDINATOR异常
responseCallback(Errors.NOT_COORDINATOR)
None
}
}
读取__consumer_offsets主题
查询位移时,Coordinator 只会从 GroupMetadata 元数据缓存中查找对应的位移值,而不会读取位移主题。真正需要读取位移主题的时机,是在当前 Broker 当选 Coordinator,也就是 Broker 成为了位移主题某分区的 Leader 副本时。
一旦当前 Broker 当选为位移主题某分区的 Leader 副本,它就需要将它内存中的元数据缓存填充起来,因此需要读取位移主题。在代码中,这是由 scheduleLoadGroupAndOffsets 方法完成的。该方法会创建一个异步任务,来读取位移主题消息,并填充缓存。
def scheduleLoadGroupAndOffsets(offsetsPartition: Int, onGroupLoaded: GroupMetadata => Unit): Unit = {
val topicPartition = new TopicPartition(Topic.GROUP_METADATA_TOPIC_NAME, offsetsPartition)
if (addLoadingPartition(offsetsPartition)) {
info(s"Scheduling loading of offsets and group metadata from $topicPartition")
scheduler.schedule(topicPartition.toString, () => loadGroupsAndOffsets(topicPartition, onGroupLoaded))
} else {
info(s"Already loading offsets and group metadata from $topicPartition")
}
}
loadGroupsAndOffsets 方法本质上是调用 doLoadGroupsAndOffsets 方法实现的位移主题读取,它要做两件事请:加载消费者组;加载消费者组的位移。
private def doLoadGroupsAndOffsets(topicPartition: TopicPartition,
onGroupLoaded: GroupMetadata => Unit): Unit = { // 加载完成后要执行的逻辑
// 第 1 部分:读取consumer_offsets主题文件
// 获取位移主题指定分区的LEO值
// 如果当前Broker不是该分区的Leader副本,则返回-1
def logEndOffset: Long = replicaManager.getLogEndOffset(topicPartition).getOrElse(-1L)
// 读取位移主题目标分区的日志对象
replicaManager.getLog(topicPartition) match {
// 如果无法获取到日志对象
case None =>
warn(s"Attempted to load offsets and group metadata from $topicPartition, but found no log")
case Some(log) =>
// 已完成位移值加载的分区列表
val loadedOffsets = mutable.Map[GroupTopicPartition, CommitRecordMetadataAndOffset]()
// 处于位移加载中的分区列表,只用于Kafka事务
val pendingOffsets = mutable.Map[Long, mutable.Map[GroupTopicPartition, CommitRecordMetadataAndOffset]]()
// 已完成组信息加载的消费者组列表
val loadedGroups = mutable.Map[String, GroupMetadata]()
// 待移除的消费者组列表
val removedGroups = mutable.Set[String]()
// buffer may not be needed if records are read from memory
// 保存消息集合的ByteBuffer缓冲区
var buffer = ByteBuffer.allocate(0)
// loop breaks if leader changes at any time during the load, since logEndOffset is -1
// 位移主题目标分区日志起始位移值
var currOffset = log.logStartOffset
// loop breaks if no records have been read, since the end of the log has been reached
// 至少要求读取一条消息
var readAtLeastOneRecord = true
// 当前读取位移<LEO,且至少要求读取一条消息,且GroupMetadataManager未关闭
while (currOffset < logEndOffset && readAtLeastOneRecord && !shuttingDown.get()) {
// 读取位移主题指定分区,调用Log.scala的read方法
val fetchDataInfo = log.read(currOffset,
maxLength = config.loadBufferSize,
isolation = FetchLogEnd,
minOneMessage = true)
// 如果无消息可读,则不再要求至少读取一条消息
readAtLeastOneRecord = fetchDataInfo.records.sizeInBytes > 0
// 创建消息集合
// 由于 doLoadGroupsAndOffsets 方法要将读取的消息填充到缓存中,
// 因此,这里必须做出 MemoryRecords 类型的消息集合。这就是第二路 case 分支要将 FileRecords 转换成 MemoryRecords 类型的原因。
val memRecords = fetchDataInfo.records match {
case records: MemoryRecords => records
case fileRecords: FileRecords =>
val sizeInBytes = fileRecords.sizeInBytes
val bytesNeeded = Math.max(config.loadBufferSize, sizeInBytes)
// minOneMessage = true in the above log.read means that the buffer may need to be grown to ensure progress can be made
if (buffer.capacity < bytesNeeded) {
if (config.loadBufferSize < bytesNeeded)
warn(s"Loaded offsets and group metadata from $topicPartition with buffer larger ($bytesNeeded bytes) than " +
s"configured offsets.load.buffer.size (${config.loadBufferSize} bytes)")
buffer = ByteBuffer.allocate(bytesNeeded)
} else {
buffer.clear()
}
fileRecords.readInto(buffer, 0)
MemoryRecords.readableRecords(buffer)
}
// 第 2 部分:处理消息集合。
memRecords.batches.asScala.foreach { batch =>
// 遍历消息集合的每个消息批次(RecordBatch)
val isTxnOffsetCommit = batch.isTransactional
// 如果是控制类消息批次
// 控制类消息批次属于Kafka事务范畴,这里不展开讲
if (batch.isControlBatch) {
val recordIterator = batch.iterator
if (recordIterator.hasNext) {
val record = recordIterator.next()
val controlRecord = ControlRecordType.parse(record.key)
if (controlRecord == ControlRecordType.COMMIT) {
pendingOffsets.getOrElse(batch.producerId, mutable.Map[GroupTopicPartition, CommitRecordMetadataAndOffset]())
.foreach {
case (groupTopicPartition, commitRecordMetadataAndOffset) =>
if (!loadedOffsets.contains(groupTopicPartition) || loadedOffsets(groupTopicPartition).olderThan(commitRecordMetadataAndOffset))
loadedOffsets.put(groupTopicPartition, commitRecordMetadataAndOffset)
}
}
pendingOffsets.remove(batch.producerId)
}
} else {
// 保存消息批次第一条消息的位移值
var batchBaseOffset: Option[Long] = None
// 遍历消息批次下的所有消息
for (record <- batch.asScala) {
// 确保消息必须有Key,否则抛出异常
require(record.hasKey, "Group metadata/offset entry key should not be null")
if (batchBaseOffset.isEmpty)
batchBaseOffset = Some(record.offset)
// 读取消息Key
GroupMetadataManager.readMessageKey(record.key) match {
// 如果是OffsetKey,说明是提交位移消息
case offsetKey: OffsetKey =>
if (isTxnOffsetCommit && !pendingOffsets.contains(batch.producerId))
pendingOffsets.put(batch.producerId, mutable.Map[GroupTopicPartition, CommitRecordMetadataAndOffset]())
// load offset
val groupTopicPartition = offsetKey.key
// 如果该消息没有Value,那么此消费者组可以删掉
if (!record.hasValue) {
if (isTxnOffsetCommit)
pendingOffsets(batch.producerId).remove(groupTopicPartition)
else
// 将目标分区从已完成位移值加载的分区列表中移除
loadedOffsets.remove(groupTopicPartition)
} else {
val offsetAndMetadata = GroupMetadataManager.readOffsetMessageValue(record.value)
if (isTxnOffsetCommit)
pendingOffsets(batch.producerId).put(groupTopicPartition, CommitRecordMetadataAndOffset(batchBaseOffset, offsetAndMetadata))
else
// 将目标分区加入到已完成位移值加载的分区列表
loadedOffsets.put(groupTopicPartition, CommitRecordMetadataAndOffset(batchBaseOffset, offsetAndMetadata))
}
// 如果是GroupMetadataKey,说明是注册消息
case groupMetadataKey: GroupMetadataKey =>
// load group metadata
val groupId = groupMetadataKey.key
val groupMetadata = GroupMetadataManager.readGroupMessageValue(groupId, record.value, time)
// 如果消息Value不为空
if (groupMetadata != null) {
// 把该消费者组从待移除消费者组列表中移除
removedGroups.remove(groupId)
// 将消费者组加入到已完成加载的消费组列表
loadedGroups.put(groupId, groupMetadata)
} else {
// 把该消费者组从已完成加载的组列表中移除
loadedGroups.remove(groupId)
// 将消费者组加入到待移除消费组列表
removedGroups.add(groupId)
}
// 如果是未知类型的Key,抛出异常
case unknownKey =>
throw new IllegalStateException(s"Unexpected message key $unknownKey while loading offsets and group metadata")
}
}
}
// 更新读取位置到消息批次最后一条消息的位移值+1,等待下次while循环
currOffset = batch.nextOffset
}
}
// 第 3 部分:处理loadedOffsets
// 对 loadedOffsets 进行分组,将那些已经完成组加载的消费者组位移值分到一组,保存在字段 groupOffsets 中;
// 将那些有位移值,但没有对应组信息的分成另外一组,也就是字段 emptyGroupOffsets 保存的数据。
val (groupOffsets, emptyGroupOffsets) = loadedOffsets
.groupBy(_._1.group)
// 提取出<组名,主题名,分区号>与位移值对
.mapValues(_.map { case (groupTopicPartition, offset) => (groupTopicPartition.topicPartition, offset) })
.partition { case (group, _) => loadedGroups.contains(group) }
// 事务性逻辑,不描述
val pendingOffsetsByGroup = mutable.Map[String, mutable.Map[Long, mutable.Map[TopicPartition, CommitRecordMetadataAndOffset]]]()
pendingOffsets.foreach { case (producerId, producerOffsets) =>
producerOffsets.keySet.map(_.group).foreach(addProducerGroup(producerId, _))
producerOffsets
.groupBy(_._1.group)
.mapValues(_.map { case (groupTopicPartition, offset) => (groupTopicPartition.topicPartition, offset)})
.foreach { case (group, offsets) =>
val groupPendingOffsets = pendingOffsetsByGroup.getOrElseUpdate(group, mutable.Map.empty[Long, mutable.Map[TopicPartition, CommitRecordMetadataAndOffset]])
val groupProducerOffsets = groupPendingOffsets.getOrElseUpdate(producerId, mutable.Map.empty[TopicPartition, CommitRecordMetadataAndOffset])
groupProducerOffsets ++= offsets
}
}
val (pendingGroupOffsets, pendingEmptyGroupOffsets) = pendingOffsetsByGroup
.partition { case (group, _) => loadedGroups.contains(group)}
// 处理loadedGroups
loadedGroups.values.foreach { group =>
// 提取消费者组的已提交位移
val offsets = groupOffsets.getOrElse(group.groupId, Map.empty[TopicPartition, CommitRecordMetadataAndOffset])
val pendingOffsets = pendingGroupOffsets.getOrElse(group.groupId, Map.empty[Long, mutable.Map[TopicPartition, CommitRecordMetadataAndOffset]])
debug(s"Loaded group metadata $group with offsets $offsets and pending offsets $pendingOffsets")
// 为已完成加载的组执行加载组操作
loadGroup(group, offsets, pendingOffsets)
// 为已完成加载的组执行加载组操作之后的逻辑
onGroupLoaded(group)
}
// load groups which store offsets in kafka, but which have no active members and thus no group
// metadata stored in the log
(emptyGroupOffsets.keySet ++ pendingEmptyGroupOffsets.keySet).foreach { groupId =>
val group = new GroupMetadata(groupId, Empty, time)
val offsets = emptyGroupOffsets.getOrElse(groupId, Map.empty[TopicPartition, CommitRecordMetadataAndOffset])
val pendingOffsets = pendingEmptyGroupOffsets.getOrElse(groupId, Map.empty[Long, mutable.Map[TopicPartition, CommitRecordMetadataAndOffset]])
debug(s"Loaded group metadata $group with offsets $offsets and pending offsets $pendingOffsets")
// 为空的消费者组执行加载组操作
loadGroup(group, offsets, pendingOffsets)
// 为空的消费者执行加载组操作之后的逻辑
onGroupLoaded(group)
}
// 处理removedGroups
removedGroups.foreach { groupId =>
// if the cache already contains a group which should be removed, raise an error. Note that it
// is possible (however unlikely) for a consumer group to be removed, and then to be used only for
// offset storage (i.e. by "simple" consumers)
if (groupMetadataCache.contains(groupId) && !emptyGroupOffsets.contains(groupId))
throw new IllegalStateException(s"Unexpected unload of active group $groupId while " +
s"loading partition $topicPartition")
}
}
}