Table of Contents
- Message storage structure
- 1. Delayed message source code analysis
- 2. Message retry and dead-letter queue source code analysis
- 3. Transaction message send, commit, and check-back source code analysis
- 1. Sending and handling the half (pre-commit) message
- 2. Committing and rolling back the transaction message
- 3. Transaction message check-back source code analysis
Message Storage Structure
For details, see the official Alibaba article: https://mp.weixin.qq.com/s/PzDO-UCLzxqDbFoHbS47Sg
1. Delayed Message Source Code Analysis
The RocketMQ broker configuration has a messageDelayLevel property that defines the delay levels (the same levels also pace message retries). The default is:
messageDelayLevel = 1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h
When sending, you can set a message's delay level:
Message msg = new Message("base", "Tag1", ("Hello World" + i).getBytes());
// set the delay level; level 10 maps to 6 minutes under the default table
msg.setDelayTimeLevel(10);
Once the message reaches the broker, CommitLog checks at store time whether it is a delayed message. If it is, the real topic is saved into the message properties and the topic is rewritten to SCHEDULE_TOPIC_XXXX before the message is committed.
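That CommitLog rewrite can be sketched in plain Java as below. SCHEDULE_TOPIC_XXXX, the 18-entry default table, and the REAL_TOPIC/REAL_QID property keys are the real RocketMQ constants; the method shape itself is a simplified assumption, not the actual CommitLog signature:

```java
import java.util.HashMap;
import java.util.Map;

public class DelayRewriteSketch {
    static final String SCHEDULE_TOPIC = "SCHEDULE_TOPIC_XXXX";
    static final int MAX_DELAY_LEVEL = 18; // default messageDelayLevel has 18 entries

    // Mirrors the delay handling in CommitLog: back up the real topic/queueId
    // and redirect the message into the schedule topic. Returns {topic, queueId}.
    static String[] rewriteForDelay(String topic, int queueId, int delayLevel,
                                    Map<String, String> properties) {
        if (delayLevel <= 0) {
            return new String[] {topic, String.valueOf(queueId)}; // not a delayed message
        }
        int level = Math.min(delayLevel, MAX_DELAY_LEVEL); // cap at the highest level
        // save the real destination so ScheduleMessageService can restore it later
        properties.put("REAL_TOPIC", topic);
        properties.put("REAL_QID", String.valueOf(queueId));
        // one schedule queue per level: queueId = delayLevel - 1
        return new String[] {SCHEDULE_TOPIC, String.valueOf(level - 1)};
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        String[] dest = rewriteForDelay("base", 0, 10, props);
        System.out.println(dest[0] + " / queue " + dest[1] + " / real=" + props.get("REAL_TOPIC"));
    }
}
```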
You will find that each delay level has its own queue, so with the default 18 levels there are 18 queues in total.
Delayed messages are handled by DeliverDelayedMessageTimerTask#executeOnTimeup, which reads the deliver timestamp from the consume queue and compares it with the current time; once a message is due, a copy of the original message is re-dispatched. See ScheduleMessageService#start and the executeOnTimeup method:
public void executeOnTimeup() {
ConsumeQueue cq =
ScheduleMessageService.this.defaultMessageStore.findConsumeQueue(TopicValidator.RMQ_SYS_SCHEDULE_TOPIC,
delayLevel2QueueId(delayLevel));
long failScheduleOffset = offset;
if (cq != null) {
SelectMappedBufferResult bufferCQ = cq.getIndexBuffer(this.offset);
if (bufferCQ != null) {
try {
long nextOffset = offset;
int i = 0;
ConsumeQueueExt.CqExtUnit cqExtUnit = new ConsumeQueueExt.CqExtUnit();
for (; i < bufferCQ.getSize(); i += ConsumeQueue.CQ_STORE_UNIT_SIZE) {
long offsetPy = bufferCQ.getByteBuffer().getLong();
int sizePy = bufferCQ.getByteBuffer().getInt();
// for the schedule queue, tagsCode stores the deliver timestamp
long tagsCode = bufferCQ.getByteBuffer().getLong();
if (cq.isExtAddr(tagsCode)) {
if (cq.getExt(tagsCode, cqExtUnit)) {
tagsCode = cqExtUnit.getTagsCode();
} else {
//can't find ext content.So re compute tags code.
log.error("[BUG] can't find consume queue extend file content!addr={}, offsetPy={}, sizePy={}",
tagsCode, offsetPy, sizePy);
long msgStoreTime = defaultMessageStore.getCommitLog().pickupStoreTimestamp(offsetPy, sizePy);
tagsCode = computeDeliverTimestamp(delayLevel, msgStoreTime);
}
}
long now = System.currentTimeMillis();
long deliverTimestamp = this.correctDeliverTimestamp(now, tagsCode);
nextOffset = offset + (i / ConsumeQueue.CQ_STORE_UNIT_SIZE);
long countdown = deliverTimestamp - now;
if (countdown <= 0) {
MessageExt msgExt =
ScheduleMessageService.this.defaultMessageStore.lookMessageByOffset(
offsetPy, sizePy);
if (msgExt != null) {
try {
// rebuild the original message (restore the real topic and queue id)
MessageExtBrokerInner msgInner = this.messageTimeup(msgExt);
if (TopicValidator.RMQ_SYS_TRANS_HALF_TOPIC.equals(msgInner.getTopic())) {
log.error("[BUG] the real topic of schedule msg is {}, discard the msg. msg={}",
msgInner.getTopic(), msgInner);
continue;
}
// re-put the restored message so it becomes visible to consumers
PutMessageResult putMessageResult =
ScheduleMessageService.this.writeMessageStore
.putMessage(msgInner);
if (putMessageResult != null
&& putMessageResult.getPutMessageStatus() == PutMessageStatus.PUT_OK) {
continue;
} else {
// XXX: warn and notify me
log.error(
"ScheduleMessageService, a message time up, but reput it failed, topic: {} msgId {}",
msgExt.getTopic(), msgExt.getMsgId());
ScheduleMessageService.this.timer.schedule(
new DeliverDelayedMessageTimerTask(this.delayLevel,
nextOffset), DELAY_FOR_A_PERIOD);
ScheduleMessageService.this.updateOffset(this.delayLevel,
nextOffset);
return;
}
} catch (Exception e) {
/*
* XXX: warn and notify me
*/
log.error(
"ScheduleMessageService, messageTimeup execute error, drop it. msgExt="
+ msgExt + ", nextOffset=" + nextOffset + ",offsetPy="
+ offsetPy + ",sizePy=" + sizePy, e);
}
}
} else {
ScheduleMessageService.this.timer.schedule(
new DeliverDelayedMessageTimerTask(this.delayLevel, nextOffset),
countdown);
ScheduleMessageService.this.updateOffset(this.delayLevel, nextOffset);
return;
}
} // end of for
nextOffset = offset + (i / ConsumeQueue.CQ_STORE_UNIT_SIZE);
ScheduleMessageService.this.timer.schedule(new DeliverDelayedMessageTimerTask(
this.delayLevel, nextOffset), DELAY_FOR_A_WHILE);
ScheduleMessageService.this.updateOffset(this.delayLevel, nextOffset);
return;
} finally {
bufferCQ.release();
}
} // end of if (bufferCQ != null)
else {
long cqMinOffset = cq.getMinOffsetInQueue();
if (offset < cqMinOffset) {
failScheduleOffset = cqMinOffset;
log.error("schedule CQ offset invalid. offset=" + offset + ", cqMinOffset="
+ cqMinOffset + ", queueId=" + cq.getQueueId());
}
}
} // end of if (cq != null)
ScheduleMessageService.this.timer.schedule(new DeliverDelayedMessageTimerTask(this.delayLevel,
failScheduleOffset), DELAY_FOR_A_WHILE);
}
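One small detail in the loop above is correctDeliverTimestamp. A minimal standalone version of the check it performs (the real method is an instance method that looks up the level's delay from the parsed messageDelayLevel table; passing levelDelayMillis explicitly here is a simplification):

```java
public class DeliverTimestampSketch {
    // Mirrors ScheduleMessageService#correctDeliverTimestamp: if the stored
    // deliver timestamp lies further in the future than this level's own delay
    // allows, it is treated as corrupt and the message is delivered immediately.
    static long correctDeliverTimestamp(long now, long deliverTimestamp, long levelDelayMillis) {
        long maxTimestamp = now + levelDelayMillis;
        return deliverTimestamp > maxTimestamp ? now : deliverTimestamp;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(correctDeliverTimestamp(now, now + 500, 1_000));   // plausible: kept
        System.out.println(correctDeliverTimestamp(now, now + 5_000, 1_000)); // implausible: now
    }
}
```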
2. Message Retry and Dead-Letter Queue Source Code Analysis
Take DefaultMQPushConsumer, the automatic push consumer (which still pulls under the hood), as an example. During consumer startup, a consumer in clustering mode subscribes by default to the retry topic %RETRY%+consumerGroup.
Source location: org.apache.rocketmq.client.impl.consumer.DefaultMQPushConsumerImpl#copySubscription
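The topic names involved follow simple conventions; the sketch below mirrors MixAll.getRetryTopic and MixAll.getDLQTopic (the %RETRY% and %DLQ% prefixes are the real constants):

```java
public class RetryTopics {
    static final String RETRY_GROUP_TOPIC_PREFIX = "%RETRY%";
    static final String DLQ_GROUP_TOPIC_PREFIX = "%DLQ%";

    // mirrors MixAll.getRetryTopic: the per-group retry topic a clustering-mode
    // consumer subscribes to in copySubscription
    static String getRetryTopic(String consumerGroup) {
        return RETRY_GROUP_TOPIC_PREFIX + consumerGroup;
    }

    // mirrors MixAll.getDLQTopic: where a message lands after too many retries
    static String getDLQTopic(String consumerGroup) {
        return DLQ_GROUP_TOPIC_PREFIX + consumerGroup;
    }

    public static void main(String[] args) {
        System.out.println(getRetryTopic("order-group")); // %RETRY%order-group
    }
}
```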
So how does the broker know a message needs to be retried? Clearly the consumer has to tell it: while consuming pulled messages, the consumer decides from the listener's return value whether to acknowledge the message or send it back for retry. The pull logic lives mainly in DefaultMQPushConsumerImpl#pullMessage; the pull callback is shown below:
PullCallback pullCallback = new PullCallback() {
@Override
public void onSuccess(PullResult pullResult) {
if (pullResult != null) {
pullResult = DefaultMQPushConsumerImpl.this.pullAPIWrapper.processPullResult(pullRequest.getMessageQueue(), pullResult,
subscriptionData);
switch (pullResult.getPullStatus()) {
case FOUND:
long prevRequestOffset = pullRequest.getNextOffset();
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
long pullRT = System.currentTimeMillis() - beginTimestamp;
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullRT(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullRT);
long firstMsgOffset = Long.MAX_VALUE;
if (pullResult.getMsgFoundList() == null || pullResult.getMsgFoundList().isEmpty()) {
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
} else {
firstMsgOffset = pullResult.getMsgFoundList().get(0).getQueueOffset();
DefaultMQPushConsumerImpl.this.getConsumerStatsManager().incPullTPS(pullRequest.getConsumerGroup(),
pullRequest.getMessageQueue().getTopic(), pullResult.getMsgFoundList().size());
boolean dispatchToConsume = processQueue.putMessage(pullResult.getMsgFoundList());
// ====== messages were pulled: submit a consume request for them
DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(
pullResult.getMsgFoundList(),
processQueue,
pullRequest.getMessageQueue(),
dispatchToConsume);
if (DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval() > 0) {
DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest,
DefaultMQPushConsumerImpl.this.defaultMQPushConsumer.getPullInterval());
} else {
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
}
}
if (pullResult.getNextBeginOffset() < prevRequestOffset
|| firstMsgOffset < prevRequestOffset) {
log.warn(
"[BUG] pull message result maybe data wrong, nextBeginOffset: {} firstMsgOffset: {} prevRequestOffset: {}",
pullResult.getNextBeginOffset(),
firstMsgOffset,
prevRequestOffset);
}
break;
case NO_NEW_MSG:
case NO_MATCHED_MSG:
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
break;
case OFFSET_ILLEGAL:
log.warn("the pull request offset illegal, {} {}",
pullRequest.toString(), pullResult.toString());
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
pullRequest.getProcessQueue().setDropped(true);
DefaultMQPushConsumerImpl.this.executeTaskLater(new Runnable() {
@Override
public void run() {
try {
DefaultMQPushConsumerImpl.this.offsetStore.updateOffset(pullRequest.getMessageQueue(),
pullRequest.getNextOffset(), false);
DefaultMQPushConsumerImpl.this.offsetStore.persist(pullRequest.getMessageQueue());
DefaultMQPushConsumerImpl.this.rebalanceImpl.removeProcessQueue(pullRequest.getMessageQueue());
log.warn("fix the pull request offset, {}", pullRequest);
} catch (Throwable e) {
log.error("executeTaskLater Exception", e);
}
}
}, 10000);
break;
default:
break;
}
}
}
@Override
public void onException(Throwable e) {
if (!pullRequest.getMessageQueue().getTopic().startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
log.warn("execute the pull request exception", e);
}
DefaultMQPushConsumerImpl.this.executePullRequestLater(pullRequest, pullTimeDelayMillsWhenException);
}
};
The commented line above is where consumption is triggered. Tracing further, the messages end up wrapped in a ConsumeMessageConcurrentlyService.ConsumeRequest runnable that is handed to a thread pool; after consuming, the listener returns a status:
public enum ConsumeConcurrentlyStatus {
/**
* Success consumption
*/
CONSUME_SUCCESS,
/**
* Failure consumption,later try to consume
*/
RECONSUME_LATER;
}
If consumption fails and the message needs to be re-consumed, the consumer sends a MessageBack request to the broker and sets a delay level on the message; delayLevelWhenNextConsume in ConsumeConcurrentlyContext defaults to 0.
When the broker receives it, the MessageBack message is handled by SendMessageProcessor. What does it do?
- It checks the reconsume count; if the maximum is exceeded, the message goes to the dead-letter queue (%DLQ%+consumerGroup).
- Otherwise it determines the delay level, sets it on the message, stores the real topic in the properties, sets the topic to %RETRY%+consumerGroup, and sends the message again.
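The decision made in SendMessageProcessor#consumerSendMsgBack can be sketched as follows. The 16-retry default and the 3 + reconsumeTimes mapping (used when the client leaves the delay level at 0) match the broker source; the standalone method shape is a simplification:

```java
public class SendMsgBackSketch {
    static final int MAX_RECONSUME_TIMES = 16; // default for concurrent consumption

    // Returns the schedule delay level for the next retry, or -1 to signal
    // that the message should go to the dead-letter queue (%DLQ%+group).
    static int nextDelayLevel(int requestedDelayLevel, int reconsumeTimes) {
        if (reconsumeTimes >= MAX_RECONSUME_TIMES || requestedDelayLevel < 0) {
            return -1; // dead-letter queue
        }
        if (requestedDelayLevel == 0) {
            // client left delayLevelWhenNextConsume at 0: the broker picks
            // 3 + reconsumeTimes, so each retry waits longer than the last
            return 3 + reconsumeTimes;
        }
        return requestedDelayLevel; // client chose an explicit level
    }

    public static void main(String[] args) {
        System.out.println(nextDelayLevel(0, 0));  // 3 → 10s under the default table
        System.out.println(nextDelayLevel(0, 16)); // -1 → dead-letter queue
    }
}
```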
How does the consumer end up consuming the retried message?
- The retried message carries a delay, so it first sits in the SCHEDULE_TOPIC_XXXX queues and is moved into the %RETRY%+consumerGroup queue when due.
- A clustering-mode consumer subscribes to the %RETRY%+consumerGroup topic by default. When it receives a message on that topic, it checks whether the message is a retry; if so, it takes the real topic out of the properties, swaps it back in, and only then hands the message to the listener for actual consumption.
The key source:
public void resetRetryAndNamespace(final List<MessageExt> msgs, String consumerGroup) {
final String groupTopic = MixAll.getRetryTopic(consumerGroup);
for (MessageExt msg : msgs) {
// for a retried message, swap the real topic back in before consuming
String retryTopic = msg.getProperty(MessageConst.PROPERTY_RETRY_TOPIC);
if (retryTopic != null && groupTopic.equals(msg.getTopic())) {
msg.setTopic(retryTopic);
}
if (StringUtils.isNotEmpty(this.defaultMQPushConsumer.getNamespace())) {
msg.setTopic(NamespaceUtil.withoutNamespace(msg.getTopic(), this.defaultMQPushConsumer.getNamespace()));
}
}
}
3. Transaction Message Send, Commit, and Check-Back Source Code Analysis
1. Sending and Handling the Half (Pre-Commit) Message
Transaction messages require implementing the transaction listener interface. Sending goes through these steps:
- Pre-checks: any delay-level setting is cleared; the TRAN_MSG property is set to true and the PGROUP property to the producerGroup, marking it as a transaction message.
- The message is sent to the broker; a failed send is retried until the maximum retry count is reached.
The send entry point is DefaultMQProducerImpl#sendDefaultImpl. On the broker side the message is again handled by SendMessageProcessor, which checks whether the TRAN_MSG property is true; if so, TransactionalMessageServiceImpl.asyncPrepareMessage writes the half (pre-commit) message into the queue under RMQ_SYS_TRANS_HALF_TOPIC.
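Before storing the half message, the broker hides the real destination in the properties and redirects the message, much like the delayed-message path. A plain-Java sketch (RMQ_SYS_TRANS_HALF_TOPIC and the REAL_TOPIC/REAL_QID property keys are the real constants; the method shape is a simplified stand-in for TransactionalMessageBridge#parseHalfMessageInner):

```java
import java.util.HashMap;
import java.util.Map;

public class HalfMessageSketch {
    static final String HALF_TOPIC = "RMQ_SYS_TRANS_HALF_TOPIC";

    // Back up the real destination in the properties and park the message
    // in the half topic; returns {topic, queueId}.
    static String[] toHalfMessage(String topic, int queueId, Map<String, String> properties) {
        properties.put("REAL_TOPIC", topic);
        properties.put("REAL_QID", String.valueOf(queueId));
        // all half messages share a single queue (queueId 0) under the half topic
        return new String[] {HALF_TOPIC, "0"};
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        String[] dest = toHalfMessage("order-topic", 2, props);
        System.out.println(dest[0] + " / " + dest[1] + " / real=" + props.get("REAL_TOPIC"));
    }
}
```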
2. Committing and Rolling Back the Transaction Message
After the transaction listener (transactionListener) finishes the business logic, it returns a state:
public enum LocalTransactionState {
COMMIT_MESSAGE, // commit
ROLLBACK_MESSAGE, // roll back
UNKNOW, // unknown; triggers a check-back
}
The producer then calls DefaultMQProducerImpl#endTransaction to send the end-transaction request to the broker; the transaction state is carried in the CommitOrRollback field of the requestHeader. The key source:
public void endTransaction(
final SendResult sendResult,
final LocalTransactionState localTransactionState,
final Throwable localException) throws RemotingException, MQBrokerException, InterruptedException, UnknownHostException {
final MessageId id;
if (sendResult.getOffsetMsgId() != null) {
id = MessageDecoder.decodeMessageId(sendResult.getOffsetMsgId());
} else {
id = MessageDecoder.decodeMessageId(sendResult.getMsgId());
}
String transactionId = sendResult.getTransactionId();
final String brokerAddr = this.mQClientFactory.findBrokerAddressInPublish(sendResult.getMessageQueue().getBrokerName());
EndTransactionRequestHeader requestHeader = new EndTransactionRequestHeader();
requestHeader.setTransactionId(transactionId);
requestHeader.setCommitLogOffset(id.getOffset());
switch (localTransactionState) {
case COMMIT_MESSAGE:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_COMMIT_TYPE);
break;
case ROLLBACK_MESSAGE:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_ROLLBACK_TYPE);
break;
case UNKNOW:
requestHeader.setCommitOrRollback(MessageSysFlag.TRANSACTION_NOT_TYPE);
break;
default:
break;
}
requestHeader.setProducerGroup(this.defaultMQProducer.getProducerGroup());
requestHeader.setTranStateTableOffset(sendResult.getQueueOffset());
requestHeader.setMsgId(sendResult.getMsgId());
String remark = localException != null ? ("executeLocalTransactionBranch exception: " + localException.toString()) : null;
this.mQClientFactory.getMQClientAPIImpl().endTransactionOneway(brokerAddr, requestHeader, remark,
this.defaultMQProducer.getSendMsgTimeout());
}
When the broker receives the end-transaction request, it hands it to the EndTransactionProcessor; the core handling:
// ... (preceding code omitted)
if (MessageSysFlag.TRANSACTION_COMMIT_TYPE == requestHeader.getCommitOrRollback()) {
// fetch the half message by the pre-commit offset
result = this.brokerController.getTransactionalMessageService().commitMessage(requestHeader);
if (result.getResponseCode() == ResponseCode.SUCCESS) {
// sanity-check the prepare message against the request
RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
if (res.getCode() == ResponseCode.SUCCESS) {
// rebuild the real message to be committed
MessageExtBrokerInner msgInner = endMessageTransaction(result.getPrepareMessage());
msgInner.setSysFlag(MessageSysFlag.resetTransactionValue(msgInner.getSysFlag(), requestHeader.getCommitOrRollback()));
msgInner.setQueueOffset(requestHeader.getTranStateTableOffset());
msgInner.setPreparedTransactionOffset(requestHeader.getCommitLogOffset());
msgInner.setStoreTimestamp(result.getPrepareMessage().getStoreTimestamp());
MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_TRANSACTION_PREPARED);
// write the real message to its original topic
RemotingCommand sendResult = sendFinalMessage(msgInner);
if (sendResult.getCode() == ResponseCode.SUCCESS) {
// "delete" the half message: actually record it under the op topic
this.brokerController.getTransactionalMessageService().deletePrepareMessage(result.getPrepareMessage());
}
return sendResult;
}
return res;
}
} else if (MessageSysFlag.TRANSACTION_ROLLBACK_TYPE == requestHeader.getCommitOrRollback()) {
result = this.brokerController.getTransactionalMessageService().rollbackMessage(requestHeader);
if (result.getResponseCode() == ResponseCode.SUCCESS) {
RemotingCommand res = checkPrepareMessage(result.getPrepareMessage(), requestHeader);
if (res.getCode() == ResponseCode.SUCCESS) {
this.brokerController.getTransactionalMessageService().deletePrepareMessage(result.getPrepareMessage());
}
return res;
}
}
One point to note about commit and rollback: deletePrepareMessage does not physically delete the half message; it stores the half message's offset into a queue under the topic RMQ_SYS_TRANS_OP_HALF_TOPIC with tag d.
So whether the transaction is committed or rolled back, an entry is put into the RMQ_SYS_TRANS_OP_HALF_TOPIC queue. Summarizing the roles of the two topics:
- RMQ_SYS_TRANS_HALF_TOPIC: the topic for prepare (half) messages; every transaction message enters it first.
- RMQ_SYS_TRANS_OP_HALF_TOPIC: when the broker receives a commit or rollback request for a transaction message, the operation is recorded under this topic.
3. Transaction Message Check-Back Source Code Analysis
The transaction check-back is implemented in TransactionalMessageCheckService.
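Structurally, TransactionalMessageCheckService is a service thread whose loop wakes up every transactionCheckInterval (60 seconds by default) and invokes the check method. A standalone skeleton of that scheduling shape (the executor-based form and the method names here are simplifications of the real waitForRunning/onWaitEnd loop):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CheckServiceSketch {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    // Periodically run the transaction check, like the check service's
    // repeated waitForRunning(interval) + onWaitEnd() cycle.
    void start(long intervalMillis, Runnable check) {
        timer.scheduleWithFixedDelay(check, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        CheckServiceSketch svc = new CheckServiceSketch();
        CountDownLatch ran = new CountDownLatch(2); // wait for two check rounds
        svc.start(10, ran::countDown);
        ran.await();
        svc.shutdown();
        System.out.println("check ran twice");
    }
}
```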
Let's analyze TransactionalMessageService#check in detail, in particular the logic inside its for loop; since the code is long, it is broken into steps with commentary.
- First, read the current offsets of the halfQueue and the opQueue.
- Using both queues, determine which opQueue messages have already been processed: the offsets of processed opQueue entries go into doneOpOffset. This is done by comparing the halfQueue offsets carried in the opQueue message bodies (i.e. the offsets of half messages that have already been committed or rolled back) against the current halfQueue offset.
- Next comes the key part of the check-back itself.
- One thing to note here is putBackHalfMsgQueue: the half message msgExt is re-appended to the half queue with a reset offset before the check-back is issued. This is because the consume offset of the current msgExt advances once the check runs; when the transaction later fails again, commits, or rolls back, the copy appended by putBackHalfMsgQueue is the one used as the basis for the next check, commit, or rollback. Without it, a failed check-back would never be retried!
- The check-back itself is issued in AbstractTransactionalMessageCheckListener#resolveHalfMsg, which uses a thread pool to send a check request to an available producer instance of the producerGroup; the producer then performs the local check, and the cycle repeats.
- The producer receives the check request in ClientRemotingProcessor#checkTransactionState (the handler for each request type can be located via the constants in RequestCode).
- As for checkTransactionState itself, it closely mirrors the commit path: it calls TransactionListener#checkLocalTransaction to obtain the transaction state localTransactionState, then sends the corresponding commit/rollback request based on it, so it is not repeated here.
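Step 2 of the list above, reconciling the opQueue against the halfQueue, can be sketched standalone. In the real TransactionalMessageServiceImpl#fillOpRemoveMap the confirmed half-queue offset is parsed out of each op message body; the collection-based shape and the use of the list index as the op offset here are simplifying assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FillOpRemoveMapSketch {
    // opHalfOffsets: half-queue offsets carried in op-message bodies (the tag "d"
    // entries). Offsets below the current half offset are already accounted for,
    // so the op entry itself is done; the rest mark half messages whose
    // transaction has ended and which must be skipped during check-back.
    static void fillOpRemoveMap(List<Long> opHalfOffsets, long currentHalfOffset,
                                List<Long> doneOpOffset, Map<Long, Long> removeMap) {
        for (int i = 0; i < opHalfOffsets.size(); i++) {
            long confirmed = opHalfOffsets.get(i);
            if (confirmed < currentHalfOffset) {
                doneOpOffset.add((long) i); // op entry already behind us
            } else {
                removeMap.put(confirmed, (long) i); // skip check-back for this half msg
            }
        }
    }

    public static void main(String[] args) {
        List<Long> done = new ArrayList<>();
        Map<Long, Long> remove = new HashMap<>();
        fillOpRemoveMap(List.of(3L, 7L, 9L), 7L, done, remove);
        System.out.println(done + " " + remove.keySet());
    }
}
```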