How unique command consumption is implemented
The implementation has three steps:
1. Assign a slot to each master
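Both on first startup and in the registry listener callback, the master ultimately calls syncMasterNodes() (through updateMasterNodes()).
This method updates the global MASTER_SIZE and the master's own SLOT_LIST; SLOT_LIST holds only this master's own slot value.
After this, every master knows the total number of masters and its own slot value.
The rough flow is: clear slot -> acquire lock -> update master -> release lock
Pay special attention to SLOT_LIST.clear() and the distributed lock here; both come up again later.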
private void updateMasterNodes() {
    // clear the slot; at this point every master's slot is 0
    SLOT_LIST.clear();
    this.masterNodes.clear();
    String nodeLock = Constants.REGISTRY_DOLPHINSCHEDULER_LOCK_MASTERS;
    try {
        // acquire the distributed lock
        registryClient.getLock(nodeLock);
        Collection<String> currentNodes = registryClient.getMasterNodesDirectly();
        List<Server> masterNodes = registryClient.getServerList(NodeType.MASTER);
        syncMasterNodes(currentNodes, masterNodes);
    } catch (Exception e) {
        logger.error("update master nodes error", e);
    } finally {
        // release the distributed lock
        registryClient.releaseLock(nodeLock);
    }
}

private void syncMasterNodes(Collection<String> nodes, List<Server> masterNodes) {
    masterLock.lock();
    try {
        this.masterNodes.addAll(nodes);
        this.masterPriorityQueue.clear();
        this.masterPriorityQueue.putList(masterNodes);
        int index = masterPriorityQueue.getIndex(NetUtils.getHost());
        if (index >= 0) {
            // update the master count and this master's own slot
            MASTER_SIZE = nodes.size();
            SLOT_LIST.add(masterPriorityQueue.getIndex(NetUtils.getHost()));
        }
        logger.info("update master nodes, master size: {}, slot: {}",
                MASTER_SIZE, SLOT_LIST.toString()
        );
    } finally {
        masterLock.unlock();
    }
}
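The slot assignment can be illustrated in isolation. The following is a minimal sketch, not the project's code: it assumes every master sees the same host list and orders it the same way (plain lexicographic sorting here, a stand-in for whatever ordering masterPriorityQueue applies). The class SlotAssignmentSketch and the helper slotOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SlotAssignmentSketch {

    // Hypothetical helper: a master's slot is its index in the commonly ordered host list.
    // Returns -1 when the host is not registered, mirroring the "index >= 0" check above.
    static int slotOf(List<String> masterHosts, String selfHost) {
        List<String> sorted = new ArrayList<>(masterHosts);
        Collections.sort(sorted); // assumption: lexicographic order stands in for the real ordering
        return sorted.indexOf(selfHost);
    }

    public static void main(String[] args) {
        List<String> masters = Arrays.asList("10.0.0.3", "10.0.0.1", "10.0.0.2");
        for (String host : masters) {
            // Every master computes the same ordering, so slots are distinct and cover 0..size-1.
            System.out.println(host + " -> slot " + slotOf(masters, host) + " of " + masters.size());
        }
    }
}
```

Because each master derives its index from the same ordered list, no two masters end up with the same slot as long as they all see the same membership.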
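2. Consume commands
Consumption condition: a master can consume commands as long as MASTER_SIZE is not 0.
Consumption logic: command ID % MASTER_SIZE == slot determines which master a command belongs to. Only one command is consumed at a time; later versions can fetch several at once.
In theory each master has its own slot, so a command is never picked up by more than one master. Step 3 exists for the case where one is picked up by several masters anyway, to prevent duplicate consumption.
Pay special attention to the condition under which a master can consume commands; it comes up again later.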
private Command findOneCommand() {
    int pageNumber = 0;
    Command result = null;
    while (Stopper.isRunning()) {
        // commands can be consumed as long as MASTER_SIZE is not 0
        if (ServerNodeManager.MASTER_SIZE == 0) {
            return null;
        }
        List<Command> commandList = processService.findCommandPage(ServerNodeManager.MASTER_SIZE, pageNumber);
        if (commandList.size() == 0) {
            return null;
        }
        for (Command command : commandList) {
            int slot = ServerNodeManager.getSlot();
            // pick the command that belongs to this master
            if (ServerNodeManager.MASTER_SIZE != 0
                    && command.getId() % ServerNodeManager.MASTER_SIZE == slot) {
                result = command;
                break;
            }
        }
        if (result != null) {
            logger.info("find command {}, slot:{} :",
                    result.getId(),
                    ServerNodeManager.getSlot());
            break;
        }
        pageNumber += 1;
    }
    return result;
}
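To make the ownership rule concrete, here is a small sketch of how ID % MASTER_SIZE == slot splits a batch of command IDs between masters. The class CommandPartitionSketch and the helper ownedBy are illustrative names, not part of the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommandPartitionSketch {

    // Hypothetical helper: returns the command IDs that the master holding `slot` would pick.
    static List<Integer> ownedBy(List<Integer> commandIds, int masterSize, int slot) {
        List<Integer> owned = new ArrayList<>();
        if (masterSize == 0) {
            return owned; // same guard as findOneCommand(): nothing is consumed when the size is 0
        }
        for (int id : commandIds) {
            if (id % masterSize == slot) {
                owned.add(id);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6);
        for (int slot = 0; slot < 3; slot++) {
            System.out.println("slot " + slot + " -> " + ownedBy(ids, 3, slot));
        }
        // slot 0 -> [3, 6], slot 1 -> [1, 4], slot 2 -> [2, 5]
    }
}
```

With stable slots the three result lists are disjoint, which is why duplicate consumption does not happen in the steady state.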
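3. Prevent duplicate consumption
If the delete removes no row, the command has already been consumed; an exception is thrown, which triggers a transaction rollback.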
@Transactional
public ProcessInstance handleCommand(Logger logger, String host, Command command,
                                     HashMap<String, ProcessDefinition> processDefinitionCacheMaps) {
    ProcessInstance processInstance = constructProcessInstance(command, host, processDefinitionCacheMaps);
    // cannot construct process instance, return null
    if (processInstance == null) {
        logger.error("scan command, command parameter is error: {}", command);
        moveToErrorCommand(command, "process instance is null");
        return null;
    }
    processInstance.setCommandType(command.getCommandType());
    processInstance.addHistoryCmd(command.getCommandType());
    saveProcessInstance(processInstance);
    this.setSubProcessParam(processInstance);
    // delete the command and verify the result
    this.deleteCommandWithCheck(command.getId());
    return processInstance;
}

private void deleteCommandWithCheck(int commandId) {
    int delete = this.commandMapper.deleteById(commandId);
    // uniqueness is guaranteed by the delete plus the transaction
    if (delete != 1) {
        throw new ServiceException("delete command fail, id:" + commandId);
    }
}
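The delete-and-check idea can be tried out without a database. The sketch below is a simplified model under stated assumptions: an in-memory ConcurrentHashMap stands in for the command table, and catching the exception stands in for the transaction rollback. DeleteWithCheckSketch, addCommand and tryConsume are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeleteWithCheckSketch {

    // Assumption: this map plays the role of the command table.
    private final Map<Integer, String> commandTable = new ConcurrentHashMap<>();

    void addCommand(int id, String payload) {
        commandTable.put(id, payload);
    }

    // Returns the number of rows removed (1 or 0), like a mapper's deleteById.
    int deleteById(int id) {
        return commandTable.remove(id) != null ? 1 : 0;
    }

    // Returns true only for the single caller that actually removed the row.
    boolean tryConsume(int id) {
        try {
            if (deleteById(id) != 1) {
                throw new IllegalStateException("delete command fail, id:" + id);
            }
            return true;
        } catch (IllegalStateException e) {
            // In the real flow the exception propagates and the transaction is rolled back.
            return false;
        }
    }

    public static void main(String[] args) {
        DeleteWithCheckSketch table = new DeleteWithCheckSketch();
        table.addCommand(7, "start process");
        System.out.println("first master:  " + table.tryConsume(7));
        System.out.println("second master: " + table.tryConsume(7));
    }
}
```

Only the first caller removes the row, so a second master that picked the same command fails at the check, and in handleCommand the rollback would discard the process instance it had just saved.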
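Thoughts
Why would a command be consumed more than once?
Once all masters are up and their slot values are stable, a command is not consumed twice. Duplicate consumption only becomes possible when a master goes online or offline.
The analysis below assumes there are pending commands:
First, the condition for a master to consume commands is MASTER_SIZE != 0 (see step 2). When a master goes online or offline, all the other masters are triggered through the listener to run updateMasterNodes() (see step 1), which performs the following two operations:
1) SLOT_LIST.clear(), which means getSlot() returns 0 from then on.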
public static Integer getSlot() {
    if (SLOT_LIST.size() > 0) {
        return SLOT_LIST.get(0);
    }
    return 0;
}
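2) Compete for the distributed lock. Keep in mind that this step is serialized, so a master that has not yet acquired the lock has slot = 0 and MASTER_SIZE = the previous number of masters. In that state it can still consume commands normally, and every such master consumes the same commands.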
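The collision can be reproduced with a few lines. This is a sketch under stated assumptions: three masters with slots 0, 1 and 2, where the master holding slot 2 has already cleared its slot list but is still waiting for the lock. StaleSlotSketch and claims are hypothetical names; getSlot mirrors the fallback shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StaleSlotSketch {

    // Mirrors getSlot(): an empty slot list falls back to 0.
    static int getSlot(List<Integer> slotList) {
        return slotList.isEmpty() ? 0 : slotList.get(0);
    }

    // Mirrors the ownership check in findOneCommand().
    static boolean claims(int commandId, int masterSize, List<Integer> slotList) {
        return masterSize != 0 && commandId % masterSize == getSlot(slotList);
    }

    public static void main(String[] args) {
        int masterSize = 3; // not yet refreshed, still the previous master count
        List<Integer> slotOfA = new ArrayList<>(Arrays.asList(0));
        List<Integer> slotOfC = new ArrayList<>(Arrays.asList(2));

        System.out.println("before: C claims command 6? " + claims(6, masterSize, slotOfC));
        slotOfC.clear(); // C enters updateMasterNodes() and waits for the lock
        System.out.println("after:  C claims command 6? " + claims(6, masterSize, slotOfC));
        System.out.println("        A claims command 6? " + claims(6, masterSize, slotOfA));
    }
}
```

While C waits for the lock, both A and C consider command 6 their own, which is exactly the situation the delete check in step 3 has to catch.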
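Optimization suggestions
Although the transaction in step 3 guarantees that a command is not consumed twice, there is still room for optimization: reduce duplicate consumption attempts as much as possible.
1) During an online/offline event, masters that have not acquired the lock stop working temporarily; it is enough to set MASTER_SIZE = 0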
private void updateMasterNodes() {
    SLOT_LIST.clear();
    // set to 0
    MASTER_SIZE = 0;
    ......
}
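2) During an online/offline event, masters that have not acquired the lock keep their original slot and work normally. Move SLOT_LIST.clear() to after the lock is acquired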
private void updateMasterNodes() {
    // removed
    // SLOT_LIST.clear();
    this.masterNodes.clear();
    String nodeLock = Constants.REGISTRY_DOLPHINSCHEDULER_LOCK_MASTERS;
    try {
        registryClient.getLock(nodeLock);
        // moved to after the lock is acquired
        SLOT_LIST.clear();
        Collection<String> currentNodes = registryClient.getMasterNodesDirectly();
        List<Server> masterNodes = registryClient.getServerList(NodeType.MASTER);
        syncMasterNodes(currentNodes, masterNodes);
    } catch (Exception e) {
        logger.error("update master nodes error", e);
    } finally {
        registryClient.releaseLock(nodeLock);
    }
}
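The effect of the two suggestions on a master that is still waiting for the lock can be compared side by side. The sketch below is a simplified state model, not the project's code: MasterStateSketch and its enterUpdate* methods are hypothetical names, each one modelling only what happens before the lock is obtained.

```java
import java.util.ArrayList;
import java.util.List;

class MasterStateSketch {
    int masterSize;
    final List<Integer> slotList = new ArrayList<>();

    MasterStateSketch(int masterSize, int slot) {
        this.masterSize = masterSize;
        this.slotList.add(slot);
    }

    int getSlot() {
        return slotList.isEmpty() ? 0 : slotList.get(0);
    }

    boolean canClaim(int commandId) {
        return masterSize != 0 && commandId % masterSize == getSlot();
    }

    // Current behaviour: the slot is cleared before the lock is acquired.
    void enterUpdateCurrent() {
        slotList.clear();
    }

    // Suggestion 1: also zero the master count, so the waiting master claims nothing.
    void enterUpdateSuggestion1() {
        slotList.clear();
        masterSize = 0;
    }

    // Suggestion 2: nothing changes until the lock is held, so the old slot stays in effect.
    void enterUpdateSuggestion2() {
        // intentionally empty
    }

    public static void main(String[] args) {
        MasterStateSketch current = new MasterStateSketch(3, 2);
        current.enterUpdateCurrent();
        MasterStateSketch first = new MasterStateSketch(3, 2);
        first.enterUpdateSuggestion1();
        MasterStateSketch second = new MasterStateSketch(3, 2);
        second.enterUpdateSuggestion2();

        // Command 6 belongs to slot 0, command 5 to slot 2.
        System.out.println("current:      claims 6? " + current.canClaim(6) + ", claims 5? " + current.canClaim(5));
        System.out.println("suggestion 1: claims 6? " + first.canClaim(6) + ", claims 5? " + first.canClaim(5));
        System.out.println("suggestion 2: claims 6? " + second.canClaim(6) + ", claims 5? " + second.canClaim(5));
    }
}
```

In this model the current behaviour lets the waiting master grab slot 0's command, suggestion 1 makes it idle, and suggestion 2 lets it keep working on its own commands only.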