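1 Introduction
Once all 16384 slots of the database have been assigned, the cluster goes online and clients can start sending data commands to its nodes.
When a client sends a node a command that involves a database key, the receiving node computes which slot the key belongs to and checks whether that slot is assigned to itself:
- If the key's slot is assigned to the current node, the node executes the command directly.
- If the key's slot is not assigned to the current node, the node returns a MOVED error to the client, which redirects the client to the correct node, where it sends the command again.
The rest of this section covers how a key's slot is computed, how a node decides whether it is responsible for a slot, and how the MOVED error is implemented. It closes with how a cluster node's key-value storage compares with that of a standalone Redis server.
2 Computing which slot a key belongs to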
/* We have 16384 hash slots. The hash slot of a given key is obtained
 * as the least significant 14 bits of the crc16 of the key.
 *
 * However if the key contains the {...} pattern, only the part between
 * { and } is hashed. This may be useful in the future to force certain
 * keys to be in the same node (assuming no resharding is in progress). */
// Compute the slot that a given key maps to
unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */
    // Look for the first '{'
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;
    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;
    /* No '}' or nothing between {} ? Hash the whole key. */
    // This covers both a missing '}' and an empty "{}"
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;
    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    // Hash only the text between '{' and '}'
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}
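Here crc16(key,keylen) computes the CRC-16 checksum of the key, 0x3FFF is 16383 in decimal, and the & operation reduces the checksum to an integer between 0 and 16383, which is the key's slot number. Because 16384 is 2^14, masking with 0x3FFF gives the same result as taking the checksum modulo 16384.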
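To try the slot computation outside of Redis, the sketch below pairs the same hash-tag rule with a bitwise CRC-16/XMODEM (polynomial 0x1021, initial value 0), which is the CRC variant Redis Cluster uses. The names crc16_xmodem and key_slot are made up for this example, and the bit-by-bit loop is a simplification chosen for readability; Redis itself uses a lookup table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
 * Hypothetical helper written for this example. */
uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Same rule as keyHashSlot(): if the key contains a non-empty {...} section,
 * hash only the text between the first '{' and the first '}' after it. */
unsigned int key_slot(const char *key) {
    size_t len = strlen(key);
    const char *open = memchr(key, '{', len);
    if (open != NULL) {
        size_t rest = len - (size_t)(open + 1 - key);
        const char *close = memchr(open + 1, '}', rest);
        if (close != NULL && close != open + 1)
            return crc16_xmodem(open + 1, (size_t)(close - open - 1)) & 0x3FFF;
    }
    /* No usable hash tag: hash the whole key. */
    return crc16_xmodem(key, len) & 0x3FFF;
}
```

With this sketch, key_slot("{user1000}.following") and key_slot("{user1000}.followers") return the same slot as key_slot("user1000"), which is how hash tags keep related keys on one node.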
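3 Checking whether the current node is responsible for a slot
After computing the slot i that the key belongs to, the node looks at entry i of its clusterState.slots array to decide whether it is responsible for that slot:
- If clusterState.slots[i] equals clusterState.myself, slot i is served by the current node, and the node can execute the command sent by the client.
- If clusterState.slots[i] does not equal clusterState.myself, slot i is not served by the current node. The node reads the IP address and port recorded in the clusterNode structure that clusterState.slots[i] points to and returns a MOVED error that redirects the client to the node serving slot i.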
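The two cases above can be sketched with trimmed-down structures. The struct layouts below keep only the fields this check needs, and check_slot and slotCheck are names invented for the example; the real clusterState and clusterNode hold many more fields. The unassigned case is included because a slot pointer may also be NULL.

```c
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical, reduced versions of the real structures. */
typedef struct clusterNode {
    char ip[46];
    int port;
} clusterNode;

typedef struct clusterState {
    clusterNode *myself;               /* the node running this code */
    clusterNode *slots[CLUSTER_SLOTS]; /* owner of each slot, or NULL */
} clusterState;

typedef enum { SLOT_MINE, SLOT_MOVED, SLOT_UNASSIGNED } slotCheck;

/* Decide who serves 'slot'. If 'owner' is not NULL it receives the node
 * that a MOVED reply would have to mention. */
slotCheck check_slot(const clusterState *cs, int slot,
                     const clusterNode **owner) {
    const clusterNode *n = cs->slots[slot];
    if (owner) *owner = n;
    if (n == NULL) return SLOT_UNASSIGNED;
    return n == cs->myself ? SLOT_MINE : SLOT_MOVED;
}
```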
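4 The MOVED error
When a node finds that the key's slot is not its responsibility, it returns a MOVED error that redirects the client to the node in charge of the slot. The error has the following format:
MOVED <slot> <ip>:<port>
Here slot is the slot the key belongs to, and ip and port are the IP address and port of the node responsible for that slot.
When the client receives a MOVED error, it uses the IP address and port in the error to switch to the node responsible for the slot and resends the command it wanted to execute.
A cluster client usually keeps socket connections to several nodes of the cluster, so "redirecting" really just means sending the command over a different socket.
If the client has no socket connection to the target node yet, it first connects to the node using the IP address and port from the MOVED error and then performs the redirection.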
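On the client side, handling a MOVED reply comes down to extracting three fields from the error text. The following is a minimal sketch, assuming the address is an IPv4 address or a host name without a colon in it; parse_moved and movedTarget are hypothetical names and do not belong to any Redis client library.

```c
#include <stdio.h>
#include <string.h>

/* Where a MOVED error tells the client to go (hypothetical type). */
typedef struct {
    int slot;
    char ip[64];
    int port;
} movedTarget;

/* Parse "MOVED <slot> <ip>:<port>", with or without the leading '-' that
 * marks an error reply on the wire. Returns 1 and fills *out on success,
 * 0 if the reply is not a well-formed MOVED error. */
int parse_moved(const char *reply, movedTarget *out) {
    if (reply[0] == '-') reply++;
    if (strncmp(reply, "MOVED ", 6) != 0) return 0;
    if (sscanf(reply + 6, "%d %63[^:]:%d",
               &out->slot, out->ip, &out->port) != 3)
        return 0;
    return out->slot >= 0 && out->slot < 16384 &&
           out->port > 0 && out->port < 65536;
}
```

A client would then look up (or open) the connection for out->ip and out->port and resend the original command over it.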
5 Implementation in the source code
In processCommand in server.c:
int processCommand(client *c) {
    ...
    /* If cluster is enabled perform the cluster redirection here.
     * However we don't perform the redirection if:
     * 1) The sender of this command is our master.
     * 2) The command has no key arguments. */
    if (server.cluster_enabled &&
        !(c->flags & CLIENT_MASTER) &&
        !(c->flags & CLIENT_LUA &&
          server.lua_caller->flags & CLIENT_MASTER) &&
        !(c->cmd->getkeys_proc == NULL && c->cmd->firstkey == 0 &&
          c->cmd->proc != execCommand))
    {
        int hashslot;
        int error_code;
        // Find the node that is able to serve this command
        clusterNode *n = getNodeByQuery(c,c->cmd,c->argv,c->argc,
                                        &hashslot,&error_code);
        // No node can serve it, or the node is not this one
        if (n == NULL || n != server.cluster->myself) {
            if (c->cmd->proc == execCommand) {
                // EXEC cannot be served here: discard the transaction
                discardTransaction(c);
            } else {
                // If the client is inside MULTI, flag the transaction as failed
                flagTransaction(c);
            }
            // Reply to the client with the redirection or error
            clusterRedirectClient(c,n,hashslot,error_code);
            return C_OK;
        }
    }
    ...
}
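When Redis runs in cluster mode, a read or write first hashes the key to find its slot and, through the slot, the clusterNode that serves it. If that node is not the current one, the client gets an ASK or MOVED error telling it to connect to the node at ip:port.
getNodeByQuery computes which slot the keys of a command fall into and returns the node for that slot.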
/* Return the pointer to the cluster node that is able to serve the command.
 * For the function to succeed the command should only target either:
 *
 * 1) A single key (even multiple times like RPOPLPUSH mylist mylist).
 * 2) Multiple keys in the same hash slot, while the slot is stable (no
 *    resharding in progress).
 *
 * On success the function returns the node that is able to serve the request.
 * If the node is not 'myself' a redirection must be performed. The kind of
 * redirection is specified setting the integer passed by reference
 * 'error_code', which will be set to CLUSTER_REDIR_ASK or
 * CLUSTER_REDIR_MOVED.
 *
 * When the node is 'myself' 'error_code' is set to CLUSTER_REDIR_NONE.
 *
 * If the command fails NULL is returned, and the reason of the failure is
 * provided via 'error_code', which will be set to:
 *
 * CLUSTER_REDIR_CROSS_SLOT if the request contains multiple keys that
 * don't belong to the same hash slot.
 *
 * CLUSTER_REDIR_UNSTABLE if the request contains multiple keys
 * belonging to the same slot, but the slot is not stable (in migration or
 * importing state, likely because a resharding is in progress).
 *
 * CLUSTER_REDIR_DOWN_UNBOUND if the request addresses a slot which is
 * not bound to any node. In this case the cluster global state should be
 * already "down" but it is fragile to rely on the update of the global state,
 * so we also handle it here.
 *
 * CLUSTER_REDIR_DOWN_STATE if the cluster is down but the user attempts to
 * execute a command that addresses one or more keys. */
// In short: the command must target a single key, or several keys that all
// hash to one slot that is not being migrated or imported.
clusterNode *getNodeByQuery(client *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *error_code) {
    clusterNode *n = NULL;
    robj *firstkey = NULL;
    int multiple_keys = 0;
    multiState *ms, _ms;
    multiCmd mc;
    int i, slot = 0, migrating_slot = 0, importing_slot = 0, missing_keys = 0;
    /* Set error code optimistically for the base case. */
    if (error_code) *error_code = CLUSTER_REDIR_NONE;
    /* We handle all the cases as if they were EXEC commands, so we have
     * a common code path for everything */
    // For EXEC, every key touched by the queued commands must hash to the
    // same slot; this if and the for loop below perform that check
    if (cmd->proc == execCommand) {
        /* If CLIENT_MULTI flag is not set EXEC is just going to return an
         * error. */
        // Return myself so that EXEC runs locally and reports the error itself
        if (!(c->flags & CLIENT_MULTI)) return myself;
        ms = &c->mstate;
    } else {
        /* In order to have a single codepath create a fake Multi State
         * structure if the client is not in MULTI/EXEC state, this way
         * we have a single codepath below. */
        ms = &_ms;
        _ms.commands = &mc;
        _ms.count = 1;
        mc.argv = argv;
        mc.argc = argc;
        mc.cmd = cmd;
    }
    /* Check that all the keys are in the same hash slot, and obtain this
     * slot and the node associated. */
    for (i = 0; i < ms->count; i++) {
        struct redisCommand *mcmd;
        robj **margv;
        int margc, *keyindex, numkeys, j;
        mcmd = ms->commands[i].cmd;
        margc = ms->commands[i].argc;
        margv = ms->commands[i].argv;
        // Get the positions of all key arguments of this command
        keyindex = getKeysFromCommand(mcmd,margv,margc,&numkeys);
        // Walk through every key of the command
        for (j = 0; j < numkeys; j++) {
            robj *thiskey = margv[keyindex[j]];
            int thisslot = keyHashSlot((char*)thiskey->ptr,
                                       sdslen(thiskey->ptr));
            if (firstkey == NULL) {
                /* This is the first key we see. Check what is the slot
                 * and node. */
                firstkey = thiskey;
                slot = thisslot;
                n = server.cluster->slots[slot];
                /* Error: If a slot is not served, we are in "cluster down"
                 * state. However the state is yet to be updated, so this was
                 * not trapped earlier in processCommand(). Report the same
                 * error to the client. */
                // The key's slot is not assigned to any node
                if (n == NULL) {
                    getKeysFreeResult(keyindex);
                    if (error_code)
                        *error_code = CLUSTER_REDIR_DOWN_UNBOUND;
                    return NULL;
                }
                /* If we are migrating or importing this slot, we need to check
                 * if we have all the keys in the request (the only way we
                 * can safely serve the request, otherwise we return a TRYAGAIN
                 * error). To do so we set the importing/migrating state and
                 * increment a counter for every missing key. */
                // Is the slot being resharded, that is, migrating from this
                // node to another one or being imported from another node?
                if (n == myself &&
                    server.cluster->migrating_slots_to[slot] != NULL)
                {
                    migrating_slot = 1;
                } else if (server.cluster->importing_slots_from[slot] != NULL) {
                    importing_slot = 1;
                }
            } else {
                /* If it is not the first key, make sure it is exactly
                 * the same key as the first we saw. */
                // A different key is still fine if it hashes to the same slot
                if (!equalStringObjects(firstkey,thiskey)) {
                    if (slot != thisslot) {
                        /* Error: multiple keys from different slots. */
                        getKeysFreeResult(keyindex);
                        if (error_code)
                            *error_code = CLUSTER_REDIR_CROSS_SLOT;
                        return NULL;
                    } else {
                        /* Flag this request as one with multiple different
                         * keys. */
                        multiple_keys = 1;
                    }
                }
            }
            /* Migrating / Importing slot? Count keys we don't have. */
            // The key has already moved away, or has not arrived here yet
            if ((migrating_slot || importing_slot) &&
                lookupKeyRead(&server.db[0],thiskey) == NULL)
            {
                missing_keys++;
            }
        }
        getKeysFreeResult(keyindex);
    }
    /* No key at all in command? then we can serve the request
     * without redirections or errors in all the cases. */
    if (n == NULL) return myself;
    /* Cluster is globally down but we got keys? We can't serve the request. */
    if (server.cluster->state != CLUSTER_OK) {
        if (error_code) *error_code = CLUSTER_REDIR_DOWN_STATE;
        return NULL;
    }
    /* Return the hashslot by reference. */
    if (hashslot) *hashslot = slot;
    /* MIGRATE always works in the context of the local node if the slot
     * is open (migrating or importing state). We need to be able to freely
     * move keys among instances in this case. */
    if ((migrating_slot || importing_slot) && cmd->proc == migrateCommand)
        return myself;
    /* If we don't have all the keys and we are migrating the slot, send
     * an ASK redirection. */
    // The slot is migrating away and some requested keys are no longer
    // here: reply with CLUSTER_REDIR_ASK
    if (migrating_slot && missing_keys) {
        if (error_code) *error_code = CLUSTER_REDIR_ASK;
        return server.cluster->migrating_slots_to[slot];
    }
    /* If we are receiving the slot, and the client correctly flagged the
     * request as "ASKING", we can serve the request. However if the request
     * involves multiple keys and we don't have them all, the only option is
     * to send a TRYAGAIN error. */
    if (importing_slot &&
        (c->flags & CLIENT_ASKING || cmd->flags & CMD_ASKING))
    {
        if (multiple_keys && missing_keys) {
            if (error_code) *error_code = CLUSTER_REDIR_UNSTABLE;
            return NULL;
        } else {
            return myself;
        }
    }
    /* Handle the read-only client case reading from a slave: if this
     * node is a slave and the request is about an hash slot our master
     * is serving, we can reply without redirection. */
    if (c->flags & CLIENT_READONLY &&
        cmd->flags & CMD_READONLY &&
        nodeIsSlave(myself) &&
        myself->slaveof == n)
    {
        return myself;
    }
    /* Base case: just return the right node. However if this node is not
     * myself, set error_code to MOVED since we need to issue a redirection. */
    // The slot belongs to another node: set CLUSTER_REDIR_MOVED
    if (n != myself && error_code) *error_code = CLUSTER_REDIR_MOVED;
    return n;
}
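The tail of getNodeByQuery is easier to follow when the resharding branches are pulled out into a small decision function. The sketch below is a simplification written for this note, not Redis code: routeInput, routeDecision and route are hypothetical names, and the sketch leaves out the cross-slot check, the cluster-down checks, the MIGRATE exception and the read-only replica case.

```c
/* Possible outcomes of the simplified routing decision (hypothetical). */
typedef enum {
    SERVE_LOCALLY,
    REDIRECT_ASK,
    REDIRECT_MOVED,
    REPLY_TRYAGAIN
} routeDecision;

/* Facts gathered while scanning the keys of the request (hypothetical). */
typedef struct {
    int owner_is_myself; /* slots[slot] == myself */
    int migrating;       /* we own the slot and are migrating it away */
    int importing;       /* we are importing the slot from another node */
    int asking;          /* the client sent ASKING before this command */
    int multiple_keys;   /* the request names several different keys */
    int missing_keys;    /* some requested keys are not in our database */
} routeInput;

/* Mirrors the order of the checks at the end of getNodeByQuery(). */
routeDecision route(const routeInput *in) {
    /* Migrating and a key is gone: it may already be on the target. */
    if (in->migrating && in->missing_keys) return REDIRECT_ASK;
    /* Importing and the client was sent here by an ASK redirection. */
    if (in->importing && in->asking) {
        if (in->multiple_keys && in->missing_keys) return REPLY_TRYAGAIN;
        return SERVE_LOCALLY;
    }
    /* Base case: serve if we own the slot, otherwise point at the owner. */
    return in->owner_is_myself ? SERVE_LOCALLY : REDIRECT_MOVED;
}
```

Note that an importing node that has not been sent ASKING falls through to the base case and answers MOVED, because the slot still officially belongs to the source node.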
clusterRedirectClient then uses the error code produced while resolving the key's slot and node to send the client the matching redirection or error reply.
/* Send the client the right redirection code, according to error_code
 * that should be set to one of CLUSTER_REDIR_* macros.
 *
 * If CLUSTER_REDIR_ASK or CLUSTER_REDIR_MOVED error codes
 * are used, then the node 'n' should not be NULL, but should be the
 * node we want to mention in the redirection. Moreover hashslot should
 * be set to the hash slot that caused the redirection. */
void clusterRedirectClient(client *c, clusterNode *n, int hashslot, int error_code) {
    if (error_code == CLUSTER_REDIR_CROSS_SLOT) {
        addReplySds(c,sdsnew("-CROSSSLOT Keys in request don't hash to the same slot\r\n"));
    } else if (error_code == CLUSTER_REDIR_UNSTABLE) {
        /* The request spans multiple keys in the same slot,
         * but the slot is not "stable" currently as there is
         * a migration or import in progress. */
        addReplySds(c,sdsnew("-TRYAGAIN Multiple keys request during rehashing of slot\r\n"));
    } else if (error_code == CLUSTER_REDIR_DOWN_STATE) {
        addReplySds(c,sdsnew("-CLUSTERDOWN The cluster is down\r\n"));
    } else if (error_code == CLUSTER_REDIR_DOWN_UNBOUND) {
        addReplySds(c,sdsnew("-CLUSTERDOWN Hash slot not served\r\n"));
    } else if (error_code == CLUSTER_REDIR_MOVED ||
               error_code == CLUSTER_REDIR_ASK)
    {
        addReplySds(c,sdscatprintf(sdsempty(),
            "-%s %d %s:%d\r\n",
            (error_code == CLUSTER_REDIR_ASK) ? "ASK" : "MOVED",
            hashslot,n->ip,n->port));
    } else {
        serverPanic("getNodeByQuery() unknown error.");
    }
}
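To see what the MOVED and ASK branch puts on the wire without Redis's sds strings, the same format string can be fed to snprintf. This is only an illustration: format_redirect is a hypothetical helper, and the caller-supplied buffer stands in for the reply object that addReplySds receives in the real code.

```c
#include <stdio.h>

/* Build "-ASK <slot> <ip>:<port>\r\n" or "-MOVED <slot> <ip>:<port>\r\n"
 * into buf. Returns the length snprintf reports, which is greater than or
 * equal to buflen when the output was truncated. Hypothetical helper. */
int format_redirect(char *buf, size_t buflen, int is_ask,
                    int slot, const char *ip, int port) {
    return snprintf(buf, buflen, "-%s %d %s:%d\r\n",
                    is_ask ? "ASK" : "MOVED", slot, ip, port);
}
```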
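6 Node database implementation
As far as the database is concerned, one difference between a cluster node and a standalone server is that a node can only use database 0, whereas a standalone Redis server has no such restriction.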
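In practice this restriction shows up when a client tries to switch databases: a cluster node refuses to select any database other than 0. A rough sketch of that rule follows; the function name and its parameters are hypothetical, and the real check sits inside Redis's SELECT command handler.

```c
/* Return 1 if selecting database 'db_id' is acceptable, 0 otherwise.
 * 'dbnum' is the number of configured databases. Hypothetical helper. */
int select_db_allowed(int cluster_enabled, int db_id, int dbnum) {
    if (db_id < 0 || db_id >= dbnum) return 0;    /* index out of range */
    if (cluster_enabled && db_id != 0) return 0;  /* cluster: db 0 only */
    return 1;
}
```

This also explains why getNodeByQuery looks keys up in server.db[0] directly.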