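1. Sentinel server
Parts (24) and (25) analyzed the functions a sentinel uses to discover slave servers and other sentinel servers. This part continues with how the sentinel server uses those functions during startup to discover servers.
Sentinel server startup
As mentioned in (23), a sentinel server can be started with redis-server, just like an ordinary Redis server, and it enters through the same function: main in server.c. The difference from an ordinary startup is the extra argument --sentinel. main handles this argument by calling checkForSentinelMode, which parses the startup arguments: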
/* Returns 1 if there is --sentinel among the arguments or if
 * argv[0] contains "redis-sentinel". */
int checkForSentinelMode(int argc, char **argv) {
    int j;

    if (strstr(argv[0],"redis-sentinel") != NULL) return 1;
    for (j = 1; j < argc; j++)
        if (!strcmp(argv[j],"--sentinel")) return 1;
    return 0;
}
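Both ways of starting a sentinel can be tried outside of Redis with a small standalone copy of this check. The sketch below is only an illustration: the name is_sentinel_invocation is hypothetical, and the argc guard is an addition that the Redis function does not have.

```c
#include <string.h>

/* Hypothetical standalone copy of the check, for experimenting outside Redis.
 * Returns 1 when argv[0] contains "redis-sentinel" or when any later argument
 * is exactly "--sentinel", and 0 otherwise. */
int is_sentinel_invocation(int argc, char **argv) {
    /* The argc guard is an addition of this sketch. */
    if (argc > 0 && strstr(argv[0], "redis-sentinel") != NULL) return 1;
    for (int j = 1; j < argc; j++) {
        if (strcmp(argv[j], "--sentinel") == 0) return 1;
    }
    return 0;
}
```

Called with the arguments of "redis-server sentinel.conf --sentinel" or with a program name containing "redis-sentinel", the function reports sentinel mode; a plain "redis-server redis.conf" does not.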
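If a sentinel server is being started, checkForSentinelMode returns 1, so server.sentinel_mode is set to 1. serverCron, the function called periodically while the server is running, checks this flag and calls sentinelTimer, which is implemented in sentinel.c: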
void sentinelTimer(void) {
    sentinelCheckTiltCondition();
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    sentinelRunPendingScripts();
    sentinelCollectTerminatedScripts();
    sentinelKillTimedoutScripts();

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other.
     * This non-determinism avoids that Sentinels started at the same time
     * exactly continue to stay synchronized asking to be voted at the
     * same time again and again (resulting in nobody likely winning the
     * election because of split brain voting). */
    server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
}
The key statement in sentinelTimer is the call to sentinelHandleDictOfRedisInstances, which receives sentinel.masters as its argument.
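As a side note, the last statement of sentinelTimer, which randomizes server.hz, can be looked at in isolation. The following is a minimal sketch, assuming CONFIG_DEFAULT_HZ is 10; jittered_hz and SKETCH_DEFAULT_HZ are hypothetical names introduced only for this illustration.

```c
#include <stdlib.h>

/* Assumption of this sketch: the default timer frequency is 10. */
#define SKETCH_DEFAULT_HZ 10

/* Hypothetical helper mirroring the last statement of sentinelTimer():
 * returns a frequency in the range [base, 2*base - 1]. */
int jittered_hz(int base) {
    return base + rand() % base;
}
```

Under that assumption every call yields a value between 10 and 19, so two sentinels started at the same moment are unlikely to keep firing their timers in lockstep.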
Start with the argument sentinel.masters. To see where it is created, go back to main in server.c. After checkForSentinelMode, analyzed above, confirms sentinel mode, main calls initSentinel to initialize the state that sentinel mode needs:
/* Perform the Sentinel mode initialization. */
void initSentinel(void) {
    unsigned int j;

    /* Remove usual Redis commands from the command table, then just add
     * the SENTINEL command. */
    dictEmpty(server.commands,NULL);
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        serverAssert(retval == DICT_OK);
    }

    /* Initialize various data structures. */
    sentinel.current_epoch = 0;
    sentinel.masters = dictCreate(&instancesDictType,NULL);
    sentinel.tilt = 0;
    sentinel.tilt_start_time = 0;
    sentinel.previous_time = mstime();
    sentinel.running_scripts = 0;
    sentinel.scripts_queue = listCreate();
    sentinel.announce_ip = NULL;
    sentinel.announce_port = 0;
    sentinel.simfailure_flags = SENTINEL_SIMFAILURE_NONE;
    sentinel.deny_scripts_reconfig = SENTINEL_DEFAULT_DENY_SCRIPTS_RECONFIG;
    memset(sentinel.myid,0,sizeof(sentinel.myid));
}
Here sentinel.masters is initialized as a dictionary with dictCreate. At this point the dictionary has only been created; what it will store is still unknown.
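Reading on, after initializing this state main still has to read the configuration file. As noted in (23), the simplest sentinel setup needs only a sentinel monitor directive, so we follow how Redis handles that directive. main parses the configuration file by calling loadServerConfig, which is implemented in config.c. That function is very long; for sentinel directives it calls sentinelHandleConfiguration, which is implemented in sentinel.c.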
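To make the shape of the directive concrete, here is a minimal sketch that pulls the four arguments out of a line of the form "sentinel monitor <name> <host> <port> <quorum>". It is not how Redis tokenizes its configuration: sscanf is used only for brevity, the names monitor_conf and parse_sentinel_monitor are hypothetical, and the validation rules (quorum of at least 1, port between 1 and 65535) are assumptions of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the four arguments of a "sentinel monitor" line. */
typedef struct {
    char name[64];
    char host[64];
    int port;
    int quorum;
} monitor_conf;

/* Parses "sentinel monitor <name> <host> <port> <quorum>".
 * Returns 0 on success, -1 on a malformed line or out-of-range values. */
int parse_sentinel_monitor(const char *line, monitor_conf *out) {
    /* Start from a zeroed struct so a failed parse leaves no stale data. */
    memset(out, 0, sizeof(*out));

    int n = sscanf(line, "sentinel monitor %63s %63s %d %d",
                   out->name, out->host, &out->port, &out->quorum);
    if (n != 4) return -1;                               /* missing arguments */
    if (out->quorum < 1) return -1;                      /* sketch's own rule */
    if (out->port < 1 || out->port > 65535) return -1;   /* sketch's own rule */
    return 0;
}
```

For the line "sentinel monitor mymaster 127.0.0.1 6379 2" the sketch fills in the name mymaster, the host 127.0.0.1, port 6379 and quorum 2; these are the values that end up describing the monitored master.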
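For sentinel monitor, sentinelHandleConfiguration mainly calls createSentinelRedisInstance to handle the configured master server. Note that it passes the flag SRI_MASTER, which marks the instance as a master. The function reads as follows: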
sentinelRedisInstance *createSentinelRedisInstance(char *name, int flags, char *hostname, int port, int quorum, sentinelRedisInstance *master) {
    sentinelRedisInstance *ri;
    sentinelAddr *addr;
    dict *table = NULL;
    char slavename[NET_PEER_ID_LEN], *sdsname;

    serverAssert(flags & (SRI_MASTER|SRI_SLAVE|SRI_SENTINEL));
    serverAssert((flags & SRI_MASTER) || master != NULL);

    /* Check address validity. */
    addr = createSentinelAddr(hostname,port);
    if (addr == NULL) return NULL;

    /* For slaves use ip:port as name. */
    if (flags & SRI_SLAVE) {
        anetFormatAddr(slavename, sizeof(slavename), hostname, port);
        name = slavename;
    }

    /* Make sure the entry is not duplicated. This may happen when the same
     * name for a master is used multiple times inside the configuration or
     * if we try to add multiple times a slave or sentinel with same ip/port
     * to a master. */
    if (flags & SRI_MASTER) table = sentinel.masters;
    else if (flags & SRI_SLAVE) table = master->slaves;
    else if (flags & SRI_SENTINEL) table = master->sentinels;
    sdsname = sdsnew(name);
    if (dictFind(table,sdsname)) {
        releaseSentinelAddr(addr);
        sdsfree(sdsname);
        errno = EBUSY;
        return NULL;
    }

    /* Create the instance object. */
    ri = zmalloc(sizeof(*ri));
    /* Note that all the instances are started in the disconnected state,
     * the event loop will take care of connecting them. */
    ri->flags = flags;
    ri->name = sdsname;
    ri->runid = NULL;
    ri->config_epoch = 0;
    ri->addr = addr;
    ri->link = createInstanceLink();
    ri->last_pub_time = mstime();
    ri->last_hello_time = mstime();
    ri->last_master_down_reply_time = mstime();
    ri->s_down_since_time = 0;
    ri->o_down_since_time = 0;
    ri->down_after_period = master ? master->down_after_period :
                            SENTINEL_DEFAULT_DOWN_AFTER;
    ri->master_link_down_time = 0;
    ri->auth_pass = NULL;
    ri->slave_priority = SENTINEL_DEFAULT_SLAVE_PRIORITY;
    ri->slave_reconf_sent_time = 0;
    ri->slave_master_host = NULL;
    ri->slave_master_port = 0;
    ri->slave_master_link_status = SENTINEL_MASTER_LINK_STATUS_DOWN;
    ri->slave_repl_offset = 0;
    ri->sentinels = dictCreate(&instancesDictType,NULL);
    ri->quorum = quorum;
    ri->parallel_syncs = SENTINEL_DEFAULT_PARALLEL_SYNCS;
    ri->master = master;
    ri->slaves = dictCreate(&instancesDictType,NULL);
    ri->info_refresh = 0;
    ri->renamed_commands = dictCreate(&renamedCommandsDictType,NULL);

    /* Failover state. */
    ri->leader = NULL;
    ri->leader_epoch = 0;
    ri->failover_epoch = 0;
    ri->failover_state = SENTINEL_FAILOVER_STATE_NONE;
    ri->failover_state_change_time = 0;
    ri->failover_start_time = 0;
    ri->failover_timeout = SENTINEL_DEFAULT_FAILOVER_TIMEOUT;
    ri->failover_delay_logged = 0;
    ri->promoted_slave = NULL;
    ri->notification_script = NULL;
    ri->client_reconfig_script = NULL;
    ri->info = NULL;

    /* Role */
    ri->role_reported = ri->flags & (SRI_MASTER|SRI_SLAVE);
    ri->role_reported_time = mstime();
    ri->slave_conf_change_time = mstime();

    /* Add into the right table. */
    dictAdd(table, ri->name, ri);
    return ri;
}
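First, the if/else chain on flags assigns table according to the kind of server passed in. Since a master is passed in here, table is the sentinel.masters dictionary initialized above. Next, zmalloc creates ri, the object that represents the server instance, and its fields are filled in. Finally, dictAdd adds ri to table.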
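The three steps described above (choose the table from the flags, reject duplicates, add the new instance) can be sketched without any Redis types. This is a simplified illustration, not the Redis implementation: a linked list stands in for the dictionary, every sk_ name and SK_ flag value is hypothetical, and unlike the real function the sketch does not rewrite slave names to ip:port.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed flag values for this sketch; only their distinctness matters. */
#define SK_MASTER   (1 << 0)
#define SK_SLAVE    (1 << 1)
#define SK_SENTINEL (1 << 2)

typedef struct sk_instance {
    int flags;
    char name[64];
    struct sk_instance *master;     /* owning master, NULL for masters */
    struct sk_instance *slaves;     /* table of slaves (masters only) */
    struct sk_instance *sentinels;  /* table of sentinels (masters only) */
    struct sk_instance *next;       /* next entry in the table we live in */
} sk_instance;

/* Stand-in for sentinel.masters: a linked list instead of a dict. */
static sk_instance *sk_masters = NULL;

/* Creates an instance and adds it to the table chosen by its flags.
 * Returns NULL with errno = EBUSY if the name is already in that table,
 * or NULL with errno = EINVAL if no usable table can be selected. */
sk_instance *sk_create_instance(const char *name, int flags, sk_instance *master) {
    sk_instance **table;

    /* Step 1: pick the table according to the kind of instance. */
    if (flags & SK_MASTER) table = &sk_masters;
    else if (master == NULL) { errno = EINVAL; return NULL; }
    else if (flags & SK_SLAVE) table = &master->slaves;
    else if (flags & SK_SENTINEL) table = &master->sentinels;
    else { errno = EINVAL; return NULL; }

    /* Step 2: make sure the entry is not duplicated. */
    for (sk_instance *it = *table; it != NULL; it = it->next) {
        if (strcmp(it->name, name) == 0) { errno = EBUSY; return NULL; }
    }

    /* Step 3: create the instance and add it to the chosen table. */
    sk_instance *ri = calloc(1, sizeof(*ri));
    if (ri == NULL) return NULL;
    ri->flags = flags;
    strncpy(ri->name, name, sizeof(ri->name) - 1);  /* calloc keeps it terminated */
    ri->master = master;
    ri->next = *table;
    *table = ri;
    return ri;
}
```

Creating "mymaster" twice with SK_MASTER fails the second time with EBUSY, which mirrors the duplicate check in the listing above; a slave created with that master as its owner lands in the master's own slaves table rather than in the global one.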
So what sentinel.masters actually stores is the master servers. Next comes sentinelHandleDictOfRedisInstances:
/* Perform scheduled operations for all the instances in the dictionary.
 * Recursively call the function against dictionaries of slaves. */
void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        sentinelHandleRedisInstance(ri);
        if (ri->flags & SRI_MASTER) {
            sentinelHandleDictOfRedisInstances(ri->slaves);
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                switch_to_promoted = ri;
            }
        }
    }
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}
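The key part of this function is the while loop, which iterates over the instances passed in. The argument passed above is sentinel.masters, which holds the master instances. Each entry is handed to sentinelHandleRedisInstance, where the sentinel's main work is done. Finally, inside the if on SRI_MASTER, the function calls itself on the slave servers and sentinel servers recorded in that master.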
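The traversal order can be illustrated with a toy tree. The sketch below is a simplification under stated assumptions: linked lists replace the dictionaries, the handler only counts visits, and all tr_ names and TR_ flag values are hypothetical.

```c
#include <stddef.h>

/* Assumed flag values for this sketch. */
#define TR_MASTER   (1 << 0)
#define TR_SLAVE    (1 << 1)
#define TR_SENTINEL (1 << 2)

typedef struct tr_instance {
    int flags;
    int visits;                     /* how many times this instance was handled */
    struct tr_instance *slaves;     /* masters only */
    struct tr_instance *sentinels;  /* masters only */
    struct tr_instance *next;       /* next entry in the same table */
} tr_instance;

/* Stand-in for sentinelHandleRedisInstance(): just records the visit. */
static void tr_handle_instance(tr_instance *ri) {
    ri->visits++;
}

/* Handles every instance in the table; for masters, recurses into the
 * slaves and sentinels attached to them. Returns the number handled. */
int tr_handle_table(tr_instance *table) {
    int handled = 0;
    for (tr_instance *ri = table; ri != NULL; ri = ri->next) {
        tr_handle_instance(ri);
        handled++;
        if (ri->flags & TR_MASTER) {
            handled += tr_handle_table(ri->slaves);
            handled += tr_handle_table(ri->sentinels);
        }
    }
    return handled;
}
```

With one master that has two slaves and one other sentinel attached, a single call on the master table handles four instances, each exactly once. Because slaves and sentinels carry no SRI_MASTER flag, the recursion is only one level deep.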
sentinelHandleRedisInstance is the most important function in sentinel mode; the five features mentioned in (23) are all implemented through it. It reads as follows:
/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    sentinelReconnectInstance(ri);
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}
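First comes sentinelReconnectInstance, which is related to discovering other sentinel servers, followed by sentinelSendPeriodicCommands, which is related to discovering slave servers. Next, sentinelCheckSubjectivelyDown checks whether the instance is subjectively down, and for masters sentinelCheckObjectivelyDown checks whether it is objectively down. If the master is objectively down, the functions that follow carry out leader election and failover.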
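One detail of the listing is easy to miss: the monitoring half always runs, while the acting half is skipped as long as TILT mode is active. A minimal sketch of that gate follows, assuming a TILT period of 30 seconds; tg_state, tg_acting_half_allowed and TG_TILT_PERIOD_MS are hypothetical names, and the current time is passed in as a parameter instead of being read from a clock.

```c
/* Assumption of this sketch: TILT mode lasts 30 seconds. */
#define TG_TILT_PERIOD_MS 30000LL

typedef struct {
    int tilt;                   /* non-zero while TILT mode is active */
    long long tilt_start_time;  /* when TILT mode was entered, in ms */
} tg_state;

/* Returns 1 if the acting half may run at time now_ms, 0 if it must be
 * skipped. Clears the tilt flag once the period has elapsed. */
int tg_acting_half_allowed(tg_state *st, long long now_ms) {
    if (st->tilt) {
        if (now_ms - st->tilt_start_time < TG_TILT_PERIOD_MS) return 0;
        st->tilt = 0;  /* period is over: leave TILT mode */
    }
    return 1;
}
```

While the function returns 0, the sentinel in this model keeps reconnecting and sending periodic commands but performs no down checks and starts no failover, which is the behavior the early return in sentinelHandleRedisInstance implements.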