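Environment: Redis source version 5.0.3. I annotated the source while reading it; the git repository is https://gitee.com/xiaoangg/redis_annotation. Corrections are welcome.
Reference book: 《redis的设计与实现》 (Redis Design and Implementation)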


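Contents

The serverCron time event

1. Updating the server's time cache

2. Updating the LRU clock

3. Sampling operation metrics

4. Updating the peak memory record

5. Handling the SIGTERM signal

6. Managing client resources

7. Managing database resources

8. Running a delayed BGREWRITEAOF

9. Checking the state of persistence operations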


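The serverCron time event

At present serverCron is the only time event in Redis; by default it runs once every 100 milliseconds (server.hz defaults to 10).

This function manages the server's resources and keeps the server running well.

How serverCron gets registered:

1.server.c/main()
2.server.c/initServer()
3.initServer() calls aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL);

The rest of this post walks through what serverCron does.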

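1. Updating the server's time cache

Quite a few operations in the server need the current system time, and each lookup costs a system call. To reduce those calls, the unixtime and mstime fields of the server state cache the current time.

Because serverCron runs every 100 ms by default, both fields can lag the real time by up to one interval.

They are therefore used only where coarse time is good enough, such as writing log entries, updating the server's LRU clock, deciding whether to run a persistence task, and computing server uptime.

The source:

server.c/serverCron: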

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    
    //....
    /* Update the time cache. */
    updateCachedTime();

    //.....
}

server.c/updateCachedTime:

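/**
 * Annotation: the Unix time is cached in the global server state because,
 * with virtual memory and aging, the current time has to be stored in an
 * object on every access. Reading a global variable is much faster than
 * calling time(NULL), and the cached value is fine wherever an exact
 * time is not required.
 */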
/* We take a cached value of the unix time in the global state because with
 * virtual memory and aging there is to store the current time in objects at
 * every object access, and accuracy is not needed. To access a global var is
 * a lot faster than calling time(NULL) */
void updateCachedTime(void) {
    time_t unixtime = time(NULL);
    atomicSet(server.unixtime,unixtime); // atomic store
    server.mstime = mstime();

    /* To get information about daylight saving time, we need to call localtime_r
     * and cache the result. However calling localtime_r in this context is safe
     * since we will never fork() while here, in the main thread. The logging
     * function will call a thread safe version of localtime that has no locks. */
    struct tm tm;
    localtime_r(&server.unixtime,&tm);
    server.daylight_active = tm.tm_isdst;
}
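The caching idea can be shown in isolation. The following is a minimal sketch, not the Redis code: the struct and function names (time_cache, time_cache_update, time_cache_seconds) are hypothetical, and it uses C11 timespec_get where Redis uses its own mstime() helper.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the two cached fields in the server state. */
struct time_cache {
    time_t unixtime;    /* seconds, refreshed once per tick */
    long long mstime;   /* milliseconds, refreshed once per tick */
};

/* Called from the periodic tick: the only place that asks the OS for the time. */
static void time_cache_update(struct time_cache *tc) {
    struct timespec ts;
    assert(tc != NULL);
    timespec_get(&ts, TIME_UTC);   /* one clock read per tick */
    tc->mstime = (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    tc->unixtime = ts.tv_sec;
}

/* Hot-path readers only touch memory; the value may be up to one tick old. */
static time_t time_cache_seconds(const struct time_cache *tc) {
    return tc->unixtime;
}
```

Code that needs an exact timestamp should still read the clock directly; the cache trades accuracy (at most one tick of lag) for fewer clock reads.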

 

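2. Updating the LRU clock

The lruclock field of the server state holds the server's LRU clock. Like the unixtime and mstime fields described above, it is a form of cached server time.

Every Redis object has an lru field that records when the object was last accessed:
server.h/struct redisObject: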

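// redisObject represents the string, hash, list, set and zset data types
typedef struct redisObject {
    // 4-bit type: the data type. Besides the five classic types (string, hash, list, set, zset),
    // Redis 5.0 also defines module and stream types; 2^4 = 16 values are enough for all of them
    unsigned type:4;
    // 4-bit encoding: the physical encoding; one data type may have several encodings
    unsigned encoding:4; 
    // lru: the time the object was last accessed by a command
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount; // reference count of the object
    void *ptr;    // points to the actual underlying data structure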
} robj;

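When the server needs a key's idle time, it subtracts the object's lru field from the server's LRU clock; the difference is the idle time.

tips: use the OBJECT IDLETIME command to get the idle time of a key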
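The subtraction has one subtlety: the lru field is only LRU_BITS wide, so the clock eventually wraps around. Below is a minimal sketch of an idle-time estimate that handles the wrap. It is modeled on the idea behind Redis's estimateObjectIdleTime but is not a copy of it; the SKETCH_ constants (24 bits, one clock unit per second) and the function name are assumptions made for this example.

```c
#include <assert.h>

#define SKETCH_LRU_BITS 24                                   /* assumed width of the lru field */
#define SKETCH_LRU_CLOCK_MAX ((1u << SKETCH_LRU_BITS) - 1)   /* largest clock value */
#define SKETCH_LRU_RESOLUTION_MS 1000                        /* assumed: one clock unit = 1 s */

/* Estimate idle time in milliseconds from two clock readings.
 * If the global clock is behind the object's stamp, the clock has wrapped. */
static unsigned long long idle_time_ms(unsigned int lruclock, unsigned int obj_lru) {
    assert(lruclock <= SKETCH_LRU_CLOCK_MAX && obj_lru <= SKETCH_LRU_CLOCK_MAX);
    if (lruclock >= obj_lru)
        return (unsigned long long)(lruclock - obj_lru) * SKETCH_LRU_RESOLUTION_MS;
    return (unsigned long long)(lruclock + (SKETCH_LRU_CLOCK_MAX - obj_lru)) *
           SKETCH_LRU_RESOLUTION_MS;
}
```

For example, a clock of 100 and an object stamp of 40 give 60 clock units, which is 60000 ms under the assumed resolution.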

 

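3. Sampling operation metrics

trackInstantaneousMetric takes a sample once every 100 ms, measuring how many requests, how much traffic, and so on the server handled since the previous sample.

It then works out the average amount handled per millisecond and multiplies it by 1000 to estimate the amount handled per second.

The estimate is stored in the inst_metric ring buffer in the server state.

When a client runs the INFO command, the server reads the samples from the inst_metric array (declared in server.h).

The code:
server.h/inst_metric: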

struct redisServer {
//.....
    /* The following two are used to track instantaneous metrics, like
     * number of operations per second, network traffic. */
    struct {
        long long last_sample_time; /* Timestamp of last sample in ms */
        long long last_sample_count;/* Count in last sample */
        long long samples[STATS_METRIC_SAMPLES];
        int idx;
    } inst_metric[STATS_METRIC_COUNT];

//.....
}

server.c/serverCron:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ///.....
    
    // Track real-time metrics such as operations per second and network traffic
    run_with_period(100) {
        trackInstantaneousMetric(STATS_METRIC_COMMAND,server.stat_numcommands); // commands processed
        trackInstantaneousMetric(STATS_METRIC_NET_INPUT,
                server.stat_net_input_bytes);   // network input bytes
        trackInstantaneousMetric(STATS_METRIC_NET_OUTPUT,
                server.stat_net_output_bytes);  // network output bytes
    }

}
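The run_with_period(100) block above only executes on some serverCron calls when server.hz is raised above 10. A minimal sketch of that gating logic follows. The function name should_run and its parameters are hypothetical; the logic is written from my reading of the run_with_period macro in server.h and should be checked against the source.

```c
#include <assert.h>

/* Decide whether a task with the given period should run on this tick.
 * hz is the tick frequency; cronloops counts ticks since startup. */
static int should_run(int period_ms, int hz, int cronloops) {
    assert(hz > 0 && hz <= 1000);   /* keeps tick_ms at 1 or more */
    int tick_ms = 1000 / hz;
    /* A period no longer than one tick runs every tick; otherwise run
     * once every (period_ms / tick_ms) ticks. */
    return period_ms <= tick_ms || cronloops % (period_ms / tick_ms) == 0;
}
```

With hz = 10 a 100 ms task runs on every tick; with hz = 100 the same task runs on every tenth tick, so its real-time period stays at roughly 100 ms.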

server.c/trackInstantaneousMetric: stores a sample

/* Add a sample to the operations per second array of samples. */
void trackInstantaneousMetric(int metric, long long current_reading) {
    long long t = mstime() - server.inst_metric[metric].last_sample_time; // time elapsed since the previous sample, in ms
    long long ops = current_reading -
                    server.inst_metric[metric].last_sample_count; // amount handled during that interval
    long long ops_sec;

    ops_sec = t > 0 ? (ops*1000/t) : 0; // scale to a per-second rate

    // store it in the ring buffer
    server.inst_metric[metric].samples[server.inst_metric[metric].idx] =
        ops_sec;
    server.inst_metric[metric].idx++;
    server.inst_metric[metric].idx %= STATS_METRIC_SAMPLES;
    server.inst_metric[metric].last_sample_time = mstime();
    server.inst_metric[metric].last_sample_count = current_reading;
}

server.c/getInstantaneousMetric: reads the samples back:

/* Return the mean of all the samples. */
long long getInstantaneousMetric(int metric) {
    int j;
    long long sum = 0;

    for (j = 0; j < STATS_METRIC_SAMPLES; j++)
        sum += server.inst_metric[metric].samples[j];
    return sum / STATS_METRIC_SAMPLES;
}

tips: the sampled rate is reported as instantaneous_ops_per_sec in the output of the INFO stats command
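To make the arithmetic concrete, here is a small standalone version of the same ring-buffer sampling, with the clock passed in as an argument so the numbers can be checked by hand. The names rate_tracker, rate_track and rate_mean, and the ring size of 16, are assumptions for this sketch rather than Redis identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SAMPLES 16   /* assumed ring size */

struct rate_tracker {
    long long last_time_ms;             /* time of the previous sample */
    long long last_count;               /* counter value at the previous sample */
    long long samples[SKETCH_SAMPLES];  /* per-second rates, overwritten in a ring */
    int idx;                            /* next slot to write */
};

/* Record one sample: the counter delta scaled to a per-second rate. */
static void rate_track(struct rate_tracker *rt, long long now_ms, long long count) {
    assert(rt != NULL);
    long long dt = now_ms - rt->last_time_ms;
    long long delta = count - rt->last_count;
    rt->samples[rt->idx] = dt > 0 ? delta * 1000 / dt : 0;
    rt->idx = (rt->idx + 1) % SKETCH_SAMPLES;
    rt->last_time_ms = now_ms;
    rt->last_count = count;
}

/* Mean of all slots, including slots that have not been written yet. */
static long long rate_mean(const struct rate_tracker *rt) {
    long long sum = 0;
    for (int j = 0; j < SKETCH_SAMPLES; j++) sum += rt->samples[j];
    return sum / SKETCH_SAMPLES;
}
```

If the counter grows by 50 in every 100 ms interval, each sample is 50 * 1000 / 100 = 500 per second, and once the ring is full the mean is 500 as well. Until the ring has filled, the zero-initialized slots pull the mean down.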

 

4. Updating the peak memory record

The stat_peak_memory field of the server state records the server's peak memory usage.

server.c/serverCron:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    //......
    /* Record the max memory used since the server was started. */
    if (zmalloc_used_memory() > server.stat_peak_memory)
        server.stat_peak_memory = zmalloc_used_memory();

    //.....
}

tips: the peak is reported as used_memory_peak in the output of the INFO memory command

 

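5. Handling the SIGTERM signal

During initialization the server calls setupSignalHandlers to install its signal handlers.

The handler installed for SIGTERM is sigShutdownHandler.

sigShutdownHandler sets the shutdown_asap flag in the server state to 1.

Each time serverCron runs it checks shutdown_asap; if the flag is 1, it shuts the server down.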

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {   
 //......
    /* Annotation: handle the SIGTERM signal. */
    /* We received a SIGTERM, shutting down here in a safe way, as it is
     * not ok doing so inside the signal handler. */
    if (server.shutdown_asap) {
        if (prepareForShutdown(SHUTDOWN_NOFLAGS) == C_OK) exit(0);
        serverLog(LL_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");
        server.shutdown_asap = 0;
    }

    //......
}
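The pattern here (the handler only sets a flag, and the periodic task does the real work) can be sketched on its own. This is an illustration under assumptions, not the Redis code: the names shutdown_requested, on_sigterm, install_sigterm_handler and tick_wants_shutdown are made up, and it uses the standard C signal() call where Redis installs its handler with sigaction().

```c
#include <signal.h>

/* Written by the handler, read by the periodic tick. sig_atomic_t is the
 * type the C standard guarantees can be written safely from a handler. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Keep the handler trivial: just record that the signal arrived. */
static void on_sigterm(int sig) {
    (void)sig;
    shutdown_requested = 1;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
static int install_sigterm_handler(void) {
    return signal(SIGTERM, on_sigterm) == SIG_ERR ? -1 : 0;
}

/* Called from the periodic tick: a safe place to flush data and exit. */
static int tick_wants_shutdown(void) {
    return shutdown_requested != 0;
}
```

Doing the shutdown work in the tick rather than in the handler avoids calling functions that are not async-signal-safe (logging, file I/O, memory allocation) from signal context.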

 

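6. Managing client resources

Every run of serverCron calls clientsCron().

clientsCron does the following:

  • Checks whether a client's connection has timed out (no interaction with the server for a long time) and, if so, frees that client.
  • Checks whether a client's input (query) buffer has grown past a limit and, if so, shrinks it so that it does not hold on to too much memory.
  • Tracks the clients that used the most memory over the last few seconds, so that INFO can report this without an O(n) scan of the client list.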

server.c/clientsCron():

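/**
 * Annotation: called by serverCron() to perform work on clients that must
 * be done regularly, for example disconnecting clients that have timed out,
 * including clients blocked in a blocking command.
 */ 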
/* This function is called by serverCron() and is used in order to perform
 * operations on clients that are important to perform constantly. For instance
 * we use this function in order to disconnect clients after a timeout, including
 * clients blocked in some blocking command with a non-zero timeout.
 *
 * The function makes some effort to process all the clients every second, even
 * if this cannot be strictly guaranteed, since serverCron() may be called with
 * an actual frequency lower than server.hz in case of latency events like slow
 * commands.
 *
 * It is very important for this function, and the functions it calls, to be
 * very fast: sometimes Redis has tens of hundreds of connected clients, and the
 * default server.hz value is 10, so sometimes here we need to process thousands
 * of clients per second, turning this function into a source of latency.
 */
#define CLIENTS_CRON_MIN_ITERATIONS 5
void clientsCron(void) {
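    /**
     * Annotation: each call tries to process at least numclients/server.hz clients.
     *
     * With no big latency events this function is called server.hz times per
     * second, so on average every client is processed once per second.
     */ 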
    /* Try to process at least numclients/server.hz of clients
     * per call. Since normally (if there are no big latency events) this
     * function is called server.hz times per second, in the average case we
     * process all the clients in 1 second. */
    int numclients = listLength(server.clients);
    int iterations = numclients/server.hz;
    mstime_t now = mstime();

    // Annotation: process at least CLIENTS_CRON_MIN_ITERATIONS clients per call (or all of them if there are fewer)
    /* Process at least a few clients while we are at it, even if we need
     * to process less than CLIENTS_CRON_MIN_ITERATIONS to meet our contract
     * of processing each client once per second. */
    if (iterations < CLIENTS_CRON_MIN_ITERATIONS)
        iterations = (numclients < CLIENTS_CRON_MIN_ITERATIONS) ?
                     numclients : CLIENTS_CRON_MIN_ITERATIONS;

    while(listLength(server.clients) && iterations--) {
        client *c;
        listNode *head;

        /* Rotate the list, take the current head, process.
         * This way if the client must be removed from the list it's the
         * first element and we don't incur into O(N) computation. */
        listRotate(server.clients); // rotate the list: move the tail node to the head, so successive iterations cycle through all clients
        head = listFirst(server.clients);
        c = listNodeValue(head);

        /* The following functions do different service checks on the client.
         * The protocol is that they return non-zero if the client was
         * terminated. */
        if (clientsCronHandleTimeout(c,now)) continue; // free clients that have been idle too long
        if (clientsCronResizeQueryBuffer(c)) continue; // shrink the client's query buffer if needed
        if (clientsCronTrackExpansiveClients(c)) continue;
    }
}
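The per-call budget computed at the top of clientsCron is easy to get wrong by one case, so here it is as a standalone function with a few worked values. The function name clients_per_call and the SKETCH_ constant are hypothetical; the logic mirrors the snippet above.

```c
#include <assert.h>

#define SKETCH_MIN_ITERATIONS 5   /* same role as CLIENTS_CRON_MIN_ITERATIONS */

/* How many clients to process in one call so that, at hz calls per second,
 * every client is visited about once per second. */
static int clients_per_call(int numclients, int hz) {
    assert(hz > 0 && numclients >= 0);
    int iterations = numclients / hz;
    /* Always do a minimum amount of work, but never more than there are clients. */
    if (iterations < SKETCH_MIN_ITERATIONS)
        iterations = numclients < SKETCH_MIN_ITERATIONS ? numclients
                                                        : SKETCH_MIN_ITERATIONS;
    return iterations;
}
```

With hz = 10: 1000 clients give 100 per call, 30 clients give 5 (the minimum, instead of 3), and 3 clients give 3.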

 

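7. Managing database resources

Every run of serverCron calls databasesCron, which handles the "background" operations that Redis databases need done incrementally.

Examples are key expiration, hash table resizing, and rehashing.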

server.c/databasesCron:

/**
 * Annotation: this function handles the "background" operations Redis databases
 * need done incrementally, such as key expiration, resizing and rehashing.
 */ 
/* This function handles 'background' operations we are required to do
 * incrementally in Redis databases, such as active key expiring, resizing,
 * rehashing. */
void databasesCron(void) {

    /**
     * Annotation: expire keys by random sampling.
     * (server.masterhost == NULL means this server is a master.)
     */ 
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled && server.masterhost == NULL) {
        activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
    } else if (server.masterhost != NULL) {
        expireSlaveKeys();
    }

    /**
     * Annotation: incrementally defragment keys.
     */ 
    /* Defrag keys gradually. */
    if (server.active_defrag_enabled)
        activeDefragCycle();

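    /**
     * Annotation: rehash hash tables only when no other process is saving the DB to disk.
     * Otherwise rehashing is harmful, because it causes a lot of copy-on-write of memory pages.
     */ 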
    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        int dbs_per_call = CRON_DBS_PER_CALL;
        int j;

        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;

        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }

        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db);
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                } else {
                    /* If this db didn't need rehash, we'll try the next one. */
                    rehash_db++;
                    rehash_db %= server.dbnum;
                }
            }
        }
    }
}
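The resize loop above uses a static counter so that each call resumes where the previous one stopped. A minimal sketch of that resumable round-robin follows. The function visit_dbs and its out[] parameter are hypothetical and exist only so the visiting order can be inspected; the cursor is passed in explicitly where databasesCron keeps it in a static local.

```c
#include <assert.h>

/* Visit up to per_call databases, resuming where the previous call stopped.
 * *cursor must persist between calls. Writes the visited indexes to out[]
 * (which must hold at least per_call entries) and returns how many were visited. */
static unsigned int visit_dbs(unsigned int *cursor, unsigned int dbnum,
                              unsigned int per_call, unsigned int *out) {
    assert(dbnum > 0);
    if (per_call > dbnum) per_call = dbnum;   /* don't test more DBs than exist */
    for (unsigned int j = 0; j < per_call; j++) {
        out[j] = *cursor % dbnum;             /* wrap around the database list */
        (*cursor)++;
    }
    return per_call;
}
```

With 20 databases and 16 per call, the first call visits 0 to 15 and the second visits 16 to 19 followed by 0 to 11, so no database is starved even though a single call never covers them all.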

 

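8. Running a delayed BGREWRITEAOF

If a client sends BGREWRITEAOF while the server is executing BGSAVE, the server postpones the BGREWRITEAOF until the BGSAVE has finished.

Every run of serverCron checks whether a BGREWRITEAOF has been postponed; if so, it calls rewriteAppendOnlyFileBackground() to perform it.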

server.c/serverCron:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {    
    
    //.................
    /**
     * Annotation: if a client sends BGREWRITEAOF while BGSAVE is running, the server
     * postpones it until BGSAVE finishes. The check below runs the postponed rewrite.
     *
     * server.aof_rewrite_scheduled records whether a BGREWRITEAOF was postponed.
     */ 
    /* Start a scheduled AOF rewrite if this was requested by the user while
     * a BGSAVE was in progress. */
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1 &&
        server.aof_rewrite_scheduled)
    {
        rewriteAppendOnlyFileBackground();
    }
    //..................

}
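The postpone-then-run behaviour is a small state machine, sketched below with plain flags in place of child PIDs. Everything here (persist_state, request_rewrite, cron_tick, the return codes) is hypothetical naming for illustration, and the rejection of a request while a rewrite is already running is my understanding of the BGREWRITEAOF command rather than something shown in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

struct persist_state {
    int rdb_child_running;   /* a BGSAVE child exists */
    int aof_child_running;   /* a rewrite child exists */
    int rewrite_scheduled;   /* a rewrite was requested but postponed */
    int rewrites_started;    /* counter, only for illustration */
};

static void start_rewrite(struct persist_state *s) {
    s->aof_child_running = 1;
    s->rewrite_scheduled = 0;   /* the pending request is now being served */
    s->rewrites_started++;
}

/* A client asks for a rewrite. Returns 1 if started, 0 if postponed,
 * -1 if rejected because a rewrite is already running. */
static int request_rewrite(struct persist_state *s) {
    assert(s != NULL);
    if (s->aof_child_running) return -1;
    if (s->rdb_child_running) { s->rewrite_scheduled = 1; return 0; }
    start_rewrite(s);
    return 1;
}

/* Periodic tick: run a postponed rewrite once no child is running. */
static void cron_tick(struct persist_state *s) {
    assert(s != NULL);
    if (!s->rdb_child_running && !s->aof_child_running && s->rewrite_scheduled)
        start_rewrite(s);
}
```

A request made during a save returns 0 and sets the flag; ticks do nothing while the save runs, and the first tick after it ends starts the rewrite and clears the flag.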

 

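9. Checking the state of persistence operations

The server state records the child process IDs of BGSAVE and BGREWRITEAOF in rdb_child_pid and aof_child_pid.

These two fields also tell whether a BGSAVE or BGREWRITEAOF is currently running.

If either value is not -1, the program calls wait3 with WNOHANG to check, without blocking, whether a child process has terminated.

If one has, the server performs the follow-up work, such as replacing the old RDB file with the new one.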
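The non-blocking check can be tried outside Redis. The sketch below is an assumption-laden illustration for POSIX systems: it uses waitpid(-1, ..., WNOHANG), which behaves like wait3 with a NULL rusage argument, and the names child_status, poll_children and spawn_exiting_child are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Result of one non-blocking check. */
struct child_status {
    pid_t pid;      /* 0: children still running, -1: error, >0: pid that ended */
    int exitcode;   /* exit status when the child exited normally */
    int bysignal;   /* signal number if the child was killed by a signal, else 0 */
};

/* Poll once for any terminated child without blocking the caller. */
static struct child_status poll_children(void) {
    struct child_status cs = {0, 0, 0};
    int statloc = 0;
    cs.pid = waitpid(-1, &statloc, WNOHANG);
    if (cs.pid > 0) {
        if (WIFEXITED(statloc)) cs.exitcode = WEXITSTATUS(statloc);
        if (WIFSIGNALED(statloc)) cs.bysignal = WTERMSIG(statloc);
    }
    return cs;
}

/* Demo helper: fork a child that exits immediately with the given code. */
static pid_t spawn_exiting_child(int code) {
    pid_t pid = fork();
    if (pid == 0) _exit(code);
    return pid;
}
```

Because the call returns 0 while the child is still running, the periodic task can simply try again on its next run instead of stalling the event loop.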

    /* Check if a background saving or AOF rewrite in progress terminated. */
    if (server.rdb_child_pid != -1 || server.aof_child_pid != -1 ||
        ldbPendingChildren())
    {  
        int statloc;
        pid_t pid;

        if ((pid = wait3(&statloc,WNOHANG,NULL)) != 0) { // non-blocking check: has any child process terminated?
            int exitcode = WEXITSTATUS(statloc);
            int bysignal = 0;

            if (WIFSIGNALED(statloc)) bysignal = WTERMSIG(statloc);

            if (pid == -1) {
                serverLog(LL_WARNING,"wait3() returned an error: %s. "
                    "rdb_child_pid = %d, aof_child_pid = %d",
                    strerror(errno),
                    (int) server.rdb_child_pid,
                    (int) server.aof_child_pid);
            } else if (pid == server.rdb_child_pid) { // BGSAVE finished: run the follow-up handler
                backgroundSaveDoneHandler(exitcode,bysignal); 
                if (!bysignal && exitcode == 0) receiveChildInfo();
            } else if (pid == server.aof_child_pid) { // BGREWRITEAOF finished: run the follow-up handler
                backgroundRewriteDoneHandler(exitcode,bysignal);
                if (!bysignal && exitcode == 0) receiveChildInfo();
            } else {
                if (!ldbRemoveChild(pid)) {
                    serverLog(LL_WARNING,
                        "Warning, detected child with unmatched pid: %ld",
                        (long)pid);
                }
            }
            updateDictResizePolicy();
            closeChildInfoPipe();
        }
    } else {
        /* If there is not a background saving/rewrite in progress check if
         * we have to save/rewrite now. */
        for (j = 0; j < server.saveparamslen; j++) { // loop over the configured save points and test each trigger condition
            struct saveparam *sp = server.saveparams+j;

            /* Save if we reached the given amount of changes,
             * the given amount of seconds, and if the latest bgsave was
             * successful or if, in case of an error, at least
             * CONFIG_BGSAVE_RETRY_DELAY seconds already elapsed. */
            if (server.dirty >= sp->changes &&
                server.unixtime-server.lastsave > sp->seconds &&
                (server.unixtime-server.lastbgsave_try >
                 CONFIG_BGSAVE_RETRY_DELAY ||
                 server.lastbgsave_status == C_OK))
            {
                serverLog(LL_NOTICE,"%d changes in %d seconds. Saving...",
                    sp->changes, (int)sp->seconds);
                rdbSaveInfo rsi, *rsiptr;
                rsiptr = rdbPopulateSaveInfo(&rsi);
                rdbSaveBackground(server.rdb_filename,rsiptr);
                break;
            }
        }

        // Annotation: check whether an automatic AOF rewrite should be triggered
        /* Trigger an AOF rewrite if needed. */
        if (server.aof_state == AOF_ON &&
            server.rdb_child_pid == -1 &&
            server.aof_child_pid == -1 &&
            server.aof_rewrite_perc &&
            server.aof_current_size > server.aof_rewrite_min_size)
        {
            long long base = server.aof_rewrite_base_size ?
                server.aof_rewrite_base_size : 1;
            long long growth = (server.aof_current_size*100/base) - 100;
            if (growth >= server.aof_rewrite_perc) {
                serverLog(LL_NOTICE,"Starting automatic rewriting of AOF on %lld%% growth",growth);
                rewriteAppendOnlyFileBackground();
            }
        }
    }
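The two trigger conditions in the else branch reduce to simple arithmetic, shown here as standalone functions. This is a simplified sketch: the names save_point, save_point_reached and aof_growth_percent are hypothetical, and the BGSAVE retry-delay check and the minimum AOF size check from the snippet above are deliberately left out.

```c
#include <assert.h>
#include <time.h>

/* One "save <seconds> <changes>" rule. */
struct save_point {
    time_t seconds;
    int changes;
};

/* A save point fires when enough keys changed AND strictly more than
 * sp->seconds have passed since the last successful save. */
static int save_point_reached(const struct save_point *sp, long long dirty,
                              time_t now, time_t lastsave) {
    assert(sp != NULL);
    return dirty >= sp->changes && now - lastsave > sp->seconds;
}

/* Growth of the AOF relative to its size after the last rewrite, in percent.
 * A base size of 0 is treated as 1 to avoid dividing by zero. */
static long long aof_growth_percent(long long current_size, long long base_size) {
    long long base = base_size ? base_size : 1;
    return current_size * 100 / base - 100;
}
```

For a rule of 900 seconds and 1 change, a single change saved 1000 seconds ago triggers a save, while exactly 900 seconds does not, because the comparison is strict. An AOF that grew from 100 to 200 bytes has grown by 100 percent, which would meet a rewrite threshold of 100.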