[实战] 调试 Redis 准备工作

时下的业界,相对于传统的关系型数据库,以 key-value 思想实现的 NoSQL 内存数据库非常流行,而提到内存数据库,很多读者第一反应就是 Redis 。确实,Redis 以其高效的性能和优雅的实现成为众多内存数据库中的翘楚。

1、Redis 源码下载与编译

Redis 的最新源码下载地址可以在 Redis 官网获得。我使用的是 CentOS 7.0 系统,使用 wget 命令将 Redis 源码包下载下来:

[root@localhost gdbtest]# wget http://download.redis.io/releases/redis-4.0.11.tar.gz

--2018-09-08 13:08:41--  http://download.redis.io/releases/redis-4.0.11.tar.gz

Resolving download.redis.io (download.redis.io)... 109.74.203.151

Connecting to download.redis.io (download.redis.io)|109.74.203.151|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 1739656 (1.7M) [application/x-gzip]

Saving to: 'redis-4.0.11.tar.gz'

54% [======================>              ] 940,876     65.6KB/s  eta 9s

解压:

[root@localhost gdbtest]# tar zxvf redis-4.0.11.tar.gz

进入生成的 redis-4.0.11 目录,使用 make 命令进行编译:

[root@localhost gdbtest]# cd redis-4.0.11

[root@localhost redis-4.0.11]# make -j 4

编译成功后,会在 src 目录下生成多个可执行程序,其中 redis-server 和 redis-cli 是我们即将调试的程序。

进入 src 目录,使用 GDB 启动 redis-server 这个程序:

[root@localhost src]# gdb redis-server

Reading symbols from /root/redis-4.0.9/src/redis-server…done.

(gdb) r

Starting program: /root/redis-4.0.9/src/redis-server

[Thread debugging using libthread_db enabled]

Using host libthread_db library “/lib64/libthread_db.so.1”.

31212:C 17 Sep 11:59:50.781 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

31212:C 17 Sep 11:59:50.781 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=31212, just started

31212:C 17 Sep 11:59:50.781 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.9/src/redis-server /path/to/redis.conf

31212:M 17 Sep 11:59:50.781 * Increased maximum number of open files to 10032 (it was originally set to 1024).

[New Thread 0x7ffff07ff700 (LWP 31216)]

[New Thread 0x7fffefffe700 (LWP 31217)]

[New Thread 0x7fffef7fd700 (LWP 31218)]

(此处为 redis-server 启动时打印的 ASCII art Logo,其中包含以下信息:Redis 4.0.9 (00000000/0) 64 bit、Running in standalone mode、Port: 6379、PID: 31212、http://redis.io,图案从略)

31212:M 17 Sep 11:59:50.793 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

31212:M 17 Sep 11:59:50.793 # Server initialized

31212:M 17 Sep 11:59:50.793 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add ‘vm.overcommit_memory = 1’ to /etc/sysctl.conf and then reboot or run the command ‘sysctl vm.overcommit_memory=1’ for this to take effect.

31212:M 17 Sep 11:59:50.794 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command ‘echo never > /sys/kernel/mm/transparent_hugepage/enabled’ as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

31212:M 17 Sep 11:59:50.794 * DB loaded from disk: 0.000 seconds

31212:M 17 Sep 11:59:50.794 * Ready to accept connections

以上是 redis-server 启动成功后的画面。

我们再开一个 session,再次进入 Redis 源码所在的 src 目录,然后使用 GDB 启动 Redis 客户端 redis-cli:

[root@localhost src]# gdb redis-cli

Reading symbols from /root/redis-4.0.9/src/redis-cli…done.

(gdb) r

Starting program: /root/redis-4.0.9/src/redis-cli

[Thread debugging using libthread_db enabled]

Using host libthread_db library “/lib64/libthread_db.so.1”.

127.0.0.1:6379>

以上是 redis-cli 启动成功后的画面。

2、通信示例

本课程的学习目的是研究 Redis 的网络通信模块,为了说明问题方便,我们使用一个简单的通信实例,即通过 redis-cli 产生一个 key 为“hello”、值为“world”的 key-value 数据,然后得到 redis-server 的响应。

127.0.0.1:6379> set hello world

OK

127.0.0.1:6379>

读者需要注意的是,这里虽然说是一个“简单”的实例,其实并不简单,原因有两个:

  • 我们在 redis-cli(Redis 客户端)中输入的命令,经 redis-cli 处理后封装成网络数据包,通过客户端的网络通信模块发给 redis-server;redis-server 的网络通信模块收到后解析出命令,执行命令得到结果,再封装成相应的网络数据包返回给 redis-cli。这个过程中两端的网络通信模块正是我们研究和学习的重点;
  • redis-server 的基本数据类型都可以通过类似的命令产生,因此这个例子很有代表性,适合作为研究 Redis 网络通信的切入点。

[实战] Redis 网络通信模块源码分析(1)

我们这里先研究 redis-server 端的网络通信模块。抛开 Redis 本身的业务功能不谈,Redis 网络通信模块的实现思路和细节非常有代表性。网络通信模块也是 Linux C/C++ 后台开发中非常重要的一个模块,虽然网上有很多现成的网络库,但简单易学、可以作为典范的并不多,而 redis-server 正是这方面值得借鉴学习的材料之一。

1、侦听 socket 初始化工作

通过前面课程的介绍,我们知道网络通信在应用层上的大致流程如下:

  • 服务器端创建侦听 socket;
  • 将侦听 socket 绑定到需要的 IP 地址和端口上(调用 Socket API bind 函数);
  • 启动侦听(调用 socket API listen 函数);
  • 无限等待客户端连接到来,调用 Socket API accept 函数接受客户端连接,并产生一个与该客户端对应的客户端 socket;
  • 处理客户端 socket 上网络数据的收发,必要时关闭该 socket。
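为了把这五个步骤串起来看得更直观,下面给出一个按此流程编写的最小化示意程序(只保留主干、省略了大部分错误处理,端口沿用 6379,并非 Redis 源码):

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    /* 1. 创建侦听 socket */
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd == -1) return 1;

    /* 2. 绑定 IP 地址和端口(这里以 0.0.0.0:6379 为例) */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(6379);
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) return 1;

    /* 3. 开启侦听 */
    if (listen(listenfd, 511) == -1) return 1;

    /* 4. 循环等待并接受客户端连接 */
    for (;;) {
        int clientfd = accept(listenfd, NULL, NULL);
        if (clientfd == -1) continue;

        /* 5. 在客户端 socket 上收发数据(这里简单回显),处理完毕后关闭 */
        char buf[128];
        ssize_t n = read(clientfd, buf, sizeof(buf));
        if (n > 0) write(clientfd, buf, n);
        close(clientfd);
    }
    return 0;
}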

根据上面的流程,先来探究前三步。由于 redis-server 默认的侦听端口号是 6379,我们可以用这个端口号作为搜索代码的线索。

全局搜索一下 Redis 的代码,寻找调用了 bind() 函数的代码,经过过滤和筛选,我们确定了位于 anet.c 的 anetListen() 函数。

static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {
    if (bind(s,sa,len) == -1) {
        anetSetError(err, "bind: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }

    if (listen(s, backlog) == -1) {
        anetSetError(err, "listen: %s", strerror(errno));
        close(s);
        return ANET_ERR;
    }
    return ANET_OK;
}

用 GDB 的 b 命令在这个函数上加个断点,然后重新运行 redis-server:

(gdb) b anetListen

Breakpoint 1 at 0x426cd0: file anet.c, line 440.

(gdb) r

The program being debugged has been started already.

Start it from the beginning? (y or n) y

Starting program: /root/redis-4.0.9/src/redis-server

[Thread debugging using libthread_db enabled]

Using host libthread_db library “/lib64/libthread_db.so.1”.

31546:C 17 Sep 14:20:43.861 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo

31546:C 17 Sep 14:20:43.861 # Redis version=4.0.9, bits=64, commit=00000000, modified=0, pid=31546, just started

31546:C 17 Sep 14:20:43.861 # Warning: no config file specified, using the default config. In order to specify a config file use /root/redis-4.0.9/src/redis-server /path/to/redis.conf

31546:M 17 Sep 14:20:43.862 * Increased maximum number of open files to 10032 (it was originally set to 1024).

Breakpoint 1, anetListen (err=0x745bb0 “”, s=10, sa=0x75dfe0, len=28, backlog=511) at anet.c:440

440 static int anetListen(char *err, int s, struct sockaddr *sa, socklen_t len, int backlog) {

当 GDB 中断在这个函数时,使用 bt 命令查看一下此时的调用堆栈:

(gdb) bt

#0 anetListen (err=0x745bb0 "", s=10, sa=0x75dfe0, len=28, backlog=511) at anet.c:440

#1 0x0000000000426e25 in _anetTcpServer (err=err@entry=0x745bb0 "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, af=af@entry=10, backlog=511)

at anet.c:487

#2 0x000000000042792d in anetTcp6Server (err=err@entry=0x745bb0 "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, backlog=)

at anet.c:510

#3 0x000000000042b01f in listenToPort (port=6379, fds=fds@entry=0x745ae4 , count=count@entry=0x745b24 ) at server.c:1728

#4 0x000000000042f917 in initServer () at server.c:1852

#5 0x0000000000423803 in main (argc=, argv=0x7fffffffe588) at server.c:3857

通过这个堆栈,结合堆栈 #2 中的端口号 6379,可以确认这就是我们要找的逻辑;并且从堆栈最外层的 #5 帧是 main() 函数可以看出,这段逻辑是在主线程中执行的。

我们看下堆栈 #1 处的代码:

static int _anetTcpServer(char *err, int port, char *bindaddr, int af, int backlog)
{
    int s = -1, rv;
    char _port[6];  /* strlen("65535") */
    struct addrinfo hints, *servinfo, *p;

    snprintf(_port,6,"%d",port);
    memset(&hints,0,sizeof(hints));
    hints.ai_family = af;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;    /* No effect if bindaddr != NULL */

    if ((rv = getaddrinfo(bindaddr,_port,&hints,&servinfo)) != 0) {
        anetSetError(err, "%s", gai_strerror(rv));
        return ANET_ERR;
    }
    for (p = servinfo; p != NULL; p = p->ai_next) {
        if ((s = socket(p->ai_family,p->ai_socktype,p->ai_protocol)) == -1)
            continue;

        if (af == AF_INET6 && anetV6Only(err,s) == ANET_ERR) goto error;
        if (anetSetReuseAddr(err,s) == ANET_ERR) goto error;
        if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) s = ANET_ERR;
        goto end;
    }
    if (p == NULL) {
        anetSetError(err, "unable to bind socket, errno: %d", errno);
        goto error;
    }

error:
    if (s != -1) close(s);
    s = ANET_ERR;
end:
    freeaddrinfo(servinfo);
    return s;
}

将堆栈切换至 #1,然后输入 info args 查看传入给这个函数的参数:

(gdb) f 1

#1 0x0000000000426e25 in _anetTcpServer (err=err@entry=0x745bb0 "", port=port@entry=6379, bindaddr=bindaddr@entry=0x0, af=af@entry=10, backlog=511)

at anet.c:487

487 if (anetListen(err,s,p->ai_addr,p->ai_addrlen,backlog) == ANET_ERR) s = ANET_ERR;

(gdb) info args

err = 0x745bb0 ""

port = 6379

bindaddr = 0x0

af = 10

backlog = 511

_anetTcpServer 使用系统 API getaddrinfo 来解析得到当前主机的 IP 地址和端口信息。这里没有使用 gethostbyname,是因为 gethostbyname 只能解析 IPv4 相关的主机信息,而 getaddrinfo 对 IPv4 和 IPv6 都适用。这个函数的签名如下:

int getaddrinfo(const char *node, const char *service,

const struct addrinfo *hints,

struct addrinfo **res);

这个函数的具体用法可以在 Linux man 手册中查看。通常服务器端在调用 getaddrinfo 之前,会将 hints 参数的 ai_flags 设置为 AI_PASSIVE(表示解析结果用于 bind),并把主机名参数 node 设置为 NULL,这样得到的就是通配地址(IPv4 下是 0.0.0.0,IPv6 下是 [::])。而客户端调用 getaddrinfo 时,ai_flags 一般不设置 AI_PASSIVE,但主机名 node 和服务名 service(可以简单理解为端口)则不应为空。
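下面是一个按上述服务器端用法调用 getaddrinfo 的最小示例(只演示参数的设置方式,端口 6379 只是沿用本文的例子,并非 Redis 源码):

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int main(void) {
    struct addrinfo hints, *servinfo, *p;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4、IPv6 都可以 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      /* 服务器端:解析结果用于 bind */

    /* node 传 NULL、service 传端口字符串,得到的就是通配地址 */
    int rv = getaddrinfo(NULL, "6379", &hints, &servinfo);
    if (rv != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
        return 1;
    }

    /* 遍历返回的地址链表,打印每个地址对应的协议族 */
    for (p = servinfo; p != NULL; p = p->ai_next)
        printf("ai_family=%d ai_socktype=%d\n", p->ai_family, p->ai_socktype);

    freeaddrinfo(servinfo);
    return 0;
}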

解析完协议信息后,利用得到的协议信息创建侦听 socket,并开启该 socket 的 reuseAddr 选项。然后调用 anetListen 函数,在该函数中先 bind 后 listen。至此,redis-server 就可以在 6379 端口上接受客户端连接了。
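这里说的 reuseAddr 选项对应的是 socket 的 SO_REUSEADDR,anetSetReuseAddr 本质上就是一次 setsockopt 调用。下面是一个同样思路的简化写法(示意代码,并非 Redis 原码):

#include <sys/socket.h>

/* 给 fd 打开 SO_REUSEADDR:服务器重启后可以立即重新 bind
 * 仍处于 TIME_WAIT 状态的地址和端口 */
int set_reuse_addr(int fd) {
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1)
        return -1;
    return 0;
}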

2、接受客户端连接

同样的道理,要研究 redis-server 如何接受客户端连接,只要搜索 socket API accept 函数即可。

经定位,我们最终在 anet.c 文件中找到 anetGenericAccept 函数:

static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {
    int fd;
    while(1) {
        fd = accept(s,sa,len);
        if (fd == -1) {
            if (errno == EINTR)
                continue;
            else {
                anetSetError(err, "accept: %s", strerror(errno));
                return ANET_ERR;
            }
        }
        break;
    }
    return fd;
}

我们用 b 命令在这个函数处加个断点,然后重新运行 redis-server。一直到程序全部运行起来,GDB 都没有触发该断点,这时新打开一个 redis-cli,以模拟新客户端连接到 redis-server 上的行为。断点触发了,此时查看一下调用堆栈。

Breakpoint 2, anetGenericAccept (err=0x745bb0 "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531

531 static int anetGenericAccept(char *err, int s, struct sockaddr *sa, socklen_t *len) {

(gdb) bt

#0 anetGenericAccept (err=0x745bb0 "", s=s@entry=11, sa=sa@entry=0x7fffffffe2b0, len=len@entry=0x7fffffffe2ac) at anet.c:531

#1 0x0000000000427a1d in anetTcpAccept (err=, s=s@entry=11, ip=ip@entry=0x7fffffffe370 "317P237[", ip_len=ip_len@entry=46,

port=port@entry=0x7fffffffe36c) at anet.c:552

#2 0x0000000000437fb1 in acceptTcpHandler (el=, fd=11, privdata=, mask=) at networking.c:689

#3 0x00000000004267f0 in aeProcessEvents (eventLoop=eventLoop@entry=0x7ffff083a0a0, flags=flags@entry=11) at ae.c:440

#4 0x0000000000426adb in aeMain (eventLoop=0x7ffff083a0a0) at ae.c:498

#5 0x00000000004238ef in main (argc=, argv=0x7fffffffe588) at server.c:3894

分析这个调用堆栈,可以梳理出这样的调用流程:main 函数调用 initServer 函数创建侦听 socket、绑定地址并开启侦听,接着调用 aeMain 函数启动一个循环,不断地处理“事件”。

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
    }
}

循环的退出条件是 eventLoop->stop 为 1。事件处理的代码如下:

int aeProcessEvents(aeEventLoop *eventLoop, int flags)

{

int processed = 0, numevents;

/* Nothing to do? return ASAP */

if (!(flags & AE_TIME_EVENTS) && !(flags & AE_FILE_EVENTS)) return 0;

/* Note that we want call select() even if there are no

* file events to process as long as we want to process time

* events, in order to sleep until the next time event is ready

* to fire. */

if (eventLoop->maxfd != -1 ||

((flags & AE_TIME_EVENTS) && !(flags & AE_DONT_WAIT))) {

int j;

aeTimeEvent *shortest = NULL;

struct timeval tv, *tvp;

if (flags & AE_TIME_EVENTS && !(flags & AE_DONT_WAIT))

shortest = aeSearchNearestTimer(eventLoop);

if (shortest) {

long now_sec, now_ms;

aeGetTime(&now_sec, &now_ms);

tvp = &tv;

/* How many milliseconds we need to wait for the next

* time event to fire? */

long long ms =

(shortest->when_sec - now_sec)*1000 +

shortest->when_ms - now_ms;

if (ms > 0) {

tvp->tv_sec = ms/1000;

tvp->tv_usec = (ms % 1000)*1000;

} else {

tvp->tv_sec = 0;

tvp->tv_usec = 0;

}

} else {

/* If we have to check for events but need to return

* ASAP because of AE_DONT_WAIT we need to set the timeout

* to zero */

if (flags & AE_DONT_WAIT) {

tv.tv_sec = tv.tv_usec = 0;

tvp = &tv;

} else {

/* Otherwise we can block */

tvp = NULL; /* wait forever */

}

}

/* Call the multiplexing API, will return only on timeout or when

* some event fires. */

numevents = aeApiPoll(eventLoop, tvp);

/* After sleep callback. */

if (eventLoop->aftersleep != NULL && flags & AE_CALL_AFTER_SLEEP)

eventLoop->aftersleep(eventLoop);

for (j = 0; j < numevents; j++) {

aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];

int mask = eventLoop->fired[j].mask;

int fd = eventLoop->fired[j].fd;

int rfired = 0;

/* note the fe->mask & mask & ... code: maybe an already processed

* event removed an element that fired and we still didn't

* processed, so we check if the event is still valid. */

if (fe->mask & mask & AE_READABLE) {

rfired = 1;

fe->rfileProc(eventLoop,fd,fe->clientData,mask);

}

if (fe->mask & mask & AE_WRITABLE) {

if (!rfired || fe->wfileProc != fe->rfileProc)

fe->wfileProc(eventLoop,fd,fe->clientData,mask);

}

processed++;

}

}

/* Check time events */

if (flags & AE_TIME_EVENTS)

processed += processTimeEvents(eventLoop);

return processed; /* return the number of processed file/time events */

}

这段代码先通过 flags 参数检查是否有事件需要处理。如果有定时器事件(AE_TIME_EVENTS 标志),则寻找最近要到期的定时器。

/* Search the first timer to fire.
 * This operation is useful to know how many time the select can be
 * put in sleep without to delay any event.
 * If there are no timers NULL is returned.
 *
 * Note that's O(N) since time events are unsorted.
 * Possible optimizations (not needed by Redis so far, but...):
 * 1) Insert the event in order, so that the nearest is just the head.
 *    Much better but still insertion or deletion of timers is O(N).
 * 2) Use a skiplist to have this operation as O(1) and insertion as O(log(N)).
 */
static aeTimeEvent *aeSearchNearestTimer(aeEventLoop *eventLoop)
{
    aeTimeEvent *te = eventLoop->timeEventHead;
    aeTimeEvent *nearest = NULL;

    while(te) {
        if (!nearest || te->when_sec < nearest->when_sec ||
                (te->when_sec == nearest->when_sec &&
                 te->when_ms < nearest->when_ms))
            nearest = te;
        te = te->next;
    }
    return nearest;
}

这段代码有详细的注释,也非常好理解。注释告诉我们,由于这里的定时器集合是无序的,所以需要遍历一遍这个链表,算法复杂度是 O(N)。同时,注释也“暗示”了将来可能的优化方向:一是把这个链表按到期时间从小到大排序,这样链表头部就是最近要到期的定时器,查找复杂度降为 O(1)(但插入、删除仍是 O(N));二是改用跳表(skiplist),查找复杂度为 O(1),插入复杂度为 O(log(N))。
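按照第一种优化思路,只要在创建定时器时按到期时间做有序插入,取“最近到期的定时器”就只需看链表头。下面是基于上面 aeTimeEvent 的 when_sec、when_ms、next 字段写的一个有序插入示意(仅演示思路,并非 Redis 的实现):

#include "ae.h"

/* 判断定时器 a 是否比 b 先到期 */
static int timeEventEarlier(aeTimeEvent *a, aeTimeEvent *b) {
    return a->when_sec < b->when_sec ||
           (a->when_sec == b->when_sec && a->when_ms < b->when_ms);
}

/* 把 te 按到期时间从小到大插入链表,返回新的表头。
 * 插入仍是 O(N),但“取最近到期的定时器”变成直接取表头,即 O(1)。 */
static aeTimeEvent *insertTimeEventSorted(aeTimeEvent *head, aeTimeEvent *te) {
    if (head == NULL || timeEventEarlier(te, head)) {
        te->next = head;
        return te;
    }
    aeTimeEvent *cur = head;
    while (cur->next && !timeEventEarlier(te, cur->next))
        cur = cur->next;
    te->next = cur->next;
    cur->next = te;
    return head;
}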

回到 aeProcessEvents 的主流程:接着获取当前系统时间(aeGetTime(&now_sec, &now_ms)),用最早到期的定时器时间减去当前时间,得到一个时间间隔。这个时间间隔作为 numevents = aeApiPoll(eventLoop, tvp); 调用的参数。aeApiPoll() 是对 IO 复用的封装,Redis 在不同操作系统平台上使用不同的系统调用:Linux 上使用 epoll,macOS、FreeBSD 等系统上使用 kqueue,Solaris 上使用 evport,在不支持这些机制的系统上则退化为 select。这里重点看下 Linux 平台下的实现:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

epoll_wait 这个函数的签名如下:

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

最后一个参数 timeout 的设置非常有讲究:如果传进来的 tvp 是 NULL,根据上文的分析,说明没有定时器事件,就把等待时间设置为 -1,这会让 epoll_wait 无限期挂起,直到有事件到来才被唤醒,挂起的好处是不浪费 CPU 时间片;反之,就把 timeout 设置成最近的定时器事件到期的时间间隔,这样即使没有网络事件,epoll_wait 也能按时返回,程序流可以尽快处理到期的定时器事件(下文会介绍)。
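下面用一个可以独立编译的小例子直观感受一下 timeout 三种取值的差别(示意代码,与 Redis 无关):

#include <stdio.h>
#include <sys/epoll.h>

int main(void) {
    int epfd = epoll_create(1024);
    struct epoll_event events[16];

    /* timeout == 0:立即返回。这里没有往 epfd 上挂任何 fd,所以返回 0(无就绪事件) */
    int n = epoll_wait(epfd, events, 16, 0);
    printf("timeout=0, ready=%d\n", n);

    /* timeout > 0:最多阻塞这么多毫秒,对应 Redis 里“最近的定时器还有多久到期” */
    n = epoll_wait(epfd, events, 16, 100);
    printf("timeout=100ms, ready=%d\n", n);

    /* timeout == -1(对应 tvp == NULL):一直阻塞直到有事件。这里不演示,否则程序会挂起 */
    return 0;
}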

对于 epoll_wait 这种系统调用,所有的 fd(对于网络通信,也叫 socket)信息包括侦听 fd 和普通客户端 fd 都记录在事件循环对象 aeEventLoop 的 apidata 字段中,当某个 fd 上有事件触发时,从 apidata 中找到该 fd,并把事件类型(mask 字段)一起记录到 aeEventLoop 的 fired 字段中去。我们先把这个流程介绍完,再介绍 epoll_wait 函数中使用的 epfd 是在何时何地创建的,侦听 fd、客户端 fd 是如何挂载到 epfd 上去的。

在得到了有事件的 fd 以后,接下来就要处理这些事件了。在主循环 aeProcessEvents 中从 aeEventLoop 对象的 fired 数组中取出上一步记录的 fd,然后根据事件类型(读事件和写事件)分别进行处理。

for (j = 0; j < numevents; j++) {

aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];

int mask = eventLoop->fired[j].mask;

int fd = eventLoop->fired[j].fd;

int rfired = 0;

/* note the fe->mask & mask & ... code: maybe an already processed

* event removed an element that fired and we still didn't

* processed, so we check if the event is still valid. */

if (fe->mask & mask & AE_READABLE) {

rfired = 1;

fe->rfileProc(eventLoop,fd,fe->clientData,mask);

}

if (fe->mask & mask & AE_WRITABLE) {

if (!rfired || fe->wfileProc != fe->rfileProc)

fe->wfileProc(eventLoop,fd,fe->clientData,mask);

}

processed++;

}

读事件字段 rfileProc 和写事件字段 wfileProc 都是函数指针,在程序早期设置好,这里直接调用就可以了。

typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;
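也就是说,只要实现一个符合 aeFileProc 签名的函数,再用 aeCreateFileEvent 把它注册到事件循环上,对应事件触发时它就会被回调。下面是一个注册读事件回调的示意(其中 myReadHandler、registerExample 都是为举例取的名字,不是 Redis 里的代码):

#include <stdio.h>
#include "ae.h"

/* 一个符合 aeFileProc 签名的读事件回调 */
void myReadHandler(aeEventLoop *el, int fd, void *clientData, int mask) {
    (void)el; (void)clientData; (void)mask;
    printf("fd %d is readable\n", fd);
}

/* 把 fd 的可读事件注册到事件循环上,事件触发时回调 myReadHandler */
void registerExample(aeEventLoop *el, int fd) {
    if (aeCreateFileEvent(el, fd, AE_READABLE, myReadHandler, NULL) == AE_ERR)
        printf("aeCreateFileEvent failed\n");
}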

3、EPFD 的创建

我们通过搜索关键字 epoll_create,在 ae_epoll.c 文件中找到 EPFD 的创建函数 aeApiCreate。

static int aeApiCreate(aeEventLoop *eventLoop) {

aeApiState *state = zmalloc(sizeof(aeApiState));

if (!state) return -1;

state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);

if (!state->events) {

zfree(state);

return -1;

}

state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */

if (state->epfd == -1) {

zfree(state->events);

zfree(state);

return -1;

}

eventLoop->apidata = state;

return 0;

}

使用 GDB 的 b 命令在这个函数上加个断点,然后使用 run 命令重新运行一下 redis-server,触发断点,使用 bt 命令查看此时的调用堆栈。发现 EPFD 也是在上文介绍的 initServer 函数中创建的。

(gdb) bt

#0 aeCreateEventLoop (setsize=10128) at ae.c:79

#1 0x000000000042f542 in initServer () at server.c:1841

#2 0x0000000000423803 in main (argc=, argv=0x7fffffffe588) at server.c:3857

在 aeCreateEventLoop 中不仅创建了 EPFD,也创建了整个事件循环需要的 aeEventLoop 对象,并把这个对象记录在一个全局变量的 el 字段中。这个全局变量叫 server,其类型是结构体 redisServer,定义如下:

//位于 server.c 文件中
struct redisServer server; /* Server global state */

//位于 server.h 文件中
struct redisServer {
    /* General */
    //省略部分字段…
    aeEventLoop *el;
    unsigned int lruclock; /* Clock for LRU eviction */
    //太长了,省略部分字段…
};

[实战] Redis 网络通信模块源码分析(2)

接着上一课的内容继续分析。

1、侦听 fd 与客户端 fd 是如何挂载到 EPFD 上去的

同样的方式,要把一个 fd 挂载到 EPFD 上去,需要调用系统 API epoll_ctl ,搜索一下这个函数名。在文件 ae_epoll.c 中我们找到 aeApiAddEvent 函数:

static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0}; /* avoid valgrind warning */
    /* If the fd was already monitored for some event, we need a MOD
     * operation. Otherwise we need an ADD operation. */
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;

    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}

把一个 fd 挂载到 EPFD 上时,先在 eventLoop(aeEventLoop 类型)中查看该 fd 是否已经关注了某些事件:如果已经关注,则本次 epoll_ctl 的操作类型是修改已绑定 fd 的事件类型(EPOLL_CTL_MOD);否则就是把该 fd 新添加到 EPFD 上(EPOLL_CTL_ADD)。

在 aeApiAddEvent 加个断点,再重启下 redis-server 。触发断点后的调用堆栈如下:

#0 aeCreateFileEvent (eventLoop=0x7ffff083a0a0, fd=15, mask=mask@entry=1, proc=0x437f50 , clientData=clientData@entry=0x0) at ae.c:145

#1 0x000000000042f83b in initServer () at server.c:1927

#2 0x0000000000423803 in main (argc=, argv=0x7fffffffe588) at server.c:3857

同样在 initServer 函数中,结合上文分析的侦听 fd 的创建过程,去掉无关代码,抽出这个函数的主脉络得到如下伪代码:

void initServer(void) {

//记录程序进程 ID

server.pid = getpid();

//创建程序的 aeEventLoop 对象和 epfd 对象

server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);

//创建侦听 fd

listenToPort(server.port,server.ipfd,&server.ipfd_count) == C_ERR)

//将侦听 fd 设置为非阻塞的

anetNonBlock(NULL,server.sofd);

//创建 Redis 的定时器,用于执行定时任务 cron

/* Create the timer callback, this is our way to process many background

* operations incrementally, like clients timeout, eviction of unaccessed

* expired keys and so forth. */

aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR

//将侦听 fd 绑定到 epfd 上去

/* Create an event handler for accepting new connections in TCP and Unix

* domain sockets. */

aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL) == AE_ERR

//创建一个管道,用于在需要时去唤醒 epoll_wait 挂起的整个 EventLoop

/* Register a readable event for the pipe used to awake the event loop

* when a blocked client in a module needs attention. */

aeCreateFileEvent(server.el, server.module_blocked_pipe[0], AE_READABLE, moduleBlockedClientPipeReadable,NULL) == AE_ERR)

}

注意:这里所说的“主脉络”是指我们关心的网络通信的主脉络,不代表这个函数中其他代码就不是主要的。

如何验证这个断点处挂载到 EPFD 上的 fd 就是侦听 fd 呢?很简单,创建侦听 fd 时,用 GDB 记录下这个 fd 的值。例如,当我的电脑某次运行时,侦听 fd 的值是 15 。如下图( 调试工具用的是 CGDB ):




(图略:CGDB 中显示创建侦听 fd 时,该 fd 的值为 15)


然后在运行程序至绑定 fd 的地方,确认一下绑定到 EPFD 上的 fd 值:


(图略:GDB 中显示绑定到 EPFD 上的 fd 值同样为 15)


这里的 fd 值也是 15 ,说明绑定的 fd 是侦听 fd 。当然在绑定侦听 fd 时,同时也指定了只关注可读事件,并设置事件回调函数为 acceptTcpHandler 。对于侦听 fd ,一般只要关注可读事件就可以了,当触发可读事件,说明有新的连接到来。

aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL) == AE_ERR

acceptTcpHandler 函数定义如下( 位于文件 networking.c 中 ):

void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask) {

int cport, cfd, max = MAX_ACCEPTS_PER_CALL;

char cip[NET_IP_STR_LEN];

UNUSED(el);

UNUSED(mask);

UNUSED(privdata);

while(max--) {

cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);

if (cfd == ANET_ERR) {

if (errno != EWOULDBLOCK)

serverLog(LL_WARNING,

"Accepting client connection: %s", server.neterr);

return;

}

serverLog(LL_VERBOSE,"Accepted %s:%d", cip, cport);

acceptCommonHandler(cfd,0,cip);

}

}

anetTcpAccept 函数中调用的就是我们上面说的 anetGenericAccept 函数了。

int anetTcpAccept(char *err, int s, char *ip, size_t ip_len, int *port) {

int fd;

struct sockaddr_storage sa;

socklen_t salen = sizeof(sa);

if ((fd = anetGenericAccept(err,s,(struct sockaddr*)&sa,&salen)) == -1)

return ANET_ERR;

if (sa.ss_family == AF_INET) {

struct sockaddr_in *s = (struct sockaddr_in *)&sa;

if (ip) inet_ntop(AF_INET,(void*)&(s->sin_addr),ip,ip_len);

if (port) *port = ntohs(s->sin_port);

} else {

struct sockaddr_in6 *s = (struct sockaddr_in6 *)&sa;

if (ip) inet_ntop(AF_INET6,(void*)&(s->sin6_addr),ip,ip_len);

if (port) *port = ntohs(s->sin6_port);

}

return fd;

}

至此,这段流程总算连起来了,在 acceptTcpHandler 上加个断点,然后重新运行一下 redis-server ,再开个 redis-cli 去连接 redis-server 。看看是否能触发该断点,如果能触发该断点,说明我们的分析是正确的。

经验证,确实触发了该断点。


(图略:GDB 中 acceptTcpHandler 处的断点被触发)


在 acceptTcpHandler 中成功接受新连接后,产生客户端 fd ,然后调用 acceptCommonHandler 函数,在该函数中调用 createClient 函数,在 createClient 函数中先将客户端 fd 设置成非阻塞的,然后将该 fd 关联到 EPFD 上去,同时记录到整个程序的 aeEventLoop 对象上。

注意:这里客户端 fd 绑定到 EPFD 上时也只关注可读事件。将无关的代码去掉,然后抽出我们关注的部分,整理后如下( 位于 networking.c 文件中 ):

client *createClient(int fd) {

//将客户端 fd 设置成非阻塞的

anetNonBlock(NULL,fd);

//启用 tcp NoDelay 选项

anetEnableTcpNoDelay(NULL,fd);

//根据配置,决定是否启动 tcpkeepalive 选项

if (server.tcpkeepalive)

anetKeepAlive(NULL,fd,server.tcpkeepalive);

//将客户端 fd 绑定到 epfd,同时记录到 aeEventLoop 上,关注的事件为 AE_READABLE,回调函数为

//readQueryFromClient

aeCreateFileEvent(server.el,fd,AE_READABLE, readQueryFromClient, c) == AE_ERR

return c;

}

2、如何处理 fd 可读事件

客户端 fd 触发可读事件后,回调函数是 readQueryFromClient 。该函数实现如下( 位于 networking.c 文件中):

void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {

client *c = (client*) privdata;

int nread, readlen;

size_t qblen;

UNUSED(el);

UNUSED(mask);

readlen = PROTO_IOBUF_LEN;

/* If this is a multi bulk request, and we are processing a bulk reply

* that is large enough, try to maximize the probability that the query

* buffer contains exactly the SDS string representing the object, even

* at the risk of requiring more read(2) calls. This way the function

* processMultiBulkBuffer() can avoid copying buffers to create the

* Redis Object representing the argument. */

if (c->reqtype == PROTO_REQ_MULTIBULK && c->multibulklen && c->bulklen != -1

&& c->bulklen >= PROTO_MBULK_BIG_ARG)

{

int remaining = (unsigned)(c->bulklen+2)-sdslen(c->querybuf);

if (remaining < readlen) readlen = remaining;

}

qblen = sdslen(c->querybuf);

if (c->querybuf_peak < qblen) c->querybuf_peak = qblen;

c->querybuf = sdsMakeRoomFor(c->querybuf, readlen);

nread = read(fd, c->querybuf+qblen, readlen);

if (nread == -1) {

if (errno == EAGAIN) {

return;

} else {

serverLog(LL_VERBOSE, "Reading from client: %s",strerror(errno));

freeClient(c);

return;

}

} else if (nread == 0) {

serverLog(LL_VERBOSE, "Client closed connection");

freeClient(c);

return;

} else if (c->flags & CLIENT_MASTER) {

/* Append the query buffer to the pending (not applied) buffer

* of the master. We'll use this buffer later in order to have a

* copy of the string applied by the last command executed. */

c->pending_querybuf = sdscatlen(c->pending_querybuf,

c->querybuf+qblen,nread);

}

sdsIncrLen(c->querybuf,nread);

c->lastinteraction = server.unixtime;

if (c->flags & CLIENT_MASTER) c->read_reploff += nread;

server.stat_net_input_bytes += nread;

if (sdslen(c->querybuf) > server.client_max_querybuf_len) {

sds ci = catClientInfoString(sdsempty(),c), bytes = sdsempty();

bytes = sdscatrepr(bytes,c->querybuf,64);

serverLog(LL_WARNING,"Closing client that reached max query buffer length: %s (qbuf initial bytes: %s)", ci, bytes);

sdsfree(ci);

sdsfree(bytes);

freeClient(c);

return;

}

/* Time to process the buffer. If the client is a master we need to

* compute the difference between the applied offset before and after

* processing the buffer, to understand how much of the replication stream

* was actually applied to the master state: this quantity, and its

* corresponding part of the replication stream, will be propagated to

* the sub-slaves and to the replication backlog. */

if (!(c->flags & CLIENT_MASTER)) {

processInputBuffer(c);

} else {

size_t prev_offset = c->reploff;

processInputBuffer(c);

size_t applied = c->reploff - prev_offset;

if (applied) {

replicationFeedSlavesFromMasterStream(server.slaves,

c->pending_querybuf, applied);

sdsrange(c->pending_querybuf,applied,-1);

}

}

}

给这个函数加个断点,然后重新运行 redis-server,再启动一个客户端,接着尝试给服务器发送命令“set hello world”。但在实际调试时会发现:只要 redis-cli 一连接成功,GDB 就触发了该断点,而此时我们还没有发送预想的命令。单步调试 readQueryFromClient 函数,把收到的数据打印出来,得到如下字符串:

(gdb) p c->querybuf

$8 = (sds) 0x7ffff09b8685 "*1$7COMMAND"

c->querybuf 是什么呢?这里 c 的类型是 client 结构体指针,它是上文中接受连接成功后、为新客户端 fd 绑定回调函数时创建,并传递给 readQueryFromClient 函数的参数。我们可以在 server.h 中找到它的定义:

/* With multiplexing we need to take per-client state.

* Clients are taken in a linked list. */

typedef struct client {

uint64_t id; /* Client incremental unique ID. */

int fd; /* Client socket. */

redisDb *db; /* Pointer to currently SELECTed DB. */

robj *name; /* As set by CLIENT SETNAME. */

sds querybuf; /* Buffer we use to accumulate client queries. */

//省略掉部分字段

} client;

client 实际上是存储每个客户端连接信息的对象,其 fd 字段就是当前连接的 fd,querybuf 字段就是当前连接的接收缓冲区,也就是说每个新客户端连接都会产生这样一个对象。从 fd 上收取数据后就存储在这个 querybuf 字段中。

我们贴一下完整的 createClient 函数的代码:

client *createClient(int fd) {

client *c = zmalloc(sizeof(client));

/* passing -1 as fd it is possible to create a non connected client.

* This is useful since all the commands needs to be executed

* in the context of a client. When commands are executed in other

* contexts (for instance a Lua script) we need a non connected client. */

if (fd != -1) {

anetNonBlock(NULL,fd);

anetEnableTcpNoDelay(NULL,fd);

if (server.tcpkeepalive)

anetKeepAlive(NULL,fd,server.tcpkeepalive);

if (aeCreateFileEvent(server.el,fd,AE_READABLE,

readQueryFromClient, c) == AE_ERR)

{

close(fd);

zfree(c);

return NULL;

}

}

selectDb(c,0);

uint64_t client_id;

atomicGetIncr(server.next_client_id,client_id,1);

c->id = client_id;

c->fd = fd;

c->name = NULL;

c->bufpos = 0;

c->querybuf = sdsempty();

c->pending_querybuf = sdsempty();

c->querybuf_peak = 0;

c->reqtype = 0;

c->argc = 0;

c->argv = NULL;

c->cmd = c->lastcmd = NULL;

c->multibulklen = 0;

c->bulklen = -1;

c->sentlen = 0;

c->flags = 0;

c->ctime = c->lastinteraction = server.unixtime;

c->authenticated = 0;

c->replstate = REPL_STATE_NONE;

c->repl_put_online_on_ack = 0;

c->reploff = 0;

c->read_reploff = 0;

c->repl_ack_off = 0;

c->repl_ack_time = 0;

c->slave_listening_port = 0;

c->slave_ip[0] = '\0';

c->slave_capa = SLAVE_CAPA_NONE;

c->reply = listCreate();

c->reply_bytes = 0;

c->obuf_soft_limit_reached_time = 0;

listSetFreeMethod(c->reply,freeClientReplyValue);

listSetDupMethod(c->reply,dupClientReplyValue);

c->btype = BLOCKED_NONE;

c->bpop.timeout = 0;

c->bpop.keys = dictCreate(&objectKeyPointerValueDictType,NULL);

c->bpop.target = NULL;

c->bpop.numreplicas = 0;

c->bpop.reploffset = 0;

c->woff = 0;

c->watched_keys = listCreate();

c->pubsub_channels = dictCreate(&objectKeyPointerValueDictType,NULL);

c->pubsub_patterns = listCreate();

c->peerid = NULL;

listSetFreeMethod(c->pubsub_patterns,decrRefCountVoid);

listSetMatchMethod(c->pubsub_patterns,listMatchObjects);

if (fd != -1) listAddNodeTail(server.clients,c);

initClientMultiState(c);

return c;

}

[实战] Redis 网络通信模块源码分析(3)

接着上一课的内容继续分析。

1、redis-server 接收到客户端的第一条命令

redis-cli 给 redis-server 发送的第一条数据是 *1$7COMMAND(这其实是 redis-cli 连接成功后自动发送的 COMMAND 命令,用于向服务器查询命令信息)。我们来看下这条数据是如何处理的:单步调试 readQueryFromClient,在其调用 read 函数收取完数据后,继续跟踪处理 c->querybuf 的代码即可。经实际跟踪调试,调用的是 processInputBuffer 函数,位于 networking.c 文件中:

/* This function is called every time, in the client structure 'c', there is

* more query buffer to process, because we read more data from the socket

* or because a client was blocked and later reactivated, so there could be

* pending query buffer, already representing a full command, to process. */

void processInputBuffer(client *c) {

server.current_client = c;

/* Keep processing while there is something in the input buffer */

while(sdslen(c->querybuf)) {

/* Return if clients are paused. */

if (!(c->flags & CLIENT_SLAVE) && clientsArePaused()) break;

/* Immediately abort if the client is in the middle of something. */

if (c->flags & CLIENT_BLOCKED) break;

/* CLIENT_CLOSE_AFTER_REPLY closes the connection once the reply is

* written to the client. Make sure to not let the reply grow after

* this flag has been set (i.e. don't process more commands).

*

* The same applies for clients we want to terminate ASAP. */

if (c->flags & (CLIENT_CLOSE_AFTER_REPLY|CLIENT_CLOSE_ASAP)) break;

/* Determine request type when unknown. */

if (!c->reqtype) {

if (c->querybuf[0] == '*') {

c->reqtype = PROTO_REQ_MULTIBULK;

} else {

c->reqtype = PROTO_REQ_INLINE;

}

}

if (c->reqtype == PROTO_REQ_INLINE) {

if (processInlineBuffer(c) != C_OK) break;

} else if (c->reqtype == PROTO_REQ_MULTIBULK) {

if (processMultibulkBuffer(c) != C_OK) break;

} else {

serverPanic("Unknown request type");

}

/* Multibulk processing could see a <= 0 length. */

if (c->argc == 0) {

resetClient(c);

} else {

/* Only reset the client when the command was executed. */

if (processCommand(c) == C_OK) {

if (c->flags & CLIENT_MASTER && !(c->flags & CLIENT_MULTI)) {

/* Update the applied replication offset of our master. */

c->reploff = c->read_reploff - sdslen(c->querybuf);

}

/* Don't reset the client structure for clients blocked in a

* module blocking command, so that the reply callback will

* still be able to access the client argv and argc field.

* The client will be reset in unblockClientFromModule(). */

if (!(c->flags & CLIENT_BLOCKED) || c->btype != BLOCKED_MODULE)

resetClient(c);

}

/* freeMemoryIfNeeded may flush slave output buffers. This may

* result into a slave, that may be the active client, to be

* freed. */

if (server.current_client == NULL) break;

}

}

server.current_client = NULL;

}

processInputBuffer 先判断接收到的字符串是不是以星号(*)开头。这里是以星号开头的,于是把 client 对象的 reqtype 字段设置为 PROTO_REQ_MULTIBULK 类型,接着调用 processMultibulkBuffer 函数继续处理剩余的字符串。处理后的字符串被解析成 redis 命令,记录在 client 对象的 argc 和 argv 两个字段中:前者记录参数的个数(命令名本身也算一个参数),后者是一个数组,数组的每个元素指向参数对应的 robj 对象。这些命令的具体内容不是我们本课程的关注点,不再赘述。
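补充说明一下这里的数据格式:redis 客户端与服务器之间使用 RESP 协议通信,一条命令被编码为“*参数个数,后跟每个参数的 $长度 与内容,各字段之间以 \r\n 分隔”的形式。前面 GDB 打印出的 *1$7COMMAND,完整的字节内容应当是:

*1\r\n$7\r\nCOMMAND\r\n

即 1 个参数、参数长度为 7、内容为 COMMAND。同理,本文开头的 set hello world 这条命令发到网络上的字节是:

*3\r\n$3\r\nset\r\n$5\r\nhello\r\n$5\r\nworld\r\n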

命令解析完成以后,从 processMultibulkBuffer 函数返回,在 processCommand 函数中处理刚才记录在 client 对象 argv 字段中的命令。

//为了与原代码保持一致,代码缩进未调整

if (c->argc == 0) {

resetClient(c);

} else {

/* Only reset the client when the command was executed. */

if (processCommand(c) == C_OK) {

//省略部分代码

}

}

在 processCommand 函数中处理命令,流程大致如下:

(1)先判断是不是 quit 命令,如果是,则往发送缓冲区中添加一条应答命令( 应答 redis 客户端 ),并给当前 client 对象设置 CLIENT_CLOSE_AFTER_REPLY 标志,这个标志见名知意,即应答完毕后关闭连接。

(2)如果不是 quit 命令,则使用 lookupCommand 函数从全局命令字典表中查找相应的命令,如果出错,则向发送缓冲区中添加出错应答。出错不是指程序逻辑出错,有可能是客户端发送的非法命令。如果找到相应的命令,则执行命令后添加应答。

int processCommand(client *c) {

/* The QUIT command is handled separately. Normal command procs will

* go through checking for replication and QUIT will cause trouble

* when FORCE_REPLICATION is enabled and would be implemented in

* a regular command proc. */

if (!strcasecmp(c->argv[0]->ptr,"quit")) {

addReply(c,shared.ok);

c->flags |= CLIENT_CLOSE_AFTER_REPLY;

return C_ERR;

}

/* Now lookup the command and check ASAP about trivial error conditions

* such as wrong arity, bad command name and so forth. */

c->cmd = c->lastcmd = lookupCommand(c->argv[0]->ptr);

if (!c->cmd) {

flagTransaction(c);

addReplyErrorFormat(c,"unknown command '%s'",

(char*)c->argv[0]->ptr);

return C_OK;

} else if ((c->cmd->arity > 0 && c->cmd->arity != c->argc) ||

(c->argc < -c->cmd->arity)) {

flagTransaction(c);

addReplyErrorFormat(c,"wrong number of arguments for '%s' command",

c->cmd->name);

return C_OK;

}

//...省略部分代码

}

全局字典表是前面介绍的 server 全局变量(类型是 redisServer)的一个字段 commands 。

struct redisServer {

/* General */

pid_t pid; /* Main process pid. */

//无关字段省略

dict *commands; /* Command table */

//无关字段省略

};

至于这个全局字典表在哪里初始化以及相关的数据结构类型,由于与本课程主题无关,这里就不分析了。

下面重点探究如何将应答命令(包括出错的应答)添加到发送缓冲区去。我们以添加一个“ok”命令为例:

void addReply(client *c, robj *obj) {

if (prepareClientToWrite(c) != C_OK) return;

/* This is an important place where we can avoid copy-on-write

* when there is a saving child running, avoiding touching the

* refcount field of the object if it's not needed.

*

* If the encoding is RAW and there is room in the static buffer

* we'll be able to send the object to the client without

* messing with its page. */

if (sdsEncodedObject(obj)) {

if (_addReplyToBuffer(c,obj->ptr,sdslen(obj->ptr)) != C_OK)

_addReplyObjectToList(c,obj);

} else if (obj->encoding == OBJ_ENCODING_INT) {

/* Optimization: if there is room in the static buffer for 32 bytes

* (more than the max chars a 64 bit integer can take as string) we

* avoid decoding the object and go for the lower level approach. */

if (listLength(c->reply) == 0 && (sizeof(c->buf) - c->bufpos) >= 32) {

char buf[32];

int len;

len = ll2string(buf,sizeof(buf),(long)obj->ptr);

if (_addReplyToBuffer(c,buf,len) == C_OK)

return;

/* else... continue with the normal code path, but should never

* happen actually since we verified there is room. */

}

obj = getDecodedObject(obj);

if (_addReplyToBuffer(c,obj->ptr,sdslen(obj->ptr)) != C_OK)

_addReplyObjectToList(c,obj);

decrRefCount(obj);

} else {

serverPanic("Wrong obj->encoding in addReply()");

}

}

addReply 函数中有两个关键的地方,一个是 prepareClientToWrite 函数调用,另外一个是 _addReplyToBuffer 函数调用。先来看 prepareClientToWrite ,这个函数中有这样一段代码:

if (!clientHasPendingReplies(c) &&

!(c->flags & CLIENT_PENDING_WRITE) &&

(c->replstate == REPL_STATE_NONE ||

(c->replstate == SLAVE_STATE_ONLINE && !c->repl_put_online_on_ack)))

{

/* Here instead of installing the write handler, we just flag the

* client and put it into a list of clients that have something

* to write to the socket. This way before re-entering the event

* loop, we can try to directly write to the client sockets avoiding

* a system call. We'll only really install the write handler if

* we'll not be able to write the whole reply at once. */

c->flags |= CLIENT_PENDING_WRITE;

listAddNodeHead(server.clients_pending_write,c);

}

这段代码先判断发送缓冲区中是否还有未发送的应答命令——通过判断 client 对象的 bufpos 字段( int 型 )和 reply 字段( 这是一个链表 )的长度是否大于 0 。

/* Return true if the specified client has pending reply buffers to write to

* the socket. */

int clientHasPendingReplies(client *c) {

return c->bufpos || listLength(c->reply);

}

如果当前 client 对象不是处于 CLIENT_PENDING_WRITE 状态,且在发送缓冲区没有剩余数据,则给该 client 对象设置 CLIENT_PENDING_WRITE 标志,并将当前 client 对象添加到全局 server 对象的名叫 clients_pending_write 链表中去。这个链表中存的是所有有数据要发送的 client 对象,注意和上面说的 reply 链表区分开来。

关于 CLIENT_PENDING_WRITE 标志,redis 解释是:

Client has output to send but a write handler is yet not installed

翻译成中文就是:一个有数据需要发送,但是还没有注册可写事件的 client 对象。

下面讨论 _addReplyToBuffer 函数,位于 networking.c 文件中。

int _addReplyToBuffer(client *c, const char *s, size_t len) {

size_t available = sizeof(c->buf)-c->bufpos;

if (c->flags & CLIENT_CLOSE_AFTER_REPLY) return C_OK;

/* If there already are entries in the reply list, we cannot

* add anything more to the static buffer. */

if (listLength(c->reply) > 0) return C_ERR;

/* Check that the buffer has enough space available for this string. */

if (len > available) return C_ERR;

memcpy(c->buf+c->bufpos,s,len);

c->bufpos+=len;

return C_OK;

}

在这个函数中再次检查了 client 对象的 reply 链表:如果链表中已经有待发送的应答数据(说明静态缓冲区已经写不下,后续应答都挂到了链表上),就不能再往静态缓冲区中追加,直接返回 C_ERR。能放进静态缓冲区的应答数据存储在 client 对象的 buf 字段中,其长度记录在 bufpos 字段中,buf 是一个固定大小的字节数组:

typedef struct client {

uint64_t id; /* Client incremental unique ID. */

int fd; /* Client socket. */

redisDb *db; /* Pointer to currently SELECTed DB. */

robj *name; /* As set by CLIENT SETNAME. */

sds querybuf; /* Buffer we use to accumulate client queries. */

sds pending_querybuf; /* If this is a master, this buffer represents the

yet not applied replication stream that we

are receiving from the master. */

//省略部分字段...

/* Response buffer */

int bufpos;

char buf[PROTO_REPLY_CHUNK_BYTES];

} client;

PROTO_REPLY_CHUNK_BYTES 在 Redis 中的定义是 16*1024,也就是说这个静态发送缓冲区的大小是 16k,超过这个长度的应答数据会被挂到 reply 链表中去。

回到我们上面提到的数据 *1$7COMMAND:经 lookupCommand 查找之后,得到 command 这条命令对应的 redisCommand 结构,在 GDB 中显示如下:

2345 c->cmd = c->lastcmd = lookupCommand(c->argv[0]->ptr);

(gdb) n

2346 if (!c->cmd) {

(gdb) p c->cmd

$23 = (struct redisCommand *) 0x742db0

(gdb) p *c->cmd

$24 = {name = 0x4fda67 "command", proc = 0x42d920 , arity = 0, sflags = 0x50dc3e "lt", flags = 1536, getkeys_proc = 0x0, firstkey = 0, lastkey = 0,

keystep = 0, microseconds = 1088, calls = 1}