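I. Research question 1: after the timeout option is set in the Redis configuration file, can it lead to a large number of connections stuck in CLOSE_WAIT?

Note: the Redis configuration file describes the timeout option as follows: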

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 60

(1) Window 1: to set up the packet-capture test, create a Redis client connection from python manage.py shell with the following commands:

>>> from django.core.cache import caches
>>> caches['default']
>>> cl = caches['default'].client
>>> con = cl.connect()
>>> connection1 = con.connection_pool.make_connection()
>>> connection1.connect()
>>> connection1._sock.getsockname()

                The TCP connection is now established; getsockname() reports the local port of the connection as 55310:

('127.0.0.1', 55310)

 

Running netstat -anpl |grep 55310 at this point confirms that the TCP connection is established:

/home $ netstat -anpl |grep 55310
netstat: showing only processes with your user ID
tcp        0      0 127.0.0.1:6379          127.0.0.1:55310         ESTABLISHED 16693/redis-server
tcp        7      0 127.0.0.1:55310         127.0.0.1:6379          ESTABLISHED 2905/python

(2) Open a second SSH session for the packet capture; we call it window 2.

The capture looks like this:

/home $ sudo ./tcpdump -i lo port 55310 -nn

                       -i lo is used here because, inside the container, the server and the client communicate over the loopback interface

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes

Since no request has been sent yet, nothing is captured at this point.

 

Now send a PING request from the Python shell in window 1:

>>> connection1.send_command("PING", check_health=False)

The PING exchange shows up in the capture in window 2:

/home $ sudo ./tcpdump -i lo port 55310 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
15:45:30.142212 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [P.], seq 4047013069:4047013083, ack 3246937596, win 342, options [nop,nop,TS val 257508686 ecr 257484577], length 14
15:45:30.142386 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [P.], seq 1:8, ack 14, win 342, options [nop,nop,TS val 257508687 ecr 257508686], length 7
15:45:30.142402 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 8, win 342, options [nop,nop,TS val 257508687 ecr 257508687], length 0
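
The length 14 and length 7 values in the capture can be checked against the Redis wire protocol (RESP). The sketch below assumes the client sends PING as a one-element RESP array of bulk strings and that the server answers with the simple string +PONG; encode_resp_command is a hypothetical helper written for this illustration, not a redis-py function.

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]          # array header: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # each argument: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


ping_request = encode_resp_command("PING")
pong_reply = b"+PONG\r\n"                      # RESP simple string

print(ping_request, len(ping_request))
print(pong_reply, len(pong_reply))
```

Under these assumptions the request is 14 bytes and the reply is 7 bytes, which matches the two [P.] packets above; the third packet is the client's ACK with no payload.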

 

(3) Because timeout is set to 60 s, redis_server actively closes the connection once it has been idle for 60 s:

/home $ sudo ./tcpdump -i lo port 55310 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
15:45:30.142212 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [P.], seq 4047013069:4047013083, ack 3246937596, win 342, options [nop,nop,TS val 257508686 ecr 257484577], length 14
15:45:30.142386 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [P.], seq 1:8, ack 14, win 342, options [nop,nop,TS val 257508687 ecr 257508686], length 7
15:45:30.142402 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 8, win 342, options [nop,nop,TS val 257508687 ecr 257508687], length 0
15:46:31.584184 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [F.], seq 8, ack 14, win 342, options [nop,nop,TS val 257570128 ecr 257508687], length 0
15:46:31.623320 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 9, win 342, options [nop,nop,TS val 257570168 ecr 257570128], length 0

 

Now check the state of the TCP connection again:

/home $ netstat -anpl |grep 55310
netstat: showing only processes with your user ID
tcp        0      0 127.0.0.1:6379          127.0.0.1:55310          FIN_WAIT2   -
tcp        8      0 127.0.0.1:55310         127.0.0.1:6379          CLOSE_WAIT  2905/python

A little later the output changes to the following: the FIN_WAIT2 entry has disappeared, but the CLOSE_WAIT entry does not go away.

/home $ netstat -anpl |grep 55310
netstat: showing only processes with your user ID
tcp        8      0 127.0.0.1:55310         127.0.0.1:6379          CLOSE_WAIT  2905/pytho
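
To watch for this condition without reading netstat output by eye, the states can be tallied with a small script. This is a minimal sketch that assumes the column layout shown above (protocol, Recv-Q, Send-Q, local address, foreign address, state); count_tcp_states is a hypothetical helper name.

```python
from collections import Counter


def count_tcp_states(netstat_output: str, port: int) -> Counter:
    """Count TCP states for connections whose local or foreign port matches."""
    states = Counter()
    needle = ":%d" % port
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and notices such as "netstat: showing only processes ..."
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(needle) or foreign.endswith(needle):
            states[state] += 1
    return states


sample = """\
netstat: showing only processes with your user ID
tcp        0      0 127.0.0.1:6379          127.0.0.1:55310          FIN_WAIT2   -
tcp        8      0 127.0.0.1:55310         127.0.0.1:6379          CLOSE_WAIT  2905/python
"""
print(count_tcp_states(sample, 55310))
```

Passing 6379 as the port would count every connection to the Redis server, which is the number to watch when checking whether CLOSE_WAIT connections are piling up.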

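The cause can be explained as follows:

    redis_server initiates the close: it sends a FIN to redis_client, and the redis_client side replies with an ACK. The redis_server side then enters FIN_WAIT2, where it waits for the client to call close() and send its own FIN. Because redis_client keeps holding the connection and never sends that FIN, the server side remains in FIN_WAIT2, but only briefly. Why only briefly? redis_server has already closed its socket, so this connection can no longer send or receive data for any process and there is no reason to keep it for long; Linux discards connections in this state after the period set by tcp_fin_timeout.

    After replying with the ACK, the redis_client side is in CLOSE_WAIT. A connection can remain in this state for a very long time, because a half-closed connection can still carry data (a half-close is what results when one side shuts down only its sending direction with shutdown). Linux therefore puts no limit on how long CLOSE_WAIT lasts; the state ends only when the application closes the socket.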
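
The sequence described above can be reproduced without Redis. The sketch below uses a plain loopback listener as a stand-in for redis_server (an assumption made for illustration): the server side closes first, the client sees an empty read, and only the client's own close() takes its socket out of CLOSE_WAIT.

```python
import socket

# Stand-in "server" on an ephemeral loopback port (not a real Redis server).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname(), timeout=5)
server_side, _ = listener.accept()

# The server closes first, as redis_server does after the idle timeout.
# This sends a FIN to the client.
server_side.close()

# The client's kernel ACKs the FIN and the client socket enters CLOSE_WAIT.
# It stays there for as long as the application keeps the socket open.
data = client.recv(1024)
peer_closed = data == b""   # an empty read means the peer has sent its FIN

# Closing the socket sends the client's FIN and ends CLOSE_WAIT.
if peer_closed:
    client.close()
listener.close()

print("peer closed:", peer_closed)
```

A client that never reads from or closes an idle connection, such as a pooled connection that is not being used, never reaches the recv() call, which is why the CLOSE_WAIT entry above lingers.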

 

Solution:

Enable keepalive on the connection by adding the keep-alive parameters to the cache configuration:

import socket  # provides the TCP_KEEP* constants used below

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://%s:%s" % (url_ipv6_sup(REDIS_HOST), REDIS_PORT),
        "TIMEOUT": 60 * 60,
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "PASSWORD": REDIS_PASSWD,
            "CONNECTION_POOL_KWARGS": {
                'retry_on_timeout': True,
                'health_check_interval': REDIS_HEALTH_CHECK_INTERVAL,
                'socket_timeout': 120,
                'socket_keepalive': True,
                'socket_keepalive_options': {
                    socket.TCP_KEEPIDLE: REDIS_TCP_KEEPIDLE,
                    socket.TCP_KEEPINTVL: REDIS_TCP_KEEPINTVL,  # constant value: 10 min
                    socket.TCP_KEEPCNT: REDIS_TCP_KEEPCNT
                }
            }
        }
    }
}
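
At the socket level, these keepalive settings come down to a handful of setsockopt calls. The sketch below shows the idea on a bare socket and assumes Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are available. The numeric values are placeholders, since the real REDIS_TCP_* constants are not shown here, and enable_keepalive and worst_case_detection_seconds are hypothetical helpers.

```python
import socket

# Placeholder values for illustration only.
KEEPALIVE_OPTIONS = {
    socket.TCP_KEEPIDLE: 600,   # idle seconds before the first probe
    socket.TCP_KEEPINTVL: 10,   # seconds between unanswered probes
    socket.TCP_KEEPCNT: 3,      # unanswered probes before the connection is dropped
}


def enable_keepalive(sock: socket.socket, options: dict) -> None:
    """Turn on TCP keepalive and apply the per-socket tuning options."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for option, value in options.items():
        sock.setsockopt(socket.IPPROTO_TCP, option, value)


def worst_case_detection_seconds(options: dict) -> int:
    """Approximate time to give up on a peer that never answers any probe."""
    return (options[socket.TCP_KEEPIDLE]
            + options[socket.TCP_KEEPINTVL] * options[socket.TCP_KEEPCNT])


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, KEEPALIVE_OPTIONS)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
keepidle = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
sock.close()

print(keepalive_on, keepidle, worst_case_detection_seconds(KEEPALIVE_OPTIONS))
```

When the peer has already closed its side, as in this test, the first probe is answered with a reset, so the connection is torn down after roughly the idle period and the probe count never comes into play.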

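With keepalive enabled, the capture shows redis_client sending a keep-alive probe about 10 min later. Because redis_server has already closed the connection, it answers with an RST. Checking the redis_client connection afterwards shows that the CLOSE_WAIT connection is gone.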

/home $ sudo ./tcpdump -i lo port 55310 -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
15:45:30.142212 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [P.], seq 4047013069:4047013083, ack 3246937596, win 342, options [nop,nop,TS val 257508686 ecr 257484577], length 14
15:45:30.142386 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [P.], seq 1:8, ack 14, win 342, options [nop,nop,TS val 257508687 ecr 257508686], length 7
15:45:30.142402 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 8, win 342, options [nop,nop,TS val 257508687 ecr 257508687], length 0
15:46:31.584184 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [F.], seq 8, ack 14, win 342, options [nop,nop,TS val 257570128 ecr 257508687], length 0
15:46:31.623320 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 9, win 342, options [nop,nop,TS val 257570168 ecr 257570128], length 0
15:56:41.808327 IP 127.0.0.1.55310 > 127.0.0.1.6379: Flags [.], ack 9, win 342, options [nop,nop,TS val 258180353 ecr 257570128], length 0
15:56:41.808347 IP 127.0.0.1.6379 > 127.0.0.1.55310: Flags [R], seq 3246937604, win 0, length 0
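
The two intervals that matter in this capture can be read off the timestamps. The sketch below does the arithmetic; seconds_between is a hypothetical helper, and the timestamps are copied from the capture lines above.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Difference in seconds between two tcpdump timestamps from the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Last packet of the PING exchange -> FIN sent by redis_server
idle_before_fin = seconds_between("15:45:30.142402", "15:46:31.584184")

# Client's ACK of the FIN -> keep-alive probe sent by redis_client
wait_before_probe = seconds_between("15:46:31.623320", "15:56:41.808327")

print(round(idle_before_fin, 3), round(wait_before_probe, 3))
```

The first interval is a little over the configured timeout of 60 s, and the second is a little over 10 min, in line with the description above.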