Project background:

Redis cluster, 8 masters and 8 replicas

After last Friday's Redis scale-down, the redis2-0 and redis2-1 nodes were found to be using about 40G of memory, while the other nodes averaged 12-15G. Uneven slot distribution was suspected.

Redis 7.0.4


Problem description

On Monday morning the customer reported that one Redis node had high memory usage.

Memory usage as seen in the QingCloud platform monitoring:

(screenshot: per-node memory usage in the QingCloud console)


Check the cluster slot distribution:

redis-cli  -a <password> --cluster check 192.168.3.3:6479

It is clear that the slots really are unevenly distributed: the node on the second line holds around 7000 slots and over 100 million keys.

Note: the screenshot was taken after the migration; I forgot to capture one beforehand.

(screenshot: redis-cli --cluster check output, taken after the migration)


Go inside the cluster and check the error log of the redis2-0 node:

kubectl exec -it -n redis-ns redis2-0 -- bash
tail -1000f /var/log/redis/redis.log
There is one obvious error:

Client id=189074 addr=xx.x.xx.xx:12315 laddr=xx.xx.xx.xx:6379 fd=341 name= age=591 idle=591 flags=S db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=13092 omem=268438368 tot-mem=268440168 events=r cmd=psync user=default redir=-1 resp=2 scheduled to be closed ASAP for overcoming of output buffer limits.
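This means a replica's output buffer on this node blew past its limit during a full sync (cmd=psync), so the master dropped the replica connection; the omem value above, roughly 268 MB, is just over the default 256 MB hard limit for replica clients. To see the limit currently in force, the setting can be read back directly (host is a placeholder):

# Read the replica output buffer limit currently in effect
redis-cli -h <node-ip> -p 6379 -a <password> config get client-output-buffer-limit
# Typical default: "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"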

Next, look at the Redis dir directory: there are a large number of temp-*.rdb files that were never turned into dump.rdb.
root@2-0# ls -lrth
total 52G
-rw-r--r-- 1 root root  20G Sep 21 15:08 temp-51034.rdb
-rw-r--r-- 1 root root 6.1G Sep 22 11:06 temp-1543.rdb
-rw-r--r-- 1 root root 900M Sep 22 12:14 temp-1042.rdb
-rw-r--r-- 1 root root 2.0G Sep 22 12:45 temp-209.rdb
-rw-r--r-- 1 root root 7.9G Sep 22 15:53 temp-10658.rdb
-rw-r--r-- 1 root root 3.1G Sep 23 00:50 temp-8.rdb
drwxr-xr-x 2 root root 4.0K Sep 27 02:01 conf
-rw-r--r-- 1 root root  12G Sep 27 02:12 dump.rdb
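These temp-*.rdb files are partial dumps left behind when a BGSAVE (here most likely the dumps triggered for the failed full resyncs) is interrupted before it can be renamed to dump.rdb, and on this node they account for most of the 52G. A hedged cleanup sketch, assuming the data dir is /data as in the shell prompts below and that no save is currently running:

# Make sure no background save is in progress right now
redis-cli -h <node-ip> -p 6379 -a <password> info persistence | grep rdb_bgsave_in_progress
# Only if it prints rdb_bgsave_in_progress:0, remove the stale partial dumps
rm -i /data/temp-*.rdb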

Root cause analysis:

The initial suspicion is that when the cluster was scaled down on the Kubernetes platform, slot migration was handled by a script whose logic is to move the slots of each node being removed to an arbitrary surviving node and then remove that node from the cluster. It may also be that the cluster's key names share very similar prefixes, concentrating data on a few nodes.
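On the key-prefix theory: Redis Cluster maps a key to a slot via CRC16 over the whole key name (or only the part inside {} hash tags, if present), so a common prefix by itself does not concentrate keys in one slot, but hash tags do. A quick, hedged way to check where the suspicious keys actually land (the key names here are made up):

# CLUSTER KEYSLOT returns the slot a given key name hashes to
redis-cli -a <password> cluster keyslot 'order:{tenant-a}:1001'   # hash tag: every tenant-a key maps to the same slot
redis-cli -a <password> cluster keyslot 'order:{tenant-a}:1002'
redis-cli -a <password> cluster keyslot 'order:1001'               # no hash tag: slots spread across the full key name
redis-cli -a <password> cluster keyslot 'order:1002'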


Solution:

First, address the warning in the log: overcoming of output buffer limits.

Remove the output buffer limit for replicas: set it on the command line and then rewrite the config file so the change is not lost on restart.

redis-cli -h xxx -p 6379 -a <password> config set client-output-buffer-limit 'slave 0 0 0'
redis-cli -h xxx -p 6379 -a <password> config rewrite
This needs to be done on every node in the cluster.
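A minimal sketch for doing that in one pass, assuming a hosts.txt that lists one ip:port per cluster node (the file name and format are just for illustration):

# Relax the replica output buffer limit on every node and persist it to the config file
while read -r hostport; do
    host=${hostport%%:*}; port=${hostport##*:}
    redis-cli -h "$host" -p "$port" -a '<password>' config set client-output-buffer-limit 'slave 0 0 0'
    redis-cli -h "$host" -p "$port" -a '<password>' config rewrite
done < hosts.txt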

Next, address the second problem: the uneven slot distribution.

Option 1: run a rebalance directly against the cluster

redis-cli -a <password> --cluster rebalance 192.168.33.147:6379
This operation takes quite a while, so it is best run after 8 or 9 PM.
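redis-cli can also simulate the rebalance first, which is a cheap way to preview how many slots it would move before committing to the real run; a hedged example (the threshold value is arbitrary):

# Dry run: report the planned moves without migrating anything
redis-cli -a <password> --cluster rebalance 192.168.33.147:6379 --cluster-simulate
# Only touch masters whose slot count deviates from the average by more than the threshold
redis-cli -a <password> --cluster rebalance 192.168.33.147:6379 --cluster-threshold 2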

Option 2: manual reshard

redis-cli -a <password> --cluster reshard 192.168.33.151:6379 --cluster-from fd9d994b67b91xasdff293b363dadcfe617a19gs1 --cluster-to 01981f970cfe668e1bfedfasb35175f85axde --cluster-slots 50 --cluster-yes
--cluster-from takes the ID of the node to move slots out of, --cluster-to takes the ID of the node to move them to, and --cluster-slots sets how many slots to migrate per run.

Option 3: migrate individual slots

# Run on the destination node: mark slot 2209 as importing from the source node
redis-cli -h 192.168.33.34 -p 6379 -a <password> CLUSTER SETSLOT 2209 IMPORTING fd9d994b67b9193ba9f293b363dadcfe617a191d
# Run on the source node: mark slot 2209 as migrating to the destination node
redis-cli -a <password> CLUSTER SETSLOT 2209 MIGRATING 8ee0a098c32642388b79912ba94c2e45f1d8af98
# Finally, assign slot 2209 to the destination node
redis-cli -a <password> CLUSTER SETSLOT 2209 NODE 8ee0a098c32642388b79912ba94c2e45f1d8af98
Here 2209 is the slot number; IMPORTING takes the source node's ID and MIGRATING takes the destination node's ID.
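For completeness: between the IMPORTING/MIGRATING markers and the final SETSLOT ... NODE, the keys stored in the slot still have to be moved with MIGRATE, otherwise they stay behind on the source. A hedged sketch of the whole sequence for one slot, reusing the IDs above (IPs, port and batch size are illustrative):

SLOT=2209
SRC_ID=fd9d994b67b9193ba9f293b363dadcfe617a191d    # node that currently owns the slot
DST_ID=8ee0a098c32642388b79912ba94c2e45f1d8af98    # node that should receive it
SRC=192.168.33.83; DST=192.168.33.34; PORT=6379
# 1. On the destination: accept the slot from the source
redis-cli -h "$DST" -p "$PORT" -a '<password>' cluster setslot "$SLOT" importing "$SRC_ID"
# 2. On the source: mark the slot as leaving towards the destination
redis-cli -h "$SRC" -p "$PORT" -a '<password>' cluster setslot "$SLOT" migrating "$DST_ID"
# 3. Move the keys in batches until the slot is empty (assumes key names without whitespace)
while true; do
    keys=$(redis-cli -h "$SRC" -p "$PORT" -a '<password>' cluster getkeysinslot "$SLOT" 100)
    [ -z "$keys" ] && break
    for k in $keys; do
        redis-cli -h "$SRC" -p "$PORT" -a '<password>' migrate "$DST" "$PORT" "$k" 0 5000 auth '<password>'
    done
done
# 4. Tell both sides (ideally every master) who owns the slot now
redis-cli -h "$SRC" -p "$PORT" -a '<password>' cluster setslot "$SLOT" node "$DST_ID"
redis-cli -h "$DST" -p "$PORT" -a '<password>' cluster setslot "$SLOT" node "$DST_ID"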

Option 2, manual resharding, was chosen this time. Another problem came up here: this node's memory usage was already at 90%, i.e. about 43G, while the pod's memory limit is only 48G, so migrating a large number of slots at once could easily get the pod OOM-killed.
There is also the issue of the container's liveness and readiness probes: while a large slot migration is in progress, the instance may at times be unable to respond to the probe script, the probe times out, Kubernetes decides the Redis instance is stuck, and the pod gets killed and restarted, even though the process is actually still running, just under heavy load.
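Before moving a lot of slots it is worth checking exactly what the pod is allowed to use and how tight the probes are (pod name and namespace as in the kubectl exec command earlier); for example:

# Memory requests/limits of the hot pod
kubectl get pod redis2-0 -n redis-ns -o jsonpath='{.spec.containers[0].resources}{"\n"}'
# Liveness/readiness probe settings (timeouts, periods, failure thresholds)
kubectl get pod redis2-0 -n redis-ns -o jsonpath='{.spec.containers[0].livenessProbe}{"\n"}{.spec.containers[0].readinessProbe}{"\n"}'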

Given these constraints, the decision was to drive the migration with a script that moves a small number of slots at a time and loops.

First, use the following command to get the IDs of the cluster's master nodes:
root@2-0:/data# redis-cli  -a <password> cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
fd9d994b67b9193ba9f293b363dadcfasdd 192.168.33.83:4002@4003 myself,master - 0 1695784061000 200 connected 14271-157
8ee0a098c32642388b79912ba94c2asd8af98 192.168.33.164:4002@4003 master - 0 1695784062535 707 connected 2185-220
89583cf380e1ebbfed285f73e99dbasd8723 192.168.33.14:4002@4003 slave 82e53e5943aebasdf4c5c8437189d22df7f00d19 0 1695784064940 709 connected
82e53e5943aeb656c4c5c8437189d2sd0d19 192.168.33.42:4002@4003 master - 0 1695784064038 709 connected 2305 2369 2598-2608 2626-2632 2683-2692 
dc7d6c68316df58aef8d66a43b201asd2019 192.168.33.85:4002@4003 slave b3532bb6asdf744cdbedd73aca8420f4dccec3c 0 1695784065542 711 connected
2936bb1563e0cb217be111696ca505asd9e2 192.168.33.143:4002@4003 slave 2b87f710asdfdfdef4dd7d94d5ccff9075d5677 0 1695784067947 705 connected
2cd7ea25ceb9c4f1a69ca5a1054cc8fads187 192.168.33.98:4002@4003 master - 0 1695784063938 706 connected 2643-2652 2713-2722 2763-2772 2833-2842 29
af1d2b145450fbe60a86c4bdaf5fasdf2533 192.168.33.167:4002@4003 slave fd9d994b67asdfba9f293b363dadcfe617a191d 0 1695784065942 200 connected
94dca25c9fcc678f64dff4e6a1227asdf32208 192.168.33.172:4002@4003 slave 8ee0a098casdf388b79912ba94c2e45f1d8af98 0 1695784067000 707 connected
b3532bb649903744cdbedd73acsdfcec3c 192.168.33.15:4002@4003 master - 0 1695784063000 711 connected 1093-2184 2210-2233 2333-2368 2619-2625 2663-26
b091d2dacd57678c852dea4ceed95aasdf7dd 192.168.33.115:4002@4003 slave 01981f970cfe6asdfbfe1586db1b35175f85a84e 0 1695784063000 708 connected
2b87f71075f67dfdef4dd7d94d5ccfasdfd5677 192.168.33.12:4002@4003 master - 0 1695784066945 705 connected 2653-2662 2703-2712 2753-2762 2823-2832 304
d5c6ce77b803ed1279270ca120fc9basdfa7aa70 192.168.33.129:4002@4003 slave 2cd7ea25ceb9c4asdf9ca5a1054cc8687fce1187 0 1695784063000 706 connected
1d48dc7a8e63e95d62808848745cac80asdf74e96 192.168.33.140:4002@4003 slave 362b02d8b0d4e11dasdf075afbdab3f52cc8fd0 0 1695784063000 710 connected
01981f970cfe668e1bfe1586db1b35asdfa84e 192.168.33.136:4002@4003 master - 0 1695784064000 708 connected 2270-2304 2417 2733-2742 2783-2792 2853-286
362b02d8b0d4e11da4ee0075afbdab3asdf8fd0 192.168.33.147:4002@4003 master - 0 1695784062000 710 connected 0-1092 2234-2269 2311-233
Extract the master node IDs from the command output into node.txt, and remove the ID of the node that slots will be migrated out of.
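A hedged one-liner along these lines can build the file; the final grep drops the ID of the source node (the one with ~7000 slots):

# Keep only master node IDs, minus the node we are migrating slots away from
redis-cli -a '<password>' cluster nodes \
  | awk '$3 ~ /master/ {print $1}' \
  | grep -v '^fd9d994b67b9193ba9f293b363dadcfe617a191d$' > node.txt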


root@7-0:/sc# cat node.txt
2b87f71075f67dfdef4dd7d94d5ccff9075d5677
2cd7ea25ceb9c4f1a69ca5a1054cc8687fce1187
8ee0a098c32642388b79912ba94c2e45f1d8af98
01981f970cfe668e1bfe1586db1b35175f85a84e
82e53e5943aeb656c4c5c8437189d22df7f00d19
362b02d8b0d4e11da4ee0075afbdab3f52cc8fd0
b3532bb649903744cdbedd73aca8420f4dccec3c
This file contains only 7 master node IDs; the one that was removed is the master we want to migrate slots away from.


#!/bin/bash
# Move 50 slots at a time from the overloaded node to each remaining master, in several passes.
mapfile -t ips < "node.txt"
for i in $(seq 0 10); do
    for node in "${ips[@]}"; do
        # --cluster-from is the ID of the node that currently holds the most slots
        redis-cli -a '<password>' --cluster reshard 192.168.33.98:6379 \
            --cluster-from fd9d994b67b9193ba9f293b363dadcfe617a191d \
            --cluster-to "$node" --cluster-slots 50 --cluster-yes
        rc=$?   # capture the exit code before any echo overwrites $?
        echo "Return code of the last reshard: $rc"
        if [ "$rc" -ne 0 ]; then
            echo "The last command failed, aborting the script."
            exit 1
        else
            echo "The last command succeeded, continuing."
        fi
        echo "Finished resharding to node $node, moving on to the next node in 3s..."
        echo "Currently in pass $i."
        sleep 3
    done
    echo "Pass $i finished, sleeping 3s..."
    sleep 3
done


Save the script as start_move_slots.sh, run it in the background and follow the log:

./start_move_slots.sh > run_time.log &

tail -100f run_time.log 
Moving slot 2922 from xx.xx.xx.83:4002 to xx.xx.xx.164:4002: 
................................................................................
Return code of the last reshard: 0
The last command succeeded, continuing.
Finished resharding to node 8ee0a098c32642388b79912ba94c2e45f1d8af98, moving on to the next node in 3s...
Currently in pass 0.

.............................
....
....
....

Then it is just a matter of waiting patiently.
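While the script is running, progress can be followed from another shell, for example (interval and head count are arbitrary):

# Watch the slot/key distribution converge
watch -n 60 "redis-cli -a '<password>' --cluster check 192.168.33.83:6379 | head -n 20"
# Memory on the node being drained
redis-cli -h 192.168.33.83 -p 6379 -a '<password>' info memory | grep '^used_memory_human:'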

Finally, run the Redis cluster check again:

redis-cli  -a <password> --cluster check 192.168.33.83:6379
192.168.33.147:4002 (362b02d8...) -> 35995693 keys | 1861 slots | 1 slaves.
192.168.33.15:4002 (b3532bb6...) -> 36296991 keys | 1859 slots | 1 slaves.
192.168.33.12:4002 (2b87f710...) -> 47025546 keys | 2454 slots | 1 slaves.
192.168.33.136:4002 (01981f97...) -> 35934705 keys | 1858 slots | 1 slaves.
192.168.33.42:4002 (82e53e59...) -> 42896192 keys | 2315 slots | 1 slaves.
192.168.33.164:4002 (8ee0a098...) -> 41161639 keys | 2099 slots | 1 slaves.
192.168.33.83:4002 (fd9d994b...) -> 41322824 keys | 1865 slots | 1 slaves.
192.168.33.98:4002 (2cd7ea25...) -> 39681898 keys | 2073 slots | 1 slaves.
...
...
...
...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

At this point the migration is complete.
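Since the original symptom was the ~40G of memory on the redis2-0/redis2-1 nodes, a final sanity check is to compare per-node memory again; a hedged sketch over the master IPs from the check output above (port is illustrative):

# Compare used memory across all masters after the reshard
for host in 192.168.33.147 192.168.33.15 192.168.33.12 192.168.33.136 192.168.33.42 192.168.33.164 192.168.33.83 192.168.33.98; do
    echo -n "$host: "
    redis-cli -h "$host" -p 6379 -a '<password>' info memory | grep '^used_memory_human:'
done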