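Learning objectives:

1. Understand how Redis Cluster works

2. Learn how to set up a Redis Cluster

Walkthrough:

I. Common commands

1. Connect with the cluster client. Pass the -c option, otherwise redis-cli does not run in cluster mode and will not follow redirections. You can connect to any Redis node in the cluster.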

[root@localhost src]# redis-cli -c  -h 192.168.137.101 -p 7379
192.168.137.101:7379> get name
-> Redirected to slot [5798] located at 192.168.137.102:6379
(nil)
192.168.137.102:6379> set name liubao
OK
192.168.137.102:6379> get name
"liubao"
192.168.137.102:6379> set age 10
-> Redirected to slot [741] located at 192.168.137.101:6379
OK
192.168.137.101:6379> get age
"10"

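As the session above shows, when you operate on a key the node computes the key's hash slot and looks up which node serves it. If the node you are connected to does not own that slot, redis-cli -c is redirected to the node that does. A second operation on the same key is not redirected, but not because the client has cached the key's location: after a redirect redis-cli stays connected to the owning node (note how the prompt changes), so the next command on that key reaches the right node directly. If you quit and reconnect to a node that does not own the slot, the redirect happens again, as this test shows: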

[root@localhost src]# redis-cli -c  -h 192.168.137.101 -p 7379
192.168.137.101:7379> get age
-> Redirected to slot [741] located at 192.168.137.101:6379
"10"
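What redis-cli -c does on a redirect can be sketched in Python. In cluster mode, a node that does not own a key's slot answers with an error reply of the form MOVED <slot> <host>:<port>, and the client then reconnects to that address and retries. The helper below only parses such a reply; the name parse_moved and the sample strings are illustrative assumptions, not part of any Redis client library.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error: str) -> Optional[Redirect]:
    """Parse a 'MOVED <slot> <host>:<port>' error reply; return None for anything else."""
    parts = error.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    # rpartition keeps the host intact and takes the text after the last ':' as the port
    host, _, port = parts[2].rpartition(":")
    return Redirect(int(parts[1]), host, int(port))


# Hypothetical reply matching the redirect shown in the session above
target = parse_moved("MOVED 741 192.168.137.101:6379")
if target is not None:
    print(f"reconnect to {target.host}:{target.port} and retry (slot {target.slot})")
```

A real cluster client would also refresh its slot table at this point; that part is left out of the sketch.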

CLUSTER KEYSLOT <key> returns the hash slot that key maps to.

192.168.137.101:6379> CLUSTER KEYSLOT age
(integer) 741
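The slot number can also be reproduced outside Redis. Redis Cluster maps a key to a slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16, and when the key contains a non-empty {...} hash tag only the text inside the braces is hashed. The following is a minimal sketch of that rule; the function names crc16_xmodem and key_slot are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key, honouring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


# The sessions above report slot 741 for "age" and slot 5798 for "name"
for key in ("age", "name"):
    print(key, key_slot(key))
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot, which is what makes multi-key operations on them possible in a cluster.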

CLUSTER COUNTKEYSINSLOT <slot> returns the number of keys currently stored in slot on the node you are connected to.

192.168.137.101:6379> CLUSTER COUNTKEYSINSLOT 741
(integer) 1

CLUSTER GETKEYSINSLOT <slot> <count> returns up to count key names stored in slot.

2. Commands for inspecting the cluster

CLUSTER INFO prints information about the state of the cluster.

192.168.137.102:6379> CLUSTER INFO 
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:3498
cluster_stats_messages_pong_sent:3701
cluster_stats_messages_meet_sent:3
cluster_stats_messages_sent:7202
cluster_stats_messages_ping_received:3698
cluster_stats_messages_pong_received:3501
cluster_stats_messages_meet_received:3
cluster_stats_messages_received:7202
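The CLUSTER INFO reply is plain key:value text, so it is easy to turn into a dictionary for monitoring scripts. The sketch below does that and applies a simple health rule; the helper names and the rule itself (state ok, all 16384 slots ok, none failed) are assumptions chosen for illustration, not an official definition of cluster health.

```python
def parse_cluster_info(text: str) -> dict:
    """Turn 'key:value' lines from CLUSTER INFO into a dict, converting digits to int."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = int(value) if value.isdigit() else value
    return info


def is_healthy(info: dict) -> bool:
    """Assumed rule: state is ok, every slot is served, and no slot is marked failed."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_ok") == 16384
        and info.get("cluster_slots_fail") == 0
    )


# A few lines copied from the output above
sample = """cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3"""

info = parse_cluster_info(sample)
print(info["cluster_known_nodes"], info["cluster_size"], is_healthy(info))
```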

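CLUSTER NODES lists every node currently known to the cluster together with its details. This is the same information that was shown when the cluster was created, and it is where you look up each node's node ID.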

192.168.137.102:6379> CLUSTER NODES

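CLUSTER ADDSLOTS <slot> [slot ...] assigns one or more slots to the current node. It only works in cluster mode and is meant for two situations: (1) when creating a new cluster, to give the master nodes their initial share of hash slots; (2) to repair a broken cluster that has unassigned slots. Use it with caution.

CLUSTER DELSLOTS <slot> [slot ...] removes the assignment of one or more slots from the current node. Use it with caution: once a slot is unassigned the cluster state becomes fail, and the keys in that slot cannot be queried until the slot is assigned again.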

192.168.137.101:6379> CLUSTER DELSLOTS 741
OK
192.168.137.101:6379> cluster info
cluster_state:fail
192.168.137.101:6379> get age
(error) CLUSTERDOWN Hash slot not served
192.168.137.101:6379> CLUSTER ADDSLOTS 741
OK
192.168.137.101:6379> get age
"10"

For the other cluster commands, see the Redis website or the page below; they are not repeated here.

http://www.redis.cn/commands/cluster-addslots.html

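To test the cluster's high availability we can run a few experiments, such as stopping some of the services. First stop the 192.168.137.101:6379 server and look at the nodes: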

192.168.137.102:7379> cluster nodes
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555491433000 5 connected 10923-16383
2de36d7b52cf5a3c373a83fcaf8587a31da1e4e9 192.168.137.101:7379@17379 slave b3e0c8dbfe867a66925df6ca59f8956b262428ec 0 1555491435027 5 connected
33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 master,fail - 1555491371289 1555491368000 1 disconnected
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 myself,master - 0 1555491432000 7 connected 0-5460

Then check the cluster state:

192.168.137.102:7379> cluster info
cluster_state:ok

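The cluster state is still ok even though one node is marked as fail: its slave 192.168.137.102:7379 has been promoted to master and now serves slots 0-5460, so clients can keep working. Restart the stopped service and check again: the former master 192.168.137.101:6379 has rejoined as a slave.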

192.168.137.102:7379> cluster nodes
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555492066000 5 connected 10923-16383
2de36d7b52cf5a3c373a83fcaf8587a31da1e4e9 192.168.137.101:7379@17379 slave b3e0c8dbfe867a66925df6ca59f8956b262428ec 0 1555492067374 5 connected
33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 slave 09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 0 1555492068000 7 connected
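Each CLUSTER NODES line has a fixed layout: node ID, address, comma-separated flags, master ID (or - for a master), ping-sent, pong-received, config epoch, link state, and then any slot ranges. The sketch below splits the lines into records so that failed nodes can be picked out by script; the Node class and the function name are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: str
    address: str          # host:port without the @cluster-bus-port suffix
    flags: List[str]
    master_id: str        # "-" when the node is itself a master
    link_state: str
    slots: List[str] = field(default_factory=list)


def parse_cluster_nodes(text: str) -> List[Node]:
    """Parse the text reply of CLUSTER NODES into Node records."""
    nodes = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 8:
            continue  # skip blank or truncated lines
        nodes.append(Node(f[0], f[1].split("@")[0], f[2].split(","), f[3], f[7], f[8:]))
    return nodes


# Two lines copied from the output taken while 192.168.137.101:6379 was stopped
sample = """33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 master,fail - 1555491371289 1555491368000 1 disconnected
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 myself,master - 0 1555491432000 7 connected 0-5460"""

for node in parse_cluster_nodes(sample):
    status = "FAILED" if "fail" in node.flags else "ok"
    print(node.address, status, node.link_state, node.slots)
```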

5. Adding nodes

(1) Start a new Redis instance on each of the two servers 192.168.137.101 and 192.168.137.102

Create the configuration file and the init script:

[root@localhost ~]# cd /etc/
[root@localhost etc]# cp redis7379.conf redis8379.conf && sed -i "s/7379/8379/g" redis8379.conf
[root@localhost etc]#  cd init.d/
[root@localhost init.d]# cp redis7379 redis8379 && sed -i "s/7379/8379/g" redis8379
[root@localhost init.d]# chmod 755 redis8379
[root@localhost init.d]# chkconfig --add redis8379
[root@localhost init.d]# firewall-cmd --zone=public --add-port=8379/tcp --permanent
[root@localhost init.d]# firewall-cmd --zone=public --add-port=18379/tcp --permanent
[root@localhost init.d]# firewall-cmd --reload

Start the Redis instance:

[root@localhost etc]# /etc/init.d/redis8379 start
[root@localhost init.d]# ps -ef | grep 'redis'
root      8805     1  0 17:07 ?        00:00:29 /usr/local/redis/bin/redis-server 192.168.137.101:6379 [cluster]
root      8841     1  0 17:13 ?        00:00:28 /usr/local/redis/bin/redis-server 192.168.137.101:7379 [cluster]
root      9929     1  0 20:14 ?        00:00:00 /usr/local/redis/bin/redis-server 192.168.137.101:8379 [cluster]

(2) Add the master node

[root@localhost ~]# cd redis-4.0.13/src/
[root@localhost src]# ruby redis-trib.rb add-node 192.168.137.101:8379 192.168.137.101:6379 
....
>>> Send CLUSTER MEET to node 192.168.137.101:8379 to make it join the cluster.
[OK] New node added correctly.

Notes:

192.168.137.101:8379 is the new Redis instance.

192.168.137.101:6379 is an existing node; any node already in the cluster will do.

Check that the node has joined:

192.168.137.101:6379> cluster nodes
a4959f3478ff7101c635f59d35010e94ebba259e 192.168.137.103:7379@17379 slave 3536daf72fa8d037140c35e90f7ac0375421c476 0 1555503520000 6 connected
2de36d7b52cf5a3c373a83fcaf8587a31da1e4e9 192.168.137.101:7379@17379 slave b3e0c8dbfe867a66925df6ca59f8956b262428ec 0 1555503522000 5 connected
3536daf72fa8d037140c35e90f7ac0375421c476 192.168.137.102:6379@16379 master - 0 1555503520000 3 connected 5461-10922
8b5b4114f06561feb4c14dc75af4b83c29d2adc3 192.168.137.101:8379@18379 master - 0 1555503522000 0 connected
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555503522879 5 connected 10923-16383
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 master - 0 1555503523888 7 connected 0-5460
33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 myself,slave 09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 0 1555503521000 1 connected

Look at the 8379 line. It gives us the new node's node ID; the node has no slave yet, and no slots are listed at the end of the line.

(3) Add a slave to the new master

[root@localhost src]# ruby redis-trib.rb add-node --slave --master-id 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 192.168.137.102:8379 192.168.137.101:6379
....
>>> Send CLUSTER MEET to node 192.168.137.102:8379 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 192.168.137.101:8379.
[OK] New node added correctly.

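Notes:

--slave means the node is added as a slave.

--master-id 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 is the master's node ID, taken from the cluster nodes output above.

192.168.137.102:8379 is the new slave.

192.168.137.101:6379 is an existing node; any node already in the cluster will do.

Check the cluster again: 8379 now has a slave, but it still has no slots.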

192.168.137.101:6379> cluster nodes
0784a8aa8f86eb060788cfeeed24fc24ba7a9a55 192.168.137.102:8379@18379 slave 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 0 1555503960000 0 connected
a4959f3478ff7101c635f59d35010e94ebba259e 192.168.137.103:7379@17379 slave 3536daf72fa8d037140c35e90f7ac0375421c476 0 1555503959747 6 connected
2de36d7b52cf5a3c373a83fcaf8587a31da1e4e9 192.168.137.101:7379@17379 slave b3e0c8dbfe867a66925df6ca59f8956b262428ec 0 1555503958000 5 connected
3536daf72fa8d037140c35e90f7ac0375421c476 192.168.137.102:6379@16379 master - 0 1555503961561 3 connected 5461-10922
8b5b4114f06561feb4c14dc75af4b83c29d2adc3 192.168.137.101:8379@18379 master - 0 1555503960754 0 connected
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555503961763 5 connected 10923-16383
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 master - 0 1555503959000 7 connected 0-5460
33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 myself,slave 09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 0 1555503960000 1 connected

(4) Assign slots to the new node

[root@localhost src]# ruby redis-trib.rb reshard 192.168.137.101:8379 # any node in the cluster
....
How many slots do you want to move (from 1 to 16384)? 3000  # number of slots to give the new node; 3000 here
What is the receiving node ID? 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 # node ID of the receiving master, the one used above
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:all  # take slots from every master, or only from the nodes you list; 'all' keeps the distribution even
...
Do you want to proceed with the proposed reshard plan (yes/no)? yes # confirm
...
[ERR] Calling MIGRATE: ERR Syntax error, try CLIENT

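The reshard fails with an error. It comes from the redis-trib.rb script shipped with Redis 4.0.x, most likely because the script calls source.r.client.call, which newer versions of the Ruby redis gem no longer treat as a raw command call. Small glitches like this are to be expected in open-source tooling.

The fix is to change these original lines in redis-trib.rb: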
                source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
                source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
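to the following (source.r.client.call becomes source.r.call, and the first line also gains "replace"):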
                source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
                source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])

Then run fix to clean up the interrupted migration:

[root@localhost src]# ruby redis-trib.rb fix 192.168.137.101:8379

# run the reshard again
[root@localhost src]# ruby redis-trib.rb reshard 192.168.137.101:8379

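Check again: the slots are now assigned. Note that 8379 holds three separate ranges, because slots were taken from each of the three existing masters.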

192.168.137.101:6379> cluster nodes
0784a8aa8f86eb060788cfeeed24fc24ba7a9a55 192.168.137.102:8379@18379 slave 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 0 1555509021000 9 connected
a4959f3478ff7101c635f59d35010e94ebba259e 192.168.137.103:7379@17379 slave 3536daf72fa8d037140c35e90f7ac0375421c476 0 1555509022000 6 connected
2de36d7b52cf5a3c373a83fcaf8587a31da1e4e9 192.168.137.101:7379@17379 slave b3e0c8dbfe867a66925df6ca59f8956b262428ec 0 1555509022313 5 connected
3536daf72fa8d037140c35e90f7ac0375421c476 192.168.137.102:6379@16379 master - 0 1555509021000 3 connected 6756-10922
8b5b4114f06561feb4c14dc75af4b83c29d2adc3 192.168.137.101:8379@18379 master - 0 1555509023320 9 connected 0-1021 5461-6755 10923-11943
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555509020300 5 connected 11944-16383
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 master - 0 1555509022000 7 connected 1022-5460
33f7db815b9d7219e4a46c078ed2b7f0441c78ff 192.168.137.101:6379@16379 myself,slave 09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 0 1555509019000 1 connected
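To see how the slots ended up distributed, the ranges at the end of each master line can be added up. This is a minimal sketch that works on the text of cluster nodes; the function names are my own, and entries in square brackets (slots that are mid-migration) are skipped as an assumption about what should count.

```python
from typing import Dict, List


def count_slots(ranges: List[str]) -> int:
    """Count slots in entries such as '0-1021' or a single slot like '741'."""
    total = 0
    for item in ranges:
        if item.startswith("["):
            continue  # migrating/importing marker, not an owned range
        start, _, end = item.partition("-")
        total += int(end or start) - int(start) + 1
    return total


def slots_per_master(cluster_nodes_output: str) -> Dict[str, int]:
    """Map each master's host:port to the number of slots it serves."""
    result = {}
    for line in cluster_nodes_output.splitlines():
        f = line.split()
        if len(f) >= 8 and "master" in f[2].split(","):
            result[f[1].split("@")[0]] = count_slots(f[8:])
    return result


# Master lines copied from the output above
sample = """3536daf72fa8d037140c35e90f7ac0375421c476 192.168.137.102:6379@16379 master - 0 1555509021000 3 connected 6756-10922
8b5b4114f06561feb4c14dc75af4b83c29d2adc3 192.168.137.101:8379@18379 master - 0 1555509023320 9 connected 0-1021 5461-6755 10923-11943
b3e0c8dbfe867a66925df6ca59f8956b262428ec 192.168.137.103:6379@16379 master - 0 1555509020300 5 connected 11944-16383
09d1f418c8cebad84ae8034a56f8e4f2d5f9e400 192.168.137.102:7379@17379 master - 0 1555509022000 7 connected 1022-5460"""

counts = slots_per_master(sample)
for address, n in counts.items():
    print(address, n)
print("total", sum(counts.values()))
```

For the output above the four masters add up to 16384 slots, and 8379 holds 3338 of them rather than exactly 3000. One possible explanation is that the first, interrupted reshard had already moved some slots before the second run; check the actual count before draining a node.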

Test that the data is still available:

192.168.137.101:6379> get name
-> Redirected to slot [5798] located at 192.168.137.101:8379
"liubao"

Everything works.

6. Removing nodes

A slave can be removed directly. Use cluster nodes to find the node ID of the slave you want to remove, then run:

[root@localhost src]# ruby redis-trib.rb del-node 192.168.137.102:8379 '0784a8aa8f86eb060788cfeeed24fc24ba7a9a55'

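To remove a master, first check whether it has slaves. If it does, remove them or move them to another master first; then reshard the master's slots to other nodes; only then delete the master.

Below we remove the 8379 master added earlier and point its slave at another master.

(1) Use cluster nodes to find the slave of 8379. Look for the line: 192.168.137.102:8379@18379 slave

(2) Log in to that Redis instance and change its master:

192.168.137.102:8379> cluster replicate b3e0c8dbfe867a66925df6ca59f8956b262428ec # node ID of another master instance

Run cluster nodes again and you will see that the master of 192.168.137.102:8379 has changed.

(3) Use ruby redis-trib.rb reshard to move every slot off the master 192.168.137.101:8379. Here we hand back the 3000 slots given to it earlier by running the command three times, 1000 slots each time, so they are spread evenly over the other masters: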

  

How many slots do you want to move (from 1 to 16384)? 1000   # number of slots to move
What is the receiving node ID? 3536daf72fa8d037140c35e90f7ac0375421c476  # node ID that receives the slots
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:8b5b4114f06561feb4c14dc75af4b83c29d2adc3   # source node ID: the node we are about to remove
Source node #2:done

Repeat for the next master:

How many slots do you want to move (from 1 to 16384)? 1000
What is the receiving node ID? b3e0c8dbfe867a66925df6ca59f8956b262428ec
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:8b5b4114f06561feb4c14dc75af4b83c29d2adc3
Source node #2:done
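The amounts to enter at each prompt can be worked out ahead of time. The sketch below splits a slot count as evenly as possible across the remaining masters and prints one reshard command per target. The --from, --to, --slots and --yes options are assumed to be supported by the reshard command of your redis-trib.rb (check ruby redis-trib.rb help before relying on them), and the function names are hypothetical.

```python
from typing import List, Tuple


def drain_plan(total_slots: int, target_ids: List[str]) -> List[Tuple[str, int]]:
    """Split total_slots across targets; the first few targets absorb any remainder."""
    base, extra = divmod(total_slots, len(target_ids))
    return [(t, base + (1 if i < extra else 0)) for i, t in enumerate(target_ids)]


def reshard_command(entry_node: str, source_id: str, target_id: str, slots: int) -> str:
    """Build a non-interactive reshard command line (option names are an assumption)."""
    return (
        f"ruby redis-trib.rb reshard --from {source_id} --to {target_id} "
        f"--slots {slots} --yes {entry_node}"
    )


# Node IDs taken from the cluster nodes output earlier in this walkthrough
source = "8b5b4114f06561feb4c14dc75af4b83c29d2adc3"
targets = [
    "3536daf72fa8d037140c35e90f7ac0375421c476",
    "b3e0c8dbfe867a66925df6ca59f8956b262428ec",
    "09d1f418c8cebad84ae8034a56f8e4f2d5f9e400",
]

for target, slots in drain_plan(3000, targets):
    print(reshard_command("192.168.137.101:6379", source, target, slots))
```

Pass the real number of slots the node holds, as reported by cluster nodes, instead of 3000 if the two differ.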

Run cluster nodes to check whether the master still holds any slots. Once it has none, it can be removed:

[root@localhost src]# ruby redis-trib.rb del-node  192.168.137.101:8379 8b5b4114f06561feb4c14dc75af4b83c29d2adc3
>>> Removing node 8b5b4114f06561feb4c14dc75af4b83c29d2adc3 from cluster 192.168.137.101:8379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

Run cluster nodes again and the node is gone. Check that the original data is still there:

192.168.137.101:6379> get name
-> Redirected to slot [5798] located at 192.168.137.102:6379
"liubao"

Everything is fine. You can also run fix or check to verify that no node has problems:

[root@localhost src]# ruby redis-trib.rb fix  192.168.137.101:6379
>>> Performing Cluster Check (using node 192.168.137.101:6379)