Environment
Server IP | Port | OS version | Services | Role |
127.0.0.1 | 6379 | Ubuntu 16.04.3 LTS | redis 5.0.5 + sentinel | Master |
127.0.0.1 | 6380 | Ubuntu 16.04.3 LTS | redis 5.0.5 + sentinel | Replica 1 |
127.0.0.1 | 6381 | Ubuntu 16.04.3 LTS | redis 5.0.5 + sentinel | Replica 2 |
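Sentinel
In a plain Redis master-replica setup, a failed master is not failed over automatically: someone has to step in, promote a replica to master by hand, and tell the applications the new master address (unless a VIP is in place). Applications usually cannot accept this.
Redis Sentinel is Redis's high-availability solution to this problem, and in production it does a great deal to improve the availability of the whole system.
Use Redis 2.8 or later, i.e. Sentinel V2.
Redis Sentinel is a distributed architecture made up of several Sentinel nodes and the Redis data nodes. Each Sentinel monitors the data nodes and the other Sentinels, and marks a node as down when it finds it unreachable. If the marked node is the master, the Sentinel also "negotiates" with the other Sentinels: once enough of them (the configured quorum) agree that the master is unreachable, they elect one Sentinel to carry out the automatic failover. The change is then pushed to the applications, with no manual intervention required.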
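- Monitoring
Logically, the set of Sentinel nodes periodically checks every node, the master in particular, so that a master failure can be handled automatically.
- Notification
Sentinel notifies the applications of the failover result.
- Automatic master failover
Sentinel promotes a replica to master and keeps the master-replica relationships correct afterwards.
- Configuration provider
In a Redis Sentinel deployment, clients connect to the Sentinel set at startup and obtain the master's address from it.
Redis Sentinel consists of several Sentinel nodes, and a node failure is judged jointly by multiple Sentinels, which effectively prevents false positives. Having several Sentinels also makes Sentinel itself highly available: if one Sentinel goes down, the Sentinel cluster as a whole keeps working.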
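Since clients are meant to ask the Sentinel set for the master rather than hard-code its address, here is a minimal bash sketch of that lookup. It uses the real `SENTINEL get-master-addr-by-name` command and assumes the three Sentinels and the master name `mymaster` used in this article; the function name `get_master_addr` and the `REDIS_CLI` override (handy for testing without a server) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ask the Sentinels, in order, for the current master address.
# Assumptions: Sentinels at 127.0.0.1:26379/26380/26381, master name "mymaster".
# get_master_addr and REDIS_CLI are hypothetical names for illustration.
REDIS_CLI=${REDIS_CLI:-redis-cli}

get_master_addr() {
  local name=$1 sentinel reply
  shift
  for sentinel in "$@"; do
    # Skip Sentinels that cannot be reached (redis-cli exits non-zero).
    reply=$("$REDIS_CLI" -h "${sentinel%:*}" -p "${sentinel##*:}" \
      sentinel get-master-addr-by-name "$name" 2>/dev/null) || continue
    # When not writing to a terminal, redis-cli prints the reply as two lines: IP, then port.
    if [ -n "$reply" ]; then
      printf '%s\n' "$reply" | paste -sd: -   # join into host:port
      return 0
    fi
  done
  return 1
}

# Usage:
# get_master_addr mymaster 127.0.0.1:26379 127.0.0.1:26380 127.0.0.1:26381
```

Trying the Sentinels one after another mirrors the point above: losing a single Sentinel does not stop clients from finding the master.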
Deploying the Redis data nodes
- Installing the build environment
root@ubuntu:/opt# apt-get update
root@ubuntu:/opt# apt-get -y install tcl
root@ubuntu:/opt# apt-get -y install make
root@ubuntu:/opt# tar -zxvf redis-5.0.5.tar.gz
root@ubuntu:/opt# cd redis-5.0.5
root@ubuntu:/opt/redis-5.0.5# make
...
LINK redis-benchmark
INSTALL redis-check-rdb
INSTALL redis-check-aof
Hint: It's a good idea to run 'make test' ;)
make[1]: Leaving directory '/opt/redis-5.0.5/src'
### Run the test suite ###
#=============================#
root@ubuntu:/opt/redis-5.0.5# make test
...
175 seconds - integration/replication
137 seconds - unit/obuf-limits
131 seconds - unit/memefficiency
\o/ All tests passed without errors!
Cleanup: may take some time... OK
make[1]: Leaving directory '/opt/redis-5.0.5/src'
### Install ###
#=============================#
root@ubuntu:/opt/redis-5.0.5# make install
...
cd src && make install
make[1]: Entering directory '/opt/redis-5.0.5/src'
Hint: It's a good idea to run 'make test' ;)
INSTALL install
INSTALL install
INSTALL install
INSTALL install
INSTALL install
make[1]: Leaving directory '/opt/redis-5.0.5/src'
- Create the redis user
root@ubuntu:/opt# groupadd redis && useradd -r -m -g redis -s /bin/bash redis
root@ubuntu:/opt# cat /etc/passwd | grep redis
redis:x:996:30002::/home/redis:/bin/bash
- Configuration (one file per instance; the 6379 file is shown below)
vim /etc/redis.conf
vim /etc/redis6380.conf
vim /etc/redis6381.conf
bind 127.0.0.1
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /var/run/redis.pid
loglevel notice
logfile "/data/redis/log/redis.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data/backup/
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
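Note: each instance needs its own port, so replace 6379 with the instance's port. On the replicas, also add:
replicaof 127.0.0.1 6379
and give each instance its own pid file, log file, and data directory.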
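Rather than editing the two replica files by hand, they can be derived from /etc/redis.conf with sed. A minimal sketch, assuming the per-port directory layout created below (/data/redis6380/log, /data/backup/redisbackup6380/ and so on) and that each replica's `dir` should point at its own backup directory; `make_replica_conf` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive a replica config from the 6379 template.
# Assumed layout: /data/redis<port>/log and /data/backup/redisbackup<port>/.
# make_replica_conf is a hypothetical helper name.
make_replica_conf() {
  local template=$1 port=$2 out=$3
  local master_host=${4:-127.0.0.1} master_port=${5:-6379}
  sed -e "s/^port 6379\$/port ${port}/" \
      -e "s#^pidfile /var/run/redis\.pid\$#pidfile /var/run/redis${port}.pid#" \
      -e "s#\"/data/redis/log/#\"/data/redis${port}/log/#" \
      -e "s#^dir /data/backup/\$#dir /data/backup/redisbackup${port}/#" \
      "$template" > "$out" || return 1
  # Replicas additionally point at the master.
  printf 'replicaof %s %s\n' "$master_host" "$master_port" >> "$out"
}

# Usage (as root, before starting the replicas):
# make_replica_conf /etc/redis.conf 6380 /etc/redis6380.conf
# make_replica_conf /etc/redis.conf 6381 /etc/redis6381.conf
```

Generating the files from one template keeps every other parameter identical across instances, so only the intended differences (port, paths, replicaof) can drift.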
- Installing the instances
root@ubuntu:/opt# mkdir -p /usr/local/redis/bin
root@ubuntu:/opt# mkdir -p /data/backup/redisbackup
root@ubuntu:/opt# cp ./redis-5.0.5/src/mkreleasehdr.sh /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-benchmark /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-check-aof /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-check-rdb /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-cli /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-sentinel /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-server /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-trib.rb /usr/local/redis/bin/
### Set ownership and permissions ###
root@ubuntu:/opt# chown -R redis:redis /usr/local/redis
root@ubuntu:/opt# chmod -R 750 /usr/local/redis
### Add the binaries to PATH ###
root@ubuntu:/opt# echo "export PATH=\$PATH:/usr/local/redis/bin" >> /etc/profile
root@ubuntu:/opt# source /etc/profile
### Create the data directories ###
root@ubuntu:/opt# mkdir -p /data/{redis,redis6380,redis6381}/{data,log}
root@ubuntu:/opt# chown -R redis:redis /data/redis*
root@ubuntu:/opt# chmod -R 750 /data/redis*
### Create the backup directories ###
root@ubuntu:/opt# mkdir -p /data/backup/{redisbackup,redisbackup6380,redisbackup6381}
root@ubuntu:/opt# chown -R redis:redis /data/backup/redis*
root@ubuntu:/opt# chmod -R 750 /data/backup/redis*
### Start ###
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis6380.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis6381.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# ps -ef | grep redis
### Check replication (master) ###
root@ubuntu:/opt# redis-cli -h 127.0.0.1 -p 6379 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1071,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=1071,lag=1
master_replid:99c3ac87530433cdb96e79f023f8e906795fdb47
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1071
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1071
### Check replication (replica) ###
root@ubuntu:/opt# redis-cli -h 127.0.0.1 -p 6380 info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_repl_offset:1197
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:99c3ac87530433cdb96e79f023f8e906795fdb47
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1197
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1197
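The replication checks above (and the ones after the failover later on) are done by eye. A small bash sketch makes the replica check scriptable by reading `info replication` output on stdin. Note that Redis 5 still reports the role as `slave`, and that INFO replies use CRLF line endings, which is why `\r` is stripped; `info_get` and `check_replica_of` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: verify `redis-cli ... info replication` output (read on stdin) shows a
# healthy replica of the expected master. Function names are hypothetical.

# Print the value of one "key:value" field; INFO uses CRLF, so drop \r first.
info_get() {
  tr -d '\r' | awk -F: -v k="$1" '$1 == k { print substr($0, length(k) + 2); exit }'
}

check_replica_of() {
  local want_host=$1 want_port=$2 info role host port link
  info=$(cat)
  role=$(printf '%s\n' "$info" | info_get role)
  host=$(printf '%s\n' "$info" | info_get master_host)
  port=$(printf '%s\n' "$info" | info_get master_port)
  link=$(printf '%s\n' "$info" | info_get master_link_status)
  if [ "$role" = slave ] && [ "$host" = "$want_host" ] &&
     [ "$port" = "$want_port" ] && [ "$link" = up ]; then
    echo "ok: replica of $host:$port, link $link"
    return 0
  fi
  echo "unexpected: role=$role master=$host:$port link=$link" >&2
  return 1
}

# Usage:
# redis-cli -h 127.0.0.1 -p 6380 info replication | check_replica_of 127.0.0.1 6379
```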
- Deploying the Sentinel nodes
The three Sentinel nodes are deployed in exactly the same way; only the ports differ.
vim /etc/redis-sentinel.conf
vim /etc/redis-sentinel26380.conf
vim /etc/redis-sentinel26381.conf
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/data/redis-sentinel/log/redis-sentinel.log"
dir /data/redis-sentinel
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
For the other two nodes, change only the port, pid file, working directory, and log file.
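The two extra Sentinel files can be generated the same way. One caveat shapes this sketch: on first start Sentinel writes a `sentinel myid` line into its file (visible in the rewritten configuration later in this article), and copying an already-started Sentinel's file would give two nodes the same ID. The sketch therefore refuses a template that already contains one. It assumes the paths used above; `make_sentinel_conf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: derive redis-sentinel<port>.conf from the pristine 26379 template.
# make_sentinel_conf is a hypothetical helper name.
make_sentinel_conf() {
  local template=$1 port=$2 out=$3
  # A started Sentinel has added its own ID; never clone that.
  if grep -q '^sentinel myid' "$template"; then
    echo "template already contains 'sentinel myid'; use a pristine copy" >&2
    return 1
  fi
  sed -e "s/^port 26379\$/port ${port}/" \
      -e "s#/var/run/redis-sentinel\.pid#/var/run/redis-sentinel${port}.pid#" \
      -e "s#/data/redis-sentinel/#/data/redis-sentinel${port}/#" \
      -e "s#^dir /data/redis-sentinel\$#dir /data/redis-sentinel${port}#" \
      "$template" > "$out"
}

# Usage (before any Sentinel is started):
# make_sentinel_conf /etc/redis-sentinel.conf 26380 /etc/redis-sentinel26380.conf
# make_sentinel_conf /etc/redis-sentinel.conf 26381 /etc/redis-sentinel26381.conf
```

The `sentinel monitor` line is left untouched, since all three Sentinels watch the same master.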
### Create directories and set ownership ###
root@ubuntu:/opt# mkdir -p /data/{redis-sentinel,redis-sentinel26380,redis-sentinel26381}/log
root@ubuntu:/opt# chown -R redis:redis /data/redis-sentinel*
root@ubuntu:/opt# chmod 640 /etc/redis-sentinel*.conf
root@ubuntu:/opt# chown redis:redis /etc/redis-sentinel*.conf
### Start the Sentinel nodes ###
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel26380.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel26381.conf >/dev/null 2>&1 &"
### Log output ###
root@ubuntu:/opt# tail -f /data/redis-sentinel/log/redis-sentinel.log
63921:X 15 Oct 2020 16:53:25.155 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
63921:X 15 Oct 2020 16:53:25.155 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=63921, just started
63921:X 15 Oct 2020 16:53:25.155 # Configuration loaded
63921:X 15 Oct 2020 16:53:25.157 * Running mode=sentinel, port=26379.
63921:X 15 Oct 2020 16:53:25.159 # Sentinel ID is 379dfea2acf3274c26a741255c7bc8cb16855f72
63921:X 15 Oct 2020 16:53:25.159 # +monitor master mymaster 127.0.0.1 6379 quorum 2
63921:X 15 Oct 2020 16:53:25.160 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:25.161 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:47.573 * +sentinel sentinel 002f6011948fe525666efa45ce4d9dafea5259b9 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:52.349 * +sentinel sentinel 864c102b9c432a4b3f0ef0b65d25ef623881d326 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
- Failover
Shut down the master instance on 127.0.0.1:6379 and watch the logs.
### Shut down the instance ###
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 6379 shutdown"
### Master log output ###
root@ubuntu:/opt# tail -f /data/redis/log/redis.log
63193:M 15 Oct 2020 16:03:09.074 * Background saving started by pid 63689
63689:C 15 Oct 2020 16:03:09.081 * DB saved on disk
63689:C 15 Oct 2020 16:03:09.081 * RDB: 4 MB of memory used by copy-on-write
63193:M 15 Oct 2020 16:03:09.174 * Background saving terminated with success
63193:M 15 Oct 2020 17:26:17.401 # User requested shutdown...
63193:M 15 Oct 2020 17:26:17.401 * Calling fsync() on the AOF file.
63193:M 15 Oct 2020 17:26:17.401 * Saving the final RDB snapshot before exiting.
63193:M 15 Oct 2020 17:26:17.408 * DB saved on disk
63193:M 15 Oct 2020 17:26:17.408 * Removing the pid file.
63193:M 15 Oct 2020 17:26:17.408 # Redis is now ready to exit, bye bye...
### Sentinel log output ###
root@ubuntu:/opt# tail -f /data/redis-sentinel/log/redis-sentinel.log
63921:X 15 Oct 2020 17:26:48.450 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:48.450 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:48.652 # -odown master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.214 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.214 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.275 # +failover-end master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.275 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
63921:X 15 Oct 2020 17:26:49.275 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
63921:X 15 Oct 2020 17:26:49.276 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
63921:X 15 Oct 2020 17:27:19.345 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
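The `+switch-master` line is the one that matters to applications: it names the old and the new master. Below is a small bash sketch that pulls the newest switch out of a Sentinel log. Applications can also receive the same event live by subscribing to the `+switch-master` channel on any Sentinel, since Sentinel publishes its events over Pub/Sub. `last_switch_master` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print host:port of the newest master announced by +switch-master
# in a Sentinel log read on stdin. Optional $1 restricts to one master name.
# last_switch_master is a hypothetical name.
last_switch_master() {
  awk -v name="${1:-}" '
    {
      # Event format: +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
      for (i = 1; i <= NF; i++) {
        if ($i == "+switch-master" && (name == "" || $(i + 1) == name)) {
          found = $(i + 4) ":" $(i + 5)
        }
      }
    }
    END {
      if (found == "") exit 1
      print found
    }'
}

# Usage:
# last_switch_master mymaster < /data/redis-sentinel/log/redis-sentinel.log
```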
- Check the replication topology
### New master ###
root@ubuntu:/opt# /usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 6380 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=436333,lag=1
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:99c3ac87530433cdb96e79f023f8e906795fdb47
master_repl_offset:436466
second_repl_offset:391168
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:436466
### Replica ###
root@ubuntu:/opt# /usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 6381 info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:437663
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:99c3ac87530433cdb96e79f023f8e906795fdb47
master_repl_offset:437663
second_repl_offset:391168
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:99
repl_backlog_histlen:437565
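- Restart the failed old master
After it is restarted, the old master rejoins automatically as a replica of the new master.
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 6379 info replication"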
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:687696
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:687696
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:678400
repl_backlog_histlen:9297
root@ubuntu:/opt# /usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 26379 info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:6380,slaves=2,sentinels=3
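The `master0` line summarises the whole deployment: status, current master address, number of replicas and number of Sentinels. A bash sketch that checks it against expected counts and prints the master address; `sentinel_field` and `check_sentinel_master` are hypothetical names, and the expected counts (2 replicas, 3 Sentinels) come from this article's setup.

```shell
#!/usr/bin/env bash
# Sketch: check the masterN line of `info sentinel` read on stdin.
# check_sentinel_master and sentinel_field are hypothetical names.

# Print one key=value field from a "masterN:name=...,status=...,..." line.
sentinel_field() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F= -v k="$2" '$1 == k { print $2; exit }'
}

check_sentinel_master() {
  local name=$1 want_slaves=$2 want_sentinels=$3 line status slaves sentinels
  # INFO output uses CRLF line endings; strip \r before matching.
  line=$(tr -d '\r' | grep "^master[0-9]*:name=${name}," || true)
  if [ -z "$line" ]; then
    echo "master $name not found" >&2
    return 1
  fi
  status=$(sentinel_field "$line" status)
  slaves=$(sentinel_field "$line" slaves)
  sentinels=$(sentinel_field "$line" sentinels)
  if [ "$status" = ok ] && [ "$slaves" = "$want_slaves" ] && [ "$sentinels" = "$want_sentinels" ]; then
    sentinel_field "$line" address   # print the current master host:port
    return 0
  fi
  echo "unexpected: status=$status slaves=$slaves sentinels=$sentinels" >&2
  return 1
}

# Usage:
# redis-cli -h 127.0.0.1 -p 26379 info sentinel | check_sentinel_master mymaster 2 3
```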
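- Sentinel configuration
The Redis source directory contains a sample sentinel.conf, the default Sentinel configuration file. In it, port and dir are the Sentinel node's port and working directory. Because a Sentinel periodically monitors the master, the master must appear in the configuration (sentinel monitor <name> <ip> <port> <quorum>), where quorum is the number of votes needed to judge the master objectively unreachable. Sentinels actually monitor every node, yet the configuration says nothing about the replicas or the other Sentinels.
That is because they are discovered automatically. After startup, Sentinel rewrites its configuration file (note the "# Generated by CONFIG REWRITE" marker): it records the replicas and other Sentinels it has discovered, and drops options that are set to their default values, such as sentinel parallel-syncs mymaster 1 and sentinel failover-timeout mymaster 180000 (sentinel down-after-milliseconds mymaster 30000 is also a default and disappears too). After the failover above, the file looks like this: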
port 26379
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile "/data/redis-sentinel/log/redis-sentinel.log"
dir "/data/redis-sentinel"
sentinel myid 379dfea2acf3274c26a741255c7bc8cb16855f72
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 127.0.0.1 6380 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
# Generated by CONFIG REWRITE
protected-mode no
### The two discovered replicas ###
sentinel known-replica mymaster 127.0.0.1 6381
sentinel known-replica mymaster 127.0.0.1 6379
### The two discovered Sentinels ###
sentinel known-sentinel mymaster 127.0.0.1 26381 864c102b9c432a4b3f0ef0b65d25ef623881d326
sentinel known-sentinel mymaster 127.0.0.1 26380 002f6011948fe525666efa45ce4d9dafea5259b9
sentinel current-epoch 1
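Because the rewritten file omits options left at their defaults, the file alone no longer shows the effective parallel-syncs, failover-timeout or down-after-milliseconds. The Sentinel itself can be asked with `SENTINEL master mymaster`, whose piped redis-cli output lists field names and values on alternating lines. A bash sketch that extracts one field; `sentinel_master_field` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print one field of `redis-cli -p 26379 sentinel master mymaster` (stdin).
# Piped redis-cli output puts field names and values on alternating lines.
# sentinel_master_field is a hypothetical name.
sentinel_master_field() {
  tr -d '\r' | awk -v k="$1" '
    NR % 2 == 1 { key = $0; next }          # odd lines: field name
    key == k    { print; found = 1; exit }  # even lines: value of that field
    END         { exit (found ? 0 : 1) }'
}

# Usage: confirm the effective values of options dropped by the rewrite.
# redis-cli -h 127.0.0.1 -p 26379 sentinel master mymaster | sentinel_master_field parallel-syncs
# redis-cli -h 127.0.0.1 -p 26379 sentinel master mymaster | sentinel_master_field failover-timeout
```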
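The quorum parameter is used for failure detection and judgement. Setting quorum to 2, for example, means at least 2 Sentinels must consider the master unreachable before that judgement is treated as objective (objectively down, ODOWN). The smaller quorum is, the easier it is to mark the master down; the larger, the stricter. A common recommendation is half the number of Sentinels plus one, so with three Sentinels a quorum of 2 is reasonable. Note that quorum only governs the ODOWN judgement: to actually perform the failover, the elected leader still needs the votes of a majority of the Sentinels.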
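The "half plus one" rule and the separate majority requirement are easy to mix up, so here is the arithmetic as a bash sketch. Our reading of the Sentinel source (sentinel.c) is that the winner of the leader election needs at least max(majority, quorum) votes, with the majority counted over all known Sentinels; treat this as an illustration of the arithmetic, not a model of the full protocol. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the vote arithmetic; majority and failover_votes_needed are hypothetical names.

# Majority of n Sentinels: floor(n/2) + 1.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Votes a Sentinel must collect to become failover leader:
# at least a majority of all Sentinels and at least the configured quorum.
failover_votes_needed() {
  local n=$1 quorum=$2 m
  m=$(majority "$n")
  echo $(( quorum > m ? quorum : m ))
}

# Usage: with 3 Sentinels and quorum 2, the two survivors of a single
# Sentinel failure can still declare ODOWN and elect a leader.
# majority 3                 # 2
# failover_votes_needed 3 2  # 2
```

This is also why an odd number of Sentinels is the usual choice: going from 3 to 4 raises the majority from 2 to 3 without tolerating any additional Sentinel failure.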
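sentinel down-after-milliseconds mymaster 30000: every Sentinel periodically sends PING to the Redis data nodes and to the other Sentinels to check whether they are reachable. If a node gives no reply within the configured time, that Sentinel judges it unreachable (subjectively down, SDOWN). The value is in milliseconds, and this setting is the key input to failure detection.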
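As a rough illustration of this rule, the bash sketch below imitates one Sentinel's subjective-down check from the outside. It is a simplification, not Sentinel's implementation: real Sentinels PING each instance about once per second and also accept some error replies (such as -LOADING) as valid, whereas this sketch counts only PONG and works at one-second resolution. `watch_sdown` and the `REDIS_CLI` override are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified imitation of the SDOWN rule (not Sentinel's real code):
# PING once per second, report "sdown" once no PONG has arrived for
# longer than down_after_ms. Resolution is one second.
# watch_sdown and REDIS_CLI are hypothetical names.
REDIS_CLI=${REDIS_CLI:-redis-cli}

watch_sdown() {
  local host=$1 port=$2 down_after_ms=$3 last_ok now
  last_ok=$(date +%s)
  while :; do
    # Only a PONG counts as a valid reply in this sketch.
    if [ "$("$REDIS_CLI" -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]; then
      last_ok=$(date +%s)
    fi
    now=$(date +%s)
    if [ $(( (now - last_ok) * 1000 )) -gt "$down_after_ms" ]; then
      echo "sdown $host:$port"
      return 0
    fi
    sleep 1
  done
}

# Usage (blocks until 127.0.0.1:6379 has been silent for more than 30 s):
# watch_sdown 127.0.0.1 6379 30000
```

With the default 30000 ms, a crashed master is only marked SDOWN about 30 seconds after its last reply, which matches the gap between the shutdown and the failover in the logs above; lowering the value speeds up detection at the cost of more false alarms on a slow network.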
Once the Sentinels agree that the master has failed, the Sentinel leader performs the failover: it picks a new master, and the remaining replicas start replicating from it.
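sentinel parallel-syncs mymaster 1 limits how many replicas start replicating from the new master at the same time after a failover. With a larger value, more replicas resync with the new master simultaneously, which puts extra network and disk I/O load on the new master's host.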
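A rough way to see the trade-off: if at most p replicas are repointed at a time, n replicas need about ceil(n / p) rounds. The bash sketch below encodes only that estimate, assuming every resync takes roughly the same time (real failovers are not strictly round-based); `resync_waves` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rough model: with parallel-syncs = p, replicas are reconfigured at most p at a
# time, so n replicas need about ceil(n / p) rounds of resynchronisation.
# Assumes every resync takes roughly the same time; resync_waves is a hypothetical name.
resync_waves() {
  local n=$1 p=$2
  if [ "$p" -lt 1 ]; then
    echo "parallel-syncs must be >= 1" >&2
    return 1
  fi
  echo $(( (n + p - 1) / p ))   # integer ceil(n / p)
}

# Usage: in this article only 6381 must be repointed after 6380 is promoted.
# resync_waves 1 1   # 1
# resync_waves 4 1   # 4 rounds: slower, but one replica loads the new master at a time
# resync_waves 4 4   # 1 round: fastest, but all four resync at once
```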
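- Summary
A Sentinel is essentially a special Redis node, so it can also be queried with the info command. Once all three Sentinels are running, the Redis Sentinel setup is complete. Keep the following in mind:
(1) In production, put the Redis data nodes on different physical machines (or ECS hosts). Sentinel nodes can share those machines with the Redis nodes, so three machines are enough.
(2) The data nodes in Redis Sentinel are configured exactly like ordinary Redis data nodes; the only difference is that Sentinel nodes are monitoring them.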