Environment

Server IP    Port    OS version            Service                   Role
127.0.0.1    6379    Ubuntu 16.04.3 LTS    redis 5.0.5 + sentinel    Master
127.0.0.1    6380    Ubuntu 16.04.3 LTS    redis 5.0.5 + sentinel    Replica 1
127.0.0.1    6381    Ubuntu 16.04.3 LTS    redis 5.0.5 + sentinel    Replica 2

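Sentinel

With plain Redis master-replica replication, a master failure does not trigger automatic failover. Someone has to step in, promote a replica to master, and tell the application the new master address (in environments without a VIP). Applications usually cannot accept this.

The Redis Sentinel architecture solves this problem. It is Redis's high-availability solution, and in production it does a lot to improve the availability of the whole system.

Version 2.8 or later is recommended, which ships Redis Sentinel v2.

Redis Sentinel is a distributed architecture made up of several Sentinel nodes and Redis data nodes. Each Sentinel node monitors the data nodes and the other Sentinel nodes. When a node becomes unreachable, the Sentinel marks it as down. If the marked node is the master, the Sentinel also "negotiates" with the other Sentinel nodes. Once most Sentinel nodes agree that the master is unreachable, they elect one Sentinel node to carry out the automatic failover. The change is also reported to the application, so no manual intervention is needed.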

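  • Monitoring
    Logically, the set of Sentinel nodes periodically checks every node, and in particular provides automatic failover when the master fails.
  • Notification
    Sentinel nodes notify the application of the failover result.
  • Master failover
    A replica is promoted to master, and the correct master-replica relationships are maintained afterwards.
  • Configuration provider
    In a Redis Sentinel setup, the client connects to the set of Sentinel nodes at initialization and obtains the master address from them. Redis Sentinel consists of several Sentinel nodes, and a failure is judged jointly by multiple Sentinel nodes, which effectively prevents false positives. This also makes the Sentinel layer itself highly available: if one Sentinel node is unavailable, the Sentinel cluster as a whole is not affected.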
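
The configuration-provider role can be sketched from the client side as a small shell helper: ask each Sentinel in turn for the current master address and use the first answer. This is a minimal sketch, not a client library. The function name get_master_addr and the REDIS_CLI override are invented here for illustration, while SENTINEL get-master-addr-by-name is the Sentinel command it relies on. It assumes the Sentinels listen on 127.0.0.1 and that the master is registered as mymaster, as in the setup described in this article.

```shell
# Command used to talk to Sentinel; overridable (hypothetical knob) for dry runs.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# get_master_addr NAME PORT...
# Ask each Sentinel port on 127.0.0.1 in turn and print "host:port" of the
# master reported by the first Sentinel that answers.
get_master_addr() {
    name=$1
    shift
    for port in "$@"; do
        # The reply is two lines: the master's IP, then its port.
        reply=$("$REDIS_CLI" -h 127.0.0.1 -p "$port" sentinel get-master-addr-by-name "$name" 2>/dev/null) || continue
        host=$(printf '%s\n' "$reply" | sed -n '1p' | tr -d '\r')
        mport=$(printf '%s\n' "$reply" | sed -n '2p' | tr -d '\r')
        if [ -n "$host" ] && [ -n "$mport" ]; then
            printf '%s:%s\n' "$host" "$mport"
            return 0
        fi
    done
    return 1
}
```

A typical call would be get_master_addr mymaster 26379 26380 26381. Trying every Sentinel rather than a single one is the point of the design: the lookup keeps working while at least one Sentinel is reachable.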

Deploying the Redis data nodes

  • Installing the environment
root@ubuntu:/opt# apt-get update
root@ubuntu:/opt# apt-get -y install tcl 
root@ubuntu:/opt# apt-get -y install make 
root@ubuntu:/opt# tar -zxvf redis-5.0.5.tar.gz
root@ubuntu:/opt# cd redis-5.0.5
root@ubuntu:/opt# make
...
    LINK redis-benchmark
    INSTALL redis-check-rdb
    INSTALL redis-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/opt/redis-5.0.5/src'

### Test the build and install ###
#=============================# 
root@ubuntu:/opt# make test
...
  175 seconds - integration/replication
  137 seconds - unit/obuf-limits
  131 seconds - unit/memefficiency

\o/ All tests passed without errors!

Cleanup: may take some time... OK
make[1]: Leaving directory '/opt/redis-5.0.5/src'

### Run the installation ###
#=============================# 
root@ubuntu:/opt# make install
...
cd src && make install
make[1]: Entering directory '/opt/redis-5.0.5/src'

Hint: It's a good idea to run 'make test' ;)

    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install
    INSTALL install
make[1]: Leaving directory '/opt/redis-5.0.5/src'
  • Creating the user
root@ubuntu:/opt# groupadd redis && useradd -r -m -g redis -s /bin/bash redis
root@ubuntu:/opt# cat /etc/passwd | grep redis                               
redis:x:996:30002::/home/redis:/bin/bash
  • Parameter configuration
    vim /etc/redis.conf
    vim /etc/redis6380.conf
    vim /etc/redis6381.conf
bind 127.0.0.1 
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300

daemonize no
supervised no
pidfile /var/run/redis.pid
loglevel notice
logfile "/data/redis/log/redis.log"
databases 16
always-show-logo yes

save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data/backup/

replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100

lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no

appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes

lua-time-limit 5000

slowlog-log-slower-than 10000
slowlog-max-len 128

latency-monitor-threshold 0

notify-keyspace-events ""

hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes

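Note: each instance needs its own port in its configuration file; simply replace 6379. For a replica, also add the following line:
replicaof 127.0.0.1 6379

Also change the pid file, log file, and data directory for each instance.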
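
The per-instance edits described above can be scripted instead of made by hand. The following is a minimal sketch, assuming the 6379 configuration shown earlier is used as the template. The function name derive_conf and the per-port pid file name are hypothetical; the log and data paths follow the /data/redis<port> and /data/backup/redisbackup<port> directories created in the installation step, which is itself an assumption about the intended layout.

```shell
# derive_conf PORT
# Read the 6379 template on stdin and print a replica config for PORT.
# The per-port file names below are assumptions chosen for illustration.
derive_conf() {
    port=$1
    sed -e "s|^port 6379\$|port ${port}|" \
        -e "s|^pidfile /var/run/redis\.pid\$|pidfile /var/run/redis${port}.pid|" \
        -e "s|^logfile \"/data/redis/log/redis\.log\"\$|logfile \"/data/redis${port}/log/redis.log\"|" \
        -e "s|^dir /data/backup/\$|dir /data/backup/redisbackup${port}/|"
    # Replicas additionally point at the master.
    printf 'replicaof 127.0.0.1 6379\n'
}
```

For example, derive_conf 6380 < /etc/redis.conf > /etc/redis6380.conf would produce the second instance's file. All other settings pass through unchanged.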

  • Installing the instances
root@ubuntu:/opt# mkdir -p /usr/local/redis/bin 
root@ubuntu:/opt# mkdir -p /data/backup/redisbackup
root@ubuntu:/opt# cp ./redis-5.0.5/src/mkreleasehdr.sh  /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-benchmark  /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-check-aof  /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-check-rdb  /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-cli        /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-sentinel   /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-server     /usr/local/redis/bin/
root@ubuntu:/opt# cp ./redis-5.0.5/src/redis-trib.rb    /usr/local/redis/bin/

### Set permissions ###
root@ubuntu:/opt# chown -R redis:redis /usr/local/redis
root@ubuntu:/opt# chmod -R 750 /usr/local/redis

### Set environment variables ###
root@ubuntu:/opt# echo "export PATH=\$PATH:/usr/local/redis/bin" >> /etc/profile
root@ubuntu:/opt# source /etc/profile    

### Create data directories ###
root@ubuntu:/opt# mkdir -p /data/{redis,redis6380,redis6381}/{data,log}
root@ubuntu:/opt# chown -R redis:redis  /data/redis* 
root@ubuntu:/opt# chmod -R 750 /data/redis*

### Backup directories ###
root@ubuntu:/opt# mkdir -p /data/backup/{redisbackup,redisbackup6380,redisbackup6381}
root@ubuntu:/opt# chown -R redis:redis /data/backup/redis* 
root@ubuntu:/opt# chmod -R 750 /data/backup/redis*

### Start ###
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis6380.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-server /etc/redis6381.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# ps -ef | grep redis 


### Check replication (master) ###
root@ubuntu:/opt# redis-cli  -h  127.0.0.1 -p 6379 info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1071,lag=1
slave1:ip=127.0.0.1,port=6381,state=online,offset=1071,lag=1
master_replid:99c3ac87530433cdb96e79f023f8e906795fdb47
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1071
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1071


### Check replication (replica) ###
root@ubuntu:/opt# redis-cli  -h  127.0.0.1 -p 6380 info replication  
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_repl_offset:1197
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:99c3ac87530433cdb96e79f023f8e906795fdb47
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1197
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1197
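
The checks above can be automated by reading individual fields out of the info replication output. This is a minimal sketch; info_field and check_replication are names invented here, and the only fields used (role, master_link_status) are the ones visible in the output above.

```shell
# info_field KEY
# Read INFO output on stdin and print the value of KEY.
# redis-cli prints INFO with CRLF line endings, so strip "\r" first.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# check_replication EXPECTED_ROLE
# Succeed only if the INFO text on stdin reports EXPECTED_ROLE and, for a
# replica, an established link to its master.
check_replication() {
    info=$(cat)
    role=$(printf '%s\n' "$info" | info_field role)
    [ "$role" = "$1" ] || return 1
    if [ "$role" = "slave" ]; then
        [ "$(printf '%s\n' "$info" | info_field master_link_status)" = "up" ] || return 1
    fi
    return 0
}
```

For example, redis-cli -h 127.0.0.1 -p 6380 info replication | check_replication slave returns success only when 6380 is a replica whose link to the master is up.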
  • Deploying the Sentinel nodes
    The three Sentinel nodes are deployed in exactly the same way (only the ports differ)
    vim /etc/redis-sentinel.conf
    vim /etc/redis-sentinel26380.conf
    vim /etc/redis-sentinel26381.conf
port 26379
daemonize no
pidfile /var/run/redis-sentinel.pid
logfile "/data/redis-sentinel/log/redis-sentinel.log"
dir /data/redis-sentinel
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes

For the other two nodes, change only the port, pid file, working directory, and log file.

### Create directories ###
root@ubuntu:/opt# mkdir -p /data/{redis-sentinel,redis-sentinel26380,redis-sentinel26381}/log
root@ubuntu:/opt# chown -R redis:redis /data/redis-sentinel*
root@ubuntu:/opt# chmod 640 /etc/redis-sentinel*.conf 
root@ubuntu:/opt# chown redis:redis /etc/redis-sentinel*.conf 

### Start the Sentinel nodes ###
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel26380.conf >/dev/null 2>&1 &"
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-sentinel /etc/redis-sentinel26381.conf >/dev/null 2>&1 &"

### Log output ###
root@ubuntu:/opt# tail -f redis-sentinel.log 
63921:X 15 Oct 2020 16:53:25.155 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
63921:X 15 Oct 2020 16:53:25.155 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=63921, just started
63921:X 15 Oct 2020 16:53:25.155 # Configuration loaded
63921:X 15 Oct 2020 16:53:25.157 * Running mode=sentinel, port=26379.
63921:X 15 Oct 2020 16:53:25.159 # Sentinel ID is 379dfea2acf3274c26a741255c7bc8cb16855f72
63921:X 15 Oct 2020 16:53:25.159 # +monitor master mymaster 127.0.0.1 6379 quorum 2
63921:X 15 Oct 2020 16:53:25.160 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:25.161 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:47.573 * +sentinel sentinel 002f6011948fe525666efa45ce4d9dafea5259b9 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 16:53:52.349 * +sentinel sentinel 864c102b9c432a4b3f0ef0b65d25ef623881d326 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
  • Failover
    Shut down the master instance on 127.0.0.1 port 6379 and watch the logs
### Shut down the instance ###
root@ubuntu:/opt# su - redis -c  "/usr/local/redis/bin/redis-cli -h  127.0.0.1 -p 6379 shutdown"

### Log output ###
root@ubuntu:/opt# tail -f redis.log 
63193:M 15 Oct 2020 16:03:09.074 * Background saving started by pid 63689
63689:C 15 Oct 2020 16:03:09.081 * DB saved on disk
63689:C 15 Oct 2020 16:03:09.081 * RDB: 4 MB of memory used by copy-on-write
63193:M 15 Oct 2020 16:03:09.174 * Background saving terminated with success
63193:M 15 Oct 2020 17:26:17.401 # User requested shutdown...
63193:M 15 Oct 2020 17:26:17.401 * Calling fsync() on the AOF file.
63193:M 15 Oct 2020 17:26:17.401 * Saving the final RDB snapshot before exiting.
63193:M 15 Oct 2020 17:26:17.408 * DB saved on disk
63193:M 15 Oct 2020 17:26:17.408 * Removing the pid file.
63193:M 15 Oct 2020 17:26:17.408 # Redis is now ready to exit, bye bye...

### Sentinel log output ###
root@ubuntu:/opt# tail -f redis-sentinel.log 
63921:X 15 Oct 2020 17:26:48.450 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:48.450 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:48.652 # -odown master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.214 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.214 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.275 # +failover-end master mymaster 127.0.0.1 6379
63921:X 15 Oct 2020 17:26:49.275 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
63921:X 15 Oct 2020 17:26:49.275 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
63921:X 15 Oct 2020 17:26:49.276 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
63921:X 15 Oct 2020 17:27:19.345 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
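
The +switch-master line in the log above carries the old and new master addresses, so the outcome of a failover can be read back from the Sentinel log. A minimal sketch follows; the function name last_switch_master is invented here, and it assumes the field order seen in that line (master name, old IP, old port, new IP, new port).

```shell
# last_switch_master
# Read a Sentinel log on stdin and print "host:port" of the new master from
# the most recent +switch-master event, or fail if there is none.
last_switch_master() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "+switch-master") {
                    # Fields after the event name:
                    # <master name> <old ip> <old port> <new ip> <new port>
                    addr = $(i + 4) ":" $(i + 5)
                }
            }
        }
        END {
            if (addr == "") exit 1
            print addr
        }
    '
}
```

Fed the log lines shown above, last_switch_master < redis-sentinel.log prints 127.0.0.1:6380.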
  • Checking the master-replica relationship
### New master ###
root@ubuntu:/opt# /usr/local/redis/bin/redis-cli  -h  127.0.0.1  -p 6380 info replication   
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=436333,lag=1
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:99c3ac87530433cdb96e79f023f8e906795fdb47
master_repl_offset:436466
second_repl_offset:391168
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:436466

### Replica ###
root@ubuntu:/opt# /usr/local/redis/bin/redis-cli  -h  127.0.0.1  -p 6381 info replication  
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:437663
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:99c3ac87530433cdb96e79f023f8e906795fdb47
master_repl_offset:437663
second_repl_offset:391168
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:99
repl_backlog_histlen:437565
  • Restarting the failed old master
    Once restarted, the old master rejoins the cluster automatically as a replica
root@ubuntu:/opt# su - redis -c "/usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 6379 info replication " 
# Replication
role:slave
master_host:127.0.0.1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:687696
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:a7c8a213e2fa443639d78fa2ea8e3a9e327fb682
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:687696
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:678400
repl_backlog_histlen:9297

root@ubuntu:/opt# /usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 26379 info Sentinel    
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:6380,slaves=2,sentinels=3
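
The master0 line summarizes what Sentinel currently believes about the cluster, which makes it a convenient target for a scripted health check. The sketch below pulls a single attribute out of that line; master_attr is a name invented here, and the attribute names (status, address, slaves, sentinels) are the ones visible in the output above.

```shell
# master_attr KEY
# Read "info Sentinel" output on stdin and print the value of KEY from the
# master0 line, e.g. status, address, slaves or sentinels.
master_attr() {
    tr -d '\r' | awk -v key="$1" '
        /^master0:/ {
            sub(/^master0:/, "")
            n = split($0, pairs, ",")
            for (i = 1; i <= n; i++) {
                eq = index(pairs[i], "=")
                if (substr(pairs[i], 1, eq - 1) == key) {
                    print substr(pairs[i], eq + 1)
                }
            }
        }
    '
}
```

For example, redis-cli -h 127.0.0.1 -p 26379 info Sentinel | master_attr sentinels should print 3 once all three Sentinel nodes have discovered each other.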
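  • Configuration
    The Redis installation directory contains a sample sentinel.conf, the default configuration file for a Sentinel node. In it, port and dir are the Sentinel node's port and working directory. Sentinel nodes periodically monitor the master, so the configuration necessarily reflects that in the sentinel monitor line; quorum is the number of votes needed to finally judge the master unreachable. Sentinel actually monitors all nodes, yet the configuration contains nothing about the replicas or the other Sentinel nodes.
    After startup, Sentinel rewrites its configuration file (marked "Generated by CONFIG REWRITE"). It records the replicas and other Sentinel nodes it discovered automatically, and drops parameters left at their default values, such as sentinel parallel-syncs mymaster 1 and sentinel failover-timeout mymaster 180000. For example: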
port 26379
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile "/data/redis-sentinel/log/redis-sentinel.log"
dir "/data/redis-sentinel"
sentinel myid 379dfea2acf3274c26a741255c7bc8cb16855f72
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 127.0.0.1 6380 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
# Generated by CONFIG REWRITE
protected-mode no
### Two replica nodes discovered here ###
sentinel known-replica mymaster 127.0.0.1 6381
sentinel known-replica mymaster 127.0.0.1 6379
### Two other Sentinel nodes discovered ###
sentinel known-sentinel mymaster 127.0.0.1 26381 864c102b9c432a4b3f0ef0b65d25ef623881d326
sentinel known-sentinel mymaster 127.0.0.1 26380 002f6011948fe525666efa45ce4d9dafea5259b9
sentinel current-epoch 1

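The quorum parameter is used for failure detection and judgment. With quorum set to 2, at least 2 Sentinel nodes must consider the master unreachable before it is judged objectively down. The smaller the quorum, the looser the condition for marking the master down; the larger, the stricter. The usual recommendation is half the number of Sentinel nodes plus 1; with three nodes, 2 is the sensible value.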
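
The "half plus one" rule and the quorum check can be written out as two small helpers. This is only an illustration of the arithmetic, not Sentinel's implementation; both function names are invented here.

```shell
# majority_quorum N
# Print half of N (rounded down) plus one, for N Sentinel nodes.
majority_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# is_objectively_down VOTES QUORUM
# Succeed when at least QUORUM Sentinels report the master as unreachable.
is_objectively_down() {
    [ "$1" -ge "$2" ]
}
```

With three Sentinel nodes, majority_quorum 3 prints 2, matching the sentinel monitor mymaster 127.0.0.1 6379 2 line used in this setup; a single Sentinel's vote (is_objectively_down 1 2) is then not enough to mark the master down.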

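sentinel down-after-milliseconds mymaster 30000: each Sentinel node periodically sends PING to the Redis data nodes and the other Sentinel nodes to check whether they are reachable. If a node does not reply within the configured time, it is judged unreachable. The value is in milliseconds, and this setting is the key criterion for deciding that a node has failed.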
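
The effect of this setting is visible in the failover test logs: the master logged its shutdown at 17:26:17.401 and Sentinel logged +switch-master at 17:26:49.275. The sketch below computes the gap between two such log timestamps; to_ms and elapsed_ms are names invented here, and it assumes both timestamps fall on the same day.

```shell
# to_ms HH:MM:SS.mmm
# Convert a log timestamp to milliseconds since midnight.
to_ms() {
    printf '%s\n' "$1" | awk -F'[:.]' '{ t = ($1 * 3600 + $2 * 60 + $3) * 1000 + $4; print t }'
}

# elapsed_ms START END
# Print the number of milliseconds between two same-day timestamps.
elapsed_ms() {
    echo $(( $(to_ms "$2") - $(to_ms "$1") ))
}

elapsed_ms 17:26:17.401 17:26:49.275   # prints 31874
```

The result, roughly 31.9 seconds, is just over the 30000 ms threshold: the master had to stay silent for the full down-after-milliseconds period before failover began, and the failover itself then took under two seconds.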

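When the Sentinel nodes agree that the master has failed, the Sentinel leader performs the failover and selects a new master; the former replicas then start replicating from the new master.
sentinel parallel-syncs mymaster 1 limits how many replicas start replicating from the new master at the same time after a failover. With a large value, many replicas replicate from the new master simultaneously, which puts network and disk I/O load on the master's machine.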
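
As a rough way to reason about this trade-off, the number of successive resynchronization batches is the replica count divided by parallel-syncs, rounded up. The helper below only does that arithmetic; sync_batches is a name invented here, and real timing also depends on dataset size and network speed.

```shell
# sync_batches REPLICAS PARALLEL_SYNCS
# Print how many successive batches are needed when at most PARALLEL_SYNCS
# replicas resynchronize with the new master at a time (ceiling division).
sync_batches() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

With two replicas and parallel-syncs set to 1, sync_batches 2 1 prints 2: the replicas resynchronize one after the other, which keeps load on the new master low at the cost of a longer overall recovery.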

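  • Summary
    A Sentinel is essentially a special Redis node, so its status can also be queried with the info command. Once all three Sentinel nodes are running, the Redis Sentinel setup is complete. Keep the following in mind:
    (1) In production, install the Redis nodes on different physical machines (or ECS hosts). A Sentinel node can share a machine with a Redis node, so three machines are enough.
    (2) Data nodes under Redis Sentinel are configured no differently from ordinary Redis data nodes; Sentinel nodes are simply added to monitor them.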