Deploying a Ceph Cluster with Rook

1. Version Information

| Rook version | Ceph version | K8s version |
| --- | --- | --- |
| Rook Ceph v1.8 | Ceph Pacific v16.2.7 stable | v1.20.4 |

2. Introduction to Rook

Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Rook is a self-managing distributed storage orchestration system that provides a convenient storage solution for Kubernetes. Rook itself does not provide storage; rather, it acts as an adaptation layer between Kubernetes and the storage system, simplifying storage deployment and maintenance. The storage systems it currently supports include, but are not limited to, Ceph (the primary one), Cassandra, and NFS.

Project: https://github.com/rook

3. Deploying Ceph with Rook in a Kubernetes Cluster

Prerequisites:

1) A Kubernetes cluster, v1.16 or later
2) At least three worker nodes
3) One unused raw disk on each worker node (see the check below)
4) Rook only supports deploying Ceph Nautilus (v14.2.22) and later
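
Ceph will only consume raw devices with no filesystem or partition table, so it is worth confirming on each worker node that the disk really is unused; lsblk gives a quick view (the device name mentioned in the comment is purely an example):

# A usable raw disk shows an empty FSTYPE and no partitions beneath it
lsblk -f
# e.g. a hypothetical /dev/sdb with no FSTYPE and no children can be claimed by Ceph as an OSD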

Deployment:

1) Download and extract the release
[root@master01 ~]# wget https://github.com/rook/rook/archive/refs/tags/v1.8.0.zip
[root@master01 ~]# unzip v1.8.0.zip
[root@master01 ~]# cd rook-1.8.0/deploy/examples/

2) Images required for the deployment
[root@master01 examples]# cat images.txt
k8s.gcr.io/sig-storage/csi-attacher:v3.3.0
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
k8s.gcr.io/sig-storage/csi-resizer:v1.3.0
k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
quay.io/ceph/ceph:v16.2.7
quay.io/cephcsi/cephcsi:v3.4.0
quay.io/csiaddons/volumereplication-operator:v0.1.0
rook/ceph:v1.8.0
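
These images come from k8s.gcr.io and quay.io, which are unreachable from some networks. If that applies, one workaround (a sketch, assuming the nodes run Docker as the container runtime) is to pre-pull every image in images.txt on each node:

# Pre-pull all images listed in images.txt
while read -r img; do
  docker pull "$img"
done < images.txt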

3) Deploy the Rook operator
[root@master01 examples]# kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
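
Before continuing, it helps to wait until the operator Pod is Running, since cluster.yaml depends on the operator and the CRDs it serves (the label below assumes the app label set in operator.yaml):

# Watch the operator Pod until it reports Running
kubectl -n rook-ceph get pod -l app=rook-ceph-operator -w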

4) Deploy the Ceph cluster
[root@master01 examples]# kubectl apply -f cluster.yaml

5) Deploy the Rook Ceph toolbox
[root@master01 examples]# kubectl apply -f toolbox.yaml

6) Deploy the Ceph dashboard (exposed as a NodePort service)
[root@master01 examples]# kubectl apply -f dashboard-external-https.yaml
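
The NodePort assigned to the dashboard can be read back from the Service this manifest creates (it appears as 32766 in the service listing in section 4):

kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https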

7) Retrieve the Ceph Dashboard login password
[root@master01 examples]# kubectl get secret rook-ceph-dashboard-password -n rook-ceph -o yaml | grep -E '[[:space:]]password' | awk -F'[ ]+' '{print $3}' | base64 -d
or
[root@master01 examples]# kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o \
jsonpath="{['data']['password']}" | base64 -d

4. Check Component Status

[root@master01 examples]# kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-48k2z 3/3 Running 0 6d18h
csi-cephfsplugin-dwt9l 3/3 Running 0 6d18h
csi-cephfsplugin-gcgqt 3/3 Running 0 6d18h
csi-cephfsplugin-km4fw 3/3 Running 0 6d18h
csi-cephfsplugin-pjck4 3/3 Running 0 6d18h
csi-cephfsplugin-provisioner-5886cc894-778zt 6/6 Running 0 6d18h
csi-cephfsplugin-provisioner-5886cc894-f7fw2 6/6 Running 0 6d18h
csi-cephfsplugin-schz9 3/3 Running 0 6d18h
csi-cephfsplugin-vhsjs 3/3 Running 0 6d18h
csi-cephfsplugin-zw52x 3/3 Running 0 6d18h
csi-rbdplugin-4n5qr 3/3 Running 0 6d18h
csi-rbdplugin-4svrg 3/3 Running 0 6d18h
csi-rbdplugin-4t69p 3/3 Running 0 6d18h
csi-rbdplugin-hlhln 3/3 Running 0 6d18h
csi-rbdplugin-kqbvq 3/3 Running 0 6d18h
csi-rbdplugin-ml4m5 3/3 Running 0 6d18h
csi-rbdplugin-provisioner-7c6f58995b-58ffp 6/6 Running 0 6d18h
csi-rbdplugin-provisioner-7c6f58995b-c6rwj 6/6 Running 0 6d18h
csi-rbdplugin-s6zh2 3/3 Running 0 6d18h
csi-rbdplugin-tdbml 3/3 Running 0 6d18h
rook-ceph-crashcollector-work01-855c5494cd-p6lv6 1/1 Running 0 6d18h
rook-ceph-crashcollector-work02-5cc8c579b8-wnjjq 1/1 Running 0 5d22h
rook-ceph-crashcollector-work03-7784fc9666-brpkw 1/1 Running 0 6d18h
rook-ceph-crashcollector-work04-945895fc5-ghzcf 1/1 Running 0 6d18h
rook-ceph-crashcollector-work05-745495b9b7-xtvvx 1/1 Running 0 5d22h
rook-ceph-crashcollector-work06-58dd77474c-c8pmh 1/1 Running 0 6d18h
rook-ceph-crashcollector-work07-78c44c645b-rtkhb 1/1 Running 0 6d18h
rook-ceph-crashcollector-work08-6946cdc4f9-nsgfz 1/1 Running 0 6d18h
rook-ceph-mds-myfs-a-5c945c6989-9bqzn 1/1 Running 0 6d18h
rook-ceph-mds-myfs-b-9dd946b4-8dhv2 1/1 Running 0 6d18h
rook-ceph-mgr-a-5c6cb6f9cc-5phvk 1/1 Running 0 6d18h
rook-ceph-mon-a-bdb5d477-qgvnh 1/1 Running 0 6d18h
rook-ceph-mon-b-5878864997-wkrbg 1/1 Running 0 6d18h
rook-ceph-mon-c-55f99b7f46-nqmqr 1/1 Running 0 6d18h
rook-ceph-operator-c4f79bf67-bfrlf 1/1 Running 0 6d18h
rook-ceph-osd-0-6798c998c7-prtmr 1/1 Running 0 5d22h
rook-ceph-osd-1-64ddc88c74-25fpw 1/1 Running 0 5d22h
rook-ceph-osd-2-76fbd645f5-v74dx 1/1 Running 0 5d22h
rook-ceph-osd-3-86b6797cb6-n4znt 1/1 Running 0 5d22h
rook-ceph-osd-4-57d6d7f6c4-scmlz 1/1 Running 0 5d22h
rook-ceph-osd-5-6c56bb444d-pqxxk 1/1 Running 0 5d22h
rook-ceph-osd-6-8449fdf7bf-nwln4 1/1 Running 0 5d22h
rook-ceph-osd-7-5cf5699df8-267rx 1/1 Running 0 5d22h
rook-ceph-osd-prepare-work01-5rdwg 0/1 Completed 0 123m
rook-ceph-osd-prepare-work02-gwr6j 0/1 Completed 0 123m
rook-ceph-osd-prepare-work03-pnqnr 0/1 Completed 0 123m
rook-ceph-osd-prepare-work04-ppj6n 0/1 Completed 0 123m
rook-ceph-osd-prepare-work05-5gmqf 0/1 Completed 0 123m
rook-ceph-osd-prepare-work06-m9255 0/1 Completed 0 123m
rook-ceph-osd-prepare-work07-5c4v6 0/1 Completed 0 123m
rook-ceph-osd-prepare-work08-hg66b 0/1 Completed 0 123m
rook-ceph-tools-6979f5784f-275ps 1/1 Running 0 6d18h
[root@master01 examples]# kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 10.0.0.113 <none> 8080/TCP,8081/TCP 6d18h
csi-rbdplugin-metrics ClusterIP 10.0.0.234 <none> 8080/TCP,8081/TCP 6d18h
rook-ceph-mgr ClusterIP 10.0.0.77 <none> 9283/TCP 6d18h
rook-ceph-mgr-dashboard ClusterIP 10.0.0.111 <none> 8443/TCP 6d18h
rook-ceph-mgr-dashboard-external-https NodePort 10.0.0.101 <none> 8443:32766/TCP 6d18h
rook-ceph-mon-a ClusterIP 10.0.0.78 <none> 6789/TCP,3300/TCP 6d18h
rook-ceph-mon-b ClusterIP 10.0.0.183 <none> 6789/TCP,3300/TCP 6d18h
rook-ceph-mon-c ClusterIP 10.0.0.172 <none> 6789/TCP,3300/TCP 6d18h

5. Check Ceph Cluster Status

[root@master01 ~]# kubectl exec -it rook-ceph-tools-6979f5784f-275ps -n rook-ceph -- bash
[rook@rook-ceph-tools-6979f5784f-275ps /]$ ceph -s
  cluster:
    id:     ed7ef2a6-c571-41fc-a660-27718b142011
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: a(active, since 6d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 8 osds: 8 up (since 5d), 8 in (since 6d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 1.39k objects, 1.9 GiB
    usage:   12 GiB used, 148 GiB / 160 GiB avail
    pgs:     97 active+clean

  io:
    client: 1.7 KiB/s rd, 3.6 KiB/s wr, 1 op/s rd, 0 op/s wr

[rook@rook-ceph-tools-6979f5784f-275ps /]$ ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 160 GiB 148 GiB 12 GiB 12 GiB 7.48
TOTAL 160 GiB 148 GiB 12 GiB 12 GiB 7.48

--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 46 GiB
replicapool 2 32 879 MiB 291 2.6 GiB 1.85 46 GiB
myfs-metadata 3 32 419 MiB 257 1.2 GiB 0.89 46 GiB
myfs-data0 4 32 648 MiB 845 1.9 GiB 1.37 46 GiB
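
A few other quick checks are available from the same toolbox Pod (standard Ceph CLI commands, listed here as optional extras):

ceph health detail # explains any non-OK health state
ceph osd status # per-OSD host, usage, and up/in state
ceph osd tree # CRUSH hierarchy of hosts and OSDs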

6. Log in to the Ceph Dashboard UI

Open https://node-ip:32766 in a browser and log in; the default user is admin, with the password retrieved in the previous step.

Screenshots (omitted here): login page, cluster status, hosts UI, and pools UI.

7. Deploy RBD and CephFS Storage Support

rbd:
[root@k8s-master1 examples]# kubectl apply -f csi/rbd/storageclass.yaml # creates an RBD pool named replicapool and its StorageClass

cephfs:
[root@k8s-master1 examples]# kubectl apply -f filesystem.yaml
[root@k8s-master1 examples]# kubectl apply -f csi/cephfs/storageclass.yaml

[root@master01 ~]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
rook-ceph-block rook-ceph.rbd.csi.ceph.com Delete Immediate true 6d18h
rook-cephfs rook-ceph.cephfs.csi.ceph.com Delete Immediate true 6d18h
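
To sanity-check dynamic provisioning, a throwaway PVC can be created against one of the new StorageClasses and watched until it binds (the PVC name here is illustrative):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  storageClassName: rook-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc cephfs-test-pvc # STATUS should turn Bound within seconds
kubectl delete pvc cephfs-test-pvc # clean up the test claim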

Deploying a Redis Cluster on Kubernetes

This deployment uses Ceph as the backend storage for the Redis cluster. Deploying a Redis cluster on Kubernetes is not very different from deploying an ordinary application, but the following points need attention:

  1. Redis is a stateful application

This is the most important thing to keep in mind when deploying a stateful workload. When Redis runs as Pods in Kubernetes, each Pod caches different data, and Pod IPs can change at any time, so deploying redis-cluster with an ordinary Deployment and Service would run into many problems. A StatefulSet plus a Headless Service is used instead.

  2. Data persistence

Although Redis is an in-memory cache, it still relies on disk to persist data, so that the cached data can be recovered when the service restarts after a failure.

  3. Headless Service

A Headless Service is a Service with no Cluster IP assigned. Correspondingly, in the Kubernetes DNS mapping, a Headless Service resolves not to a Cluster IP but to the list of IPs of all the Pods it selects.

  4. StatefulSet

StatefulSet is the Kubernetes resource designed specifically for deploying stateful applications. Broadly speaking, it can be regarded as a variant of Deployment/RC with the following properties:

  1. Every Pod managed by a StatefulSet has a unique, stable network identity generated from an ordinal index, rather than the random names and IPs of a Deployment (for a StatefulSet named redis, the Pods are redis-0, redis-1, ...)
  2. The start/stop order of the Pods in a StatefulSet is strictly controlled: operating on the Nth Pod waits until the previous N-1 are done
  3. The Pods in a StatefulSet use stable persistent storage, and the corresponding PV is not destroyed when its Pod is deleted

Note also that a StatefulSet must be used together with a Headless Service. On top of the DNS mapping the Headless Service provides, it adds one more layer, ultimately yielding a DNS name for each individual Pod, in the form:

$(podname).$(headless service name)
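
For example, with the redis-cluster Headless Service defined below, each Pod gets its own DNS record; the fully qualified form appends the namespace and cluster domain, and it is this name that redis-cli connects to later in this article:

# Short form, resolvable from Pods in the same namespace:
#   redis-cluster-0.redis-cluster
# Fully qualified form:
#   redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local
# From any Pod with DNS tools installed:
nslookup redis-cluster-0.redis-cluster.redis-cluster.svc.cluster.local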

1. YAML Configuration Files

The Redis configuration is mounted via a ConfigMap, which contains two items:

redis.conf: the Redis configuration file.

chage-pod-ip.sh: a script that handles Pod IP changes. When a Pod of the Redis cluster is rebuilt, its Pod IP changes, and the script replaces the old Pod IP with the new one in /data/nodes.conf before Redis starts; without this, the cluster breaks.

[root@master01 ~]# cd redis-cluster/
[root@master01 redis-cluster]# ll
total 12
-rw-r--r-- 1 root root 2372 Apr 6 16:59 redis-cluster-configmap.yml
-rw-r--r-- 1 root root 1853 Apr 12 09:16 redis-cluster.yml


[root@master01 redis-cluster]# cat redis-cluster-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster
  namespace: redis-cluster
data:
  chage-pod-ip.sh: |
    #!/bin/sh
    # Before starting redis-server, rewrite this node's own IP in the cluster
    # config so that a rebuilt Pod (with a new Pod IP) can rejoin the cluster.
    CLUSTER_CONFIG="/data/nodes.conf"
    if [ -f ${CLUSTER_CONFIG} ]; then
      if [ -z "${POD_IP}" ]; then
        echo "Unable to determine Pod IP address!"
        exit 1
      fi
      echo "Updating my IP to ${POD_IP} in ${CLUSTER_CONFIG}"
      # Replace the IP on the line marked "myself" with the current Pod IP
      sed -i.bak -e '/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/'${POD_IP}'/' ${CLUSTER_CONFIG}
    fi
    exec "$@"

  redis.conf: |
    bind 0.0.0.0
    protected-mode yes
    port 6379
    tcp-backlog 2048
    timeout 0
    tcp-keepalive 300
    daemonize no
    supervised no
    pidfile /var/run/redis.pid
    loglevel notice
    logfile /data/redis.log
    databases 16
    always-show-logo yes
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    dbfilename dump.rdb
    dir /data
    masterauth liheng@2022
    replica-serve-stale-data yes
    replica-read-only no
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    repl-disable-tcp-nodelay no
    replica-priority 100
    requirepass liheng@2022
    maxclients 32768
    maxmemory-policy allkeys-lru
    lazyfree-lazy-eviction no
    lazyfree-lazy-expire no
    lazyfree-lazy-server-del no
    replica-lazy-flush no
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec
    no-appendfsync-on-rewrite no
    auto-aof-rewrite-percentage 100
    auto-aof-rewrite-min-size 64mb
    aof-load-truncated yes
    aof-use-rdb-preamble yes
    lua-time-limit 5000
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 15000
    slowlog-log-slower-than 10000
    slowlog-max-len 128
    latency-monitor-threshold 0
    notify-keyspace-events ""
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    list-max-ziplist-size -2
    list-compress-depth 0
    set-max-intset-entries 512
    zset-max-ziplist-entries 128
    zset-max-ziplist-value 64
    hll-sparse-max-bytes 3000
    stream-node-max-bytes 4096
    stream-node-max-entries 100
    activerehashing yes
    client-output-buffer-limit normal 0 0 0
    client-output-buffer-limit replica 256mb 64mb 60
    client-output-buffer-limit pubsub 32mb 8mb 60
    hz 10
    dynamic-hz yes
    aof-rewrite-incremental-fsync yes
    rdb-save-incremental-fsync yes


[root@master01 redis-cluster]# cat redis-cluster.yml
---
apiVersion: v1
kind: Service
metadata:
  namespace: redis-cluster
  name: redis-cluster
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  - port: 16379
    targetPort: 16379
    name: gossip
  selector:
    app: redis-cluster
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: redis-cluster
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      terminationGracePeriodSeconds: 20
      # Pod anti-affinity: prefer spreading the Redis Pods across nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - redis-cluster
              topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:5.0.13
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command: ["/etc/redis/chage-pod-ip.sh", "redis-server", "/etc/redis/redis.conf"]
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: conf
          mountPath: /etc/redis/
          readOnly: false
        - name: data
          mountPath: /data
          readOnly: false
      volumes:
      - name: conf
        configMap:
          name: redis-cluster
          defaultMode: 0755
  # Dynamically provision a PV per Pod from the CephFS StorageClass
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: "rook-cephfs"
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 10Gi

2. Deployment
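
Both manifests place their objects in the redis-cluster namespace; if it does not exist yet, create it before applying them:

kubectl create namespace redis-cluster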

[root@master01 redis-cluster]# kubectl apply -f redis-cluster-configmap.yml
[root@master01 redis-cluster]# kubectl apply -f redis-cluster.yml
[root@master01 redis-cluster]# kubectl get pod -n redis-cluster -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-0 1/1 Running 0 6d8h 10.244.137.142 work03 <none> <none>
redis-cluster-1 1/1 Running 0 6d8h 10.244.21.77 work08 <none> <none>
redis-cluster-2 1/1 Running 0 6d8h 10.244.142.79 work06 <none> <none>
redis-cluster-3 1/1 Running 0 6d8h 10.244.46.11 work04 <none> <none>
redis-cluster-4 1/1 Running 0 6d8h 10.244.6.198 work05 <none> <none>
redis-cluster-5 1/1 Running 0 6d8h 10.244.39.71 work07 <none> <none>
redis-cluster-6 1/1 Running 0 34h 10.244.205.216 work01 <none> <none>
redisinsight-86f5f67fd8-cwt6f 1/1 Running 0 5d6h 10.244.46.21 work04 <none> <none>
[root@master01 redis-cluster]# kubectl get svc -n redis-cluster
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-cluster ClusterIP None <none> 6379/TCP,16379/TCP 6d8h
# Check the PVCs and PVs
[root@master01 redis-cluster]# kubectl get pvc,pv -n redis-cluster
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-redis-cluster-0 Bound pvc-fae1deeb-c171-4651-b51a-ea0c8322f5de 10Gi RWX rook-cephfs 6d8h
persistentvolumeclaim/data-redis-cluster-1 Bound pvc-72f90a5a-69e9-4fae-9ce5-1bec96159324 10Gi RWX rook-cephfs 6d8h
persistentvolumeclaim/data-redis-cluster-2 Bound pvc-c2a79b04-94c8-44bd-a636-bc4f06bdf1a9 10Gi RWX rook-cephfs 6d8h
persistentvolumeclaim/data-redis-cluster-3 Bound pvc-e1df7bb4-03b6-473b-9451-2390a6e7e55e 10Gi RWX rook-cephfs 6d8h
persistentvolumeclaim/data-redis-cluster-4 Bound pvc-adad3818-52a6-4174-a7cd-33b4ad23b300 10Gi RWX rook-cephfs 6d8h
persistentvolumeclaim/data-redis-cluster-5 Bound pvc-7f83f4d4-3e2d-4b97-a81f-715823881c80 10Gi RWX rook-cephfs 6d8h

NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-72f90a5a-69e9-4fae-9ce5-1bec96159324 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-1 rook-cephfs 6d8h
pvc-7f83f4d4-3e2d-4b97-a81f-715823881c80 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-5 rook-cephfs 6d8h
pvc-adad3818-52a6-4174-a7cd-33b4ad23b300 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-4 rook-cephfs 6d8h
pvc-c2a79b04-94c8-44bd-a636-bc4f06bdf1a9 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-2 rook-cephfs 6d8h
pvc-e1df7bb4-03b6-473b-9451-2390a6e7e55e 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-3 rook-cephfs 6d8h
pvc-fae1deeb-c171-4651-b51a-ea0c8322f5de 10Gi RWX Delete Bound redis-cluster/data-redis-cluster-0 rook-cephfs 6d8h

3. Create the Cluster

[root@master01 redis-cluster]# kubectl exec -it redis-cluster-0 -n redis-cluster -- bash
root@redis-cluster-0:/data# redis-cli -a liheng@2022 --cluster create \
10.244.137.142:6379 \
10.244.21.77:6379 \
10.244.142.79:6379 \
10.244.46.11:6379 \
10.244.6.198:6379 \
10.244.39.71:6379 \
--cluster-replicas 1

Type "yes" when prompted to complete the cluster creation.

4. Verify the Cluster

[root@master01 redis-cluster]# kubectl exec -it redis-cluster-0 -n redis-cluster -- bash
root@redis-cluster-0:/data# redis-cli -c -h redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local -a 'liheng@2022'
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:548994
cluster_stats_messages_pong_sent:564435
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:1113430
cluster_stats_messages_ping_received:564435
cluster_stats_messages_pong_received:548995
cluster_stats_messages_received:1113430

redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379> cluster nodes
85f0ed57e875556d76bde8580d0c308fafbcc2dd 10.244.142.79:6379@16379 myself,master - 0 1649851441000 3 connected 10923-16383
14b22757b29c2928903f3a48e61e1c6eb5dc8699 10.244.46.11:6379@16379 slave 85f0ed57e875556d76bde8580d0c308fafbcc2dd 0 1649851439678 4 connected
eb60d6ebd25effbde58c4080366707fccec39371 10.244.21.77:6379@16379 master - 0 1649851442685 2 connected 5461-10922
8b7f544e76dc20a52a54d7e3a0ceea44eb05c6d6 10.244.39.71:6379@16379 slave eb60d6ebd25effbde58c4080366707fccec39371 0 1649851440679 6 connected
0395c891257901c13e23b65f351f858b04eb1437 10.244.137.142:6379@16379 master - 0 1649851442000 1 connected 0-5460
00738a34c0bf91df0b5ec017607070bbf167f9bf 10.244.6.198:6379@16379 slave 0395c891257901c13e23b65f351f858b04eb1437 0 1649851440000 5 connected
The output above shows the cluster is healthy.

5. Failure Testing

Delete an arbitrary Pod (here, the Pod named redis-cluster-1). Kubernetes automatically recreates a Pod with the same name (redis-cluster-1) and rebinds it to the original PVC and PV, and the chage-pod-ip.sh script automatically rewrites the Pod's entry in nodes.conf with the new Pod IP.

[root@master01 redis-cluster]# kubectl get pod -n redis-cluster -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-0 1/1 Running 0 6d8h 10.244.137.142 work03 <none> <none>
redis-cluster-1 1/1 Running 0 6d8h 10.244.21.77 work08 <none> <none>
redis-cluster-2 1/1 Running 0 6d8h 10.244.142.79 work06 <none> <none>
redis-cluster-3 1/1 Running 0 6d8h 10.244.46.11 work04 <none> <none>
redis-cluster-4 1/1 Running 0 6d8h 10.244.6.198 work05 <none> <none>
redis-cluster-5 1/1 Running 0 6d8h 10.244.39.71 work07 <none> <none>
redisinsight-86f5f67fd8-cwt6f 1/1 Running 0 5d6h 10.244.46.21 work04 <none> <none>

# Delete the redis-cluster-1 Pod
[root@master01 redis-cluster]# kubectl delete pod redis-cluster-1 -n redis-cluster
pod "redis-cluster-1" deleted

# The Pod is being recreated
[root@master01 redis-cluster]# kubectl get pod -n redis-cluster -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-0 1/1 Running 0 6d8h 10.244.137.142 work03 <none> <none>
redis-cluster-1 0/1 ContainerCreating 0 4s <none> work08 <none> <none>
redis-cluster-2 1/1 Running 0 6d8h 10.244.142.79 work06 <none> <none>
redis-cluster-3 1/1 Running 0 6d8h 10.244.46.11 work04 <none> <none>
redis-cluster-4 1/1 Running 0 6d8h 10.244.6.198 work05 <none> <none>
redis-cluster-5 1/1 Running 0 6d8h 10.244.39.71 work07 <none> <none>
redisinsight-86f5f67fd8-cwt6f 1/1 Running 0 5d6h 10.244.46.21 work04 <none> <none>

# Recreation finished: the Pod IP changed from 10.244.21.77 to 10.244.21.102, and because Pod anti-affinity is configured, the six Redis Pods are not scheduled onto the same node
[root@master01 redis-cluster]# kubectl get pod -n redis-cluster -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-0 1/1 Running 0 6d8h 10.244.137.142 work03 <none> <none>
redis-cluster-1 1/1 Running 0 17s 10.244.21.102 work08 <none> <none>
redis-cluster-2 1/1 Running 0 6d8h 10.244.142.79 work06 <none> <none>
redis-cluster-3 1/1 Running 0 6d8h 10.244.46.11 work04 <none> <none>
redis-cluster-4 1/1 Running 0 6d8h 10.244.6.198 work05 <none> <none>
redis-cluster-5 1/1 Running 0 6d8h 10.244.39.71 work07 <none> <none>
redisinsight-86f5f67fd8-cwt6f 1/1 Running 0 5d6h 10.244.46.21 work04 <none> <none>

[root@master01 redis-cluster]# kubectl exec -it redis-cluster-0 -n redis-cluster -- ls
appendonly.aof dump.rdb nodes.conf redis.log
[root@master01 redis-cluster]# kubectl exec -it redis-cluster-0 -n redis-cluster -- cat nodes.conf
0395c891257901c13e23b65f351f858b04eb1437 10.244.137.142:6379@16379 myself,master - 0 1649852291000 1 connected 0-5460
eb60d6ebd25effbde58c4080366707fccec39371 10.244.21.102:6379@16379 master - 1649852281084 1649852280182 2 disconnected 5461-10922
85f0ed57e875556d76bde8580d0c308fafbcc2dd 10.244.142.79:6379@16379 master - 0 1649852292000 3 connected 10923-16383
00738a34c0bf91df0b5ec017607070bbf167f9bf 10.244.6.198:6379@16379 slave 0395c891257901c13e23b65f351f858b04eb1437 0 1649852292000 5 connected
8b7f544e76dc20a52a54d7e3a0ceea44eb05c6d6 10.244.39.71:6379@16379 slave eb60d6ebd25effbde58c4080366707fccec39371 0 1649852290202 6 connected
14b22757b29c2928903f3a48e61e1c6eb5dc8699 10.244.46.11:6379@16379 slave 85f0ed57e875556d76bde8580d0c308fafbcc2dd 0 1649852292207 4 connected
vars currentEpoch 6 lastVoteEpoch 0

[root@master01 redis-cluster]# kubectl exec -it redis-cluster-1 -n redis-cluster -- cat nodes.conf
00738a34c0bf91df0b5ec017607070bbf167f9bf 10.244.6.198:6379@16379 slave 0395c891257901c13e23b65f351f858b04eb1437 0 1649851416000 5 connected
85f0ed57e875556d76bde8580d0c308fafbcc2dd 10.244.142.79:6379@16379 master - 0 1649851417000 3 connected 10923-16383
0395c891257901c13e23b65f351f858b04eb1437 10.244.137.142:6379@16379 master - 0 1649851414000 1 connected 0-5460
eb60d6ebd25effbde58c4080366707fccec39371 10.244.21.102:6379@16379 myself,master - 0 1649851414000 2 connected 5461-10922
8b7f544e76dc20a52a54d7e3a0ceea44eb05c6d6 10.244.39.71:6379@16379 slave eb60d6ebd25effbde58c4080366707fccec39371 0 1649851417000 6 connected
14b22757b29c2928903f3a48e61e1c6eb5dc8699 10.244.46.11:6379@16379 slave 85f0ed57e875556d76bde8580d0c308fafbcc2dd 0 1649851417565 4 connected
vars currentEpoch 6 lastVoteEpoch 0
[root@master01 redis-cluster]# kubectl exec -it redis-cluster-1 -n redis-cluster -- ls
appendonly.aof dump.rdb nodes.conf nodes.conf.bak redis.log

# Check the cluster status
[root@master01 redis-cluster]# kubectl exec -it redis-cluster-0 -n redis-cluster -- bash
root@redis-cluster-0:/data# redis-cli -c -h redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local -a 'liheng@2022'
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:3
cluster_stats_messages_ping_sent:550420
cluster_stats_messages_pong_sent:565874
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:1116295
cluster_stats_messages_ping_received:565874
cluster_stats_messages_pong_received:550417
cluster_stats_messages_received:1116291
redis-cluster-2.redis-cluster.redis-cluster.svc.cluster.local:6379> cluster nodes
85f0ed57e875556d76bde8580d0c308fafbcc2dd 10.244.142.79:6379@16379 myself,master - 0 1649852858000 3 connected 10923-16383
14b22757b29c2928903f3a48e61e1c6eb5dc8699 10.244.46.11:6379@16379 slave 85f0ed57e875556d76bde8580d0c308fafbcc2dd 0 1649852860000 4 connected
eb60d6ebd25effbde58c4080366707fccec39371 10.244.21.102:6379@16379 master - 0 1649852860292 2 connected 5461-10922
8b7f544e76dc20a52a54d7e3a0ceea44eb05c6d6 10.244.39.71:6379@16379 slave eb60d6ebd25effbde58c4080366707fccec39371 0 1649852857000 6 connected
0395c891257901c13e23b65f351f858b04eb1437 10.244.137.142:6379@16379 master - 0 1649852858000 1 connected 0-5460
00738a34c0bf91df0b5ec017607070bbf167f9bf 10.244.6.198:6379@16379 slave 0395c891257901c13e23b65f351f858b04eb1437 0 1649852858290 5 connected
From the output above, the cluster has returned to a healthy state.