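Contents
1. Install Helm
2. Install redis-operator
3. Configure the Secret
4. Create a Redis cluster with three leaders and three followers
5. Failover test
6. Access the Redis cluster from inside Kubernetes
7. Access the Redis cluster from outside Kubernetes
1. Install Helm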
$ wget https://get.helm.sh/helm-v3.6.3-linux-amd64.tar.gz
$ tar -zxvf helm-v3.6.3-linux-amd64.tar.gz
$ sudo mv linux-amd64/helm /usr/local/bin/helm
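2. Install redis-operator
$ kubectl create namespace redis
# Add the ot-helm chart repository; helm pull fails if the repository is unknown
$ helm repo add ot-helm https://ot-container-kit.github.io/helm-charts/
$ helm repo update
# Pull the chart locally so that it can be customized
$ helm pull ot-helm/redis-operator
$ tar -zxvf redis-operator-0.7.0.tgz
# Edit redis-operator/values.yaml as needed
$ helm install redis-operator ./redis-operator --namespace redis
Helm can install a chart in the following five ways; the command above uses the third.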
- By chart reference: helm install mymaria example/mariadb
- By path to a packaged chart: helm install mynginx ./nginx-1.2.3.tgz
- By path to an unpacked chart directory: helm install mynginx ./nginx
- By absolute URL: helm install mynginx https://example.com/charts/nginx-1.2.3.tgz
- By chart reference and repo url: helm install --repo https://example.com/charts/ mynginx nginx
Check that redis-operator is running:
$ kubectl get po -n redis
NAME READY STATUS RESTARTS AGE
redis-operator-796bb6d6f6-zv8lc 1/1 Running 0 9m18s
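Reading the READY and STATUS columns by eye works for a single pod, but this guide repeats the same check several times. The following is a minimal sketch of a helper that automates it. The name pods_all_ready is hypothetical, and the helper assumes the default kubectl get po column layout shown above (NAME, READY, STATUS, ...).

```shell
# pods_all_ready: hypothetical helper, not part of kubectl.
# Reads `kubectl get po` table output on stdin and succeeds only if every
# pod row shows READY n/n and STATUS Running.
pods_all_ready() {
  awk 'NR > 1 {
         rows++
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") bad++
       }
       END { if (rows == 0 || bad > 0) exit 1 }'
}

# Usage (requires access to the cluster):
#   kubectl get po -n redis | pods_all_ready && echo "all pods ready"
```

If blocking until the pods are ready is what you want, the built-in kubectl wait --for=condition=Ready pod --all -n redis --timeout=300s is an alternative.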
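3. Configure the Secret
The Secret is used for password authentication. Its default name is redis-secret and its key is password; in this walkthrough the password value is also set to password.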
$ kubectl create secret generic redis-secret --from-literal=password=password -n redis
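The literal password above is convenient for a walkthrough but weak for real use. As a hedged sketch, a random value could be generated first. gen_redis_password and create_redis_secret are hypothetical helper names; the Secret name and key simply mirror the command above.

```shell
# gen_redis_password: hypothetical helper. Prints a 24-character password
# built from 18 random bytes (base64 of 18 bytes needs no padding).
gen_redis_password() {
  head -c 18 /dev/urandom | base64 | tr -d '\n'
}

# create_redis_secret: wraps the kubectl command used above.
# Defined only; nothing runs until it is called against a real cluster.
create_redis_secret() {
  kubectl create secret generic redis-secret \
    --from-literal=password="$(gen_redis_password)" -n redis
}
```

The redis-cli commands later in this guide pass -a password, so with a generated value they would need that value instead. It can be read back with kubectl get secret redis-secret -n redis -o jsonpath='{.data.password}' | base64 -d.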
4. Create a Redis cluster with three leaders and three followers
$ helm pull ot-helm/redis-cluster
$ tar -zxvf redis-cluster-0.7.0.tgz
Adjust the cluster parameters as needed:
$ vi redis-cluster/values.yaml
---
redisCluster:
  clusterSize: 3
  image: quay.io/opstree/redis
  tag: v6.2
  imagePullPolicy: IfNotPresent
  redisSecret:
    secretName: redis-secret
    secretKey: password
  leaderServiceType: ClusterIP
  followerServiceType: ClusterIP
  resources: {}
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
externalService:
  enabled: true
  # annotations:
  #   foo: bar
  serviceType: NodePort
  port: 6379
serviceMonitor:
  enabled: false
  interval: 30s
  scrapeTimeout: 10s
  namespace: monitoring
redisExporter:
  enabled: true
  image: quay.io/opstree/redis-exporter
  tag: "1.0"
  imagePullPolicy: IfNotPresent
  resources: {}
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
# priorityClassName: "-"
nodeSelector: {}
  # memory: medium
storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: rook-cephfs
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
    # selector: {}
securityContext: {}
  # runAsUser: 1000
affinity: {}
  # nodeAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #     nodeSelectorTerms:
  #     - matchExpressions:
  #       - key: disktype
  #         operator: In
  #         values:
  #         - ssd
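The configuration above mainly changes redisCluster.redisSecret, externalService.serviceType, and storageClassName; adjust them to match your production environment.
Install redis-cluster:
$ helm install redis-cluster ./redis-cluster --namespace redis
Verify the status of the leader and follower pods: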
$ kubectl get po -n redis
NAME READY STATUS RESTARTS AGE
redis-cluster-follower-0 2/2 Running 0 17m
redis-cluster-follower-1 2/2 Running 0 9m54s
redis-cluster-follower-2 2/2 Running 0 8m51s
redis-cluster-leader-0 2/2 Running 0 17m
redis-cluster-leader-1 2/2 Running 0 10m
redis-cluster-leader-2 2/2 Running 0 8m55s
redis-operator-796bb6d6f6-zv8lc 1/1 Running 0 3h12m
The persistence files are stored in PVCs:
$ kubectl get pvc -n redis
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
redis-cluster-follower-redis-cluster-follower-0 Bound pvc-c744ff2c-751f-4a9c-bd8f-636d2e3e519f 1Gi RWO rook-cephfs 18m
redis-cluster-follower-redis-cluster-follower-1 Bound pvc-05d3011b-02ac-47f8-acc2-e45bab7d1e62 1Gi RWO rook-cephfs 10m
redis-cluster-follower-redis-cluster-follower-2 Bound pvc-f32a3a33-5d3c-436a-b08b-457fe7e306ad 1Gi RWO rook-cephfs 9m37s
redis-cluster-leader-redis-cluster-leader-0 Bound pvc-b74cbfd3-1d69-45fe-bb46-354ac5851b82 1Gi RWO rook-cephfs 18m
redis-cluster-leader-redis-cluster-leader-1 Bound pvc-6eaf3f14-0810-48d5-b350-a6fa83ed511e 1Gi RWO rook-cephfs 11m
redis-cluster-leader-redis-cluster-leader-2 Bound pvc-58fb501f-7f77-4c95-9398-3efe700ec8bf 1Gi RWO rook-cephfs 9m41s
Check the Redis cluster status:
$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629787691000 1 connected 0-5460
b9fc9e9fc140b34189155ffaf644722c0c73e6e3 10.100.39.202:6379@16379 slave eab9b30ec864c164d7896249892871b8eea0e5be 0 1629787691531 2 connected
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 slave d73f39b073e151f6199870cc0061dbea6e201b21 0 1629787691831 3 connected
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629787692332 2 connected 5461-10922
2699102ec3d7600088640d84308abad429c66ac5 10.100.33.228:6379@16379 slave cadc46b6046b090e4e486de48af68892d6ec013f 0 1629787691000 1 connected
d73f39b073e151f6199870cc0061dbea6e201b21 10.100.39.198:6379@16379 master - 0 1629787691330 3 connected 10923-16383
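In the output above, each master line ends with the slot range it owns, and a healthy cluster covers all 16384 slots. A minimal sketch of a helper that tallies this is shown below. summarize_cluster_nodes is a hypothetical name, and the field positions assume the cluster nodes layout shown above, where slot ranges start at the ninth field.

```shell
# summarize_cluster_nodes: hypothetical helper.
# Reads `redis-cli cluster nodes` output on stdin and prints
# "<masters> <replicas> <covered_slots>".
# Note: a master flagged as failed is still counted as a master here.
summarize_cluster_nodes() {
  awk '
    $3 ~ /master/ {
      masters++
      # Fields 9 and up hold slot ranges such as "0-5460" or a single slot.
      for (i = 9; i <= NF; i++) {
        if ($i ~ /^\[/) continue   # skip migrating/importing markers
        n = split($i, range, "-")
        slots += (n == 2) ? range[2] - range[1] + 1 : 1
      }
    }
    $3 ~ /slave/ { replicas++ }
    END { printf "%d %d %d\n", masters, replicas, slots }'
}

# Usage (requires access to the cluster):
#   kubectl exec redis-cluster-leader-0 -c redis-cluster-leader -n redis -- \
#     redis-cli -a password cluster nodes | summarize_cluster_nodes
```

For the cluster shown above, the three ranges 0-5460, 5461-10922, and 10923-16383 add up to 16384 slots across three masters and three replicas.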
5. Failover test
First, write a key-value pair, testname:zrj, to the cluster:
$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- /bin/bash
bash-4.4# redis-cli -a password -c
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set testname zrj
-> Redirected to slot [15494] located at 10.100.39.198:6379
OK
10.100.39.198:6379> get testname
"zrj"
The data was written to the shard at 10.100.39.198. Find out which pod owns this IP:
$ kubectl get po -n redis -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-follower-0 2/2 Running 0 51m 10.100.33.228 rook-ceph3 <none> <none>
redis-cluster-follower-1 2/2 Running 0 43m 10.100.39.202 rook-ceph4 <none> <none>
redis-cluster-follower-2 2/2 Running 0 42m 10.100.33.232 rook-ceph3 <none> <none>
redis-cluster-leader-0 2/2 Running 0 51m 10.100.33.234 rook-ceph3 <none> <none>
redis-cluster-leader-1 2/2 Running 0 44m 10.100.39.201 rook-ceph4 <none> <none>
redis-cluster-leader-2 2/2 Running 0 42m 10.100.39.198 rook-ceph4 <none> <none>
redis-operator-796bb6d6f6-zv8lc 1/1 Running 0 3h46m 10.100.39.196 rook-ceph4 <none> <none>
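Matching an IP against the table by eye is easy with seven pods but tedious with more. As a small sketch, the lookup can be scripted. pod_for_ip is a hypothetical helper name, and it assumes the -o wide column layout shown above, with the pod IP in the sixth column.

```shell
# pod_for_ip: hypothetical helper.
# $1 = pod IP; stdin = `kubectl get po -o wide` table output.
# Prints the name of the pod that currently has that IP.
pod_for_ip() {
  awk -v ip="$1" 'NR > 1 && $6 == ip { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get po -n redis -o wide | pod_for_ip 10.100.39.198
```

kubectl can also filter on the server side with --field-selector status.podIP=10.100.39.198, which avoids parsing the table at all.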
The data was written to redis-cluster-leader-2. Now delete this pod:
$ kubectl delete po redis-cluster-leader-2 -n redis
pod "redis-cluster-leader-2" deleted
A new pod is created automatically:
$ kubectl get po -n redis -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-cluster-follower-0 2/2 Running 0 59m 10.100.33.228 rook-ceph3 <none> <none>
redis-cluster-follower-1 2/2 Running 0 52m 10.100.39.202 rook-ceph4 <none> <none>
redis-cluster-follower-2 2/2 Running 0 51m 10.100.33.232 rook-ceph3 <none> <none>
redis-cluster-leader-0 2/2 Running 0 59m 10.100.33.234 rook-ceph3 <none> <none>
redis-cluster-leader-1 2/2 Running 0 52m 10.100.39.201 rook-ceph4 <none> <none>
redis-cluster-leader-2 2/2 Running 0 53s 10.100.33.231 rook-ceph3 <none> <none>
redis-operator-796bb6d6f6-zv8lc 1/1 Running 0 3h54m 10.100.39.196 rook-ceph4 <none> <none>
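A Running pod does not by itself mean that the Redis cluster has finished failing over. As a hedged sketch, the cluster_state field of redis-cli cluster info can be polled before inspecting the data. cluster_state and wait_cluster_ok are hypothetical helper names, and the 30 attempts at 2-second intervals are arbitrary choices.

```shell
# cluster_state: hypothetical helper. Reads `redis-cli cluster info` output
# on stdin and prints the value of the cluster_state field (ok or fail).
cluster_state() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# wait_cluster_ok: polls leader-0 until the cluster reports ok, giving up
# after about 60 seconds. Defined only; call it against a real cluster.
wait_cluster_ok() {
  tries=0
  while [ "$tries" -lt 30 ]; do
    state=$(kubectl exec redis-cluster-leader-0 -c redis-cluster-leader -n redis -- \
      redis-cli -a password cluster info 2>/dev/null | cluster_state)
    [ "$state" = "ok" ] && return 0
    tries=$((tries + 1))
    sleep 2
  done
  return 1
}
```

cluster_state only says whether all slots are served; which node serves them still has to be read from cluster nodes.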
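Check the cluster status and the data again. In the output below, the former follower at 10.100.33.232 has been promoted to master for slots 10923-16383, the recreated redis-cluster-leader-2 pod (10.100.33.231) has rejoined as its slave, and the key is still readable: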
$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629789865000 1 connected 0-5460
b9fc9e9fc140b34189155ffaf644722c0c73e6e3 10.100.39.202:6379@16379 slave eab9b30ec864c164d7896249892871b8eea0e5be 0 1629789867000 2 connected
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 master - 0 1629789866000 4 connected 10923-16383
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629789866571 2 connected 5461-10922
2699102ec3d7600088640d84308abad429c66ac5 10.100.33.228:6379@16379 slave cadc46b6046b090e4e486de48af68892d6ec013f 0 1629789866671 1 connected
d73f39b073e151f6199870cc0061dbea6e201b21 10.100.33.231:6379@16379 slave d1429a4954690bfa287965fc9245b3a466a8fbd1 0 1629789867000 4 connected
$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password -c get testname
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
"zrj"
6. Access the Redis cluster from inside Kubernetes
redis-operator provides two Services for reaching the leader (master) and follower (replica) nodes: redis-cluster-leader and redis-cluster-follower:
$ kubectl get svc -n redis
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-cluster-follower ClusterIP 10.96.33.76 <none> 6379/TCP,9121/TCP 118m
redis-cluster-follower-external-service NodePort 10.96.117.75 <none> 6379:30515/TCP 118m
redis-cluster-follower-headless ClusterIP None <none> 6379/TCP 118m
redis-cluster-leader ClusterIP 10.96.9.210 <none> 6379/TCP,9121/TCP 118m
redis-cluster-leader-external-service NodePort 10.96.72.42 <none> 6379:30779/TCP 118m
redis-cluster-leader-headless ClusterIP None <none> 6379/TCP 118m
Inside Kubernetes, both Services can be reached by FQDN: redis-cluster-leader.redis.svc.cluster.local and redis-cluster-follower.redis.svc.cluster.local. Port 6379 is the Redis port and port 9121 is the redis_exporter port. The test below uses a custom image; you can also run a centos image and install telnet in it:
$ kubectl run net-tools --image=.../net-tools-collection:v1.0.0
$ kubectl exec -it net-tools -- /bin/bash
root@net-tools:/# telnet redis-cluster-follower.redis.svc.cluster.local 6379
Trying 10.96.33.76...
Connected to redis-cluster-follower.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/#
root@net-tools:/# telnet redis-cluster-leader.redis.svc.cluster.local 6379
Trying 10.96.9.210...
Connected to redis-cluster-leader.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/#
root@net-tools:/# telnet redis-cluster-follower.redis.svc.cluster.local 9121
Trying 10.96.33.76...
Connected to redis-cluster-follower.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/#
root@net-tools:/# telnet redis-cluster-leader.redis.svc.cluster.local 9121
Trying 10.96.9.210...
Connected to redis-cluster-leader.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
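telnet only proves that the TCP port is open. A hedged sketch of a protocol-level check follows: it runs redis-cli in a throwaway pod and sends PING to a Service. svc_fqdn and ping_redis_service are hypothetical helper names, the pod name redis-ping is arbitrary, the image and tag are the ones from the values file in step 4, and the cluster domain is assumed to be cluster.local as in the FQDNs above.

```shell
# svc_fqdn: hypothetical helper. Builds the in-cluster DNS name of a Service.
# $1 = service name, $2 = namespace. Assumes the cluster domain cluster.local.
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# ping_redis_service: starts a temporary pod and sends PING to the Service.
# $1 = service name in the redis namespace.
# Defined only; nothing runs until it is called against a real cluster.
ping_redis_service() {
  kubectl run redis-ping --rm -i --restart=Never -n redis \
    --image=quay.io/opstree/redis:v6.2 --command -- \
    redis-cli -h "$(svc_fqdn "$1" redis)" -p 6379 -a password ping
}

# Usage:
#   ping_redis_service redis-cluster-leader
#   ping_redis_service redis-cluster-follower
```

A node that accepts the password answers PONG, which confirms authentication as well as connectivity.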
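7. Access the Redis cluster from outside Kubernetes
Redis normally only needs to be reachable from inside Kubernetes. If external access is really required, the Services can be exposed with NodePort or LoadBalancer. The values file in step 4 already enabled this:
externalService:
  enabled: true
  # annotations:
  #   foo: bar
  serviceType: NodePort
  port: 6379
The Service names are redis-cluster-leader-external-service and redis-cluster-follower-external-service: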
$ kubectl get svc -n redis
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-cluster-follower ClusterIP 10.96.33.76 <none> 6379/TCP,9121/TCP 167m
redis-cluster-follower-external-service NodePort 10.96.117.75 <none> 6379:30515/TCP 167m
redis-cluster-follower-headless ClusterIP None <none> 6379/TCP 167m
redis-cluster-leader ClusterIP 10.96.9.210 <none> 6379/TCP,9121/TCP 167m
redis-cluster-leader-external-service NodePort 10.96.72.42 <none> 6379:30779/TCP 167m
redis-cluster-leader-headless ClusterIP None <none> 6379/TCP 167m
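However, testing showed that ports 30515 and 30779 on the nodes were unreachable. The troubleshooting steps follow.
Step 6 showed that port 6379 is reachable from inside Kubernetes, which rules out a problem with the pods themselves.
Check iptables:
# Run on a worker node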
$ sudo iptables -L -n|grep 30515
REJECT tcp -- 0.0.0.0/0 0.0.0.0/0 /* redis/redis-cluster-follower-external-service:client has no endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:30515 reject-with icmp-port-unreachable
$ sudo iptables -L -n|grep 30779
REJECT tcp -- 0.0.0.0/0 0.0.0.0/0 /* redis/redis-cluster-leader-external-service:client has no endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:30779 reject-with icmp-port-unreachable
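The rules exist, but they are REJECT rules whose comments say "has no endpoints" (the "client" before it is the Service port name). Check the Endpoints: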
$ kubectl get endpoints -n redis
NAME ENDPOINTS AGE
redis-cluster-follower 10.100.33.228:9121,10.100.33.232:9121,10.100.39.202:9121 + 3 more... 174m
redis-cluster-follower-external-service <none> 174m
redis-cluster-follower-headless 10.100.33.228:6379,10.100.33.232:6379,10.100.39.202:6379 174m
redis-cluster-leader 10.100.33.231:9121,10.100.33.234:9121,10.100.39.201:9121 + 3 more... 174m
redis-cluster-leader-external-service <none> 174m
redis-cluster-leader-headless 10.100.33.231:6379,10.100.33.234:6379,10.100.39.201:6379 174m
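Services without endpoints are easy to miss in a longer list. A minimal sketch of a filter is shown below. services_without_endpoints is a hypothetical helper name, and it assumes the table layout above, where an empty endpoint list is printed as <none> in the second column.

```shell
# services_without_endpoints: hypothetical helper.
# Reads `kubectl get endpoints` table output on stdin and prints the names
# whose ENDPOINTS column is <none>.
services_without_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get endpoints -n redis | services_without_endpoints
```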
The external-service Services have no endpoints, so they are not routed to any pod. Inspect the YAML of both external Services:
$ kubectl get svc redis-cluster-follower-external-service -n redis -o yaml
......
  selector:
    app: redis-cluster-follower
    redis_setup_type: follower
    role: follower
......
$ kubectl get svc redis-cluster-leader-external-service -n redis -o yaml
......
  selector:
    app: redis-cluster-leader
    redis_setup_type: leader
    role: leader
......
Inspect the pod YAML:
$ kubectl get po redis-cluster-follower-0 -n redis -o yaml
......
  labels:
    app: redis-cluster-follower
    controller-revision-hash: redis-cluster-follower-9b9c5cfdb
    redis_setup_type: cluster
    role: follower
    statefulset.kubernetes.io/pod-name: redis-cluster-follower-0
......
$ kubectl get po redis-cluster-leader-0 -n redis -o yaml
......
  labels:
    app: redis-cluster-leader
    controller-revision-hash: redis-cluster-leader-6b4b6d8c76
    redis_setup_type: cluster
    role: leader
    statefulset.kubernetes.io/pod-name: redis-cluster-leader-0
......
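A Service selects a pod only if every key=value pair in its selector appears among the pod's labels. The comparison of the two listings above can be sketched as a small helper. unmatched_selectors is a hypothetical name, and it works on the comma-separated k=v strings that kubectl get svc -o wide (SELECTOR column) and kubectl get po --show-labels (LABELS column) print.

```shell
# unmatched_selectors: hypothetical helper.
# $1 = Service selector, $2 = pod labels, both as "k=v,k=v,..." strings.
# Prints each selector pair that is missing from the pod labels.
unmatched_selectors() {
  selector=$1
  labels=",$2,"
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                   # pair is present on the pod
      *) printf '%s\n' "$pair" ;;       # pair is missing: Service cannot match
    esac
  done
  IFS=$old_ifs
}

# Usage with the follower values shown above:
#   unmatched_selectors \
#     "app=redis-cluster-follower,redis_setup_type=follower,role=follower" \
#     "app=redis-cluster-follower,redis_setup_type=cluster,role=follower"
```

Any line of output is a selector entry that prevents the Service from matching the pod.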
The redis_setup_type value in the external-service selectors (follower and leader) does not match the pod label value (cluster). In the redis-cluster chart sources pulled with helm pull in step 4, change the redis_setup_type selector value to cluster:
$ vi redis-cluster/templates/follower-service.yaml
......
  selector:
    app: {{ .Release.Name }}-follower
    redis_setup_type: cluster
    role: follower
......
$ vi redis-cluster/templates/leader-service.yaml
......
  selector:
    app: {{ .Release.Name }}-leader
    redis_setup_type: cluster
    role: leader
......
Apply the update:
$ helm upgrade redis-cluster ./redis-cluster --namespace redis
Release "redis-cluster" has been upgraded. Happy Helming!
NAME: redis-cluster
LAST DEPLOYED: Tue Aug 24 18:03:20 2021
NAMESPACE: redis
STATUS: deployed
REVISION: 2
TEST SUITE: None
Test ports 30515 and 30779 on the nodes again (look them up with kubectl get svc -n redis); they are now reachable.
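NodePort numbers are allocated by Kubernetes and will differ between installations, so it is safer to look them up than to hard-code them. A minimal sketch follows. node_port is a hypothetical helper name, and it assumes the kubectl get svc layout shown earlier, with PORT(S) in the fifth column in the form port:nodePort/protocol.

```shell
# node_port: hypothetical helper.
# $1 = Service name; stdin = `kubectl get svc` table output.
# Prints the NodePort of the Service's first listed port.
node_port() {
  awk -v svc="$1" '$1 == svc { split($5, p, "[:/]"); print p[2] }'
}

# Usage (requires access to the cluster; <node-ip> is a placeholder):
#   port=$(kubectl get svc -n redis | node_port redis-cluster-leader-external-service)
#   redis-cli -h <node-ip> -p "$port" -a password ping
```

A plain PING is used here rather than redis-cli -c because, in cluster mode, redirects point at the pod IPs listed by cluster nodes, and those addresses are usually not routable from outside the cluster.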