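Contents

1. Install Helm

2. Install redis-operator

3. Configure the secret

4. Create a Redis cluster with three masters and three replicas

5. Failover recovery test

6. Access the Redis cluster from inside Kubernetes

7. Access the Redis cluster from outside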


1. Install Helm

$ wget https://get.helm.sh/helm-v3.6.3-linux-amd64.tar.gz
$ tar -zxvf helm-v3.6.3-linux-amd64.tar.gz
$ sudo mv linux-amd64/helm /usr/local/bin/helm
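
For other Helm versions or CPU architectures, the download URL above can be derived from its parts. The following is a minimal sketch; helm_url and helm_arch are hypothetical helper names, and the URL pattern is simply the one used in the wget command above.

```shell
#!/bin/sh
# helm_url VERSION [OS] [ARCH]: print the release tarball URL (hypothetical helper).
helm_url() {
  printf 'https://get.helm.sh/helm-%s-%s-%s.tar.gz\n' "$1" "${2:-linux}" "${3:-amd64}"
}

# helm_arch MACHINE: map `uname -m` output to the name used in release file names.
helm_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

helm_url v3.6.3 linux "$(helm_arch x86_64)"
# prints https://get.helm.sh/helm-v3.6.3-linux-amd64.tar.gz
```

On a real host the pieces could be combined as wget "$(helm_url v3.6.3 linux "$(helm_arch "$(uname -m)")")", followed by helm version to confirm the binary works.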

2. Install redis-operator

$ kubectl create namespace redis
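# Pull the chart locally so it can be customized
# (assumes the ot-helm repo was added first, e.g.
#  helm repo add ot-helm https://ot-container-kit.github.io/helm-charts/)
$ helm pull ot-helm/redis-operator
$ tar -zxvf redis-operator-0.7.0.tgz
# Edit redis-operator/values.yaml as needed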
$ helm install redis-operator ./redis-operator --namespace redis

Helm can install a chart in the following five ways; the command above uses the third.

  1. By chart reference: helm install mymaria example/mariadb
  2. By path to a packaged chart: helm install mynginx ./nginx-1.2.3.tgz
  3. By path to an unpacked chart directory: helm install mynginx ./nginx
  4. By absolute URL: helm install mynginx https://example.com/charts/nginx-1.2.3.tgz
  5. By chart reference and repo url: helm install --repo https://example.com/charts/ mynginx nginx

Check that redis-operator is running normally:

$ kubectl get po -n redis
NAME                              READY   STATUS    RESTARTS   AGE
redis-operator-796bb6d6f6-zv8lc   1/1     Running   0          9m18s

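3. Configure the secret

The secret provides password authentication. By default the chart expects a secret named redis-secret with the key password; in this walkthrough the password value is also set to password.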

$ kubectl create secret generic redis-secret --from-literal=password=password -n redis
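
The literal value password is only suitable for a demo. A random value could be generated instead, as in the sketch below; gen_password is a hypothetical helper name, and it assumes /dev/urandom, head, tr, and cut are available.

```shell
#!/bin/sh
# gen_password [LENGTH]: print LENGTH random alphanumeric characters (default 24).
# Hypothetical helper; 1024 random bytes yield far more than 24 usable characters.
gen_password() {
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-${1:-24}"
}

pw=$(gen_password 24)
echo "generated a ${#pw}-character password"
```

The value would then be passed as --from-literal=password="$pw" in the kubectl create secret command above, and the later redis-cli -a password calls would need the same value. It can be read back with kubectl get secret redis-secret -n redis -o jsonpath='{.data.password}' | base64 -d.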

4. Create a Redis cluster with three masters and three replicas

$ helm pull ot-helm/redis-cluster
$ tar -zxvf redis-cluster-0.7.0.tgz

Configure the cluster parameters as needed:

$ vi redis-cluster/values.yaml
---
redisCluster:
  clusterSize: 3
  image: quay.io/opstree/redis
  tag: v6.2
  imagePullPolicy: IfNotPresent
  redisSecret:
    secretName: redis-secret
    secretKey: password
  leaderServiceType: ClusterIP
  followerServiceType: ClusterIP
  resources: {}
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
    # limits:
    #   cpu: 100m
    #   memory: 128Mi

externalService:
  enabled: true
  # annotations:
  #   foo: bar
  serviceType: NodePort
  port: 6379

serviceMonitor:
  enabled: false
  interval: 30s
  scrapeTimeout: 10s
  namespace: monitoring

redisExporter:
  enabled: true
  image: quay.io/opstree/redis-exporter
  tag: "1.0"
  imagePullPolicy: IfNotPresent
  resources: {}
    # requests:
    #   cpu: 100m
    #   memory: 128Mi
    # limits:
    #   cpu: 100m
    #   memory: 128Mi

# priorityClassName: "-"

nodeSelector: {}
  # memory: medium

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: rook-cephfs
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  #   selector: {}

securityContext: {}
  # runAsUser: 1000

affinity: {}
  # nodeAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #     nodeSelectorTerms:
  #     - matchExpressions:
  #       - key: disktype
  #         operator: In
  #         values:
  #         - ssd

The configuration above mainly changes redisCluster.redisSecret, externalService.serviceType, and storageClassName; adjust them to match your production environment.

Install redis-cluster:

$ helm install redis-cluster ./redis-cluster --namespace redis

Verify the status of the leader and follower pods:

$ kubectl get po -n redis
NAME                              READY   STATUS    RESTARTS   AGE
redis-cluster-follower-0          2/2     Running   0          17m
redis-cluster-follower-1          2/2     Running   0          9m54s
redis-cluster-follower-2          2/2     Running   0          8m51s
redis-cluster-leader-0            2/2     Running   0          17m
redis-cluster-leader-1            2/2     Running   0          10m
redis-cluster-leader-2            2/2     Running   0          8m55s
redis-operator-796bb6d6f6-zv8lc   1/1     Running   0          3h12m
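
The readiness check above can be scripted by parsing the kubectl get po table. This is a minimal sketch: pods_not_ready is a hypothetical helper that reads the table on stdin and prints every pod whose READY counts differ or whose STATUS is not Running. The sample input is copied from the listing above.

```shell
#!/bin/sh
# pods_not_ready: read `kubectl get po` output on stdin, print pods that are
# not fully ready (READY n/m with n != m, or STATUS other than Running).
pods_not_ready() {
  awk 'NR > 1 {
         split($2, r, "/")
         if (r[1] != r[2] || $3 != "Running") print $1
       }'
}

# Sample rows copied from the listing above.
not_ready=$(pods_not_ready <<'EOF'
NAME                              READY   STATUS    RESTARTS   AGE
redis-cluster-follower-0          2/2     Running   0          17m
redis-cluster-leader-0            2/2     Running   0          17m
redis-operator-796bb6d6f6-zv8lc   1/1     Running   0          3h12m
EOF
)
if [ -z "$not_ready" ]; then echo "all pods ready"; else echo "not ready: $not_ready"; fi
```

Against a live cluster the input would come from kubectl get po -n redis | pods_not_ready. kubectl wait --for=condition=Ready pod --all -n redis --timeout=300s is a built-in alternative that blocks until the pods are ready.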

The persistence files are stored in PVCs:

$ kubectl get pvc -n redis
NAME                                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
redis-cluster-follower-redis-cluster-follower-0   Bound    pvc-c744ff2c-751f-4a9c-bd8f-636d2e3e519f   1Gi        RWO            rook-cephfs    18m
redis-cluster-follower-redis-cluster-follower-1   Bound    pvc-05d3011b-02ac-47f8-acc2-e45bab7d1e62   1Gi        RWO            rook-cephfs    10m
redis-cluster-follower-redis-cluster-follower-2   Bound    pvc-f32a3a33-5d3c-436a-b08b-457fe7e306ad   1Gi        RWO            rook-cephfs    9m37s
redis-cluster-leader-redis-cluster-leader-0       Bound    pvc-b74cbfd3-1d69-45fe-bb46-354ac5851b82   1Gi        RWO            rook-cephfs    18m
redis-cluster-leader-redis-cluster-leader-1       Bound    pvc-6eaf3f14-0810-48d5-b350-a6fa83ed511e   1Gi        RWO            rook-cephfs    11m
redis-cluster-leader-redis-cluster-leader-2       Bound    pvc-58fb501f-7f77-4c95-9398-3efe700ec8bf   1Gi        RWO            rook-cephfs    9m41s

Check the Redis cluster state:

$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password cluster nodes   
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629787691000 1 connected 0-5460
b9fc9e9fc140b34189155ffaf644722c0c73e6e3 10.100.39.202:6379@16379 slave eab9b30ec864c164d7896249892871b8eea0e5be 0 1629787691531 2 connected
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 slave d73f39b073e151f6199870cc0061dbea6e201b21 0 1629787691831 3 connected
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629787692332 2 connected 5461-10922
2699102ec3d7600088640d84308abad429c66ac5 10.100.33.228:6379@16379 slave cadc46b6046b090e4e486de48af68892d6ec013f 0 1629787691000 1 connected
d73f39b073e151f6199870cc0061dbea6e201b21 10.100.39.198:6379@16379 master - 0 1629787691330 3 connected 10923-16383
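
A healthy cluster should show three masters, three replicas, and all 16384 hash slots assigned. The sketch below tallies those numbers from cluster nodes output; cluster_summary is a hypothetical helper, and it relies on the documented field order of that output (flags in field 3, slot ranges from field 9 onward).

```shell
#!/bin/sh
# cluster_summary: read `redis-cli cluster nodes` output on stdin and print
# the number of masters, replicas, and assigned hash slots.
cluster_summary() {
  awk '$3 ~ /master/ {
         masters++
         for (i = 9; i <= NF; i++) {
           if ($i ~ /^\[/) continue            # skip migrating/importing markers
           n = split($i, r, "-")
           slots += (n == 2 ? r[2] - r[1] + 1 : 1)
         }
       }
       $3 ~ /slave/ { replicas++ }
       END { printf "masters=%d replicas=%d slots=%d\n", masters, replicas, slots }'
}

# Input copied from the output above.
cluster_summary <<'EOF'
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629787691000 1 connected 0-5460
b9fc9e9fc140b34189155ffaf644722c0c73e6e3 10.100.39.202:6379@16379 slave eab9b30ec864c164d7896249892871b8eea0e5be 0 1629787691531 2 connected
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 slave d73f39b073e151f6199870cc0061dbea6e201b21 0 1629787691831 3 connected
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629787692332 2 connected 5461-10922
2699102ec3d7600088640d84308abad429c66ac5 10.100.33.228:6379@16379 slave cadc46b6046b090e4e486de48af68892d6ec013f 0 1629787691000 1 connected
d73f39b073e151f6199870cc0061dbea6e201b21 10.100.39.198:6379@16379 master - 0 1629787691330 3 connected 10923-16383
EOF
# prints masters=3 replicas=3 slots=16384
```

The three ranges 0-5460, 5461-10922, and 10923-16383 hold 5461, 5462, and 5461 slots, which add up to 16384.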

5. Failover recovery test

First, write a key-value pair to the cluster, with key testname and value zrj:

$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- /bin/bash
bash-4.4# redis-cli -a password -c 
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> set testname zrj
-> Redirected to slot [15494] located at 10.100.39.198:6379
OK
10.100.39.198:6379> get testname
"zrj"
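
The redirect in the transcript above happens because Redis Cluster maps every key to one of 16384 hash slots, computed as CRC16(key) mod 16384 with the XMODEM variant of CRC16. The following is a minimal sketch of that calculation; crc16 and key_slot are hypothetical helper names, and hash tags (the {...} syntax) are deliberately not handled.

```shell
#!/bin/sh
# crc16 STRING: CRC-16/XMODEM (polynomial 0x1021, initial value 0) of the bytes.
crc16() (
  crc=0
  for byte in $(printf '%s' "$1" | od -An -v -tu1); do
    crc=$(( crc ^ (byte << 8) ))
    i=0
    while [ "$i" -lt 8 ]; do
      if [ $(( crc & 0x8000 )) -ne 0 ]; then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
      i=$(( i + 1 ))
    done
  done
  echo "$crc"
)

# key_slot KEY: hash slot for a key without hash tags.
key_slot() {
  echo $(( $(crc16 "$1") % 16384 ))
}

key_slot 123456789   # prints 12739 (0x31C3, the test vector in the Redis Cluster spec)
key_slot testname    # the transcript above reported slot 15494 for this key
```

The same answer is available from the server itself with the CLUSTER KEYSLOT command, which is the more practical choice inside redis-cli.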

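The key maps to slot 15494, so the client was redirected to the shard at 10.100.39.198 and the data was written there. Find out which pod owns this IP: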

$ kubectl get po -n redis -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
redis-cluster-follower-0          2/2     Running   0          51m     10.100.33.228   rook-ceph3   <none>           <none>
redis-cluster-follower-1          2/2     Running   0          43m     10.100.39.202   rook-ceph4   <none>           <none>
redis-cluster-follower-2          2/2     Running   0          42m     10.100.33.232   rook-ceph3   <none>           <none>
redis-cluster-leader-0            2/2     Running   0          51m     10.100.33.234   rook-ceph3   <none>           <none>
redis-cluster-leader-1            2/2     Running   0          44m     10.100.39.201   rook-ceph4   <none>           <none>
redis-cluster-leader-2            2/2     Running   0          42m     10.100.39.198   rook-ceph4   <none>           <none>
redis-operator-796bb6d6f6-zv8lc   1/1     Running   0          3h46m   10.100.39.196   rook-ceph4   <none>           <none>
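
The IP-to-pod lookup can also be scripted. In this sketch, pod_for_ip is a hypothetical helper that reads kubectl get po -o wide output on stdin; it assumes the IP is the sixth whitespace-separated column, which holds for the listing above but would break if the RESTARTS column contained extra words.

```shell
#!/bin/sh
# pod_for_ip IP: read `kubectl get po -o wide` output on stdin and print the
# name of the pod whose IP column (field 6) equals IP.
pod_for_ip() {
  awk -v ip="$1" 'NR > 1 && $6 == ip { print $1 }'
}

# Rows copied from the listing above.
pod_for_ip 10.100.39.198 <<'EOF'
NAME                              READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
redis-cluster-leader-0            2/2     Running   0          51m     10.100.33.234   rook-ceph3   <none>           <none>
redis-cluster-leader-1            2/2     Running   0          44m     10.100.39.201   rook-ceph4   <none>           <none>
redis-cluster-leader-2            2/2     Running   0          42m     10.100.39.198   rook-ceph4   <none>           <none>
EOF
# prints redis-cluster-leader-2
```

kubectl can usually do the filtering itself with a field selector, for example kubectl get po -n redis -o wide --field-selector status.podIP=10.100.39.198.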

The IP belongs to redis-cluster-leader-2, so that is where the data was written. Now delete this pod:

$ kubectl delete po redis-cluster-leader-2 -n redis
pod "redis-cluster-leader-2" deleted

A new pod is created automatically:

$ kubectl get po -n redis -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
redis-cluster-follower-0          2/2     Running   0          59m     10.100.33.228   rook-ceph3   <none>           <none>
redis-cluster-follower-1          2/2     Running   0          52m     10.100.39.202   rook-ceph4   <none>           <none>
redis-cluster-follower-2          2/2     Running   0          51m     10.100.33.232   rook-ceph3   <none>           <none>
redis-cluster-leader-0            2/2     Running   0          59m     10.100.33.234   rook-ceph3   <none>           <none>
redis-cluster-leader-1            2/2     Running   0          52m     10.100.39.201   rook-ceph4   <none>           <none>
redis-cluster-leader-2            2/2     Running   0          53s     10.100.33.231   rook-ceph3   <none>           <none>
redis-operator-796bb6d6f6-zv8lc   1/1     Running   0          3h54m   10.100.39.196   rook-ceph4   <none>           <none>

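Check the cluster state and the data again. In the output below, the former replica at 10.100.33.232 (redis-cluster-follower-2) has been promoted to master for slots 10923-16383, and the recreated redis-cluster-leader-2 pod (10.100.33.231) has rejoined as its replica: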

$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629789865000 1 connected 0-5460
b9fc9e9fc140b34189155ffaf644722c0c73e6e3 10.100.39.202:6379@16379 slave eab9b30ec864c164d7896249892871b8eea0e5be 0 1629789867000 2 connected
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 master - 0 1629789866000 4 connected 10923-16383
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629789866571 2 connected 5461-10922
2699102ec3d7600088640d84308abad429c66ac5 10.100.33.228:6379@16379 slave cadc46b6046b090e4e486de48af68892d6ec013f 0 1629789866671 1 connected
d73f39b073e151f6199870cc0061dbea6e201b21 10.100.33.231:6379@16379 slave d1429a4954690bfa287965fc9245b3a466a8fbd1 0 1629789867000 4 connected


$ kubectl exec -it redis-cluster-leader-0 -c redis-cluster-leader -n redis -- redis-cli -a password -c get testname
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
"zrj"
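
To see the failover from the slot's point of view, the owner of slot 15494 can be looked up in the cluster nodes output before and after the pod was deleted. This is a minimal sketch; master_for_slot is a hypothetical helper that assumes the standard field order of that output.

```shell
#!/bin/sh
# master_for_slot SLOT: read `redis-cli cluster nodes` output on stdin and
# print the ip:port of the master whose slot ranges contain SLOT.
master_for_slot() {
  awk -v slot="$1" '$3 ~ /master/ {
         for (i = 9; i <= NF; i++) {
           if ($i ~ /^\[/) continue            # skip migrating/importing markers
           n = split($i, r, "-")
           lo = r[1] + 0
           hi = (n == 2 ? r[2] : r[1]) + 0
           if (slot + 0 >= lo && slot + 0 <= hi) {
             sub(/@.*/, "", $2)                # drop the cluster bus port
             print $2
           }
         }
       }'
}

# Master rows copied from the post-failover output above.
master_for_slot 15494 <<'EOF'
cadc46b6046b090e4e486de48af68892d6ec013f 10.100.33.234:6379@16379 myself,master - 0 1629789865000 1 connected 0-5460
d1429a4954690bfa287965fc9245b3a466a8fbd1 10.100.33.232:6379@16379 master - 0 1629789866000 4 connected 10923-16383
eab9b30ec864c164d7896249892871b8eea0e5be 10.100.39.201:6379@16379 master - 0 1629789866571 2 connected 5461-10922
EOF
# prints 10.100.33.232:6379
```

Run against the pre-failover output from step 4, the same lookup gives 10.100.39.198:6379, the address of the pod that was deleted.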

6. Access the Redis cluster from inside Kubernetes

redis-operator provides two services for reaching the Redis master (leader) and replica (follower) nodes, named redis-cluster-leader and redis-cluster-follower:

$ kubectl get svc -n redis
NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
redis-cluster-follower                    ClusterIP   10.96.33.76    <none>        6379/TCP,9121/TCP   118m
redis-cluster-follower-external-service   NodePort    10.96.117.75   <none>        6379:30515/TCP      118m
redis-cluster-follower-headless           ClusterIP   None           <none>        6379/TCP            118m
redis-cluster-leader                      ClusterIP   10.96.9.210    <none>        6379/TCP,9121/TCP   118m
redis-cluster-leader-external-service     NodePort    10.96.72.42    <none>        6379:30779/TCP      118m
redis-cluster-leader-headless             ClusterIP   None           <none>        6379/TCP            118m

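Inside Kubernetes, both services can be reached by FQDN: redis-cluster-leader.redis.svc.cluster.local and redis-cluster-follower.redis.svc.cluster.local. Port 6379 is the Redis Cluster port and port 9121 is the redis_exporter port. The test below uses a custom image; a centos image with telnet installed would work as well: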

$ kubectl run net-tools --image=.../net-tools-collection:v1.0.0
$ kubectl exec -it net-tools -- /bin/bash
root@net-tools:/# telnet redis-cluster-follower.redis.svc.cluster.local 6379                                            
Trying 10.96.33.76...
Connected to redis-cluster-follower.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/#                                                           
root@net-tools:/# telnet redis-cluster-leader.redis.svc.cluster.local 6379        
Trying 10.96.9.210...
Connected to redis-cluster-leader.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/# 
root@net-tools:/# telnet redis-cluster-follower.redis.svc.cluster.local 9121
Trying 10.96.33.76...
Connected to redis-cluster-follower.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
root@net-tools:/# 
root@net-tools:/# telnet redis-cluster-leader.redis.svc.cluster.local 9121  
Trying 10.96.9.210...
Connected to redis-cluster-leader.redis.svc.cluster.local.
Escape character is '^]'.
^]
telnet> q
Connection closed.
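
The four telnet checks above follow one pattern: build the service FQDN, then probe a port. The sketch below captures that pattern without telnet. svc_fqdn and port_open are hypothetical helper names; port_open assumes bash and the coreutils timeout command exist in the test container, and cluster.local is assumed to be the cluster domain.

```shell
#!/bin/sh
# svc_fqdn SERVICE NAMESPACE [DOMAIN]: print the in-cluster DNS name of a service.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

svc_fqdn redis-cluster-leader redis
# prints redis-cluster-leader.redis.svc.cluster.local
```

Inside a pod, a check could then read port_open "$(svc_fqdn redis-cluster-leader redis)" 6379 && echo reachable, repeated for the follower service and for port 9121.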

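7. Access the Redis cluster from outside

Redis usually only needs to be reachable from inside Kubernetes. If external access is really required, the services can be exposed with NodePort or LoadBalancer. The configuration file in step 4 already enables this: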

externalService:
  enabled: true
  # annotations:
  #   foo: bar
  serviceType: NodePort
  port: 6379

The services are named redis-cluster-leader-external-service and redis-cluster-follower-external-service:

$ kubectl get svc -n redis
NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
redis-cluster-follower                    ClusterIP   10.96.33.76    <none>        6379/TCP,9121/TCP   167m
redis-cluster-follower-external-service   NodePort    10.96.117.75   <none>        6379:30515/TCP      167m
redis-cluster-follower-headless           ClusterIP   None           <none>        6379/TCP            167m
redis-cluster-leader                      ClusterIP   10.96.9.210    <none>        6379/TCP,9121/TCP   167m
redis-cluster-leader-external-service     NodePort    10.96.72.42    <none>        6379:30779/TCP      167m
redis-cluster-leader-headless             ClusterIP   None           <none>        6379/TCP            167m

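In testing, however, ports 30515 and 30779 on the nodes turned out to be unreachable. The troubleshooting steps follow.

Step 6 showed that port 6379 is reachable from inside Kubernetes, which rules out a problem with the pods themselves.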

Check iptables:

# Run on a worker node
$ sudo iptables -L -n|grep 30515
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0            /* redis/redis-cluster-follower-external-service:client has no endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:30515 reject-with icmp-port-unreachable

$ sudo iptables -L -n|grep 30779
REJECT     tcp  --  0.0.0.0/0            0.0.0.0/0            /* redis/redis-cluster-leader-external-service:client has no endpoints */ ADDRTYPE match dst-type LOCAL tcp dpt:30779 reject-with icmp-port-unreachable

The rules exist, but they are REJECT rules whose comments say "has no endpoints" (client is the name of the service port). Check the endpoints:

$ kubectl get endpoints -n redis
NAME                                      ENDPOINTS                                                              AGE
redis-cluster-follower                    10.100.33.228:9121,10.100.33.232:9121,10.100.39.202:9121 + 3 more...   174m
redis-cluster-follower-external-service   <none>                                                                 174m
redis-cluster-follower-headless           10.100.33.228:6379,10.100.33.232:6379,10.100.39.202:6379               174m
redis-cluster-leader                      10.100.33.231:9121,10.100.33.234:9121,10.100.39.201:9121 + 3 more...   174m
redis-cluster-leader-external-service     <none>                                                                 174m
redis-cluster-leader-headless             10.100.33.231:6379,10.100.33.234:6379,10.100.39.201:6379               174m

The external-service entries have no endpoints, so they do not route to any pod. Check the selectors in the YAML of both external services:

$ kubectl get svc redis-cluster-follower-external-service -n redis -o yaml
......
  selector:
    app: redis-cluster-follower
    redis_setup_type: follower
    role: follower
......
$ kubectl get svc redis-cluster-leader-external-service -n redis -o yaml
......
  selector:
    app: redis-cluster-leader
    redis_setup_type: leader
    role: leader
......

Check the labels in the pod YAML:

$ kubectl get po redis-cluster-follower-0 -n redis -o yaml
......
  labels:
    app: redis-cluster-follower
    controller-revision-hash: redis-cluster-follower-9b9c5cfdb
    redis_setup_type: cluster
    role: follower
    statefulset.kubernetes.io/pod-name: redis-cluster-follower-0
......
$ kubectl get po redis-cluster-leader-0 -n redis -o yaml
......
  labels:
    app: redis-cluster-leader
    controller-revision-hash: redis-cluster-leader-6b4b6d8c76
    redis_setup_type: cluster
    role: leader
    statefulset.kubernetes.io/pod-name: redis-cluster-leader-0
......
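
A Service selects a pod only if every key=value pair in its selector also appears among the pod's labels. That rule can be checked mechanically, as in the sketch below. selector_mismatch is a hypothetical helper that takes both sets as comma-separated key=value strings, and the values are copied from the output above.

```shell
#!/bin/sh
# selector_mismatch SELECTOR LABELS: print every selector pair that is
# missing from the pod labels (both arguments are comma-separated key=value).
selector_mismatch() (
  labels=",$2,"
  set -f            # no pathname expansion while splitting
  IFS=,
  for pair in $1; do
    case "$labels" in
      *",$pair,"*) ;;                  # label present with the same value
      *) printf '%s\n' "$pair" ;;      # no pod label satisfies this term
    esac
  done
)

selector_mismatch \
  "app=redis-cluster-follower,redis_setup_type=follower,role=follower" \
  "app=redis-cluster-follower,controller-revision-hash=redis-cluster-follower-9b9c5cfdb,redis_setup_type=cluster,role=follower,statefulset.kubernetes.io/pod-name=redis-cluster-follower-0"
# prints redis_setup_type=follower
```

The comma-separated strings match what kubectl get svc -n redis -o wide shows in its SELECTOR column and what kubectl get po -n redis --show-labels shows in its LABELS column.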

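Comparing the two, the redis_setup_type value in the external-service selectors (follower or leader) does not match the pods' label value (cluster). In the redis-cluster chart pulled with helm pull in step 4, edit the service templates and change the selector's redis_setup_type value to cluster: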

$ vi redis-cluster/templates/follower-service.yaml
......
  selector:
    app: {{ .Release.Name }}-follower
    redis_setup_type: cluster
    role: follower
......

$ vi redis-cluster/templates/leader-service.yaml
......
  selector:
    app: {{ .Release.Name }}-leader
    redis_setup_type: cluster
    role: leader
......

Apply the update:

$ helm upgrade redis-cluster ./redis-cluster --namespace redis
Release "redis-cluster" has been upgraded. Happy Helming!
NAME: redis-cluster
LAST DEPLOYED: Tue Aug 24 18:03:20 2021
NAMESPACE: redis
STATUS: deployed
REVISION: 2
TEST SUITE: None

Test ports 30515 and 30779 on the nodes again (look the ports up with kubectl get svc -n redis); access now works.
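
The empty endpoints were the key symptom in this investigation, so it is worth rechecking them after the upgrade. The sketch below lists services that still have no endpoints; services_without_endpoints is a hypothetical helper that reads kubectl get endpoints output on stdin. The sample rows are from the listing taken before the fix.

```shell
#!/bin/sh
# services_without_endpoints: read `kubectl get endpoints` output on stdin
# and print the names whose ENDPOINTS column is <none>.
services_without_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Rows copied from the pre-fix listing above.
services_without_endpoints <<'EOF'
NAME                                      ENDPOINTS                                                              AGE
redis-cluster-follower                    10.100.33.228:9121,10.100.33.232:9121,10.100.39.202:9121 + 3 more...   174m
redis-cluster-follower-external-service   <none>                                                                 174m
redis-cluster-leader-external-service     <none>                                                                 174m
redis-cluster-leader-headless             10.100.33.231:6379,10.100.33.234:6379,10.100.39.201:6379               174m
EOF
# prints the two external-service names
```

After the helm upgrade, kubectl get endpoints -n redis | services_without_endpoints would be expected to print nothing.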