References:

https://ieevee.com/tech/2018/05/16/k8s-rbd.html
https://zhangchenchen.github.io/2017/11/17/kubernetes-integrate-with-ceph/
https://docs.openshift.com/container-platform/3.5/install_config/storage_examples/ceph_rbd_dynamic_example.html
https://jimmysong.io/kubernetes-handbook/practice/using-ceph-for-persistent-storage.html

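Thanks to the authors above for the technical references. Building on their work, this article implements two setups backed by Ceph RBD: a multi-master database cluster and a master-slave database cluster. The configuration below is for testing only and must not be used as a production configuration.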

Storage categories in K8S

Persistent storage in K8S falls into the following categories:

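  • volume: a component mounted directly on a pod. Every other storage component in k8s reaches the pod through a volume. A volume has a type that determines what storage is mounted; common types include emptyDir, hostPath, nfs, rbd, and the persistentVolumeClaim discussed below. Unlike a Docker volume, whose lifecycle is tied to Docker, the lifecycle of a k8s volume depends on its type: an emptyDir volume behaves like the Docker case and disappears together with its pod, while the other types listed here keep their data after the pod is gone. For details, see the Kubernetes Volumes documentation.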

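  • Persistent Volumes: as the name suggests, this component supports persistent storage. It separates the storage provider (the volume type mentioned above) from the consumer (the pod that uses the storage) through two concepts, PersistentVolume and PersistentVolumeClaim. A PersistentVolume (PV) is a piece of storage supplied by the backend; with Ceph RBD it is an image. A PersistentVolumeClaim (PVC) is a user's request for a PV. A PVC is bound to a PV, and a pod that mounts the PVC as a volume thereby mounts the bound PV.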

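  • Dynamic Volume Provisioning: with the Persistent Volumes described above, a block of storage (for example a Ceph image) must be created first and tied to a PV before it can be used. This static binding is rigid, because every request for storage means asking the storage provider for a new block. Dynamic Volume Provisioning solves this by introducing the StorageClass, which abstracts the storage provider: the PVC only names a StorageClass and the size it needs, and the provider creates the storage block on demand. A default StorageClass can even be set, in which case creating the PVC alone is enough.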

Preparing the environment

  • A k8s cluster is already available
  • A Ceph cluster is already available

Install ceph-common on all nodes

Add the Ceph yum repository:

[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

Install ceph-common:

yum install ceph-common -y

If the installation fails with dependency errors, they can be resolved as follows:

yum install -y yum-utils && \
yum-config-manager --add-repo https://dl.fedoraproject.org/pub/epel/7/x86_64/ && \
yum install --nogpgcheck -y epel-release && \
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 && \
rm -f /etc/yum.repos.d/dl.fedoraproject.org*

yum -y install ceph-common

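Distributing the Ceph configuration files

Copy the Ceph configuration directory to every k8s node (-r is required because /etc/ceph is a directory):

[root@ceph-1 ~]# scp -r /etc/ceph k8s-node:/etc/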

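Testing with a volume

A simple volume is used here to check that the cluster environment works. In real applications, data that must be kept permanently should not be stored this way.

Creating an image in the Ceph cluster

When creating a new image, the features that the client does not support must be disabled: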

 rbd create foobar -s 1024 -p k8s
 rbd feature disable k8s/foobar object-map fast-diff deep-flatten

Check the image information:

# rbd info k8s/foobar
rbd image 'foobar':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    id: ad9b6b8b4567
    block_name_prefix: rbd_data.ad9b6b8b4567
    format: 2
    features: layering, exclusive-lock
    op_features: 
    flags: 
    create_timestamp: Tue Apr 23 17:37:39 2019
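
The list of features to disable can be worked out mechanically. The sketch below assumes that the client supports only layering and exclusive-lock (the two features left in the rbd info output above) and that a new image starts with the five features held in the enabled variable. Both lists are assumptions, so adjust them to your kernel and Ceph release. The script only prints the command and does not run it.

```shell
#!/bin/sh
# Features the client is assumed to support (taken from the rbd info output above).
supported="layering exclusive-lock"
# Features assumed to be enabled on a freshly created image.
enabled="layering exclusive-lock object-map fast-diff deep-flatten"

to_disable=""
for feature in $enabled; do
  case " $supported " in
    *" $feature "*) ;;                         # supported: leave it enabled
    *) to_disable="$to_disable $feature" ;;    # unsupported: mark for disabling
  esac
done
to_disable="${to_disable# }"   # strip the leading space

# Print the command instead of running it, so no Ceph cluster is needed.
echo "rbd feature disable k8s/foobar $to_disable"
```

With these two lists the printed command matches the rbd feature disable line used above.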

Mounting the volume directly in a pod

The Ceph admin keyring file is used as the authentication key here:

# cat test.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: rbd
spec:
  containers:
    - image: nginx
      name: rbd-rw
      volumeMounts:
      - name: rbdpd
        mountPath: /mnt
  volumes:
    - name: rbdpd
      rbd:
        monitors:
        - '192.168.20.41:6789'
        pool: k8s
        image: foobar
        fsType: xfs
        readOnly: false
        user: admin
        keyring: /etc/ceph/ceph.client.admin.keyring

Using PV and PVC

To keep data permanently (so that it is not lost when the pod is deleted), use a PV (PersistentVolume) together with a PVC (PersistentVolumeClaim).

Creating an image in the Ceph cluster

rbd create -s 1024 k8s/pv
rbd feature disable k8s/pv object-map fast-diff deep-flatten

Check the image information:

# rbd info k8s/pv
rbd image 'pv':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    id: adaa6b8b4567
    block_name_prefix: rbd_data.adaa6b8b4567
    format: 2
    features: layering, exclusive-lock
    op_features: 
    flags: 
    create_timestamp: Tue Apr 23 19:09:58 2019

Creating a Secret

  1. Generate the base64-encoded key
grep key /etc/ceph/ceph.client.admin.keyring |awk '{printf "%s", $NF}'|base64
  2. Create a Secret from the generated key
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"  
data:
  key: QVFBbk1MaGNBV2laSGhBQUVOQThRWGZyQ3haRkJDNlJaWTNJY1E9PQ==
---
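
The key in the Secret is base64-encoded, not encrypted, so it is worth checking that the pipeline produces exactly the encoded key with no stray newline. The sketch below runs the same pipeline against a throwaway keyring written to a temporary file (the file and its layout are assumptions for illustration) and decodes the result again as a round-trip check.

```shell
#!/bin/sh
# Throwaway keyring in the usual "key = <value>" layout; the path is temporary
# and hypothetical. The key is the value that the Secret above decodes to.
keyring="$(mktemp)"
cat > "$keyring" <<'EOF'
[client.admin]
    key = AQAnMLhcAWiZHhAAENA8QXfrCxZFBC6RZY3IcQ==
EOF

# Same pipeline as in step 1: take the last field of the "key" line,
# print it without a trailing newline, then base64-encode it.
encoded="$(grep key "$keyring" | awk '{printf "%s", $NF}' | base64)"
echo "encoded: $encoded"

# Round trip: decoding must give back the original key.
decoded="$(printf '%s' "$encoded" | base64 -d)"
echo "decoded: $decoded"

rm -f "$keyring"
```

Using printf "%s" in awk matters: a trailing newline would become part of the encoded value and the resulting key would be rejected by Ceph.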

Creating the PV and PVC manifests

# cat ceph-rbd-pv.yaml 

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-rbd-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  rbd:
    monitors:
      - '192.168.20.41:6789'
    pool: k8s
    image: pv
    user: admin
    secretRef:
      name: ceph-secret
    fsType: xfs
    readOnly: false
  # Recycle is only supported by NFS and hostPath volumes, so Retain is used here.
  persistentVolumeReclaimPolicy: Retain

# cat ceph-rbd-pvc.yaml 

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Creating the pod

# cat test3-pvc.yaml 

apiVersion: v1
kind: Pod
metadata:
  name: rbd-nginx
spec:
  containers:
    - image: nginx
      name: rbd-rw
      volumeMounts:
      - name: rbd-pvc
        mountPath: /mnt
  volumes:
    - name: rbd-pvc
      persistentVolumeClaim:
        claimName: ceph-rbd-pv-claim

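Using a StorageClass

What a StorageClass does

Put simply, a StorageClass holds the information needed to reach Ceph RBD: monitor IP/port, user name, keyring, pool and so on. Images no longer need to be created in advance. When a user creates a PVC, k8s looks for a StorageClass that matches the request and, if one exists, performs the following steps in order:

  • Create an image in the Ceph cluster
  • Create a PV named pvc-xx-xxx-xxx, sized to the storage requested by the PVC
  • Bind that PV to the PVC, then format it and mount it into the container

With this approach the administrator only has to create the StorageClass, and users can handle the rest themselves. To keep resources from being exhausted, set a Resource Quota.

When a pod needs a volume, it declares one through a PVC, and a persistent volume that meets the request is created on demand.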
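
As a rough illustration of the Resource Quota mentioned above, the sketch below writes a ResourceQuota manifest that caps what PVCs of one StorageClass may request. The quota name rbd-quota, the limits of 10Gi and 5 claims, and the class name fast (the one created in the next step) are assumptions for illustration. The manifest is only written to a temporary file; applying it needs cluster access.

```shell
#!/bin/sh
# Write a ResourceQuota manifest to a temporary file.
# Name and limits are hypothetical; "fast" is the StorageClass name used below.
quota_file="$(mktemp)"
cat > "$quota_file" <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: rbd-quota
spec:
  hard:
    # Total storage that PVCs using the "fast" class may request.
    fast.storageclass.storage.k8s.io/requests.storage: 10Gi
    # Number of PVCs that may reference the "fast" class.
    fast.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
EOF

cat "$quota_file"

# To apply it in a namespace (requires cluster access):
#   kubectl apply -n default -f "$quota_file"
```

A quota is namespaced, so it has to be created in every namespace where users are allowed to create PVCs.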

Creating the StorageClass

# cat storageclass.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.20.41:6789
  adminId: admin
  adminSecretName: ceph-secret
  pool: k8s
  userId: admin
  userSecretName: ceph-secret
  fsType: xfs
  imageFormat: "2"
  imageFeatures: "layering"

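Creating the PVC

RBD supports only ReadWriteOnce and ReadOnlyMany, not ReadWriteMany. The difference between these modes is whether the volume can be mounted from different nodes at the same time. On a single node, even a ReadWriteOnce volume can be mounted into two containers simultaneously.

When creating an application, create the PVC and the pod together. The PVC is tied to the StorageClass through storageClassName, which must be set to the name of the class created above (fast).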

# cat pvc.yaml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rbd-pvc-pod-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: fast

Creating the pod

# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: rbd-pvc-pod
  name: ceph-rbd-sc-pod1
spec:
  containers:
  - name: ceph-rbd-sc-nginx
    image: nginx
    volumeMounts:
    - name: ceph-rbd-vol1
      mountPath: /mnt
      readOnly: false
  volumes:
  - name: ceph-rbd-vol1
    persistentVolumeClaim:
      claimName: rbd-pvc-pod-pvc

Additional notes

Besides declaring a PVC explicitly, a StorageClass can also be consumed through volumeClaimTemplates, the storage setting of a StatefulSet. When several replicas are involved, use a StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx"
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "fast"
      resources:
        requests:
          storage: 1Gi

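Do not use a Deployment for this, though. With a single replica a Deployment works just like a bare pod. With more than one replica, all pods share the same PVC, and an RBD image in ReadWriteOnce mode cannot be mounted read-write from several nodes at once, so only one pod starts and the others stay in ContainerCreating. After a while, kubectl describe pod shows that they have been waiting for the volume without ever getting it.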

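Example 1: creating a mysql-galera cluster (multi-master)

Official documentation: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/

StatefulSet overview

StatefulSet (called PetSet before 1.5) sits at the same level as Deployment and ReplicaSet. Deployments and ReplicaSets are designed for stateless services, whereas StatefulSet addresses stateful services. Its use cases are:

  • Stable persistent storage: a pod can still reach the same persistent data after being rescheduled. This is based on PVCs.
  • Stable network identity: a pod keeps its PodName and HostName after being rescheduled. This is based on a Headless Service (a Service without a Cluster IP).
  • Ordered deployment and ordered scaling: pods are ordered, and deployment or scale-up proceeds in the defined order, from 0 to N-1. Before the next pod starts, all earlier pods must be Running and Ready. The ordering is enforced by the StatefulSet controller.
  • Ordered scale-down and ordered deletion (from N-1 to 0).

These use cases make StatefulSet a good fit for database clusters such as MySQL and Redis. Correspondingly, a StatefulSet setup consists of three parts:

  • A Headless Service that defines the network identity (DNS domain); see the reference documentation
  • volumeClaimTemplates used to create the PersistentVolumes
  • The StatefulSet that defines the application itself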
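
To make the stable network identity concrete, the sketch below prints the DNS names that the pods of a StatefulSet receive through its headless Service. The names mysql and galera are the ones used in the manifests of this example; the cluster domain cluster.local is the common default and is an assumption here.

```shell
#!/bin/sh
# Values taken from the manifests in this example; cluster_domain is assumed.
statefulset="mysql"
service="galera"
namespace="galera"
replicas=3
cluster_domain="cluster.local"

# pod_dns_name ORDINAL: print the fully qualified DNS name of one pod.
pod_dns_name() {
  printf '%s-%s.%s.%s.svc.%s\n' \
    "$statefulset" "$1" "$service" "$namespace" "$cluster_domain"
}

# Pods are numbered from 0 to replicas-1.
i=0
while [ "$i" -lt "$replicas" ]; do
  pod_dns_name "$i"
  i=$((i + 1))
done
```

Because the name depends only on the ordinal, a rescheduled pod comes back under the same DNS name even though its IP address changes.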

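1. Generate and create the Ceph secret

Skip this step if the Ceph secret already exists in the galera namespace of the k8s cluster.

Generate the base64-encoded key:

grep key /etc/ceph/ceph.client.admin.keyring |awk '{printf "%s", $NF}'|base64

Create a Secret from the generated key: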

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: galera
type: "kubernetes.io/rbd"  
data:
  key: QVFBbk1MaGNBV2laSGhBQUVOQThRWGZyQ3haRkJDNlJaWTNJY1E9PQ==
---

2. Create the StorageClass

# cat storageclass.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.20.41:6789,192.168.20.42:6789,192.168.20.43:6789
  adminId: admin
  adminSecretName: ceph-secret
  pool: k8s
  userId: admin
  userSecretName: ceph-secret
  fsType: xfs
  imageFormat: "2"
  imageFeatures: "layering"

3. Create the headless Service

galera-service.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  name: galera
  namespace: galera
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  # *.galera.galera.svc.cluster.local
  clusterIP: None
  selector:
    app: mysql

4. Create the StatefulSet

This uses the apps/v1 StatefulSet, which is the current stable version. Unlike the earlier beta versions, v1 requires the spec.selector.matchLabels field, and its value must match spec.template.metadata.labels.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: galera
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: "galera"
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: install
        image: mirrorgooglecontainers/galera-install:0.1
        imagePullPolicy: Always
        args:
        - "--work-dir=/work-dir"
        volumeMounts:
        - name: workdir
          mountPath: "/work-dir"
        - name: config
          mountPath: "/etc/mysql"
      - name: bootstrap
        image: debian:jessie
        command:
        - "/work-dir/peer-finder"
        args:
        - -on-start="/work-dir/on-start.sh"
        - "-service=galera"
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        volumeMounts:
        - name: workdir
          mountPath: "/work-dir"
        - name: config
          mountPath: "/etc/mysql"
      containers:
      - name: mysql
        image: mirrorgooglecontainers/mysql-galera:e2e
        ports:
        - containerPort: 3306
          name: mysql
        - containerPort: 4444
          name: sst
        - containerPort: 4567
          name: replication
        - containerPort: 4568
          name: ist
        args:
        - --defaults-file=/etc/mysql/my-galera.cnf
        - --user=root
        readinessProbe:
          # TODO: If docker exec is buggy just use gcr.io/google_containers/mysql-healthz:1.0
          exec:
            command:
            - sh
            - -c
            - "mysql -u root -e 'show databases;'"
          initialDelaySeconds: 15
          timeoutSeconds: 5
          successThreshold: 2
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/
        - name: config
          mountPath: /etc/mysql
      volumes:
      - name: config
        emptyDir: {}
      - name: workdir
        emptyDir: {}
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.beta.kubernetes.io/storage-class: "fast"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

5. Check the pods

The pods are running normally:

[root@master-1 ~]# kubectl  get pod  -n galera 
NAME      READY   STATUS    RESTARTS   AGE
mysql-0   1/1     Running   0          48m
mysql-1   1/1     Running   0          43m
mysql-2   1/1     Running   0          38m

The database cluster has formed:

[root@master-1 ~]# kubectl exec mysql-1  -n galera  -- mysql -uroot -e 'show status like "wsrep_cluster_size";'
Variable_name   Value
wsrep_cluster_size  3

Check the PVC bindings:

[root@master-1 mysql-cluster]# kubectl get pvc -l app=mysql -n galera
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-mysql-0   Bound    pvc-6e5a1c45-666b-11e9-ad20-000c29016590   1Gi        RWO            fast           3d20h
datadir-mysql-1   Bound    pvc-25683cfd-666c-11e9-ad20-000c29016590   1Gi        RWO            fast           3d20h
datadir-mysql-2   Bound    pvc-c024b422-666c-11e9-ad20-000c29016590   1Gi        RWO            fast           3d20h

Test the database:

kubectl exec -i mysql-2 -n galera -- mysql -uroot <<EOF
CREATE DATABASE demo;
CREATE TABLE demo.messages (message VARCHAR(250));
INSERT INTO demo.messages VALUES ('hello');
EOF

Read the data back:

# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --  mysql -h 10.2.58.7 -e "SELECT * FROM demo.messages"

If you don't see a command prompt, try pressing enter.

+---------+
| message |
+---------+
| hello   |
+---------+
pod "mysql-client" deleted

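Accessing the database from inside the cluster

For pods to reach and query the database, a Service is needed. The following Service is used to connect to MySQL: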

apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: galera
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql

Access the database from a pod:

# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --  mysql -h mysql-read.galera -e "SELECT * FROM demo.messages"
+---------+
| message |
+---------+
| hello   |
+---------+
pod "mysql-client" deleted

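Example 2: deploying a MySQL master-slave cluster

Official reference documentation

1. Create a pool in the Ceph cluster

Create a pool named kube in the Ceph cluster to serve as the storage pool for the database: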

[root@ceph-1 ~]# ceph osd pool create kube 128
pool 'kube' created

2. Create a StorageClass using the secret created earlier

Define a new StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysql
provisioner: kubernetes.io/rbd
parameters:
  monitors: 192.168.20.41:6789,192.168.20.42:6789,192.168.20.43:6789
  adminId: admin
  adminSecretName: ceph-secret
  pool: kube
  userId: admin
  userSecretName: ceph-secret
  fsType: xfs
  imageFormat: "2"
  imageFeatures: "layering"

3. Create the headless Service

Because the master-slave database is deployed with a StatefulSet, a headless Service is required, along with a Service for read traffic:

# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql

4. Create the ConfigMap for master-slave replication

Replication requires separate configuration for the master and for the slaves:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only

5. Create the StatefulSet

The StatefulSet below uses the StorageClass, and therefore RBD storage, and relies on an xtrabackup image to synchronize data:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:5.7
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal index.
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Add an offset to avoid reserved server-id=0 value.
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy appropriate conf.d files from config-map to emptyDir.
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql
        image: tangup/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "1"
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      - name: xtrabackup
        image: tangup/xtrabackup:1.0 
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql

          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave.
            mv xtrabackup_slave_info change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi

          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done

            echo "Initializing replication from clone position"
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
            mysql -h 127.0.0.1 <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql',
            MASTER_USER='root',
            MASTER_PASSWORD='',
            MASTER_CONNECT_RETRY=10;
          START SLAVE;
          EOF
          fi

          # Start a server to send backups when requested by peers.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "mysql"
      resources:
        requests:
          storage: 1Gi
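
The init-mysql container above derives each server's identity from its hostname. The sketch below reproduces that calculation outside the cluster so it can be tried on its own. It uses POSIX parameter expansion instead of the bash regex from the manifest, and the function names server_id_for and config_for are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# server_id_for HOSTNAME: print 100 + pod ordinal, as init-mysql does.
# Returns non-zero when the hostname has no numeric "-N" suffix.
server_id_for() {
  host="$1"
  ordinal="${host##*-}"                     # text after the last "-"
  [ "$ordinal" != "$host" ] || return 1     # no "-" in the hostname at all
  case "$ordinal" in
    ''|*[!0-9]*) return 1 ;;                # suffix is empty or not a number
  esac
  # The offset avoids the reserved server-id=0.
  echo $((100 + ordinal))
}

# config_for HOSTNAME: ordinal 0 is the master, every other pod is a slave.
config_for() {
  id="$(server_id_for "$1")" || return 1
  if [ "$id" -eq 100 ]; then echo "master.cnf"; else echo "slave.cnf"; fi
}

for pod in mysql-0 mysql-1 mysql-2; do
  echo "$pod: server-id=$(server_id_for "$pod") config=$(config_for "$pod")"
done
```

The resulting IDs 100, 101 and 102 are the @@server_id values that appear when querying through mysql-read at the end of this example.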

6. Check the cluster status

Pods:

[root@master-1 ~]# kubectl  get po
NAME      READY   STATUS    RESTARTS   AGE
mysql-0   2/2     Running   2          110m
mysql-1   2/2     Running   0          109m
mysql-2   2/2     Running   0          16m

PVCs:

[root@master-1 ~]# kubectl get pvc |grep mysql|grep -v fast
data-mysql-0        Bound    pvc-3737108a-6a2a-11e9-ac56-000c296b46ac   1Gi        RWO            mysql          5h43m
data-mysql-1        Bound    pvc-279bdca0-6a4a-11e9-ac56-000c296b46ac   1Gi        RWO            mysql          114m
data-mysql-2        Bound    pvc-fbe153bc-6a52-11e9-ac56-000c296b46ac   1Gi        RWO            mysql          51m

Images created automatically in the Ceph cluster:

[root@ceph-1 ~]# rbd list kube
kubernetes-dynamic-pvc-2ee47370-6a4a-11e9-bb82-000c296b46ac
kubernetes-dynamic-pvc-39a42869-6a2a-11e9-bb82-000c296b46ac
kubernetes-dynamic-pvc-fbead120-6a52-11e9-bb82-000c296b46ac

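7. Test the database cluster

Write data to the master. The headless Service gives every pod a fixed DNS name of the form podname.headlessname, so the pod can be addressed directly. Here mysql-0 is reached as mysql-0.mysql: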

kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never --\
  mysql -h mysql-0.mysql <<EOF
CREATE DATABASE test;
CREATE TABLE test.messages (message VARCHAR(250));
INSERT INTO test.messages VALUES ('hello');
EOF

Read the data through mysql-read:

# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --  mysql -h mysql-read -e "SELECT * FROM test.messages"
+---------+
| message |
+---------+
| hello   |
+---------+

The following command loops to show which database instance mysql-read is connected to each time:

kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never --\
  bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"

+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         102 | 2019-04-28 20:24:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2019-04-28 20:27:35 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2019-04-28 20:18:38 |
+-------------+---------------------+