Background
Many open-source systems today target k8s, and their deployments often bundle MySQL. Ideally you would migrate that MySQL to a managed cloud instance, but reality can be messier: for example, the open-source framework may need DML privileges on MySQL that the company's DBAs will not grant.
The problem with running MySQL on k8s is that such a MySQL deployment is usually a single point of failure. MySQL typically depends on a PV backed by a locally mounted disk, so if the host running the MySQL master goes down, the MySQL cluster goes down with it, and the end result is that the entire system becomes unavailable. Clearly this is not highly available.
Solution
Given that scenario, we set out to build a MySQL cluster. A single-master/multi-slave topology still falls short: if the master dies, the system is unavailable all the same. We therefore need a dual-master, multi-slave MySQL cluster, and this article explores building one inside k8s.
Approach
Build on a StatefulSet: inside each pod, hostname yields the pod's own name, and from that name we can derive the addresses of the other pods under the same Service. Once the peer address is known, replication is started.
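A minimal sketch of that discovery step, using the names this article sets up later (StatefulSet mysql, Service mysql, namespace mysql); POD_NAME is hard-coded here for illustration, in the pod it would come from hostname:

```shell
# Each StatefulSet pod knows its own name, e.g. "mysql-0"; the trailing
# ordinal identifies the pod, and a peer's stable DNS name follows the
# pattern <pod-name>.<service>.<namespace>.
POD_NAME=mysql-0              # in the pod: POD_NAME=$(hostname)
ordinal=${POD_NAME##*-}       # strip everything up to the last "-"
if [ "$ordinal" -eq 0 ]; then
  PEER=mysql-1.mysql.mysql    # pod 0 pairs with pod 1
else
  PEER=mysql-0.mysql.mysql    # everyone else pairs with pod 0
fi
echo "$PEER"                  # prints: mysql-1.mysql.mysql
```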
Steps
Prerequisites
- A k8s cluster with at least 2 nodes
- Optionally, set shell aliases to taste:
vi ~/.bash_profile
alias kc=kubectl
alias kd='kubectl describe'
Run
kc get node
which should show something like:
NAME STATUS ROLES AGE VERSION
node02 Ready <none> 405d v1.20.12
node05 Ready <none> 388d v1.21.14
node06 Ready control-plane,master 410d v1.21.14
Deployment
- Create the namespace
kc create ns mysql
- Create the StorageClass
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: mysql-local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
- Create the PVs (the matching PVCs are generated by the StatefulSet's volumeClaimTemplates below)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-1
spec:
  capacity:
    storage: 15Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: mysql-local-storage
  local:
    path: /home/k8s-mysql
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node02
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv-2
spec:
  capacity:
    storage: 15Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: mysql-local-storage
  local:
    path: /home/k8s-mysql
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node05
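One pitfall with local PVs: nothing creates the host path automatically. Before the pods can schedule, the directory has to exist on each node named in the nodeAffinity (here node02 and node05). A sketch, with the path in a variable purely for illustration:

```shell
# Run once on each node that backs a local PV (node02 and node05 above);
# local volumes are not provisioned dynamically, so the backing directory
# must be created by hand.
DATA_DIR=${DATA_DIR:-/home/k8s-mysql}
mkdir -p "$DATA_DIR"
chmod 750 "$DATA_DIR"
ls -ld "$DATA_DIR"
```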
- Create the ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: mysql
  labels:
    app: mysql
data:
  master-01.cnf: |
    # configuration for master 01
    [mysqld]
    log-bin=mysqllog
    skip-name-resolve
    replicate-ignore-db=mysql
    replicate-ignore-db=information_schema
    replicate-ignore-db=performance_schema
    replicate-ignore-db=sys
    auto_increment_increment=2
    auto_increment_offset=1
  master-02.cnf: |
    # configuration for master 02
    [mysqld]
    log-bin=mysqllog
    skip-name-resolve
    replicate-ignore-db=mysql
    replicate-ignore-db=information_schema
    replicate-ignore-db=performance_schema
    replicate-ignore-db=sys
    auto_increment_increment=2
    auto_increment_offset=2
  slave.cnf: |
    # configuration for slaves
    [mysqld]
    super-read-only
    skip-name-resolve
    log-bin=mysql-bin
    replicate-ignore-db=mysql
    replicate-ignore-db=information_schema
    replicate-ignore-db=performance_schema
    replicate-ignore-db=sys
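The only difference between the two master configs is auto_increment_offset. With auto_increment_increment=2 and offsets 1 and 2, the two masters mint AUTO_INCREMENT ids from disjoint sequences (odd vs. even), so concurrent inserts on both masters cannot collide on generated primary keys. A quick arithmetic sketch of the sequences:

```shell
# ids generated for the first three inserts on each master,
# following next_id = offset + n * increment
increment=2
ids_for() {
  offset=$1
  out=""
  for n in 0 1 2; do
    out="$out $((offset + n * increment))"
  done
  printf '%s' "${out# }"
}
echo "master-01: $(ids_for 1)"   # prints: master-01: 1 3 5
echo "master-02: $(ids_for 2)"   # prints: master-02: 2 4 6
```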
- Create the Secret
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret
  namespace: mysql
  labels:
    app: mysql
type: Opaque
data:
  password: MTIzNDU2 # echo -n "123456" | base64
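Note that Secret data is only base64-encoded, not encrypted; the value above round-trips like this:

```shell
# encode the root password the same way the comment above does
encoded=$(printf '%s' '123456' | base64)
echo "$encoded"                        # prints: MTIzNDU2
# and decode it back to verify
printf '%s' "$encoded" | base64 -d     # prints: 123456
```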
- Create the Service. It must be headless (clusterIP: None): the per-pod DNS names used later (mysql-0.mysql.mysql, mysql-1.mysql.mysql) are only published for the headless Service governing the StatefulSet.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: mysql
  labels:
    app: mysql
spec:
  clusterIP: None
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
- Create the StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: mysql
  labels:
    app: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 2
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: wenyangchou/centos:7-mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        command:
        - bash
        - "-c"
        - |
          set -ex
          # derive the server-id from the pod's ordinal
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # server-id must not be 0, so add 100 to the ordinal to avoid it
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # ordinals 0 and 1 are the two masters and get the master configs
          # from the ConfigMap; any higher ordinal gets the slave config
          if [[ ${ordinal} -eq 0 ]]; then
            cp /mnt/config-map/master-01.cnf /mnt/conf.d
          elif [[ ${ordinal} -eq 1 ]]; then
            cp /mnt/config-map/master-02.cnf /mnt/conf.d
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      containers:
      - name: mysql
        image: wenyangchou/mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
        livenessProbe:
          exec:
            # run through a shell so the env var is expanded; a bare exec
            # array would pass the literal string ${MYSQL_ROOT_PASSWORD}
            command: ["sh", "-c", "mysqladmin ping -uroot -p$MYSQL_ROOT_PASSWORD"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ["sh", "-c", "mysqladmin ping -uroot -p$MYSQL_ROOT_PASSWORD"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      - name: sync
        image: wenyangchou/centos:7-mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        command:
        - bash
        - "-c"
        - |
          set -ex
          [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          # master 0 replicates from master 1; every other pod from master 0
          if [[ $ordinal -eq 0 ]]; then
            MASTER_HOST=mysql-1.mysql.mysql
          else
            MASTER_HOST=mysql-0.mysql.mysql
          fi
          cd /var/lib/mysql
          echo "Waiting for mysqld to be ready (accepting connections)"
          until mysql -uroot -p${MYSQL_ROOT_PASSWORD} -h127.0.0.1 -e "SELECT 1"; do sleep 1; done
          until mysql -uroot -p${MYSQL_ROOT_PASSWORD} -h${MASTER_HOST} -e "SELECT 1"; do sleep 1; done
          result=$(mysql -uroot -p${MYSQL_ROOT_PASSWORD} -h127.0.0.1 -e "SHOW SLAVE STATUS\G")
          if [[ $result == *"Slave_IO_Running: Yes"* && $result == *"Slave_SQL_Running: Yes"* ]]; then
            echo "replication already running, nothing to do"
          else
            # no MASTER_LOG_FILE/POS (or GTID) is given: both instances are
            # assumed to start out empty, so replicating from the first
            # available binlog is acceptable here
            mysql -uroot -p${MYSQL_ROOT_PASSWORD} -h127.0.0.1 -e "CHANGE MASTER TO MASTER_HOST='${MASTER_HOST}',MASTER_USER='root',MASTER_PASSWORD='${MYSQL_ROOT_PASSWORD}',MASTER_CONNECT_RETRY=10;START SLAVE;"
          fi
          # hold the container's main process; without this the pod restarts.
          # Something more elegant is possible (a TTL, etc.) but out of scope here.
          while true; do sleep 3600; done
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - "ReadWriteOnce"
      storageClassName: mysql-local-storage
      resources:
        requests:
          storage: 3Gi
A note on the StatefulSet above: the second container in containers watches the mysql process. Once both the local instance and the peer master accept connections, it runs CHANGE MASTER / START SLAVE. With 2 replicas this is master-master mode; with 3 or more replicas it becomes master-master-multi-slave mode (each extra replica needs its own local PV).
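The decision the sync container makes can be isolated into a small function: run CHANGE MASTER only when the replication threads are not already up. A sketch using canned status text (in the pod this text comes from mysql -e "SHOW SLAVE STATUS\G"):

```shell
# Returns 0 (healthy) only if both replication threads report "Yes",
# mirroring the check in the sync container's script.
slave_ok() {
  case "$1" in
    *"Slave_IO_Running: Yes"*) ;;
    *) return 1 ;;
  esac
  case "$1" in
    *"Slave_SQL_Running: Yes"*) return 0 ;;
    *) return 1 ;;
  esac
}

healthy="Slave_IO_Running: Yes
Slave_SQL_Running: Yes"
broken="Slave_IO_Running: No
Slave_SQL_Running: Yes"

slave_ok "$healthy" && echo "replication healthy"   # prints: replication healthy
slave_ok "$broken" || echo "needs CHANGE MASTER"    # prints: needs CHANGE MASTER
```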
In the master-master-multi-slave (read/write splitting) case, the masters cannot be addressed specifically through the single Service above, so that configuration is not optimal; an additional StatefulSet and Service are needed. An example read Service:
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  namespace: mysql
  labels:
    app: mysql-read
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql
With separate read and write Services (and StatefulSets) in place, you get read/write splitting; that configuration is fairly simple. This article focuses on the master-master mode on k8s, so the split setup is left as an exercise.
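One hedged sketch of what the extra write Service could look like: the StatefulSet controller labels every pod with statefulset.kubernetes.io/pod-name, so a Service can be pinned to a specific pod without a second StatefulSet. The name mysql-write and the choice of pod are illustrative assumptions, not part of the setup above:

```yaml
# Sketch (assumption): expose one master for writes by selecting it via the
# pod-name label that the StatefulSet controller adds automatically.
apiVersion: v1
kind: Service
metadata:
  name: mysql-write
  namespace: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    statefulset.kubernetes.io/pod-name: mysql-0
```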
Note: the images used above are modified from the official mysql image. The official image lacks the hostname command; the personal images referenced here have that patched in, so either pull them or build your own equivalents.
Verification
- First, enter the first mysql replica
kc exec -it mysql-0 -n mysql -c mysql -- bash
mysql -uroot -p123456 -h127.0.0.1
create database test;
- Then enter the second mysql replica
kc exec -it mysql-1 -n mysql -c mysql -- bash
mysql -uroot -p123456 -h127.0.0.1
show databases;
which should show:
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test |
+--------------------+
Other checks follow the same pattern and are left as an exercise. The test matrix includes:
- Create a database on pod 0, watch it replicate to pod 1
- Create a database on pod 1, watch it replicate to pod 0
- Restart pod 0, create data on pod 1, check that pod 0 catches up
- Restart pod 0, create data on pod 0, check that pod 1 catches up
- Restart pod 1, create data on pod 0, check that pod 1 catches up
- Restart pod 1, create data on pod 1, check that pod 0 catches up