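How keepalived + MySQL dual-master works:
(1) When keepalived starts on master 1, it checks whether the MySQL service is alive. If it is, keepalived enters the MASTER state and acquires the VIP.
(2) When keepalived starts on master 2, it also checks whether MySQL is alive, then checks whether any node in the keepalived group is already in the MASTER state. If one is, keepalived on master 2 enters the BACKUP state and stands ready to take over the VIP at any time.
(3) If MySQL on master 1 goes down, keepalived there enters the FAULT state and releases the VIP; keepalived on master 2 then switches to the MASTER state and acquires the VIP.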
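The three rules above can be read as a small decision function. The sketch below is only an illustration of that reading, not keepalived's actual implementation: real keepalived decides through VRRP advertisements, priorities and timers, and the names next_state and holds_vip are made up for this example.

```python
MASTER, BACKUP, FAULT = "MASTER", "BACKUP", "FAULT"


def next_state(mysql_alive, peer_is_master):
    """Return the state a node settles into under the three rules above.

    mysql_alive    -- result of the local MySQL health check
    peer_is_master -- whether another node in the group already holds the VIP
    """
    if not mysql_alive:
        return FAULT    # rule (3): the check failed, the VIP is released
    if peer_is_master:
        return BACKUP   # rule (2): stand by, ready to take over
    return MASTER       # rules (1) and (3): no master around, take the VIP


def holds_vip(state):
    """Only a node in the MASTER state holds the VIP."""
    return state == MASTER


# Master 1 loses MySQL, master 2 sees no master left and takes over.
print(next_state(mysql_alive=False, peer_is_master=False))
print(next_state(mysql_alive=True, peer_is_master=False))
```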
Test environment:
OS: CentOS release 6.6 (Final)
Database: MySQL 5.7.14
A: master: 192.168.91.23
B: slave: 192.168.91.22
VIP: 192.168.91.100
Set the same system time on both hosts:
date -s "20170227 16:25"
hwclock --systohc
Replication parameters:
A:
server_id = 330623
gtid_mode=ON
log_slave_updates = 0
enforce_gtid_consistency = ON
auto_increment_offset =1
auto_increment_increment =2
B:
server_id = 330622
gtid_mode=ON
log_slave_updates = 0
enforce_gtid_consistency = ON
auto_increment_offset=2
auto_increment_increment=2
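Why these two auto-increment settings keep the masters from generating the same key can be shown with a little arithmetic. This is a minimal sketch, assuming an empty table on each side; the helper auto_increment_values is hypothetical and only mimics the sequence offset, offset + increment, and so on.

```python
def auto_increment_values(offset, increment, count):
    """First `count` values: offset, offset + increment, offset + 2*increment, ..."""
    return [offset + i * increment for i in range(count)]


ids_a = auto_increment_values(offset=1, increment=2, count=5)  # settings on A
ids_b = auto_increment_values(offset=2, increment=2, count=5)  # settings on B

print(ids_a)                    # odd values only
print(ids_b)                    # even values only
print(set(ids_a) & set(ids_b))  # empty: the two sequences never collide
```

A generates only odd values and B only even ones, so rows inserted on both masters at the same time do not clash on an auto-increment primary key.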
Configure A and B as each other's master:
A:
Create the replication account:
create user rep@'192.168.91.%' identified by '147258';
grant replication slave on *.* to rep@'192.168.91.%';
Take a full backup of A and restore it on B (steps omitted here).
B: set A as B's master:
change master to master_host='192.168.91.23', master_port=3306, master_user='rep',master_password='147258', master_auto_position=1;
start slave;
A: set B as A's master:
change master to master_host='192.168.91.22', master_port=3306, master_user='rep',master_password='147258', master_auto_position=1;
start slave;
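Create a monitoring account (used later by the checkMySQL.py script to check the state of the MySQL database; the USAGE privilege alone is enough for this user, although the statement below grants REPLICATION CLIENT):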
GRANT REPLICATION CLIENT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'm0n1tor';
Install keepalived on both A and B:
yum install keepalived -y
yum install MySQL-python -y
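keepalived configuration file on A:
[root@Darren1 keepalived]# cat << EOF > keepalived.conf
vrrp_script vs_mysql_23 { # name it to suit your environment
script "/etc/keepalived/checkMySQL.py -h 192.168.91.23 -P 3306"
interval 60 # how often the check script runs, in seconds
}
vrrp_instance VI_23 { # name it to suit your environment
state BACKUP # start in the BACKUP state
nopreempt # no preemption: if m1 fails and m2 takes over the VIP, m1 does not grab the VIP back after it restarts
interface eth0 # NIC that carries the VIP
virtual_router_id 23 # virtual router ID, 1-255; must not clash with another VRRP group on the network (e.g. router HA) and must be the same on every node of this cluster
priority 100 # priority; within one vrrp_instance the MASTER must have a higher priority than the BACKUP
advert_int 5
authentication {
auth_type PASS # password authentication
auth_pass 1111 # authentication password; no more than 8 characters
}
track_script {
vs_mysql_23 # run this script: exit code 0 keeps the VIP, exit code 1 releases it
}
virtual_ipaddress {
192.168.91.100 # VIP address
}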
}
EOF
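keepalived configuration file on B:
[root@Darren2 keepalived]# cat << EOF > keepalived.conf
vrrp_script vs_mysql_22 {
script "/etc/keepalived/checkMySQL.py -h 192.168.91.22 -P 3306" # checks B's own address; apart from this, the script and instance names, and the priority, the file matches A's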
interval 60
}
vrrp_instance VI_22 {
state BACKUP
nopreempt
interface eth0
virtual_router_id 23
priority 90
advert_int 5
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
vs_mysql_22
}
virtual_ipaddress {
192.168.91.100
}
}
EOF
What the checkMySQL.py script does (the script itself is omitted here):
It checks whether the MySQL process is alive and returns 0 if it is, 1 if it is not.
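Since the script is omitted, here is a rough stand-in that follows the same contract (-h host, -P port, result 0 when MySQL is up and 1 when it is not). It is not the author's checkMySQL.py: as a simplifying assumption it only tests whether the MySQL port accepts a TCP connection, instead of logging in with the monitor account through MySQL-python, and the function names are invented for this sketch.

```python
import getopt
import socket


def mysql_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def parse_args(argv):
    """Parse '-h <host> -P <port>' as used in the keepalived.conf script line."""
    host, port = "127.0.0.1", 3306
    opts, _ = getopt.getopt(argv, "h:P:")
    for flag, value in opts:
        if flag == "-h":
            host = value
        elif flag == "-P":
            port = int(value)
    return host, port


def main(argv):
    """Return 0 if MySQL looks alive, 1 otherwise (the vrrp_script contract)."""
    host, port = parse_args(argv)
    return 0 if mysql_port_open(host, port) else 1
```

To run it as the vrrp_script, the file would end with sys.exit(main(sys.argv[1:])) (plus import sys) and be made executable. A port check cannot tell a hung server from a healthy one, so a real check would more likely run a query with the monitor account.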
Start keepalived on A and B:
/etc/init.d/keepalived start (at first startup, whichever of A and B starts first gets the VIP)
chkconfig --level 2345 keepalived on
keepalived startup process:
Start the keepalived service on A:
[root@Darren1 ~]# /etc/init.d/keepalived start
[root@Darren1 ~]# tail -f /var/log/messages
May 9 14:41:05 Darren1 Keepalived[28172]: Starting Keepalived v1.2.13 (03/19,2015)
May 9 14:41:05 Darren1 Keepalived[28173]: Starting Healthcheck child process, pid=28175
May 9 14:41:05 Darren1 Keepalived[28173]: Starting VRRP child process, pid=28176
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Netlink reflector reports IP 192.168.91.23 added
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Netlink reflector reports IP 192.168.91.23 added
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Netlink reflector reports IP fe80::20c:29ff:fe56:5380 added
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Registering Kernel netlink reflector
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Registering Kernel netlink command channel
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Registering gratuitous ARP shared channel
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Opening file '/etc/keepalived/keepalived.conf'.
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Netlink reflector reports IP fe80::20c:29ff:fe56:5380 added
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Registering Kernel netlink reflector
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Registering Kernel netlink command channel
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Opening file '/etc/keepalived/keepalived.conf'.
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Configuration is using : 62873 Bytes
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: Using LinkWatch kernel netlink reflector...
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Entering BACKUP STATE
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Configuration is using : 5173 Bytes
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
May 9 14:41:05 Darren1 Keepalived_healthcheckers[28175]: Using LinkWatch kernel netlink reflector...
May 9 14:41:05 Darren1 Keepalived_vrrp[28176]: VRRP_Script(vs_mysql_23) succeeded
May 9 14:41:21 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Transition to MASTER STATE
May 9 14:41:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Entering MASTER STATE
May 9 14:41:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) setting protocol VIPs.
May 9 14:41:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Sending gratuitous ARPs on eth0 for 192.168.91.100
May 9 14:41:26 Darren1 Keepalived_healthcheckers[28175]: Netlink reflector reports IP 192.168.91.100 added
May 9 14:41:31 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Sending gratuitous ARPs on eth0 for 192.168.91.100
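Summary of the startup process:
(1) Three keepalived processes start: the main process, the health-check child process, and the VRRP child process.
(2) Once startup finishes, the VRRP_Instance enters the BACKUP state.
(3) After entering BACKUP successfully, the VRRP_Instance transitions to MASTER and then enters the MASTER state.
(4) It acquires the VIP and announces it to the other servers with gratuitous ARP broadcasts.
keepalived failover process:
Stop the MySQL service on A: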
[root@Darren1 ~]# /etc/init.d/mysqld stop
Shutting down MySQL............ SUCCESS!
What happens on A:
[root@Darren1 ~]# tail -f /var/log/messages
May 9 14:43:25 Darren1 Keepalived_vrrp[28176]: VRRP_Script(vs_mysql_23) failed
May 9 14:43:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Entering FAULT STATE
May 9 14:43:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) removing protocol VIPs.
May 9 14:43:26 Darren1 Keepalived_vrrp[28176]: VRRP_Instance(VI_23) Now in FAULT state
May 9 14:43:26 Darren1 Keepalived_healthcheckers[28175]: Netlink reflector reports IP 192.168.91.100 removed
Summary for A: the VRRP_Instance enters the FAULT state and releases the VIP.
What happens on B:
[root@Darren2 ~]# tail -f /var/log/messages
May 9 14:43:26 Darren2 Keepalived_vrrp[35138]: VRRP_Instance(VI_22) Transition to MASTER STATE
May 9 14:43:31 Darren2 Keepalived_vrrp[35138]: VRRP_Instance(VI_22) Entering MASTER STATE
May 9 14:43:31 Darren2 Keepalived_vrrp[35138]: VRRP_Instance(VI_22) setting protocol VIPs.
May 9 14:43:31 Darren2 Keepalived_vrrp[35138]: VRRP_Instance(VI_22) Sending gratuitous ARPs on eth0 for 192.168.91.100
May 9 14:43:31 Darren2 Keepalived_healthcheckers[35137]: Netlink reflector reports IP 192.168.91.100 added
May 9 14:43:36 Darren2 Keepalived_vrrp[35138]: VRRP_Instance(VI_22) Sending gratuitous ARPs on eth0 for 192.168.91.100
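Summary for B: the VRRP_Instance enters the MASTER state, acquires the VIP, and announces it with gratuitous ARP.
Log in to the database through the VIP:
Create a login user: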
create user 'keepalived'@'%' identified by '147258';
grant all on *.* to 'keepalived'@'%';
The VIP is now on B:
[root@Darren2 ~]# ip addr |grep 192
inet 192.168.91.22/24 brd 192.168.91.255 scope global eth0
inet 192.168.91.100/32 scope global eth0
Logging in from A with the 'keepalived'@'%' account succeeds, which shows that the VIP works:
[root@Darren1 ~]# mysql -ukeepalived -p147258 -h192.168.91.100
keepalived@192.168.91.100 [(none)]>select user(),current_user();
+--------------------------+----------------+
| user() | current_user() |
+--------------------------+----------------+
| keepalived@192.168.91.23 | keepalived@% |
+--------------------------+----------------+
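Summary:
Cases in which the VIP moves:
(1) The m1 host goes down: the VIP moves to m2.
(2) MySQL on m1 goes down: the VIP moves to m2.
(3) The keepalived service on m1 goes down, which splits into two cases:
/etc/init.d/keepalived stop: normal failover.
kill -9 keepalived_pid: keepalived exits immediately without releasing the VIP, so both m1 and m2 hold the VIP, but only one of them actually serves connections.
(4) Split brain: m1 and m2 each believe they are in the MASTER state and fight over the VIP, so the VIP flips between m1 and m2.
Split brain does not occur when both hosts are on the same switch; it shows up in more complex network environments.
It can be guarded against in the check script: ping the gateway, and if even the gateway is unreachable, have the vrrp_script script return 1 so that keepalived enters the FAULT state.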
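The gateway guard described above could look roughly like this. It is a sketch under assumptions: it shells out to the Linux ping command (-c 1 for a single probe, -W for the timeout in seconds), and the function names are hypothetical, not part of the original checkMySQL.py.

```python
import os
import subprocess


def gateway_reachable(gateway_ip, timeout_s=2):
    """Send one ICMP echo with the system ping; True if it is answered."""
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), gateway_ip]
    try:
        with open(os.devnull, "w") as devnull:
            return subprocess.call(cmd, stdout=devnull, stderr=devnull) == 0
    except OSError:
        return False  # ping binary missing: treat the gateway as unreachable


def check_exit_code(mysql_ok, gateway_ok):
    """Exit code for vrrp_script: 0 keeps the VIP, 1 sends keepalived to FAULT."""
    return 0 if (mysql_ok and gateway_ok) else 1
```

In the check script this would be combined with the MySQL check, for example check_exit_code(mysql_ok, gateway_reachable("192.168.91.1")), where 192.168.91.1 is only a placeholder for the real gateway address. A node that is cut off from the network then gives up the VIP even though its own MySQL is healthy.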
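m1 went down and m2 took over the VIP. What to do once m1 is repaired?
(1) With GTID replication, simply run change master to on m1 pointing at m2; with traditional replication, you first need to find the binlog position.
(2) Wait for m1 to finish catching up.
(3) Start keepalived.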
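Step (2) can be checked before keepalived is started again. The sketch below assumes one row of SHOW SLAVE STATUS on m1 has already been fetched into a dict keyed by column name (for example with a MySQL-python DictCursor); replication_caught_up is a hypothetical helper, and Seconds_Behind_Master = 0 is only a heuristic for "caught up".

```python
def replication_caught_up(slave_status):
    """slave_status: one SHOW SLAVE STATUS row as a dict of column name -> value."""
    return (
        slave_status.get("Slave_IO_Running") == "Yes"
        and slave_status.get("Slave_SQL_Running") == "Yes"
        and slave_status.get("Seconds_Behind_Master") == 0
    )


# Example rows: still catching up, then fully caught up.
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
           "Seconds_Behind_Master": 42}
ready = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
         "Seconds_Behind_Master": 0}
print(replication_caught_up(lagging))
print(replication_caught_up(ready))
```

With GTID replication, comparing Retrieved_Gtid_Set with Executed_Gtid_Set from the same output would be a stricter test than the lag value.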
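Drawbacks of keepalived + MySQL dual-master and how to deal with them:
(1) Data consistency is hard to guarantee:
Enhanced semi-synchronous replication can help; raise rpl_semi_sync_master_timeout, the parameter that controls how long the master waits for the slave's acknowledgement.
If part of the master's log did not reach the slave in time, the missing binlog events have to be extracted by hand and applied to the slave; if the failed host itself is gone, the log can be recovered from a binlog server.
(2) The failed master has to be added back into the original topology manually.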