Topic: MySQL MHA Configuration and Failover Drill (MySQL 5.7.21)
OS: RHEL 7.9 64-bit
Database: MySQL 5.7.21
MHA packages: mha4mysql-node-0.58.tar.gz, mha4mysql-manager-0.58.tar.gz
I. Introduction to MySQL MHA
1. MHA Overview
MHA (Master High Availability) is a mature failover and master-slave replication solution for high-availability MySQL environments. It was created to address MySQL's single point of failure: when the master fails, MHA completes the failover automatically within 0-30 seconds, preserving data consistency as far as possible and thus delivering true high availability.

2. MHA Components
MHA Node (data node): runs on every MySQL server.
MHA Manager (management node): can be deployed on a dedicated machine to manage multiple master-slave clusters, or on one of the slave nodes.

Note: MHA Manager periodically probes the master node in the cluster. When the master fails, it automatically promotes the slave holding the most recent data to be the new master and repoints all other slaves at it. The entire failover is completely transparent to applications.

3. MHA Characteristics
During automatic failover, MHA tries to save the binary logs from the crashed master to minimize data loss. Using semi-synchronous replication reduces the risk further: as long as at least one slave has received the latest binary log, MHA can apply it to all the other slaves, keeping every node consistent. MHA currently supports one-master-multiple-slaves topologies and needs at least three servers, i.e., one master and two slaves.

4. How MHA Works
a. Save the binary log events (binlog events) from the crashed master;
b. Identify the slave with the most recent updates;
c. Apply the differential relay log to the other slaves;
d. Apply the binlog events saved from the master;
e. Promote one slave to be the new master;
f. Repoint the remaining slaves at the new master to resume replication.
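
Step (b) boils down to comparing each slave's received binlog coordinates (Master_Log_File / Read_Master_Log_Pos) and picking the furthest-ahead slave. A minimal shell sketch of that comparison, using hypothetical hosts and coordinates (an illustration only, not MHA's actual code):

```shell
#!/bin/sh
# Pick the slave that has received the most binlog from the dead master.
# Input lines on stdin: "<host> <master_log_file> <read_master_log_pos>"
pick_latest_slave() {
  # Sort by binlog file name, then numerically by position; the last line wins.
  sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}

# Hypothetical coordinates: slave .72 has read further than .73.
pick_latest_slave <<'EOF'
192.168.133.72 mysql-bin.000002 2442
192.168.133.73 mysql-bin.000002 1980
EOF
# → 192.168.133.72
```

MHA then applies the differential relay log from this slave to the others (steps c and d) before promoting it.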

II. Building MySQL MHA
1. Overall Plan
a. Database installation
Set up one master and two slaves, then build MHA on top.
b. Failure simulation
Simulate a master failure, let the candidate master take over, then restore the failed master and rejoin it to MHA as a slave.

2. Environment
Build the environment according to the following table:

Role             | Hostname    | OS          | IP Address     | Components
-----------------|-------------|-------------|----------------|--------------------------------
Master node      | MHA-Master  | RHEL 7.9 64 | 192.168.133.71 | MySQL 5.7.21, MHA Node
Slave1 node      | MHA-Slave1  | RHEL 7.9 64 | 192.168.133.72 | MySQL 5.7.21, MHA Node
Slave2 node      | MHA-Slave2  | RHEL 7.9 64 | 192.168.133.73 | MySQL 5.7.21, MHA Node
MHA-Manager node | MHA-Manager | RHEL 7.9 64 | 192.168.133.70 | MHA Node, MHA Manager

2.1 Disable the firewall
Note: run on all nodes.
[root@mha-manager ~]# systemctl stop firewalld
[root@mha-manager ~]# systemctl disable firewalld

2.2 Disable SELinux
Note: run on all nodes, then reboot for the change to take effect.
[root@mha-manager ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@mha-manager ~]# init 6

2.3 Install MySQL 5.7.21
After the reboot, install MySQL 5.7.21 on MHA-Master, MHA-Slave1, and MHA-Slave2.
Check the configuration file /home/mysql/etc/my.cnf on MHA-Master, MHA-Slave1, and MHA-Slave2:
[mysql@leo-master etc]$ cat /home/mysql/etc/my.cnf
......
server_id=1
......

Note: the key check is that the three nodes have different server_id values.
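
That uniqueness requirement can be spot-checked with a short script that extracts server_id from a set of my.cnf files and reports duplicates. A sketch, demonstrated against throwaway sample files (in this environment the real path is /home/mysql/etc/my.cnf on each node; check_server_ids is a hypothetical helper):

```shell
#!/bin/sh
# Report duplicate server_id values across the given MySQL config files.
check_server_ids() {
  dup=$(grep -h '^server_id' "$@" | awk -F= '{gsub(/ /, "", $2); print $2}' | sort | uniq -d)
  if [ -z "$dup" ]; then
    echo "server_id values are unique"
  else
    echo "duplicate server_id: $dup"
  fi
}

# Demo with three temporary config fragments (server_id 1, 2, 3):
tmp=$(mktemp -d)
for i in 1 2 3; do printf 'server_id=%s\n' "$i" > "$tmp/my$i.cnf"; done
check_server_ids "$tmp"/my*.cnf   # → server_id values are unique
rm -rf "$tmp"
```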

Create two symbolic links on each of MHA-Master, MHA-Slave1, and MHA-Slave2:
[root@mha-master ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin/
[root@mha-master ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/

[root@mha-slave1 ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin/
[root@mha-slave1 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/

[root@mha-slave2 ~]# ln -s /usr/local/mysql/bin/mysql /usr/sbin/
[root@mha-slave2 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/sbin/

3. Configure One Master and Two Slaves
3.1 Grants
Note: create and grant the users on every database node.
MHA-Master node
[mysql@mha-master ~]$ mysql -uroot -p
mysql> grant replication slave on *.* to 'repl'@'192.168.%' identified by 'repl@123';
mysql> grant all privileges on *.* to 'mha'@'192.168.%' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-master' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave1' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave2' identified by 'manager';
mysql> flush privileges;

MHA-Slave1 node
[mysql@leo-slave1 ~]$ mysql -uroot -p
mysql> grant replication slave on *.* to 'repl'@'192.168.%' identified by 'repl@123';
mysql> grant all privileges on *.* to 'mha'@'192.168.%' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-master' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave1' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave2' identified by 'manager';
mysql> flush privileges;

MHA-Slave2 node
[mysql@leo-slave2 ~]$ mysql -uroot -p
mysql> grant replication slave on *.* to 'repl'@'192.168.%' identified by 'repl@123';
mysql> grant all privileges on *.* to 'mha'@'192.168.%' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-master' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave1' identified by 'manager';
mysql> grant all privileges on *.* to 'mha'@'mha-slave2' identified by 'manager';
mysql> flush privileges;

3.2 Confirm the binlog coordinates
Check the current binary log file and position on the MHA-Master node.
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |     2442 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

3.3 Start replication on the slaves
Run the replication setup on the mha-slave1 and mha-slave2 nodes.
mysql> change master to master_host='192.168.133.71',master_user='repl',master_password='repl@123',master_log_file='mysql-bin.000002',master_log_pos=2442;
mysql> start slave;
mysql> set global relay_log_purge=0;
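
The file name and position in the CHANGE MASTER statement must match the SHOW MASTER STATUS output from 3.2 exactly. As a sketch, the statement can be generated from that output instead of typed by hand (the coordinates below are the ones from 3.2; build_change_master is a hypothetical helper, and the repl credentials are those granted in 3.1):

```shell
#!/bin/sh
# Build a CHANGE MASTER statement from "File Position" fields on stdin.
build_change_master() {
  # $1: master host; stdin first line: "<binlog file> <position>"
  read -r file pos rest
  printf "change master to master_host='%s',master_user='repl',master_password='repl@123',master_log_file='%s',master_log_pos=%s;\n" \
    "$1" "$file" "$pos"
}

# In practice the input line would come from `mysql -N -e 'show master status'`.
echo "mysql-bin.000002 2442" | build_change_master 192.168.133.71
```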

3.4 Verify replication
Confirm the replication state on the mha-slave1 and mha-slave2 nodes.
mysql> show slave status \G

#Make sure both Slave_IO_Running and Slave_SQL_Running are Yes, which indicates replication is healthy.
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
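
This two-line check is easy to automate, e.g. from a monitoring job. A sketch that parses SHOW SLAVE STATUS\G-style text on stdin (in practice the input would come from `mysql -e 'show slave status\G'`; replication_healthy is a hypothetical helper):

```shell
#!/bin/sh
# Exit 0 only when both replication threads report Yes.
replication_healthy() {
  yes_count=$(grep -cE 'Slave_(IO|SQL)_Running: Yes')
  [ "$yes_count" -eq 2 ]
}

replication_healthy <<'EOF' && echo "replication OK" || echo "replication BROKEN"
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
EOF
# → replication OK
```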

3.5 Enable read-only mode
Set read-only mode on the mha-slave1 and mha-slave2 nodes.
mysql> set global super_read_only=off;
mysql> set global read_only=on;

3.6 Replication test
Create a database on the mha-master node and insert some rows.
mysql> create database fruitsDB;
Query OK, 1 row affected (0.00 sec)

mysql> use fruitsDB;
Database changed
mysql> create table fruits
    -> (
    -> f_id char(10) not null,
    -> s_id int not null,
    -> f_name char(255) not null,
    -> f_price decimal(8,2) not null,
    -> primary key(f_id)
    -> ) ENGINE=innodb default charset=utf8 auto_increment=1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into fruits (f_id,s_id,f_name,f_price)
    -> values('a1',101,'apple',5.2),
    -> ('b1',101,'blackberry',10.2),
    -> ('bs1',102,'orange',11.2),
    -> ('bs2',105,'melon',8.2),
    -> ('t1',102,'banana',10.3),
    -> ('t2',102,'grape',5.3),
    -> ('o2',103,'coconut',9.2),
    -> ('c0',101,'cherry',3.2),
    -> ('a2',103,'apricot',2.2),
    -> ('l2',104,'lemon',6.4),
    -> ('b2',104,'berry',7.6),
    -> ('m1',106,'mango',15.7),
    -> ('m2',105,'xbabay',2.6),
    -> ('t4',107,'xbababa',3.6),
    -> ('m3',105,'xxtt',11.6),
    -> ('b5',107,'xxxx',3.6);
Query OK, 16 rows affected (0.07 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> select * from fruits;
+------+------+------------+---------+
| f_id | s_id | f_name     | f_price |
+------+------+------------+---------+
| a1   |  101 | apple      |    5.20 |
| a2   |  103 | apricot    |    2.20 |
| b1   |  101 | blackberry |   10.20 |
| b2   |  104 | berry      |    7.60 |
| b5   |  107 | xxxx       |    3.60 |
| bs1  |  102 | orange     |   11.20 |
| bs2  |  105 | melon      |    8.20 |
| c0   |  101 | cherry     |    3.20 |
| l2   |  104 | lemon      |    6.40 |
| m1   |  106 | mango      |   15.70 |
| m2   |  105 | xbabay     |    2.60 |
| m3   |  105 | xxtt       |   11.60 |
| o2   |  103 | coconut    |    9.20 |
| t1   |  102 | banana     |   10.30 |
| t2   |  102 | grape      |    5.30 |
| t4   |  107 | xbababa    |    3.60 |
+------+------+------------+---------+
16 rows in set (0.00 sec)

Check on the mha-slave1 node
mysql> use fruitsDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from fruits;
+------+------+------------+---------+
| f_id | s_id | f_name     | f_price |
+------+------+------------+---------+
| a1   |  101 | apple      |    5.20 |
| a2   |  103 | apricot    |    2.20 |
| b1   |  101 | blackberry |   10.20 |
| b2   |  104 | berry      |    7.60 |
| b5   |  107 | xxxx       |    3.60 |
| bs1  |  102 | orange     |   11.20 |
| bs2  |  105 | melon      |    8.20 |
| c0   |  101 | cherry     |    3.20 |
| l2   |  104 | lemon      |    6.40 |
| m1   |  106 | mango      |   15.70 |
| m2   |  105 | xbabay     |    2.60 |
| m3   |  105 | xxtt       |   11.60 |
| o2   |  103 | coconut    |    9.20 |
| t1   |  102 | banana     |   10.30 |
| t2   |  102 | grape      |    5.30 |
| t4   |  107 | xbababa    |    3.60 |
+------+------+------------+---------+
16 rows in set (0.00 sec)

Check on the mha-slave2 node
mysql> use fruitsDB
mysql> select count(*) from fruits;
+----------+
| count(*) |
+----------+
|       16 |
+----------+
1 row in set (0.00 sec)

4. Install the MHA Software
4.1 Install dependencies
Install MHA's dependencies on all servers. First set up the EPEL repository:
# wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
# yum makecache
# yum install epel-release --nogpgcheck -y

Note: the EPEL repository can alternatively be installed with the following commands.
# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# rpm -ivh epel-release-latest-7.noarch.rpm

[root@mha-manager ~]# cd /etc/yum.repos.d
[root@mha-manager yum.repos.d]# mkdir bak
[root@mha-manager yum.repos.d]# mv *.repo bak
[root@mha-manager yum.repos.d]# mv epel.repo.rpmnew bak
sftp> lcd F:\package
sftp> pwd
/root
sftp> put perl-Config-Tiny-2.14-7.el7.noarch.rpm
sftp> put perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm

sftp> lcd F:\package\mha-pack
sftp> put perl-Email-Date-Format-1.002-15.el7.noarch.rpm
sftp> put perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
sftp> put perl-MIME-Types-1.38-2.el7.noarch.rpm
sftp> put perl-MIME-Lite-3.030-1.el7.noarch.rpm
sftp> put perl-Sys-Syslog-0.33-3.el7.x86_64.rpm
sftp> put perl-Params-Validate-1.08-4.el7.x86_64.rpm
sftp> put perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
sftp> put perl-MailTools-2.12-2.el7.noarch.rpm
sftp> put perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
sftp> put perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
[root@mha-manager ~]# mount /dev/sr0 /mnt
[root@mha-manager ~]# cat <<EOF>>/etc/yum.repos.d/local.repo
[local]
name=local
baseurl=file:///mnt
gpgcheck=0
enabled=1
EOF

[root@mha-manager ~]# yum makecache
[root@mha-manager ~]# yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-IO-Socket-SSL \
perl-TimeDate \
perl-Net-SMTP-SSL \
perl-Class-Load \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

[root@mha-manager ~]# yum -y install perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-Config-Tiny-2.14-7.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-Email-Date-Format-1.002-15.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-MIME-Types-1.38-2.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-MIME-Lite-3.030-1.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-Sys-Syslog-0.33-3.el7.x86_64.rpm
[root@mha-manager ~]# yum -y install perl-Params-Validate-1.08-4.el7.x86_64.rpm
[root@mha-manager ~]# rpm -ivh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-MailTools-2.12-2.el7.noarch.rpm
[root@mha-manager ~]# rpm -ivh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm

[root@mha-manager yum.repos.d]# rpm -q perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN | grep "not installed"

4.2 Install the MHA packages
Note: the appropriate MHA version depends on the OS release; for RHEL 7.9 use 0.58. The node component must be installed on every server first, and the manager component last, on the MHA-Manager node only, because manager depends on node.
sftp> lcd F:\package\mha-pack
sftp> pwd
/root
sftp> put mha4mysql-node-0.58.tar.gz
sftp> put mha4mysql-manager-0.58.tar.gz

[root@mha-manager ~]# mkdir -p /opt
[root@mha-manager ~]# mv mha4mysql-node-0.58.tar.gz mha4mysql-manager-0.58.tar.gz /opt
[root@mha-manager ~]# cd /opt
[root@mha-manager opt]# tar -zxf mha4mysql-node-0.58.tar.gz
[root@mha-manager opt]# cd mha4mysql-node-0.58
[root@mha-manager mha4mysql-node-0.58]# perl Makefile.PL
[root@mha-manager mha4mysql-node-0.58]# make && make install

4.3 Install the manager component
Note: this component is installed only on the MHA manager node.
[root@mha-manager ~]# cd /opt
[root@mha-manager opt]# tar -zxf mha4mysql-manager-0.58.tar.gz
[root@mha-manager opt]# cd mha4mysql-manager-0.58
[root@mha-manager mha4mysql-manager-0.58]# perl Makefile.PL
[root@mha-manager mha4mysql-manager-0.58]# make && make install

Note: after installation, the manager component places several tools under /usr/local/bin, mainly the following:

[root@mha-manager mha4mysql-manager-0.58]# cd /usr/local/bin
[root@mha-manager bin]# ls -tlr
total 88
-r-xr-xr-x 1 root root  7525 Jun 29 15:43 save_binary_logs
-r-xr-xr-x 1 root root  8337 Jun 29 15:43 purge_relay_logs
-r-xr-xr-x 1 root root  4807 Jun 29 15:43 filter_mysqlbinlog
-r-xr-xr-x 1 root root 17639 Jun 29 15:43 apply_diff_relay_logs
-r-xr-xr-x 1 root root  1739 Jun 29 16:09 masterha_stop
-r-xr-xr-x 1 root root  5172 Jun 29 16:09 masterha_secondary_check
-r-xr-xr-x 1 root root  2373 Jun 29 16:09 masterha_master_switch
-r-xr-xr-x 1 root root  2165 Jun 29 16:09 masterha_master_monitor
-r-xr-xr-x 1 root root  2517 Jun 29 16:09 masterha_manager
-r-xr-xr-x 1 root root  3201 Jun 29 16:09 masterha_conf_host
-r-xr-xr-x 1 root root  1865 Jun 29 16:09 masterha_check_status
-r-xr-xr-x 1 root root  1779 Jun 29 16:09 masterha_check_ssh
-r-xr-xr-x 1 root root  1995 Jun 29 16:09 masterha_check_repl

masterha_check_ssh      check MHA's SSH configuration
masterha_check_repl     check the MySQL replication status
masterha_manager        start the manager
masterha_check_status   check the current MHA running status
masterha_master_monitor detect whether the master is down
masterha_master_switch  control failover (automatic or manual)
masterha_conf_host      add or remove configured server entries
masterha_stop           stop the manager

The node component also installs several scripts under /usr/local/bin (these are normally triggered by MHA Manager scripts and need no manual intervention), mainly:
save_binary_logs        save and copy the master's binary logs
apply_diff_relay_logs   identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog      strip unnecessary ROLLBACK events (no longer used by MHA)
purge_relay_logs        purge relay logs (without blocking the SQL thread)
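
Because relay_log_purge=0 was set on the slaves in 3.3, relay logs are never purged automatically, so purge_relay_logs is normally run from cron on each slave. A sketch of such a cron entry (the schedule, log path, and credentials are assumptions to adapt; the tool's options are those documented for mha4mysql-node):

```shell
# /etc/cron.d/purge_relay_logs -- a sketch; install on each slave node.
# Stagger the minute per node so the slaves do not all purge at the same time.
0 4 * * * root /usr/local/bin/purge_relay_logs --user=mha --password=manager --disable_relay_log_purge --workdir=/tmp >> /var/log/purge_relay_logs.log 2>&1
```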

5. Configure SSH Trust
Note: configure passwordless (key-based) authentication between all servers.
5.1 mha-master to mha-slave1, mha-slave2, and mha-manager
Configure passwordless authentication from mha-master to the other nodes.
[root@mha-master ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):    -- press Enter
Enter passphrase (empty for no passphrase):                 -- press Enter
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Uj87SgQG+iPafSdQj8Eneenc7xb50yQ4XucIy9GdQ5w root@mha-master
The key's randomart image is:
+---[RSA 2048]----+
|    .            |
|   . o . .       |
|  .   O =     . .|
|   . o % o     E |
|  . + o S +  +...|
| o o o o   +B +o+|
|. . . o o oo.O B.|
|     . + . o= + o|
|        .  ..  . |
+----[SHA256]-----+

[root@mha-master ~]# ssh-copy-id 192.168.133.70
[root@mha-master ~]# ssh-copy-id 192.168.133.71
[root@mha-master ~]# ssh-copy-id 192.168.133.72
[root@mha-master ~]# ssh-copy-id 192.168.133.73

Trust test
[root@mha-master ~]# ssh root@192.168.133.72
Last login: Sat Jun 29 10:36:47 2024 from 192.168.133.1
[root@mha-slave1 ~]# exit
logout
Connection to 192.168.133.72 closed.
[root@mha-master ~]# ssh root@192.168.133.73
Last login: Sat Jun 29 08:40:55 2024 from 192.168.133.1
[root@mha-master ~]# ssh root@192.168.133.70
Last login: Sun Jun 30 10:38:13 2024 from 192.168.133.73

5.2 mha-slave1 to mha-master, mha-slave2, and mha-manager
Note: configure passwordless authentication from mha-slave1 to mha-master, mha-slave2, and mha-manager.
[root@mha-slave1 ~]# ssh-keygen -t rsa
[root@mha-slave1 ~]# ssh-copy-id 192.168.133.70
[root@mha-slave1 ~]# ssh-copy-id 192.168.133.71
[root@mha-slave1 ~]# ssh-copy-id 192.168.133.72
[root@mha-slave1 ~]# ssh-copy-id 192.168.133.73
Trust test
[root@mha-slave1 ~]# ssh root@192.168.133.71
Last login: Sat Jun 29 17:03:36 2024 from 192.168.133.1
[root@mha-master ~]# exit
logout
Connection to 192.168.133.71 closed.
[root@mha-slave1 ~]# ssh root@192.168.133.73
Last login: Sat Jun 29 17:09:04 2024 from 192.168.133.71
[root@mha-slave2 ~]# exit
logout
[root@mha-slave1 ~]# ssh root@192.168.133.70
Last login: Sun Jun 30 10:36:56 2024 from 192.168.133.71
[root@mha-manager ~]# exit
logout

5.3 mha-slave2 to mha-master, mha-slave1, and mha-manager
Configure passwordless authentication from mha-slave2 to mha-master, mha-slave1, and mha-manager.
[root@mha-slave2 ~]# ssh-keygen -t rsa
[root@mha-slave2 ~]# ssh-copy-id 192.168.133.70
[root@mha-slave2 ~]# ssh-copy-id 192.168.133.71
[root@mha-slave2 ~]# ssh-copy-id 192.168.133.72
[root@mha-slave2 ~]# ssh-copy-id 192.168.133.73

Trust test
[root@mha-slave2 ~]# ssh root@192.168.133.70
Last login: Sun Jun 30 10:37:59 2024 from 192.168.133.73
[root@mha-manager ~]# exit
logout
[root@mha-slave2 ~]# ssh root@192.168.133.71
Last login: Sat Jun 29 17:13:03 2024 from 192.168.133.72
[root@mha-master ~]# exit
logout
Connection to 192.168.133.71 closed.
[root@mha-slave2 ~]# ssh root@192.168.133.72
Last login: Sat Jun 29 17:07:45 2024 from 192.168.133.71
[root@mha-slave1 ~]#

5.4 mha-manager to mha-master, mha-slave1, and mha-slave2
Note: configure passwordless authentication from the mha-manager node to all database nodes.
[root@mha-manager ~]# ssh-keygen -t rsa
[root@mha-manager ~]# ssh-copy-id 192.168.133.70
[root@mha-manager ~]# ssh-copy-id 192.168.133.71
[root@mha-manager ~]# ssh-copy-id 192.168.133.72
[root@mha-manager ~]# ssh-copy-id 192.168.133.73
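
The sixteen ssh-copy-id invocations in this section (four per node) can be wrapped in a loop. A sketch with a DRY_RUN switch so the commands can be previewed before any host is touched (the host list comes from the environment table in section II; distribute_key is a hypothetical helper):

```shell
#!/bin/sh
# Push the local root public key to every node in the cluster.
# Set DRY_RUN=1 to print the commands instead of executing them.
distribute_key() {
  for ip in 192.168.133.70 192.168.133.71 192.168.133.72 192.168.133.73; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id root@$ip"
    else
      ssh-copy-id "root@$ip"
    fi
  done
}

DRY_RUN=1 distribute_key   # preview: prints the four ssh-copy-id commands
```

Run it on each node after generating that node's key with ssh-keygen; without DRY_RUN it still prompts once per host for the root password.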

Trust test
[root@mha-manager ~]# ssh root@192.168.133.71
Last login: Sat Jun 29 17:19:45 2024 from 192.168.133.73
[root@mha-master ~]# exit
logout
Connection to 192.168.133.71 closed.
[root@mha-manager ~]# ssh root@192.168.133.72
Last login: Sat Jun 29 17:19:50 2024 from 192.168.133.73
[root@mha-slave1 ~]# exit
logout
Connection to 192.168.133.72 closed.
[root@mha-manager ~]# ssh root@192.168.133.73
Last login: Sat Jun 29 17:13:11 2024 from 192.168.133.72
[root@mha-slave2 ~]#

6. Configure MHA on the Manager Node
6.1 Edit the master_ip_failover script
Note: copy the sample scripts to /usr/local/bin on the manager node.
[root@mha-manager ~]# cp -rp /opt/mha4mysql-manager-0.58/samples/scripts /usr/local/bin
[root@mha-manager ~]# ls -ltr /usr/local/bin/scripts/
total 32
-rwxr-xr-x 1 liujun liujun  1360 Mar 23  2018 send_report
-rwxr-xr-x 1 liujun liujun 11867 Mar 23  2018 power_manager
-rwxr-xr-x 1 liujun liujun  9870 Mar 23  2018 master_ip_online_change
-rwxr-xr-x 1 liujun liujun  3648 Mar 23  2018 master_ip_failover

Notes:
master_ip_failover          # VIP management script for automatic failover
master_ip_online_change     # VIP management script for online (manual) switchover
power_manager               # script to power off the failed host after a failure
send_report                 # script to send an alert after a failover

Copy the automatic-failover VIP management script to /usr/local/bin; this walkthrough uses master_ip_failover to manage the VIP during failover.
[root@mha-manager scripts]# cp /usr/local/bin/scripts/master_ip_failover /usr/local/bin
[root@mha-manager ~]# cd /usr/local/bin
[root@mha-manager bin]# pwd
/usr/local/bin
[root@mha-manager bin]# cp master_ip_failover master_ip_failover.bak
[root@mha-manager bin]# vi master_ip_failover
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '192.168.133.75/24';
my $key = '88';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Note: the only change made to the sample script above is the VIP-related settings.

6.2 Edit the app1.cnf file
Note: create the MHA configuration directory and copy the sample configuration file; app1.cnf describes the MySQL node servers to be managed.
[root@mha-manager ~]# mkdir -p /etc/masterha
[root@mha-manager ~]# cp /opt/mha4mysql-manager-0.58/samples/conf/app1.cnf /etc/masterha
[root@mha-manager ~]# cp /usr/local/bin/scripts/master_ip_online_change /usr/local/bin
[root@mha-manager ~]# cd /etc/masterha
[root@mha-manager masterha]# ls
app1.cnf
[root@mha-manager masterha]# cp app1.cnf app1.cnf.bak
[root@mha-manager masterha]# vi app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
user=mha
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=repl@123
repl_user=repl
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.133.72 -s 192.168.133.73
shutdown_script=""
ssh_user=root

[server1]
hostname=192.168.133.71
port=3306
 
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.133.72
port=3306
 
[server3]
hostname=192.168.133.73
port=3306

[root@mha-manager bin]# mkdir -p /var/log/masterha/app1/

Notes:
manager_workdir=/var/log/masterha/app1                         manager working directory
manager_log=/var/log/masterha/app1/manager.log                 manager log file
master_binlog_dir=/mysql/binlog                                where the master stores its binlogs, so MHA can find them
master_ip_failover_script=/usr/local/bin/master_ip_failover    switchover script used during automatic failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change  switchover script used during manual switchover
user=mha                                                       MySQL monitoring user; must be able to log in remotely from the manager node
password=manager                                               password of the monitoring user
ping_interval=1                                                interval in seconds between pings to the master (default 3); failover is triggered after three missed responses
remote_workdir=/tmp                                            where binlogs are saved on the remote MySQL hosts during a switchover
repl_user=repl                                                 MySQL replication user
repl_password=repl@123                                         password of the replication user
report_script=/usr/local/send_report                           alert script invoked after a switchover
shutdown_script=""                                             script to power off the failed host after a failure (mainly to prevent split-brain; not used here)
ssh_user=root                                                  SSH login user
candidate_master=1                                             set in a server section; marks that node as a candidate master
check_repl_delay=0                                             set in a server section; by default MHA will not pick a slave as the new master if it lags the master by more than 100 MB of relay logs, and this option disables that check, which is useful for hosts with candidate_master=1

7. Bring Up the Virtual IP
Note: on first setup, the virtual IP must be brought up manually on the mha-master node.
[root@mha-master ~]# /sbin/ifconfig ens33:88 192.168.133.75/24
[root@mha-master ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.133.71  netmask 255.255.255.0  broadcast 192.168.133.255
        inet6 fe80::393:63fb:d67:6681  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:34:1b:67  txqueuelen 1000  (Ethernet)
        RX packets 348575  bytes 496406237 (473.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 39891  bytes 5098358 (4.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:88: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.133.75  netmask 255.255.255.0  broadcast 192.168.133.255
        ether 00:50:56:34:1b:67  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 52  bytes 4416 (4.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52  bytes 4416 (4.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:7e:02:d3  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

8. Test Passwordless Authentication
Note: test SSH passwordless authentication from the mha-manager node; the run is healthy if the output ends with "successfully".
[root@mha-manager ~]# masterha_check_ssh -conf=/etc/masterha/app1.cnf
Sun Jun 30 10:38:20 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 30 10:38:20 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jun 30 10:38:20 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jun 30 10:38:20 2024 - [info] Starting SSH connection tests..
Sun Jun 30 10:38:21 2024 - [debug] 
Sun Jun 30 10:38:20 2024 - [debug]  Connecting via SSH from root@192.168.133.70(192.168.133.70:22) to root@192.168.133.71(192.168.133.71:22)..
Sun Jun 30 10:38:20 2024 - [debug]   ok.
Sun Jun 30 10:38:20 2024 - [debug]  Connecting via SSH from root@192.168.133.70(192.168.133.70:22) to root@192.168.133.72(192.168.133.72:22)..
Sun Jun 30 10:38:21 2024 - [debug]   ok.
Sun Jun 30 10:38:22 2024 - [debug] 
Sun Jun 30 10:38:21 2024 - [debug]  Connecting via SSH from root@192.168.133.71(192.168.133.71:22) to root@192.168.133.70(192.168.133.70:22)..
Sun Jun 30 10:38:21 2024 - [debug]   ok.
Sun Jun 30 10:38:21 2024 - [debug]  Connecting via SSH from root@192.168.133.71(192.168.133.71:22) to root@192.168.133.72(192.168.133.72:22)..
Sun Jun 30 10:38:21 2024 - [debug]   ok.
Sun Jun 30 10:38:23 2024 - [debug] 
Sun Jun 30 10:38:21 2024 - [debug]  Connecting via SSH from root@192.168.133.72(192.168.133.72:22) to root@192.168.133.70(192.168.133.70:22)..
Sun Jun 30 10:38:21 2024 - [debug]   ok.
Sun Jun 30 10:38:21 2024 - [debug]  Connecting via SSH from root@192.168.133.72(192.168.133.72:22) to root@192.168.133.71(192.168.133.71:22)..
Sun Jun 30 10:38:22 2024 - [debug]   ok.
Sun Jun 30 10:38:23 2024 - [info] All SSH connection tests passed successfully.

9. Test the Replication Topology
Check the MySQL master-slave replication status from the mha-manager node.
[root@mha-manager ~]# which masterha_check_repl
/usr/local/bin/masterha_check_repl
[root@mha-manager bin]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Sun Jun 30 12:49:30 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 30 12:49:30 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:49:30 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:49:30 2024 - [info] MHA::MasterMonitor version 0.58.
Sun Jun 30 12:49:31 2024 - [info] GTID failover mode = 0
Sun Jun 30 12:49:31 2024 - [info] Dead Servers:
Sun Jun 30 12:49:31 2024 - [info] Alive Servers:
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.72(192.168.133.72:3306)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.73(192.168.133.73:3306)
Sun Jun 30 12:49:31 2024 - [info] Alive Slaves:
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.72(192.168.133.72:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:49:31 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.73(192.168.133.73:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:49:31 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info] Current Alive Master: 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info] Checking slave configurations..
Sun Jun 30 12:49:31 2024 - [info] Checking replication filtering settings..
Sun Jun 30 12:49:31 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jun 30 12:49:31 2024 - [info]  Replication filtering check ok.
Sun Jun 30 12:49:31 2024 - [info] GTID (with auto-pos) is not supported
Sun Jun 30 12:49:31 2024 - [info] Starting SSH connection tests..
Sun Jun 30 12:49:33 2024 - [info] All SSH connection tests passed successfully.
Sun Jun 30 12:49:33 2024 - [info] Checking MHA Node version..
Sun Jun 30 12:49:34 2024 - [info]  Version check ok.
Sun Jun 30 12:49:34 2024 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun 30 12:49:34 2024 - [info] HealthCheck: SSH to 192.168.133.71 is reachable.
Sun Jun 30 12:49:34 2024 - [info] Master MHA Node version is 0.58.
Sun Jun 30 12:49:34 2024 - [info] Checking recovery script configurations on 192.168.133.71(192.168.133.71:3306)..
Sun Jun 30 12:49:34 2024 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000002 
Sun Jun 30 12:49:34 2024 - [info]   Connecting to root@192.168.133.71(192.168.133.71:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /mysql/binlog, up to mysql-bin.000002
Sun Jun 30 12:49:34 2024 - [info] Binlog setting check done.
Sun Jun 30 12:49:34 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun 30 12:49:34 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.72 --slave_ip=192.168.133.72 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:49:34 2024 - [info]   Connecting to root@192.168.133.72(192.168.133.72:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:49:35 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.73 --slave_ip=192.168.133.73 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:49:35 2024 - [info]   Connecting to root@192.168.133.73(192.168.133.73:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:49:35 2024 - [info] Slaves settings check done.
Sun Jun 30 12:49:35 2024 - [info] 
192.168.133.71(192.168.133.71:3306) (current master)
 +--192.168.133.72(192.168.133.72:3306)
 +--192.168.133.73(192.168.133.73:3306)

Sun Jun 30 12:49:35 2024 - [info] Checking replication health on 192.168.133.72..
Sun Jun 30 12:49:35 2024 - [info]  ok.
Sun Jun 30 12:49:35 2024 - [info] Checking replication health on 192.168.133.73..
Sun Jun 30 12:49:35 2024 - [info]  ok.
Sun Jun 30 12:49:35 2024 - [info] Checking master_ip_failover_script status:
Sun Jun 30 12:49:35 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.133.71 --orig_master_ip=192.168.133.71 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:88 down==/sbin/ifconfig ens33:88 192.168.133.75/24===

Checking the Status of the script.. OK 
Sun Jun 30 12:49:35 2024 - [info]  OK.
Sun Jun 30 12:49:35 2024 - [warning] shutdown_script is not defined.
Sun Jun 30 12:49:35 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

10、Start MHA
Start MHA on the manager node.
[root@mha-manager ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Notes:
--remove_dead_master_conf: after a master failover, the old master's entry is removed from the configuration file.
manager.log: the manager's working log; the redirection above sends all masterha_manager output to this file.
--ignore_last_failover: by default, if MHA detects two outages less than 8 hours apart, it refuses to fail over again; this limit exists to avoid ping-pong failovers. After each failover MHA writes a marker file, app1.failover.complete, under the manager workdir, and a later failover is blocked while that file exists unless it is deleted first. --ignore_last_failover tells MHA to ignore the marker left by the previous failover, which is convenient for repeated drills.
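Because the 8-hour guard is driven by the app1.failover.complete marker file, an alternative during drills is to delete the marker before starting the manager instead of passing --ignore_last_failover. A minimal sketch; the workdir path is an assumption matching manager_workdir in app1.cnf:

```shell
# Sketch: remove a stale failover marker so the next failover is not blocked.
# Path assumption: manager_workdir=/var/log/masterha/app1 as in app1.cnf.
clear_failover_marker() {
  marker="$1/app1.failover.complete"
  if [ -f "$marker" ]; then
    rm -f "$marker"
    echo "removed $marker"
  else
    echo "no marker under $1"
  fi
}

clear_failover_marker /var/log/masterha/app1
```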

11、Check MHA status
Note: the current master is the mha-master node, IP 192.168.133.71.
[root@mha-manager ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:32233) is running(0:PING_OK), master:192.168.133.71
Check the MHA log
[root@mha-manager ~]# cat /var/log/masterha/app1/manager.log | grep "current master"
Sun Jun 30 14:01:09 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.133.71(192.168.133.71:3306) (current master)
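For monitoring scripts, the one-line masterha_check_status output is easy to classify. A hedged sketch; the sample strings below are the exact outputs shown in this document:

```shell
# Sketch: classify masterha_check_status output for a monitoring check.
parse_mha_status() {
  case "$1" in
    *PING_OK*)     echo "healthy master=${1##*master:}" ;;
    *NOT_RUNNING*) echo "stopped" ;;
    *)             echo "unknown" ;;
  esac
}

parse_mha_status "app1 (pid:32233) is running(0:PING_OK), master:192.168.133.71"
parse_mha_status "app1 is stopped(2:NOT_RUNNING)."
```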

三、Simulating a failure
1、Monitor the log
Open a live tail of the log on the manager node.
[root@mha-manager ~]# tail -500f /var/log/masterha/app1/manager.log

2、Stop MySQL on the master
Stop the mysql service on the mha-master node.
[mysql@mha-master ~]$ ps -ef | grep mysql
mysql      2586      1  0 Jun29 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/home/mysql/etc/my.cnf
mysql      2831   2586  0 Jun29 ?        00:00:57 /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/etc/my.cnf --basedir=/usr/local/mysql --datadir=/mysql/data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --log-error=/mysql/data/mysql_error.log --pid-file=/mysql/data/mysql.pid --socket=/mysql/data/mysql.sock --port=3306
root      23271  20031  0 14:10 pts/2    00:00:00 su - mysql
mysql     23272  23271  0 14:10 pts/2    00:00:00 -bash
mysql     23356  23272  0 14:11 pts/2    00:00:00 ps -ef
mysql     23357  23272  0 14:11 pts/2    00:00:00 grep --color=auto mysql

mysql> shutdown;
Query OK, 0 rows affected (0.00 sec)

[mysql@mha-master ~]$ ps -ef | grep mysql
root      23271  20031  0 14:10 pts/2    00:00:00 su - mysql
mysql     23272  23271  0 14:10 pts/2    00:00:00 -bash
mysql     23406  23272  0 14:11 pts/2    00:00:00 ps -ef
mysql     23407  23272  0 14:11 pts/2    00:00:00 grep --color=auto mysql
Manager log output:
----- Failover Report -----

app1: MySQL Master failover 192.168.133.71(192.168.133.71:3306) to 192.168.133.72(192.168.133.72:3306) succeeded

Master 192.168.133.71(192.168.133.71:3306) is down!

Check MHA Manager logs at mha-manager:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.133.71(192.168.133.71:3306)
The latest slave 192.168.133.72(192.168.133.72:3306) has all relay logs for recovery.
Selected 192.168.133.72(192.168.133.72:3306) as a new master.
192.168.133.72(192.168.133.72:3306): OK: Applying all logs succeeded.
192.168.133.72(192.168.133.72:3306): OK: Activated master IP address.
192.168.133.73(192.168.133.73:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.133.73(192.168.133.73:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.133.72(192.168.133.72:3306)
192.168.133.72(192.168.133.72:3306): Resetting slave info succeeded.
Master failover to 192.168.133.72(192.168.133.72:3306) completed successfully.
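When scripting such drills, the outcome can be read from the tail of the Failover Report above. A small sketch that checks for the verdict line, reading the report from stdin:

```shell
# Sketch: report success/failure from an MHA Failover Report piped on stdin.
failover_verdict() {
  if grep -q 'completed successfully'; then
    echo "failover succeeded"
  else
    echo "failover failed or still running"
  fi
}

printf 'Master failover to 192.168.133.72(192.168.133.72:3306) completed successfully.\n' | failover_verdict
```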

3、Post-failover state
After one successful failover, the MHA manager process exits and MHA automatically rewrites app1.cnf to remove the failed mha-master node. Check whether mha-slave1 has taken over the VIP; as shown below, it has.
[root@mha-slave1 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.133.72  netmask 255.255.255.0  broadcast 192.168.133.255
        inet6 fe80::93bb:eb31:11db:a282  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:31:78:1a  txqueuelen 1000  (Ethernet)
        RX packets 338783  bytes 496285559 (473.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 37194  bytes 4951957 (4.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens33:88: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.133.75  netmask 255.255.255.0  broadcast 192.168.133.255
        ether 00:50:56:31:78:1a  txqueuelen 1000  (Ethernet)

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 233  bytes 35692 (34.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 233  bytes 35692 (34.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:7e:02:d3  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

IP state on the original master, mha-master:
[mysql@mha-master ~]$ ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.133.71  netmask 255.255.255.0  broadcast 192.168.133.255
        inet6 fe80::db2d:d44b:2af8:51e5  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:2d:ed:da  txqueuelen 1000  (Ethernet)
        RX packets 342295  bytes 496541160 (473.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 37479  bytes 5001164 (4.7 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 796  bytes 92024 (89.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 796  bytes 92024 (89.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:7e:02:d3  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Note: the VIP has floated to the mha-slave1 node.
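The VIP check can be scripted by searching an interface dump for the VIP address (192.168.133.75 in this setup). A sketch; the function reads ifconfig output from stdin:

```shell
# Sketch: does the interface dump on stdin contain the VIP?
holds_vip() {
  grep -qF "inet $1 " && echo "VIP $1 is on this host" || echo "VIP $1 not here"
}

ifconfig 2>/dev/null | holds_vip 192.168.133.75
```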

Current contents of app1.cnf:
[root@mha-manager masterha]# cat app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=repl@123
repl_user=repl
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.133.72 -s 192.168.133.73
shutdown_script=""
ssh_user=root
user=mha

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.133.72
port=3306

[server3]
hostname=192.168.133.73
port=3306

Note: as shown above, the [server1] section has indeed been removed.

MHA status after the failover:
[root@mha-manager masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

4、How the candidate master is chosen
a、Slaves are normally ranked by binlog position/GTID; if their data differs, the slave closest to the master becomes the candidate.
b、If the data is identical, the candidate is chosen by the order of the entries in the configuration file.
c、If a weight is set (candidate_master=1), that slave is forced to be the candidate.
By default, though, the weight is ignored if the slave is more than 100MB of relay logs behind the master.
With check_repl_delay=0, the slave is forced to be the candidate even if it lags far behind.
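A toy sketch of that ordering, not MHA's actual implementation; the inputs are hypothetical host:position:candidate_master:lag_mb tuples, and the 100MB rule models the default behavior (MHA skips that check when check_repl_delay=0, which this toy does not model):

```shell
# Toy sketch of MHA candidate selection: a candidate_master=1 slave wins unless
# it is more than 100MB of relay logs behind; otherwise the most advanced
# slave (highest read position) wins.
pick_new_master() {
  best="" best_pos=-1
  for s in "$@"; do
    host=${s%%:*}; rest=${s#*:}
    pos=${rest%%:*}; rest=${rest#*:}
    cand=${rest%%:*}; lag=${rest#*:}
    if [ "$cand" = "1" ] && [ "$lag" -le 100 ]; then
      echo "$host"; return 0
    fi
    if [ "$pos" -gt "$best_pos" ]; then
      best="$host"; best_pos="$pos"
    fi
  done
  echo "$best"
}

pick_new_master "192.168.133.72:1992:1:0" "192.168.133.73:1992:0:0"
```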

四、Recovering from the failure
1、Restart MySQL on mha-master
[mysql@mha-master ~]$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/home/mysql/etc/my.cnf &
[mysql@mha-master ~]$ ps -ef | grep mysql
root      23271  20031  0 14:10 pts/2    00:00:00 su - mysql
mysql     23272  23271  0 14:10 pts/2    00:00:00 -bash
mysql     23504  23272  0 14:21 pts/2    00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/home/mysql/etc/my.cnf
mysql     23748  23504 11 14:21 pts/2    00:00:01 /usr/local/mysql/bin/mysqld --defaults-file=/home/mysql/etc/my.cnf --basedir=/usr/local/mysql --datadir=/mysql/data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --log-error=/mysql/data/mysql_error.log --pid-file=/mysql/data/mysql.pid --socket=/mysql/data/mysql.sock --port=3306
mysql     23777  23272  0 14:21 pts/2    00:00:00 ps -ef
mysql     23778  23272  0 14:21 pts/2    00:00:00 grep --color=auto mysql

2、Rebuild replication
On the current master, mha-slave1, check the binary log file and position.
mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |     1992 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
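The File/Position pair above is exactly what the resync step needs; a sketch that composes the CHANGE MASTER statement from those coordinates (the repl credentials are the repl_user/repl_password values assumed from app1.cnf):

```shell
# Sketch: compose the CHANGE MASTER statement from the new master's coordinates.
# Credentials assumption: repl_user/repl_password as configured in app1.cnf.
build_change_master() {
  printf "change master to master_host='%s',master_user='repl',master_password='repl@123',master_log_file='%s',master_log_pos=%s;\n" \
    "$1" "$2" "$3"
}

build_change_master 192.168.133.72 mysql-bin.000002 1992
```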

3、Resynchronize
Point the original master, mha-master, at the new master and start replication.
mysql> change master to master_host='192.168.133.72',master_user='repl',master_password='repl@123',master_log_file='mysql-bin.000002',master_log_pos=1992;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.133.72
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1992
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1992
              Relay_Log_Space: 521
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: 2ac7d8a6-35b6-11ef-9b6e-00505627516f
             Master_Info_File: /mysql/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

4、Update app1.cnf
On the manager node, add the recovered node back to app1.cnf as [server1]; the file should then look like this:
[root@mha-manager masterha]# cat app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/mysql/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=manager
ping_interval=1
remote_workdir=/tmp
repl_password=repl@123
repl_user=repl
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 192.168.133.71 -s 192.168.133.73
shutdown_script=""
ssh_user=root
user=mha
[server1]
hostname=192.168.133.71
port=3306

[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.133.72
port=3306

[server3]
hostname=192.168.133.73
port=3306

5、Restart MHA
Restart MHA on the manager node.
[root@mha-manager masterha]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

Check MHA status
[root@mha-manager masterha]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:33248) is running(0:PING_OK), master:192.168.133.72

To stop the MHA manager:
# masterha_stop --conf=/etc/masterha/app1.cnf
