主库宕机后系统发邮件提醒要再Manager所在的机器进行配置,本例在db03中
1上传发邮件脚本给脚本增加执行权限并测试发邮件
[root@db03 ~]# cd /usr/local/bin #上传发邮件脚本
[root@db03 bin]# chmod +x * #给脚本增加可执行权限
[root@db03 bin]# ll
总用量 92
-rwxr-xr-x. 1 root root 2226 3月 5 22:34 master_ip_failover
-rwxr-xr-x. 1 root root 35 12月 27 2017 send
-rwxr-xr-x. 1 root root 80213 9月 30 2009 sendEmail
-rwxr-xr-x. 1 root root 203 4月 19 2019 testpl
[root@db03 bin]# vim testpl #配置自己的发件箱smtp和收件箱
[root@db03 bin]# ./testpl #测试发邮件
发送邮件脚本下载
链接:https://pan.baidu.com/s/11hQa39LsRScn3O6xVPxPdQ
提取码:abcd
2将发邮件脚本路径写入到Manager配置文件中
[root@db03 bin]# vim /etc/mha/appl.cnf
[root@db03 bin]# cat /etc/mha/appl.cnf
[server default]
manager_log=/var/log/mha/appl/manager
manager_workdir=/var/log/mha/appl
master_binlog_dir=/data/binlog
password=mha
ping_interval=2
repl_password=123
repl_user=repl
ssh_user=root
user=mha
master_ip_failover_script=/usr/local/bin/master_ip_failover
report_script=/usr/local/bin/send
3重启MHA服务
[root@db03 bin]# masterha_stop --conf=/etc/mha/appl.cnf< /dev/null > /var/log/mha/appl/manager.log 2>&1nohup masterha_manager --conf=/etc/mha/appl.cnf --remove_dead_master_conf --ignore_last_failover </dev/null> /var/log/mha/appl/manager.log 2>&1 &
[1] 1245
4在db03中进行Node节点的ssh连接状态检查和复制环境检查和MHA状态检查并查看主机是哪台:
masterha_check_ssh --conf=/etc/mha/appl.cnf #检查互信状态
masterha_check_repl --conf=/etc/mha/appl.cnf #检查主从状态
[root@db03 tools]# masterha_check_status --conf=/etc/mha/appl.cnf #查看状态查看主库
5停主库测试邮件通知
[root@db02 ~]# /etc/init.d/mysqld stop
Shutting down MySQL............. SUCCESS!
[root@db02 ~]#
6检查db01后发现vip已经飘过来了,然后登录到数据库查看一下是否自动变为主库了:
[root@db01 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:2d:7c:19 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.201/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::b23b:2bba:e233:1170/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::3e78:8d00:2f7:d39f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@db01 ~]# mysql -uroot -p123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.7.26-log MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
db01 [(none)]>show slave status \G
Empty set (0.00 sec)
db01 [(none)]>
7检查另外一个从库db03是不是指针指向了新的主库:
[root@db03 bin]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 19
Server version: 5.7.26-log MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
db03 [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.201
Master_User: repl
Master_Port: 3306
8环境恢复(MHA是一次性的高可用,用过一次就不能再用了,因此需要修复)
如何排查和解决MHA的故障
8.1首先检查Manager所在的db03机器中的MHA 的运行状态,发现已经宕掉了:
[root@db03 bin]# masterha_check_status --conf=/etc/mha/appl.cnf
appl is stopped(2:NOT_RUNNING).
8.2判断哪台机器是主机并启动数据库:
(1)登录每天机器数据库,查看主从状态,没有状态的数据库指定是主库show slave status \G
(2)登录从库数据库看看指针指向那台机器,那么这台机器就是主库
(3)在manager机器中通过日志查看change master to执行语句(看最后一条):
[root@db03 bin]# grep -i 'change master to' /var/log/mha/appl/manager
Thu Mar 4 21:45:00 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER T MASTER_HOST='10.0.0.202', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Mar 9 20:18:39 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER T MASTER_HOST='10.0.0.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
[root@db03 bin]#
(4)启动宕掉的哪台机器数据库db02,并将其加入到一主两从这个环境里:
[root@db02 ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.26-log MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
db02 [(none)]>show slave status \G
Empty set (0.00 sec)
db02 [(none)]>CHANGE MASTER TO MASTER_HOST='10.0.0.201', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='123';
Query OK, 0 rows affected, 2 warnings (0.45 sec)
db02 [(none)]>start slave;
Query OK, 0 rows affected (0.10 sec)
db02 [(none)]>show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.201
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000011
Read_Master_Log_Pos: 234
Relay_Log_File: db02-relay-bin.000002
Relay_Log_Pos: 367
Relay_Master_Log_File: mysql-bin.000011
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
(5)检查vip是否已在主库中设置(未发现):
[root@db01 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:2d:7c:19 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.201/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::b23b:2bba:e233:1170/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::3e78:8d00:2f7:d39f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
(6)到Manager机器中查看高可用脚本里的vip网段
[root@db03 ~]# cd /usr/local/bin/
[root@db03 bin]# vim master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '10.0.0.55/24'; #配置一个闲置的网段
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip"; #换成自己的网卡名称
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down"; #换成自己的网卡名称
my $ssh_Bcast_arp= "/sbin/arping -I ens33 -c 3 -A 10.0.0.55"; #换成自己的网卡名称
(7)把vip手动添加到主库中:
[root@db01 ~]# yum install -y net-tools
[root@db01 ~]# ifconfig eth0:1 10.0.0.55/24
[root@db01 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:2d:7c:19 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.201/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.55/24 brd 10.0.0.255 scope global secondary eth0:1
valid_lft forever preferred_lft forever
inet6 fe80::b23b:2bba:e233:1170/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::3e78:8d00:2f7:d39f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@db01 ~]#
(8)检查Manager所在的机器db03中的配置文件,看看之前宕掉的故障机器的节点是否在,如果不在就加进来(不在):
[root@db03 bin]# cat /etc/mha/appl.cnf
[server default]
manager_log=/var/log/mha/appl/manager
manager_workdir=/var/log/mha/appl
master_binlog_dir=/data/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha
ping_interval=2
repl_password=123
repl_user=repl
report_script=/usr/local/bin/send
ssh_user=root
user=mha
[server1]
hostname=10.0.0.201
port=3306
[server3]
hostname=10.0.0.203
port=3306
[root@db03 bin]# vim /etc/mha/appl.cnf
[root@db03 bin]# cat /etc/mha/appl.cnf
[server default]
manager_log=/var/log/mha/appl/manager
manager_workdir=/var/log/mha/appl
master_binlog_dir=/data/binlog
master_ip_failover_script=/usr/local/bin/master_ip_failover
password=mha
ping_interval=2
repl_password=123
repl_user=repl
report_script=/usr/local/bin/send
ssh_user=root
user=mha
[server1]
hostname=10.0.0.201
port=3306
[server2]
hostname=10.0.0.202
port=3306
(9)在Manager所在的机器中进行启动预检查:
[root@db03 bin]# masterha_check_ssh --conf=/etc/mha/appl.cnf #检查互信
[root@db03 bin]# masterha_check_repl --conf=/etc/mha/appl.cnf #检查主从
(10)启动MHA并检查状态:
[root@db03 bin]# nohup masterha_manager --conf=/etc/mha/appl.cnf --remove_dead_master_conf --ignore_last_failover </dev/null> /var/log/mha/appl/manager.log 2>&1 &
[1] 3187
[root@db03 bin]# masterha_check_status --conf=/etc/mha/appl.cnf
appl (pid:3187) is running(0:PING_OK), master:10.0.0.201
[root@db03 bin]#