MySQL高可用架构之MHA

简介:

1、MHA简介

  1. MHA介绍

MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

  1. MHA组成

该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)

MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。

MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。另外对于想快速搭建的可以参考:MHA快速搭建

我们自己使用其实也可以使用1主1从,但是master主机宕机后无法切换,以及无法补全binlog。master的mysqld进程crash后,还是可以切换成功,以及补全binlog的。

  1. MHA工作原理

(1)从宕机崩溃的master保存二进制日志事件(binlog events);

(2)识别含有最新更新的slave;

(3)应用差异的中继日志(relay log)到其他的slave;

(4)应用从master保存的二进制日志事件(binlog events);

(5)提升一个slave为新的master;

(6)使其他的slave连接新的master进行复制;

1.4 MHA工具

MHA软件由两部分组成,Manager工具包和Node工具包

Manager 节点工具包

masterha_check_ssh:MHA 依赖的 ssh 环境监测工具

masterha_check_repl:MYSQL 主从复制环境检测工具(slave状态、文件配置)

masterga_manager:MHA 服务主程序

masterha_check_status:MHA 运行状态探测工具

masterha_master_monitor:MYSQL master 节点可用性监测工具

masterha_master_swith:master 节点切换工具(自动或者手动)

masterha_conf_host:添加或删除配置的节点

masterha_stop:关闭 MHA 服务的工具

Node 节点工具包

save_binary_logs:保存和复制 master 的二进制日志

apply_diff_relay_logs:识别差异的中继日志事件并应用于其他slave

purge_relay_logs:清除中继日志(不会阻塞 SQL 线程)

自定义扩展

secondary_check_script:通过多条网络路由检测 master 的可用性;

master_ip_failover_script:更新 application 使用的 masterip;

report_script:发送报告;

init_conf_load_script:加载初始配置参数;

master_ip_online_change_script;更新 master 节点 ip 地址

2、MHA搭建

2.1 环境信息

环境名称

主机名称

Mysql版本

IP

端口

OS 版本

主库

master

Mysql8.0.30

172.16.134.24

3310

Centos 7.2

从库(MHA Manager)

Slave01

Mysql8.0.30

172.16.134.25

3310

Centos 7.2

从库

Slave02

Mysql8.0.30

172.16.134.26

3310

Centos 7.2

2.2 基础环境搭建

MHA 基础环境主从搭建见复制文档

创建监控用户(master)

Create user 'root'@'172.16.134.%' identified  by '123456';

grant all privileges on *.* to 'root'@'172.16.134.%'

flush  privileges;

2.3 MHA安装

Manager 安装

yum install perl-DBD-MySQL

yum install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

Node 安装

yum install perl-DBD-MySQL -y

rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

安装完成后/bin 下会生产对应的工具

-rwxr-xr-x 1 root root 1995 Mar 23  2018 masterha_check_repl

-rwxr-xr-x 1 root root 1779 Mar 23  2018 masterha_check_ssh

-rwxr-xr-x 1 root root 1865 Mar 23  2018 masterha_check_status

-rwxr-xr-x 1 root root 3201 Mar 23  2018 masterha_conf_host

-rwxr-xr-x 1 root root 2517 Mar 23  2018 masterha_manager

-rwxr-xr-x 1 root root 2165 Mar 23  2018 masterha_master_monitor

-rwxr-xr-x 1 root root 2373 Mar 23  2018 masterha_master_switch

-rwxr-xr-x 1 root root 5172 Mar 23  2018 masterha_secondary_check

-rwxr-xr-x 1 root root 1739 Mar 23  2018 masterha_stop


-rwxr-xr-x 1 root root 7525 Mar 23  2018 save_binary_logs

-rwxr-xr-x 1 root root 8337 Mar 23  2018 purge_relay_logs

-rwxr-xr-x 1 root root  4807 Mar 23  2018 filter_mysqlbinlog

-rwxr-xr-x 1 root root 17639 Mar 23  2018 apply_diff_relay_logs

2.4 配置互信

cd ~

ssh-keygen -t rsa #一路回车

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys

在24机器上执行以下命令

ssh 172.16.134.24 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys

ssh 172.16.134.25 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys

ssh 172.16.134.26 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys

scp ~/.ssh/authorized_keys 172.16.134.25:~/.ssh/

scp ~/.ssh/authorized_keys 172.16.134.26:~/.ssh/


通过ssh root@172.16.134.25 命令直接登录,不需要输入密码

2.5 配置MHA

1、创建MHA工作目录,并创建相关配置文件(软件包解压后目录有样例配置文件)

MHA Manage 节点上

#创建MHA配置文件目录

mkdir /etc/mha

# 创建MHA脚本目录

mkdir /etc/mha/scripts

# 创建MHA日志目录

mkdir /var/log/mha/

# 创建日志目录(app1是自定义的,因为manager可以同时管理多个集群,所以这里取了这个名)

mkdir -p /var/log/mha/app1

# 创建日志文件

touch /var/log/mha/app1/manager.log

cp /home/soft/mha4mysql-manager-0.58/samples/conf/* /etc/mha/

配置主从配置文件,candidate_master=1的意思是是否能提升为master,可以不加。

Vi /etc/mha/app1.cnf

[server default]

manager_workdir=/var/log/mha/app1

manager_log=/var/log/mha/app1/manager.log


[server1]

hostname=172.16.134.24

port=3310


[server2]

hostname=172.16.134.25

port=3310

candidate_master=1

check_repl_delay=0


[server3]

hostname=172.16.134.26

port=3310

no_master=1

配置主要的配置文件

[server default]

user=root

password=123456

repl_user=repl

repl_password=rep1123

ssh_user=root

ping_interval=1

master_binlog_dir= /home/mysql3310/binlog/

manager_workdir=/var/log/mha/app1

manager_log=/var/log/mha/app1/manager.log

master_ip_failover_script="/etc/mha/scripts/master_ip_failover"

master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"

report_script="/etc/mha/scripts/send_report"

remote_workdir=/tmp

secondary_check_script= /usr/local/bin/masterha_secondary_check -s 172.16.134.25 -s 172.16.134.26

shutdown_script=""

[server1]

hostname=172.16.134.24

port=3310


[server2]

hostname=172.16.134.25

port=3310

candidate_master=1

check_repl_delay=0


 [server3]

hostname=172.16.134.26

port=3310

no_master=1

脚本配置(vip)

#!/usr/bin/env perl


#  Copyright (C) 2011 DeNA Co.,Ltd.

## Note: This is a sample script and is not complete. Modify the script based on your environment.


use strict;

use warnings FATAL => 'all';


use Getopt::Long;

use MHA::DBHelper;


my (

  $command,        $ssh_user,         $orig_master_host,

  $orig_master_ip, $orig_master_port, $new_master_host,

  $new_master_ip,  $new_master_port,  $new_master_user,

  $new_master_password

);


my $vip = '10.89.181.9/25';

my $key = '0';

my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip netmask 255.255.255.128";

my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";


GetOptions(

  'command=s'             => \$command,

  'ssh_user=s'            => \$ssh_user,

  'orig_master_host=s'    => \$orig_master_host,

  'orig_master_ip=s'      => \$orig_master_ip,

  'orig_master_port=i'    => \$orig_master_port,

  'new_master_host=s'     => \$new_master_host,

  'new_master_ip=s'       => \$new_master_ip,

  'new_master_port=i'     => \$new_master_port,

  'new_master_user=s'     => \$new_master_user,

  'new_master_password=s' => \$new_master_password,

);


exit &main();


sub main {

  if ( $command eq "stop" || $command eq "stopssh" ) {


    my $exit_code = 1;

    eval {

      print "Disabling the VIP on old master: $orig_master_host \n";

&stop_vip();

      $exit_code = 0;

    };

    if ($@) {

      warn "Got Error: $@\n";

      exit $exit_code;

    }

    exit $exit_code;

  }

  elsif ( $command eq "start" ) {


    my $exit_code = 10;

    eval {

      print "Enabling the VIP - $vip on the new master - $new_master_host \n";

 &start_vip();

      $exit_code = 0;

    };

    if ($@) {

      warn $@;

      exit $exit_code;

    }

    exit $exit_code;

  }

  elsif ( $command eq "status" ) {

    print "Checking the Status of the script.. OK \n";

    exit 0;

  }

  else {

    &usage();

    exit 1;

  }

}


sub start_vip() {

    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

sub stop_vip() {

     return 0  unless  ($ssh_user);

    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

}


sub usage {

  print

"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=

port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

2.6 信任检测

root@ebsproddb.ys:/etc/mha/scripts$ masterha_check_ssh --conf=/etc/mha/app1.cnf

Fri Mar 10 18:25:20 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Fri Mar 10 18:25:20 2023 - [info] Reading application default configuration from /etc/mha/app1.cnf..

Fri Mar 10 18:25:20 2023 - [info] Reading server configuration from /etc/mha/app1.cnf..

Fri Mar 10 18:25:20 2023 - [info] Starting SSH connection tests..

Fri Mar 10 18:25:21 2023 - [debug]

Fri Mar 10 18:25:20 2023 - [debug]  Connecting via SSH from root@172.16.134.24(172.16.134.24:22) to root@172.16.134.25(172.16.134.25:22)..

Fri Mar 10 18:25:20 2023 - [debug]   ok.

Fri Mar 10 18:25:20 2023 - [debug]  Connecting via SSH from root@172.16.134.24(172.16.134.24:22) to root@172.16.134.26(172.16.134.26:22)..

Fri Mar 10 18:25:21 2023 - [debug]   ok.

Fri Mar 10 18:25:22 2023 - [debug]

Fri Mar 10 18:25:21 2023 - [debug]  Connecting via SSH from root@172.16.134.25(172.16.134.25:22) to root@172.16.134.24(172.16.134.24:22)..

Fri Mar 10 18:25:21 2023 - [debug]   ok.

Fri Mar 10 18:25:21 2023 - [debug]  Connecting via SSH from root@172.16.134.25(172.16.134.25:22) to root@172.16.134.26(172.16.134.26:22)..

Fri Mar 10 18:25:21 2023 - [debug]   ok.

Fri Mar 10 18:25:23 2023 - [debug]

Fri Mar 10 18:25:21 2023 - [debug]  Connecting via SSH from root@172.16.134.26(172.16.134.26:22) to root@172.16.134.24(172.16.134.24:22)..

Fri Mar 10 18:25:21 2023 - [debug]   ok.

Fri Mar 10 18:25:21 2023 - [debug]  Connecting via SSH from root@172.16.134.26(172.16.134.26:22) to root@172.16.134.25(172.16.134.25:22)..

Fri Mar 10 18:25:22 2023 - [debug]   ok.

Fri Mar 10 18:25:23 2023 - [info] All SSH connection tests passed successfully

检查成功

2.7 检查主从复制是否成功

root@ebsproddb.ys:/etc/mha$ masterha_check_repl --conf=/etc/mha/masterha_default.cnf

Fri Mar 10 19:18:34 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Fri Mar 10 19:18:34 2023 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..

Fri Mar 10 19:18:34 2023 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..

Fri Mar 10 19:18:34 2023 - [info] MHA::MasterMonitor version 0.58.

Fri Mar 10 19:18:35 2023 - [info] GTID failover mode = 1

Fri Mar 10 19:18:35 2023 - [info] Dead Servers:

Fri Mar 10 19:18:35 2023 - [info] Alive Servers:

Fri Mar 10 19:18:35 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Fri Mar 10 19:18:35 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Fri Mar 10 19:18:35 2023 - [info] Alive Slaves:

Fri Mar 10 19:18:35 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Fri Mar 10 19:18:35 2023 - [info]     GTID ON

Fri Mar 10 19:18:35 2023 - [info]     Replicating from 172.16.134.25(172.16.134.25:3310)

Fri Mar 10 19:18:35 2023 - [info] Current Alive Master: 172.16.134.25(172.16.134.25:3310)

Fri Mar 10 19:18:35 2023 - [info] Checking slave configurations..

Fri Mar 10 19:18:35 2023 - [info] Checking replication filtering settings..

Fri Mar 10 19:18:35 2023 - [info]  binlog_do_db= , binlog_ignore_db=

Fri Mar 10 19:18:35 2023 - [info]  Replication filtering check ok.

Fri Mar 10 19:18:35 2023 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.

Fri Mar 10 19:18:35 2023 - [info] Checking SSH publickey authentication settings on the current master..

Fri Mar 10 19:18:35 2023 - [info] HealthCheck: SSH to 172.16.134.25 is reachable.

Fri Mar 10 19:18:35 2023 - [info]

172.16.134.25(172.16.134.25:3310) (current master)

 +--172.16.134.26(172.16.134.26:3310)


Fri Mar 10 19:18:35 2023 - [info] Checking replication health on 172.16.134.26..

Fri Mar 10 19:18:35 2023 - [info]  ok.

Fri Mar 10 19:18:35 2023 - [info] Checking master_ip_failover_script status:

Fri Mar 10 19:18:35 2023 - [info]   /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.134.25 --orig_master_ip=172.16.134.25 --orig_master_port=3310

Fri Mar 10 19:18:35 2023 - [info]  OK.

Fri Mar 10 19:18:35 2023 - [warning] shutdown_script is not defined.

Fri Mar 10 19:18:35 2023 - [info] Got exit code 0 (Not master dead).


MySQL Replication Health is OK.

2.8 启动监控进程

nohup masterha_manager --conf=/etc/mha/masterha_default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log &

查看状态

root@ebsproddb.ys:/etc/mha$ masterha_check_status --conf=/etc/mha/masterha_default.cnf

masterha_default (pid:14977) is running(0:PING_OK), master:172.16.134.24

参数说明:

--remove_dead_master_conf   该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除

--manger_log                日志存放位置

--ignore_last_failover         在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover

停止MHA监控

masterha_stop --conf=/etc/mha/masterha_default.cnf

3、故障切换演练

3.1 failover切换演练

场景一、自动failover,主从正常,从IO,SQL进程down掉,主库down掉

  1. slave01停止IO、SQL线程
  2. 模拟主库down机

3.1.1 slave01停止复制

Mysql>stop replica;

3.1.2 主库down机

Pkill -9 mysqld

3.1.3 观察maanger日志

Master 172.16.134.24(172.16.134.24:3310) is down!


Check MHA Manager logs at ebsproddb.ys:/var/log/mha/app1/manager.log for details.


Started automated(non-interactive) failover.

Invalidated master IP address on 172.16.134.24(172.16.134.24:3310)

None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.

Got Error so couldn't continue failover from here.

Tue Mar 14 10:56:53 2023 - [info] Sending mail..

Unknown option: conf

tail: manager.log: file truncated

Thu Mar 16 19:32:14 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Thu Mar 16 19:32:14 2023 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..

Thu Mar 16 19:32:14 2023 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..

Thu Mar 16 19:32:14 2023 - [info] MHA::MasterMonitor version 0.58.

Thu Mar 16 19:32:15 2023 - [info] GTID failover mode = 1

Thu Mar 16 19:32:15 2023 - [info] Dead Servers:

Thu Mar 16 19:32:15 2023 - [info] Alive Servers:

Thu Mar 16 19:32:15 2023 - [info]   172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:32:15 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:32:15 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Thu Mar 16 19:32:15 2023 - [info] Alive Slaves:

Thu Mar 16 19:32:15 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:32:15 2023 - [info]     GTID ON

Thu Mar 16 19:32:15 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:32:15 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:32:15 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:32:15 2023 - [info]     GTID ON

Thu Mar 16 19:32:15 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:32:15 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:32:15 2023 - [info] Current Alive Master: 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:32:15 2023 - [info] Checking slave configurations..

Thu Mar 16 19:32:15 2023 - [info] Checking replication filtering settings..

Thu Mar 16 19:32:15 2023 - [info]  binlog_do_db= , binlog_ignore_db=

Thu Mar 16 19:32:15 2023 - [info]  Replication filtering check ok.

Thu Mar 16 19:32:15 2023 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.

Thu Mar 16 19:32:15 2023 - [info] Checking SSH publickey authentication settings on the current master..

Thu Mar 16 19:32:15 2023 - [info] HealthCheck: SSH to 172.16.134.24 is reachable.

Thu Mar 16 19:32:15 2023 - [info]

172.16.134.24(172.16.134.24:3310) (current master)

 +--172.16.134.25(172.16.134.25:3310)

 +--172.16.134.26(172.16.134.26:3310)


Thu Mar 16 19:32:15 2023 - [info] Checking master_ip_failover_script status:

Thu Mar 16 19:32:15 2023 - [info]   /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.134.24 --orig_master_ip=172.16.134.24 --orig_master_port=3310

Checking the Status of the script.. OK

Thu Mar 16 19:32:15 2023 - [info]  OK.

Thu Mar 16 19:32:15 2023 - [warning] shutdown_script is not defined.

Thu Mar 16 19:32:15 2023 - [info] Set master ping interval 1 seconds.

Thu Mar 16 19:32:15 2023 - [info] Set secondary check script: /bin/masterha_secondary_check -s 172.16.134.25 -s 172.16.134.26

Thu Mar 16 19:32:15 2023 - [info] Starting ping health check on 172.16.134.24(172.16.134.24:3310)..

Thu Mar 16 19:32:15 2023 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

 masterha_check_status --conf=/etc/mha/masterha_default.cnf Thu Mar 16 19:33:49 2023 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)

Thu Mar 16 19:33:49 2023 - [info] Executing secondary network check script: /bin/masterha_secondary_check -s 172.16.134.25 -s 172.16.134.26  --user=root  --master_host=172.16.134.24  --master_ip=172.16.134.24  --master_port=3310 --master_user=root --master_password=123456 --ping_type=SELECT

Thu Mar 16 19:33:49 2023 - [info] Executing SSH check script: exit 0

Thu Mar 16 19:33:50 2023 - [info] HealthCheck: SSH to 172.16.134.24 is reachable.

Monitoring server 172.16.134.25 is reachable, Master is not reachable from 172.16.134.25. OK.

Monitoring server 172.16.134.26 is reachable, Master is not reachable from 172.16.134.26. OK.

Thu Mar 16 19:33:50 2023 - [info] Master is not reachable from all other monitoring servers. Failover should start.

Thu Mar 16 19:33:50 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Thu Mar 16 19:33:50 2023 - [warning] Connection failed 2 time(s)..

Thu Mar 16 19:33:51 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Thu Mar 16 19:33:51 2023 - [warning] Connection failed 3 time(s)..

Thu Mar 16 19:33:52 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Thu Mar 16 19:33:52 2023 - [warning] Connection failed 4 time(s)..

Thu Mar 16 19:33:52 2023 - [warning] Master is not reachable from health checker!

Thu Mar 16 19:33:52 2023 - [warning] Master 172.16.134.24(172.16.134.24:3310) is not reachable!

Thu Mar 16 19:33:52 2023 - [warning] SSH is reachable.

Thu Mar 16 19:33:52 2023 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/masterha_default.cnf again, and trying to connect to all servers to check server status..

Thu Mar 16 19:33:52 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Thu Mar 16 19:33:52 2023 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..

Thu Mar 16 19:33:52 2023 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..

Thu Mar 16 19:33:53 2023 - [warning] SQL Thread is stopped(no error) on 172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:33:53 2023 - [info] GTID failover mode = 1

Thu Mar 16 19:33:53 2023 - [info] Dead Servers:

Thu Mar 16 19:33:53 2023 - [info]   172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:53 2023 - [info] Alive Servers:

Thu Mar 16 19:33:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:33:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Thu Mar 16 19:33:53 2023 - [info] Alive Slaves:

Thu Mar 16 19:33:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:53 2023 - [info]     GTID ON

Thu Mar 16 19:33:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:53 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:33:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:53 2023 - [info]     GTID ON

Thu Mar 16 19:33:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:53 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:33:53 2023 - [info] Checking slave configurations..

Thu Mar 16 19:33:53 2023 - [info] Checking replication filtering settings..

Thu Mar 16 19:33:53 2023 - [info]  Replication filtering check ok.

Thu Mar 16 19:33:53 2023 - [info] Master is down!

Thu Mar 16 19:33:53 2023 - [info] Terminating monitoring script.

Thu Mar 16 19:33:53 2023 - [info] Got exit code 20 (Master dead).

Thu Mar 16 19:33:53 2023 - [info] MHA::MasterFailover version 0.58.

Thu Mar 16 19:33:53 2023 - [info] Starting master failover.

Thu Mar 16 19:33:53 2023 - [info]

Thu Mar 16 19:33:53 2023 - [info] * Phase 1: Configuration Check Phase..

Thu Mar 16 19:33:53 2023 - [info]

Thu Mar 16 19:33:55 2023 - [warning] SQL Thread is stopped(no error) on 172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:33:55 2023 - [info] GTID failover mode = 1

Thu Mar 16 19:33:55 2023 - [info] Dead Servers:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info] Checking master reachability via MySQL(double check)...

Thu Mar 16 19:33:55 2023 - [info]  ok.

Thu Mar 16 19:33:55 2023 - [info] Alive Servers:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Thu Mar 16 19:33:55 2023 - [info] Alive Slaves:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:33:55 2023 - [info]  Starting SQL thread on 172.16.134.25(172.16.134.25:3310) ..

Thu Mar 16 19:33:55 2023 - [info]   done.

Thu Mar 16 19:33:55 2023 - [info] Starting GTID based failover.

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] ** Phase 1: Configuration Check Phase completed.

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 2: Dead Master Shutdown Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] Forcing shutdown so that applications never connect to the current master..

Thu Mar 16 19:33:55 2023 - [info] Executing master IP deactivation script:

Thu Mar 16 19:33:55 2023 - [info]   /etc/mha/scripts/master_ip_failover --orig_master_host=172.16.134.24 --orig_master_ip=172.16.134.24 --orig_master_port=3310 --command=stopssh --ssh_user=root 

Disabling the VIP on old master: 172.16.134.24

Thu Mar 16 19:33:55 2023 - [info]  done.

Thu Mar 16 19:33:55 2023 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.

Thu Mar 16 19:33:55 2023 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 3: Master Recovery Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] The latest binary log file/position on all slaves is mysql-bin.000016:197

Thu Mar 16 19:33:55 2023 - [info] Retrieved Gtid Set: 59ad1c21-beed-11ed-85ba-6c92bf7dd676:2267583-3372260

Thu Mar 16 19:33:55 2023 - [info] Latest slaves (Slaves that received relay log files to the latest):

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:33:55 2023 - [info] The oldest binary log file/position on all slaves is mysql-bin.000016:197

Thu Mar 16 19:33:55 2023 - [info] Retrieved Gtid Set: 59ad1c21-beed-11ed-85ba-6c92bf7dd676:2267583-3372260

Thu Mar 16 19:33:55 2023 - [info] Oldest slaves:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 3.3: Determining New Master Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] Searching new master from slaves..

Thu Mar 16 19:33:55 2023 - [info]  Candidate masters from the configuration file:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Thu Mar 16 19:33:55 2023 - [info]  Non-candidate masters:

Thu Mar 16 19:33:55 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Thu Mar 16 19:33:55 2023 - [info]     GTID ON

Thu Mar 16 19:33:55 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Thu Mar 16 19:33:55 2023 - [info]     Not candidate for the new Master (no_master is set)

Thu Mar 16 19:33:55 2023 - [info]  Searching from candidate_master slaves which have received the latest relay log events..

Thu Mar 16 19:33:55 2023 - [info] New master is 172.16.134.25(172.16.134.25:3310)

Thu Mar 16 19:33:55 2023 - [info] Starting master failover..

Thu Mar 16 19:33:55 2023 - [info]

From:

172.16.134.24(172.16.134.24:3310) (current master)

 +--172.16.134.25(172.16.134.25:3310)

 +--172.16.134.26(172.16.134.26:3310)


To:

172.16.134.25(172.16.134.25:3310) (new master)

 +--172.16.134.26(172.16.134.26:3310)

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 3.3: New Master Recovery Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info]  Waiting all logs to be applied..

Thu Mar 16 19:33:55 2023 - [info]   done.

Thu Mar 16 19:33:55 2023 - [info] Getting new master's binlog name and position..

Thu Mar 16 19:33:55 2023 - [info]  mysql-bin.000008:123787546

Thu Mar 16 19:33:55 2023 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.134.25', MASTER_PORT=3310, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';

Thu Mar 16 19:33:55 2023 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000008, 123787546, 59ad1c21-beed-11ed-85ba-6c92bf7dd676:1-3372260

Thu Mar 16 19:33:55 2023 - [info] Executing master IP activate script:

Thu Mar 16 19:33:55 2023 - [info]   /etc/mha/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.16.134.24 --orig_master_ip=172.16.134.24 --orig_master_port=3310 --new_master_host=172.16.134.25 --new_master_ip=172.16.134.25 --new_master_port=3310 --new_master_user='root'   --new_master_password=xxx

Enabling the VIP - 10.89.181.9/25 on the new master - 172.16.134.25

Thu Mar 16 19:33:55 2023 - [info]  OK.

Thu Mar 16 19:33:55 2023 - [info] Setting read_only=0 on 172.16.134.25(172.16.134.25:3310)..

Thu Mar 16 19:33:55 2023 - [info]  ok.

Thu Mar 16 19:33:55 2023 - [info] ** Finished master recovery successfully.

Thu Mar 16 19:33:55 2023 - [info] * Phase 3: Master Recovery Phase completed.

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 4: Slaves Recovery Phase..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] * Phase 4.1: Starting Slaves in parallel..

Thu Mar 16 19:33:55 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info] -- Slave recovery on host 172.16.134.26(172.16.134.26:3310) started, pid: 17844. Check tmp log /var/log/mha/app1/172.16.134.26_3310_20230316193353.log if it takes time..

Thu Mar 16 19:33:56 2023 - [info]

Thu Mar 16 19:33:56 2023 - [info] Log messages from 172.16.134.26 ...

Thu Mar 16 19:33:56 2023 - [info]

Thu Mar 16 19:33:55 2023 - [info]  Resetting slave 172.16.134.26(172.16.134.26:3310) and starting replication from the new master 172.16.134.25(172.16.134.25:3310)..

Thu Mar 16 19:33:55 2023 - [info]  Executed CHANGE MASTER.

Thu Mar 16 19:33:55 2023 - [info]  Slave started.

Thu Mar 16 19:33:55 2023 - [info]  gtid_wait(59ad1c21-beed-11ed-85ba-6c92bf7dd676:1-3372260) completed on 172.16.134.26(172.16.134.26:3310). Executed 0 events.

Thu Mar 16 19:33:56 2023 - [info] End of log messages from 172.16.134.26.

Thu Mar 16 19:33:56 2023 - [info] -- Slave on host 172.16.134.26(172.16.134.26:3310) started.

Thu Mar 16 19:33:56 2023 - [info] All new slave servers recovered successfully.

Thu Mar 16 19:33:56 2023 - [info]

Thu Mar 16 19:33:56 2023 - [info] * Phase 5: New master cleanup phase..

Thu Mar 16 19:33:56 2023 - [info]

Thu Mar 16 19:33:56 2023 - [info] Resetting slave info on the new master..

Thu Mar 16 19:33:56 2023 - [info]  172.16.134.25: Resetting slave info succeeded.

Thu Mar 16 19:33:56 2023 - [info] Master failover to 172.16.134.25(172.16.134.25:3310) completed successfully.

Thu Mar 16 19:33:56 2023 - [info] Deleted server1 entry from /etc/mha/masterha_default.cnf .

Thu Mar 16 19:33:56 2023 - [info]


----- Failover Report -----


masterha_default: MySQL Master failover 172.16.134.24(172.16.134.24:3310) to 172.16.134.25(172.16.134.25:3310) succeeded


Master 172.16.134.24(172.16.134.24:3310) is down!


Check MHA Manager logs at ebsproddb.ys:/var/log/mha/app1/manager.log for details.


Started automated(non-interactive) failover.

Invalidated master IP address on 172.16.134.24(172.16.134.24:3310)

Selected 172.16.134.25(172.16.134.25:3310) as a new master.

172.16.134.25(172.16.134.25:3310): OK: Applying all logs succeeded.

172.16.134.25(172.16.134.25:3310): OK: Activated master IP address.

172.16.134.26(172.16.134.26:3310): OK: Slave started, replicating from 172.16.134.25(172.16.134.25:3310)

172.16.134.25(172.16.134.25:3310): Resetting slave info succeeded.

Master failover to 172.16.134.25(172.16.134.25:3310) completed successfully.

场景二、自动failover,备主延迟>100M,从库设置no_maseter

  1. slave01停止IO线程
  2. 主库开启压力测试数据
  3. 开启slave01 IO线程

4、模拟主库down机

3.1.3 slave01停止复制

Mysql>stop slave io_thread;

3.1.2 主库创建测试数据

mysqlslap --user=root --password=123456 -h 172.16.134.24 -P 3310 --auto-generate-sql --auto-generate-sql-load-type=write --concurrency=128 --number-of-queries=1000000 --create-schema=test03

3.1.3 slave01开启复制

Slave01

Mysql>start slave io_thread;

3.1.4 主库down机

Pkill -9 mysqld

3.1.6 观察maanger日志

Tue Mar 14 10:56:48 2023 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)

Tue Mar 14 10:56:48 2023 - [info] Executing secondary network check script: /bin/masterha_secondary_check -s 172.16.134.25 -s 172.16.134.26  --user=root  --master_host=172.16.134.24  --master_ip=172.16.134.24  --master_port=3310 --master_user=root --master_password=123456 --ping_type=SELECT

Tue Mar 14 10:56:48 2023 - [info] Executing SSH check script: exit 0

Tue Mar 14 10:56:48 2023 - [info] HealthCheck: SSH to 172.16.134.24 is reachable.

Monitoring server 172.16.134.25 is reachable, Master is not reachable from 172.16.134.25. OK.

Monitoring server 172.16.134.26 is reachable, Master is not reachable from 172.16.134.26. OK.

Tue Mar 14 10:56:48 2023 - [info] Master is not reachable from all other monitoring servers. Failover should start.

Tue Mar 14 10:56:49 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Tue Mar 14 10:56:49 2023 - [warning] Connection failed 2 time(s)..

Tue Mar 14 10:56:50 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Tue Mar 14 10:56:50 2023 - [warning] Connection failed 3 time(s)..

Tue Mar 14 10:56:51 2023 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.16.134.24' (111))

Tue Mar 14 10:56:51 2023 - [warning] Connection failed 4 time(s)..

Tue Mar 14 10:56:51 2023 - [warning] Master is not reachable from health checker!

Tue Mar 14 10:56:51 2023 - [warning] Master 172.16.134.24(172.16.134.24:3310) is not reachable!

Tue Mar 14 10:56:51 2023 - [warning] SSH is reachable.

Tue Mar 14 10:56:51 2023 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/masterha_default.cnf again, and trying to connect to all servers to check server status..

Tue Mar 14 10:56:51 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Tue Mar 14 10:56:51 2023 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..

Tue Mar 14 10:56:51 2023 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..

Tue Mar 14 10:56:52 2023 - [info] GTID failover mode = 1

Tue Mar 14 10:56:52 2023 - [info] Dead Servers:

Tue Mar 14 10:56:52 2023 - [info]   172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:52 2023 - [info] Alive Servers:

Tue Mar 14 10:56:52 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Tue Mar 14 10:56:52 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Tue Mar 14 10:56:52 2023 - [info] Alive Slaves:

Tue Mar 14 10:56:52 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:52 2023 - [info]     GTID ON

Tue Mar 14 10:56:52 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:52 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 14 10:56:52 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:52 2023 - [info]     GTID ON

Tue Mar 14 10:56:52 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:52 2023 - [info]     Not candidate for the new Master (no_master is set)

Tue Mar 14 10:56:52 2023 - [info] Checking slave configurations..

Tue Mar 14 10:56:52 2023 - [info] Checking replication filtering settings..

Tue Mar 14 10:56:52 2023 - [info]  Replication filtering check ok.

Tue Mar 14 10:56:52 2023 - [info] Master is down!

Tue Mar 14 10:56:52 2023 - [info] Terminating monitoring script.

Tue Mar 14 10:56:52 2023 - [info] Got exit code 20 (Master dead).

Tue Mar 14 10:56:52 2023 - [info] MHA::MasterFailover version 0.58.

Tue Mar 14 10:56:52 2023 - [info] Starting master failover.

Tue Mar 14 10:56:52 2023 - [info]

Tue Mar 14 10:56:52 2023 - [info] * Phase 1: Configuration Check Phase..

Tue Mar 14 10:56:52 2023 - [info]

 Tue Mar 14 10:56:53 2023 - [info] GTID failover mode = 1

Tue Mar 14 10:56:53 2023 - [info] Dead Servers:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info] Checking master reachability via MySQL(double check)...

Tue Mar 14 10:56:53 2023 - [info]  ok.

Tue Mar 14 10:56:53 2023 - [info] Alive Servers:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)

Tue Mar 14 10:56:53 2023 - [info] Alive Slaves:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Not candidate for the new Master (no_master is set)

Tue Mar 14 10:56:53 2023 - [info] Starting GTID based failover.

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] ** Phase 1: Configuration Check Phase completed.

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] * Phase 2: Dead Master Shutdown Phase..

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] Forcing shutdown so that applications never connect to the current master..

Tue Mar 14 10:56:53 2023 - [info] Executing master IP deactivation script:

Tue Mar 14 10:56:53 2023 - [info]   /etc/mha/scripts/master_ip_failover --orig_master_host=172.16.134.24 --orig_master_ip=172.16.134.24 --orig_master_port=3310 --command=stopssh --ssh_user=root 

Disabling the VIP on old master: 172.16.134.24

SIOCSIFFLAGS: Cannot assign requested address

Tue Mar 14 10:56:53 2023 - [info]  done.

Tue Mar 14 10:56:53 2023 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.

Tue Mar 14 10:56:53 2023 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] * Phase 3: Master Recovery Phase..

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] The latest binary log file/position on all slaves is mysql-bin.000014:391266116

Tue Mar 14 10:56:53 2023 - [info] Retrieved Gtid Set: 59ad1c21-beed-11ed-85ba-6c92bf7dd676:2267583-3372260

Tue Mar 14 10:56:53 2023 - [info] Latest slaves (Slaves that received relay log files to the latest):

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Not candidate for the new Master (no_master is set)

Tue Mar 14 10:56:53 2023 - [info] The oldest binary log file/position on all slaves is mysql-bin.000014:66985635

Tue Mar 14 10:56:53 2023 - [info] Retrieved Gtid Set: 59ad1c21-beed-11ed-85ba-6c92bf7dd676:2267583-2830890

Tue Mar 14 10:56:53 2023 - [info] Oldest slaves:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [info] * Phase 3.3: Determining New Master Phase..

Tue Mar 14 10:56:53 2023 - [info]

Tue Mar 14 10:56:53 2023 - [warning]  Slave 172.16.134.25(172.16.134.25:3310) SQL Thread delays too much. Latest log file:mysql-bin.000014:391266116, Current log file:mysql-bin.000014:38365868. This server is not selected as a new master because recovery will take long time.

Tue Mar 14 10:56:53 2023 - [info] Searching new master from slaves..

Tue Mar 14 10:56:53 2023 - [info]  Candidate masters from the configuration file:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 14 10:56:53 2023 - [info]  Non-candidate masters:

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.25(172.16.134.25:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Primary candidate for the new Master (candidate_master is set)

Tue Mar 14 10:56:53 2023 - [info]   172.16.134.26(172.16.134.26:3310)  Version=8.0.30 (oldest major version between slaves) log-bin:enabled

Tue Mar 14 10:56:53 2023 - [info]     GTID ON

Tue Mar 14 10:56:53 2023 - [info]     Replicating from 172.16.134.24(172.16.134.24:3310)

Tue Mar 14 10:56:53 2023 - [info]     Not candidate for the new Master (no_master is set)

Tue Mar 14 10:56:53 2023 - [info]  Searching from candidate_master slaves which have received the latest relay log events..

Tue Mar 14 10:56:53 2023 - [info]   Not found.

Tue Mar 14 10:56:53 2023 - [info]  Searching from all candidate_master slaves..

Tue Mar 14 10:56:53 2023 - [info]   Not found.

Tue Mar 14 10:56:53 2023 - [info]  Searching from all slaves which have received the latest relay log events..

Tue Mar 14 10:56:53 2023 - [info]   Not found.

Tue Mar 14 10:56:53 2023 - [info]  Searching from all slaves..

Tue Mar 14 10:56:53 2023 - [info]   Not found.

Tue Mar 14 10:56:53 2023 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln971] None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.

Tue Mar 14 10:56:53 2023 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /bin/masterha_manager line 65.

Tue Mar 14 10:56:53 2023 - [info]


----- Failover Report -----


masterha_default: MySQL Master failover 172.16.134.24(172.16.134.24:3310)


Master 172.16.134.24(172.16.134.24:3310) is down!

Check MHA Manager logs at ebsproddb.ys:/var/log/mha/app1/manager.log for details.

Started automated(non-interactive) failover.

Invalidated master IP address on 172.16.134.24(172.16.134.24:3310)

None of existing slaves matches as a new master. Maybe preferred node is misconfigured or all slaves are too  far behind.

Got Error so couldn't continue failover from here.

Tue Mar 14 10:56:53 2023 - [info] Sending mail..

Unknown option: conf

结论:自动failover 失败,主从延迟>100M,另外从添加no_master,找不到新maseter,切换失败。

场景三、自动failover,备主延迟<100M

  1. slave01停止IO线程
  2. 主库开启压力测试数据
  3. 开启slave01 IO线程
  4. 模拟主库down机

结论:自动failover成功,备主延迟<100M,切换到备主成功。

场景四、自动failover, 备主延迟>100M,从库未设置no_maseter

  1. slave01停止IO线程
  2. 主库开启压力测试数据
  3. 开启slave01 IO线程

4、 模拟主库down机

结论:自动failover成功,备主延迟>100M,切换到从库成功