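MHA (Master High Availability Manager and Tools for MySQL) is a set of management scripts written in Perl by a Japanese MySQL expert. It applies only to MySQL Replication (two-tier) environments, and its purpose is to keep the master highly available. It is built on standard MySQL replication (asynchronous or semi-synchronous):

MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node);

MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or it can be deployed on one of the slaves;

MHA Manager probes the nodes of the cluster. When it detects that the master has failed, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover is transparent to applications;

MHA Node runs on every MySQL server (master, slave, and manager). It speeds up failover through scripts that can parse and purge logs;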


MHA architecture

MHA consists of MHA Manager and MHA Node:

[Figure: MySQL 5.6 GTID + MHA, management tools]

MHA Manager:

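Runs a set of tools. For example, masterha_manager automatically monitors the MySQL master and performs master failover; other tools handle manual master failover, online master switching, connection checks, and so on. A single Manager can manage multiple master-slave clusters, and it only needs to be deployed on the management node.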



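masterha_check_ssh : checks MHA's SSH configuration.

masterha_check_repl : checks MySQL replication.

masterha_manager : starts MHA.

masterha_check_status : checks the current running status of MHA.

masterha_master_monitor : monitors whether the master is down.

masterha_master_switch : controls failover (automatic or manual).

masterha_conf_host : adds or removes configured server entries.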

MHA Node:




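Node tools (these are normally triggered by MHA Manager scripts and need no manual operation).

save_binary_logs : saves and copies the master's binary logs.

apply_diff_relay_logs : identifies differential relay log events and applies them to the other slaves.

filter_mysqlbinlog : removes unnecessary ROLLBACK events (MHA no longer uses this tool).

purge_relay_logs : purges relay logs (without blocking the SQL thread).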


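When the master fails, MHA compares how far each slave's I/O thread has read into the master's binlog and picks the slave that is closest to the master as the latest slave. The other slaves generate differential relay logs by comparing themselves with the latest slave. The binlog saved from the master is applied on the latest slave, which is then promoted to master. Finally, the corresponding differential relay logs are applied on the other slaves, and they start replicating from the new master.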
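The selection step described above can be sketched as follows. This is a minimal illustration, not MHA's own code: the `SlaveStatus` class, its field names, and the host names are hypothetical, and the sketch assumes binlog file names end in a numeric sequence such as `bin.000014`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SlaveStatus:
    """Hypothetical snapshot of how far one slave's I/O thread has read."""
    host: str
    master_log_file: str      # binlog file on the master, e.g. "bin.000014"
    read_master_log_pos: int  # byte offset read within that file


def binlog_sequence(file_name: str) -> int:
    """Return the numeric suffix of a binlog file name such as 'bin.000014'."""
    return int(file_name.rsplit(".", 1)[1])


def pick_latest_slave(slaves: List[SlaveStatus]) -> SlaveStatus:
    """Choose the slave that has read furthest into the master's binlog."""
    return max(
        slaves,
        key=lambda s: (binlog_sequence(s.master_log_file), s.read_master_log_pos),
    )


slaves = [
    SlaveStatus("slave-a", "bin.000014", 120),
    SlaveStatus("slave-b", "bin.000014", 231),
    SlaveStatus("slave-c", "bin.000013", 900),
]
print(pick_latest_slave(slaves).host)  # slave-b
```

Comparing the file sequence first and the offset second matters: a large offset in an older file (slave-c here) is still behind any offset in a newer file.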




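In a master-slave replication cluster, as long as the slaves have no replication lag, MHA can usually complete a failover within seconds. It detects the master failure within 9-10 seconds, can optionally power off the master within 7-10 seconds to avoid split brain, and applies the differential relay logs to the new master within a few seconds, so total downtime is typically 10-30 seconds. After the new master is recovered, MHA recovers the remaining slaves in parallel. Even with tens of slaves, the master recovery time is not affected.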


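When the current master fails, MHA automatically identifies the relay log differences among the slaves and applies them to all slaves. This keeps all slaves consistent, as long as all of them are alive. Used together with Semi-Synchronous Replication (the semi-sync plugin), it can (almost) guarantee that no data is lost.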






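MHA consists of MHA Manager and MHA Node. MHA Node runs on the MySQL servers that need failover and recovery, so it requires no extra servers. MHA Manager runs on a dedicated server, so one server must be added (two for high availability), but MHA Manager can monitor a large number (even hundreds) of separate masters, so the number of added servers stays small. Running MHA Manager on one of the slaves is also possible. In short, deploying MHA does not add a large number of servers.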





7. MHA 0.56 supports MySQL GTID replication


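The Global Transaction Identifier (GTID) is a new replication feature in MySQL 5.6. The official definition of a global transaction ID is: GTID = source_id:transaction_id

In MySQL 5.6, each GTID represents one database transaction. source_id is the UUID (server_uuid) of the master that executed the transaction, and transaction_id is a counter starting at 1 that denotes the n-th transaction executed on that master. MySQL guarantees a 1:1 mapping between transactions and GTIDs, and every GTID is globally unique.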
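To make the source_id:transaction_id format concrete, here is a small sketch that splits a single-source GTID set into its UUID and transaction intervals. The function names are hypothetical; the sample value is the Exec_Gtid_Set printed in the failover log further down in this post. The sketch assumes one source UUID per input string.

```python
import uuid
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first, last) transaction_id


def parse_gtid_set(text: str) -> Tuple[str, List[Interval]]:
    """Split 'source_id:1-8:10' into the source UUID and its intervals."""
    source_id, *parts = text.strip().rstrip(",").split(":")
    uuid.UUID(source_id)  # raises ValueError if source_id is not a UUID
    intervals = []
    for part in parts:
        first, _, last = part.partition("-")
        intervals.append((int(first), int(last or first)))
    return source_id, intervals


def transaction_count(intervals: List[Interval]) -> int:
    """Number of transactions covered by the intervals."""
    return sum(last - first + 1 for first, last in intervals)


source_id, intervals = parse_gtid_set("1683955a-6102-11e6-8b6f-080027ca1592:1-8,")
print(source_id)                     # 1683955a-6102-11e6-8b6f-080027ca1592
print(intervals)                     # [(1, 8)]
print(transaction_count(intervals))  # 8
```

A range such as 1-8 means transactions 1 through 8 from that master have been executed; a lone number is an interval of length one.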



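For example: Server A goes down and traffic must be switched to Server B. At the same time, the replication source of Server C must be changed to Server B. The command itself is simple: CHANGE MASTER TO MASTER_HOST='***', MASTER_LOG_FILE='***', MASTER_LOG_POS=N. The difficulty is that the same transaction sits in a differently named binlog file and at a different position on each machine, so finding the master_log_file and master_log_pos on Server B that correspond to Server C's current stop point is hard.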


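With GTIDs in 5.6 this becomes simple. Because a transaction's GTID has the same value on every node, the GTID at Server C's current stop point uniquely identifies the matching position on Server B. Moreover, with MASTER_AUTO_POSITION there is no need to know the GTID value at all: running CHANGE MASTER TO MASTER_HOST='****', MASTER_USER='****', MASTER_PASSWORD='*****', MASTER_AUTO_POSITION=1 completes the failover directly.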
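Conceptually, auto-positioning works because the slave tells its new master which GTIDs it already has and the master sends only the rest. The sketch below illustrates that set difference for a single source_id; it is not MySQL's implementation, the function names are hypothetical, and the interval values for Server B and Server C are made up for the example.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # inclusive (first, last) transaction_id


def expand(intervals: List[Interval]) -> Set[int]:
    """Turn inclusive intervals into the set of individual transaction ids."""
    return {n for first, last in intervals for n in range(first, last + 1)}


def missing_transactions(on_new_master: List[Interval],
                         on_slave: List[Interval]) -> List[int]:
    """Transaction ids the new master has executed but the slave has not."""
    return sorted(expand(on_new_master) - expand(on_slave))


server_b = [(1, 8)]  # assumed state of the new master
server_c = [(1, 5)]  # assumed state of the slave being repointed
print(missing_transactions(server_b, server_c))  # [6, 7, 8]
```

No binlog file name or offset appears anywhere in the calculation, which is exactly why the position lookup problem from the previous paragraph disappears. Inside MySQL the same subtraction is available as the GTID_SUBTRACT() function.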




3. CREATE TABLE tablename AS SELECT statements cannot be used;





gtid-mode                          = ON

enforce-gtid-consistency            = ON


change master to  master_host='XX',master_user='repl',master_password='XX',master_auto_position=1;




1.1 MHA

MHA (Master High Availability) 0.56 supports GTID and a one-master, one-slave topology. The example below uses one master and one slave plus a manager node.

1.2 Topology


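[Figure: MySQL 5.6 GTID + MHA, manager topology]

MHA is split into a Manager node and Agent nodes. In this design the Manager Node is on 192.168.1.108, a dedicated server that monitors and manages the state of the whole cluster; the other servers, including 192.168.1.104, are Agent Nodes.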





    MHA  Manager Node


    Agent  Node


     Agent Node

Note: (1) All servers running MHA must have mutual SSH trust configured for the root user.

1.3 Installing the MHA software


mha4mysql-manager-0.56-0.el6.noarch.rpm      -- only needs to be installed on the Manager node





1. Install all packages (install with yum, which requires a configured yum repository; mind the installation order)

yum install mha4mysql-node-0.56-0.el6.noarch.rpm

yum install perl-Config-Tiny-2.12-1.el6.rfx.noarch.rpm

yum install perl-Log-Dispatch-2.26-1.el6.rf.noarch.rpm

yum install perl-Parallel-ForkManager-0.7.5-2.2.el6.rf.noarch.rpm

yum install mha4mysql-manager-0.56-0.el6.noarch.rpm


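1.4.1 MHA tool packages


1.4.2 Configuring MHA

Note: configuration is only needed on the Manager Node; the Agent Nodes need no configuration. Here the configuration file is placed under /etc/mha: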


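[server default]

manager_workdir=/etc/mha              --- MHA working directory

manager_log=/etc/mha/manager.log      --- MHA log file

master_binlog_dir=/mysql/binlog/    --- location of the MySQL binlogs

password=111111                    --- password of the MySQL user

user=root                            --- MySQL user, needed to read relay logs and change the replication topology

ping_interval=1                      --- run a health check once per second

remote_workdir=/etc/mha               --- remote MHA working directory

repl_password=oavir61               --- password of the MySQL replication user

repl_user=root                       --- MySQL replication user

ssh_user=root                       --- user with mutual SSH trust

client_bindir=/mysql/app/bin       --- directory containing the MySQL executables

master_ip_failover_script=/etc/mha/master_ip_failover  --- script that binds and switches the VIP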
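Before running the MHA check tools it can help to confirm that the file parses and contains the keys you expect. The following is a rough sketch using Python's standard configparser, not anything shipped with MHA. The `REQUIRED_KEYS` checklist and the function names are hypothetical, and the sample text is the listing above with the `---` annotations and the passwords left out, on the assumption that the annotations are explanatory notes rather than part of the real file.

```python
import configparser
from typing import Dict, Iterable, List

SAMPLE = """
[server default]
manager_workdir=/etc/mha
manager_log=/etc/mha/manager.log
master_binlog_dir=/mysql/binlog/
ping_interval=1
remote_workdir=/etc/mha
ssh_user=root
client_bindir=/mysql/app/bin
master_ip_failover_script=/etc/mha/master_ip_failover
"""

# Hypothetical checklist for this sketch, not an official MHA requirement.
REQUIRED_KEYS = ("manager_workdir", "manager_log", "ssh_user")


def load_server_default(text: str) -> Dict[str, str]:
    """Parse the text and return the [server default] section as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["server default"])


def missing_keys(settings: Dict[str, str],
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """List the required keys that are absent from the settings."""
    return [key for key in required if key not in settings]


settings = load_server_default(SAMPLE)
print(settings["ping_interval"])  # 1
print(missing_keys(settings))     # []
```

All values come back as strings, so a numeric setting such as ping_interval needs an explicit int() before any arithmetic.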












#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use Net::Ping;
use Switch;

my ($command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password);

# Options passed in by MHA Manager.
GetOptions(
   'command=s'             => \$command,
   'ssh_user=s'            => \$ssh_user,
   'orig_master_host=s'    => \$orig_master_host,
   'orig_master_ip=s'      => \$orig_master_ip,
   'orig_master_port=i'    => \$orig_master_port,
   'new_master_host=s'     => \$new_master_host,
   'new_master_ip=s'       => \$new_master_ip,
   'new_master_port=i'     => \$new_master_port,
   'new_master_user=s'     => \$new_master_user,
   'new_master_password=s' => \$new_master_password,
);

my $vip = '';  # Virtual IP
my $master_srv = '';
my $timeout = 5;
my $key = "1";
my $gateway = '';
my $interface = 'eth1';

# Commands run over SSH to bind or release the VIP on an interface alias.
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
        # If you manage master ip address at global catalog database,
        # invalidate orig_master_ip here.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master if the server is still UP: $orig_master_host \n";
            my $p = Net::Ping->new('icmp');
            &stop_vip() if $p->ping($master_srv, $timeout);
            $p->close();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";

        # Note: the status check also binds the VIP on the current master.
        #`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

# A simple system call that enables the VIP on the new master
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disables the VIP on the old master
sub stop_vip() {
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}



1.4.3 Validating the MHA configuration


masterha_check_repl  --conf=/etc/mha/app1.cnf

[root@lab1 ~]# masterha_check_ssh  --conf=/etc/mha/app1.cnf

Sun Aug 14 11:11:17 2016 - [warning]  Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sun Aug 14 11:11:17 2016 - [info] Reading  application default configuration from /etc/mha/app1.cnf..

Sun Aug 14 11:11:17 2016 - [info] Reading  server configuration from /etc/mha/app1.cnf..

Sun Aug 14 11:11:17 2016 - [info] Starting  SSH connection tests..

Sun Aug 14 11:11:18 2016 - [debug]

Sun Aug 14 11:11:17 2016 - [debug]  Connecting via SSH from  root@ to  root@

Sun Aug 14 11:11:18 2016 - [debug]   ok.

Sun Aug 14 11:11:18 2016 - [debug]

Sun Aug 14 11:11:18 2016 - [debug]  Connecting via SSH from  root@ to  root@

Sun Aug 14 11:11:18 2016 - [debug]   ok.

Sun Aug 14 11:11:18 2016 - [info] All SSH  connection tests passed successfully.

[root@lab1 ~]# masterha_check_repl  --conf=/etc/mha/app1.cnf

Sun Aug 14 11:12:56 2016 - [warning]  Global configuration file /etc/masterha_default.cnf not found. Skipping.

Sun Aug 14 11:12:56 2016 - [info] Reading  application default configuration from /etc/mha/app1.cnf..

Sun Aug 14 11:12:56 2016 - [info] Reading  server configuration from /etc/mha/app1.cnf..

Sun Aug 14 11:12:56 2016 - [info]  MHA::MasterMonitor version 0.56.

Sun Aug 14 11:12:56 2016 - [info] GTID  failover mode = 1

Sun Aug 14 11:12:56 2016 - [info] Dead  Servers:

Sun Aug 14 11:12:56 2016 - [info] Alive  Servers:

Sun Aug 14 11:12:56 2016 - [info]

Sun Aug 14 11:12:56 2016 - [info]

Sun Aug 14 11:12:56 2016 - [info] Alive  Slaves:

Sun Aug 14 11:12:56 2016 - [info]  Version=5.6.30-enterprise-commercial-advanced-log  (oldest major version between slaves) log-bin:enabled

Sun Aug 14 11:12:56 2016 - [info]     GTID ON

Sun Aug 14 11:12:56 2016 - [info]     Replicating from

Sun Aug 14 11:12:56 2016 - [info]     Primary candidate for the new Master  (candidate_master is set)

Sun Aug 14 11:12:56 2016 - [info] Current  Alive Master:

Sun Aug 14 11:12:56 2016 - [info] Checking  slave configurations..

Sun Aug 14 11:12:56 2016 - [info]  read_only=1 is not set on slave

Sun Aug 14 11:12:56 2016 - [info] Checking  replication filtering settings..

Sun Aug 14 11:12:56 2016 - [info]  binlog_do_db= , binlog_ignore_db=

Sun Aug 14 11:12:56 2016 - [info]  Replication filtering check ok.

Sun Aug 14 11:12:56 2016 - [info] GTID  (with auto-pos) is supported. Skipping all SSH and Node package checking.

Sun Aug 14 11:12:56 2016 - [info] Checking  SSH publickey authentication settings on the current master..

Sun Aug 14 11:12:57 2016 - [info]  HealthCheck: SSH to is reachable.

Sun Aug 14 11:12:57 2016 - [info] (current  master)



Sun Aug 14 11:12:57 2016 - [info] Checking  replication health on

Sun Aug 14 11:12:57 2016 - [info]  ok.

Sun Aug 14 11:12:57 2016 - [info] Checking  master_ip_failover_script status:

Sun Aug 14 11:12:57 2016 - [info]   /etc/mha/master_ip_failover  --command=status --ssh_user=root --orig_master_host=  --orig_master_ip= --orig_master_port=3306



IN SCRIPT TEST====/sbin/ifconfig eth1:1  down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s >/dev/null 2>&1===


Checking the Status of the script.. OK

Sun Aug 14 11:13:00 2016 - [info]  OK.

Sun Aug 14 11:13:00 2016 - [warning]  shutdown_script is not defined.

Sun Aug 14 11:13:00 2016 - [info] Got exit  code 0 (Not master dead).

MySQL Replication Health is OK.
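When these checks run from a deployment script rather than by hand, the captured output can be scanned for the final success lines shown above. This is only a sketch: the function names are hypothetical, and it assumes the two success messages appear exactly as in the output printed in this post.

```python
from typing import List

SSH_OK = "All SSH connection tests passed successfully."
REPL_OK = "MySQL Replication Health is OK."


def normalize(line: str) -> str:
    """Collapse runs of whitespace so copied or wrapped logs still match."""
    return " ".join(line.split())


def check_passed(output: str, marker: str) -> bool:
    """True if any line of the tool output ends with the success marker."""
    return any(normalize(line).endswith(marker) for line in output.splitlines())


def warnings_in(output: str) -> List[str]:
    """Return the message of every [warning] line, without the timestamp."""
    found = []
    for line in output.splitlines():
        _, sep, message = normalize(line).partition("[warning] ")
        if sep:
            found.append(message)
    return found


sample = """\
Sun Aug 14 11:13:00 2016 - [warning]  shutdown_script is not defined.
Sun Aug 14 11:13:00 2016 - [info] Got exit  code 0 (Not master dead).
MySQL Replication Health is OK.
"""
print(check_passed(sample, REPL_OK))  # True
print(check_passed(sample, SSH_OK))   # False
print(warnings_in(sample))            # ['shutdown_script is not defined.']
```

Listing the warnings separately is useful because a check can pass while still reporting gaps such as the undefined shutdown_script seen here.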



1.5 Starting MHA

nohup masterha_manager  --conf=/etc/mha/app1.cnf < /dev/null >/etc/mha/manager.log 2>&1  &


Sun Aug 14 11:47:07 2016 - [info]  MHA::MasterMonitor version 0.56.

Sun Aug 14 11:47:07 2016 - [info] GTID  failover mode = 1

Sun Aug 14 11:47:07 2016 - [info] Dead  Servers:

Sun Aug 14 11:47:07 2016 - [info] Alive  Servers:

Sun Aug 14 11:47:07 2016 - [info]

Sun Aug 14 11:47:07 2016 - [info]

Sun Aug 14 11:47:07 2016 - [info] Alive  Slaves:

Sun Aug 14 11:47:07 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:47:07 2016 - [info]     GTID ON

Sun Aug 14 11:47:07 2016 - [info]     Replicating from

Sun Aug 14 11:47:07 2016 - [info]     Primary candidate for the new Master  (candidate_master is set)

Sun Aug 14 11:47:07 2016 - [info] Current  Alive Master:

Sun Aug 14 11:47:07 2016 - [info] Checking  slave configurations..

Sun Aug 14 11:47:07 2016 - [info]  read_only=1 is not set on slave

Sun Aug 14 11:47:07 2016 - [info] Checking  replication filtering settings..

Sun Aug 14 11:47:07 2016 - [info]  binlog_do_db= , binlog_ignore_db=

Sun Aug 14 11:47:07 2016 - [info]  Replication filtering check ok.

Sun Aug 14 11:47:07 2016 - [info] GTID  (with auto-pos) is supported. Skipping all SSH and Node package checking.

Sun Aug 14 11:47:07 2016 - [info] Checking  SSH publickey authentication settings on the current master..

Sun Aug 14 11:47:07 2016 - [info]  HealthCheck: SSH to is reachable.

Sun Aug 14 11:47:07 2016 - [info] (current  master)



Sun Aug 14 11:47:07 2016 - [info] Checking  master_ip_failover_script status:

Sun Aug 14 11:47:07 2016 - [info]   /etc/mha/master_ip_failover  --command=status --ssh_user=root --orig_master_host=  --orig_master_ip= --orig_master_port=3306



IN SCRIPT TEST====/sbin/ifconfig eth1:1  down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s >/dev/null 2>&1===


Checking the Status of the script.. OK

Sun Aug 14 11:47:10 2016 - [info]  OK.

Sun Aug 14 11:47:10 2016 - [warning]  shutdown_script is not defined.

Sun Aug 14 11:47:10 2016 - [info] Set  master ping interval 1 seconds.

Sun Aug 14 11:47:10 2016 - [warning]  secondary_check_script is not defined. It is highly recommended setting it to  check master reachability from two or more routes.

Sun Aug 14 11:47:10 2016 - [info] Starting  ping health check on

Sun Aug 14 11:47:10 2016 - [info]  Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

As shown above, MHA has started normally and the VIP is bound on 192.168.1.103:



1.6.1 Master database instance failure


On the master, run service mysqld stop, then check the MHA manager log:


Sun  Aug 14 11:51:14 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL  server has gone away)

Sun  Aug 14 11:51:14 2016 - [info] Executing SSH check script: exit 0

Sun  Aug 14 11:51:14 2016 - [info] HealthCheck: SSH to is reachable.

Sun  Aug 14 11:51:15 2016 - [warning] Got error on MySQL connect: 2013 (Lost  connection to MySQL server at 'reading initial communication packet', system  error: 111)

Sun  Aug 14 11:51:15 2016 - [warning] Connection failed 2 time(s)..

Sun  Aug 14 11:51:16 2016 - [warning] Got error on MySQL connect: 2013 (Lost  connection to MySQL server at 'reading initial communication packet', system  error: 111)

Sun  Aug 14 11:51:16 2016 - [warning] Connection failed 3 time(s)..

Sun  Aug 14 11:51:17 2016 - [warning] Got error on MySQL connect: 2013 (Lost  connection to MySQL server at 'reading initial communication packet', system  error: 111)

Sun  Aug 14 11:51:17 2016 - [warning] Connection failed 4 time(s)..

Sun  Aug 14 11:51:17 2016 - [warning] Master is not reachable from health checker!

Sun Aug 14 11:51:17 2016 - [warning] Master is not reachable!

Sun  Aug 14 11:51:17 2016 - [warning] SSH is reachable.

Sun  Aug 14 11:51:17 2016 - [info] Connecting to a master server failed. Reading  configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and  trying to connect to all servers to check server status..

Sun  Aug 14 11:51:17 2016 - [warning] Global configuration file  /etc/masterha_default.cnf not found. Skipping.

Sun  Aug 14 11:51:17 2016 - [info] Reading application default configuration from  /etc/mha/app1.cnf..

Sun  Aug 14 11:51:17 2016 - [info] Reading server configuration from  /etc/mha/app1.cnf..

Sun Aug 14 11:51:17 2016 - [info] GTID failover mode = 1

Sun Aug  14 11:51:17 2016 - [info] Dead Servers:

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Alive Servers:

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Alive Slaves:

Sun  Aug 14 11:51:17 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun  Aug 14 11:51:17 2016 - [info]     GTID  ON

Sun  Aug 14 11:51:17 2016 - [info]      Replicating from

Sun  Aug 14 11:51:17 2016 - [info]      Primary candidate for the new Master (candidate_master is set)

Sun  Aug 14 11:51:17 2016 - [info] Checking slave configurations..

Sun  Aug 14 11:51:17 2016 - [info]   read_only=1 is not set on slave

Sun  Aug 14 11:51:17 2016 - [info] Checking replication filtering settings..

Sun  Aug 14 11:51:17 2016 - [info]   Replication filtering check ok.

Sun  Aug 14 11:51:17 2016 - [info] Master is down!

Sun  Aug 14 11:51:17 2016 - [info] Terminating monitoring script.

Sun  Aug 14 11:51:17 2016 - [info] Got exit code 20 (Master dead).

Sun  Aug 14 11:51:17 2016 - [info] MHA::MasterFailover version 0.56.

Sun  Aug 14 11:51:17 2016 - [info] Starting master failover.

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] * Phase 1: Configuration Check Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] GTID failover mode = 1

Sun  Aug 14 11:51:17 2016 - [info] Dead Servers:

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Checking master reachability via MySQL(double  check)...

Sun  Aug 14 11:51:17 2016 - [info]  ok.

Sun  Aug 14 11:51:17 2016 - [info] Alive Servers:

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Alive Slaves:

Sun  Aug 14 11:51:17 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun  Aug 14 11:51:17 2016 - [info]     GTID  ON

Sun  Aug 14 11:51:17 2016 - [info]      Replicating from

Sun  Aug 14 11:51:17 2016 - [info]      Primary candidate for the new Master (candidate_master is set)

Sun  Aug 14 11:51:17 2016 - [info] Starting GTID based failover.

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] ** Phase 1: Configuration Check Phase  completed.

Sun  Aug 14 11:51:17 2016 - [info]

Sun Aug  14 11:51:17 2016 - [info] * Phase 2: Dead Master Shutdown Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Forcing shutdown so that applications never  connect to the current master..

Sun  Aug 14 11:51:17 2016 - [info] Executing master IP deactivation script:

Sun  Aug 14 11:51:17 2016 - [info]    /etc/mha/master_ip_failover --orig_master_host=  --orig_master_ip= --orig_master_port=3306 --command=stopssh  --ssh_user=root 



IN  SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s  >/dev/null 2>&1===


Disabling  the VIP on old master if the server is still UP:

Sun  Aug 14 11:51:17 2016 - [info]  done.

Sun  Aug 14 11:51:17 2016 - [warning] shutdown_script is not set. Skipping  explicit shutting down of the dead master.

Sun  Aug 14 11:51:17 2016 - [info] * Phase 2: Dead Master Shutdown Phase  completed.

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] * Phase 3: Master Recovery Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] The latest binary log file/position on all  slaves is bin.000014:231

Sun  Aug 14 11:51:17 2016 - [info] Latest slaves (Slaves that received relay log  files to the latest):

Sun  Aug 14 11:51:17 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun  Aug 14 11:51:17 2016 - [info]     GTID  ON

Sun  Aug 14 11:51:17 2016 - [info]      Replicating from

Sun  Aug 14 11:51:17 2016 - [info]      Primary candidate for the new Master (candidate_master is set)

Sun  Aug 14 11:51:17 2016 - [info] The oldest binary log file/position on all  slaves is bin.000014:231

Sun  Aug 14 11:51:17 2016 - [info] Oldest slaves:

Sun  Aug 14 11:51:17 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun  Aug 14 11:51:17 2016 - [info]     GTID  ON

Sun  Aug 14 11:51:17 2016 - [info]      Replicating from

Sun  Aug 14 11:51:17 2016 - [info]      Primary candidate for the new Master (candidate_master is set)

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] * Phase 3.3: Determining New Master Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] Searching new master from slaves..

Sun  Aug 14 11:51:17 2016 - [info]   Candidate masters from the configuration file:

Sun  Aug 14 11:51:17 2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun  Aug 14 11:51:17 2016 - [info]     GTID  ON

Sun  Aug 14 11:51:17 2016 - [info]     Replicating from

Sun  Aug 14 11:51:17 2016 - [info]      Primary candidate for the new Master (candidate_master is set)

Sun  Aug 14 11:51:17 2016 - [info]   Non-candidate masters:

Sun  Aug 14 11:51:17 2016 - [info]  Searching  from candidate_master slaves which have received the latest relay log  events..

Sun  Aug 14 11:51:17 2016 - [info] New master is

Sun  Aug 14 11:51:17 2016 - [info] Starting master failover..

Sun  Aug 14 11:51:17 2016 - [info]

From:  (current master)



To:  (new master)

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info] * Phase 3.3: New Master Recovery Phase..

Sun  Aug 14 11:51:17 2016 - [info]

Sun  Aug 14 11:51:17 2016 - [info]  Waiting  all logs to be applied..

Sun  Aug 14 11:51:17 2016 - [info]   done.

Sun  Aug 14 11:51:17 2016 - [info] Getting new master's binlog name and position..

Sun  Aug 14 11:51:17 2016 - [info]   bin.000015:231

Sun  Aug 14 11:51:17 2016 - [info]  All  other slaves should start replication from here. Statement should be: CHANGE  MASTER TO MASTER_HOST='', MASTER_PORT=3306,  MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';

Sun  Aug 14 11:51:17 2016 - [info] Master Recovery succeeded.  File:Pos:Exec_Gtid_Set: bin.000015, 231,  1683955a-6102-11e6-8b6f-080027ca1592:1-8,


Sun  Aug 14 11:51:17 2016 - [info] Executing master IP activate script:

Sun  Aug 14 11:51:17 2016 - [info]    /etc/mha/master_ip_failover --command=start --ssh_user=root  --orig_master_host= --orig_master_ip=  --orig_master_port=3306 --new_master_host=  --new_master_ip= --new_master_port=3306 --new_master_user='root'  --new_master_password='111111' 

IN  SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s >/dev/null  2>&1===

Enabling the VIP - on the new master -  

Sun  Aug 14 11:51:20 2016 - [info]  OK.

Sun  Aug 14 11:51:20 2016 - [info] ** Finished master recovery successfully.

Sun  Aug 14 11:51:20 2016 - [info] * Phase 3: Master Recovery Phase completed.

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info] * Phase 4: Slaves Recovery Phase..

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info] * Phase 4.1: Starting Slaves in parallel..

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info] All new slave servers recovered successfully.

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info] * Phase 5: New master cleanup phase..

Sun  Aug 14 11:51:20 2016 - [info]

Sun  Aug 14 11:51:20 2016 - [info] Resetting slave info on the new master..

Sun  Aug 14 11:51:20 2016 - [info] Resetting slave info succeeded.

Sun  Aug 14 11:51:20 2016 - [info] Master failover to  completed successfully.

Sun  Aug 14 11:51:20 2016 - [info]


----- Failover Report -----


app1: MySQL Master failover to succeeded


Master is down!


Check MHA Manager logs at lab1:/etc/mha/manager.log for details.


Started automated(non-interactive) failover.

Invalidated master IP address on

Selected as a new master. OK: Applying all logs  succeeded. OK: Activated master IP  address. Resetting slave info succeeded.

Master failover to completed  successfully.

The log shows that the master role was switched to the slave 192.168.1.104 and that the VIP was bound to 192.168.1.104 along with it.
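The timestamps in the manager log also give the length of the outage window. The sketch below measures the time from the first failed ping to the final "completed successfully" line. The function names are hypothetical, the three sample lines are copied from the log above, and the parsing assumes the English day and month abbreviations that MHA prints.

```python
from datetime import datetime
from typing import Iterable


def log_time(line: str) -> datetime:
    """Parse the leading 'Sun Aug 14 11:51:14 2016' timestamp of a log line."""
    fields = line.split()[:5]  # split() also absorbs doubled spaces
    return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")


def failover_seconds(lines: Iterable[str]) -> float:
    """Seconds between the first failed ping and the end of the failover."""
    start = end = None
    for line in lines:
        text = " ".join(line.split())
        if start is None and "Got error on MySQL select ping" in text:
            start = log_time(text)
        if "Master failover to" in text and text.endswith("completed successfully."):
            end = log_time(text)
    if start is None or end is None:
        raise ValueError("log does not contain a complete failover")
    return (end - start).total_seconds()


log = [
    "Sun  Aug 14 11:51:14 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL  server has gone away)",
    "Sun  Aug 14 11:51:17 2016 - [info] Master is down!",
    "Sun  Aug 14 11:51:20 2016 - [info] Master failover to  completed successfully.",
]
print(failover_seconds(log))  # 6.0
```

For this run the window is 6 seconds, of which the first 3 are the repeated connection attempts before MHA declares the master dead.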




change master to master_host='',master_user='repl',master_password='111111',master_auto_position=1;

start slave;


      nohup masterha_manager  --conf=/etc/mha/app1.cnf < /dev/null >/etc/mha/app1.log 2>&1  &


       shutdown -h now

       Corresponding MHA manager log:


Sun Aug 14 11:59:09  2016 - [info] MHA::MasterMonitor version 0.56.

Sun Aug 14 11:59:09  2016 - [info] GTID failover mode = 1

Sun Aug 14 11:59:09  2016 - [info] Dead Servers:

Sun Aug 14 11:59:09  2016 - [info] Alive Servers:

Sun Aug 14 11:59:09  2016 - [info]

Sun Aug 14 11:59:09  2016 - [info]

Sun Aug 14 11:59:09  2016 - [info] Alive Slaves:

Sun Aug 14 11:59:09  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:09  2016 - [info]     GTID ON

Sun Aug 14 11:59:09  2016 - [info]     Replicating from

Sun Aug 14 11:59:09  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:09  2016 - [info] Current Alive Master:

Sun Aug 14 11:59:09  2016 - [info] Checking slave configurations..

Sun Aug 14 11:59:09  2016 - [info]  read_only=1 is not set  on slave

Sun Aug 14 11:59:09  2016 - [info] Checking replication filtering settings..

Sun Aug 14 11:59:09  2016 - [info]  binlog_do_db= ,  binlog_ignore_db=

Sun Aug 14 11:59:09  2016 - [info]  Replication filtering  check ok.

Sun Aug 14 11:59:09  2016 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node  package checking.

Sun Aug 14 11:59:09  2016 - [info] Checking SSH publickey authentication settings on the current  master..

Sun Aug 14 11:59:09  2016 - [info] HealthCheck: SSH to is reachable.

Sun Aug 14 11:59:09  2016 - [info]  (current master)



Sun Aug 14 11:59:09  2016 - [info] Checking master_ip_failover_script status:

Sun Aug 14 11:59:09  2016 - [info]    /etc/mha/master_ip_failover --command=status --ssh_user=root  --orig_master_host= --orig_master_ip=  --orig_master_port=3306



IN SCRIPT TEST====/sbin/ifconfig  eth1:1 down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s >/dev/null 2>&1===


Checking the Status of  the script.. OK

Sun Aug 14 11:59:12  2016 - [info]  OK.

Sun Aug 14 11:59:12  2016 - [warning] shutdown_script is not defined.

Sun Aug 14 11:59:12  2016 - [info] Set master ping interval 1 seconds.

Sun Aug 14 11:59:12  2016 - [warning] secondary_check_script is not defined. It is highly  recommended setting it to check master reachability from two or more routes.

Sun Aug 14 11:59:12  2016 - [info] Starting ping health check on

Sun Aug 14 11:59:12  2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

Sun Aug 14 11:59:23  2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone  away)

Sun Aug 14 11:59:23  2016 - [info] Executing SSH check script: exit 0

Sun Aug 14 11:59:23  2016 - [warning] HealthCheck: SSH to is NOT reachable.

Sun Aug 14 11:59:24  2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL  server at 'reading initial communication packet', system error: 111)

Sun Aug 14 11:59:24  2016 - [warning] Connection failed 2 time(s)..

Sun Aug 14 11:59:25  2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL  server at 'reading initial communication packet', system error: 111)

Sun Aug 14 11:59:25  2016 - [warning] Connection failed 3 time(s)..

Sun Aug 14 11:59:26  2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL  server at 'reading initial communication packet', system error: 111)

Sun Aug 14 11:59:26  2016 - [warning] Connection failed 4 time(s)..

Sun Aug 14 11:59:26  2016 - [warning] Master is not reachable from health checker!

Sun Aug 14 11:59:26  2016 - [warning] Master is not reachable!

Sun Aug 14 11:59:26  2016 - [warning] SSH is NOT reachable.

Sun Aug 14 11:59:26  2016 - [info] Connecting to a master server failed. Reading configuration file  /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect  to all servers to check server status..

Sun Aug 14 11:59:26  2016 - [warning] Global configuration file /etc/masterha_default.cnf not  found. Skipping.

Sun Aug 14 11:59:26  2016 - [info] Reading application default configuration from  /etc/mha/app1.cnf..

Sun Aug 14 11:59:26  2016 - [info] Reading server configuration from /etc/mha/app1.cnf..

Sun Aug 14 11:59:27  2016 - [info] GTID failover mode = 1

Sun Aug 14 11:59:27  2016 - [info] Dead Servers:

Sun Aug 14 11:59:27  2016 - [info]

Sun Aug 14 11:59:27  2016 - [info] Alive Servers:

Sun Aug 14 11:59:27  2016 - [info]

Sun Aug 14 11:59:27  2016 - [info] Alive Slaves:

Sun Aug 14 11:59:27  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:27  2016 - [info]     GTID ON

Sun Aug 14 11:59:27  2016 - [info]     Replicating from

Sun Aug 14 11:59:27  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:27  2016 - [info] Checking slave configurations..

Sun Aug 14 11:59:27  2016 - [info]  read_only=1 is not set  on slave

Sun Aug 14 11:59:27  2016 - [info] Checking replication filtering settings..

Sun Aug 14 11:59:27  2016 - [info]  Replication filtering  check ok.

Sun Aug 14 11:59:27  2016 - [info] Master is down!

Sun Aug 14 11:59:27  2016 - [info] Terminating monitoring script.

Sun Aug 14 11:59:27  2016 - [info] Got exit code 20 (Master dead).

Sun Aug 14 11:59:27  2016 - [info] MHA::MasterFailover version 0.56.

Sun Aug 14 11:59:27  2016 - [info] Starting master failover.

Sun Aug 14 11:59:27  2016 - [info]

Sun Aug 14 11:59:27  2016 - [info] * Phase 1: Configuration Check Phase..

Sun Aug 14 11:59:27  2016 - [info]

Sun Aug 14 11:59:27  2016 - [info] GTID failover mode = 1

Sun Aug 14 11:59:27  2016 - [info] Dead Servers:

Sun Aug 14 11:59:27  2016 - [info]

Sun Aug 14 11:59:27  2016 - [info] Checking master reachability via MySQL(double check)...

Sun Aug 14 11:59:28  2016 - [info]  ok.

Sun Aug 14 11:59:28  2016 - [info] Alive Servers:

Sun Aug 14 11:59:28  2016 - [info]

Sun Aug 14 11:59:28  2016 - [info] Alive Slaves:

Sun Aug 14 11:59:28  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:28  2016 - [info]     GTID ON

Sun Aug 14 11:59:28  2016 - [info]     Replicating from

Sun Aug 14 11:59:28  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:28  2016 - [info] Starting GTID based failover.

Sun Aug 14 11:59:28  2016 - [info]

Sun Aug 14 11:59:28  2016 - [info] ** Phase 1: Configuration Check Phase completed.

Sun Aug 14 11:59:28  2016 - [info]

Sun Aug 14 11:59:28  2016 - [info] * Phase 2: Dead Master Shutdown Phase..

Sun Aug 14 11:59:28  2016 - [info]

Sun Aug 14 11:59:28  2016 - [info] Forcing shutdown so that applications never connect to the  current master..

Sun Aug 14 11:59:28  2016 - [info] Executing master IP deactivation script:

Sun Aug 14 11:59:28  2016 - [info]    /etc/mha/master_ip_failover --orig_master_host=  --orig_master_ip= --orig_master_port=3306 --command=stop



IN SCRIPT  TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1;/sbin/arping  -I eth1 -c 3 -s >/dev/null 2>&1===


Disabling the VIP on  old master if the server is still UP:

Sun Aug 14 11:59:34  2016 - [info]  done.

Sun Aug 14 11:59:34  2016 - [warning] shutdown_script is not set. Skipping explicit shutting down  of the dead master.

Sun Aug 14 11:59:34  2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] * Phase 3: Master Recovery Phase..

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] The latest binary log file/position on all slaves is  bin.000015:231

Sun Aug 14 11:59:34  2016 - [info] Latest slaves (Slaves that received relay log files to the  latest):

Sun Aug 14 11:59:34  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:34  2016 - [info]     GTID ON

Sun Aug 14 11:59:34  2016 - [info]     Replicating from

Sun Aug 14 11:59:34  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:34  2016 - [info] The oldest binary log file/position on all slaves is  bin.000015:231

Sun Aug 14 11:59:34  2016 - [info] Oldest slaves:

Sun Aug 14 11:59:34  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:34  2016 - [info]     GTID ON

Sun Aug 14 11:59:34  2016 - [info]     Replicating from

Sun Aug 14 11:59:34  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] * Phase 3.3: Determining New Master Phase..

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] Searching new master from slaves..

Sun Aug 14 11:59:34  2016 - [info]  Candidate masters from  the configuration file:

Sun Aug 14 11:59:34  2016 - [info]   Version=5.6.30-enterprise-commercial-advanced-log (oldest major  version between slaves) log-bin:enabled

Sun Aug 14 11:59:34  2016 - [info]     GTID ON

Sun Aug 14 11:59:34  2016 - [info]     Replicating from

Sun Aug 14 11:59:34  2016 - [info]     Primary candidate for  the new Master (candidate_master is set)

Sun Aug 14 11:59:34  2016 - [info]  Non-candidate masters:

Sun Aug 14 11:59:34  2016 - [info]  Searching from  candidate_master slaves which have received the latest relay log events..

Sun Aug 14 11:59:34  2016 - [info] New master is

Sun Aug 14 11:59:34 2016  - [info] Starting master failover..

Sun Aug 14 11:59:34  2016 - [info]

From:  (current master)



To:  (new master)

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info] * Phase 3.3: New Master Recovery Phase..

Sun Aug 14 11:59:34  2016 - [info]

Sun Aug 14 11:59:34  2016 - [info]  Waiting all logs to be  applied..

Sun Aug 14 11:59:34  2016 - [info]   done.

Sun Aug 14 11:59:34  2016 - [info] Getting new master's binlog name and position..

Sun Aug 14 11:59:34  2016 - [info]  bin.000015:231

Sun Aug 14 11:59:34  2016 - [info]  All other slaves should  start replication from here. Statement should be: CHANGE MASTER TO  MASTER_HOST='', MASTER_PORT=3306, MASTER_AUTO_POSITION=1,  MASTER_USER='repl', MASTER_PASSWORD='xxx';

Sun Aug 14 11:59:34  2016 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: bin.000015,  231, 1683955a-6102-11e6-8b6f-080027ca1592:1-8,


Sun Aug 14 11:59:34  2016 - [info] Executing master IP activate script:

Sun Aug 14 11:59:34  2016 - [info]    /etc/mha/master_ip_failover --command=start --ssh_user=root  --orig_master_host= --orig_master_ip=  --orig_master_port=3306 --new_master_host=  --new_master_ip= --new_master_port=3306 --new_master_user='root'  --new_master_password='111111' 



IN SCRIPT  TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1;/sbin/arping -I eth1 -c 3 -s  >/dev/null 2>&1===


Enabling the  VIP - on the new master -

Sun Aug 14 11:59:37  2016 - [info]  OK.

Sun Aug 14 11:59:37  2016 - [info] ** Finished master recovery successfully.

Sun Aug 14 11:59:37  2016 - [info] * Phase 3: Master Recovery Phase completed.

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info] * Phase 4: Slaves Recovery Phase..

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info] * Phase 4.1: Starting Slaves in parallel..

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info] All new slave servers recovered successfully.

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info] * Phase 5: New master cleanup phase..

Sun Aug 14 11:59:37  2016 - [info]

Sun Aug 14 11:59:37  2016 - [info] Resetting slave info on the new master..

Sun Aug 14 11:59:37  2016 - [info] Resetting  slave info succeeded.

Sun Aug 14 11:59:37  2016 - [info] Master failover to completed  successfully.

Sun Aug 14 11:59:37  2016 - [info]


-----  Failover Report -----


app1: MySQL  Master failover to succeeded


Master is down!

Check MHA  Manager logs at lab1:/etc/mha/manager.log for details.

Started  automated(non-interactive) failover.

Invalidated  master IP address on

Selected as a new master.

OK: Applying all logs succeeded.

OK: Activated master IP address.

Resetting slave info succeeded.

Master  failover to completed successfully.


As the log shows, the master has been switched to 192.168.1.103, and the VIP is now bound to 192.168.1.103 as well.
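To see how long each stage of the failover took, the phase markers in the manager log can be pulled out together with their timestamps. The following is only a minimal sketch: the function name `phase_timeline` and the regular expressions are assumptions based on the log lines pasted above, not part of MHA itself, and the sample text is a shortened excerpt of that log.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Sun Aug 14 11:59:34  2016 - [info] * Phase 3: Master Recovery Phase..
# The pattern is an assumption derived from the log excerpt in this article.
LINE_RE = re.compile(
    r"^\w{3} (\w{3})\s+(\d{1,2}) (\d\d:\d\d:\d\d)\s+(\d{4})\s+- \[(\w+)\]\s*(.*)$"
)
# A phase *start* line ends with ".."; "... completed." lines are skipped.
PHASE_RE = re.compile(r"^\*+ (Phase [\d.]+: .+?)\.\.$")


def phase_timeline(log_text):
    """Return a list of (timestamp, phase name) for each phase start line."""
    timeline = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines and script output carry no timestamp
        month, day, clock, year, _level, message = m.groups()
        p = PHASE_RE.match(message)
        if p:
            ts = datetime.strptime(
                f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
            )
            timeline.append((ts, p.group(1)))
    return timeline


# Shortened excerpt of the log shown above.
SAMPLE = """\
Sun Aug 14 11:59:28  2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Aug 14 11:59:34  2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Aug 14 11:59:34  2016 - [info] * Phase 3: Master Recovery Phase..
Sun Aug 14 11:59:37  2016 - [info] * Phase 4: Slaves Recovery Phase..
Sun Aug 14 11:59:37  2016 - [info] * Phase 5: New master cleanup phase..
"""

if __name__ == "__main__":
    timeline = phase_timeline(SAMPLE)
    for ts, name in timeline:
        print(ts.time(), name)
    elapsed = (timeline[-1][0] - timeline[0][0]).total_seconds()
    print(f"elapsed between first and last phase start: {elapsed:.0f}s")
```

On a real deployment the same function could be fed the contents of the manager log (in this article's setup, /etc/mha/manager.log) instead of the embedded sample.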