Problem: while setting up MySQL MHA, running masterha_check_repl fails with "Failed to get master_ip_failover_script status with return code 255:0", as shown below.
System: RHEL 7.9, 64-bit
MHA packages: mha4mysql-node-0.58.tar.gz, mha4mysql-manager-0.58.tar.gz
Database: MySQL 5.7.21
1. Reproducing the problem
[root@mha-manager ~]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Sun Jun 30 12:22:09 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 30 12:22:09 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:22:09 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:22:09 2024 - [info] MHA::MasterMonitor version 0.58.
Sun Jun 30 12:22:10 2024 - [info] GTID failover mode = 0
Sun Jun 30 12:22:10 2024 - [info] Dead Servers:
Sun Jun 30 12:22:10 2024 - [info] Alive Servers:
Sun Jun 30 12:22:10 2024 - [info]   192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:22:10 2024 - [info]   192.168.133.72(192.168.133.72:3306)
Sun Jun 30 12:22:10 2024 - [info]   192.168.133.73(192.168.133.73:3306)
Sun Jun 30 12:22:10 2024 - [info] Alive Slaves:
Sun Jun 30 12:22:10 2024 - [info]   192.168.133.72(192.168.133.72:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:22:10 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:22:10 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun 30 12:22:10 2024 - [info]   192.168.133.73(192.168.133.73:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:22:10 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:22:10 2024 - [info] Current Alive Master: 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:22:10 2024 - [info] Checking slave configurations..
Sun Jun 30 12:22:10 2024 - [info] Checking replication filtering settings..
Sun Jun 30 12:22:10 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jun 30 12:22:10 2024 - [info]  Replication filtering check ok.
Sun Jun 30 12:22:10 2024 - [info] GTID (with auto-pos) is not supported
Sun Jun 30 12:22:10 2024 - [info] Starting SSH connection tests..
Sun Jun 30 12:22:13 2024 - [info] All SSH connection tests passed successfully.
Sun Jun 30 12:22:13 2024 - [info] Checking MHA Node version..
Sun Jun 30 12:22:13 2024 - [info]  Version check ok.
Sun Jun 30 12:22:13 2024 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun 30 12:22:14 2024 - [info] HealthCheck: SSH to 192.168.133.71 is reachable.
Sun Jun 30 12:22:14 2024 - [info] Master MHA Node version is 0.58.
Sun Jun 30 12:22:14 2024 - [info] Checking recovery script configurations on 192.168.133.71(192.168.133.71:3306)..
Sun Jun 30 12:22:14 2024 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000002 
Sun Jun 30 12:22:14 2024 - [info]   Connecting to root@192.168.133.71(192.168.133.71:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /mysql/binlog, up to mysql-bin.000002
Sun Jun 30 12:22:14 2024 - [info] Binlog setting check done.
Sun Jun 30 12:22:14 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun 30 12:22:14 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.72 --slave_ip=192.168.133.72 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:22:14 2024 - [info]   Connecting to root@192.168.133.72(192.168.133.72:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:22:14 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.73 --slave_ip=192.168.133.73 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:22:14 2024 - [info]   Connecting to root@192.168.133.73(192.168.133.73:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:22:15 2024 - [info] Slaves settings check done.
Sun Jun 30 12:22:15 2024 - [info] 
192.168.133.71(192.168.133.71:3306) (current master)
 +--192.168.133.72(192.168.133.72:3306)
 +--192.168.133.73(192.168.133.73:3306)

Sun Jun 30 12:22:15 2024 - [info] Checking replication health on 192.168.133.72..
Sun Jun 30 12:22:15 2024 - [info]  ok.
Sun Jun 30 12:22:15 2024 - [info] Checking replication health on 192.168.133.73..
Sun Jun 30 12:22:15 2024 - [info]  ok.
Sun Jun 30 12:22:15 2024 - [info] Checking master_ip_failover_script status:
Sun Jun 30 12:22:15 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.133.71 --orig_master_ip=192.168.133.71 --orig_master_port=3306 
"my" variable $exit_code masks earlier declaration in same scope at /usr/local/bin/master_ip_failover line 43.
Sun Jun 30 12:22:15 2024 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229]  Failed to get master_ip_failover_script status with return code 255:0.
Sun Jun 30 12:22:15 2024 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/local/bin/masterha_check_repl line 48.
Sun Jun 30 12:22:15 2024 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Sun Jun 30 12:22:15 2024 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

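2. Analysis
Every earlier check (SSH, MHA Node version, binlog and relay log settings, replication health) passes. The failure occurs only at the last step, when MHA runs /usr/local/bin/master_ip_failover with --command=status. Perl aborts with '"my" variable $exit_code masks earlier declaration in same scope' at line 43 of that script, and MHA reports return code 255:0. The fault is therefore in the master_ip_failover script itself. In the old script (section 5), line 43 is "my $exit_code = 10;". The eval block in the stop branch above it is never closed with "};", and a stray "};" appears in the status branch, so the braces do not pair up. That is the likely reason Perl treats the second declaration as being in the same scope as the first.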

3. Solution
Rewrite the master_ip_failover script, using the content from https://blog.csdn.net/weixin_34194379/article/details/93531722.
[root@mha-manager bin]# vi master_ip_failover
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '192.168.133.75/24';
my $key = '88';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
    if ( $command eq "stop" || $command eq "stopssh" ) {
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Note: only the VIP-related parameters in the script above were modified.
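
To see what the script will run on the masters during a failover, the VIP commands can be rebuilt from the same three values. This is a minimal sketch that only prints the command strings; vip_command and the ifdev variable are illustrative names, and the values mirror $vip, $key, and the hard-coded ens33 interface in the script above.

```shell
#!/usr/bin/env bash
# Values mirrored from the Perl script above; change them together with it.
vip='192.168.133.75/24'
key='88'
ifdev='ens33'   # illustrative variable; the script hard-codes ens33

# Print (do not execute) the command that start_vip or stop_vip sends over ssh.
vip_command() {
    case "$1" in
        start) printf '/sbin/ifconfig %s:%s %s\n' "$ifdev" "$key" "$vip" ;;
        stop)  printf '/sbin/ifconfig %s:%s down\n' "$ifdev" "$key" ;;
        *)     echo "usage: vip_command start|stop" >&2; return 1 ;;
    esac
}
```

Running vip_command start prints "/sbin/ifconfig ens33:88 192.168.133.75/24", which can be compared with the "IN SCRIPT TEST" line that the script prints when MHA calls it.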

4. Verification
[root@mha-manager bin]# masterha_check_repl -conf=/etc/masterha/app1.cnf
Sun Jun 30 12:49:30 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun 30 12:49:30 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:49:30 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Jun 30 12:49:30 2024 - [info] MHA::MasterMonitor version 0.58.
Sun Jun 30 12:49:31 2024 - [info] GTID failover mode = 0
Sun Jun 30 12:49:31 2024 - [info] Dead Servers:
Sun Jun 30 12:49:31 2024 - [info] Alive Servers:
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.72(192.168.133.72:3306)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.73(192.168.133.73:3306)
Sun Jun 30 12:49:31 2024 - [info] Alive Slaves:
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.72(192.168.133.72:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:49:31 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Jun 30 12:49:31 2024 - [info]   192.168.133.73(192.168.133.73:3306)  Version=5.7.21-20-log (oldest major version between slaves) log-bin:enabled
Sun Jun 30 12:49:31 2024 - [info]     Replicating from 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info] Current Alive Master: 192.168.133.71(192.168.133.71:3306)
Sun Jun 30 12:49:31 2024 - [info] Checking slave configurations..
Sun Jun 30 12:49:31 2024 - [info] Checking replication filtering settings..
Sun Jun 30 12:49:31 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Jun 30 12:49:31 2024 - [info]  Replication filtering check ok.
Sun Jun 30 12:49:31 2024 - [info] GTID (with auto-pos) is not supported
Sun Jun 30 12:49:31 2024 - [info] Starting SSH connection tests..
Sun Jun 30 12:49:33 2024 - [info] All SSH connection tests passed successfully.
Sun Jun 30 12:49:33 2024 - [info] Checking MHA Node version..
Sun Jun 30 12:49:34 2024 - [info]  Version check ok.
Sun Jun 30 12:49:34 2024 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun 30 12:49:34 2024 - [info] HealthCheck: SSH to 192.168.133.71 is reachable.
Sun Jun 30 12:49:34 2024 - [info] Master MHA Node version is 0.58.
Sun Jun 30 12:49:34 2024 - [info] Checking recovery script configurations on 192.168.133.71(192.168.133.71:3306)..
Sun Jun 30 12:49:34 2024 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/mysql/binlog --output_file=/tmp/save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000002 
Sun Jun 30 12:49:34 2024 - [info]   Connecting to root@192.168.133.71(192.168.133.71:22).. 
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /mysql/binlog, up to mysql-bin.000002
Sun Jun 30 12:49:34 2024 - [info] Binlog setting check done.
Sun Jun 30 12:49:34 2024 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun 30 12:49:34 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.72 --slave_ip=192.168.133.72 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:49:34 2024 - [info]   Connecting to root@192.168.133.72(192.168.133.72:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:49:35 2024 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.133.73 --slave_ip=192.168.133.73 --slave_port=3306 --workdir=/tmp --target_version=5.7.21-20-log --manager_version=0.58 --relay_log_info=/mysql/data/relay-log.info  --relay_dir=/mysql/data/  --slave_pass=xxx
Sun Jun 30 12:49:35 2024 - [info]   Connecting to root@192.168.133.73(192.168.133.73:22).. 
  Checking slave recovery environment settings..
    Opening /mysql/data/relay-log.info ... ok.
    Relay log found at /mysql/data, up to relay-bin.000002
    Temporary relay log file is /mysql/data/relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Jun 30 12:49:35 2024 - [info] Slaves settings check done.
Sun Jun 30 12:49:35 2024 - [info] 
192.168.133.71(192.168.133.71:3306) (current master)
 +--192.168.133.72(192.168.133.72:3306)
 +--192.168.133.73(192.168.133.73:3306)

Sun Jun 30 12:49:35 2024 - [info] Checking replication health on 192.168.133.72..
Sun Jun 30 12:49:35 2024 - [info]  ok.
Sun Jun 30 12:49:35 2024 - [info] Checking replication health on 192.168.133.73..
Sun Jun 30 12:49:35 2024 - [info]  ok.
Sun Jun 30 12:49:35 2024 - [info] Checking master_ip_failover_script status:
Sun Jun 30 12:49:35 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.133.71 --orig_master_ip=192.168.133.71 --orig_master_port=3306 


IN SCRIPT TEST====/sbin/ifconfig ens33:88 down==/sbin/ifconfig ens33:88 192.168.133.75/24===

Checking the Status of the script.. OK 
Sun Jun 30 12:49:35 2024 - [info]  OK.
Sun Jun 30 12:49:35 2024 - [warning] shutdown_script is not defined.
Sun Jun 30 12:49:35 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

Note: as shown above, masterha_check_repl now completes successfully.
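
When the check is repeated often, the final verdict line can be tested instead of reading the whole log. The sketch below assumes only the two verdict strings seen in the outputs above; repl_health_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read masterha_check_repl output on stdin and succeed only if it contains
# the success verdict. "MySQL Replication Health is NOT OK!" does not match.
repl_health_ok() {
    grep -q 'MySQL Replication Health is OK\.'
}
```

Usage, with the configuration path from this article: masterha_check_repl -conf=/etc/masterha/app1.cnf 2>&1 | repl_health_ok && echo "replication check passed"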

5. Previous script content
Note: this script came from https://blog.csdn.net/m0_59439550/article/details/121106544.
[root@mha-manager bin]# cat master_ip_failover.bak20240630
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.133.75';
my $brdc = '192.168.133.255';
my $ifdev = 'ens33';
my $key = '1'; 
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;
#my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
#my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
};
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
## A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
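
The Perl error in section 1 names line 43 of the script, so it helps to print exactly that line from the backup file. A minimal sketch, with show_line as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print line number $2 of file $1.
show_line() {
    sed -n "${2}p" "$1"
}
```

For the listing above, show_line master_ip_failover.bak20240630 43 should print the "my $exit_code = 10;" line of the start branch, assuming the file on disk matches the listing.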