简介:
1、MHA简介
- MHA介绍
MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。
- MHA组成
该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。
MHA Manager可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。
MHA Node运行在每台MySQL服务器上,MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。
在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。
目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。另外对于想快速搭建的可以参考:MHA快速搭建
我们自己使用其实也可以使用1主1从,但是master主机宕机后无法切换,以及无法补全binlog。master的mysqld进程crash后,还是可以切换成功,以及补全binlog的。
- MHA工作原理
(1)从宕机崩溃的master保存二进制日志事件(binlog events);
(2)识别含有最新更新的slave;
(3)应用差异的中继日志(relay log)到其他的slave;
(4)应用从master保存的二进制日志事件(binlog events);
(5)提升一个slave为新的master;
(6)使其他的slave连接新的master进行复制;
1.4 MHA工具
MHA软件由两部分组成,Manager工具包和Node工具包
Manager 节点工具包
masterha_check_ssh:MHA 依赖的 ssh 环境监测工具
masterha_check_repl:MYSQL 主从复制环境检测工具(slave状态、文件配置)
masterga_manager:MHA 服务主程序
masterha_check_status:MHA 运行状态探测工具
masterha_master_monitor:MYSQL master 节点可用性监测工具
masterha_master_swith:master 节点切换工具(自动或者手动)
masterha_conf_host:添加或删除配置的节点
masterha_stop:关闭 MHA 服务的工具
Node 节点工具包
save_binary_logs:保存和复制 master 的二进制日志
apply_diff_relay_logs:识别差异的中继日志事件并应用于其他slave
purge_relay_logs:清除中继日志(不会阻塞 SQL 线程)
自定义扩展
secondary_check_script:通过多条网络路由检测 master 的可用性;
master_ip_failover_script:更新 application 使用的 masterip;
report_script:发送报告;
init_conf_load_script:加载初始配置参数;
master_ip_online_change_script;更新 master 节点 ip 地址
2、MHA搭建
2.1 环境信息
环境名称 | 主机名称 | Mysql版本 | IP | 端口 | OS 版本 |
主库 | master | Mysql8.0.30 | 172.16.134.24 | 3310 | Centos 7.2 |
从库(MHA Manager) | Slave01 | Mysql8.0.30 | 172.16.134.25 | 3310 | Centos 7.2 |
从库 | Slave02 | Mysql8.0.30 | 172.16.134.26 | 3310 | Centos 7.2 |
2.2 基础环境搭建
MHA 基础环境主从搭建见复制文档
创建监控用户(master)
Create user 'root'@'172.16.134.%' identified by '123456';
grant all privileges on *.* to 'root'@'172.16.134.%'
flush privileges;
2.3 MHA安装
Manager 安装
yum install perl-DBD-MySQL
yum install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
Node 安装
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
安装完成后/bin 下会生产对应的工具
-rwxr-xr-x 1 root root 1995 Mar 23 2018 masterha_check_repl
-rwxr-xr-x 1 root root 1779 Mar 23 2018 masterha_check_ssh
-rwxr-xr-x 1 root root 1865 Mar 23 2018 masterha_check_status
-rwxr-xr-x 1 root root 3201 Mar 23 2018 masterha_conf_host
-rwxr-xr-x 1 root root 2517 Mar 23 2018 masterha_manager
-rwxr-xr-x 1 root root 2165 Mar 23 2018 masterha_master_monitor
-rwxr-xr-x 1 root root 2373 Mar 23 2018 masterha_master_switch
-rwxr-xr-x 1 root root 5172 Mar 23 2018 masterha_secondary_check
-rwxr-xr-x 1 root root 1739 Mar 23 2018 masterha_stop
-rwxr-xr-x 1 root root 7525 Mar 23 2018 save_binary_logs
-rwxr-xr-x 1 root root 8337 Mar 23 2018 purge_relay_logs
-rwxr-xr-x 1 root root 4807 Mar 23 2018 filter_mysqlbinlog
-rwxr-xr-x 1 root root 17639 Mar 23 2018 apply_diff_relay_logs
2.4 配置互信
cd ~
ssh-keygen -t rsa #一路回车
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
在24机器上执行以下命令
ssh 172.16.134.24 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
ssh 172.16.134.25 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
ssh 172.16.134.26 cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys 172.16.134.25:~/.ssh/
scp ~/.ssh/authorized_keys 172.16.134.26:~/.ssh/
通过ssh root@172.16.134.25 命令直接登录,不需要输入密码
2.5 配置MHA
1、创建MHA工作目录,并创建相关配置文件(软件包解压后目录有样例配置文件)
MHA Manage 节点上
#创建MHA配置文件目录
mkdir /etc/mha
# 创建MHA脚本目录
mkdir /etc/mha/scripts
# 创建MHA日志目录
mkdir /var/log/mha/
# 创建日志目录(app1是自定义的,因为manager可以同时管理多个集群,所以这里取了这个名)
mkdir -p /var/log/mha/app1
# 创建日志文件
touch /var/log/mha/app1/manager.log
cp /home/soft/mha4mysql-manager-0.58/samples/conf/* /etc/mha/
配置主从配置文件,candidate_master=1的意思是是否能提升为master,可以不加。
Vi /etc/mha/app1.cnf
[server default]
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
[server1]
hostname=172.16.134.24
port=3310
[server2]
hostname=172.16.134.25
port=3310
candidate_master=1
check_repl_delay=0
[server3]
hostname=172.16.134.26
port=3310
no_master=1
配置主要的配置文件
[server default]
user=root
password=123456
repl_user=repl
repl_password=rep1123
ssh_user=root
ping_interval=1
master_binlog_dir= /home/mysql3310/binlog/
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
master_ip_online_change_script="/etc/mha/scripts/master_ip_online_change"
report_script="/etc/mha/scripts/send_report"
remote_workdir=/tmp
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 172.16.134.25 -s 172.16.134.26
shutdown_script=""
[server1]
hostname=172.16.134.24
port=3310
[server2]
hostname=172.16.134.25
port=3310
candidate_master=1
check_repl_delay=0
[server3]
hostname=172.16.134.26
port=3310
no_master=1
脚本配置(vip)
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
my $vip = '10.89.181.9/25';
my $key = '0';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip netmask 255.255.255.128";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=
port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
2.6 信任检测
root@ebsproddb.ys:/etc/mha/scripts$ masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Mar 10 18:25:20 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Mar 10 18:25:20 2023 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Mar 10 18:25:20 2023 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Mar 10 18:25:20 2023 - [info] Starting SSH connection tests..
Fri Mar 10 18:25:21 2023 - [debug]
Fri Mar 10 18:25:20 2023 - [debug] Connecting via SSH from root@172.16.134.24(172.16.134.24:22) to root@172.16.134.25(172.16.134.25:22)..
Fri Mar 10 18:25:20 2023 - [debug] ok.
Fri Mar 10 18:25:20 2023 - [debug] Connecting via SSH from root@172.16.134.24(172.16.134.24:22) to root@172.16.134.26(172.16.134.26:22)..
Fri Mar 10 18:25:21 2023 - [debug] ok.
Fri Mar 10 18:25:22 2023 - [debug]
Fri Mar 10 18:25:21 2023 - [debug] Connecting via SSH from root@172.16.134.25(172.16.134.25:22) to root@172.16.134.24(172.16.134.24:22)..
Fri Mar 10 18:25:21 2023 - [debug] ok.
Fri Mar 10 18:25:21 2023 - [debug] Connecting via SSH from root@172.16.134.25(172.16.134.25:22) to root@172.16.134.26(172.16.134.26:22)..
Fri Mar 10 18:25:21 2023 - [debug] ok.
Fri Mar 10 18:25:23 2023 - [debug]
Fri Mar 10 18:25:21 2023 - [debug] Connecting via SSH from root@172.16.134.26(172.16.134.26:22) to root@172.16.134.24(172.16.134.24:22)..
Fri Mar 10 18:25:21 2023 - [debug] ok.
Fri Mar 10 18:25:21 2023 - [debug] Connecting via SSH from root@172.16.134.26(172.16.134.26:22) to root@172.16.134.25(172.16.134.25:22)..
Fri Mar 10 18:25:22 2023 - [debug] ok.
Fri Mar 10 18:25:23 2023 - [info] All SSH connection tests passed successfully
检查成功
2.7 检查主从复制是否成功
root@ebsproddb.ys:/etc/mha$ masterha_check_repl --conf=/etc/mha/masterha_default.cnf
Fri Mar 10 19:18:34 2023 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Mar 10 19:18:34 2023 - [info] Reading application default configuration from /etc/mha/masterha_default.cnf..
Fri Mar 10 19:18:34 2023 - [info] Reading server configuration from /etc/mha/masterha_default.cnf..
Fri Mar 10 19:18:34 2023 - [info] MHA::MasterMonitor version 0.58.
Fri Mar 10 19:18:35 2023 - [info] GTID failover mode = 1
Fri Mar 10 19:18:35 2023 - [info] Dead Servers:
Fri Mar 10 19:18:35 2023 - [info] Alive Servers:
Fri Mar 10 19:18:35 2023 - [info] 172.16.134.25(172.16.134.25:3310)
Fri Mar 10 19:18:35 2023 - [info] 172.16.134.26(172.16.134.26:3310)
Fri Mar 10 19:18:35 2023 - [info] Alive Slaves:
Fri Mar 10 19:18:35 2023 - [info] 172.16.134.26(172.16.134.26:3310) Version=8.0.30 (oldest major version between slaves) log-bin:enabled
Fri Mar 10 19:18:35 2023 - [info] GTID ON
Fri Mar 10 19:18:35 2023 - [info] Replicating from 172.16.134.25(172.16.134.25:3310)
Fri Mar 10 19:18:35 2023 - [info] Current Alive Master: 172.16.134.25(172.16.134.25:3310)
Fri Mar 10 19:18:35 2023 - [info] Checking slave configurations..
Fri Mar 10 19:18:35 2023 - [info] Checking replication filtering settings..
Fri Mar 10 19:18:35 2023 - [info] binlog_do_db= , binlog_ignore_db=
Fri Mar 10 19:18:35 2023 - [info] Replication filtering check ok.
Fri Mar 10 19:18:35 2023 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Mar 10 19:18:35 2023 - [info] Checking SSH publickey authentication settings on the current master..
Fri Mar 10 19:18:35 2023 - [info] HealthCheck: SSH to 172.16.134.25 is reachable.
Fri Mar 10 19:18:35 2023 - [info]
172.16.134.25(172.16.134.25:3310) (current master)
+--172.16.134.26(172.16.134.26:3310)
Fri Mar 10 19:18:35 2023 - [info] Checking replication health on 172.16.134.26..
Fri Mar 10 19:18:35 2023 - [info] ok.
Fri Mar 10 19:18:35 2023 - [info] Checking master_ip_failover_script status:
Fri Mar 10 19:18:35 2023 - [info] /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.134.25 --orig_master_ip=172.16.134.25 --orig_master_port=3310
Fri Mar 10 19:18:35 2023 - [info] OK.
Fri Mar 10 19:18:35 2023 - [warning] shutdown_script is not defined.
Fri Mar 10 19:18:35 2023 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
2.8 启动监控进程
nohup masterha_manager --conf=/etc/mha/masterha_default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log &
查看状态
root@ebsproddb.ys:/etc/mha$ masterha_check_status --conf=/etc/mha/masterha_default.cnf
masterha_default (pid:14977) is running(0:PING_OK), master:172.16.134.24
参数说明:
--remove_dead_master_conf 该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除
--manger_log 日志存放位置
--ignore_last_failover 在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover
停止MHA监控
masterha_stop --conf=/etc/mha/masterha_default.cnf