Why

Drawbacks of switching over with traditional replication and with GTIDs

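When replication fails, the biggest headache is reworking the replication topology.

Once the master goes down, one of the slaves has to be configured as the new master:

enable binary logging on that slave and redirect write traffic to it.

If the topology is MSS (master -> relay slave -> slaves), the relay is promoted to master, and every slave behind it then needs CHANGE MASTER TO with the new host, binlog file, and position. Data consistency also has to be guaranteed, so the whole switchover takes a long time.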

 

With GTIDs you only need CHANGE MASTER TO the new host (with MASTER_AUTO_POSITION=1), but it still has to be run on every machine.
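To make the "run it on every machine" point concrete, here is a minimal Python sketch that builds the statements for each remaining slave after a new master has been chosen. The helper name `repoint_statements` is hypothetical, the replication account is the one used later in this note (rep/redhat), and the SQL is assembled without escaping, so it assumes trusted input. It only produces the statements; executing them (for example through Connector/Python) is left to the caller.

```python
def repoint_statements(new_master, slaves, repl_user, repl_password, port=3306):
    """Return {slave_host: [sql, ...]} to repoint slaves using GTID auto-positioning.

    Hypothetical helper: values are interpolated without escaping, so only
    pass trusted host names and credentials.
    """
    change = (
        "CHANGE MASTER TO MASTER_HOST='{h}', MASTER_PORT={p}, "
        "MASTER_USER='{u}', MASTER_PASSWORD='{pw}', MASTER_AUTO_POSITION=1"
    ).format(h=new_master, p=port, u=repl_user, pw=repl_password)
    plan = {}
    for host in slaves:
        if host == new_master:
            continue  # the promoted server no longer replicates from anyone
        # With GTIDs no binlog file/position is needed, only the new host.
        plan[host] = ["STOP SLAVE", change, "START SLAVE"]
    return plan
```

For example, `repoint_statements("192.168.88.123", ["192.168.88.123", "192.168.88.124"], "rep", "redhat")` yields one entry, for 192.168.88.124. The same three statements still have to be sent to each slave, which is exactly the manual work mysqlfailover automates.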

 

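So we use MySQL's own failover tool, mysqlfailover (part of MySQL Utilities), which also provides a Python API, so it can later be integrated into an automated operations platform.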

 

Downloading and installing the software

MySQL Utilities https://dev.mysql.com/downloads/utilities/

##maintaining and administering MySQL servers

Connector/Python https://dev.mysql.com/downloads/connector/python/

## a standardized database driver for Python platforms and development

yum install mysql-connector-python-2.1.3-1.el6.x86_64.rpm mysql-utilities-1.5.6-1.el6.noarch.rpm -y
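After installing, a quick check from Python can confirm both packages are usable. This sketch (hypothetical helper `check_tools`) only tries to import `mysql.connector` and looks for the `mysqlfailover` script on PATH; it does not contact any server.

```python
import shutil


def check_tools():
    """Report the Connector/Python version and the mysqlfailover path (None if missing)."""
    report = {}
    try:
        import mysql.connector  # provided by the Connector/Python package

        report["connector"] = getattr(mysql.connector, "__version__", "unknown")
    except ImportError:
        report["connector"] = None
    # shutil.which returns the full path of the executable, or None.
    report["mysqlfailover"] = shutil.which("mysqlfailover")
    return report
```

Running `print(check_tools())` on the monitor host should show a version string and a path once both RPMs are installed.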

 

Setting up the service:

Environment

monitor    server1    192.168.88.121    ## ideally the monitor runs on its own dedicated server
master     server2    192.168.88.122
slave      server3    192.168.88.123
slave      server4    192.168.88.124

The monitor needs to connect to the master and to every slave to collect their running status.

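Privileges: the basic set is SUPER, REPLICATION SLAVE, and RELOAD.

The user also needs CREATE, INSERT, DROP, and SELECT. These matter when more than one mysqlfailover instance may watch the same master (for example a second monitor, so that monitoring is not a single point of failure): mysqlfailover registers itself in a table on the master, and a console that finds another one already registered fails to start unless it is given --force.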

 

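Run on the master (the statement replicates to the slaves, which the monitor also logs in to):

grant create,insert,drop,select,super,replication slave,reload on *.* to 'repmon'@'192.168.88.121' identified by 'redhat' with grant option;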

 

Check that the grant took effect:

show grants for 'repmon'@'192.168.88.121';

On the monitor, test the login: mysql -urepmon -predhat -h 192.168.88.122
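Reading SHOW GRANTS output by eye is error-prone, so here is a sketch that checks the global grant rows against the privilege list used in the statement above. `missing_privileges` is a hypothetical helper, and it assumes the rows look like typical MySQL 5.6/5.7 SHOW GRANTS output (`GRANT ... ON *.* TO ...`).

```python
import re

# Privileges granted to the monitor user in this note's GRANT statement.
REQUIRED = {"SUPER", "REPLICATION SLAVE", "RELOAD", "CREATE", "INSERT", "DROP", "SELECT"}


def missing_privileges(grant_lines):
    """Return the required privileges (and GRANT OPTION) absent from global grants."""
    have, grant_option = set(), False
    for line in grant_lines:
        m = re.match(r"GRANT (.+?) ON \*\.\* TO ", line)
        if not m:
            continue  # only global (*.*) grants are relevant here
        have.update(p.strip().upper() for p in m.group(1).split(","))
        grant_option = grant_option or "WITH GRANT OPTION" in line
    missing = set() if "ALL PRIVILEGES" in have else REQUIRED - have
    if not grant_option:
        missing.add("GRANT OPTION")
    return sorted(missing)
```

Feed it the rows returned by `show grants for 'repmon'@'192.168.88.121';`; an empty list means nothing is missing.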

 

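Configuration file (on every MySQL server)

## remove skip-slave-start

+++ add the following
#add fail-over
report-host=<this server's own IP>      ## reports its own address to the monitor
master-info-repository=TABLE            ## store master connection info in a table
relay-log-info-repository=TABLE         ## store relay log info in a table
+++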

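These settings move the replication metadata, normally cached in files, into tables. By default a slave keeps its master's connection details and its current replication position in master.info and relay-log.info. They are used when mysqld restarts: MySQL starts the slave automatically and reads the master and replication information from these two files.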

 

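To monitor replication and switch over automatically, every failure means pointing servers at a new master and binlog position again. If that information lived in files, the monitor might need permission to manipulate those files; keeping it in MySQL tables avoids this, and changes made there take effect immediately.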

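Note: when master info and relay log info are kept in tables, MySQL 5.6 creates those tables as MyISAM, but the official recommendation is InnoDB (from 5.7 on, InnoDB is the default). This avoids relying on MyISAM's automatic repair after a crash.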
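On 5.6 the two repository tables can be converted with `ALTER TABLE ... ENGINE=InnoDB` while replication is stopped. A minimal sketch, assuming a DB-API cursor (for example from mysql.connector) on a server already configured as a slave; `convert_repositories_to_innodb` is a hypothetical name.

```python
# Tables used when master-info-repository / relay-log-info-repository = TABLE.
REPO_TABLES = ("slave_master_info", "slave_relay_log_info")


def convert_repositories_to_innodb(cursor):
    """Convert the repository tables from MyISAM to InnoDB; return the tables altered.

    Assumes `cursor` is a DB-API cursor on a server configured as a slave
    (START SLAVE fails on a server with no replication configured).
    """
    cursor.execute("STOP SLAVE")  # keep replication from writing to the tables
    altered = []
    for table in REPO_TABLES:
        cursor.execute(
            "SELECT ENGINE FROM information_schema.TABLES "
            "WHERE TABLE_SCHEMA = 'mysql' AND TABLE_NAME = %s",
            (table,),
        )
        row = cursor.fetchone()
        if row and row[0].upper() == "MYISAM":
            # Table names come from the fixed tuple above, so formatting is safe.
            cursor.execute("ALTER TABLE mysql.{} ENGINE=InnoDB".format(table))
            altered.append(table)
    cursor.execute("START SLAVE")
    return altered
```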

 

Restart mysqld after making the changes.

Then check the two tables slave_master_info and slave_relay_log_info in the mysql database.
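Before starting the monitor, the settings above can be verified on each server, together with GTID mode, which mysqlfailover requires. A sketch with a hypothetical `failover_prereq_problems` helper that works on a plain dict of variables, so it can be fed from any source.

```python
def failover_prereq_problems(variables):
    """List problems with the server settings mysqlfailover depends on.

    `variables` maps lower-case variable names to values, e.g. the rows of
    SHOW GLOBAL VARIABLES. Returns human-readable problem strings.
    """
    problems = []
    for name in ("master_info_repository", "relay_log_info_repository"):
        if str(variables.get(name, "")).upper() != "TABLE":
            problems.append("{} should be TABLE".format(name))
    if not variables.get("report_host"):
        problems.append("report_host is not set")
    if str(variables.get("gtid_mode", "")).upper() != "ON":
        problems.append("gtid_mode is not ON")
    return problems
```

One way to build the dict is to run `SHOW GLOBAL VARIABLES` through a Connector/Python cursor and collect `{name.lower(): value for name, value in cursor.fetchall()}`; an empty result list means the server is ready.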

 

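Starting the monitor:

mysqlfailover --master=repmon:redhat@192.168.88.122 --discover-slaves-login=repmon:redhat

--master: the master, given as "user:password@host"

--discover-slaves-login: discover the slaves automatically; takes the user name and password used to connect to them

--log=file.log        ## write a log file

--failover-mode    ## auto (default; exits if no slave is eligible), elect (choose only from the slaves listed with --candidates), fail (monitoring only, no failover)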
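When the monitor is launched from an operations platform, the command line above can be assembled programmatically. The helper `failover_command` is hypothetical; it only uses the options listed here plus `--candidates` (the list that `elect` mode chooses from).

```python
def failover_command(master, slaves_login, mode="auto", log=None, candidates=None):
    """Build a mysqlfailover argument list from the options described above."""
    if mode not in ("auto", "elect", "fail"):
        raise ValueError("unknown failover mode: " + mode)
    cmd = [
        "mysqlfailover",
        "--master=" + master,
        "--discover-slaves-login=" + slaves_login,
        "--failover-mode=" + mode,
    ]
    if candidates:
        # Comma-separated connection strings such as user:pass@host:port.
        cmd.append("--candidates=" + ",".join(candidates))
    if log:
        cmd.append("--log=" + log)
    return cmd
```

The caller can then start it with, for example, `subprocess.call(failover_command("repmon:redhat@192.168.88.122", "repmon:redhat", log="file.log"))`. Returning a list instead of a shell string avoids quoting problems with passwords.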

 

#####
GTID Executed Set
c09756b8-a7e7-11e5-9468-000c29df5442:1-24

WARNING: Errant transaction(s) found on slave(s).
Replication Health Status
+-----------------+-------+---------+--------+------------+---------+
| host            | port  | role    | state  | gtid_mode  | health  |
+-----------------+-------+---------+--------+------------+---------+
| 192.168.88.122  | 3306  | MASTER  | UP     | ON         | OK      |
| 192.168.88.123  | 3306  | SLAVE   | UP     | ON         | OK      |
| 192.168.88.124  | 3306  | SLAVE   | UP     | ON         | OK      |
+-----------------+-------+---------+--------+------------+---------+
#####
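The errant-transaction warning means some slave has executed GTIDs that the master has not. The underlying check is set subtraction on GTID sets (MySQL itself offers the GTID_SUBTRACT() function for this). The Python sketch below illustrates the idea; it expands every interval into a Python set, so it is only meant for small ranges, and both helper names are hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-24:30,uuid2:5' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        source, *intervals = part.split(":")
        numbers = result.setdefault(source.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number like '30' has no '-', so end falls back to start.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def errant_transactions(slave_set, master_set):
    """Return {uuid: sorted numbers} executed on the slave but not on the master."""
    slave, master = parse_gtid_set(slave_set), parse_gtid_set(master_set)
    extra = {}
    for source, numbers in slave.items():
        diff = numbers - master.get(source, set())
        if diff:
            extra[source] = sorted(diff)
    return extra
```

Passing a slave's `@@GLOBAL.gtid_executed` and the master's shows which transactions trigger the warning; an empty dict means the slave has nothing the master lacks.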

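Now test the failover.

Stop the master and check whether a slave takes over as master and the topology is adjusted:

/etc/init.d/mysqld stop        ## run on server2, the current master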

The monitor then reports the following:

Failed to reconnect to the master after 3 attemps.

Failover starting in 'auto' mode...
# Candidate slave 192.168.88.123:3306 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# ERROR: ERROR: Cannot grant replication slave to replication user.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
# Discovering slaves for master at 192.168.88.123:3306
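To feed the result of a failover into an automation platform, the console text can be scanned for the lines shown above. A sketch only: the patterns are copied from this particular output and may differ between MySQL Utilities versions, and `summarize_failover` is a hypothetical name.

```python
import re


def summarize_failover(output):
    """Extract the chosen master, completion flag, and error lines from console text."""
    candidate = re.search(r"Candidate slave (\S+) will become the new master", output)
    return {
        "new_master": candidate.group(1) if candidate else None,
        "complete": "# Failover complete." in output,
        "errors": [line.strip() for line in output.splitlines() if "ERROR:" in line],
    }
```

On the output above this reports 192.168.88.123:3306 as the new master, marks the failover complete, and surfaces the "Cannot grant replication slave" error so it is not lost in the scroll.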

###### New topology

b89f9be8-a8af-11e5-9980-000c29ccacd8:1-2 [...]

Transactions executed on the servers:
+-----------------+-------+---------+--------+------------+---------+
| host            | port  | role    | state  | gtid_mode  | health  |
+-----------------+-------+---------+--------+------------+---------+
| 192.168.88.123  | 3306  | MASTER  | UP     | ON         | OK      |
| 192.168.88.124  | 3306  | SLAVE   | UP     | ON         | OK      |
+-----------------+-------+---------+--------+------------+---------+

####
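The health table can likewise be parsed to find out which host is the master now. A hypothetical parser for the ASCII layout shown above (border lines are skipped, the first `|` row is the header):

```python
def parse_health_table(text):
    """Turn an ASCII health table into a list of {column: value} dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def current_master(rows):
    """Return the host whose role is MASTER, or None."""
    masters = [r["host"] for r in rows if r.get("role") == "MASTER"]
    return masters[0] if masters else None
```

Applied to the new-topology table, `current_master` returns 192.168.88.123.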

On the new master (server3), insert some test data and check that it is replicated to the slave.
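The insert-and-check test can be scripted. A minimal sketch, assuming DB-API connections to the new master and to the slave (for example from `mysql.connector.connect`) and a hypothetical test table with a VARCHAR column named `marker`. The slave side ends its transaction after every poll, because with autocommit off a REPEATABLE READ snapshot would otherwise hide the newly replicated row.

```python
import time
import uuid


def replication_roundtrip(master_conn, slave_conn, table, attempts=10, delay=0.5):
    """Insert a unique marker on the master and poll the slave until it appears."""
    marker = uuid.uuid4().hex
    # The table name cannot be a bind parameter, so it must be trusted.
    cur = master_conn.cursor()
    cur.execute("INSERT INTO {} (marker) VALUES (%s)".format(table), (marker,))
    master_conn.commit()
    slave_cur = slave_conn.cursor()
    for _ in range(attempts):
        slave_cur.execute("SELECT COUNT(*) FROM {} WHERE marker = %s".format(table), (marker,))
        count = slave_cur.fetchone()[0]
        slave_conn.rollback()  # end the read snapshot so the next SELECT sees new rows
        if count:
            return True
        time.sleep(delay)
    return False
```

A `False` result after all attempts means the row did not reach the slave within roughly `attempts * delay` seconds.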

 

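However, when the old master (server2) comes back up, mysqlfailover does not discover it automatically, nor does it restore the original topology.

To add it back to the cluster you have to adjust it by hand, making it a slave of the new master (run on server2):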

> change master to
> master_host='192.168.88.123',
> master_user='rep',
> master_password='redhat',
> master_auto_position=1;
> start slave;

The monitor can now detect server2 again.