High Availability for MySQL with MySQL-mmm

SueK1ng, 2014-09-21 20:12:22. Category: MySQL databases

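Copyright belongs to the author. This is an original work by SueK1ng on the 51CTO blog; contact the author for permission before reposting, or legal liability will be pursued.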

MySQL-mmm is short for Multi-Master Replication Manager for MySQL. It is a set of scripts that manages master-master replication for MySQL databases.

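What MySQL-mmm does:

It makes a MySQL deployment highly available: when the active master fails, it automatically switches to the other master so that service is not interrupted. It can also balance read load across the master and slave servers. In terms of implementation, it manages a set of virtual IPs on top of replication, and it also ships tools for backing up data and resynchronizing nodes.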

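When to use MySQL-mmm:

Read/write splitting is assumed to be done already, either in the front-end application or by a MySQL statement router, so requests are already separated by the time they reach the database. MySQL-mmm mainly monitors the working state of the database nodes and moves resources between them.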

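Processes run by MySQL-mmm:

mysql-mmm is a suite of scripts, packaged in four parts: mysql-mmm-monitor, mysql-mmm-tools, mysql-mmm-agent, and mysql-mmm.

The service started by the monitor package watches the state of the databases defined in the common configuration file. Usually only one of the master servers accepts writes while the other nodes serve reads only. If that master fails, the monitor tells another server to bring up the resources the failed master held and to take over its work, above all the job of accepting writes.

The mysql-mmm-agent package is an agent installed on every database node. When another node, the master in particular, fails, the agent receives instructions from the mysql-mmm-monitor process and adjusts its own node accordingly, for example by promoting it to master and accepting writes, or by changing which master it replicates from.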

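Example: using MySQL-mmm to make a dual-master database setup highly available

Three virtual machines are used: node1 is the master and takes user writes, IP: 172.16.103.2

node2 is the standby master and serves user reads, IP: 172.16.103.3

monitor is the monitoring server, IP: 172.16.103.1

The two masters share these VIPs: 172.16.103.200 (the IP that accepts writes), and 172.16.103.201 and 172.16.103.202 (these two accept reads)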

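Simplified lab topology: the monitor host (172.16.103.1) watches node1 (172.16.103.2) and node2 (172.16.103.3), which replicate from each other.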

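I. Preparation:

Set up /etc/hosts first. This matters because when a client connects, the database server by default reverse-resolves the client's IP address and uses the first hostname that matches it. That name has to agree with the host part of the account that was granted access, otherwise connections or writes fail. Example: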

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.103.1 monitor monitor.cluster.com
172.16.103.2 node1 node1.cluster.com
172.16.103.3 node2 node2.cluster.com
172.16.0.1 server.magelinux.com server

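Note that the short hostname comes before the fully qualified one, for example 172.16.103.1 monitor monitor.cluster.com rather than the customary 172.16.103.1 monitor.cluster.com monitor. The mmm configuration below uses the short names throughout, while accounts are usually granted by IP address, so the mapping between the two must be unambiguous. The database only uses the first name matched for an address: when it resolves 172.16.103.2 it gets node1 and never looks at node1.cluster.com. If the fully qualified name were listed first, resolution would go wrong later and the node could not be reached for writes. If this feels too fiddly, add a skip-name-resolve line to the [mysqld] section of the database configuration file to avoid the issue altogether.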
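
The first-name rule can be checked before touching the database. The sketch below is a hypothetical helper (first_name_for_ip is not part of MySQL or mmm) that prints the first name listed for an address in a hosts-format file. It assumes reverse lookups are answered from /etc/hosts rather than DNS:

```shell
# Print the first name listed for an IP address in a hosts-format file.
# Usage: first_name_for_ip IP [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
# Hypothetical helper for illustration only.
first_name_for_ip() {
    awk -v ip="$1" '
        { sub(/#.*/, "") }                       # strip trailing comments
        $1 == ip && NF >= 2 { print $2; exit }   # first name after the address
    ' "${2:-/etc/hosts}"
}

# Example: first_name_for_ip 172.16.103.2
```

With the hosts file shown above, looking up 172.16.103.2 should give the short name that the mmm configuration expects.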

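II. Install the database on the two virtual machines. The installation itself is not shown (see the earlier posts on this blog); only the database configuration files are given here:

1. node1 (only the entries that need to be added or checked are listed):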

datadir = /mydata/data
auto-increment-increment = 2
auto-increment-offset = 2
relay-log=mysql-relay
relay-log-index=mysql-relay.index
log-bin=mysql-bin
binlog_format=mixed
server-id  = 1

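Because the two servers form a master-master pair and each replicates from the other, both need the relay log and the binary log enabled. To keep auto-generated IDs from colliding, both use the same auto-increment-increment but different auto-increment-offset values, and the server-id values must differ as well.

2. node2: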

datadir = /mydata/data
auto-increment-increment = 2
auto-increment-offset = 1
relay-log=mysql-relay
relay-log-index=mysql-relay.index
log-bin=mysql-bin
server-id = 2
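
To see why the two offsets keep generated IDs apart, here is a small sketch that lists the values each node would hand out. It assumes the usual rule that generated values are offset + N × increment for N = 0, 1, 2, ...; auto_inc_values is a hypothetical helper, not a MySQL tool:

```shell
# Print the first COUNT auto-increment values for a given offset and increment.
# Usage: auto_inc_values OFFSET INCREMENT COUNT
# Hypothetical helper for illustration only.
auto_inc_values() {
    offset=$1 increment=$2 count=$3
    n=0
    out=""
    while [ "$n" -lt "$count" ]; do
        # append the next value, separated by a space
        out="$out${out:+ }$((offset + n * increment))"
        n=$((n + 1))
    done
    printf '%s\n' "$out"
}

echo "node1: $(auto_inc_values 2 2 5)"   # auto-increment-offset = 2
echo "node2: $(auto_inc_values 1 2 5)"   # auto-increment-offset = 1
```

node1 produces only even IDs and node2 only odd ones, so rows inserted on either master do not clash when they are replicated to the other.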

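3. Connect to the node1 database and create the authorized accounts. Since the two nodes replicate from each other, the grants only need to be run on one of them. When the other node is pointed at this master, it just has to start from a binary log position recorded before the accounts were created, so that the grants are replicated too.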

MariaDB [(none)]> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 |      245 |              |                  |
+------------------+----------+--------------+------------------+

MariaDB [(none)]> GRANT REPLICATION CLIENT,REPLICATION SLAVE ON *.* TO repl@'172.16.%.%' IDENTIFIED BY 'repl';
MariaDB [(none)]> GRANT REPLICATION CLIENT,REPLICATION SLAVE ON *.* TO 'mmm_monitor'@'172.16.%.%' IDENTIFIED BY 'mmm_monitor';
MariaDB [(none)]> GRANT PROCESS,SUPER,REPLICATION CLIENT,REPLICATION SLAVE ON *.* TO 'mmm_agent'@'172.16.%.%' IDENTIFIED BY 'mmm_agent';

4. Connect to the node2 database and run CHANGE MASTER TO. Make sure it points at the correct binary log file and position!

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.16.103.2',MASTER_USER='repl',MASTER_PASSWORD='repl',MASTER_LOG_FILE='mysql-bin.000003',MASTER_LOG_POS=245;
MariaDB [(none)]> START SLAVE;
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.103.2
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 960
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 529
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 960
              Relay_Log_Space: 819
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
1 row in set (0.00 sec)

Next, check the binary log file and position on node2 so that node1 can be pointed at node2:

MariaDB [(none)]> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000003 |      880 |              |                  |
+------------------+----------+--------------+------------------+

On node1, run:

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='172.16.103.3',MASTER_USER='repl',MASTER_PASSWORD='repl',MASTER_LOG_FILE='mysql-bin.000003',MASTER_LOG_POS=880;
MariaDB [(none)]> START SLAVE;
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.103.3
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 880
               Relay_Log_File: mysql-relay.000002
                Relay_Log_Pos: 529
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 880
              Relay_Log_Space: 819
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2

The dual-master database setup is now complete.
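
The two fields that matter most in the SHOW SLAVE STATUS output are Slave_IO_Running and Slave_SQL_Running, and both must be Yes on each node. As a rough sketch, the check can be scripted; replication_ok is a hypothetical helper that only reads text in the \G layout shown above:

```shell
# Succeed only if both replication threads report "Yes" in the
# SHOW SLAVE STATUS\G text read from standard input.
# Hypothetical helper for illustration only.
replication_ok() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }   # exit 0 when both are Yes
    '
}

# Example (assumes the mysql client can log in without prompting):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication healthy"
```

Running it on both node1 and node2 covers both directions of the master-master pair.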

III. Install the mysql-mmm packages on all three hosts:

# yum install -y mysql-mmm*

1. Configuration on node1:

[root@node1 ~]# cd /etc/mysql-mmm/
[root@node1 mysql-mmm]# vim mmm_agent.conf

Add this node's information to mmm_agent.conf:

include mmm_common.conf
# The 'this' variable refers to this server.  Proper operation requires 
# that 'this' server (db1 by default), as well as all other servers, have the 
# proper IP addresses set in mmm_common.conf.
this node1

Edit mmm_common.conf on node1 so that it contains:

active_master_role      writer
<host default>
    cluster_interface       eth0
    pid_path                /var/run/mysql-mmm/mmm_agentd.pid
    bin_path                /usr/libexec/mysql-mmm/
    replication_user        repl
    replication_password    repl
    agent_user              mmm_agent
    agent_password          mmm_agent
</host>
<host node1>
    ip      172.16.103.2
    mode    master
    peer    node2
</host>
<host node2>
    ip      172.16.103.3
    mode    master
    peer    node1
</host>
<host monitor>
    ip      172.16.103.1
    mode    slave
</host>
<role writer>
    hosts   node1, node2
    ips     172.16.103.200
    mode    exclusive
</role>
<role reader>
    hosts   node1, node2
    ips     172.16.103.201, 172.16.103.202
    mode    balanced
</role>

mmm_common.conf is identical on every node, so just copy it to the others:

# scp mmm_common.conf monitor:/etc/mysql-mmm/
# scp mmm_common.conf node2:/etc/mysql-mmm/

2. On node2, edit the configuration file as follows:

[root@node2 mysql-mmm]# vim /etc/mysql-mmm/mmm_agent.conf      
include mmm_common.conf
this node2

3. Configuration on the monitoring node

Content of mmm_agent.conf:

[root@monitor mysql-mmm]# vim mmm_agent.conf
include mmm_common.conf
this monitor

Content of mmm_mon.conf:

include mmm_common.conf
<monitor>
    ip                  127.0.0.1
    pid_path            /var/run/mysql-mmm/mmm_mond.pid
    bin_path            /usr/libexec/mysql-mmm
    status_path         /var/lib/mysql-mmm/mmm_mond.status
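    ping_ips            172.16.103.2,172.16.103.3   # IPs the monitor pings to check its own connectivity to the outside; do not list the monitor's own IP here
    auto_set_online     10    # seconds to wait before a recovered database is put back online; the shipped value of 60 is rather long, so it can be shortened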
    # The kill_host_bin does not exist by default, though the monitor will
    # throw a warning about it missing.  See the section 5.10 "Kill Host
    # Functionality" in the PDF documentation.
    #
    # kill_host_bin     /usr/libexec/mysql-mmm/monitor/kill_host
    #
</monitor>
<host default>
    monitor_user        mmm_monitor   # the account and password granted earlier
    monitor_password    mmm_monitor
</host>
debug 0

That completes the configuration of the two database servers and of mmm on all three nodes. The databases and the mmm services can now be started.

# service mysqld start

On node1 and node2:

# service mysql-mmm-agent start

On monitor:

# service mysql-mmm-monitor start

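Once everything is running, the monitor can show the state of each database with the mmm_control command. The warning about the monitor host is expected here: mmm_common.conf lists it as a slave host, but the agent service was only started on node1 and node2.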

[root@monitor mysql-mmm]# mmm_control show
# Warning: agent on host monitor is not reachable
  monitor(172.16.103.1) slave/HARD_OFFLINE. Roles: 
  node1(172.16.103.2) master/ONLINE. Roles: reader(172.16.103.201), writer(172.16.103.200)
  node2(172.16.103.3) master/ONLINE. Roles: reader(172.16.103.202)
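
For scripting, the host that currently holds the writer role can be picked out of this output. The sketch below assumes the output format shown above; writer_host is a hypothetical helper, not an mmm_control subcommand:

```shell
# Print the name of the host that currently holds the writer role,
# given `mmm_control show` output on standard input.
# Hypothetical helper for illustration only.
writer_host() {
    awk '/Roles:.*writer\(/ {
        sub(/\(.*/, "", $1)   # "node1(172.16.103.2)" -> "node1"
        print $1
        exit
    }'
}

# Example, run on the monitor host:
#   mmm_control show | writer_host
```

With the status shown above this prints node1, and after a failover it would print node2.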

Check the IP addresses configured on node1:

[root@node1 mysql-mmm]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:e1:37:51 brd ff:ff:ff:ff:ff:ff
    inet 172.16.103.2/16 brd 172.16.255.255 scope global eth0
    inet 172.16.103.201/32 scope global eth0
    inet 172.16.103.200/32 scope global eth0
    inet6 fe80::20c:29ff:fee1:3751/64 scope link 
       valid_lft forever preferred_lft forever

The IPs brought up on node2 are:

[root@node2 mysql-mmm]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:cf:64:8f brd ff:ff:ff:ff:ff:ff
    inet 172.16.103.3/16 brd 172.16.255.255 scope global eth0
    inet 172.16.103.202/32 scope global eth0
    inet6 fe80::20c:29ff:fecf:648f/64 scope link 
       valid_lft forever preferred_lft forever

If a master server fails, all the VIPs configured on it move to the other master:

[root@node2 mysql-mmm]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:cf:64:8f brd ff:ff:ff:ff:ff:ff
    inet 172.16.103.3/16 brd 172.16.255.255 scope global eth0
    inet 172.16.103.202/32 scope global eth0
    inet 172.16.103.201/32 scope global eth0
    inet 172.16.103.200/32 scope global eth0
    inet6 fe80::20c:29ff:fecf:648f/64 scope link 
       valid_lft forever preferred_lft forever

As you can see, VIPs 200, 201, and 202 are all configured on node2.
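
To confirm a failover without reading the whole listing, the VIPs can be filtered out of the ip addr show output. This sketch assumes, as in the listings above, that the VIPs appear with a /32 prefix while the host's own address does not; list_vips is a hypothetical helper:

```shell
# List the /32 IPv4 addresses found in `ip addr show` output on stdin.
# In the listings above, the /32 entries are the VIPs managed by mmm.
# Hypothetical helper for illustration only.
list_vips() {
    awk '$1 == "inet" && $2 ~ /\/32$/ {
        sub(/\/32$/, "", $2)   # drop the prefix length
        print $2
    }'
}

# Example, run on a database node:
#   ip addr show eth0 | list_vips
```

On node2 after the failover this lists all three addresses: 172.16.103.202, 172.16.103.201, and 172.16.103.200.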