今天试验了一把MySQL MMM.(MySQL Master-Master Replication Manager)一个主主复制的管理工具,它提供了主机监控,MySQL服务监控,复制线程等多个方位的监控,使用VIP自动漂移,可以对应用程序实现透明。

    下面是试验步骤:

    OS: Ubuntu 11.04 X64
    DB1:192.168.1.186 (MASTER1)
    DB2:192.168.1.187 (MASTER2)
    DB3:192.168.1.184 (SLAVE)
    Monitor: 192.168.1.175
    虚拟ip:192.168.1.200(writer),192.168.1.201/202(reader)

    MySQL V5.1.50(源码安装)

    MMM V2.2.1 下载地址:http://mysql-mmm.org/downloads

    安装部分参考地址:参考地址:http://mysql-mmm.org/mmm2:guide

    Note:在建立复制环境时需要注意的几点:

    1、server_id 要设置不同

    2、Auto_increment_offset=1         Auto_increment_increment=2 (master1 上设置)

       Auto_increment_offset=2         Auto_increment_increment=2  (master2上设置)

       可以防止在某master假死并恢复的瞬间,两个“master” 都出现数据的更新,导致复制失败

    3、master上都要启用 log_slave_updates,log_bin

    MMM 安装:

    所需用户:

monitor user used by the mmm monitor to check the health of the MySQL servers REPLICATION CLIENT
agent user used by the mmm agent to change read-only mode, replication master, etc. SUPER, REPLICATION CLIENT, PROCESS
relication user used for replication REPLICATION SLAVE
 

     perl 依赖包安装:

     (DB需要安装部分:)

     perl -MCPAN -e shell 
     cpan> install Algorithm::Diff 
     cpan> install DBI 
     cpan>install Log::Dispatch 
     cpan> install Log::Log4perl 
     cpan> install Mail::Send 
     cpan> install Net::ARP 
     cpan> install Proc::Daemon 
     cpan> install Time::HiRes 
     cpan>install DBD::mysql 
     cpan>install File::stat 
     cpan>install File:basename

    monitoring  安装部分:

     perl -MCPAN -e shell 
     cpan> install Algorithm::Diff 
     cpan> install Class::Singleton 
     cpan> install Log::Dispatch 
     cpan> install Log::Log4perl 
     cpan> install Mail::Send 
     cpan> install Proc::Daemon 
     cpan> install Thread::Queue 
     cpan> install Time::HiRes 
     cpan> install DBI
     cpan>install DBD::mysql

    下载的二进制包,直接make  install (在所有机器上)

     MMM主要的功能通过下面三个脚本来实现:
mmm_mond
监控进程,负责所有的监控工作,决定和处理所有节点角色活动

     mmm_agentd
运行在每个mysql服务器上的代理进程,完成监控的探针工作和执行简单的远端服务设置

    mmm_control
 一个简单的脚本,提供管理mmm_mond进程的命令

    配置文件部分:

    active_master_role      writer
<host default>
        cluster_interface                      eth0
        pid_path                                /var/run/mmm_agentd.pid
        bin_path                                /usr/lib/mysql-mmm/
    replication_user                           abc
    replication_password                       MMM
        agent_user                             mmm_agent
        agent_password                         MMM
        mysql_port                             3308
</host>
<host db1>
        ip                                      192.168.1.186
        mode                                    master
        peer                                    db2
</host>
<host db2>
        ip                                      192.168.1.187
        mode                                    master
        peer                                    db1
</host>
<host db3>
        ip                                      192.168.1.184
        mode                                    slave
</host>
<role writer>
        hosts                                   db2,db1
        ips                                     192.168.1.200
        mode                                    exclusive
</role>
<role reader>
        hosts                                   db2,db3
        ips                                     192.168.1.201,192.168.1.202
        mode                                    balanced
</role>

    monitoring 部分:

    active_master_role      writer
<host default>
        cluster_interface                       eth0
         pid_path                               /var/run/mmm_agentd.pid
        bin_path                                /usr/lib/mysql-mmm/
     replication_user                           abc
    replication_password                        MMM
        agent_user                              mmm_agent
        agent_password                          MMM
        mysql_port                              3308
</host>
<host db1>
        ip                                      192.168.1.186
        mode                                    master
        peer                                    db2
</host>
<host db2>
        ip                                      192.168.1.187
        mode                                    master
        peer                                    db1
</host>
<host db3>
        ip                                      192.168.1.184
        mode                                    slave
</host>
<role writer>
        hosts                                   db2,db1
        ips                                     192.168.1.200
        mode                                    exclusive
</role>
<role reader>
        hosts                                   db2,db3
        ips                                     192.168.1.201,192.168.1.202
        mode                                    balanced
</role>

                       安装好后: mmm_control  show  查看状态: 
    db1(192.168.1.186) master/ONLINE. Roles: writer(192.168.1.200)

    db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201)

    db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)

    测试一、db2 宕机,恢复情况:

    日志分析:

    2012/08/23 17:02:59  WARN Check 'rep_backlog' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:02:59  WARN Check 'rep_threads' on 'db2' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:03:09 ERROR Check 'mysql' on 'db2' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.1.187:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:03:09 ERROR Check 'rep_threads' on 'db1' has failed for 10 seconds! Message: ERROR: Replication is broken
    2012/08/23 17:03:10 FATAL State of host 'db2' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
    2012/08/23 17:03:10  INFO Removing all roles from host 'db2':
    2012/08/23 17:03:10  INFO     Removed role 'reader(192.168.1.201)' from host 'db2'
    2012/08/23 17:03:10  INFO Orphaned role 'reader(192.168.1.201)' has been assigned to 'db3'

    其中我们可以看到,monitor 通过10次 mysql检查,判定DB2已经从 online状态到 HARD_OFFLINE  主机存在,但是数据库已“死”,DB1的线程也已经停止,从 DB2上移除 reader角色到 DB3.

    当把DB2启动起来之后,它的状态会变为:changed from HARD_OFFLINE to AWAITING_RECOVERY,

    Check 'rep_threads' on 'db1' is ok!
    mmm_control set_online db2 手动变为online状态,它会重新获得reader角色:
    Admin changed state of 'db2' from AWAITING_RECOVERY to ONLINE
    INFO Moving role 'reader(192.168.1.202)' from host 'db3' to host 'db2'

    测试二:

    DB1宕机, DB2应该变为writer,db3 slave 把主定向为db2.

    日志分析:

    2012/08/23 17:14:51  WARN Check 'rep_threads' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:14:51  WARN Check 'rep_backlog' on 'db1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:15:00 ERROR Check 'mysql' on 'db1' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.1.186:3308, user = mmm_monitor)! Lost connection to MySQL server at 'reading initial communication packet', system error: 111
    2012/08/23 17:15:01 ERROR Check 'rep_threads' on 'db2' has failed for 10 seconds! Message: ERROR: Replication is broken
    2012/08/23 17:15:01 ERROR Check 'rep_threads' on 'db3' has failed for 10 seconds! Message: ERROR: Replication is broken
    2012/08/23 17:15:02 FATAL State of host 'db1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
    2012/08/23 17:15:02  INFO Removing all roles from host 'db1':
    2012/08/23 17:15:02  INFO     Removed role 'writer(192.168.1.200)' from host 'db1'
    2012/08/23 17:15:02  INFO Orphaned role 'writer(192.168.1.200)' has been assigned to 'db2'
    2012/08/23 17:15:06 FATAL State of host 'db3' changed from ONLINE to REPLICATION_FAIL
    2012/08/23 17:15:06  INFO Removing all roles from host 'db3':
    2012/08/23 17:15:06  INFO     Removed role 'reader(192.168.1.201)' from host 'db3'
    2012/08/23 17:15:06  INFO Check 'rep_threads' on 'db3' is ok!
    2012/08/23 17:15:06  INFO Orphaned role 'reader(192.168.1.201)' has been assigned to 'db2'
    2012/08/23 17:15:09 FATAL State of host 'db3' changed from REPLICATION_FAIL to ONLINE
    2012/08/23 17:15:09  INFO Moving role 'reader(192.168.1.202)' from host 'db2' to host 'db3'

    Monitor 检测到 DB2和DB3 与DB1 之间的复制已经停止,10次检查后,确定DB1 已经 “死掉”,将它的writer ip 转移至 DB2,db3 复制线程停止后,将其拥有的 reader ip摘除,当 db3 重新与 DB2建立主从关系后,又获得 reader 角色;

    等DB1恢复后,  mmm_control show

    db1(192.168.1.186) master/ONLINE. Roles: 
    db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201), writer(192.168.1.200)
    db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)

   手动将DB1获得writer角色:

   mmm_control move_role writer db1
   db1(192.168.1.186) master/ONLINE. Roles: writer(192.168.1.200)
   db2(192.168.1.187) master/ONLINE. Roles: reader(192.168.1.201)
   db3(192.168.1.184) slave/ONLINE. Roles: reader(192.168.1.202)

   在测试的时候,间隔要超过60s,原因是:MMM防抖动检测:

   当主机从Online 状态转换为HARD_OFFLINE/REPLICATION_FAIL/REPLICATION_DELAY,又从这些状态转换为online(在小于60s的情况下)

    MMM 会阻止这种情况的发生,我们可以通过配置文件配置它,如果一个主机在flap_duration时间内宕掉了flap_count次,就认为主机处理flap状态,这个主机就不会自动被设置为ONLINE状 态,它将一直处于AWAITING_RECOVERY状态除非手动设置online(mmm_control set online host)。如果auto_set_online>0,处于flapping的主机在flap_duration时间后将制动被设置为ONLINE 状态。

   英文参考地址:http://mysql-mmm.org/mysql-mmm.html

   中文参考地址:http://linuxguest.blog.51cto.com/195664/578311