MySQL有各种HA方案,其中很多是基于复制的,复制是MySQL原生的数据冗余技术,很容易使用,但要做到无数据丢失却不容易。

异步的复制

默认的复制是异步的,即master commit时不等更新被slave接受就向客户端回话应答成功。slave会对master有一个更新延迟,当master宕机,slave被提升为新的master时,必然会发生数据丢失。

http://dev.mysql.com/doc/refman/5.6/en/replication-semisync.html ------------------------------------------------------------
MySQL replication by default is asynchronous. The master writes events to its binary log but does not know whether or when a slave has retrieved and processed them. With asynchronous replication, if the master crashes, transactions that it has committed might not have been transmitted to any slave. Consequently, failover from master to slave in this case may result in failover to a server that is missing transactions relative to the master.
------------------------------------------------------------
 

半同步复制

半同步复制时,master会等本地commit成功并且至少有一个半同步复制slave收到了binglog且写入relay log刷新到磁盘。
http://dev.mysql.com/doc/refman/5.6/en/replication-semisync.html ------------------------------------------------------------
Semisynchronous replication can be used as an alternative to asynchronous replication:


A slave indicates whether it is semisynchronous-capable when it connects to the master.


If semisynchronous replication is enabled on the master side and there is at least one semisynchronous slave, a thread that performs a transaction commit on the master blocks after the commit is done and waits until at least one semisynchronous slave acknowledges that it has received all events for the transaction, or until a timeout occurs.


The slave acknowledges receipt of a transaction's events only after the events have been written to its relay log and flushed to disk.


If a timeout occurs without any slave having acknowledged the transaction, the master reverts to asynchronous replication. When at least one semisynchronous slave catches up, the master returns to semisynchronous replication.


Semisynchronous replication must be enabled on both the master and slave sides. If semisynchronous replication is disabled on the master, or enabled on the master but on no slaves, the master uses asynchronous replication.

------------------------------------------------------------
半同步复制比起异步复制,可靠性得到提高,但仍然不能保证无数据丢失。
第一,从上面标红的文字可以看出,一旦salve有什么问题,MySQL随时准备逃离半同步复制回到异步复制。所以如果想做自动faiover,必须考虑一下风险。


第二,即使master宕机时处于半同步复制状态,仍然不能保证不丢失数据。原因如下:
在master本地commit完成,并且等待salve应答已接受到binlog的间隙,其他client可以看到这个提交。如果此时master宕机,slave还没有收到这次commit的binlog,slave被提升为新的master后,那么刚才看到这个提交的client将会发现那个提交不见了,也即数据丢失了。

http://dev.mysql.com/doc/refman/5.6/en/replication-semisync.html
------------------------------------------------------------
Compared to asynchronous replication, semisynchronous replication provides improved data integrity. When a commit returns successfully, it is known that the data exists in at least two places (on the master and at least one slave). If the master commits but a crash occurs while the master is waiting for acknowledgment from a slave, it is possible that the transaction may not have reached any slave.
------------------------------------------------------------
 

lossless半同步复制

为了解决上面故障切换时的数据丢失问题,有些基于复制的HA解决方案会在切换前试图找回那部分延迟的binlog,但这种方式实在有些麻烦。假如master恰恰是由于放置binglog的磁盘损坏才crash的,那么丢失的数据是无法找回的。
MySQL 5.7对半同步复制进行了改进,实现了loss-less的复制。但是MySQL 5.7目前还是开发版,生产环境还不能上。
改进点也很容易理解,就是先等待slave返回已接受到binglog并刷盘的应答,然后再提交Commit命令到存储层,说白了就是对调了一下两个操作的次序。


http://dev.mysql.com/doc/refman/5.7/en/replication-semisync.html ------------------------------------------------------------
The rpl_semi_sync_master_wait_point system variable controls the point at which a semisynchronous replication master waits for slave acknowledgment of transaction receipt before returning a status to the client that committed the transaction. These values are permitted:


AFTER_SYNC (the default): The master writes each transaction to its binary log and the slave, and syncs the binary log to disk. The master waits for slave acknowledgment of transaction receipt after the sync. Upon receiving acknowledgment, the master commits the transaction to the storage engine and returns a result to the client, which then can proceed.

------------------------------------------------------------
但是这仍然没有解决上面提到的第一个问题,即MySQL随时可能会从半同步复制回到异步复制状态,在这期间,数据的安全性是得不到保障的。而且,如果想要实现安全的无数据丢失的自动failover,必须要判断出宕机时slave和master之间数据是否是同步的。
在这方面,PostgreSQL的同步复制有所不同,PostgreSQL至少收到1个从机的应答事务才能成功,否则一直等待,所以同步复制下任何时候PostgreSQL进行failover都是不丢数据的。


由于上面的原因,基于复制的HA解决方案的要保证数据0丢失比较困难。


还有一个问题需要MySQL能够保证,当把事务提交到存储引擎时,master和slave的存储引擎必须要表现出相同的行为。即不能一个成功,另一个失败,或者2个的数据不一致。
比如,参考DTCC的一个PPT中的例子, row模式的replace语句可能出现自增长问题
 

  1. create table test ( a int(11) default null, id int(11) not null auto_increment, b int(11) default null, primary key (id), unique key d (d) );
  2. insert into test values(5,27,4);
  3. replace into test(a,id,b) values(6,35,4);
  4. commit;

show create table时:
主库: auto_increment=36
备库: auto_increment=28