在前面的两篇文章中,介绍了mysql组复制的特点及配置过程,本文演示mysql单组复制下的模拟故障测试。


一、组复制所有成员服务器宕机重启后的恢复

连接所有的mysql实例查询当前的组复制成员情况,状态都是OFFLINE,这种情况下如何恢复组复制?

mysql> select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+-----------+-------------+-------------+--------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |
+---------------------------+-----------+-------------+-------------+--------------+
1 row in set (0.00 sec)

在第一台mysql实例上将group_replication_bootstrap_group设置为ON,然后开启组复制,完成后再设置group_replication_bootstrap_group参数为OFF

mysql> set global group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> start group_replication;
Query OK, 0 rows affected (2.19 sec)

mysql> set global group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 8b643791-6d30-11e7-986a-000c29d53b31 | vm1         |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
1 row in set (0.00 sec)


Mysql组复制故障恢复测试_group


第二台mysql实例上开启组复制

mysql> select * from performance_schema.replication_group_members;
+---------------------------+-----------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+-----------+-------------+-------------+--------------+
| group_replication_applier |           |             |        NULL | OFFLINE      |
+---------------------------+-----------+-------------+-------------+--------------+
1 row in set (0.00 sec)

mysql> change master to master_user='repl',master_password='123456' for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.03 sec)

mysql> start group_replication;
Query OK, 0 rows affected (6.76 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 35c331eb-6d3d-11e7-91e3-000c29e07281 | vm2         |        3306 | ONLINE       |
| group_replication_applier | 941bce76-6d40-11e7-b2fe-000c2909332c | vm3         |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)


Mysql组复制故障恢复测试_mysql_02


在第三台mysql实例开启组复制

mysql> change master to master_user='repl',master_password='123456' for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.03 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.39 sec)

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 35c331eb-6d3d-11e7-91e3-000c29e07281 | vm2         |        3306 | ONLINE       |
| group_replication_applier | 8b643791-6d30-11e7-986a-000c29d53b31 | vm1         |        3306 | ONLINE       |
| group_replication_applier | 941bce76-6d40-11e7-b2fe-000c2909332c | vm3         |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)


Mysql组复制故障恢复测试_failover_03


一般来说,第一台启动group replication的为组复制中的master角色,我们可以使用如下的SQL进行确认

SELECT
	*
FROM
	PERFORMANCE_SCHEMA .replication_group_members
WHERE
	member_id = (
		SELECT
			variable_value
		FROM
			PERFORMANCE_SCHEMA .global_status
		WHERE
			VARIABLE_NAME = 'group_replication_primary_member'
	);

二、如果master宕机,组复制会如何?下面我们来测试一下

首先确认当前的master主机是VM2


Mysql组复制故障恢复测试_mysql_04


[root@vm2 ~]# service mysqld stop
Shutting down MySQL............. SUCCESS!
[root@vm2 ~]# tail -f /mydata/vm2.err 
2017-10-27T15:42:19.×××27Z 0 [Note] Giving 7 client threads a chance to die gracefully
2017-10-27T15:42:19.999375Z 0 [Note] Shutting down slave threads
2017-10-27T15:42:22.000341Z 0 [Note] Forcefully disconnecting 7 remaining clients
2017-10-27T15:42:22.000556Z 0 [Note] Plugin group_replication reported: 'Going to wait for view modification'
2017-10-27T15:42:22.002961Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-10-27T15:42:25.462993Z 0 [Note] Plugin group_replication reported: 'state 4345 action xa_terminate'
2017-10-27T15:42:25.463472Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-10-27T15:42:25.463514Z 0 [Note] Plugin group_replication reported: 'state 4272 action xa_exit'
2017-10-27T15:42:25.463650Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2017-10-27T15:42:25.463675Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-10-27T15:42:30.555581Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2017-10-27T15:42:30.555639Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2017-10-27T15:42:30.556496Z 19 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2017-10-27T15:42:30.564277Z 16 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2017-10-27T15:42:30.565453Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2017-10-27T15:42:30.613823Z 0 [Note] Binlog end
2017-10-27T15:42:30.619483Z 0 [Note] Shutting down plugin 'group_replication'
2017-10-27T15:42:30.619582Z 0 [Note] Plugin group_replication reported: 'All Group Replication server observers have been successfully unregistered'
2017-10-27T15:42:30.620753Z 0 [Note] Shutting down plugin 'ngram'
2017-10-27T15:42:30.620801Z 0 [Note] Shutting down plugin 'BLACKHOLE'
2017-10-27T15:42:30.620820Z 0 [Note] Shutting down plugin 'ARCHIVE'
2017-10-27T15:42:30.620832Z 0 [Note] Shutting down plugin 'partition'
2017-10-27T15:42:30.620841Z 0 [Note] Shutting down plugin 'INNODB_SYS_VIRTUAL'
2017-10-27T15:42:30.620853Z 0 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2017-10-27T15:42:30.620937Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2017-10-27T15:42:30.620946Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2017-10-27T15:42:30.620951Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2017-10-27T15:42:30.620957Z 0 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2017-10-27T15:42:30.620961Z 0 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2017-10-27T15:42:30.620966Z 0 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2017-10-27T15:42:30.620971Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2017-10-27T15:42:30.620976Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2017-10-27T15:42:30.620980Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2017-10-27T15:42:30.620985Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2017-10-27T15:42:30.620990Z 0 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2017-10-27T15:42:30.620994Z 0 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2017-10-27T15:42:30.620999Z 0 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2017-10-27T15:42:30.621003Z 0 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2017-10-27T15:42:30.621008Z 0 [Note] Shutting down plugin 'INNODB_METRICS'
2017-10-27T15:42:30.621013Z 0 [Note] Shutting down plugin 'INNODB_TEMP_TABLE_INFO'
2017-10-27T15:42:30.621017Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'
2017-10-27T15:42:30.621022Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU'
2017-10-27T15:42:30.621027Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE'
2017-10-27T15:42:30.621032Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET'
2017-10-27T15:42:30.621042Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX'
2017-10-27T15:42:30.621051Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET'
2017-10-27T15:42:30.621056Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM'
2017-10-27T15:42:30.621061Z 0 [Note] Shutting down plugin 'INNODB_CMP_RESET'
2017-10-27T15:42:30.621066Z 0 [Note] Shutting down plugin 'INNODB_CMP'
2017-10-27T15:42:30.621071Z 0 [Note] Shutting down plugin 'INNODB_LOCK_WAITS'
2017-10-27T15:42:30.621076Z 0 [Note] Shutting down plugin 'INNODB_LOCKS'
2017-10-27T15:42:30.621081Z 0 [Note] Shutting down plugin 'INNODB_TRX'
2017-10-27T15:42:30.621085Z 0 [Note] Shutting down plugin 'InnoDB'
2017-10-27T15:42:30.621309Z 0 [Note] InnoDB: FTS optimize thread exiting.
2017-10-27T15:42:30.622401Z 0 [Note] InnoDB: Starting shutdown...
2017-10-27T15:42:30.723215Z 0 [Note] InnoDB: Dumping buffer pool(s) to /mydata/ib_buffer_pool
2017-10-27T15:42:30.723542Z 0 [Note] InnoDB: Buffer pool(s) dump completed at 171027 11:42:30
2017-10-27T15:42:32.472064Z 0 [Note] InnoDB: Shutdown completed; log sequence number 2767811
2017-10-27T15:42:32.474350Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2017-10-27T15:42:32.474373Z 0 [Note] Shutting down plugin 'MEMORY'
2017-10-27T15:42:32.474385Z 0 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2017-10-27T15:42:32.474419Z 0 [Note] Shutting down plugin 'MRG_MYISAM'
2017-10-27T15:42:32.474431Z 0 [Note] Shutting down plugin 'MyISAM'
2017-10-27T15:42:32.474459Z 0 [Note] Shutting down plugin 'CSV'
2017-10-27T15:42:32.474471Z 0 [Note] Shutting down plugin 'sha256_password'
2017-10-27T15:42:32.474481Z 0 [Note] Shutting down plugin 'mysql_native_password'
2017-10-27T15:42:32.475429Z 0 [Note] Shutting down plugin 'binlog'
2017-10-27T15:42:32.477356Z 0 [Note] /usr/local/mysql/bin/mysqld: Shutdown complete


vm1主机的日志

[root@vm1 ~]# tail -f /mydata/vm1.err 
2017-10-27T15:42:22.634326Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-10-27T15:42:23.095198Z 0 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
2017-10-27T15:42:23.095627Z 0 [Note] Plugin group_replication reported: 'Unsetting super_read_only.'
2017-10-27T15:42:23.095987Z 5 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'

vm3主机的日志

[root@vm3 ~]# tail -f /mydata/vm3.err
2017-10-27T15:42:22.235922Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-10-27T15:42:22.696811Z 0 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
2017-10-27T15:42:22.697087Z 0 [Note] Plugin group_replication reported: 'Setting super_read_only.'
2017-10-27T15:42:22.697303Z 5 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'

可以看到这个时候VM1自动提升为主了,且写入的数据可以继续同步到vm3,vm3的角色为slave,不能写入数据

mysql> use yang
Database changed
mysql> insert into t1 values (2,'two');
Query OK, 1 row affected (0.03 sec)

mysql> select * from t1;
+----+------+
| id | name |
+----+------+
|  1 | one  |
|  2 | two  |
+----+------+
2 rows in set (0.00 sec)


Mysql组复制故障恢复测试_failover_05


mysql> select * from yang.t1;
+----+------+
| id | name |
+----+------+
|  1 | one  |
|  2 | two  |
+----+------+
2 rows in set (0.00 sec)

mysql> insert into yang.t1 values (3,'three');
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement


Mysql组复制故障恢复测试_group_06


如果此时vm1也跪了会如何呢?

[root@vm1 ~]# service mysqld stop
Shutting down MySQL............. SUCCESS!

观察vm3的日志

[root@vm3 ~]# tail -f /mydata/vm3.err 
2017-10-27T15:42:22.235922Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-10-27T15:42:22.696811Z 0 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
2017-10-27T15:42:22.697087Z 0 [Note] Plugin group_replication reported: 'Setting super_read_only.'
2017-10-27T15:42:22.697303Z 5 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'
2017-10-27T15:46:54.618998Z 5 [Note] Plugin group_replication reported: 'Primary had applied all relay logs, disabled conflict detection'
2017-10-27T15:49:57.546061Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-10-27T15:49:58.612942Z 0 [Note] Plugin group_replication reported: 'Unsetting super_read_only.'
2017-10-27T15:49:58.613092Z 5 [Note] Plugin group_replication reported: 'A new primary was elected, enabled conflict detection until the new primary applies all relay logs'

VM3写入测试

mysql> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 941bce76-6d40-11e7-b2fe-000c2909332c | vm3         |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
1 row in set (0.00 sec)

mysql> insert into yang.t1 values (3,'three');
Query OK, 1 row affected (0.02 sec)

mysql> select * from yang.t1;
+----+-------+
| id | name  |
+----+-------+
|  1 | one   |
|  2 | two   |
|  3 | three |
+----+-------+
3 rows in set (0.00 sec)


Mysql组复制故障恢复测试_failover_07


三、将VM1和VM2恢复后会如何?我们来测试一下

Mysql组复制故障恢复测试_group_08

Mysql组复制故障恢复测试_mysql_09

在最早宕机VM2查询数据是否正常

Mysql组复制故障恢复测试_failover_10


四、总结

通过上述简单的测试,发现mysql的组复制还是挺好用的,3台mysql实例组成的组复制结构下,只要有1台主机存活,整个mysql服务就可用,在故障主机恢复之后,组复制会自动同步数据,恢复组复制状态。

但在实际的使用过程中,需要考虑到客户端或者数据库中间件的连接问题。因为在单组复制模式下,slave是自读的,所有的写入请求都通过master,如果master宕机,程序或者客户端、数据库中间件读写分离的连接如何平滑处理?