mysql 主从1236 mysql主从1032

转载

mob6454cc762e37 2023-10-02 16:59:36

文章标签 mysql 主从1236 数据 mysql SQL 文章分类 MySQL 数据库

1问题的引出

当我们在处理 MySQL 复制故障时，经常遇到1032、1062错误，多数情况下属于人为连到从库执行了DML操作导致，少数情况是由于主库挂了，数据没有正常同步，角色切换后，数据没有补齐导致。

mysql 主从1236 mysql主从1032_mysql 主从1236

处理方法如下几种：

1. 1062错误时，在从库把重复的数据删除即可,处理起来最方便;
2. 1032错误时，根据“show slave status”的结果去主库分析
binlog，找到对应的数据，把数据插入到从库即可。少量的数据插入或更新操作起来也方便；批量数据操作时，人工确定哪一行有错误的话就比较累了，干脆就重新搭建主从环境。
3. 跳过当前事务，使用 pt 工具去校验数据。这个操作起来有限制，不能在业务高峰时操作，因为是全量比较。可以全库，也可以全表。

2四个测试

针对上面的痛点，我们先做些测试。这边不测试事务的跳过，因为跳过不确定会有什么新坑在等着我们。

一、环境介绍：

5.7.13-log MySQL Community Server
Master: 192.168.1.5
Slave: 192.168.1.4

二、数据准备：

CREATE TABLE `test_repl` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(10) DEFAULT NULL,
  `b` varchar(10) DEFAULT NULL,
  `c` varchar(20) DEFAULT NULL,
  `d` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uiq_index` (`a`,`b`)
);

insert into test_repl (a, b, c, d) values('aa', 'bb', 'cc', now());
insert into test_repl (a, b, c, d) values('aaa', 'bbb', 'ccc', now());
insert into test_repl (a, b, c, d) values('aaaa', 'bbbb', 'cccc', now());
insert into test_repl (a, b, d) values('a1', 'b1', now());
insert into test_repl (a, b, d) values('a2', 'b2', now());
insert into test_repl (a, b, d) values('a3', 'b3', now());

mysql> select * from test_repl;
+----+------+------+------+---------------------+
| id | a    | b    | c    | d                   |
+----+------+------+------+---------------------+
|  1 | aa   | bb   | cc   | 2017-08-15 14:42:19 |
|  2 | aaa  | bbb  | ccc  | 2017-08-15 14:42:19 |
|  3 | aaaa | bbbb | cccc | 2017-08-15 14:42:19 |
|  4 | a1   | b1   | NULL | 2017-08-15 14:42:19 |
|  5 | a2   | b2   | NULL | 2017-08-15 14:42:19 |
|  6 | a3   | b3   | NULL | 2017-08-15 14:42:20 |
+----+------+------+------+---------------------+
6 rows in set (0.00 sec)

三、测试的故障类型

1. 测试1062,主键数据重复；
2. 测试1062,复合唯一索引数据重复；
3. 测试1032,更新时记录不存在；
4. 测试1032,删除时记录不存在。

注：在每次测试前，使用之前准备好的数据。

测试一：测试1062,主键数据重复

id=7

insert into test_repl values(7, 'a4', 'b4', 'c4', now());

在主库也插入上面语句

insert into test_repl values(7, 'a4', 'b4', 'c4', now());

同步报错，部分日志如下：

Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 1062
                    Last_Error: Could not execute Write_rows event on table test.test_repl; Duplicate entry '7' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000057, end_log_pos 256531451

修复思路，直接去从库把重复记录删除, 删除前记得保存下，保不定哪天查出来操作有误呢！

delete from test.test_repl where id = 7;
 start slave;

查看结果

Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

测试二、测试1062,复合唯一索引数据重复

id=7

insert into test_repl values(7, 'a4', 'b4', 'c4', now());

id=8,a='a4', b='b4', c='c4', r_time=now()

的记录

insert into test_repl values(8, 'a4', 'b4', 'c4', now());

同步报错，部分内容如下：

Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 1062
                    Last_Error: Could not execute Write_rows event on table test.test_repl; Duplicate entry 'a4-b4' for key 'uiq_index', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000057, end_log_pos 256534236

修复思路

uiq_index

(a,b)
获取错误中的显示的重复记录，’a4-b4’
使其顺序对应，这边的结果为 a=’a4’, b=’b4’，这边作为条件，去从库删除数据

delete from test.test_repl where a='a4' and b='b4';
start slave;

查看结果，部分内容

Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

测试三：测试1032,更新时记录不存在

id=4

的记录

delete from test_repl where id = 4;

id=4

update test_repl set a='new_a1111' where id = 4;

同步报错，部分内容如下：

Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 1032
                    Last_Error: Could not execute Update_rows event on table test.test_repl; Can't find record in 'test_repl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000057, end_log_pos 256538982
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 256538707

这个时候我们无法知道是哪条记录出了问题，得去分析下主库的 binlog，也可以去分析 relary log，但是，无法知道这个事务在 relay log中的结束位置，只知道开始位置，所以不建议去分析 relary log。

这个事务的位置可以通过上面信息去主库的 binlog 文件里面查询，start-pos = Exec_Master_Log_Pos, stop-pos = Last_Error 里面的 end_log_pos。拿着这两个信息，去主库分析 binlog。

4. 分析主库 binlog

[mysql@mq5 mysqlData]$ mysqlbinlog  -v --base64-output=decode-row binlog.000057 --start-position=256538707 stop-position=256538982
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
 DELIMITER /*!*/;
 # at 256538707
 #170815 15:13:10 server id 5  end_log_pos 256538772 CRC32 0x376dfc56     GTID    last_committed=3052    sequence_number=3053 SET @@SESSION.GTID_NEXT= 'df101ae3-6f23-11e6-ab0d-9c5c8ed1c335:2264390'/*!*/;
 # at 256538772
 #170815 15:13:10 server id 5  end_log_pos 256538844 CRC32 0x8eae63a5     Query    thread_id=784    exec_time=0    error_code=0 SET TIMESTAMP=1502781190/*!*/;
 SET @@session.pseudo_thread_id=784/*!*/;
 SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
 SET @@session.sql_mode=1436549152/*!*/;
 SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
 /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
 SET @@session.lc_time_names=0/*!*/;
 SET @@session.collation_database=DEFAULT/*!*/;
 BEGIN
 /*!*/;
 # at 256538844
 #170815 15:13:10 server id 5  end_log_pos 256538907 CRC32 0xaf0ebe48     Table_map: `test`.`test_repl` mapped to number 575
 # at 256538907
 #170815 15:13:10 server id 5  end_log_pos 256538982 CRC32 0x6c8220fd     Update_rows: table id 575 flags: STMT_END_F
 ### UPDATE `test`.`test_repl`
 ### WHERE
 ###   @1=4
 ###   @2='a1'
 ###   @3='b1'
 ###   @4=NULL
 ###   @5='2017-08-15 15:12:32'
 ### SET
 ###   @1=4
 ###   @2='new_a1111'
 ###   @3='b1'
 ###   @4=NULL
 ###   @5='2017-08-15 15:12:32'
 # at 256538982
 #170815 15:13:10 server id 5  end_log_pos 256539013 CRC32 0x15d15eef     Xid = 166042
 COMMIT/*!*/;
 mysqlbinlog: File 'stop-position=256538982' not found (Errcode: 2 - No such file or directory)
 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
 DELIMITER ;
 # End of log file
 /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

5. 拿着分析出来的binlog，加上对应的列名去从库验证数据是否在

mysql> select 1 from `test`.`test_repl` WHERE id = 4 and a='a1' and b='b1' and c is NULL and d='2017-08-15 15:12:32';
 Empty set (0.00 sec)

 mysql> select * from test_repl;
 +----+------+------+------+---------------------+
 | id | a    | b    | c    | d                   |
 +----+------+------+------+---------------------+
 |  1 | aa   | bb   | cc   | 2017-08-15 15:12:32 |
 |  2 | aaa  | bbb  | ccc  | 2017-08-15 15:12:32 |
 |  3 | aaaa | bbbb | cccc | 2017-08-15 15:12:32 |
 |  5 | a2   | b2   | NULL | 2017-08-15 15:12:32 |
 |  6 | a3   | b3   | NULL | 2017-08-15 15:12:32 |
 +----+------+------+------+---------------------+
 5 rows in set (0.00 sec)

6. 插入不存在的数据

INSERT INTO `test`.`test_repl` VALUES( 4 , 'a1' , 'b1' , NULL , '2017-08-15 15:12:32');
 start slave;

7. 查看结果，部分内容

Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

测试四：测试1032,删除时记录不存在

id=2

delete from test_repl where id = 2;

在主库也执行同样的操作

delete from test_repl where id = 2;

同步报错，部分内容如下：

Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB: 
           Replicate_Ignore_DB: 
            Replicate_Do_Table: 
        Replicate_Ignore_Table: 
       Replicate_Wild_Do_Table: 
   Replicate_Wild_Ignore_Table: 
                    Last_Errno: 1032
                    Last_Error: Could not execute Delete_rows event on table test.test_repl; Can't find record in 'test_repl', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000057, end_log_pos 256541184
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 256540927

同样去分析主库的binlog

分析主库binlog

[mysql@mq5 mysqlData]$ mysqlbinlog  -v --base64-output=decode-row binlog.000057 --start-position=256540927 --stop-position=256541184
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
 /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
 DELIMITER /*!*/;
 # at 256540927
 #170815 15:35:12 server id 5  end_log_pos 256540992 CRC32 0x3890a134     GTID    last_committed=3060    sequence_number=3061
 SET @@SESSION.GTID_NEXT= 'df101ae3-6f23-11e6-ab0d-9c5c8ed1c335:2264398'/*!*/;
 # at 256540992
 #170815 15:35:12 server id 5  end_log_pos 256541064 CRC32 0xa3b6d7d7     Query    thread_id=784    exec_time=0    error_code=0
 SET TIMESTAMP=1502782512/*!*/;
 SET @@session.pseudo_thread_id=784/*!*/;
 SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
 SET @@session.sql_mode=1436549152/*!*/;
 SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
 /*!\C utf8 *//*!*/;
 SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
 SET @@session.lc_time_names=0/*!*/;
 SET @@session.collation_database=DEFAULT/*!*/;
 BEGIN
 /*!*/;
 # at 256541064
 #170815 15:35:12 server id 5  end_log_pos 256541127 CRC32 0x1b26e71f     Table_map: `test`.`test_repl` mapped to number 576
 # at 256541127
 #170815 15:35:12 server id 5  end_log_pos 256541184 CRC32 0xa6eb491b     Delete_rows: table id 576 flags: STMT_END_F
 ### DELETE FROM `test`.`test_repl`
 ### WHERE
 ###   @1=2
 ###   @2='aaa'
 ###   @3='bbb'
 ###   @4='ccc'
 ###   @5='2017-08-15 15:30:35'
 ROLLBACK /* added by mysqlbinlog */ /*!*/;
 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
 DELIMITER ;
 # End of log file
 /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

拿着 binlog 里面的记录去从库查询,确定数据是否存在
` mysql> select 1 from `test`.`test_repl` where id=2 and a='aaa' and b='bbb' and c='ccc' and d='2017-08-15 15:30:35'; Empty set (0.00 sec)
从库插入不存在的记录

insert into `test`.`test_repl` values(2, 'aaa', 'bbb', 'ccc', '2017-08-15 15:30:35');

查看结果，部分内容如下：

Slave_IO_Running: Yes
             Slave_SQL_Running: Yes

3测试结论

经过上面四个实验，可以得出处理思路就是：数据重复，在从库删除重复数据，数据不存在，在从库插入不存在的记录。

具体要怎么删除，怎么插入，还得去具体分析，过程还是会有点累的，下面给出四种方案的具体分析方法：

主键重复时，根据直接主键列去删除记录就好，这个列需要从表结构中获取，并不是所有的主键列都是id；
唯一索引重复时，思路也是跟主键一样，需要获取具体的列，特别是复合索引时，有多个列，列与值需要对应起来；
更新记录不存在是，需要把根据 show slave status
删除时记录不存在，处理方法与第三个方法一样，只能一条条试。

附：根据上面四种问题的分析思路，将处理步骤写成了脚本，使用方式与环境在脚本的最上面有说明。

如何减少 1062、1032 错误出现的次数?

从库设置只读
应用账号的权限不能 super 权限
定期使用 pt 工具校验数据一致性

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：获取对称加密密钥生成 java 对称密钥加密技术

下一篇：java项目集成传感器集成式传感器的能力

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯