一、背景

首先我们在日常的开发中,大概率会使用批量更新,或者在一个事务里面做增删改查,那么就有可能不同事务之间导致死锁的发生。这里主要讲的是如何将当时发生死锁的信息输出到日志文件中,以及具体的SQL打印。

二、如何实现

查了很多网上的文章,都是使用什么下面之类的命令

show engine innodb status \G;
SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;

但是我发现这些命令都是查正在死锁的语句,并不是历史死锁的语句,想想我们在日常工作中包括生产环境,如果发生了死锁,应用监控告警了,那么我们应该去找的历史死锁日志。

言归正传,那么如何找到历史的死锁日志。

(1)确认错误日志打印

确认错误日志打印路径,死锁的日志都会打印到这个错误日志文件,是可以看到历史日志的关键

通过命令 show global variables like "log_error%";查看错误日志路径

mysql查看死锁记录 mysql 查看死锁日志_mysql查看死锁记录

(2)打开死锁检测机制(必要)

MySQL有死锁检测机制,需要打开死锁检测机制(网上说死锁检测机制比较耗费性能,这个看个人取舍,我个人认为在现在的硬件环境支持下,这一点性能损耗是可以接受的)

show variables like 'innodb_deadlock_detect';
set GLOBAL innodb_deadlock_detect = ON;

当然你也可以在你的My.ini配置文件中写死(推荐)
innodb_deadlock_detect = ON;

 (3)打开MySQL关于锁的Monitor监视器输出

什么是MySQL的Monitor不在这里细说

show global variables like "innodb_status_output%";

set GLOBAL innodb_status_output = ON
set GLOBAL innodb_status_output_locks = ON

同样最好在My.ini进行配置写死,免得每次重启服务器都要重新设置
innodb_status_output = ON
innodb_status_output_locks = ON

(4)打印所有锁日志

(非必要,博主发现不开这个参数也能实现打印历史的锁日志,如果你为了锁的日志更全面,也可以开启)

show global variables like "innodb_print_all_deadlocks%";
set GLOBAL innodb_print_all_deadlocks = ON;

同样建议在my.ini配置文件中写死

innodb_print_all_deadlocks = ON

(5)如何查找日志

将上面的配置开启后,如果MySQL遇见死锁,则会帮你检测出来,并且回滚权重比较小的事务。这样另外一个事务就能往下执行(相当于自动帮你解除死锁)

首先找到模拟一波

开启两个事务

mysql查看死锁记录 mysql 查看死锁日志_数据库_02

交叉update id = 1 和 id = 2的记录,其中事务1出现了死锁。

Deadlock found when trying to get lock; try restarting transaction

同时也证明了死锁检测的好处,会自动帮你回滚其中一个事务的操作,让另外一个竞争的事务能够往下执行。

接下来才是我们的重头戏,我们如何查找错误日志。

根据第一步错误日志的文件

使用关键词INNODB MONITOR OUTPUT对日志进行过滤,一般会有开头和结束,例如我这次模拟的错的日志文件如下:

2022-11-10 07:46:26 0x6ef4 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 131 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 3
OS WAIT ARRAY INFO: signal count 3
RW-shared spins 0, rounds 0, OS waits 0
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 0.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-11-10 07:46:22 0x7a38
*** (1) TRANSACTION:
TRANSACTION 273418, ACTIVE 25 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 4580, query id 72 localhost 127.0.0.1 root updating
update foo set name = '123' where id = 2

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000001; asc     ;;
 1: len 6; hex 000000042c0a; asc     , ;;
 2: len 7; hex 02000000b726d4; asc      & ;;
 3: len 3; hex 313233; asc 123;;


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap waiting
Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000002; asc     ;;
 1: len 6; hex 00000004260a; asc     & ;;
 2: len 7; hex 02000000b725b8; asc      % ;;
 3: len 4; hex 33343536; asc 3456;;


*** (2) TRANSACTION:
TRANSACTION 273419, ACTIVE 19 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 10, OS thread handle 18376, query id 73 localhost 127.0.0.1 root updating
update foo set name = '3456' where id = 1

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273419 lock_mode X locks rec but not gap
Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000002; asc     ;;
 1: len 6; hex 00000004260a; asc     & ;;
 2: len 7; hex 02000000b725b8; asc      % ;;
 3: len 4; hex 33343536; asc 3456;;


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273419 lock_mode X locks rec but not gap waiting
Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000001; asc     ;;
 1: len 6; hex 000000042c0a; asc     , ;;
 2: len 7; hex 02000000b726d4; asc      & ;;
 3: len 3; hex 313233; asc 123;;

*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 273420
Purge done for trx's n:o < 273414 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 284451204563104, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 284451204561440, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 284451204560608, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 273418, ACTIVE 29 sec
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 9, OS thread handle 4580, query id 72 localhost 127.0.0.1 root
TABLE LOCK table `wechat_min_pro`.`foo` trx id 273418 lock mode IX
RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
Record lock, heap no 28 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000001; asc     ;;
 1: len 6; hex 000000042c0a; asc     , ;;
 2: len 7; hex 02000000b726d4; asc      & ;;
 3: len 3; hex 313233; asc 123;;

RECORD LOCKS space id 711 page no 4 n bits 96 index PRIMARY of table `wechat_min_pro`.`foo` trx id 273418 lock_mode X locks rec but not gap
Record lock, heap no 27 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 00000002; asc     ;;
 1: len 6; hex 000000042c0a; asc     , ;;
 2: len 7; hex 02000000b726f6; asc      & ;;
 3: len 3; hex 313233; asc 123;;

--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
1500 OS file reads, 232 OS file writes, 55 OS fsyncs
13.15 reads/s, 16384 avg bytes/read, 0.75 writes/s, 0.40 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 195, seg size 197, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2267, node heap has 1 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 1 buffer(s)
0.00 hash searches/s, 0.15 non-hash searches/s
---
LOG
---
Log sequence number          6117561793
Log buffer assigned up to    6117561793
Log buffer completed up to   6117561793
Log written up to            6117561793
Log flushed up to            6117561793
Added dirty pages up to      6117561793
Pages flushed up to          6117561793
Last checkpoint at           6117561793
21 log i/o's done, 0.10 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 8585216
Dictionary memory allocated 388960
Buffer pool size   512
Free buffers       252
Database pages     258
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 1475, created 142, written 166
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 353 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 258, unzip_LRU len: 0
I/O sum[291]:cur[2], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=9120, Main thread ID=00000000000053B0 , state=sleeping
Number of rows inserted 0, updated 2, deleted 0, read 3
0.00 inserts/s, 0.05 updates/s, 0.00 deletes/s, 0.05 reads/s
Number of system rows inserted 0, updated 315, deleted 0, read 5798
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

我们发现产生死锁的两条SQL语句都被输出出来了

 update foo set name = '123' where id = 2

 update foo set name = '3456' where id = 1

定位出具体的SQL,那么我们就可以在我们的代码上进行优化了,具体优化思路可以参照我另外一篇博客


三、总结

在生产环境日志,我这边是建议开启死锁检测、以及锁日志打印的,因为一旦发生死锁,首先死锁检测能帮助我们解除死锁,其次我们能够快速的定位具体的SQL以及代码问题,比起牺牲的性能来说,我们排查问题的效率就大大提升了。