问题描述:研发反馈测试环境跑批量任务时总是出现死锁,而可以确认的是相同ACCT_NO不会出现在不同进程,因此就不存在多个进程update相同数据的可能性.
数据库:oracle 11.2.0.4 rac
后台日志:
2023-11-17T08:43:33.626029+08:00
Global Enqueue Services Deadlock detected (DID = 3_0_48). More information in file
/u01/app/oracle/diag/rdbms/orclupg/hisdb1/trace/hisdb1_lmd0_17772.trc.
2023-11-17T08:43:50.715420+08:00
Global Enqueue Services Deadlock detected (DID = 3_0_49). More information in file
/u01/app/oracle/diag/rdbms/orclupg/hisdb1/trace/hisdb1_lmd0_17772.trc.
1、问题分析
--根据gv$active_session_history查询到该时间段存在两个会话相互阻塞的相关信息.
alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
select sql_id,
sql_exec_id,
sql_exec_start,
to_char(sample_time, 'yyyymmdd hh24:mi:ss') st,
inst_id,
session_id,
sql_id,
machine,
event,
blocking_inst_id,
blocking_session
from gv$active_session_history
where sample_time > to_date('20231117 08:43', 'yyyymmdd hh24:mi')
and sample_time < to_date('20231117 08:44', 'yyyymmdd hh24:mi')
order by st;
sql_id sql_exec_id sql_exec_start st inst_id session_id machine event blocking_inst_id blocking_session
--------------- ----------- ------------------- ----------------- ------- ----------- ------------ ---------------------------- ----------------- ----------------------------
82gz6jhs91s4k 17928968 2023-11-17 08:42:55 20231117 08:43:33 1 2182 orclxgea03 enq: TX - allocate ITL entry 1 3437
82gz6jhs91s4k 17928938 2023-11-17 08:42:51 20231117 08:43:33 1 3437 orclxgea04 enq: TX - allocate ITL entry 1 2182
......
82gz6jhs91s4k 17924183 2023-11-17 08:42:08 20231117 08:43:50 1 4231 orclxgea01 enq: TX - allocate ITL entry 2 1056
82gz6jhs91s4k 49682925 2023-11-17 08:42:51 20231117 08:43:50 2 1056 orclxgea09 enq: TX - allocate ITL entry 1 4231
--查询SQL语句
select sql_text from v$sqltext where sql_id='82gz6jhs91s4k';
--查询绑定变量
SQL> select b.name,b.position,b.datatype_string,b.value_string,b.last_captured from v$sql_bind_capture b where b.sql_id='82gz6jhs91s4k';
NAME POSITION DATATYPE_STRING VALUE_STRING LAST_CAPTURED
--------------- ---------- --------------- -------------------- -------------------
:1 1 VARCHAR2(32)
:2 2 VARCHAR2(32)
:3 3 NUMBER
:4 4 VARCHAR2(32)
:5 5 NUMBER
:6 6 VARCHAR2(32)
:7 7 VARCHAR2(32)
:8 8 VARCHAR2(32)
:9 9 NUMBER
:10 10 VARCHAR2(32)
:11 11 VARCHAR2(32)
NAME POSITION DATATYPE_STRING VALUE_STRING LAST_CAPTURED
--------------- ---------- --------------- -------------------- -------------------
:12 12 VARCHAR2(32)
:13 13 VARCHAR2(32)
:14 14 VARCHAR2(128) 191034056069361769 2023-11-17 08:40:58
:15 15 VARCHAR2(128) Pint 2023-11-17 08:40:58
:16 16 VARCHAR2(32) 9999 2023-11-17 08:40:58
--查历史绑定变量
select b.name,b.datatype_string,b.value_string,b.last_captured from dba_hist_sqlbind b where b.sql_id='82gz6jhs91s4k' order by 4;
NAME DATATYPE_STRING VALUE_STRING LAST_CAPTURED
--------------- --------------- -------------------- -------------------
:16 VARCHAR2(32) 9999 2023-11-17 08:40:58
:15 VARCHAR2(128) Pint 2023-11-17 08:40:58
:14 VARCHAR2(128) 191034056069361769 2023-11-17 08:40:58
:16 VARCHAR2(32) 9999 2023-11-17 08:41:01
:15 VARCHAR2(128) Pint 2023-11-17 08:41:01
:14 VARCHAR2(128) 181034000069361706 2023-11-17 08:41:01
2、分析结果
造成此异常的直接原因为事务一直没有提交,数据库一直处于ITL卡槽分配等待状态,最终事务超时.
系统为什么会产生enq: TX - allocate ITL entry等待事件?
经过分析确认批量业务处理时会按数据量使用多线程处理,程序按照业务的任务数量分配线程,当所有线程处理结束后统一提交事务。
1) 业务程序处理大量数据,主线程等待所有线程处理完成后才会被唤醒执行提交事务;
2) 子线程中执行程序中的UPDATE时多个事务让ORACLE分配ITL事务卡槽,而discountInstance数据块的卡槽已全部分配,此时便会等其他使用卡槽的事务释放,而其他事务也在等待主线程通知commit,最终造成系统程序的主线程与Oracle事务中ITL卡槽分配互锁.
3、解决方案
--通过增加INITRANS或者PCTFREE来解决该异常,以下为具体实施步骤.
3.1、查表AT_ACT_BIL信息
--生产环境
SQL> select table_name,tablespace_name,ini_trans,max_trans,OWNER,PCT_FREE,PCT_USED from all_tables where table_name='AT_ACT_BIL';
TABLE_NAME TABLESPACE_NAME INI_TRANS MAX_TRANS OWNER PCT_FREE PCT_USED
------------------------------ ------------------------------ ---------- ---------- --------------- ---------- ----------
AT_ACT_BIL TBS_orcl_DATA 16 255 orcl 20
--测试环境
SQL> select table_name,tablespace_name,ini_trans,max_trans,OWNER,PCT_FREE,PCT_USED from all_tables where table_name='AT_ACT_BIL';
TABLE_NAME TABLESPACE_NAME INI_TRANS MAX_TRANS OWNER PCT_FREE PCT_USED
------------------------------ ------------------------------ ---------- ---------- --------------- ---------- ----------
AT_ACT_BIL TBS_orcl_DATA 1 255 orcl 10
3.2、修改表AT_ACT_BIL参数
--测试环境修改表AT_ACT_BIL参数
SQL> alter table TIMER.AT_ACT_BIL initrans 16;
Table altered.
SQL> alter table TIMER.AT_ACT_BIL pctfree 20;
Table altered.
SQL> select table_name,tablespace_name,ini_trans,max_trans,OWNER,PCT_FREE,PCT_USED from all_tables where table_name='AT_ACT_BIL';
TABLE_NAME TABLESPACE_NAME INI_TRANS MAX_TRANS OWNER PCT_FREE PCT_USED
--------------- ------------------------------ ---------- ---------- -------------------- ---------- ----------
AT_ACT_BIL TBS_orcl_DATA 16 255 TIMER 20
3.3、re-organize table
SQL> alter table TIMER.AT_ACT_BIL move;
Table altered.
后台日志:
2023-11-17T14:55:13.656224+08:00
Thread 1 advanced to log sequence 3270 (LGWR switch), current SCN: 16570803317
Current log# 2 seq# 3270 mem# 0: +DATA/orclUPG/ONLINELOG/group_2.837.1151426175
2023-11-17T14:55:15.535886+08:00
ARC0 (PID:18213): Archived Log entry 8237 added for T-1.S-3269 ID 0x8b1b2026 LAD:1
2023-11-17T14:55:22.598016+08:00
Some indexes or index [sub]partitions of table TIMER.AT_ACT_BIL have been marked unusable
2023-11-17T14:55:51.651350+08:00
3.4、rebuild index
SQL> select INDEX_NAME,TABLE_OWNER,TABLE_NAME,TABLESPACE_NAME from dba_indexes where TABLE_NAME='AT_ACT_BIL'
INDEX_NAME TABLE_OWNER TABLE_NAME TABLESPACE_NAME
------------------------- ------------------------- --------------- ------------------------------
IDX_AT_ACT_BIL_AL TIMER AT_ACT_BIL TBS_orcl_IDX
IDX_UK_AT_ACT_BIL_ABL TIMER AT_ACT_BIL TBS_orcl_IDX
SQL> alter index TIMER.IDX_AT_ACT_BIL_AL rebuild pctfree 20 initrans 16;
SQL> alter index TIMER.IDX_UK_AT_ACT_BIL_ABL rebuild pctfree 20 initrans 16;
说明:此后恢复正常.
TX - allocate ITL entry等待事件
原创Liujun_Deng 博主文章分类:Oracle ©著作权
文章标签 ITL allocate ITL 文章分类 Oracle 数据库