杨柳 数据和云

导读:数据库表空间扩容是我们在运维过程中经常需要做的事情,常用的方法有增加表空间数据文件或resize数据文件等,本文分享一起使用resize datafile扩容表空间时偶遇enq: TX - index contention引起系统异常的特殊案例,供大家参考。


问题描述



某客户生产系统核心库在2019-08-21 23:50:00左右出现了业务告警,应用无法连接的情况。

查看故障期间数据库的告警日志,发现在23:49:56,数据库进行了数据库文件扩容操作。如何巧妙处理enq: TX - index contention问题?_Jav

查看故障期间数据库的监听日志,发现从23:58:44到00:05:23这段时间内,数据库没有新的JDBC连接进来。如何巧妙处理enq: TX - index contention问题?_Jav_02

专家解答



查看故障期间的ash信息,发现在应用异常的时间段内,SQL_ID为g64z3fw1nffvj的insert操作产生了大量的enq: TX - index contention等待,并且造成这个等待事件的原因就是alter database操作和它本身的insert操作,在resize的执行过程中enq: TX - index contention等待事件一直升高,resize操作成功后(即2019-08-22 00:05:00左右)等待事件恢复正常。

如何巧妙处理enq: TX - index contention问题?_Jav_03

如何巧妙处理enq: TX - index contention问题?_Jav_04

如何巧妙处理enq: TX - index contention问题?_Jav_05

如何巧妙处理enq: TX - index contention问题?_Jav_06

查看相关的trace文件信息:


*** 2019-08-21 23:57:14.907===============================================================================HANG ANALYSIS:  instances (db_name.oracle_sid): xxxxx =============================================================================== Chains most likely to have caused the hang<<<<<<<<<<<<<<<<<<<<<<查看HANG住三个可能因素 [a] Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'     Chain 1 Signature Hash: 0xbeca5ef7 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'     Chain 2 Signature Hash: 0xbeca5ef7 [c] Chain 3 Signature: <not in a wait><='row cache lock'     Chain 3 Signature Hash: 0xccefbfc0 ===============================================================================Non-intersecting chains: -------------------------------------------------------------------------------Chain 1:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 2 (xxxxx)                   os id: 15343              process id: 67              session id: 2882        session serial #: 39285    }    is waiting for 'enq: TX - index contention' with wait info:    {                      p1: 'name|mode'=0x54580004                      p2: 'usn<<16 | slot'=0x470000                      p3: 'sequence'=0xa32a2d  ……    and is blocked by => Oracle session identified by:    {                instance: 2 (xxxxxxx)                   os id: 105251              process id: 79              session id: 3400       <<<<<<<<<<<<<<<<<<<<<<chain 1被session_id=3400会话阻塞了        session serial #: 22853    }    which is not in a wait:    {               last wait: 36.074155 sec ago                blocking: 4 sessions            wait history:              1.       event: 'ASM file metadata operation'                 time waited: 0.000018 sec (last interval)                 time waited: 0.001953 sec (total)                     wait id: 94              p1: 'msgop'=0x21                                              p2: 'locn'=0x0              * time between wait #1 and #2: 0.000000 sec              2.       event: 'KSV master wait'                 time waited: 0.001856 sec                     wait id: 96                            * time between wait #2 and #3: 0.000000 sec              3.       event: 'ASM file metadata operation'                 time waited: 0.000003 sec (last interval)                 time waited: 0.000079 sec (total)                     wait id: 94              p1: 'msgop'=0x21                                              p2: 'locn'=0x0    }Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'Chain 1 Signature Hash: 0xbeca5ef7------------------------------------------------------------------------------- ===============================================================================Intersecting chains: -------------------------------------------------------------------------------Chain 2:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 2 (xxxxxxx)                   os id: 70061              process id: 69              session id: 2969        session serial #: 46127    }    …….    and is blocked by 'instance: 2, os id: 83344, session id: 2840',    which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被chain1阻塞了  Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention'Chain 2 Signature Hash: 0xbeca5ef7------------------------------------------------------------------------------- -------------------------------------------------------------------------------Chain 3:-------------------------------------------------------------------------------    Oracle session identified by:    {                instance: 2 (xxxxxx)                   os id: 102165              process id: 75              session id: 3227        session serial #: 3457    }    is waiting for 'row cache lock' with wait info:    {                      p1: 'cache id'=0x6                      p2: 'mode'=0x0                      p3: 'request'=0x3            time in wait: 2.020260 sec      heur. time in wait: 1 min 59 sec           timeout after: 0.979740 sec                 wait id: 360                blocking: 0 sessions            wait history:              * time between current wait and wait #1: 0.000015 sec              1.       event: 'row cache lock'                 time waited: 3.001099 sec                     wait id: 359             p1: 'cache id'=0x6                                              p2: 'mode'=0x0                                              p3: 'request'=0x3              * time between wait #1 and #2: 0.000015 sec              2.       event: 'row cache lock'                 time waited: 3.000983 sec                     wait id: 358             p1: 'cache id'=0x6                                              p2: 'mode'=0x0                                              p3: 'request'=0x3              * time between wait #2 and #3: 0.000013 sec              3.       event: 'row cache lock'                 time waited: 3.000978 sec                     wait id: 357             p1: 'cache id'=0x6                                              p2: 'mode'=0x0                                              p3: 'request'=0x3    }    and is blocked by 'instance: 2, os id: 105251, session id: 3400'    which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被session_id 3400阻塞了 Chain 3 Signature: <not in a wait><='row cache lock'Chain 3 Signature Hash: 0xccefbfc0------------------------------------------------------------------------------- ===============================================================================Extra information that will be dumped at higher levels:[level  4] :   1 node dumps -- [LEAF] [LEAF_NW] [level  5] :   4 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] State of ALL nodes [2839]/2/2840/17657/0x13d1a33000/83344/NLEAF/[3399][2881]/2/2882/39285/0x13c1a48b38/15343/NLEAF/[2839][2968]/2/2969/46127/0x13b1a69c08/70061/NLEAF/[2839][3226]/2/3227/3457/0x13c1ade618/102165/NLEAF/[3399][3399]/2/3400/22853/0x1391b4cad8/105251/LEAF_NW/*** 2019-08-21 23:57:14.908===============================================================================END OF HANG ANALYSIS===============================================================================*** 2019-08-21 23:58:53.765Suspected Hangs in the System                     Root       Chain Total               Hang                 Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution             ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action              ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------      1 HANG    VALID    2  3400     3   286   HIGH  LOCAL Terminate Process <<<<<<<<<<<<<<<<<<< 系统阻塞的可能原因是session id为3400的本地进程引起的  inst# SessId  Ser#     OSPID Event  ----- ------ ----- --------- -----      2      2 19927    111623 enq: TX - index contention      2   2840 17657     83344 buffer busy waits      2   3400 22853    105251 not in wait *** 2019-08-21 23:58:53.766HM: Short Stack of immediate waiter session ID 2840, OSPID 83344 of hang ID 1Short stack dump:ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-kcbzwb()+1548<-kcbgcur()+8885<-ktfbhget()+254<-ktfbfsearch()+300<-ktfbtgex()+1240<-ktsxs_add()+2094<-ktspnr_next()+1115<-ktrsexec()+529<-ktspbmphwm()+1236<-ktspmvhwm()+53<-ktsp_bump_hwm()+212<-ktspgsp_main()+1941<-kdisnew()+279<-kdisnewle()+115<-kdisle()+4332<-kdiins0()+34743<-kdiinsp()+91<-kauxsin()+1784<-qesltcLoadIndexList()+922<-qesltcLoadIndexes()+55<-qerltcNoKdtBufferedInsRowCBK()+374<-qerltcSingleRowLoad()+279<-qerltcFetch()+379<-insexe()+691<-opiexe()+5632<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
HM: Short Stack of root session ID 3400, OSPID 105251 of hang ID 1Short stack dump:ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-pwrite(

从上述trace文件可以看出系统HANG住的可能原因是session_id为 3400的会话引起的。根据查看dba_hist_active_sess_history视图可以看出session_id为 3400的会话就是alter database datafile resize的操作。

根据该故障现象,查询mos发现:Alter Tablespace Resize on Bigfile Tablespace Causes Sessions to Hang with ‘Enq: TX - contention’ and ‘Buffer busy waits’ (Doc ID 2089689.1) ,说明ORACLE 11.2.0.4及后续版本中存在该BUG,调整表空间大小时将阻塞表空间中的其他DDL操作。如何巧妙处理enq: TX - index contention问题?_Jav_07

问题解决



  • 建议在数据库负载较低的时候进行数据文件扩容

  • 建议每次表空间扩容的大小不要过大



出处:墨天轮(https://www.modb.pro/db/7440,复制到网页中打开)