Oracle有时会出现hang住的现象,原因可能很多,例如资源争用,如果需要知道为什么出现hang,可以分析等待链,而使用hanganalyze是其中一种方式,今天尝试一下。

首先,我们模拟场景,会话1,更新test表的第一行记录,但是不提交,

SQL> select sid from v$mystat where rownum=1;  
  SID
----------
  42


SQL> update test set object_id=1 where rownum=1;
1 row updated.

会话2,更新相同行,此时出现了hang,

SQL> update test set object_id=1 where rownum=1;

会话3,更新相同行,还是出现了hang,

SQL> update test set object_id=1 where rownum=1;

可能当Oracle在hang状态下正常的sqlplus已经不能连上,从10g开始,提供了另外一种登陆方式,就是带着prelim,

[oracle@bisal ~]$ sqlplus -prelim / as sysdba


SQL*Plus: Release 19.0.0.0.0 - Production on Fri Apr 23 09:03:52 2021
Version 19.3.0.0.0


Copyright (c) 1982, 2019, Oracle.  All rights reserved.

P.S. prelim的意思是preliminary connection

可以指定3级的hanganalyze,

SQL> oradebug hanganalyze 3
Statement processed.

不同的LEVEL打印的内容是不同的,当然级别越高,打印越多,消耗就会高,

1-2:Only HANGANALYZE output, no process dump at all(只有hanganalyeze输出,不dump任何进程)

3:Level 2 + Dump only processes thought to be in a hang (IN_HANG state)(DUMP IN_HANGE状态的进程)

4:Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)

5:Level 4 + Dump all processes involved in wait chains (NLEAF state)

10:Dump all processes (IGN state)

如果是CDB环境,要确认是CDB级别的hang还是PDB级别的hang。如果是PDB级别的hang,只需要收集PDB的信息。如果无法确认,建议收集一下CDB的信息。

从11.2.0.2起,使用"sqlplus -prelim"登陆后,hanganalyze不再有内容输出,因为需要一个进程状态对象和一个会话状态对象。所以执行hanganalyze分析后,只是显示执行成功,trace中只会记录,

HANG ANALYSIS:


ERROR: Can not perform hang analysis dump without a process
       state object and a session state object.
  ( process=(nil), sess=(nil) )

正常连接执行hanganalyze,会显示打印的trace文件路径和名称,

[oracle@bisal ~]$ sys
SQL*Plus: Release 19.0.0.0.0 - Production on Fri Apr 23 09:05:19 2021
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle.  All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0


SQL> oradebug hanganalyze 3
Hang Analysis in /opt/oracle/diag/rdbms/bisalcdb/BISALCDB/trace/BISALCDB_ora_47241.trc

从文件中,可以看到当前系统中存在两条等待链,等待事件都是enq: TX - row lock contention,Chain 1的会话是被sid=42的初始会话阻塞,Chain 2的会话同样被sid=42阻塞(which is a member of 'Chain 1'),

===============================================================================


Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
     Chain 1 Signature Hash: 0x38c48850
 [b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
     Chain 2 Signature Hash: 0x38c48850


===============================================================================
Non-intersecting chains:


-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (bisalcdb.bisalcdb)
                   os id: 41963
              process id: 51, oracle@bisal
              session id: 79
        session serial #: 18025
             module name: 5 (SQL*Plusbisal (TNS V1-V3))
                  pdb id: 5 (BISALPDB2)
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x5000a
                      p3: 'sequence'=0x465
            time in wait: 1 min 42 sec
           timeout after: never
                 wait id: 25
                blocking: 0 sessions
          current sql_id: 2786213102
             current sql: update test set object_id=1 where rownum = 1
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2218<-kslwaitctx()+188<-ksqcmi()+21656<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+230<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-updrowFastPath()+1209<-qerupUpdRow()+1548<-qerupFetch()+471<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+405<-opiexe()+11573<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1895<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real(
            wait history:
              * time between current wait and wait #1: 0.044253 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 13.489353 sec
                     wait id: 24               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 23               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000011 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.000027 sec
                     wait id: 22               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
}
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (bisalcdb.bisalcdb)
                   os id: 18256
              process id: 46, oracle@bisal
              session id: 42
        session serial #: 18536
             module name: 5 (SQL*Plusbisal (TNS V1-V3))
                  pdb id: 5 (BISALPDB2)
    }
    which is waiting for 'SQL*Net message from client' with wait info:
    {
                      p1: 'driver id'=0x54435000
                      p2: '#bytes'=0x1
            time in wait: 2 min 6 sec
           timeout after: never
                 wait id: 82
                blocking: 1 session
          current sql_id: 0
             current sql: <none>
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-read()+14<-snttread()+16<-nttfprd()+354<-nsbasic_brc()+399<-nioqrc()+438<-opikndf2()+999<-opitsk()+905<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
            wait history:
              * time between current wait and wait #1: 0.000007 sec
              1.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 81               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.001595 sec
              2.       event: 'db file scattered read'
                 time waited: 0.005296 sec
                     wait id: 80               p1: 'file#'=0x13
                                               p2: 'block#'=0x763
                                               p3: 'blocks'=0x5
              * time between wait #2 and #3: 0.000287 sec
              3.       event: 'db file sequential read'
                 time waited: 0.019587 sec
                     wait id: 79               p1: 'file#'=0x13
                                               p2: 'block#'=0x762
                                               p3: 'blocks'=0x1
    }


Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------


===============================================================================


Intersecting chains:


-------------------------------------------------------------------------------
Chain 2:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (bisalcdb.bisalcdb)
                   os id: 41963
              process id: 51, oracle@bisal
              session id: 79
        session serial #: 18025
             module name: 5 (SQL*Plusbisal (TNS V1-V3))
                  pdb id: 5 (BISALPDB2)
    }
    is waiting for 'enq: TX - row lock contention' with wait info:
    {
                      p1: 'name|mode'=0x54580006
                      p2: 'usn<<16 | slot'=0x5000a
                      p3: 'sequence'=0x465
            time in wait: 4 min 40 sec
           timeout after: never
                 wait id: 25
                blocking: 0 sessions
          current sql_id: 2786213102
             current sql: update test set object_id=1 where rownum = 1
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+187<-ksliwat()+2218<-kslwaitctx()+188<-ksqcmi()+21656<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+230<-ktcwit1()+367<-kdddgb()+5478<-kdusru()+458<-updrowFastPath()+1209<-qerupUpdRow()+1548<-qerupFetch()+471<-updaul()+1416<-updThreePhaseExe()+340<-updexe()+405<-opiexe()+11573<-kpoal8()+2387<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1895<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real(
            wait history:
              * time between current wait and wait #1: 0.044253 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 13.489353 sec
                     wait id: 24               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000000 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 23               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
              * time between wait #2 and #3: 0.000011 sec
              3.       event: 'SQL*Net message from client'
                 time waited: 0.000027 sec
                     wait id: 22               p1: 'driver id'=0x54435000
                                               p2: '#bytes'=0x1
    }
    and is blocked by 'instance: 1, os id: 18256, session id: 42',
    which is a member of 'Chain 1'.


Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 2 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------


===============================================================================

因此,从这个trace文件中,就可以很清晰地看到等待链的逻辑关系,进而有针对性的管理会话。