1.1ROW CACHE对象的定位

跟踪文件向下显示了更进一步的信息,地址为4f4e57138的Row Cache Parent Object紧跟着之前的信息显示出来,跟踪信息同时显示是在DC_OBJECTS层面出现的问题。

跟踪信息显示对象的锁定模式为排他锁定(mode=X)。

下图是跟踪文件的截取,我们可以看到Oracle的记录方式:

systemtap 参数_SQL

进一步的,跟踪文件里也显示了29号进程执行的SQL为INSERT操作:

----------------------------------------
SO: 4f2e82c88, type: 53, owner: 6c10508e8, flag:
INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=4f2e82c88
handle=4f528d510 mode=N
call pin=0 session pin=0 hpc=0000
hlc=0000
htl=4f2e82d08[4f2de4dd8,4f2e844c0]
htb=4e84c5db0 ssga=4e84c57c8
user=6c10508e8 session=6c10508e8 count=1
flags=[0000] savepoint=0x4bad2ee7
LIBRARY OBJECT HANDLE: handle=4f528d510
mtx=4f528d640(1) cdp=1
name=INSERT
/*+ APPEND*/ INTO CACI_INV_BLB_DC NOLOGGING SELECT :B1 , T.*, SYSDATE FROM
CACI_INV_BLB T
hash=6734e347f90993bcd607836585310c4d
timestamp=03-24-2010 06:01:54
namespace=CRSR
flags=RON/KGHP/TIM/PN0/MED/KST/DBN/MTX/[500100d0]
kkkk-dddd-llll=0000-0001-0001 lock=N pin=0
latch#=12 hpc=ffec hlc=ffec
lwt=4f528d5b8[4f528d5b8,4f528d5b8]
ltm=4f528d5c8[4f528d5c8,4f528d5c8]
pwt=4f528d580[4f528d580,4f528d580]
ptm=4f528d590[4f528d590,4f528d590]
ref=4f528d5e8[4f528d5e8,4f528d5e8]
lnd=4f528d600[4f581b4d8,4f5d190a8]
LIBRARY OBJECT: object=4a7227a50
type=CRSR flags=EXS[0001] pflags=[0000]
status=VALD load=0
CHILDREN: size=16
child#table referencehandle
------ -------- --------- --------
0
4a7227518 4a7227188 4ae9ed1f0
1 4a7227518 4a7227420 494cd5878
DATA BLOCKS:
data#heappointerstatus pins change whr
----- -------- -------- --------- ----
------ ---
0 4aebaa950 4a7227b68 I/P/A/-/-0 NONE00
----------------------------------------

好了,那么现在我们来看看是哪一个进程排他的锁定了之前的4f4e57138对象。在跟踪文件中搜索4f4e57138就可以很容易的找到这个持有排他锁定的SO对象。

以下显示了相关信息,Row

Cache对象的信息在此同样有所显示:

----------------------------------------
SO: 4e86f0508, type: 50, owner: 8c183c258, flag: INIT/-/-/0x00
row cache enqueue:
count=1 session=8c005d7c8 object=4f4e57138, mode=X
savepoint=0x2716
row cache parent object:
address=4f4e57138 cid=8(dc_objects)
hash=b363a728 typ=11
transaction=8c183c258 flags=00000002
own=4f4e57208[4e86f0538,4e86f0538]
wat=4f4e57218[4e86f0418,4e86f0418] mode=X
status=VALID/-/-/-/-/-/-/-/-
set=0, complete=FALSE
data=
00000038 00134944 585f4341 43495f49
4e565f42 4c425f44 43000000 00000000
00000000 04000009 505f3230 31305f51
31000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 000209ca
ffff0000 000209ca 14786e01 020e3239
786e0102 0e323978 6e01020e 32390100
00000000 00000000 00000000 00000000
00000000 00000006
----------------------------------------

再向上找到这个进程的信息,发现其进程号为16:

PROCESS 16:
----------------------------------------
SO: 8c00037d0, type: 2, owner: 0, flag:
INIT/-/-/0x00
(process) Oracle pid=16, calls cur/top:
8c0095028/8c0094aa8, flag: (0) -
int error: 0, call error: 0, sess
error: 0, txn error 0
(post info) last post received: 115 0 4
last post received-location:
kslpsr
last process to post me:
6c1002800 1 6
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me:
6c1002800 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc:
4f818c298
O/S info: user: oracle, term: UNKNOWN,
ospid: 8200
OSD pid info: Unix process pid: 8200,
image: oracle@SF2900 (J000)

在这里可以看到16号进程是一个JOB进程,其进程为J000,那么这个JOB进程在执行什么操作呢?下面的信息可以看出一些端倪,JOB由DBMS_SCHEDULER调度,执行AUTO_SPACE_ADVISOR_JOB任务,处于Wait for shrink lock等待:

Job Slave State
Object
Slave ID: 0, Job ID:
8913
----------------------------------------
SO: 8c005d7c8, type: 4, owner: 8c00037d0,
flag: INIT/-/-/0x00
(session) sid: 45 trans: 8c183c258, creator:
8c00037d0, flag: (48100041) USR/- BSY/-/-/-/-/-
DID: 0001-0010-0007BFA6,
short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0,
user: 0/SYS
O/S info: user: oracle, term: UNKNOWN, ospid:
8200, machine: SF2900
program: oracle@SF2900 (J000)
application
name: DBMS_SCHEDULER, hash value=2478762354
action name:
AUTO_SPACE_ADVISOR_JOB, hash value=348111556
waiting for 'Wait for
shrink lock' blocking sess=0x0 seq=5909 wait_time=0 seconds since wait
started=3101
object_id=0, lock_mode=0, =0
Dumping Session Wait History
for 'Wait for shrink lock' count=1
wait_time=380596
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107262
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107263
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107246
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107139
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107248
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107003
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107169
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107233
object_id=0, lock_mode=0, =0
for 'Wait for shrink lock' count=1
wait_time=107069
object_id=0, lock_mode=0, =0
temporary object counter: 3

进一步向下查找,可以找到Shrink操作执行的SQL语句:

----------------------------------------
SO: 4e8a2c6c0, type:
53, owner: 8c005d7c8, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK:
lock=4e8a2c6c0 handle=4c3c1ce60 mode=N
call pin=0 session
pin=0 hpc=0000 hlc=0000
htl=4e8a2c740[4e81436e0,4e8c80c98]
htb=4e8937910 ssga=4e8936e48
user=8c005d7c8
session=8c005d7c8 count=1 flags=[0000] savepoint=0x4bad2eec
LIBRARY OBJECT
HANDLE: handle=4c3c1ce60 mtx=4c3c1cf90(1) cdp=1
name=alter index
"CACI"."IDX_CACI_INV_BLB_DC" modify partition
"P_2010_Q1" shrink space CHECK
hash=0ed1a6f7b2cf775661d314b8d1b7659b
timestamp=03-25-2010 22:05:09
namespace=CRSR
flags=RON/KGHP/TIM/PN0/MED/KST/DBN/MTX/[500100d0]
kkkk-dddd-llll=0000-0001-0001
lock=N pin=0 latch#=15 hpc=0002 hlc=0002
lwt=4c3c1cf08[4c3c1cf08,4c3c1cf08]
ltm=4c3c1cf18[4c3c1cf18,4c3c1cf18]
pwt=4c3c1ced0[4c3c1ced0,4c3c1ced0]
ptm=4c3c1cee0[4c3c1cee0,4c3c1cee0]
ref=4c3c1cf38[4c3c1cf38,4c3c1cf38]
lnd=4c3c1cf50[4c3c1cf50,4c3c1cf50]
LIBRARY OBJECT: object=4aa2bf668
type=CRSR flags=EXS[0001] pflags=[0000]
status=VALD load=0
CHILDREN: size=16
child#table referencehandle
------ -------- --------- --------
0 49efa3fe8 49efa3c58 4c3ad91a8
1 49efa3fe8 49efa3ed8 4c3941608
DATA BLOCKS:
data#heappointerstatus pins change whr
----- -------- -------- --------- ---- ------
---
0 4c3589b38 4aa2bf780 I/P/A/-/-0 NONE00
----------------------------------------

至此,真相大白于天下:

1.系统通过DBMS_SCHEDULER调度执行AUTO_SPACE_ADVISOR_JOB任务

发出了SQL语句:

alter index

"CACI"."IDX_CACI_INV_BLB_DC" modify partition

"P_2010_Q1" shrink space CHECK

2.定时任务不能及时完成产生了排他锁定

3.客户端执行的INSERT操作挂起

INSERT语句为:

INSERT /*+ APPEND*/ INTO

CACI_INV_BLB_DC NOLOGGING SELECT :B1 , T.*, SYSDATE FROM CACI_INV_BLB T

Shrink Space的语句所以不能成功完成是因为该索引的相关数据表的数据量过为巨大。在Oracle10g中,缺省的有一个任务定时进行分析为用户提供空间管理建议,在进行空间建议前,Oracle执行Shrink Space Check,这个检查工作和Shrink的具体内部工作完全相同,只是不执行具体动作。

这个Shrink Space的检查对于客户环境显得多余。

现场解决这个问题,只需要将16号进程Kill掉,即可释放了锁定,后面的操作可以顺利进行下去。

By eygle on 2011-05-26 08:26 |

Comments (3) |

Case | 2804 |

3 Comments

鄙人觉得仿佛有更简洁的办法.

比如11gR1的v$session(BLOCKING_SESSION_STATUS,BLOCKING_SESSION,ROW_WAIT_OBJ#).

比如11gR2的V$SESSION_BLOCKERS.

或者 v$lock, v$latch 和 V$LATCH_CHILDREN and v$session,

"在出现数据库系统或进程失去响应时,如果SQL*Plus工具仍然可以连接,可能视图查询没有相应"

对不起, 忽略了上下文.

eygle大师,其实我不太认同你这个分析过程,明显你找到的这个process不是导致系统hang住的主要原因!能否将trace文件分享一下?