同事说某套Oracle中有一张94G容量的表需要清空,用delete删除的,但是一直卡着。

这个能理解,因为delete是DML操作,事务控制的,在删除的时候,会写入redo、undo等日志,为了能做回滚,或者异常恢复,删除的数据越多,消耗的资源越高,等待时间越久,因此不建议对大量数据的删除使用delete,应该用批量操作,分多个事务执行,分散资源消耗。

针对这个需求,是清空这张表,不用备份,所以应该用truncate,他是DDL操作,不会像delete这种消耗太多的资源。

于是开始执行truncate,

truncate table test;

此时提示错误,

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

他的意思是存在未提交的事务,truncate不能获得锁资源,可以用如下SQL找出锁住的表对象,

select b.owner,b.object_name,a.session_id,a.locked_mode 
from v$locked_object a,dba_objects b 
where b.object_id = a.object_id;

可以用如下SQL找出引起锁住的会话,

select b.username,b.sid,b.serial#,logon_time 
from v$locked_object a,v$session b 
where a.session_id = b.sid order by b.logon_time;

有了这个sid和serial#,可以看下他正在执行的是什么SQL,

select distinct c.spid,b.sid,b.username,a.module,a.hash_value,sql_text 
from v$sql a,v$session b,v$process c
where a.hash_value=b.sql_hash_value and a.address=b.sql_address 
and b.paddr=c.addr and b.sid=&sid and b.serial#=&serial;

针对这个案例,看到阻塞会话正执行的是delete,module显示是dbeaver,这正是刚刚同事执行出现hang且未做提交的SQL,

delete from test where 1=1;

确认可删除会话,执行如下SQL,

alter system kill session 'sid,serial#';

此时再次执行,但是一直是hang住的,等了好几分钟,

truncate table test;

正常来讲,truncate应该执行很快,执行慢,说明有等待,看下等的是什么,等待的是"enq: RO - fast object reuse",

Oracle中执行truncate操作出现hang_python

图中显示,session_id=196是truncate,正在阻塞的是session_id=770,他对应的是后台进程CKPT,即检查点进程,

Oracle中执行truncate操作出现hang_队列_02

MOS这篇文章《Resolving Issues Where 'enq: RO - fast object reuse' Contention Seen During Drop or Truncate Operations (Doc ID 1475659.1)》,介绍了下enq: RO - fast object reuse这个等待事件,

The RO enqueue known as "Multiple object resue" enqueue, is used to synchronise operations between foreground process and a background process such as DBWR or CKPT. It is typically used when dropping objects or truncating tables.

Following is the sequence of events When a truncate/drop occurs:
1. Foreground process acquires the "RO" enqueue in exclusive mode
2. Cross instance calls (or one call if it is a single object) are issued ("CI" enqueue is acquired)
3. CKPT processes on each of instances requests the DBWR to write the dirty buffers to the disk and invalidate all the clean buffers.
4. After DBWR completes writing all blocks, the foreground process releases the RO enqueue.

大致意思是RO队列是用来同步前台进程(执行truncate、drop等操作)和后台进程操作(DBWR、CKPT)的,这个队列的等待会特别发生在drop删除对象和truncate表的时候。

而且和RO等待以及truncate执行慢相关的bug还是不少的,

Oracle中执行truncate操作出现hang_python_03

在查资料的过程中,truncate竟然跑完了,14:56-15:15,大约20分钟。这套库很忙,频繁的写入,日志切换很频繁,可能会引起DBWR进程的等待,而执行truncate需要做Checkpoint,同样会等待DBWR,怀疑有无可能这些并发的操作,导致RO等待队列变长,作为临时解决,清除缓存,或者重启数据库,或许可行,当然不排除bug存在的可能。

找机会重现问题,我们再做讨论。

参考资料,

http://www.dboracle.com/archivers/enq-ro-fast-object-reuse%E9%97%AE%E9%A2%98%E5%A4%84%E7%90%86.html