Bug 25874678 - XA_COMMIT / ROLLBACK from remote node do nothing silently on RAC Multitenant environm_数据库

说一下因某种原因无法升级应该如何处理

分布式错误处理  

select local_tran_id,state from dba_2pc_pending;

LOCAL_TRAN_ID STATE  
---------------------- ----------------  
2.12.64845 collecting

远程: select local_tran_id,state from dba_2pc_pending; no rows selected 即表示本地数据库要求其他点做好commit或者rollback准备,现在正在“收集”其他点的数据库的返回信息,但是远程数据库未知状态(in doubt)。

我们需要将本地的Global Coordinator的状态清除掉:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

或者我们在查询的时候发现是如下的状态:

本地:

select local_tran_id,state from dba_2pc_pending;  

LOCAL_TRAN_ID STATE
---------------------- ----------------  
2.12.64845 prepared

远程: select local_tran_id,state from dba_2pc_pending; no rows selected 即表示本地Global Coordinator已经做好准备,已经将分布式锁放到各个事务的表上,但是远程数据库的状态再次未知(in doubt),我们需要手工的将本地的transaction rollback掉,并且清除分布式事务信息:

本地:

rollback force 'local_tran_id'; 
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('132.11.53872'); 

COMMIT PHASE:

Global Coordinator将最大scn传到commit point site,要求其commit。  

commit point尝试commit或者rollback。分布式事务锁释放。  

commit point通知Global Coordinator已经commit。  

Global Coordinator通知分布式事务的所有点进行commit。

如果数据库在此阶段出现问题,我们查询本地:  

select local_tran_id,state from dba_2pc_pending;

LOCAL_TRAN_ID STATE
---------------------- ----------------  
2.12.64845 prepared

远程:

select local_tran_id,state from dba_2pc_pending;

LOCAL_TRAN_ID STATE
---------------------- ----------------  
1.92.66874 commited

即远程数据库可能已经commit,但是本地Global Coordinator未知远程数据库的状态,还是处于prepare的状态。我们需要在如下处理:

本地:

commit force 'local_tran_id'; 
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('4.11.73640');

远程:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

或者我们在查询的时候发现是如下的状态:本地:  

select local_tran_id,state from dba_2pc_pending;  

LOCAL_TRAN_ID STATE
---------------------- ----------------
2.12.64845 commited

远程:

select local_tran_id,state from dba_2pc_pending;  

LOCAL_TRAN_ID STATE  
---------------------- ----------------  
1.92.66874 commited

即远程数据库和本地数据库均已经完成commit,但是分布式事务的信息尚未清除,我们需要在本地和远程运行:本地:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

远程:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');


FORGET PHASE:  

参与的点通知commit point site他们已经完成commit,commit point site就能忘记(forget)这个事务。  

commit point site在远程数据库上清除分布式事务信息。  

commit point site通知Global Coordinator可以清除本地的分布式事务信息。

Global Coordinator清除分布式事务信息。此时如果出现问题,我们查询:

本地:

select local_tran_id,state from dba_2pc_pending;

LOCAL_TRAN_ID STATE  
---------------------- ----------------  
2.12.64845 commited

远程:

select local_tran_id,state from dba_2pc_pending;

no rows selected 即远程commit point site已经完成commit,通知Global Coordinator清除本地的分布式事务信息,但是Global Coordinator没有收到该信息。我们需要这样处理:本地:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

综上,分布式事务的依次状态为:

本地DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY prepared / 本地rollback force后PURGE_LOST_DB_ENTRY commit prepared commited 本地commit force后本地和远程均PURGE commited commited 本地和远程均PURGE_LOST_DB_ENTRY forget commited / 本地PURGE_LOST_DB_ENTRY 另,当我们遇到使用rollback/commit force的时候,无法正常的清除分布式事务的信息,会报错ORA-02058: no prepared transaction found with ID X.XX.XXXXX时,我们需要通过手工方式来清除该信息。(注,以下方式修改数据字典,存在风险,使用前请备份好你的数据库)情况1,在dba_2pc表中还有事务记录,但是实际已经不存在该事务了:  

select local_tran_id, state from dba_2pc_pending;  

LOCAL_TRAN_ID STATE
---------------------- ----------------  
1.92.66874 prepared

(注:'1.92.66874' 的结构为rbs#, slot#, wrap#,此事务在rollback segment #1)我

们再用如下语句找出使用rollback segment #1且状态是active的transaction:  

select tablespace_name, status, segment_name 
from dba_rollback_segs 
where status != 'OFFLINE'; 
SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA Status, KTUXECFL Flags 
FROM x$ktuxe 
WHERE ktuxesta!='INACTIVE' AND ktuxeusn= 50;

<== 这是rollback segment#,即rbs# no rows selected 因此我们在rollback force的时候会报错:  
ORA-02058: no prepared transaction found with ID 1.92.66874

我们需要如下处理:

delete from sys.pending_trans$ where local_tran_id = '7.30.2568551'; 
delete from sys.pending_sessions$ where local_tran_id = '7.30.2568551';
delete from sys.pending_sub_sessions$ where local_tran_id = '7.30.2568551'; 
commit;

情况2,这种情况比较少见,在dba_2pc表中无法查到分布式事务信息,但是实际上却是存在该分布式事务的:我们在alertlog中可以看到:  

ORA-1591: lock held by in-doubt distributed transaction 1.92.66874  

我们查询dba_2pc的表,发现没有分布式事务信息:  

select local_tran_id, state from dba_2pc_pending where local_tran_id='7.30.2568551';

no rows selected 但是去查实际的rollback segment信息,却发现有prepared状态的分布式事务存在:  

SELECT KTUXEUSN, KTUXESLT, KTUXESQN, KTUXESTA Status, KTUXECFL Flags 
FROM x$ktuxe WHERE ktuxesta!='INACTIVE' AND ktuxeusn= 7;

 KTUXEUSN KTUXESLT KTUXESQN STATUS FLAGS  
---------- ---------- ---------- ---------------- ------------------------  
1 92 66874 PREPARED SCO|COL|REV|DEAD

我们无法做commit force或者rollback force: rollback force '1.92.66874'; ORA-02058: no prepared transaction found with ID 1.92.66874  

我们用如下的方式手工清理:

alter system disable distributed recovery; 
insert into pending_trans$ ( LOCAL_TRAN_ID,GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) values( '7.30.2568551',306206,'XXXXXXX.12345.1.2.3','prepared','P', hextoraw( '00000001' ),hextoraw( '00000000' ),0, sysdate, sysdate ); 
insert into pending_sessions$ values( '7.30.2568551',1, hextoraw('05004F003A1500000104'), 'C', 0, 30258592, '',146); 
commit; 
commit force '7.30.2568551';

此时如果commit force还是出现报错,需要继续执行:  

delete from pending_trans$ where local_tran_id='1.92.66874'; 
delete from pending_sessions$ where local_tran_id='1.92.66874';

 commit;

 alter system enable distributed recovery;

此时如果没有报错,则执行以下语句:  

alter system enable distributed recovery; 
and purge the dummy entry from the dictionary, using connect / as sysdba 
alter session set "_smu_debug_mode" = 4;

(注:如果使用auto的undo管理方式,需要执行此步骤,此步骤能避免在后续执行purge_lost_db_entry出现ORA-01453 的报错,详细信息可见Bug2191458)  

commit;  
exec dbms_transaction.purge_lost_db_entry('7.30.2568551')