1.在检查rman删除归档日志的crontab时,发现日志里有ora-15028的报错,rman意外终止
  1. RMAN-00571: ===========================================================
  2. RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
  3. RMAN-00571: ===========================================================
  4. RMAN-03009: failure of delete command on default channel at 06/22/2021 07:33:02
  5. ORA-15028: ASM file '+ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313' not dropped; currently being accessed
  6.  
复制代码
先来看一下15028的错误描述
  1. [oracle@zystudio2 ~]$ oerr ora 15028
  2. 15028, 00000, "ASM file '%s' not dropped; currently being accessed"
  3. // *Cause:  An attempt was made to drop an ASM file, but the file was
  4. //          being accessed by one or more clients and therefore could
  5. //          not be dropped.
  6. // *Action: Stop all clients that are using this file and retry the drop
  7. //          command. Query the V$ASM_CLIENT fixed view in an ASM instance
  8. //          or use ASMCMD LSOF to list active clients.
  9.  
复制代码
   2.尝试手动删除,依然报错
  1. ASMCMD> cp thread_2_seq_252744.6807.1065318313 /tmp/thread_2_seq_252744.6807.1065318313
  2. copying +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313 -> /tmp/thread_2_seq_252744.6807.1065318313
  3. ASMCMD> rm thread_2_seq_252744.6807.1065318313
  4. ORA-15032: not all alterations performed
  5. ORA-15028: ASM file '+arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
复制代码
   3.asm里,lsof看一下文件被哪个实例占用
  1. ASMCMD> lsof -G arch
  2. DB_Name  Instance_Name  Path                                                                     
  3. zystudio  zystudio2         +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
复制代码
查询一下这个归档文件的状态
  1. SQL> select NAME,FIRST_TIME,CREATOR,REGISTRAR,STATUS,COMPLETION_TIME,archived,deleted from v$archived_log where name like '%2527%'
  2.   2  /
  3. NAME                                                                             FIRST_TIME   CREATOR REGISTR S COMPLETION_T ARC DEL
  4. -------------------------------------------------------------------------------- ------------ ------- ------- - ------------ --- ---
  5. +ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313          24-FEB-21    RMAN    RMAN    A 16-JUN-21    YES NO
  6.  
复制代码

    4.设置15028事件,在ASM实例下sqlplus / as sysasm
  1. SQL>alter system set events '15028 trace name systemstate_global level 267';
  2.  
  3. then,manual delete the archivelog in asmcmd
  4.  
  5. SQL>alter system set events '15028 trace name systemstate_global off';
复制代码
但是,发现trace目录没有生成diag*.trc的文件,所以需要重置一下
  1.     First identify the process id of the background process in question.  In this example I will use the LGWR background process:
  2.         SQL> select pid, program from v$process where program like '%DIAG%';
  3.  
  4.                 PID PROGRAM
  5.         ---------- ------------------------------------------------
  6.                   6 oracle@zystudio2 (DIAG)
  7.     OR
  8.         SQL> select pid, spid, pname from v$process where pname like '%DIAG%';
  9.  
  10.         PID SPID PNAME
  11.         --------------- ------------------------ -----
  12.         39 105458 DIAG
  13.     Secondly, use oradebug to set the orapid and thereby attach to the background process:
  14.         SQL> oradebug setorapid 6
  15.         Unix process pid: 21955, image: oracle@zystudio2 (DIAG)
  16.  
  17.         SQL> oradebug close_trace
复制代码
确认trace目录有diag.trc文件后,再做一遍15028事件

    5.再asm/db上分别跑如下脚本,会分别生生asm/db的会话和文件打开情况,已被后续排查(可选)
  1. spool <ASM|db>_<#>_lock_session_process_details.html
  2. SET MARKUP HTML ON
  3. set echo on
  4. set pagesize 200
  5. alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
  6. select 'THIS REPORT WAS GENERATED AT: ==)> ' , sysdate " "  from dual;
  7. select 'HOSTNAME ASSOCIATED WITH THIS INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
  8. select * from gv$lock ;
  9. select * from v$session;
  10. select sw.sid, l.inst_id, l.block from gv$lock l, gv$session_wait sw where l.sid = sw.sid and l.inst_id = sw.inst_id;
  11. select * from v$process;
  12. select * from v$version;
  13. show parameter all
  14. spool off
  15. exit
复制代码
   6.查看15028事件生成的diag.trc文件,直接搜索有问题的文件,找到打开该文件的pid,ospid信息
  1. opennm: +ARCH/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
  2.                     openflags: 0x80000000
  3.                     fullnm: +arch/zystudio/archivelog/2021_02_24/thread_2_seq_252744.6807.1065318313
  4.                     redun: 0x11, fdflg: 0x0, blksiz: 512, fsiz: 906972
  5.                     ftype: 4, extsz: [4294967295,0,0]
  6.                     extents: 443, start: 60, count: 383, xp: 0x0
  7.                     base: 0, lxcnt: 0
  8.                     enq: 0x700000131107bb8, cod: 0x0
  9.                     unlock[0].au: 0, unlock[0].disk: 0, unlock[0].flags 0x0
  10.                     unlock[1].au: 0, unlock[1].disk: 0, unlock[1].flags 0x0
  11.                     unlock[2].au: 0, unlock[2].disk: 0, unlock[2].flags 0x0
  12.                     unlock[3].au: 0, unlock[3].disk: 0, unlock[3].flags 0x0
  13.                     unlock[4].au: 0, unlock[4].disk: 0, unlock[4].flags 0x0
  14.                     unlock[5].au: 0, unlock[5].disk: 0, unlock[5].flags 0x0
  15.                     strpsz: 1048576, strpwdth: 1
  16.                     lnk: 0x0, 0x0
  17.                     aba: 0, 0
  18.                     <font color="Red">client pid: 361 osid: 20972400</font>
复制代码
   7.通过v$process,v$session查询client pid: 361 osid: 20972400的session信息,或者从第五步生成的html文件里直接搜索,得到的信息如下
  1. select sid,serial#,osuser,username,PROCESS,program,LOGON_TIME,PREV_EXEC_START,event,status from v$session where program like '%rman%';
  2.                                        
  3.        SID    SERIAL# OSUSER          USERNAME    PROGRAM                        LOGON_TIME   PREV_EXEC_ST EVENT                                    STATUS
  4. ---------- ---------- --------------- ----------- ------------------------------ ------------ ------------ ---------------------------------------- ----
  5.       1634      20363 oracle          SYS         rman@zystudio2 (TNS V1-V3)        02-DEC-20    02-DEC-20    Backup: MML write backup piece           ACTIVE
  6.       2503      13129 oracle          SYS         rman@zystudio2 (TNS V1-V3)        02-DEC-20    02-DEC-20    Backup: MML write backup piece           ACTIVE
  7.       2840      42185 oracle          SYS         rman@zystudio2 (TNS V1-V3)        24-FEB-21    24-FEB-21    Backup: MML write backup piece           ACTIVE
  8.       4058      19839 oracle          SYS         rman@zystudio2 (TNS V1-V3)        24-FEB-21    24-FEB-21    Backup: MML write backup piece           KILLED
  9.       4490        385 oracle          SYS         rman@zystudio2 (TNS V1-V3)        29-NOV-20    29-NOV-20    Backup: MML write backup piece           ACTIVE
  10.       4514       1279 oracle          SYS         rman@zystudio2 (TNS V1-V3)        29-NOV-20    29-NOV-20    Backup: MML write backup piece           ACTIVE
复制代码
   8.原来是被rman会话占用,从日期可以看到这个会话已经很久了,确认杀掉没有问题
  1. SQL> alter system disconnect session '4058,19839' immediate;
  2. SQL> alter system disconnect session '2840,42185' immediate;
  3. SQL> alter system disconnect session '1634,20363' immediate;
  4. SQL> alter system disconnect session '2503,13129' immediate;
  5. SQL> alter system disconnect session '4490,385' immediate;
  6. SQL> alter system disconnect session '4514,1279' immediate;
复制代码
   9.再次调用删除归档的rman脚本,问题解决。

ora-15028 rman 删除归档文件故障处理_rman