模拟服务器突然掉电,数据库丢失,事务无法完成回滚故障的恢复
数据库的状态如下:
SQL> select instance_name,status from v$instance;
INSTANCE_NAME    STATUS
---------------- ------------
marven           OPEN
非归档模式:
SQL> archive log list
Database log mode              No Archive Mode
Automatic archival             Disabled
Archive destination            /u01/app/oracle/product/10.2.0/db_1/dbs/arch
Oldest online log sequence     25
Current log sequence           27

SQL> set linesize 150 pagesize 300
SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1         25   52428800          1 NO  CURRENT                 384492 29-JAN-12
         2          1         23   52428800          1 NO  INACTIVE                371795 29-JAN-12
         3          1         24   52428800          1 NO  INACTIVE                377698 29-JAN-12

采用shutdown abort模拟突然掉电,数据库关闭的情形,此时数据库可以正常启动到MOUNT状态
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.

此时日志状态如下:
SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1         25   52428800          1 NO  ACTIVE                  384492 29-JAN-12
         3          1         27   52428800          1 NO  CURRENT                 391086 29-JAN-12
         2          1         26   52428800          1 NO  ACTIVE                  389733 29-JAN-12
删除所有的重做日志文件:
SQL>!rm /u01/app/oracle/oradata/marven/redo*.log
SQL> shutdown abort
ORACLE instance shut down.

SQL>startup nomount

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
SQL> alter database mount;

Database altered.

在打开数据库时发现无法正常打开
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/marven/redo01.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
试图通过resetlog打开数据库发现是行不通的:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery

spfile参数文件的目录如下:
SQL> show parameter spfile;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /u01/app/oracle/product/10.2.0
                                                /db_1/dbs/spfilemarven.ora
备份spfile文件:
SQL> create pfile from spfile;

File created.
添加如下参数:
SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;


System altered.

SQL> SQL> shutdown immediate
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
此时采用控制文件做不完全恢复:
SQL> recover database using backup controlfile until cancel;
ORA-00279: change 387606 generated at 01/29/2012 15:34:10 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/product/10.2.0/db_1/dbs/arch1_25_773853180.dbf
ORA-00280: change 387606 for thread 1 is in sequence #25


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/marven/system01.dbf'


ORA-01112: media recovery not started


SQL> recover database using backup controlfile until cancel;
ORA-00279: change 387606 generated at 01/29/2012 15:34:10 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/product/10.2.0/db_1/dbs/arch1_25_773853180.dbf
ORA-00280: change 387606 for thread 1 is in sequence #25


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00308: cannot open archived log '/u01/app/oracle/product/10.2.0/db_1/dbs/arch1_25_773853180.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3


ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u01/app/oracle/oradata/marven/system01.dbf'

此时采用resetlogs打开数据库出现如报错:
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

SQL> startup  
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
SQL> quit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

重新连接后启动即可:
[oracle@hpserver2 ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.4.0 - Production on Sun Jan 29 16:03:33 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.

数据库此时可以正常打开:
SQL> alter database open;

Database altered.

SQL> select status from v$instance;

STATUS
------------
OPEN

但是在alert中出现了如下大量告警,并且在正常关闭数据库时也会出现如下告警:
Completed: alter database open
Sun Jan 29 16:04:48 2012
SMON: slave died unexpectedly, downgrading to serial recovery
Sun Jan 29 16:04:51 2012
Errors in file /u01/app/oracle/admin/marven/bdump/marven_smon_27919.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
ORACLE Instance marven (pid = 8) - Error 600 encountered while recovering transaction (8, 44).
Sun Jan 29 16:04:51 2012
Errors in file /u01/app/oracle/admin/marven/bdump/marven_smon_27919.trc:
ORA-00600: internal error code, arguments: [4137], [], [], [], [], [], [], []
Sun Jan 29 16:04:52 2012
Errors in file /u01/app/oracle/admin/marven/bdump/marven_smon_27919.trc:
ORA-00600: internal error code, arguments: [4194], [6], [17], [], [], [], [], []
Doing block recovery for file 2 block 38906
Block recovery from logseq 2, block 83 to scn 407712
Sun Jan 29 16:04:52 2012
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/marven/redo02.log
Block recovery stopped at EOT rba 2.87.16
Block recovery completed at rba 2.87.16, scn 0.407711
Doing block recovery for file 2 block 57
Block recovery from logseq 2, block 83 to scn 407710
Sun Jan 29 16:04:52 2012
Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0
  Mem# 0: /u01/app/oracle/oradata/marven/redo02.log
Block recovery completed at rba 2.87.16, scn 0.407711
Sun Jan 29 16:04:52 2012
Errors in file /u01/app/oracle/admin/marven/bdump/marven_smon_27919.trc:
ORA-01595: error freeing extent (42) of rollback segment (4))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [6], [17], [], [], [], [], []
Sun Jan 29 16:09:53 2012
Errors in file /u01/app/oracle/admin/marven/bdump/marven_p003_28666.trc:
ORA-07445: exception encountered: core dump [kcbipnns()+28] [SIGSEGV] [Address not mapped to object] [0x7FA3000000C8] [] []
其中trace文件中大量如下信息:
SMON: mark undo segment 8 as available
SMON: mark undo segment 8 as available
SMON: mark undo segment 8 as available
SMON: mark undo segment 8 as available
SMON: mark undo segment 8 as available

另外数据库无法正常关闭:
SQL> shutdown immediate
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [6], [17], [], [], [], [],[]

SQL> startup
ORA-01081: cannot start already-running ORACLE - shut it down first
SQL> shutdown immediate
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [6], [17], [], [], [], [],[]

SQL> select status from v$instance;

STATUS
------------
OPEN

以上是由于采用了resetlogs不一致打开数据库造成,部分回滚段事务无法使用造成,由于oracle 10g 回滚段是采用自动管理,所以调整成手工管理:
SQL> alter system set undo_management=manual scope=spfile;

System altered.
以上可见8号回滚段存在问题:
SQL> alter system set "_corrupted_rollback_segments"='_SYSSMU8$' scope=spfile;

System altered.

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
Database opened.
SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1610612736 bytes
Fixed Size                  2084296 bytes
Variable Size             385876536 bytes
Database Buffers         1207959552 bytes
Redo Buffers               14692352 bytes
Database mounted.
Database opened.
SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

数据库正常。