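The customer environment is a 64-bit Oracle 10.2.0.1 database. At startup the database reports ORA-00600: [kcratr1_lastbwr] and cannot be opened. The fix given by MOS (My Oracle Support) is as follows: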

APPLIES TO:

Oracle Server - Enterprise Edition - Version 10.2.0.1 to 11.1.0.6 [Release 10.2 to 11.1]
Information in this document applies to any platform.
***Checked for relevance on 13-Sep-2010***


SYMPTOMS

After a disk failure that caused the database to crash, the instance fails to start up with ORA-00600: arguments: [kcratr1_lastbwr].
The alert log file shows the following entries :
Completed: ALTER DATABASE MOUNT
Tue Sep 19 09:43:03 2006
ALTER DATABASE OPEN
Block change tracking file is current.
Tue Sep 19 09:43:04 2006
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Tue Sep 19 09:43:04 2006
Started redo scan
Tue Sep 19 09:43:05 2006
Errors in file gns80_ora_9936.trc:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],[], [], []
Tue Sep 19 09:43:06 2006
Aborting crash recovery due to error 600
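When an alert log is long, it can help to pull out just the first argument of each ORA-00600 line, since that argument (here kcratr1_lastbwr) is what identifies the failure. The following is a minimal Python sketch, not part of the MOS note; the function name ora600_first_args and the sample text are illustrative assumptions.

```python
import re

# Captures everything after "arguments:" on an ORA-00600 line.
ORA600 = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def ora600_first_args(alert_text):
    """Return the first bracketed argument of every ORA-00600 line, in order."""
    found = []
    for line in alert_text.splitlines():
        match = ORA600.search(line)
        if match:
            # Each argument is printed inside square brackets; some are empty.
            args = re.findall(r"\[([^\]]*)\]", match.group(1))
            if args:
                found.append(args[0])
    return found


# Sample lines copied from the alert log excerpt above.
sample = """\
Started redo scan
Errors in file gns80_ora_9936.trc:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],[], [], []
Aborting crash recovery due to error 600
"""

print(ora600_first_args(sample))  # ['kcratr1_lastbwr']
```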

CHANGES

There was a disk problem that caused the database to crash.

CAUSE

Oracle is unable to perform instance recover but it works when is invoked manually.

SOLUTION

Mount the database and issue a recover statement

SQL> startup mount;

SQL> recover database;

SQL> alter database open;
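For anyone who wants to script these three steps instead of typing them, here is a rough Python sketch that feeds them to SQL*Plus. It assumes sqlplus is on the PATH, that ORACLE_HOME and ORACLE_SID are already set, and that OS authentication ("/ as sysdba") works on the host; the helper names build_recovery_script and run_recovery are made up for illustration. Nothing is executed against the database unless run_recovery() is called explicitly.

```python
import subprocess

# The three statements from the MOS note, in order.
RECOVERY_STEPS = ["startup mount;", "recover database;", "alter database open;"]


def build_recovery_script(steps=RECOVERY_STEPS):
    """Join the steps into one SQL*Plus script that exits when done."""
    return "\n".join(list(steps) + ["exit;"]) + "\n"


def run_recovery(sqlplus="sqlplus"):
    """Feed the script to SQL*Plus as SYSDBA; return (exit code, output).

    Assumption: OS authentication is configured for the current user.
    """
    result = subprocess.run(
        [sqlplus, "-S", "/", "as", "sysdba"],
        input=build_recovery_script(),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Only print the script; call run_recovery() deliberately, on the right host.
    print(build_recovery_script(), end="")
```

Read the returned output for ORA- errors rather than relying on the exit code alone, because a failed recover does not necessarily change it.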

The MOS method did not recover the database. The problem was finally solved with the method provided at http://www.ixdba.net/article/f9/277.html. The reposted content follows:


Case 1:


The database server lost power. After the restart, the database was found in the following state:

SQL> select open_mode from v$database;
OPEN_MODE
----------
MOUNTED

That was odd, so the database was shut down and started again, which produced this error:

SQL> startup
ORACLE instance started.
Total System Global Area 1375731712 bytes
Fixed Size 1260780 bytes
Variable Size 603980564 bytes
Database Buffers 754974720 bytes
Redo Buffers 15515648 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []

The alert log shows:

Errors in file /opt/oracle/admin/billdb/udump/billdb_ora_7186.trc:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

Opening the trace file mentioned above:

*** SERVICE NAME:() 2007-10-11 10:08:49.493
*** SESSION ID:(989.3) 2007-10-11 10:08:49.493
Successfully allocated 2 recovery slaves
Using 543 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 67179, block 2, scn 2132065607
cache-low rba: logseq 67179, block 5453
on-disk rba: logseq 67179, block 5828, scn 2132066974
Starting CRASH recovery for thread 1 sequence 67180 block 1
Thread 1 current log 5
Scanning log 5 thread 1 sequence 67179
Scanning log 6 thread 1 sequence 67178
Cannot find online redo log for thread 1 sequence 67180
start recovery at logseq 67179, block 5453, scn 0
----- Redo read statistics for thread 1 -----
Read rate (ASYNC): 252Kb in 0.03s => 8.22 Mb/sec
Total physical reads: 252Kb
Longest record: 9Kb, moves: 0/341 (0%)
Change moves: 1/24 (4%), moved: 0Mb
Longest LWN: 35Kb, moves: 0/110 (0%), moved: 0Mb
Last redo scn: 0x0000.7f14c2af (2132066991)
----------------------------------------------
******** WRITE VERIFICATION FAILED ********
File 15 Block 89 (rdba 0x3c00059)
BWR version: 0x0000.7f14c25e.01 flg: 0x04
Disk version: 0x0000.7f14c12d.01 flag: 0x04
*** 2007-10-11 10:08:49.541
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE OPEN
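The SCNs in this trace are printed as hexadecimal wrap.base pairs, which makes the two versions under WRITE VERIFICATION FAILED hard to compare by eye. The Python sketch below converts them to integers using the usual wrap * 2^32 + base rule and compares the BWR version with the disk version. The function names are illustrative, and reading BWR as the "block written record" in the redo stream is my assumption rather than something the trace states.

```python
import re

# Matches the "BWR version:" and "Disk version:" lines of the trace.
VERSION = re.compile(
    r"^(BWR|Disk) version:\s+(0x[0-9a-fA-F]+\.[0-9a-fA-F]+)", re.MULTILINE
)


def scn_to_int(text):
    """Convert an SCN printed as 0xWRAP.BASE[.SEQ] into one integer."""
    parts = text.split(".")
    wrap = int(parts[0], 16)  # int() accepts the 0x prefix when base is 16
    base = int(parts[1], 16)
    return (wrap << 32) + base


def compare_versions(trace_text):
    """Return (bwr_scn, disk_scn) taken from a WRITE VERIFICATION block."""
    scns = {kind: scn_to_int(value) for kind, value in VERSION.findall(trace_text)}
    return scns["BWR"], scns["Disk"]


# Lines copied from the trace file above.
sample = """\
Last redo scn: 0x0000.7f14c2af (2132066991)
******** WRITE VERIFICATION FAILED ********
File 15 Block 89 (rdba 0x3c00059)
BWR version: 0x0000.7f14c25e.01 flg: 0x04
Disk version: 0x0000.7f14c12d.01 flag: 0x04
"""

bwr, disk = compare_versions(sample)
print(scn_to_int("0x0000.7f14c2af"))  # 2132066991, matching the trace
print(bwr, disk, bwr - disk)          # 2132066910 2132066605 305
```

In this trace the on-disk copy of file 15, block 89 is 305 SCNs behind the version recorded in the redo, which fits a write that was lost when the power went out.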

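The trace says the online redo log for sequence 67180 cannot be found, yet the log files are clearly present in the log directory. A search on Metalink turned up the following: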


Changes

There was a disk problem that caused the database to crash.

Cause

Oracle is unable to perform instance recover but it works when is invoked manually.

Solution

Mount the database and issue a recover statement

Then run the recovery:

SQL> startup mount;
SQL> recover database;
SQL> alter database open;

Following these steps exactly, the manual recovery solved the problem.


The following is the official explanation:

Unbreakable Oracle 10g Release 2 : What if you have ORA-600 kcratr1_lastbwr ?

This is an interesting story that happened yesterday at one of our customer sites. An engineer powered off the wrong rack of equipment containing a Sun Fire X4600 running Oracle 10g Release 2. Almost no transactions were being performed at the time, so when the system came up the customer expected the database to be up and running very quickly.


In reality this is what happened :


Completed: ALTER DATABASE MOUNT

Tue Nov 7 11:19:42 2006

ALTER DATABASE OPEN

Tue Nov 7 11:19:42 2006

Beginning crash recovery of 1 threads

parallel recovery started with 16 processes

Tue Nov 7 11:19:44 2006

Started redo scan

Tue Nov 7 11:19:44 2006

Errors in file /xxx/oracle/oracle/product/10.2.0/db_1/admin/xxx/udump/xxx_ora_947.trc:

ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []

Tue Nov 7 11:19:44 2006

Aborting crash recovery due to error 600

Tue Nov 7 11:19:44 2006

Errors in file /xxx/oracle/oracle/product/10.2.0/db_1/admin/xxxtest/udump/xxxtest_ora_947.trc:

ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []

ORA-600 signalled during: ALTER DATABASE OPEN...



Not too pretty ! Checking the ASM configuration and the IO subsystem showed nothing wrong. So what to do if you do not have a backup handy ?


Well, here is the idea .... what would we do if we had a backup that was inconsistent ?

The recover database command will start an Oracle process which will roll forward all transactions stored in the restored archived logs necessary to make the database consistent again. The recovery process must run up to a point that corresponds with the time just before the error occurred after which the log sequence must be reset to prevent any further system changes from being applied to the database.

So we tried :


startup mount


ALTER DATABASE MOUNT

Tue Nov 7 11:54:03 2006

Starting background process ASMB

ASMB started with pid=61, OS id=1070

Starting background process RBAL

RBAL started with pid=67, OS id=1074

Tue Nov 7 11:54:13 2006

SUCCESS: diskgroup xxxTESTDATA was mounted

Tue Nov 7 11:54:17 2006

Setting recovery target incarnation to 2

Tue Nov 7 11:54:17 2006

Successful mount of redo thread 1, with mount id 2364224219

Tue Nov 7 11:54:17 2006

Database mounted in Exclusive Mode

Completed: ALTER DATABASE MOUNT

Tue Nov 7 11:54:32 2006



recover database


ALTER DATABASE RECOVER database

Tue Nov 7 11:54:32 2006

Media Recovery Start

parallel recovery started with 16 processes

Tue Nov 7 11:54:33 2006

Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0

Mem# 0 errs 0: +xxxTESTDATA/xxxtest/onlinelog/group_3.263.605819131

Tue Nov 7 11:59:25 2006

Media Recovery Complete (xxxtest)

Tue Nov 7 11:59:27 2006

Completed: ALTER DATABASE RECOVER database



alter database open


Tue Nov 7 12:03:01 2006

alter database open

Tue Nov 7 12:03:01 2006

Beginning crash recovery of 1 threads

parallel recovery started with 16 processes

Tue Nov 7 12:03:01 2006

Started redo scan

Tue Nov 7 12:03:01 2006

Completed redo scan

273 redo blocks read, 0 data blocks need recovery

Tue Nov 7 12:03:01 2006

Started redo application at

Thread 1: logseq 4, block 12858574

Tue Nov 7 12:03:01 2006

Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0

Mem# 0 errs 0: +xxxTESTDATA/xxxtest/onlinelog/group_3.263.605819131

Tue Nov 7 12:03:01 2006

Completed redo application

Tue Nov 7 12:03:01 2006

Completed crash recovery at

Thread 1: logseq 4, block 12858847, scn 824040

0 data blocks read, 0 data blocks written, 273 redo blocks read

Tue Nov 7 12:03:02 2006

Thread 1 advanced to log sequence 5

Thread 1 opened at log sequence 5

Current log# 1 seq# 5 mem# 0: +xxxTESTDATA/xxxtest/onlinelog/group_1.261.605819081

Successful open of redo thread 1

Tue Nov 7 12:03:02 2006

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Tue Nov 7 12:03:02 2006

SMON: enabling cache recovery

Tue Nov 7 12:03:03 2006

Successfully onlined Undo Tablespace 1.

Tue Nov 7 12:03:03 2006

SMON: enabling tx recovery

Tue Nov 7 12:03:03 2006

Database Characterset is UTF8

replication_dependency_tracking turned off (no async multimaster replication found)

Starting background process QMNC

QMNC started with pid=56, OS id=1128

Tue Nov 7 12:03:05 2006

Completed: alter database open


And we are up and running ! The real thing that Oracle should work on is the quality and clarity of their error messages.

At this point this is quite poor ...


Unbreakable database, maybe. Automatic (and simple) , not yet.



Case 2:


If the following appears while running recover:


SQL> recover database;

ORA-00283: recovery session canceled due to errors

ORA-12801: error signaled in parallel query server P002

ORA-10562: Error occurred while applying redo to data block (file# 1, block# 4568)

ORA-10564: tablespace SYSTEM

ORA-01110: data file 1: '/opt/oracle/oradata/orcl/system01.dbf'

ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 576

ORA-00600: internal error code, arguments: [6101], [


At this point the alert log shows the following:


Mon Nov 19 15:38:50 2007

ALTER DATABASE RECOVER database

Mon Nov 19 15:38:50 2007

Media Recovery Start

parallel recovery started with 3 processes

Mon Nov 19 15:38:50 2007

Recovery of Online Redo Log: Thread 1 Group 3 Seq 16 Reading mem 0

Mem# 0 errs 0: /opt/oracle/oradata/orcl/redo03.log

Mon Nov 19 15:38:50 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p002_7917.trc:

ORA-00600: internal error code, arguments: [6101], [0], [17], [0], [], [], [], []

Mon Nov 19 15:38:50 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p000_7913.trc:

ORA-00600: internal error code, arguments: [3020], [2], [882], [8389490], [], [], [], []

ORA-10567: Redo is inconsistent with data block (file# 2, block# 882)

ORA-10564: tablespace UNDOTBS1

ORA-01110: data file 2: '/opt/oracle/oradata/orcl/undotbs01.dbf'

ORA-10560: block type 'KTU UNDO BLOCK'

Mon Nov 19 15:38:51 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p000_7913.trc:

ORA-00600: internal error code, arguments: [3020], [2], [882], [8389490], [], [], [], []

ORA-10567: Redo is inconsistent with data block (file# 2, block# 882)

ORA-10564: tablespace UNDOTBS1

ORA-01110: data file 2: '/opt/oracle/oradata/orcl/undotbs01.dbf'

ORA-10560: block type 'KTU UNDO BLOCK'

Mon Nov 19 15:38:51 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p002_7917.trc:

ORA-10562: Error occurred while applying redo to data block (file# 1, block# 4568)

ORA-10564: tablespace SYSTEM

ORA-01110: data file 1: '/opt/oracle/oradata/orcl/system01.dbf'

ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object# 576

ORA-00600: internal error code, arguments: [6101], [0], [17], [0], [], [], [], []

Mon Nov 19 15:38:54 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p001_7915.trc:

ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [1658], [6101], [], [], [], []

Mon Nov 19 15:38:54 2007

Errors in file /opt/oracle/admin/orcl/bdump/orcl_p001_7915.trc:

ORA-10562: Error occurred while applying redo to data block (file# 1, block# 1658)

ORA-10564: tablespace SYSTEM

ORA-01110: data file 1: '/opt/oracle/oradata/orcl/system01.dbf'

ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 237

ORA-00607: Internal error occurred while making a change to a data block

ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [1658], [6101], [], [], [], []

Mon Nov 19 15:38:54 2007

Media Recovery failed with error 12801

ORA-283 signalled during: ALTER DATABASE RECOVER database ...
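Because the same errors repeat several times in this log, it is easy to lose track of which blocks are actually involved. As a rough aid, the Python sketch below collects the distinct (file#, block#) pairs named in ORA-10562 and ORA-10567 lines. The function name blocks_in_errors is an illustrative assumption, and the sample is a shortened copy of the log above.

```python
import re

# ORA-10562 and ORA-10567 both end with "(file# N, block# M)".
BLOCK = re.compile(r"ORA-105(?:62|67): .*?\(file# (\d+), block# (\d+)\)")


def blocks_in_errors(alert_text):
    """Return distinct (file#, block#) pairs in order of first appearance."""
    seen = []
    for file_no, block_no in BLOCK.findall(alert_text):
        pair = (int(file_no), int(block_no))
        if pair not in seen:
            seen.append(pair)
    return seen


# Shortened copy of the alert log excerpt above.
sample = """\
ORA-00600: internal error code, arguments: [3020], [2], [882], [8389490], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 882)
ORA-10567: Redo is inconsistent with data block (file# 2, block# 882)
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 4568)
ORA-10562: Error occurred while applying redo to data block (file# 1, block# 1658)
"""

print(blocks_in_errors(sample))  # [(2, 882), (1, 4568), (1, 1658)]
```

For this log the result is one undo block in data file 2 and two SYSTEM blocks in data file 1.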


One message stands out in the output above:

ORA-10562: Error occurred while applying redo to data block (file# 1, block# 1658)

The documented resolution for this error is as follows:


ORA-10562: Error occurred while applying redo to data block (file# string, block# string)

Cause: See other errors on error stack.

Action: Investigate why the error occurred and how important is the data block. Media and standby database recovery usually can continue if user allows recovery to corrupt this data block.

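From the log, the problem can be diagnosed roughly as follows:

The current online redo log is corrupt, which causes problems with the undo (rollback) segments. Because the system lost power suddenly, the SYSTEM tablespace needs instance recovery after the instance restarts. With the current online redo log corrupt, the SYSTEM tablespace cannot be recovered, hence the errors above.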



With the cause identified, the fix follows.


Because there is no backup and the database runs in NOARCHIVELOG mode, the recovery steps are as follows:


SQL> startup mount

SQL> recover database using backup controlfile until cancel;

SQL> alter database open resetlogs;


SQL> startup mount

SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;

SQL> shutdown immediate

SQL> startup mount

SQL> alter database open resetlogs;


SQL> startup



My basic recovery steps:

SQL> startup mount

SQL> recover database using backup controlfile until cancel;

Cancel

SQL> alter database open resetlogs;

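At this point Oracle reports that the SYSTEM tablespace needs recovery, but because the current log is corrupt the recovery cannot be performed. A hidden parameter must therefore be set so that Oracle skips the SCN consistency check and the database can be opened.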


Restart the database and set the hidden parameter:

SQL> startup mount

SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;

SQL> shutdown immediate

SQL> startup mount

SQL> alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

Ignore this error, log in to SQL*Plus again, and start the database:

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Nov 16 08:03:43 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options


SQL> startup


The database opens normally. Export the required data with exp, then recreate the database and import the data.