首先我们来看下朋友传我的alert log:

Sat May 24 18:29:27 2014       


        SMON: enabling tx recovery       


        Sat May 24 18:29:27 2014       


        Database Characterset is ZHS16GBK       


        Sat May 24 18:29:28 2014       


        FAST_START_MTTR_TARGET 300 is out of the valid MTTR range, use 540 instead.       


        Sat May 24 18:31:36 2014       


        Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc:       


        ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], []       





        ......省略部分内容       


        Sat May 24 18:55:36 2014       


        Errors in file e:\oracle\admin\oracle\udump\oracle_ora_404.trc:       


        ORA-00600: 内部错误代码,参数: [12700], [18], [4246724], [2], [], [], [], []       





        ORACLE Instance oracle (pid = 6) - Error 600 encountered while recovering transaction (10, 38) on object 36.       


        Sat May 24 18:58:13 2014       


        Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc:       


        ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []       





        Sat May 24 19:09:55 2014       


        Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc:       


        ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], []       





        ORACLE Instance oracle (pid = 6) - Error 607 encountered while recovering transaction (10, 37) on object 3.       


        Sat May 24 19:17:50 2014       


        Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc:       


        ORA-00607: Internal error occurred while making a change to a data block       


        ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], []       





        Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0       


                Mem# 0 errs 0: E:\ORACLE\ORADATA\ORACLE\REDO03.LOG       


        Sat May 24 19:18:59 2014       


        Errors in file e:\oracle\admin\oracle\udump\oracle_ora_4072.trc:       


        ORA-00600: 内部错误代码,参数: [12700], [18], [4246724], [2], [], [], [], []       





        Sat May 24 19:29:35 2014       


        Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc:       


        ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []       





        Sat May 24 19:38:19 2014       


        Errors in file e:\oracle\admin\oracle\udump\oracle_ora_2296.trc:       


        ORA-00600: 内部错误代码,参数: [12700], [18], [4246724], [2], [], [], [], []


可以看到虽然该数据库能正常open,但是不停的报ora-00600错误,这里主要涉及到3个internal错误,分别为如下:

ORA-00600: internal error code, arguments: [kcoapl_blkchk]
ORA-00600: internal error code, arguments: [6006]
ORA-00600: 内部错误代码,参数: [12700]

我们先来看第一个错误:[kcoapl_blkchk],从该错误来看,可以判断是block存在问题,而又异常的block为:file 1 block 51720

我们可以看到open后smon进程在进行事务rollback的时候,回滚(10, 37) 事务失败了,主要是操作对象object 3.

这里可以简单的解释一下:(10,37) 中,标示回滚段编号,37标示slot编号。

SQL> select * from v        $version        ;       





        BANNER       


        ----------------------------------------------------------------       


        Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production       


        PL/SQL Release 9.2.0.8.0 - Production       


        CORE    9.2.0.8.0       Production       


        TNS         for         Linux: Version 9.2.0.8.0 - Production       


        NLSRTL Version 9.2.0.8.0 - Production       





        SQL> select owner,object_name,object_type from dba_objects where object_id=3;       





        OWNER                          OBJECT_NAME               OBJECT_TYPE       


        ------------------------------ ------------------------- ------------------       


        SYS                            I_OBJ#                    INDEX


我们可以确认,在Oracle 9i的环境中,object id=3 是i_obj#这个index。

我们从trace中搜索下,看下这个51720 的block有问题是什么问题 ?

buffer tsn: 0 rdba: 0x0040ca08 (1/51720)       


        scn: 0x0000.0a2c991d seq: 0x01 flg: 0x06 tail: 0x991d0601       


        frmt: 0x02 chkval: 0x221e type: 0x06=trans data       


        Hex dump of corrupt header 3 = CHKVAL       


        。。。。。       


        6BF8DFF0 00000000 00000000 00000000 991D0601  [................]       


        Block Checking: DBA = 4246024, Block Type = KTB-managed data block       


        **** actual rows locked by itl 2  = 1 != # in trans. header = 0       


        ---- end index block validation       


        rechecking block failed with error code 6401       





        Disk Block image:       


        buffer tsn: 0 rdba: 0x0040ca08 (1/51720)       


        scn: 0x0000.0a2c991d seq: 0x01 flg: 0x06 tail: 0x991d0601       


        frmt: 0x02 chkval: 0x221e type: 0x06=trans data       


        Block header dump:  0x0040ca08       


                Object id on Block? Y       


                seg/obj: 0x3  csc: 0x00.a2c991b  itc: 3  flg: O  typ: 2 - INDEX       


                fsl: 0  fnx: 0x0 ver: 0x01       





        Itl           Xid                  Uba         Flag  Lck        Scn/Fsc       


        0x01   0x0002.002.00020b74  0x0080080c.0f56.01  CB--    0  scn 0x0000.09cdb7c4       


        0x02   0x0009.007.00021f49  0x00800093.10c7.21  --U-    1  fsc 0x0000.0a2c991d       


        0x03   0x0009.023.00021f3d  0x00800093.10c7.1f  --U-    1  fsc 0x0012.0a2c991c       





        Leaf block dump       


        ===============       


        header address 104080500=0x6342474       


        kdxcolev 0       


        KDXCOLEV Flags = - - -       


        kdxcolok 0       


        kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y       


        kdxconco 1       


        kdxcosdc 0       


        kdxconro 160       


        kdxcofbo 356=0x164       


        kdxcofeo 3406=0xd4e       


        kdxcoavs 5096       


        kdxlespl 0       


        kdxlende 1       


        kdxlenxt 0=0x0       


        kdxleprv 4246023=0x40ca07       


        kdxledsz 8       


        kdxlebksz 8012


从上面的错误可以看出,实际上这个Index block中有2个需要操作的ITL,其LCK都为1. 而Oracle认为这个block
这里需要操作的ITL应该只有1个。 这应该就是掉电导致没写入更改的原因。

换句话,要处理这个坏块,我们只需要将第3个ITL的信息改掉就行了。 通过bbed可以很容易的处理,修改状态,LCK等信息即可。

下面我继续看第2个ora-00600 [6006]错误。 我以前讲过多次,对于Oracle ora-00600错误,后面第一个错误号的范围
是4000~8000,其都跟Oracle事务有关系。 这里也不例外。 从这里的错误来看,Oracle smon进程在恢复事务(10, 38) on object 36
时出现异常,进而抛出该错误。

1*         select         owner,object_name,object_type         from         dba_objects         where         object_id=36       


        SQL> /       





        OWNER                          OBJECT_NAME               OBJECT_TYPE       


        ------------------------------ ------------------------- ------------------       


        SYS                            I_OBJ1                            INDEX       





        SQL>


可以看到,9i中,object id=36 也是一个Index。 从这里的alert log来看,无法得到导致ora-00600 6006错误是什么block导致。

我们这里搜索smon trace即可,怎么搜索呢,很简单,搜索  seg/obj: 0×24即可,我们可以搜到如下内容:

----- end of leaf block dump -----       


        [1] Error during leaf key undo operation: 3 count=3       


        KTB Redo       


        op: 0x04  ver: 0x01       


        op: L  itl: xid:  0x000a.02b.00021f77 uba: 0x008000c9.1036.4d       


                flg: C---    lkc:  0     scn: 0x0000.0a2c96d4       


        Dump kdilk : itl=2, kdxlkflg=0x1 sdc=77869776 indexid=0x4000d9 block=0x0040cf0f       


        purge leaf row       


        (6):  05 c4 02 0d 15 16       


        dump block being looked at now       


        Block header dump:  0x0040cf0f       


                Object id on Block? Y       


                seg/obj: 0x24  csc: 0x00.a2ca278  itc: 3  flg: O  typ: 2 - INDEX       


                fsl: 0  fnx: 0x40cf10 ver: 0x01       





        Itl           Xid                  Uba         Flag  Lck        Scn/Fsc       


        0x01   0x000a.00c.0001fe24  0x00800272.0eee.01  CB--    0  scn 0x0000.098b9ed9       


        0x02   0x000a.00d.00021f8b  0x008000ce.1036.37  C---    0  scn 0x0000.0a2c9bfa       


        0x03   0x0004.01a.00021f16  0x00800964.104c.3c  C---    0  scn 0x0000.0a2c9c22       





        Leaf block dump       


        ===============       


        header address 1804271732=0x6b8b0074       


        kdxcolev 0       


        KDXCOLEV Flags = - - -       


        kdxcolok 0       


        kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y       


        kdxconco 1       


        kdxcosdc 0       


        kdxconro 223       


        kdxcofbo 482=0x1e2       


        kdxcofeo 3017=0xbc9       


        kdxcoavs 4411       


        kdxlespl 0       


        kdxlende 0       


        kdxlenxt 0=0x0       


        kdxleprv 4247310=0x40cf0e       


        kdxledsz 6       


        kdxlebksz 8012


根据这里的uba信息,我们还能搜索到回滚段的信息,如下;

index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num       


                ------------------------------------------------------------------------------------------------       


                0x00 9 0x00 0x21f89 0x0007 0x0000.0a2c9502 0x0080009c 0x0000.000.00000000 0x00000003 0x00000000       


                0x01    9    0x00  0x21f84  0x001e  0x0000.0a2c9542  0x0080009c  0x0000.000.00000000  0x00000001   0x00000000       


                。。。。。       


                0x24    9    0x00  0x21f82  0x0004  0x0000.0a2c949e  0x0080009a  0x0000.000.00000000  0x00000001   0x00000000       


                0x25   10    0x10  0x21f84  0x0001  0x0000.0a2c9771  0x008000ca  0x0000.000.00000000  0x00000001   0x00000000       


                0x26   10    0x90  0x21f84  0x0001  0x0000.0a2c976c  0x008000cc  0x0000.000.00000000  0x00000001   0x00000000       


                0x27    9    0x00  0x21f80  0xffff  0x0000.0a2dc56f  0x00000000  0x0000.000.00000000  0x00000000   0x00000000       


                0x28    9    0x00  0x21f87  0x0019  0x0000.0a2c976d  0x008000ca  0x0000.000.00000000  0x00000001   0x00000000       


                0x29    9    0x00  0x21f7d  0x000b  0x0000.0a2c976f  0x008000ca  0x0000.000.00000000  0x00000001   0x00000000       


                0x2a    9    0x00  0x21f7f  0x000f  0x0000.0a2c95c5  0x0080009c  0x0000.000.00000000  0x00000001   0x00000000       


                0x2b    9    0x00  0x21f77  0x0028  0x0000.0a2c96d4  0x008000cb  0x0000.000.00000000  0x00000002   0x00000000       


                0x2c    9    0x00  0x21f87  0x0018  0x0000.0a2c96a0  0x0080009e  0x0000.000.00000000  0x00000001   0x00000000       


                0x2d    9    0x00  0x21f74  0x0016  0x0000.0a2c9652  0x0080009e  0x0000.000.00000000  0x00000001   0x00000000       


                0x2e    9    0x00  0x21f7a  0x002a  0x0000.0a2c95b6  0x0080009c  0x0000.000.00000000  0x00000001   0x00000000       


                0x2f    9    0x00  0x21f85  0x001b  0x0000.0a2c9574  0x0080009c  0x0000.000.00000000  0x00000001   0x00000000       





        The buffer with tsn: 1 rdba: 0x00800099 (2/153) has already been dumped       





        The buffer with tsn: 1 rdba: 0x00800099 (2/153) has already been dumped       





        BH (0x6BBED3B8) file#: 2 rdba: 0x008000cc (2/204) class 36 ba: 0x6B8E4000       


                set: 3 dbwrid: 0 obj: -1 objn: 0       


                hash: [6c3ee6d0,6bbef05c] lru: [6b7ec6c8,6bbed578]       


                ckptq: [NULL] fileq: [NULL]       


                st: XCURRENT md: NULL rsop: 0x00000000 tch: 1       


                flags: gotten_in_current_mode       


                LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0]       


                buffer tsn: 1 rdba: 0x008000cc (2/204)       


                scn: 0x0000.0a2c9856 seq: 0x01 flg: 0x04 tail: 0x98560201       


                frmt: 0x02 chkval: 0xb53d type: 0x02=KTU UNDO BLOCK       





        ********************************************************************************       


        UNDO BLK:       


        xid: 0x000a.026.00021f84  seq: 0x1036 cnt: 0x14  irb: 0x13  icl: 0x0   flg: 0x0000       





        Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset       


        ---------------------------------------------------------------------------       


        0x01 0x1f74     0x02 0x1ef8     0x03 0x1ea4     0x04 0x1e30     0x05 0x1db4       


        0x06 0x1d60     0x07 0x1cec     0x08 0x1c70     0x09 0x1c1c     0x0a 0x1ba8       


        0x0b 0x1b2c     0x0c 0x1ad8     0x0d 0x1a64     0x0e 0x19e8     0x0f 0x1994       


        0x10 0x1938     0x11 0x18ec     0x12 0x1884     0x13 0x1828     0x14 0x17bc       


        。。。。。。       


        。。。。。。       


        *-----------------------------       


        * Rec #0x12  slt: 0x26  objn: 18(0x00000012)  objd: 18  tblspc: 0(0x00000000)       


        *       Layer:  11 (Row)   opc: 1   rci 0x00       


        Undo type:  Regular undo    Begin trans    Last buffer split:  No       


        Temp Object:  No       


        Tablespace Undo:  No       


        rdba: 0x00000000       


        *-----------------------------       


        uba: 0x008000cc.1036.0d ctl max scn: 0x0000.0a2c930b prv tx scn: 0x0000.0a2c931e       


        KDO undo record:       


        KTB Redo       


        op: 0x04  ver: 0x01       


        op: L  itl: xid:  0x000a.02b.00021f77 uba: 0x008000c9.1036.4c       


                flg: C---    lkc:  0     scn: 0x0000.0a2c96d4       


        KDO Op code: DRP row dependencies Disabled       


                xtype: XA  bdba: 0x0040ccc4  hdba: 0x00400079       


        itli: 2  ispac: 0  maxfr: 4863       


        tabn: 0 slot: 41(0x29)       


        *-----------------------------       


        * Rec #0x13  slt: 0x26  objn: 36(0x00000024)  objd: 36  tblspc: 0(0x00000000)       


        *       Layer:  10 (Index)   opc: 22   rci 0x12       


        Undo type:  Regular undo   Last buffer split:  No       


        Temp Object:  No       


        Tablespace Undo:  No       


        rdba: 0x00000000       


        *-----------------------------       


        index undo for leaf key operations       


        KTB Redo       


        op: 0x04  ver: 0x01       


        op: L  itl: xid:  0x000a.02b.00021f77 uba: 0x008000c9.1036.4d       


                flg: C---    lkc:  0     scn: 0x0000.0a2c96d4       


        Dump kdilk : itl=2, kdxlkflg=0x1 sdc=77869776 indexid=0x4000d9 block=0x0040cf0f       


        purge leaf row       


        key <img class="wp-smiley" alt=":(" src="http://www.killdb.com/wp-includes/images/smilies/icon_sad.gif"> 6):  05 c4 02 0d 15 16       


        。。。。


从上面的信息回滚段头的dump内容可以看出,该回滚段涉及2个活动事务,事务槽编号为0×25,0×26 即:37,38。 这和前面的错误是符合的.

我们可以清楚的看到,这里的0×26的这个事务涉及的blockdump来看,LCK都是0,看上去没啥异常,为什么这个事务会有异常呢 ?

在Oracle中,smon 进行回滚操作,是以事务为单位进行的。对于undo而言,涉及到一个undo chain的结构。

关于Oracle undo chain,在我的Oracle特殊恢复课程里面讲过。

我们可以看出,0×26这个事务应该从0×13 这个record开始回滚,到0×12这个record这里就结束。从信息来看似乎也没有什么不对的地方?

那为什么这个事务会rollback失败呢 ?
大家注意看undo record 0×13,0×12的XID 信息:xid:  0x000a.02b.00021f77

关于XID的结构,在我的Oracle特殊恢复课程里面也有讲解。第2部分其实表是ktuxe结构中的index编号。第3部分标示ktuxe中的wrap#。

从这里看来,是Oracle没来得及更改block中的信息。因此这里我怀疑是undo有点问题,正常情况下,undo block中的xid的信息这里

应该会更改为0x21f84,同时ktuxe中的事务状态信息会更改,cflags的值也会更改为0×00. 这样才标示一个事务commit结束。
最后我们来看下ORA-00600: 内部错误代码,参数: [12700]这个错误。管哟12700错误,Oracle mos有一篇文档:

ERROR:       


                ORA-600 [12700] [a] [b] [c]       





        VERSIONS:       


                versions 6.0 to 9.2       





        DESCRIPTION:         





        Oracle is trying to access a row using its ROWID, which has been       


        obtained from an index.       





        in Oracle 8.x and 9.x, it is ORA-600 [12700][a][b][c] , where       


        Arg [a] dataobj# from sys.obj$       


                Arg [b] relative dba of the data block       


                Arg [c] slot number of the row in the data block


根据上面的解释来看下这个错误:
ORA-00600: 内部错误代码,参数: [12700], [18], [4246724], [2], [], [], [], []

我们可以看出,问题出在obj#=18 这个对象上的block(dba地址4246724)上的第2个ITL。

可以通过dbms包可以将该dba地址进行转换,我们来看下是什么block :

SQL>           select         dbms_utility.data_block_address_file(4246724) file_id,       


                2  dbms_utility.data_block_address_block(4246724) block_id         from         dual;       





        FILE_ID   BLOCK_ID       


        ---------- ----------       


                1      52420


我们搜索下trace,来看下该block的dump信息:

BH (0x6B7EAD88) file#: 1 rdba: 0x0040ccc4 (1/52420) class 1 ba: 0x6B47C000       


                set: 3 dbwrid: 0 obj: 18 objn: 18       


                hash: [67e8aa14,6bfee6d0] lru: [6b7edcd0,6b7ee36c]       


                LRU flags: hot_buffer       


                ckptq: [NULL] fileq: [NULL]       


                use: [67e3d1d0,67e3d1d0] wait: [NULL]       


                st: XCURRENT md: SHR rsop: 0x00000000 tch: 6       


                LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0]       


                buffer tsn: 0 rdba: 0x0040ccc4 (1/52420)       


                scn: 0x0000.0a2ca3e0 seq: 0x01 flg: 0x04 tail: 0xa3e00601       


                frmt: 0x02 chkval: 0x2d3b type: 0x06=trans data       


        Block header dump:  0x0040ccc4       


                Object id on Block? Y       


                seg/obj: 0x12  csc: 0x00.a2c9c1d  itc: 2  flg: O  typ: 1 - DATA       


                fsl: 0  fnx: 0x40ccc2 ver: 0x01       





        Itl           Xid                  Uba         Flag  Lck        Scn/Fsc       


        0x01   0x0002.011.00021f3d  0x00800ace.1020.0a  C---    0  scn 0x0000.0a2c9c13       


        0x02   0x0004.01a.00021f16  0x00800964.104c.3b  --U-    2  fsc 0x0096.0a2c9c22       





        data_block_dump,data header at 0x6b47c05c       


        ===============       


        tsiz: 0x1fa0       


        hsiz: 0x96       


        pbl: 0x6b47c05c       


        bdba: 0x0040ccc4       


                76543210       


        flag=--------       


        ntab=1       


        nrow=66       


        frre=0       


        fsbo=0x96       


        fseo=0x6a2       


        avsp=0x1455       


        tosp=0x14ef       


        0xe:pti[0] nrow=66 offs=0       


        。。。。。


从dump来看,该block的第2个ITL 存在事务操作,锁定了2行记录. 这跟alert log的抛出的错误是符合的。

针对该错误,是Oracle读取时发现index和table的数据不一致导致的。针对表obj$,我们可以来看下报错的sql的执行计划:

Plan Table       


        --------       


        -------------------------------------------------------------------------------------------------------------------------       


        | Operation                      | Name               | Rows  | Bytes | Cost  |  TQ  |IN-OUT| PQ Distrib |Pstart| Pstop |       


        -------------------------------------------------------------------------------------------------------------------------       


        | SELECT STATEMENT               |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |  SORT ORDER BY                 |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |   FILTER                       |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |    NESTED LOOPS                |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |     NESTED LOOPS OUTER         |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |      NESTED LOOPS              |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |       NESTED LOOPS             |                    |     0 |     0 |     0 |      |      |           |       |       |       


        |        TABLE ACCESS FULL       | USER$              |     0 |     0 |     0 |      |      |           |       |       |       


        |        TABLE ACCESS BY INDEX R | OBJ$               |     0 |     0 |     0 |      |      |           |       |       |       


        |         INDEX RANGE SCAN       | I_OBJ2             |     0 |     0 |     0 |      |      |           |       |       |       


        |       TABLE ACCESS BY INDEX RO | SYN$               |     0 |     0 |     0 |      |      |           |       |       |       


        |        INDEX UNIQUE SCAN       | I_SYN1             |     0 |     0 |     0 |      |      |           |       |       |       


        |      TABLE ACCESS BY INDEX ROW | USER$              |     0 |     0 |     0 |      |      |           |       |       |       


        |       INDEX UNIQUE SCAN        | I_USER1            |     0 |     0 |     0 |      |      |           |       |       |       


        |     TABLE ACCESS CLUSTER       | USER$              |     0 |     0 |     0 |      |      |           |       |       |       


        |      INDEX UNIQUE SCAN         | I_USER#            |     0 |     0 |     0 |      |      |           |       |       |       


        |    TABLE ACCESS BY INDEX ROWID | OBJ$               |     0 |     0 |     0 |      |      |           |       |       |       


        |     INDEX RANGE SCAN           | I_OBJ2             |     0 |     0 |     0 |      |      |           |       |       |       


        -------------------------------------------------------------------------------------------------------------------------


很明显可以看出,这里对于obj$表的访问使用了2个Index,i_obj2. 看来问题就出在该对象之上。

这里其实可以用过如下命令来判断具体是什么行的信息不匹配导致:

analyze table obj$ validate structure cascade ;


定位到问题hang之后,由于这是bootstrap$的对象,因此无法通过在数据库open的时候进行rebuild 来进行解决。

可以通过如下2种方式来解决该问题:

1)   bbed modify index block
2) 通过bbed 将i_obj2这个index drop掉。