On a 10.2.0.3 system running on AIX, the following records appeared in the alert log during an RMAN backup:
======================= alert log record ============================
Hex dump of (file 35, block 1087687) in trace file /oracle/product/10.2.0/admin/MS/udump/ms_ora_103548.trc
Corrupt block relative dba: 0x08d098c7 (file 35, block 1087687)
Fractured block found during backing up datafile
Data in bad block:
type: 6 format: 2 rdba: 0x08d098c7
last change scn: 0x0006.44443e06 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x6d910601
check value in block header: 0xc0b0
computed block checksum: 0x4286
Reread of blocknum=1087687, file=/dev/vx/rdsk/oradgMS/lv_ms_DB31. found valid data
=========== trace information for process 103548 ========================
Corrupt block relative dba: 0x08d098c7 (file 35, block 1087687)
Fractured block found during backing up datafile
Data in bad block:
type: 6 format: 2 rdba: 0x08d098c7
last change scn: 0x0006.44443e06 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x6d910601
check value in block header: 0xc0b0
computed block checksum: 0x4286
Reread of blocknum=1087687, file=/dev/vx/rdsk/oradgMS/lv_ms_DB31. found valid data
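The file and block numbers in the message can be cross-checked against the relative dba by hand. The sketch below assumes the usual smallfile-tablespace layout, in which the upper 10 bits of the 32-bit rdba hold the relative file number and the lower 22 bits hold the block number; the function name `decode_rdba` is only illustrative.

```python
from typing import Tuple


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit relative DBA into (relative file number, block number).

    Assumption: smallfile tablespace, 10 bits for the file, 22 bits for the block.
    """
    file_no = rdba >> 22            # upper 10 bits
    block_no = rdba & 0x3FFFFF      # lower 22 bits
    return file_no, block_no


# rdba taken from the alert log excerpt above
print(decode_rdba(0x08D098C7))      # (35, 1087687)
```

The result matches the "(file 35, block 1087687)" that Oracle prints next to the rdba.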
The block type here is 6. The table below lists the block types that already existed in Oracle 9.2:
Type | Description
1 | KTU UNDO HEADER
2 | KTU UNDO BLOCK
3 | KTT SAVE UNDO HEADER
4 | KTT SAVE UNDO BLOCK
5 | DATA SEGMENT HEADER
6 | trans data
7 | Unknown
8 | Unknown
9 | Unknown
10 | DATA SEGMENT FREE LIST BLOCK
11 | Unknown
12 | DATA SEGMENT HEADER WITH FREE LIST BLOCKS
13 | Compatibility segment
14 | KTU UNDO HEADER W/UNLIMITED EXTENTS
15 | KTT SAVE UNDO HEADER W/UNLIMITED EXTENTS
16 | DATA SEGMENT HEADER - UNLIMITED
17 | DATA SEGMENT HEADER WITH FREE LIST BLKS - UNLIMITED
18 | EXTENT MAP BLOCK
19 | Unknown
20 | Unknown
21 | Unknown
22 | DATA SEGMENT FREE LIST BLOCK WITH FREE BLOCK COUNT
23 | BITMAPPED DATA SEGMENT HEADER
24 | BITMAPPED DATA SEGMENT FREELIST
25 | BITMAP INDEX BLOCK
26 | BITMAP BLOCK
27 | LOB BLOCK
28 | KTU BITMAP UNDO HEADER - LIMITED EXTENTS
29 | KTFB Bitmapped File Space Header
30 | KTFB Bitmapped File Space Bitmap
31 | TEMP INDEX BLOCK
32 | FIRST LEVEL BITMAP BLOCK
33 | SECOND LEVEL BITMAP BLOCK
34 | THIRD LEVEL BITMAP BLOCK
35 | PAGETABLE SEGMENT HEADER
36 | PAGETABLE EXTENT MAP BLOCK
37 | EXTENT MAP BLOCK OF SYSTEM MANAGED UNDO SEGMENT
38 | KTU SMU HEADER BLOCK
39 | Unknown
40 | PAGETABLE MANAGED LOB BLOCK
41 | Unknown
42 | Unknown
43 | Unknown
44 | Unknown
45 | Unknown
46 | Unknown
47 | Unknown
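Both table blocks and index blocks can be type 6 (trans data). In other words, RMAN read this block during the backup and found it fractured on the first read, but a fractured block is not necessarily a corrupt one. The alert log shows that after the first read reported the fracture, RMAN read the block again and found it valid ("found valid data"). The messages above therefore do not mean that the listed block is corrupt. Most likely the datafile containing the block was under heavy I/O during the backup, and the storage was in the middle of writing the block when RMAN read it, so the first read saw a fractured block. When RMAN reread the block the fracture was gone, so the "Corrupt block" message was only a false alarm. To make sure that no bad block really exists, run a further analyze .. validate operation on the table or index that owns the block.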
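The fracture itself can be seen in the numbers from the alert log. The sketch below assumes the commonly described layout of the 4-byte block tail (low-order two bytes of the SCN base, then the block type, then the sequence number); that layout and the function names are assumptions for illustration, not something stated in the log.

```python
def expected_tail(scn_base: int, block_type: int, seq: int) -> int:
    """Build the tail value that would be consistent with the block header.

    Assumed layout: [low 16 bits of SCN base][block type][seq].
    """
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


def is_fractured(scn_base: int, block_type: int, seq: int, tail: int) -> bool:
    """A block looks fractured when its tail does not match its header."""
    return expected_tail(scn_base, block_type, seq) != tail


# Header values from the alert log: scn 0x0006.44443e06, type 6, seq 0x1
print(hex(expected_tail(0x44443E06, 6, 0x1)))            # 0x3e060601
# Tail actually found on the first read: 0x6d910601
print(is_fractured(0x44443E06, 6, 0x1, 0x6D910601))      # True
```

Under this assumption the header and the tail carry different SCN bytes (0x3e06 versus 0x6d91), which is what one would expect if the read caught the block halfway through a write.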
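This false "Corrupt block" report was most likely caused by very active I/O on individual datafiles during the RMAN backup (for example, frequent DML), so it is advisable to run RMAN backups during periods of low disk activity.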
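When an alert log contains many such entries, it helps to separate the blocks whose reread succeeded from those that still need attention. The following is a minimal sketch, assuming each "Reread of blocknum" message sits on a single line and ends with either "found valid data" or "found same corrupt data" as in the excerpts quoted in this post; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption: one reread message per line, in the format shown in the alert log.
REREAD_RE = re.compile(
    r"Reread of blocknum=(\d+), file=(\S+) found (valid data|same corrupt data)"
)

BlockKey = Tuple[str, int]  # (datafile path, block number)


def reread_outcomes(log_text: str) -> Dict[BlockKey, str]:
    """Map each (datafile, block) to the result of its most recent reread."""
    outcomes: Dict[BlockKey, str] = {}
    for match in REREAD_RE.finditer(log_text):
        datafile = match.group(2).rstrip(".")   # drop the trailing dot Oracle prints
        block_no = int(match.group(1))
        outcomes[(datafile, block_no)] = match.group(3)
    return outcomes


def needs_followup(outcomes: Dict[BlockKey, str]) -> List[BlockKey]:
    """Return the blocks whose reread did not find valid data."""
    return sorted(key for key, result in outcomes.items() if result != "valid data")


SAMPLE = """Corrupt block relative dba: 0x08d098c7 (file 35, block 1087687)
Fractured block found during backing up datafile
Reread of blocknum=1087687, file=/dev/vx/rdsk/oradgMS/lv_ms_DB31. found valid data
"""

if __name__ == "__main__":
    results = reread_outcomes(SAMPLE)
    print(results)
    print(needs_followup(results))   # empty list: nothing to follow up
```

Blocks returned by `needs_followup` are the ones worth checking with analyze .. validate; blocks whose reread found valid data correspond to the false alarm described above.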
References:
Fractured Block Messages in Alert.log During RMAN Backup of Datafile
* fact: Oracle Server - Enterprise Edition 8
* fact: Oracle Server - Enterprise Edition 9
* fact: Recovery Manager (RMAN)
* symptom: Fractured block found during backing up datafile
* symptom: Reread of blocknum found some corrupt data
* symptom: Analyze table validate structure cascade returns no errors
* change: NOTE ROLE: The messages are of the form Reread of blocknum=36256,
file=/pdscdata/pdsclive/data1/dispatch_data_large2.dbf.
found same corrupt data *** Corrupt block relative dba: 0xfc008dc0 (file 63, block 36288)
Fractured block found during backing up datafile Data in bad block -
type: 0 format: 0 rdba: 0x00000000 last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00 consistency
value in tail: 0x53494e53 check value in block header: 0x0, block checksum disabled
spare1: 0x0, spare2: 0x0, spare3: 0x0
* cause: RMAN backups of datafile are being performed while the datafile is involved in heavy I/O.
RMAN reads Oracle blocks from disk. If it finds that the block is fractured, which means it is being actively used,
it performs a reread of the block. If that fails again then the block is assumed to be corrupt.
By identifying the object that these blocks belong to by following Handling Oracle Block Corruptions in
Oracle7/8/8i and performing an analyze .. validate structure cascade on the object involved you can
confirm that the object is not corrupt.
* fix:
Run the backups when the tablespace has less I/O activity.