使用BBED跳过归档进行恢复
数据库启动异常,提示6号文件丢失
SQL> startup
ORACLE instance started.
Total System Global Area 776646656 bytes
Fixed Size 2257272 bytes
Variable Size 490737288 bytes
Database Buffers 281018368 bytes
Redo Buffers 2633728 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
分析过程:
1. 查看告警日志和trace
[oracle@centos6 trace]$ pwd
/u01/app/oracle/diag/rdbms/orcl/orcl/trace
[oracle@centos6 trace]$ cat alert_orcl.log
日志内容如下:
ALTER DATABASE OPEN
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_dbw0_3020.trc:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3042.trc:
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
ORA-1157 signalled during: ALTER DATABASE OPEN...
根据告警日志中的信息查看trace文件
SQL> select * from v$diag_info;
[oracle@centos6 ~]$ vim /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3042.trc
trace内容:
DDE: Problem Key 'ORA 1110' was flood controlled (0x1) (no incident)
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
基本可以确认,是6号文件丢失
2. 使用rman还原6号文件
--查看数据库状态
RMAN> report schema;
using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name ORCL
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 760 SYSTEM *** /u01/app/oracle/oradata/orcl/system01.dbf
2 590 SYSAUX *** /u01/app/oracle/oradata/orcl/sysaux01.dbf
3 105 UNDOTBS1 *** /u01/app/oracle/oradata/orcl/undotbs01.dbf
4 5 USERS *** /u01/app/oracle/oradata/orcl/users01.dbf
5 330 EXAMPLE *** /u01/app/oracle/oradata/orcl/example01.dbf
6 0 T *** /u01/app/oracle/oradata/orcl/t01.dbf
7 100 TBS1 *** /u01/app/oracle/oradata/orcl/tbs_1.dbf
8 100 TBS2 *** /u01/app/oracle/oradata/orcl/tbs2.dbf
9 1 UNDO1 *** /u01/app/oracle/oradata/orcl/undo01.dbf
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 29 TEMP 32767 /u01/app/oracle/oradata/orcl/temp01.dbf
6号文件大小为0,说明该数据文件已经丢失。
--restore还原6号文件
RMAN> restore datafile 6;
Starting restore at 20-NOV-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=21 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00006 to /u01/app/oracle/oradata/orcl/t01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_10_23/o1_mf_nnndf_TAG20161023T102912_d0r83s29_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_10_23/o1_mf_nnndf_TAG20161023T102912_d0r83s29_.bkp tag=TAG20161023T102912
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 20-NOV-16
3. 尝试打开数据库
SQL> select status from v$instance;
STATUS
------------
MOUNTED
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 6 needs media recovery
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'
提示需要介质恢复
4. 查看数据文件头部和控制文件记录的检查点信息
控制文件:
SQL> select FILE#,CHECKPOINT_CHANGE# from v$datafile;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 2190210
2 2190210
3 2190210
4 2190210
5 2190210
6 2190210
7 2190210
8 2190210
9 2190210
9 rows selected. #可见控制文件中记录的检查点信息是一致的
数据文件头部:
SQL> select FILE# ,CHECKPOINT_CHANGE# from v$datafile_header;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 2190210
2 2190210
3 2190210
4 2190210
5 2190210
6 1301034 #此时oracle会取读6号文件1号块的信息
7 2190210
8 2190210
9 2190210
9 rows selected.
对比控制文件记录的检查点信息和数据文件头部的检查点信息,看两者是否一致。
数据文件头部的scn要比控制文件记录的scn要小,所以要利用归档日志应用日志将数据文件恢复到2190210这个时间点。
5. 使用recover命令恢复6号文件
SQL> recover datafile 6;
ORA-00279: change 1301034 generated at 10/23/2016 10:29:13 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_10_23/o1_mf_1_4_d0rb9tqo
_.arc
ORA-00280: change 1301034 for thread 1 is in sequence #4
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
提示要应用o1_mf_1_4_d0rb9tqo_.arc从1301034进行恢复
6号文件头部记录了检查点信息,包括时间(scn)和RBA,即日志地址,之前的日志就不需要再进行恢复。
--回车进行自动恢复 1322797
ORA-00279: change 1322797 generated at 10/23/2016 11:06:34 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_10_24/o1_mf_1_5_d0vhzoyw
_.arc
ORA-00280: change 1322797 for thread 1 is in sequence #5
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
检查点信息已经到了1322797,还是要比控制文件记录的要小。再从1322797应用归档进行恢复。
--1344052
ORA-00279: change 1344052 generated at 10/24/2016 16:01:57 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_10_24/o1_mf_1_6_d0vj3ffm
_.arc
ORA-00280: change 1344052 for thread 1 is in sequence #6
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
以下重复内容不再贴出来..
--恢复异常,25号归档日志文件丢失
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00279: change 1928225 generated at 11/13/2016 11:30:53 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_11_13/o1_mf_1_25_d2htrdn
9_.arc
ORA-00280: change 1928225 for thread 1 is in sequence #25
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_11_13/o1_mf_1_25_d2htrd
n9_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
6. 到指定目录下查看归档日志是否还在
[oracle@centos6 2016_11_13]$ ls
o1_mf_1_23_d2hqlxhy_.arc o1_mf_1_26_d2hwpw1f_.arc
o1_mf_1_24_d2hqmflr_.arc o1_mf_1_27_d2jcddcq_.arc
的确是缺少了25号归档日志文件
--再看此时数据文件头部的scn
SQL> select FILE# ,CHECKPOINT_CHANGE# from v$datafile_header;
FILE# CHECKPOINT_CHANGE#
---------- ------------------
1 2190210
2 2190210
3 2190210
4 2190210
5 2190210
6 1928225
7 2190210
8 2190210
9 2190210
9 rows selected.
还没有恢复到异常发生前的scn,如果此时直接打开数据库,后面的归档日志记录的信息就丢失了。
7. 修改数据文件头部的检查点信息,以跳过25号归档日志。将检查点信息指向26号日志。
--修改RBA,SCN
RBA(redo byte addres) à sequence number+block number+offset 61è62
SCN(system change number) 使数据文件头部的SCN从61号日志的开始SCN指向62号日志的开始SCN
8. 使用BBED修改6号文件头部的RBA,6号文件1号块
注:oracle11g offset 500-->RBA offset 484-->SCN
BBED> set file 6
FILE# 6
BBED> map /v --dump出1号块的结构
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Dba:0x01800001
------------------------------------------------------------
Data File Header
struct kcvfh, 860 bytes @0
struct kcvfhbfh, 20 bytes @0
struct kcvfhhdr, 76 bytes @20
ub4 kcvfhrdb @96
struct kcvfhcrs, 8 bytes @100
ub4 kcvfhcrt @108
ub4 kcvfhrlc @112
struct kcvfhrls, 8 bytes @116
ub4 kcvfhbti @124
struct kcvfhbsc, 8 bytes @128
ub2 kcvfhbth @136
ub2 kcvfhsta @138
struct kcvfhckp, 36 bytes @484
ub4 kcvfhcpc @140
ub4 kcvfhrts @144
ub4 kcvfhccc @148
struct kcvfhbcp, 36 bytes @152
ub4 kcvfhbhz @312
struct kcvfhxcd, 16 bytes @316
sword kcvfhtsn @332
ub2 kcvfhtln @336
text kcvfhtnm[30] @338
ub4 kcvfhrfn @368
struct kcvfhrfs, 8 bytes @372
ub4 kcvfhrft @380
struct kcvfhafs, 8 bytes @384
ub4 kcvfhbbc @392
ub4 kcvfhncb @396
ub4 kcvfhmcb @400
ub4 kcvfhlcb @404
ub4 kcvfhbcs @408
ub2 kcvfhofb @412
ub2 kcvfhnfb @414
ub4 kcvfhprc @416
struct kcvfhprs, 8 bytes @420
struct kcvfhprfs, 8 bytes @428
ub4 kcvfhtrt @444
ub4 tailchk @8188
BBED> p kcvfhckp --484号检查点的结构
struct kcvfhckp, 36 bytes @484
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0x001d6c21 –低位4个字节
ub2 kscnwrp @488 0x0000 --高位两个字节
ub4 kcvcptim @492 0x374d2ced
ub2 kcvcpthr @496 0x0001
union u, 12 bytes @500
struct kcvcprba, 12 bytes @500
ub4 kcrbaseq @500 0x00000019
ub4 kcrbabno @504 0x00000002
ub2 kcrbabof @508 0x0000
ub1 kcvcpetb[0] @512 0x02
ub1 kcvcpetb[1] @513 0x00
ub1 kcvcpetb[2] @514 0x00
ub1 kcvcpetb[3] @515 0x00
ub1 kcvcpetb[4] @516 0x00
ub1 kcvcpetb[5] @517 0x00
ub1 kcvcpetb[6] @518 0x00
ub1 kcvcpetb[7] @519 0x00
SQL> select to_number('19','xx') fromdual; --25号文件
TO_NUMBER('19','XX')
--------------------
25
也就是从25号文件,2号块,偏移量为0的位置开始恢复。
只需要把25改为26即可,02不需修改,因为日志文件的第一个块是文件头,也就是从2号文件开始写日志的,恢复的时候也是从2号块开始恢复。
将19改为1a
BBED> d /v offset 500 count 16
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 500 to 515 Dba:0x01800001
-------------------------------------------------------
19000000 02000000 00009f98 02000000 l ................
BBED> modify /x 1a offset 500
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 500 to 515 Dba:0x01800001
------------------------------------------------------------------------
1a000000 02000000 00009f98 02000000
<32 bytes per line>
BBED> sum apply
Check value for File 6, Block 1:
current = 0xea80, required = 0xea80
9. 修改SCN 6号文件1号块
struct kcvcpscn, 8 bytes @484
ub4 kscnbas @484 0x001d6c21 #低位4个字节
ub2 kscnwrp @488 0x0000 #高位2个字节
接下来要推测26号日志开始的SCN,可以从控制文件中记录的V$LOG_HISTORY查看
SQL> select SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE# from v$log_history;
SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
20 1700044 1721365
21 1721365 1754090
22 1754090 1791173
23 1791173 1829681
24 1829681 1928225
25 1928225 2023273
26 2023273 2046588
这样就可以去确定,将1028225改为2023273
1D6C21(21 6c 1d)---》à1edf69(69df 1e) #此时可以用计算器转换,因为数据库未打开,不能使用函数转换
通过将6号文件头部的SCN转换也可以验证这一点。
SQL> select to_number('1d6c21','xxxxxx') from dual;
TO_NUMBER('1D6C21','XXXXXX')
----------------------------
1928225
BBED> d /v offset 484 count 16
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 484 to 499 Dba:0x01800001
-------------------------------------------------------
216c1d00 00000000 ed2c4d37 01000000 l !l......?M7....
<16 bytes per line>
注意,因为这里linux使用的是小端,number类型为倒置存储
BBED> modify /x 69 offset 484
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 484 to 499 Dba:0x01800001
------------------------------------------------------------------------
696c1d00 00000000 ed2c4d37 01000000
<32 bytes per line>
BBED> modify /x df offset 485
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 485 to 500 Dba:0x01800001
------------------------------------------------------------------------
df1d0000 000000ed 2c4d3701 0000001a
<32 bytes per line>
BBED> modify /x 1e offset 486
File: /u01/app/oracle/oradata/orcl/t01.dbf (6)
Block: 1 Offsets: 486 to 501 Dba:0x01800001
------------------------------------------------------------------------
1e000000 0000ed2c 4d370100 00001a00
<32 bytes per line>
BBED> sum apply;
Check value for File 6, Block 1:
current = 0x59cb, required = 0x59cb
10. 恢复6号文件,看是否能够恢复
SQL> recover datafile 6
ORA-00279: change 2023273 generated at 11/13/2016 11:30:53 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_11_13/o1_mf_1_26_d2hwpw1
f_.arc
ORA-00280: change 2023273 for thread 1 is in sequence #26
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
可以看到此时已经从26号日志开始恢复了
SQL> recover datafile 6
ORA-00279: change 2096676 generated at 11/16/2016 18:54:49 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_11_19/o1_mf_1_29_d2zk84c
z_.arc
ORA-00280: change 2096676 for thread 1 is in sequence #29
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Log applied.
Media recovery complete.
SQL> alter database open;
Database altered.
数据库可以打开了。此时最好尽快做一个全库备份
==============================================================================================================================
附上模拟故障的脚本:
1. 移除数据文件 t01.dbf
[oracle@centos6 orcl]$ ls
control01.ctl redo01.log redo03.log system01.dbf tbs_1.dbf temp01.dbf undotbs01.dbf
example01.dbf redo02.log sysaux01.dbf t01.dbf tbs2.dbf undo01.dbf users01.dbf
[oracle@centos6 orcl]$ mkdir bak
[oracle@centos6 orcl]$ mv t01.dbf bak/t01.dbf
2. 查看最近的备份文件
RMAN> list backup of database;
using target database control file instead of recovery catalog
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
4 Full 1.12G DISK 00:00:19 23-OCT-16
BP Key: 4 Status: AVAILABLE Compressed: NO Tag: TAG20161023T102912
Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_10_23/o1_mf_nnndf_TAG20161023T102912_d0r83s29_.bkp
List of Datafiles in backup set 4
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
1 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/system01.dbf
2 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/sysaux01.dbf
3 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/undotbs01.dbf
4 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/users01.dbf
5 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/example01.dbf
6 Full 1301034 23-OCT-16 /u01/app/oracle/oradata/orcl/t01.dbf
最新的备份为10月26号
3. 查看归档日志
[oracle@centos6 ~]$ cd /u01/app/oracle/fast_recovery_area/ORCL/archivelog
[oracle@centos6 archivelog]$ ls
2016_10_16 2016_10_24 2016_11_01 2016_11_03 2016_11_06 2016_11_12 2016_11_16 2016_11_20
2016_10_23 2016_10_29 2016_11_02 2016_11_05 2016_11_09 2016_11_13 2016_11_19
4. 移除2016_11_13的其中一个归档日志文件
[oracle@centos6 archivelog]$ cd 2016_11_13
[oracle@centos6 2016_11_13]$ ls
o1_mf_1_23_d2hqlxhy_.arc o1_mf_1_25_d2htrdn9_.arc o1_mf_1_27_d2jcddcq_.arc
o1_mf_1_24_d2hqmflr_.arc o1_mf_1_26_d2hwpw1f_.arc
[oracle@centos6 2016_11_13]$ mv o1_mf_1_25_d2htrdn9_.arc o1_mf_1_25_d2htrdn9_.arc.bak
5. 关闭数据库,重新打开
SQL> startup
ORACLE instance started.
Total System Global Area 776646656 bytes
Fixed Size 2257272 bytes
Variable Size 490737288 bytes
Database Buffers 281018368 bytes
Redo Buffers 2633728 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/u01/app/oracle/oradata/orcl/t01.dbf'