最近做了一次RMAN 0 级恢复测试,测试模拟了生产数据库发生灾难性故障,只剩下rman全备份的备份片,利用备份的spfile、控制文件、数据文件、归档日志恢复数据的过程。
首先说一下环境,网上很多文章都是互相粘贴,并不一定适用于你的测试环境。我这次测试的生产环境是2个节点的RAC,存储使用了ASM去管理,操作系统为RHEL6.4,Oracle11.2.0.4,rman每日全备份,使用全备份去恢复数据。恢复的机器选择了1台PC机,安装RHEL6.4,操作系统、Oracle版本均和服务器一致,区别在于不使用ASM,Oracle不是RAC而是单机。
本文简要描述恢复过程,主要阐述恢复过程中遇到的一些问题的原因和处理方法
准备工作
PC机安装操作系统、安装Oracle、建库、将生产服务器的备份数据导入到相同路径下
恢复过程
恢复顺序--》spfile--》创建相关的目录--》控制文件--》归档日志--》数据文件--》应用归档日志找到一致点--》alter database open resetlogs;--》重新创建redo日志组
RMAN> set DBID 2238896801 RMAN> startup nomount; RMAN> restore spfile from '/backup/oraclebak/c-2238896801-20140226-00'; # sqlplus / as sysdba SQL> create pfile from spfile; SQL> shutdown immediate;
首先设置生产库的DBID,在nomount状态恢复spfile,然后创建pfile,由于生产库是RAC,所以要去掉所有RAC的参数,不对的路径也要修改。(网上找来的,与下面的路径不符,见谅)
生产库pfile如下:
ccdb2.__db_cache_size=503316480 ccdb1.__db_cache_size=1140850688 ccdb2.__java_pool_size=16777216 ccdb1.__java_pool_size=16777216 ccdb2.__large_pool_size=16777216 ccdb1.__large_pool_size=16777216 ccdb2.__shared_pool_size=1056964608 ccdb1.__shared_pool_size=1996488704 ccdb2.__streams_pool_size=0 ccdb1.__streams_pool_size=33554432 *.audit_file_dest='/oracle/admin/ccdb/adump' *.background_dump_dest='/oracle/admin/ccdb/bdump' *.cluster_database_instances=2 *.cluster_database=true *.compatible='10.2.0.3.0' *.control_files='+ASMDISK/ccdb/controlfile/current.261.648572913','+ASMDISK/ccdb/controlfile/current.260.648572913' *.core_dump_dest='/oracle/admin/ccdb/cdump' *.db_block_size=8192 *.db_create_file_dest='+ASMDISK' *.db_domain='' *.db_file_multiblock_read_count=16 *.db_name='ccdb' *.db_recovery_file_dest='+ASMDISK' *.db_recovery_file_dest_size=10737418240 *.dispatchers='' ccdb1.instance_number=1 ccdb2.instance_number=2 *.job_queue_processes=10 ccdb2.local_listener='LISTENERS_CCDB2' ccdb1.local_listener='LISTENERS_CCDB1' *.log_archive_dest_1='LOCATION=+ASMDISK/ccdb/' *.log_archive_dest_2='LOCATION=/oralog/ccdb/' ccdb2.log_archive_dest_2='LOCATION=/oralog/ccdb2/' ccdb1.log_archive_dest_2='LOCATION=/oralog/ccdb1/' *.log_archive_format='%t_%s_%r.dbf' *.log_archive_local_first=TRUE *.log_archive_min_succeed_dest=2 *.open_cursors=300 *.pga_aggregate_target=1029701632 *.processes=1000 *.recyclebin='OFF' *.remote_listener='LISTENERS_CCDB' *.remote_login_passwordfile='exclusive' *.sessions=1105 *.sga_max_size=3221225472 *.sga_target=1610612736 ccdb1.sga_target=3221225472 ccdb2.thread=2 ccdb1.thread=1 *.undo_management='AUTO' ccdb2.undo_tablespace='UNDOTBS2' ccdb1.undo_tablespace='UNDOTBS1' *.user_dump_dest='/oracle/admin/ccdb/udump'
修改后的pfile如下:
*.audit_file_dest='/oracle/admin/ccdb/adump' *.background_dump_dest='/oracle/admin/ccdb/bdump' *.compatible='10.2.0.3.0' *.control_files='/oradata/ccdb/control01.dbf' *.core_dump_dest='/oracle/admin/ccdb/cdump' *.db_block_size=8192 *.db_domain='' *.db_file_multiblock_read_count=16 *.db_name='ccdb' *.db_recovery_file_dest_size=10737418240 *.dispatchers='' *.job_queue_processes=10 *.log_archive_dest_2='LOCATION=/oralog/ccdb/' *.log_archive_format='%t_%s_%r.dbf' *.log_archive_local_first=TRUE *.open_cursors=300 *.pga_aggregate_target=1029701632 *.processes=1000 *.recyclebin='OFF' *.remote_login_passwordfile='exclusive' *.sessions=1105 *.sga_max_size=3221225472 *.sga_target=1610612736 *.undo_management='AUTO' *.user_dump_dest='/oracle/admin/ccdb/udump'
注意几个地方,控制文件路径;sga、pga大小,如果生产服务器内存大,恢复测试的PC机内存较小注意大小要修改;其它修改了的路径如果没有的一定全部创建;
接下来恢复控制文件
RMAN> startup nomount; RMAN> restore controlfile from '/backup/oraclebak/c-2238896801-20140616-05'; 注意恢复时使用最新的控制文件备份 RMAN> alter database mount;
这里要注意,全备份时,一般控制文件是自动备份的,一般在备份的不同阶段会备份几次控制文件,所以一定要选择最后一次备份的控制文件用来还原。
并且备份control文件到trace,这一步主要是重建控制文件时使用
alter database backup controlfile to trace as '/u01/app/oracle/oradata/nxxedk/control_new.trc';
控制文件恢复后 rman下list backup应该已经可以看到控制文件了,这个时候可以先将归档日志从备份片中恢复出来
RMAN> list backup; RMAN> crosscheck backup; RMAN> delete noprompt expired backup; RMAN> list backup of archivelog all; 看到现在有哪些归档,把这些归档恢复到指定文件夹 run{ set archivelog destination to '/tmp/archivelog'; restore archivelog from logseq 26770 until logseq 26804; }
这注意一下,如果你在做恢复归档时遇到这个报错
Starting restore at 2012-08-06 14:40:12 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=1994 instance=ora2 devtype=DISK RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of restore command at 08/06/2012 14:40:13 RMAN-20242: specification does not match any archive log in the recovery catalog
解决办法是,恢复归档时一定要加线程号
[oracle@p730a:/oracle/app/oracle]$rman target / Recovery Manager: Release 10.2.0.5.0 - Production on Mon Aug 6 14:42:18 2012 Copyright (c) 1982, 2007, Oracle. All rights reserved. connected to target database: ORA (DBID=2743318089) RMAN> restore archivelog from logseq 97249 until logseq 97254 thread 2; Starting restore at 2012-08-06 14:43:21 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=1727 instance=ora1 devtype=DISK archive log thread 2 sequence 97254 is already on disk as file +DGRECOVER/ora/archivelog/2012_08_05/thread_2_seq_97254.1943.790473651 channel ORA_DISK_1: starting archive log restore to default destination channel ORA_DISK_1: restoring archive log archive log thread=2 sequence=97249 channel ORA_DISK_1: restoring archive log archive log thread=2 sequence=97250
(上面的是在网页上粘来的,所以数据库版本、路径什么的都不一样了,见谅啊)
恢复完归档日志后可以开始恢复数据文件了,由于生产库使用的是ASM,所有数据文件的路径都要改
run { set newname for datafile 1 to '/u01/app/oracle/oradata/system.268.829235467'; set newname for datafile 2 to '/u01/app/oracle/oradata/sysaux.260.829235467'; set newname for datafile 3 to '/u01/app/oracle/oradata/undotbs1.263.829235469'; set newname for datafile 4 to '/u01/app/oracle/oradata/users.267.829235469'; set newname for datafile 5 to '/u01/app/oracle/oradata/example.264.829235561'; set newname for datafile 6 to '/u01/app/oracle/oradata/undotbs2.259.829235741'; set newname for datafile 7 to '/u01/app/oracle/oradata/xe_data01.dbf'; set newname for datafile 8 to '/u01/app/oracle/oradata/xe_data02.dbf'; set newname for datafile 9 to '/u01/app/oracle/oradata/xe_data03.dbf'; set newname for datafile 10 to '/u01/app/oracle/oradata/xe_data04.dbf'; set newname for datafile 11 to '/u01/app/oracle/oradata/xe_data05.dbf'; set newname for datafile 12 to '/u01/app/oracle/oradata/xe_data06.dbf'; set newname for datafile 13 to '/u01/app/oracle/oradata/xe_data07.dbf'; set newname for datafile 14 to '/u01/app/oracle/oradata/xe_data08.dbf'; set newname for datafile 15 to '/u01/app/oracle/oradata/xe_data09.dbf'; set newname for datafile 16 to '/u01/app/oracle/oradata/xe_data10.dbf'; set newname for datafile 17 to '/u01/app/oracle/oradata/xe_data11.dbf'; set newname for datafile 18 to '/u01/app/oracle/oradata/xe_data12.dbf'; set newname for datafile 19 to '/u01/app/oracle/oradata/xe_data13.dbf'; set newname for datafile 20 to '/u01/app/oracle/oradata/xe_data14.dbf'; set newname for datafile 21 to '/u01/app/oracle/oradata/xe_data15.dbf'; set newname for datafile 22 to '/u01/app/oracle/oradata/xe_data16.dbf'; set newname for datafile 23 to '/u01/app/oracle/oradata/xe_data17.dbf'; set newname for datafile 24 to '/u01/app/oracle/oradata/xe_data18.dbf'; set newname for datafile 25 to '/u01/app/oracle/oradata/xe_data19.dbf'; set newname for datafile 26 to '/u01/app/oracle/oradata/xe_data20.dbf'; set newname for datafile 27 to '/u01/app/oracle/oradata/nxxedk01.dbf'; set newname for datafile 28 to '/u01/app/oracle/oradata/nxxedk02.dbf'; restore database; switch datafile all; }
我这数据文件很大,有近500G,而且是恢复到一台PC机上,SATA 5400转的硬盘,写速度也就50M/秒吧,这一步用了2个多小时。
这里还有一点,如果你之前做过一次恢复测试,就是已经执行过alter database open resetlogs;了,第二次做恢复测试时没有重新安装系统的话,那么这一步很用可能会出现没有找到数据文件备份的提示,原理见http://docs.oracle.com/cd/B19306_01/backup.102/b14192/flashptr006.htm#sthref660
解决办法是重建一下库并且把快速恢复区内东西全部rm -rf了,否则oracle会用catalog而不是用控制文件去恢复。
数据文件恢复完成后,看网上的文档都说要重建控制文件,但是我实际测试后发现重建控制文件后在应用归档日志时有问题,不重建控制文件也是可以的,还没搞清楚是怎么回事,我把重建控制文件的方法贴一下,各位参考
CREATE CONTROLFILE REUSE DATABASE "nxxedk" RESETLOGS FORCE LOGGING ARCHIVELOG MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 100 MAXINSTANCES 8 MAXLOGHISTORY 292 LOGFILE GROUP 1 '/u01/app/oracle/oradata/REDO01.LOG' SIZE 50M, GROUP 2 '/u01/app/oracle/oradata/REDO02.LOG' SIZE 50M, GROUP 3 '/u01/app/oracle/oradata/REDO03.LOG' SIZE 50M -- STANDBY LOGFILE DATAFILE '/u01/app/oracle/oradata/system.268.829235467', '/u01/app/oracle/oradata/sysaux.260.829235467', '/u01/app/oracle/oradata/undotbs1.263.829235469', '/u01/app/oracle/oradata/users.267.829235469', '/u01/app/oracle/oradata/example.264.829235561', '/u01/app/oracle/oradata/undotbs2.259.829235741', '/u01/app/oracle/oradata/xe_data01.dbf', '/u01/app/oracle/oradata/xe_data02.dbf', '/u01/app/oracle/oradata/xe_data03.dbf', '/u01/app/oracle/oradata/xe_data04.dbf', '/u01/app/oracle/oradata/xe_data05.dbf', '/u01/app/oracle/oradata/xe_data06.dbf', '/u01/app/oracle/oradata/xe_data07.dbf', '/u01/app/oracle/oradata/xe_data08.dbf', '/u01/app/oracle/oradata/xe_data09.dbf', '/u01/app/oracle/oradata/xe_data10.dbf', '/u01/app/oracle/oradata/xe_data11.dbf', '/u01/app/oracle/oradata/xe_data12.dbf', '/u01/app/oracle/oradata/xe_data13.dbf', '/u01/app/oracle/oradata/xe_data14.dbf', '/u01/app/oracle/oradata/xe_data15.dbf', '/u01/app/oracle/oradata/xe_data16.dbf', '/u01/app/oracle/oradata/xe_data17.dbf', '/u01/app/oracle/oradata/xe_data18.dbf', '/u01/app/oracle/oradata/xe_data19.dbf', '/u01/app/oracle/oradata/xe_data20.dbf', '/u01/app/oracle/oradata/nxxedk01.dbf', '/u01/app/oracle/oradata/nxxedk02.dbf' CHARACTER SET AL32UTF8 ;
重建控制文件要在nomount下
然后开始recover数据库
recover database using backup controlfile until cancel;
会提示需要的SCN号的归档日志,应该都在刚刚恢复归档日志的路径下面,可以直接贴上绝对路径,会一路提示需要的归档
这里说一下recover的4种用法
recover database using backup controlfile recover database until cancel recover database using backup controlfile until cancel; recover database until cancel using backup controlfile;
区别:
1. 如果丢失丢失当前控制文件,用冷备份的控制文件恢复的时候。用来告诉oracle,不要以controlfile中的scn作为恢复的终点;
2. 如果丢失current/active redo的时候。手动指定终点。
3. 如果 丢失当前controlfile并且current/active redo都丢失,会先去 自动 应用归档日志,可以实现最大的恢复;
4. 如果 丢失当前controlfile并且current/active redo都丢失,以旧的redo中的scn为恢复终点。因为没有应用归档日志,所有会丢失数据。
要理解recover database using backup controlfile,先理解 recover database,也就是说,不加using backup controlfile的情况。
在普通的recover database 或者 recover tablespace, recover datafile时, Oracle会以当前controlfile所纪录的SCN为准,利用archive log和 redo log的redo entry, 把相关的datafile 的 block恢复到“当前controlfile所纪录的SCN”
而某些情况下,Oracle需要把数据恢复到比当前controlfile所纪录的SCN还要靠后的位置(比如说,control file是backup controlfile , 或者 controlfile是根据trace create的。),这时候,就需要用using backup controlfile. 恢复就不会受“当前controlfile所纪录的SCN”的限制。
这时候的限制就来自于你的语句(until time , until scn),或者可用的archive log(until cancel) ...
恢复到你想要的时间点就可以open数据库了
RMAN>alter database open resetlogs;
打开库之前可以先处理一下REDO log
select thread#,group#,bytes/1024/1024||'M',status from v$log; GROUP# BYTES/1024/1024||'M' STATUS ---------- ----------------------------------------- ---------------- 1 1024M CLEARING 2 1024M CLEARING 3 1024M CLEARING 8 1024M CLEARING_CURRENT 5 1024M CLEARING 6 1024M CLEARING 7 1024M CLEARING 4 1024M CLEARING 8 rows selected.
可以全部drop掉
alter database drop logfile group 1; alter database drop logfile group 2; alter database drop logfile group 3; alter database drop logfile group 4; alter database clear logfile group 2; alter database clear logfile group 6; alter database clear logfile group 5; alter database clear logfile group 7; alter database clear logfile group 8; alter database clear logfile group 4;
如果遇到提示说thread2 最少需要2组日志,可以再open数据库后将thread 2 禁掉
SQL> alter database drop logfile group 4; alter database drop logfile group 4 * ERROR at line 1: ORA-01567: dropping log 4 would leave less than 2 log files for instance nxxedk2 (thread 2) ORA-00312: online log 4 thread 2: '/u01/app/oracle/oradata/REDO02.LOG' ORA-00312: online log 4 thread 2: '/u01/app/oracle/oradata/REDO04.LOG' 可以禁掉thread 2 alter database disable thread 2;
redo log改路径
alter database rename file '+DATA/nxxedk/onlinelog/group_2.log' to '/u01/app/oracle/oradata/REDO01.LOG' alter database rename file '+DATA/nxxedk/onlinelog/group_4.log' to '/u01/app/oracle/oradata/REDO02.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_2.log' to '/u01/app/oracle/oradata/REDO03.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_4.log' to '/u01/app/oracle/oradata/REDO04.LOG'; alter database rename file '+DATA/nxxedk/onlinelog/group_06.log' to '/u01/app/oracle/oradata/REDO05.LOG'; alter database rename file '+DATA/nxxedk/onlinelog/group_08.log' to '/u01/app/oracle/oradata/REDO06.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_06.log' to '/u01/app/oracle/oradata/REDO07.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_08.log' to '/u01/app/oracle/oradata/REDO08.LOG'; alter database rename file '+DATA/nxxedk/onlinelog/group_4.log' to '/u01/app/oracle/oradata/REDO02.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_2.log' to '/u01/app/oracle/oradata/REDO03.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_4.log' to '/u01/app/oracle/oradata/REDO04.LOG'; alter database rename file '+DATA/nxxedk/onlinelog/group_06.log' to '/u01/app/oracle/oradata/REDO05.LOG'; alter database rename file '+DATA/nxxedk/onlinelog/group_08.log' to '/u01/app/oracle/oradata/REDO06.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_06.log' to '/u01/app/oracle/oradata/REDO07.LOG'; alter database rename file '+FRA/nxxedk/onlinelog/group_08.log' to '/u01/app/oracle/oradata/REDO08.LOG';
增加新的redo log
alter database add logfile group 1 '/u01/app/oracle/oradata/REDO01.LOG' size 50M; alter database add logfile group 2 '/u01/app/oracle/oradata/REDO02.LOG' size 50M; alter database add logfile group 3 '/u01/app/oracle/oradata/REDO03.LOG' size 50M;
最后还要处理一下临时表空间数据文件路径的问题
查看临时表空间 select file#,NAME,status,CREATION_TIME from v$tempfile; 删除临时表空间数据文件 alter database tempfile '+DATA/nxxedk/tempfile/xe_tmp01.dbf' offline; alter database tempfile '+DATA/nxxedk/tempfile/xe_tmp01.dbf' drop; alter database tempfile '+DATA/nxxedk/tempfile/nxtemp01.dbf' offline; alter database tempfile '+DATA/nxxedk/tempfile/nxtemp01.dbf' drop; 创建临时表空间数据文件 alter tablespace temp add tempfile '/u01/app/oracle/oradata/xe_tmp.dbf' size 20480m autoextend on; alter tablespace temp add tempfile '/u01/app/oracle/oradata/nxtemp.dbf' size 10240m autoextend on;
至此 恢复测试基本完成,实际操作时可能还会遇到很多其他问题,能力有限,做个记录而已。