主库查看DG同步是否报错

set linesize 200;
col destination for a40;
select dest_id, destination, status, error, fail_date from v$archive_dest;

主库修改:

alter system set log_archive_config = 'DG_CONFIG=(stldb,stldg,stlrdg,tstldg)' sid = '*';
alter system set log_archive_dest_3= 'SERVICE=nlogdb LGWR ASYNC VALID_FOR=(ONLINE_LOGFILE,PRIMARY_ROLE) DB_UNIQUE_NAME=logdb' sid = '*';
alter system set fal_server= nlogdb sid = '*';
alter system set fal_client = logdg sid = '*';

先临时关闭archived_log的传输

alter system set log_archive_dest_state_3 = defer sid = '*';
alter system set log_archive_dest_state_3 = enable sid = '*';

备库修改

*.db_unique_name
*.standby_file_management='AUTO'
*.fal_server
alter system set db_file_name_convert='/oracle/app/oradata/zydb','/oracle/app/oradata/zydb' scope=spfile sid='*';
alter system set log_file_name_convert='/oracle/app/oradata/zydb','/oracle/app/oradata/zydb' scope=spfile sid='*';
select database_role,open_mode from v$database;

常见故障

密码文件问题

问题:密码文件不一致导致归档传输不到备库

解决方案:请确认密码文件是否正确

原则上二节点密码文件也要从重新一节点传过来

注意:12c版本及以上,rac密码文件存放ASM磁盘中,查询密码文件存放位置方法:

select * from v$passwordfile_info;

参考文档:SYS Password Management with RAC and Data Guard (Doc ID 1267828.1)

归档丢失

问题:归档丢失

备库报错:

Fatal NI connect error 12545, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=racscan)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcl)(CID=(PROGRAM=oracle)(HOST=orc
l)(USER=oracle))))
TNS-00515: Connect failed because target host or object does not exist
nt secondary err code: 111
nt OS err code: 0
Error 12545 received logging on to the standby
FAL[client, ARC0]: Error 12545 connecting to orcl for fetching gap sequence

解决方案:
如果存在归档备份,从备份中取出丢失的归档日志。
如果不存在归档备份,则可以增量恢复解决。如果数据库小,也可以重新搭建。

参考文档:
Steps to perform for Rolling Forward a Physical Standby Database using RMAN Incremental Backup. (Doc ID 836986.1)
Rolling a Standby Forward using an RMAN Incremental Backup To Fix The Nologging Changes (Doc ID 958181.1)
Steps to perform for Rolling forward a standby database using RMAN incremental backup when datafile is added to primary (Doc ID 1531031.1)

主库归档删除失败

问题:主库归档日志删除失败,导致主库归档空间使用率高

解决方案:检查主备网络是否中断,备库mrp进程同步是否正常,备库应用归档之后,主库即可删除归档

DG同步带宽占用高

问题:主库到备库使用同一个网卡,网卡带宽使用率较高,导致业务受影响

解决方案:主库采用单独网卡创建单独监听进行同步,不占用业务网络。或者开启归档传输的压缩功能

log_archive_dest_2='service=<service_name> compression=enable db_unique_name=<db_unique_name>'

备库出现UNNAME数据文件

问题:主库添加数据文件,但因备库的convert参数配置不正确,导致备库数据文件出现unname,导致mrp进程终止

解决方案:db_create_file_dest参数优先于convert,首先确保OMF配置正确。然后rename数据文件

SQL> alter database create datafile '<source location>\UNNAMED00005' as '<destination location>';

如果备库是ASM+OMF方式,采用如下命令:

SQL> alter database create datafile '<source location>\UNNAMED00005' as <'+ASMDISKGROUPNAME'> size <specify the size of datafile>;
or
SQL>alter database create datafile '<source location>\UNNAMED00005>' as new;

参考文档:Upon Primary Creation Of Tablespace/Add datafile, Standby MRP(Media Recovery) failing with ORA-01110,ORA-01565 and ORA-27037 (Doc ID 2811172.1)

案例

DG常见故障处理汇总_DG

解决方案

1、检查备库是否需要恢复文件

select * from v$recover_file where error like '%FILE%';

DG常见故障处理汇总_DG_02

2、根据步骤1查出来的文件号在主库确认数据文件

select file#,name from v$datafile where file# in (81);
81 +DATA/htfjy_rac/datafile/trade.361.1130771655

DG常见故障处理汇总_故障处理_03

3、识别在备库中创建的虚拟文件名

select file#,name from v$datafile where file# in (81);
81 /u01/app/oracle/product/11.2.0/db_1/dbs/UNNAMED00081

DG常见故障处理汇总_故障处理_04

4、检查没有运行MRP,并在备库重新创建数据文件

DG常见故障处理汇总_解决方案_05

alter database recover managed standby database cancel;
alter system set standby_file_management='MANUAL' scope=both;alter database create datafile '/u01/app/oracle/product/11.2.0/db_1/dbs/UNNAMED00081' as '/oradata/htfjy/trade.361.1130771655';
select * from v$recover_file where error like '%FILE%';
alter system set standby_file_management='AUTO' scope=both;

5、备库启动MRP

alter database recover managed standby database using current logfile disconnect from session;

6、备库检查状态

select process,client_process,thread#,sequence#,status from v$managed_standby;

索引块中的itl信息scn不匹配

问题:
ora-00600 [ktbdchk1: bad dscn]错误代表commit scn 大于block scn,该错误命中Oracle Bug 22241601 : ORA-600 [KDSGRP1] IN ADG AFTER FAILOVER。
该bug一般出现在dg switch切换后的primary 端或standby端,如果使用dbv检查会出现:itl [itlid] has commit scn higher than block scn,而实际上数据并没有真正损坏,只是在索引块中的itl信息scn不匹配导致。

解决方案:重建报错的索引。

预防的方案:
1)切换之前主备库配置如下参数:

alter system set "_ktb_debug_flags"=8 scope=both;

2)安装补丁Patch 22241601

参考文档:
On Standby, DBV Shows Pages Failing with Check Code 6056 (Doc ID 1523623.1)
ALERT Bug 22241601 ORA-600 [kdsgrp1] / ORA-1555 / ORA-600 [ktbdchk1: bad dscn] / ORA-600 [2663] due to Invalid Commit SCN in INDEX (Doc ID 1608167.1)

备库failover后Reinstate报错

问题:备库failover演练后,reinstate报错ORA-16795

DGMGRL> reinstate database db112
Reinstating database "db112", please wait...
Error: ORA-16795: the standby database needs to be re-created

Failed.
Reinstatement of database "db112" failed

解决方案:reinstate需要主备库都开启闪回。因为主库没有开启闪回导致,备库开启了闪回,手工闪回备库解决。

参考文档:
Step by Step Guide on How To Reinstate Failed Primary Database into Physical Standby (Doc ID 738642.1)
Failover/Reinstate within Dataguard configuration fails with ORA-16653 /ORA-16795 (Doc ID 1161094.1)

ADG备库Wrong Results

问题:ADG备库实时同步,但是备库select查询结果明显不对

解决方案:大多数为数据库Bug导致

参考文档:Known issues of Wrong Results in Active Data Guard(ADG) (Doc ID 2490469.1)

ADG备库ORA-01555

问题:ADG模式,备库查询报错ORA-01555

解决方案:大多数为数据库Bug导致

参考文档:Known issue of ORA-01555 in Active Dataguard (ADG) (Doc ID 1951318.1)