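OGG Study Notes 03: Simple Troubleshooting for One-Way Replication
Environment: see OGG Study Notes 02: One-Way Replication Configuration Example
Objective: understand the basic approach to handling simple OGG failures.
1. Symptoms
After starting the extract process and the data pump process on the OGG source, both processes were found to have abended some time later.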
GGSCI (oradb30) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:39:54
EXTRACT     ABENDED     LXJY1       00:00:00      47:40:00

GGSCI (oradb30) 2> start extract lxjy1
Sending START request to MANAGER ...
EXTRACT LXJY1 starting

GGSCI (oradb30) 3> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:40:50
EXTRACT     RUNNING     LXJY1       00:00:00      47:40:55

GGSCI (oradb30) 4> start extract lpjy1
Sending START request to MANAGER ...
EXTRACT LPJY1 starting

GGSCI (oradb30) 5> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     LPJY1       00:00:00      47:40:58
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:04

GGSCI (oradb30) 6> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:15
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:21

GGSCI (oradb30) 7> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:19
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:25

GGSCI (oradb30) 8> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:41
EXTRACT     ABENDED     LXJY1       00:00:00      47:41:47
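Re-running info all by hand and reading the table each time gets tedious when processes keep dropping. A minimal sketch of a filter that prints only the abended group names is shown here; the function name abended_groups is hypothetical, and it assumes the column order seen in the output above (Program, Status, Group).

```shell
# List the groups that GGSCI reports as ABENDED.
# Reads "info all" output on stdin; assumes the columns are
# Program, Status, Group, ... as in the sessions above.
abended_groups() {
    awk '$2 == "ABENDED" { print $3 }'
}

# Possible usage (ggsci accepts commands on stdin):
#   echo "info all" | "$GG_HOME/ggsci" | abended_groups
```

Fed the output of command 6 above, this should print only LPJY1.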
2. Checking the Log
Check the OGG log ggserr.log to find out why the processes abended.
[ogg@oradb30 ogg]$ cd $GG_HOME
[ogg@oradb30 ogg]$ tail -200f ggserr.log
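The log shows that the data pump process lpjy1 abended because it could not connect to the target OGG, and the extract process lxjy1 abended because it could not find the archived log for sequence 160, thread 1.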
2017-01-19 14:51:46 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, lpjy1.prm: EXTRACT LPJY1 started.
2017-01-19 14:51:49 ERROR OGG-01224 Oracle GoldenGate Capture for Oracle, lpjy1.prm: TCP/IP error 113 (No route to host).
2017-01-19 14:51:49 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, lpjy1.prm: PROCESS ABENDING.
2017-01-19 14:52:28 ERROR OGG-00446 Oracle GoldenGate Capture for Oracle, lxjy1.prm: Could not find archived log for sequence 160 thread 1 under default destinations SQL <SELECT name FROM v$archived_log WHERE sequence# = :ora_seq_no AND thread# = :ora_thread AND resetlogs_id = :ora_resetlog_id AND archived = 'YES' AND deleted = 'NO' AND name not like '+%' AND standby_dest = 'NO' >, error retrieving redo file name for sequence 160, archived = 1, use_alternate = 0Not able to establish initial position for sequence 160, rba 7758352.
2017-01-19 14:52:28 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, lxjy1.prm: PROCESS ABENDING.
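In a busy ggserr.log the ERROR entries are easy to miss among INFO and WARNING lines. The sketch below keeps only entries whose severity field is ERROR; the function name recent_ogg_errors is hypothetical, and it assumes each entry sits on one line in the layout shown above (date, time, severity, message code, text).

```shell
# Print the most recent ERROR entries from a GoldenGate error log.
# $1: path to ggserr.log
# $2: how many entries to show (default 20)
recent_ogg_errors() {
    log_file=$1
    count=${2:-20}
    # Field 3 is the severity (INFO, WARNING, ERROR) in the layout above.
    awk '$3 == "ERROR"' "$log_file" | tail -n "$count"
}

# Possible usage:
#   recent_ogg_errors "$GG_HOME/ggserr.log" 10
```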
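Further investigation showed that the archived logs had been deleted after the RMAN backup policy finished backing them up. Since a backup exists, the next step is simply to restore sequence 160, the one named in the log, and all later archived logs from the backup set.
This also shows that a database replicated with OGG should run in ARCHIVELOG mode. Otherwise, in a case like this one, where the changes in the source's online redo logs were not picked up in time, there would be no way to resume replication.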
3. Resolving the Problem
For the lxjy1 process (Extract), restore sequence 160 and all later archived logs from the RMAN backup set:
$ rman target /
RMAN> restore archivelog from logseq 160;
Then start the lxjy1 process again.
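The sequence and thread numbers for the restore come straight from the OGG-00446 message, so they can be pulled out of the log rather than copied by hand. The following is a rough sketch only: the function name rman_restore_cmds is hypothetical, it assumes the message wording shown in the log excerpt above, and it adds an explicit thread clause to the restore command. It only prints the commands, so they can be reviewed before being run in RMAN.

```shell
# Turn OGG-00446 "Could not find archived log" entries into RMAN commands.
# Reads ggserr.log text on stdin and prints one restore command per
# distinct sequence/thread pair. Nothing is executed here.
rman_restore_cmds() {
    sed -n 's/.*OGG-00446.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/restore archivelog from sequence \1 thread \2;/p' |
        sort -u
}

# Possible usage:
#   rman_restore_cmds < "$GG_HOME/ggserr.log"
```

For the log entry in this case, the output should be a single line: restore archivelog from sequence 160 thread 1;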
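For the lpjy1 process (Data Pump), confirm that the host running the target OGG is up and reachable over the network, then start the target database and the target OGG, including its mgr and replicat processes. After that, start lpjy1 again on the source.
Finally, confirm that all OGG processes on both the source and the target are RUNNING:
Source OGG: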
GGSCI (oradb30) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     LPJY1       00:00:00      00:00:03
EXTRACT     RUNNING     LXJY1       00:00:00      00:00:00
Target OGG:
GGSCI (oradb31) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
REPLICAT    RUNNING     RJY1        00:00:00      00:00:01
OGG Study Notes, basics series: