GOLDENGATE:
Database version: 11.2.0.2 (RAC, ASM)
GoldenGate version: 11.2.1.0.1
GGSCI (DB2-PAN) 17> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXT_D6      00:00:00      00:00:05
EXTRACT     RUNNING     EXT_E6      00:00:00      00:48:56
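In the output above, EXT_E6 reports RUNNING but its last checkpoint is almost 49 minutes old. A check like this could be scripted. The following is only a minimal sketch: the function name stale_groups and the 600-second threshold are made up for illustration, and it assumes the "info all" text has the five-column layout shown above.

```shell
#!/bin/sh
# stale_groups THRESHOLD_SECONDS
# Reads "info all" output on stdin and prints the name of every EXTRACT or
# REPLICAT group whose "Time Since Chkpt" column (HH:MM:SS) exceeds the
# threshold. The function name and threshold are illustrative assumptions.
stale_groups() {
    awk -v limit="$1" '
        ($1 == "EXTRACT" || $1 == "REPLICAT") && NF >= 5 {
            split($5, t, ":")                       # HH:MM:SS
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (secs > limit) print $3              # group name
        }'
}

# Sample input copied from the output above.
stale_groups 600 <<'EOF'
MANAGER RUNNING
EXTRACT RUNNING EXT_D6 00:00:00 00:00:05
EXTRACT RUNNING EXT_E6 00:00:00 00:48:56
EOF
```

In practice the text would have to come from GGSCI itself, for example by piping the command "info all" into the ggsci binary; the exact path and invocation depend on the installation and are not shown in this note.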
APPLIES TO:
Oracle GoldenGate - Version 11.1.1.0.6 and later
Information in this document applies to any platform.
***Checked for relevance on 02-January-2014***
SYMPTOMS:
When running Oracle GoldenGate 11.1.1.0.6 or higher, Extract abends every 4 hours on the hour. This interval roughly matches the default Bounded Recovery interval.
Extract can be restarted and continues to work, but it fails again after 4 hours with the same errors, as shown below.
ERROR
--------------------------------------------------
2014-01-02 18:34:56 INFO OGG-01478 Oracle GoldenGate Capture for Oracle, ext_e6.prm: Output file ./dirdat/la is using format RELEASE 11.2.
2014-01-02 18:34:56 INFO OGG-01026 Oracle GoldenGate Capture for Oracle, ext_e6.prm: Rolling over remote file ./dirdat/la000000.
2014-01-02 18:34:56 INFO OGG-01053 Oracle GoldenGate Capture for Oracle, ext_e6.prm: Recovery completed for target file ./dirdat/la000001, at RBA 1072.
2014-01-02 18:34:56 INFO OGG-01057 Oracle GoldenGate Capture for Oracle, ext_e6.prm: Recovery completed for all targets.
2014-01-02 18:34:56 INFO OGG-01517 Oracle GoldenGate Capture for Oracle, ext_e6.prm: Position of first record processed for Thread 2, Sequence 4, RBA 51202064, SCN 0.1425644, 2014-1-2 下午06:34:32.
2014-01-02 22:34:59 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, ext_e6.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p16681_Redo Thread 1: start=SeqNo: 41, RBA: 66879504, SCN: 0.1559643 (1559643), Timestamp: 2014-01-02 22:34:57.000000, Thread: 1, end=SeqNo: 41, RBA: 66880000, SCN: 0.1559643 (1559643), Timestamp: 2014-01-02 22:34:57.000000, Thread: 1.
2014-01-02 22:34:59 INFO OGG-01738 Oracle GoldenGate Capture for Oracle, ext_e6.prm: BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p16681_Redo Thread 2: start=SeqNo: 5, RBA: 51252240, SCN: 0.1559642 (1559642), Timestamp: 2014-01-02 22:34:56.000000, Thread: 2, end=SeqNo: 5, RBA: 51252736, SCN: 0.1559642 (1559642), Timestamp: 2014-01-02 22:34:56.000000, Thread: 2.
CAUSE:
Under these conditions, the problem may lie in the Bounded Recovery checkpoint file, which is likely corrupted.
SOLUTION:
The solution is to reset the Bounded Recovery checkpoint file when restarting the Extract, for example:
GGSCI> start <extract> BRRESET
=====================================
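Summary: During the routine morning check, a front-line colleague reported that the data on the OGG primary and standby sides was inconsistent. After logging on to the server, I found the Extract process on the primary database in RUNNING status, but its checkpoint had not been updated for 48 minutes, so the process was hung. I consulted the MOS note above, but start ext_e6 brreset did not start the process normally. In the end I located the OS process manually with ps -ef | grep ext_e6, killed it with kill -9 <OS process ID of ext_e6>, and then start ext_e6 brought the process up successfully.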
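The manual ps/grep step above can be wrapped in a small helper so that the grep process itself is not picked up by mistake. This is only a sketch: the function name pids_for_group is made up, the two sample ps lines (paths and PIDs) are invented for illustration, and the helper deliberately only prints PIDs so they can be reviewed before any kill -9.

```shell
#!/bin/sh
# pids_for_group GROUP
# Reads `ps -ef` output on stdin and prints the PID (column 2) of every
# line that mentions GROUP (case-insensitive), skipping grep/awk helpers.
# The function name is an illustrative assumption.
pids_for_group() {
    awk -v g="$1" '
        BEGIN { g = tolower(g) }
        index(tolower($0), g) && $0 !~ /(grep|awk) / { print $2 }'
}

# Hypothetical ps -ef lines, invented for this example only.
pids_for_group ext_e6 <<'EOF'
oracle 12345 1 0 18:34 ? 00:00:12 /ogg/extract PARAMFILE /ogg/dirprm/ext_e6.prm
oracle 12400 12000 0 09:01 pts/0 00:00:00 grep ext_e6
EOF
```

On a live system the input would be ps -ef | pids_for_group ext_e6. Once the PID has been confirmed to belong to the hung Extract, kill -9 on that PID followed by start ext_e6 in GGSCI is the sequence that worked in this case.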
----thank you & best regards