最近ogg遇到个队列文件损坏故障,之前的文档是通过跳过事务实现的

可在某些情况下不能够跳过事务,本次重新指定scn号抽取,scn就按照时间最早的scn去抽取。

2022-12-15 10:10:25  ERROR   OGG-02191  Incompatible record 104 in /OGG_DATA/ogg/dirdat/mod/cp000181953, rba 31,813 when getting trail header.

首先模拟一下故障,dp1当前是故障状态

GGSCI (itpuxhsdb52 as ogg@fghsdb) 71> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     ABENDED     DP1         00:00:00      00:58:49     ----模拟文件损坏
EXTRACT     RUNNING     EXT2        00:00:00      00:00:00

重新抽取的的命令很多,参考help alter ext

GGSCI (itpuxhsdb152) 12> help alter ext   
Use ALTER EXTRACT for the following purposes:  
* To change the attributes of an Extract group created with the ADD  
EXTRACT command.  
* To increment a trail to the next file in the sequence.  
* To upgrade to an integrated capture configuration.  
* To downgrade from an integrated capture configuration.  
* To position any given IBM for i journal at a specific journal  
sequence number.  
* To position an Extract for SQL Server to begin at a specific LSN. 
Before using this command, stop Extract with the STOP EXTRACT  
group_name command.  
Syntax  
ALTER EXTRACT group_name 
    [, BEGIN {NOW | yyyy-mm-dd[ hh:mi:[ss[.cccccc]]]}] 
    [, EXTSEQNO sequence-number] 
    [, EXTRBA offset-number] 
    [, TRANLOG LRI LRI_number] 
    [, UPGRADE INTEGRATED TRANLOG] 
    [, DOWNGRADE INTEGRATED TRANLOG [THREADS number]] 
    [, THREAD number] 
    [, LSN value] 
    [, SCN value] 
    [, ETROLLOVER] 
    [, INFO EXTRACT group-name]

我们一般采取指定scn的形式重新指定抽取位置,本次遇到的是Integrated集成模式,需要注意重新指定的scn不能小于Integrated Extract outbound server first scn,这个在帮助文档里也有说明

GGSCI (itpuxhsdb52 as ogg@fghsdb) 72> info EXT2 detail
EXTRACT    EXT2      Last Started 2022-12-19 14:18   Status RUNNING
Checkpoint Lag       00:00:03 (updated 00:00:09 ago)
Process ID           43093
Log Read Checkpoint  Oracle Integrated Redo Logs
                     2022-12-19 14:19:21
                     SCN 0.1176912 (1176912)
  Target Extract Trails:
  Trail Name                                       Seqno        RBA     Max MB Trail Type
  ./dirdat/e2                                         12     126741        200 EXTTRAIL  
Integrated Extract outbound server first scn: 0.1103341 (1103341)   ------注意此处的scn号
Integrated Extract outbound server filtering start scn: 0.1103341 (1103341)

ogg集成模式队列文件损坏处理一_oracle

当我们指定的scn号小于此scn时会报错,如下:

GGSCI (itpuxhsdb52 as ogg@fghsdb) 67> alter ext2 scn 1103340
ERROR: Extract EXT2 can not be positioned to SCN 1103340.
GGSCI (itpuxhsdb52 as ogg@fghsdb) 68> alter ext2 scn 1103341
EXTRACT altered.

开始处理故障

GGSCI (itpuxhsdb52 as ogg@fghsdb) 73> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     ABENDED     DP1         00:00:00      01:01:43    
EXTRACT     RUNNING     EXT2        00:00:03      00:00:03    
GGSCI (itpuxhsdb52 as ogg@fghsdb) 74> stop EXT2
Sending STOP request to EXTRACT EXT2 ...
Request processed.
GGSCI (itpuxhsdb52 as ogg@fghsdb) 76> alter EXT2 ETROLLOVER   --生一个新文件
2022-12-19 14:22:06  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file;  it will not happen automatically.
EXTRACT altered.
GGSCI (itpuxhsdb52 as ogg@fghsdb) 77> alter EXT2 tranlog scn 1103341   --rac需要指定2个线程1
EXTRACT altered.
GGSCI (itpuxhsdb52 as ogg@fghsdb) 78> info EXT2
EXTRACT    EXT2      Initialized   2022-12-19 13:47   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:42:09 ago)
Log Read Checkpoint  Oracle Integrated Redo Logs
                     First Record       
                     SCN 0.1103341 (1103341)
GGSCI (itpuxhsdb52) 4> alter DP1 extseqno 13 extrba 0  --重新指定dp1序列号和rba号
EXTRACT altered.
GGSCI (itpuxhsdb52) 5> info DP1
EXTRACT    DP1       Initialized   2022-12-19 14:33   Status STOPPED
Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint  File /home/oracle/goldengate/dirdat/e2000000013
                     First Record  RBA 0
GGSCI (itpuxhsdb52) 6> start DP1
Sending START request to MANAGER ...
EXTRACT DP1 starting
GGSCI (itpuxhsdb52) 7> info DP1
EXTRACT    DP1       Last Started 2022-12-19 14:33   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:07 ago)
Process ID           43532
Log Read Checkpoint  File /home/oracle/goldengate/dirdat/e2000000013
                     2022-12-19 13:48:36.000000  RBA 126682

查看源端,此时会新生成一个trail文件

[oracle@itpuxhsdb52:/home/oracle/goldengate/dirdat]$ll e2*
-rw-r----- 1 oracle oinstall   8104 Dec 16 14:11 e2000000000
-rw-r----- 1 oracle oinstall  23646 Dec 16 15:50 e2000000001
-rw-r----- 1 oracle oinstall   8164 Dec 16 15:54 e2000000002
-rw-r----- 1 oracle oinstall   8164 Dec 16 15:57 e2000000003
-rw-r----- 1 oracle oinstall  13326 Dec 16 16:10 e2000000004
-rw-r----- 1 oracle oinstall  13326 Dec 16 16:12 e2000000005
-rw-r----- 1 oracle oinstall  23648 Dec 16 17:45 e2000000006
-rw-r----- 1 oracle oinstall  49450 Dec 16 21:18 e2000000007
-rw-r----- 1 oracle oinstall   1437 Dec 19 12:55 e2000000008
-rw-r----- 1 oracle oinstall  18487 Dec 19 13:14 e2000000009
-rw-r----- 1 oracle oinstall  38946 Dec 19 13:48 e2000000010
-rw-r----- 1 oracle oinstall 121522 Dec 19 14:18 e2000000011
-rw-r----- 1 oracle oinstall 126741 Dec 19 14:18 e2000000012
-rw-r----- 1 oracle oinstall 126682 Dec 19 14:31 e2000000013 <<<<<<新文件

查看目标端,注意源端和目标端的文件号不一定一致

[oracle@itpuxhsdb152:/home/oracle/ogg/dirdat]$ll
total 184
-rw-r----- 1 oracle oinstall 34458 Dec 16 15:59 e2000000000
-rw-r----- 1 oracle oinstall 24008 Dec 16 16:13 e2000000001
-rw-r----- 1 oracle oinstall 13436 Dec 16 17:47 e2000000002
-rw-r----- 1 oracle oinstall 39238 Dec 16 21:10 e2000000003
-rw-r----- 1 oracle oinstall  1547 Dec 16 21:11 e2000000004
-rw-r----- 1 oracle oinstall  8216 Dec 16 21:16 e2000000005
-rw-r----- 1 oracle oinstall  8214 Dec 19 13:02 e2000000006
-rw-r----- 1 oracle oinstall  1547 Dec 19 13:09 e2000000007
-rw-r----- 1 oracle oinstall  1547 Dec 19 13:17 e2000000008
-rw-r----- 1 oracle oinstall 18597 Dec 19 13:19 e2000000009
-rw-r----- 1 oracle oinstall  8215 Dec 19 13:19 e2000000010
[oracle@itpuxhsdb152:/home/oracle/ogg/dirdat]$ll
total 376
-rw-r----- 1 oracle oinstall  34458 Dec 16 15:59 e2000000000
-rw-r----- 1 oracle oinstall  24008 Dec 16 16:13 e2000000001
-rw-r----- 1 oracle oinstall  13436 Dec 16 17:47 e2000000002
-rw-r----- 1 oracle oinstall  39238 Dec 16 21:10 e2000000003
-rw-r----- 1 oracle oinstall   1547 Dec 16 21:11 e2000000004
-rw-r----- 1 oracle oinstall   8216 Dec 16 21:16 e2000000005
-rw-r----- 1 oracle oinstall   8214 Dec 19 13:02 e2000000006
-rw-r----- 1 oracle oinstall   1547 Dec 19 13:09 e2000000007
-rw-r----- 1 oracle oinstall   1547 Dec 19 13:17 e2000000008
-rw-r----- 1 oracle oinstall  18597 Dec 19 13:19 e2000000009
-rw-r----- 1 oracle oinstall   8215 Dec 19 14:34 e2000000010
-rw-r----- 1 oracle oinstall 124427 Dec 19 14:34 e2000000011 <<<新文件

目标端也要重新指定序列号和rba号

GGSCI (itpuxhsdb152) 1> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING                                           
REPLICAT    STOPPED     RP2         00:00:00      01:24:03    
GGSCI (itpuxhsdb152) 2> info RP2
REPLICAT   RP2       Last Started 2022-12-19 13:13   Status STOPPED
INTEGRATED
Checkpoint Lag       00:00:00 (updated 01:24:10 ago)
Log Read Checkpoint  File /home/oracle/ogg/dirdat/e2000000008
                     2022-12-16 21:12:01.710203  RBA 1547
GGSCI (itpuxhsdb152) 3> alter RP2 extseqno 11 extrba 0
2022-12-19 14:39:29  INFO    OGG-06594  Replicat RP2 has been altered. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start RP2 with NOFILTERDUPTRANSACTIONS option.
REPLICAT (Integrated) altered.
GGSCI (itpuxhsdb152) 4> info RP2
REPLICAT   RP2       Initialized   2022-12-19 14:39   Status STOPPED
INTEGRATED
Checkpoint Lag       00:00:00 (updated 00:00:05 ago)
Log Read Checkpoint  File /home/oracle/ogg/dirdat/e2000000011
                     First Record  RBA 0
GGSCI (itpuxhsdb152) 5> start RP2
Sending START request to MANAGER ...
REPLICAT RP2 starting

操作流程到此结束。