OGG pump process reports that it cannot open a file and cannot deliver data


1.1 Symptom

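A customer once ran into the following problem on an OGG replication link: one day the network failed, and from then on the OGG pump process could not deliver data because it could not write to its trail file on the target.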

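My guess is that packet loss or a similar network fault corrupted the file, so that it could no longer be opened, read, or written. The fix was to run etrollover on the pump process.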

This document tests etrollover along those lines. [The test does not faithfully reproduce the network problem; it only exercises etrollover.]

1.2 Test: how does restarting an OGG process affect seqno?

    1) Check the replicated table and the OGG process parameters
    SQL> select * from dd;

            ID CC_NAME                        WITTIME
    ---------- ------------------------------ ------------------------------
             2 2                              03-JUN-20 02.34.37.000000 PM

    GGSCI (t1) 4> view param exta
    extract exta
    USERID ogg,PASSWORD ogg
    EXTTRAIL /u01/ogg/base/dirdat/ea
    table YZ.DD;

    GGSCI (t1) 5> view param dpea
    extract dpea
    rmthost 10.0.0.32,mgrport 7809, compress
    rmttrail /u01/ogg/base/dirdat/t1
    table YZ.B;
    table YZ.DD;

    GGSCI (t1) 7> info exta
    EXTRACT    EXTA      Last Started 2020-11-10 11:05   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
    Process ID           10744
    Log Read Checkpoint  Oracle Redo Logs
                         2020-11-10 11:25:54  Seqno 353, RBA 3917824
                         SCN 0.3276594 (3276594)

    GGSCI (t1) 8> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:05   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
    Process ID           10776
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000067
                         2020-11-10 11:05:01.669087  RBA 1469

    SQL> select * from dd;

            ID CC_NAME                        WITTIME
    ---------- ------------------------------ ------------------------------
             2 2                              03-JUN-20 02.34.37.000000 PM

    GGSCI (t2) 26> view param repa
    replicat repa
    userid ogg,password ogg
    assumetargetdefs
    HANDLECOLLISIONS
    discardfile  /u01/ogg/base/dirrpt/repa.dsc
    MAP YZ.DD ,TARGET BAK_YZ.DD;

    GGSCI (t2) 27> info repa
    REPLICAT   REPA      Last Started 2020-11-10 11:20   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
    Process ID           11023
    Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000051
                         2020-11-10 11:05:01.313791  RBA 1563

    2) Restart the target Replicat process: the seqno of the trail file it reads does not change
    GGSCI (t2) 28> stop repa
    GGSCI (t2) 29> start repa

    3) Restart the source pump process: the seqno of the trail file it reads does not change
    GGSCI (t1) 9> stop dpea
    GGSCI (t1) 10> start dpea
    GGSCI (t1) 13> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:04 ago)
    Process ID           11117
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000067
                         First Record  RBA 1469

    4) Restart the source primary Extract process: the seqno of the trail file it writes increases by 1
    GGSCI (t1) 15> info exta, detail
    EXTRACT    EXTA      Last Started 2020-11-10 11:05   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:09 ago)
    Process ID           10744
    Log Read Checkpoint  Oracle Redo Logs
                         2020-11-10 11:30:15  Seqno 353, RBA 3919360
                         SCN 0.3276690 (3276690)
    Target Extract Trails:
    Trail Name                                       Seqno        RBA     Max MB Trail Type
    /u01/ogg/base/dirdat/ea                             67       1469         20 EXTTRAIL

    GGSCI (t1) 16> stop exta
    GGSCI (t1) 17> start exta
    Target Extract Trails:
    Trail Name                                       Seqno        RBA     Max MB Trail Type
    /u01/ogg/base/dirdat/ea                             68       1469         20 EXTTRAIL

    5) Once the primary Extract's seqno has increased by 1, the whole chain follows: the file the source pump reads moves to seqno +1, the file the pump writes on the target moves to seqno +1, and the file the target Replicat reads moves to seqno +1
    GGSCI (t1) 19> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:08 ago)
    Process ID           11117
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                         2020-11-10 11:31:58.380185  RBA 1469

    GGSCI (t2) 45> info repa
    REPLICAT   REPA      Last Started 2020-11-10 11:28   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:02 ago)
    Process ID           11132
    Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                         2020-11-10 11:31:58.035041  RBA 1563

    6) Source side (confirm that the OGG link is still in sync)
    SQL> insert into dd values (3,'cc',sysdate);
    SQL> commit;
    GGSCI (t1) 22> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
    Process ID           11117
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                         2020-11-10 11:34:52.000000  RBA 2284

    Target side
    SQL> select * from dd;

            ID CC_NAME                        WITTIME
    ---------- ------------------------------ ------------------------------
             3 cc                             10-NOV-20 11.34.50.000000 AM
             2 2                              03-JUN-20 02.34.37.000000 PM

    REPLICAT   REPA      Last Started 2020-11-10 11:28   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:04 ago)
    Process ID           11132
    Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                         2020-11-10 11:34:51.656002  RBA 2378
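
The seqno comparisons in steps 2 to 5 can also be done mechanically from the checkpoint file names that `info` prints. The following is only a minimal sketch: `parse_trail_file` and `seqno_delta` are hypothetical helper names, not part of OGG, and the 9-digit seqno suffix is an assumption taken from the file names shown in this test (other OGG versions may use a different width).

```python
SEQNO_DIGITS = 9  # assumption: 9-digit suffix, as in ea000000067 in this test


def parse_trail_file(path):
    """Split a trail file path into (trail prefix, seqno)."""
    suffix = path[-SEQNO_DIGITS:]
    if len(path) <= SEQNO_DIGITS or not (suffix.isascii() and suffix.isdigit()):
        raise ValueError(f"not a trail file name: {path!r}")
    return path[:-SEQNO_DIGITS], int(suffix)


def seqno_delta(before, after):
    """Return how far the seqno moved between two checkpoints of one trail."""
    prefix_before, seq_before = parse_trail_file(before)
    prefix_after, seq_after = parse_trail_file(after)
    if prefix_before != prefix_after:
        raise ValueError("checkpoints belong to different trails")
    return seq_after - seq_before


if __name__ == "__main__":
    # Pump checkpoint before and after restarting the pump itself (step 3).
    print(seqno_delta("/u01/ogg/base/dirdat/ea000000067",
                      "/u01/ogg/base/dirdat/ea000000067"))
    # Pump checkpoint before and after restarting the primary Extract (step 5).
    print(seqno_delta("/u01/ogg/base/dirdat/ea000000067",
                      "/u01/ogg/base/dirdat/ea000000068"))
    # Replicat checkpoint before and after restarting the primary Extract (step 5).
    print(seqno_delta("/u01/ogg/base/dirdat/t1000000051",
                      "/u01/ogg/base/dirdat/t1000000052"))
```

Splitting on a fixed suffix width rather than on "trailing digits" matters here because the remote trail prefix `t1` itself ends in a digit.
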
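1.3 Simulating corruption of the dump (trail) file that the target OGG applies, and how to handle it
    1) Edit the dump file by hand
    [ogg@t2 ~]$ vi /u01/ogg/base/dirdat/t1000000052
    (corrupt the file contents)
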
    2) Insert one test row on the source
    SQL> insert into dd values (4,'cc',sysdate);
    SQL> commit;

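    3) The OGG Replicat process abends
    2020-11-10 11:36:59  ERROR   OGG-02171  Error reading LCR from data source. Status 509, data source type TrailDataSource.
    2020-11-10 11:36:59  ERROR   OGG-02191  Incompatible record 101 in /u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.
    2020-11-10 11:36:59  ERROR   OGG-01668  PROCESS ABENDING.
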
    4) Insert one more test row on the source
    SQL> insert into dd values (5,'cc',sysdate);
    1 row created.
    SQL> commit;
    GGSCI (t1) 38> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
    Process ID           11117
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                         2020-11-10 13:25:29.000000  RBA 2604
    At this point, from the source pump's point of view, trail file eaxxx68 contains two insert records;
    on the target, Replicat repa fails as soon as it tries to apply the first of them from trail file t1xxx52.

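    The idea is to have the pump redeliver trail file eaxxx68, because the corresponding target file (t1xxx52) is the one we corrupted by hand. [In real production operation, network jitter, damaged packets, and similar faults can leave the source pump unable to write its file, which breaks the OGG replication link.]
    That is the scenario I originally wanted to simulate, but in this test delivery kept working and it was the apply that failed.
    GGSCI (t1) 40> info dpea
    EXTRACT    DPEA      Last Started 2020-11-10 11:30   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
    Process ID           11117
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                         2020-11-10 13:25:29.000000  RBA 2604

    GGSCI (t1) 47> view param dpea
    extract dpea
    rmthost 10.0.0.32,mgrport 7809, compress
    rmttrail /u01/ogg/base/dirdat/t1
    table YZ.DD;
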
    5) How to handle it? Since the dump file is corrupted, wouldn't it work to have the source pump deliver this seqno file once more? Use etrollover to roll the pump forward!
    GGSCI (t1) 55> alter EXTRACT dpea etrollover
    2020-11-10 13:39:25  INFO    OGG-01520  Rollover performed.  For each affected output trail of Version 10 or higher format,
    after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's
    scan to the new trail file;  it will not happen automatically.
    EXTRACT altered.

    GGSCI (t1) 48> info dpea, detail
    EXTRACT    DPEA      Initialized   2020-11-10 11:30   Status STOPPED
    Checkpoint Lag       00:00:00 (updated 00:01:07 ago)
    Log Read Checkpoint  File /u01/ogg/base/dirdat/ea000000068
                         2020-11-10 13:25:29.000000  RBA 2604
    Target Extract Trails:
    Trail Name                                       Seqno        RBA     Max MB Trail Type
    /u01/ogg/base/dirdat/t1                             53          0         20 RMTTRAIL
    Extract Source                          Begin             End
    /u01/ogg/base/dirdat/ea000000068        * Initialized *   2020-11-10 13:25
    /u01/ogg/base/dirdat/ea000000068        2020-11-10 11:05  2020-11-10 13:25
    /u01/ogg/base/dirdat/ea000000067        2020-10-13 13:24  2020-11-10 11:05
    /u01/ogg/base/dirdat/ea000000066        2020-10-13 13:24  2020-10-13 13:24
    [ogg@t2 ~]$ ls -lrt /u01/ogg/base/dirdat/t1*
    GGSCI (t1) 49> start dpea
    What stands out? The Extract Source list contains two entries for seqno file eaxxx68, where normally there is only one, and both have the same End time. In effect, this seqno file is being delivered again.
    6) Start the Replicat process on the target again
    GGSCI (t2) 52> info repa

    REPLICAT   REPA      Last Started 2020-11-10 11:28   Status ABENDED
    Checkpoint Lag       00:00:00 (updated 01:58:17 ago)
    Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                         2020-11-10 11:34:51.656002  RBA 2378

    GGSCI (t2) 58> start repa
    GGSCI (t2) 59> info repa
    REPLICAT   REPA      Last Started 2020-11-10 13:35   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:10 ago)
    Process ID           12727
    Log Read Checkpoint  File /u01/ogg/base/dirdat/t1000000052
                         2020-11-10 13:25:28.699520  RBA 2698

    SQL> select * from dd;

            ID CC_NAME                        WITTIME
    ---------- ------------------------------ ------------------------------
             3 cc                             10-NOV-20 11.34.50.000000 AM
             2 2                              03-JUN-20 02.34.37.000000 PM
             4 cc                             10-NOV-20 11.37.19.000000 AM
             5 cc                             10-NOV-20 01.25.27.000000 PM
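
The check made by eye in step 5 (the same source trail file appearing twice in the Extract Source list after etrollover) could be scripted roughly as follows. This is a sketch under assumptions: `repeated_sources` is a hypothetical helper, and the sample lines are the Extract Source rows from this test with their spacing normalized, not a guaranteed output format of `info ..., detail`.

```python
from collections import Counter


def repeated_sources(detail_lines):
    """Return source trail files listed more than once, in first-seen order."""
    # The first whitespace-separated field of each row is the source file path.
    sources = [line.split()[0] for line in detail_lines if line.strip()]
    counts = Counter(sources)
    repeated = []
    for source in sources:
        if counts[source] > 1 and source not in repeated:
            repeated.append(source)
    return repeated


# Extract Source rows from step 5 (header line omitted, spacing normalized).
EXTRACT_SOURCE_LINES = [
    "/u01/ogg/base/dirdat/ea000000068        * Initialized *   2020-11-10 13:25",
    "/u01/ogg/base/dirdat/ea000000068        2020-11-10 11:05  2020-11-10 13:25",
    "/u01/ogg/base/dirdat/ea000000067        2020-10-13 13:24  2020-11-10 11:05",
    "/u01/ogg/base/dirdat/ea000000066        2020-10-13 13:24  2020-10-13 13:24",
]

if __name__ == "__main__":
    for source in repeated_sources(EXTRACT_SOURCE_LINES):
        print("listed more than once:", source)
```

A repeated entry only shows that the pump's read position was re-registered for that file; whether the target actually received every row still has to be confirmed by querying the table, as done above.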