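OGG Data Pump Reports It Cannot Open a File and Stops Delivering
1.1 Symptom
A customer once ran into this problem. One day a network fault suddenly hit an OGG replication link. After that, the OGG data pump could no longer deliver data and could not write to its trail file on the target.
The suspected cause was that packet loss or a similar network problem had corrupted the file, so it could no longer be opened, read, or written. The fix was to roll the data pump over with etrollover.
This document tests that etrollover approach. [The test does not faithfully reproduce the network problem; it only exercises the procedure.]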
1.2 Test: how do OGG process restarts relate to the trail seqno?
1) Check the replicated table and the OGG process parameters
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
2 2 03-JUN-20 02.34.37.000000 PM

GGSCI (t1) 4> view param exta
extract exta
USERID ogg,PASSWORD ogg
EXTTRAIL /u01/ogg/base/dirdat/ea
table YZ.DD;

GGSCI (t1) 5> view param dpea
extract dpea
rmthost 10.0.0.32,mgrport 7809, compress
rmttrail /u01/ogg/base/dirdat/t1
table YZ.B;
table YZ.DD;

GGSCI (t1) 7> info exta
EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
Process ID 10744
Log Read Checkpoint Oracle Redo Logs
2020-11-10 11:25:54 Seqno 353, RBA 3917824
SCN 0.3276594 (3276594)

GGSCI (t1) 8> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:05 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
Process ID 10776
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
2020-11-10 11:05:01.669087 RBA 1469

SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
2 2 03-JUN-20 02.34.37.000000 PM

GGSCI (t2) 26> view param repa
replicat repa
userid ogg,password ogg
assumetargetdefs
HANDLECOLLISIONS
discardfile /u01/ogg/base/dirrpt/repa.dsc
MAP YZ.DD ,TARGET BAK_YZ.DD;

GGSCI (t2) 27> info repa
REPLICAT REPA Last Started 2020-11-10 11:20 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
Process ID 11023
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000051
2020-11-10 11:05:01.313791 RBA 1563

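Every `info` output above carries the same few fields. The restarts that follow compare those fields, so it helps to pull them out of the pasted GGSCI text. Below is a minimal sketch: `parse_info` is a hypothetical helper, not an OGG API. It assumes the single-space layout shown in this post and the 9-digit seqno suffix these trail names use (for example `t1000000051`).

```python
import re

# Header line, e.g. "REPLICAT REPA Last Started 2020-11-10 11:20 Status RUNNING"
_HEADER = re.compile(r"^(EXTRACT|REPLICAT)\s+(\S+)\s.*?\bStatus\s+(\S+)", re.MULTILINE)
# Trail checkpoint, e.g. "Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000051"
_CKPT_FILE = re.compile(r"Log Read Checkpoint\s+File\s+(\S+)")
# Relative byte address printed after the checkpoint timestamp
_RBA = re.compile(r"\bRBA\s+(\d+)")


def parse_info(text: str) -> dict:
    """Collect group, status, checkpoint trail file, seqno and RBA from `info` output."""
    info = {}
    header = _HEADER.search(text)
    if header:
        info["type"], info["group"], info["status"] = header.groups()
    ckpt = _CKPT_FILE.search(text)
    if ckpt:
        path = ckpt.group(1)
        info["trail_file"] = path
        # Assumption: the seqno is the last 9 digits of the file name
        info["seqno"] = int(path[-9:])
    rba = _RBA.search(text)
    if rba:
        info["rba"] = int(rba.group(1))
    return info


sample = """REPLICAT REPA Last Started 2020-11-10 11:20 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
Process ID 11023
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000051
2020-11-10 11:05:01.313791 RBA 1563"""
print(parse_info(sample))
```

Some outputs have no `File` checkpoint, such as `info exta`, which reads Oracle redo logs. For those, the result simply has no `trail_file` or `seqno` key.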
2) Restart the replicat on the target: the seqno of the trail file it reads stays the same
GGSCI (t2) 28> stop repa
GGSCI (t2) 29> start repa

3) Restart the data pump on the source: the seqno of the trail file it reads stays the same
GGSCI (t1) 9> stop dpea
GGSCI (t1) 10> start dpea
GGSCI (t1) 13> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
Process ID 11117
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
First Record RBA 1469

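Steps 2 and 3 come down to comparing a checkpoint trail file before and after a restart. Here is a minimal sketch of that comparison. It assumes the same 9-digit seqno suffix as above, and `trail_seqno` and `seqno_change` are hypothetical helpers, not part of GGSCI.

```python
def trail_seqno(path: str) -> int:
    """Seqno encoded in a trail file name such as ea000000067 (9-digit suffix assumed)."""
    suffix = path[-9:]
    if not suffix.isdigit():
        raise ValueError(f"no 9-digit seqno at the end of {path!r}")
    return int(suffix)


def seqno_change(before: str, after: str) -> int:
    """Number of trail files a checkpoint advanced between two snapshots."""
    # Both paths must point at the same trail (same directory and two-character prefix)
    if before[:-9] != after[:-9]:
        raise ValueError("the two checkpoints refer to different trails")
    return trail_seqno(after) - trail_seqno(before)


# dpea before and after its restart in step 3: still reading ea000000067
print(seqno_change("/u01/ogg/base/dirdat/ea000000067",
                   "/u01/ogg/base/dirdat/ea000000067"))
```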
4) Restart the Extract on the source: the seqno of the trail file it writes goes up by 1
GGSCI (t1) 15> info exta, detail
EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
Process ID 10744
Log Read Checkpoint Oracle Redo Logs
2020-11-10 11:30:15 Seqno 353, RBA 3919360
SCN 0.3276690 (3276690)
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
/u01/ogg/base/dirdat/ea 67 1469 20 EXTTRAIL
GGSCI (t1) 16> stop exta
GGSCI (t1) 17> start exta
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
/u01/ogg/base/dirdat/ea 68 1469 20 EXTTRAIL
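The seqno bump in step 4 shows up in the "Target Extract Trails" table (columns Trail Name, Seqno, RBA, Max MB, Trail Type). Below is a minimal sketch that parses those rows from `info ..., detail` output. It assumes single-space-separated columns and paths without spaces, and `parse_target_trails` and `TrailRow` are hypothetical names.

```python
from typing import List, NamedTuple


class TrailRow(NamedTuple):
    name: str
    seqno: int
    rba: int
    max_mb: int
    trail_type: str


def parse_target_trails(text: str) -> List[TrailRow]:
    """Parse the data rows of a "Target Extract Trails" table."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Data rows have five columns and end with the trail type;
        # the header line has more tokens and ends with "Type".
        if len(cols) == 5 and cols[4] in ("EXTTRAIL", "RMTTRAIL"):
            name, seqno, rba, max_mb, trail_type = cols
            rows.append(TrailRow(name, int(seqno), int(rba), int(max_mb), trail_type))
    return rows


before = """Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
/u01/ogg/base/dirdat/ea 67 1469 20 EXTTRAIL"""
after = """Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
/u01/ogg/base/dirdat/ea 68 1469 20 EXTTRAIL"""
print(parse_target_trails(before)[0].seqno, parse_target_trails(after)[0].seqno)
```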
5) Once the source Extract moves to seqno +1, the change propagates down the link: the source pump reads the next file, the pump writes to the next seqno file on the target, and the target replicat reads that next seqno file
GGSCI (t1) 19> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
Process ID 11117
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
2020-11-10 11:31:58.380185 RBA 1469

GGSCI (t2) 45> info repa
REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Process ID 11132
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 11:31:58.035041 RBA 1563

6) On the source (confirm that the OGG link is in sync)
SQL> insert into dd values(3,'cc',sysdate);
SQL> commit;
GGSCI (t1) 22> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:00 ago)
Process ID 11117
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
2020-11-10 11:34:52.000000 RBA 2284

On the target
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM

REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
Process ID 11132
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 11:34:51.656002 RBA 2378
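Confirming the link is in sync amounts to checking that every ID on the source has reached the target. Below is a minimal sketch over pasted SQL*Plus output. It assumes the first column is the numeric ID, as in `dd`, and takes the source ID set {2, 3} from the rows shown in this post. `ids_from_sqlplus` and `missing_on_target` are hypothetical helpers.

```python
from typing import Iterable, List, Set


def ids_from_sqlplus(output: str) -> Set[int]:
    """Collect the ID column from `select * from dd` output pasted from SQL*Plus."""
    ids = set()
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with the numeric ID; header and dash lines do not
        if tokens and tokens[0].isdigit():
            ids.add(int(tokens[0]))
    return ids


def missing_on_target(source_ids: Iterable[int], target_ids: Iterable[int]) -> List[int]:
    """IDs present on the source but not yet applied on the target."""
    return sorted(set(source_ids) - set(target_ids))


target = """ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM"""
print(missing_on_target({2, 3}, ids_from_sqlplus(target)))
```

An empty list means nothing is outstanding. This checks row presence only, not column values.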
1.3 Simulating a corrupted trail (dump) file applied by OGG on the target, and how to handle it
1) Manually modify the trail file
[ogg@t2 ~]$ vi /u01/ogg/base/dirdat/t1000000052
(corrupt the file)

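Editing a binary trail file in vi is hard to repeat exactly. For rerunning this test on a disposable system, a byte-level alternative is sketched below. `flip_byte` is a hypothetical helper that inverts one byte in place, and the offset is whatever you choose. Never run it against a production trail.

```python
def flip_byte(path: str, offset: int) -> int:
    """Invert every bit of the byte at `offset` in `path`, in place.

    Returns the original byte value. Calling it again with the same
    offset restores the file, because x ^ 0xFF ^ 0xFF == x.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)
        if not old:
            raise ValueError(f"offset {offset} is past the end of {path}")
        f.seek(offset)
        f.write(bytes([old[0] ^ 0xFF]))
    return old[0]
```

As an illustration only, `flip_byte("/u01/ogg/base/dirdat/t1000000052", 2378)` would damage the byte at the replicat checkpoint RBA shown in the previous step. Because the change is reversible, the same call also undoes it.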
2) Insert one test row on the source
SQL> insert into dd values(4,'cc',sysdate);
SQL> commit;

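3) The OGG replicat abends
2020-11-10 11:36:59 ERROR OGG-02171 Error reading LCR from data source. Status 509, data source type TrailDataSource.
2020-11-10 11:36:59 ERROR OGG-02191 Incompatible record 101 in /u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.
2020-11-10 11:36:59 ERROR OGG-01668 PROCESSING ABENDED.
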
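One way to notice an abend like this without waiting for someone to run `info repa` is to scan report or error-log lines for OGG message codes. Below is a minimal sketch. It assumes the lines have already been read from a file such as the replicat report or ggserr.log, and `ogg_codes` and `processing_abended` are hypothetical helpers.

```python
import re
from typing import Iterable, List

# OGG message identifiers look like OGG-02171, OGG-01668, ...
_OGG_CODE = re.compile(r"\bOGG-\d{5}\b")


def ogg_codes(lines: Iterable[str]) -> List[str]:
    """Return every OGG message code found, in the order it appears."""
    codes = []
    for line in lines:
        codes.extend(_OGG_CODE.findall(line))
    return codes


def processing_abended(lines: Iterable[str]) -> bool:
    """True if the lines contain OGG-01668 (PROCESSING ABENDED)."""
    return "OGG-01668" in ogg_codes(lines)
```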
4) Insert another test row on the source
SQL> insert into dd values(5,'cc',sysdate);
1 row created.
SQL> commit;
GGSCI (t1) 38> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
Process ID 11117
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
2020-11-10 13:25:29.000000 RBA 2604
At this point, from the source pump's point of view, the trail file eaxxx68 holds the two new insert records.
From the target replicat's point of view, repa already fails on the very first record it tries to apply from the trail file t1xxx52!

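The pump needs to re-deliver the eaxxx68 trail file, whose target-side copy (t1xxx52) we deliberately corrupted. [In real production operations, network jitter, corrupted packets and the like can leave the source pump unable to write the file, which breaks the OGG replication link.]
That is the scenario I originally wanted to simulate, but in this test the delivery itself worked and only the apply failed.
GGSCI (t1) 40> info dpea
EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
Process ID 11117
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
2020-11-10 13:25:29.000000 RBA 2604

GGSCI (t1) 47> view param dpea
extract dpea
rmthost 10.0.0.32,mgrport 7809, compress
rmttrail /u01/ogg/base/dirdat/t1
table YZ.DD;
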
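5) How to handle it? Since the trail file is corrupted, couldn't the source pump simply deliver this seqno file once more? Roll the pump forward with etrollover!
GGSCI (t1) 55> alter EXTRACT dpea etrollover
2020-11-10 13:39:25 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.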
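The OGG-01520 message asks for ALTER EXTSEQNO on the trail's reader, whereas this walkthrough simply restarts repa later on. For anyone who does decide to reposition a reader, here is a minimal sketch that only assembles the GGSCI command text. `alter_extseqno` is a hypothetical helper. The right seqno and RBA depend on what the reader has already applied, which this code cannot know. The 53 in the example is taken from the RMTTRAIL row of `info dpea, detail` after the rollover.

```python
def alter_extseqno(reader_type: str, group: str, seqno: int, rba: int = 0) -> str:
    """Build the GGSCI command text that repositions a trail reader.

    Only formats a string; it does not run GGSCI or check that the group exists.
    """
    reader_type = reader_type.upper()
    # Per OGG-01520, the reader is either a pump EXTRACT or a REPLICAT
    if reader_type not in ("EXTRACT", "REPLICAT"):
        raise ValueError("reader must be EXTRACT or REPLICAT")
    if seqno < 0 or rba < 0:
        raise ValueError("seqno and rba must be non-negative")
    return f"ALTER {reader_type} {group}, EXTSEQNO {seqno}, EXTRBA {rba}"


# Hypothetical: point repa at the start of the new remote trail file
print(alter_extseqno("replicat", "repa", 53))
```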

GGSCI (t1) 48> info dpea, detail
EXTRACT DPEA Initialized 2020-11-10 11:30 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:01:07 ago)
Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
2020-11-10 13:25:29.000000 RBA 2604
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
/u01/ogg/base/dirdat/t1 53 0 20 RMTTRAIL
Extract Source Begin End
/u01/ogg/base/dirdat/ea000000068 * Initialized * 2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000068 2020-11-10 11:05 2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000067 2020-10-13 13:24 2020-11-10 11:05
/u01/ogg/base/dirdat/ea000000066 2020-10-13 13:24 2020-10-13 13:24
[ogg@t2 ~]$ ls -lrt /u01/ogg/base/dirdat/t1*
GGSCI (t1) 49> start dpea
What stands out? The list now contains two entries for eaxxx68, where normally there would be only one, and both have the same End time. In effect, this seqno file is delivered again.
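The telltale sign above is the same trail file listed twice under "Extract Source". Below is a minimal sketch that flags such repeats in pasted `info ..., detail` output. It assumes source rows start with a full path ending in a 9-digit seqno, and `duplicated_sources` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def duplicated_sources(detail_output: str) -> List[str]:
    """Trail files listed more than once under "Extract Source" in `info ..., detail`."""
    names = []
    for line in detail_output.splitlines():
        tokens = line.split()
        # Source rows start with a full trail file path ending in a 9-digit seqno;
        # RMTTRAIL rows (".../t1 53 0 20 RMTTRAIL") do not, so they are skipped
        if tokens and tokens[0].startswith("/") and tokens[0][-9:].isdigit():
            names.append(tokens[0])
    return sorted(name for name, count in Counter(names).items() if count > 1)


detail = """Extract Source Begin End
/u01/ogg/base/dirdat/ea000000068 * Initialized * 2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000068 2020-11-10 11:05 2020-11-10 13:25
/u01/ogg/base/dirdat/ea000000067 2020-10-13 13:24 2020-11-10 11:05
/u01/ogg/base/dirdat/ea000000066 2020-10-13 13:24 2020-10-13 13:24"""
print(duplicated_sources(detail))
```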
6) Start the replicat on the target again
GGSCI (t2) 52> info repa

REPLICAT REPA Last Started 2020-11-10 11:28 Status ABENDED
Checkpoint Lag 00:00:00 (updated 01:58:17 ago)
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 11:34:51.656002 RBA 2378
GGSCI (t2) 58> start repa
GGSCI (t2) 59> info repa
REPLICAT REPA Last Started 2020-11-10 13:35 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:10 ago)
Process ID 12727
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 13:25:28.699520 RBA 2698
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM
4 cc 10-NOV-20 11.37.19.000000 AM
5 cc 10-NOV-20 01.25.27.000000 PM