Problem description
A source table is created with Flink SQL syntax, and flink-mysql-cdc is used to read the MySQL binlog:
CREATE TABLE mysql_binlog (
    user_id STRING NOT NULL,
    birthday INT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'host',
    'port' = '3306',
    'username' = 'root',
    'password' = 'root',
    'database-name' = 'test',
    'scan.startup.mode' = 'initial',
    'table-name' = 'test_0\d*'
)
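Note that the database-name and table-name options determine which MySQL tables are read. Because table-name is a regular expression, it matches several tables. If these two options are changed and the Flink job is stopped and then restored from a savepoint, the following behavior is observed:
- Example: change table-name from test_0\d* to test_\d{2}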
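Table | Matched before the change | Matched after the change | Behavior |
test_0 | √ | × | Read before the change, not read after it |
test_00 | √ | √ | Binlog read both before and after the change |
test_100 | × | × | Binlog not read before or after the change |
test_10 | × | √ | Error! The job stops working but does not terminate |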
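The first three cases behave as expected. For the fourth, one would expect the table to be skipped before the change and read after it, but that is not what happens: after the change, the binlog of test_10 still cannot be read, and the binlogs of the other tables are no longer read either, so the whole Flink job stops doing any useful work.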
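The two "matched" columns in the table above can be reproduced with plain Java regular expressions. This is only a minimal sketch: it assumes the pattern has to match the whole table name (which is what the table implies, since test_100 is not matched by test_\d{2}), and the class and method names are made up for illustration rather than taken from the connector.

```java
import java.util.regex.Pattern;

public class TableNameMatchDemo {

    // Assumption: the whole table name must match the pattern (full match, not a substring search).
    static boolean isIncluded(String tableNamePattern, String tableName) {
        return Pattern.matches(tableNamePattern, tableName);
    }

    public static void main(String[] args) {
        String before = "test_0\\d*";   // table-name before the change
        String after = "test_\\d{2}";   // table-name after the change
        String[] tables = {"test_0", "test_00", "test_100", "test_10"};
        for (String table : tables) {
            System.out.println(table
                    + " before=" + isIncluded(before, table)
                    + " after=" + isIncluded(after, table));
        }
    }
}
```

Only test_10 goes from not matched to matched, and that is the case that triggers the failure.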
The Flink TaskManager log shows the following:
2022-07-08 14:30:41,695 ERROR io.debezium.connector.mysql.MySqlStreamingChangeEventSource [] - Encountered change event 'Event{header=EventHeaderV4{timestamp=1657261841000, eventType=TABLE_MAP, serverId=21915752, headerLength=19, dataLength=43, nextPosition=178324, flags=0}, data=TableMapEventData{tableId=373, database='test', table='test_10', columnTypes=15, 10, columnMetadata=765, 0, columnNullability={1}, eventMetadata=null}}' at offset {transaction_id=null, file=mysql-bin.000062, pos=178190, gtids=23d27819-f2d7-11ec-a644-00163e0eca02:1-93299,3641f1c2-f2d7-11ec-97e8-00163e0e82b7:1-3, server_id=21915752, event=1} for table test.test_10 whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.
Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position=178262 --stop-position=178324 --verbose mysql-bin.000062
2022-07-08 14:30:41,695 ERROR io.debezium.connector.mysql.MySqlStreamingChangeEventSource [] - Error during binlog processing. Last offset stored = null, binlog reader near position = mysql-bin.000062/178262
2022-07-08 14:30:41,696 WARN com.ververica.cdc.connectors.mysql.debezium.task.context.MySqlErrorHandler [] - Schema for table test.test_10 is null
2022-07-08 14:30:41,696 INFO io.debezium.connector.mysql.MySqlStreamingChangeEventSource [] - Error processing binlog event, and propagating to Kafka Connect so it stops this connector. Future binlog events read before connector is shutdown will be ignored.
Investigation
Reading the source of io.debezium.connector.mysql.MySqlStreamingChangeEventSource turns up the following code:
private void informAboutUnknownTableIfRequired(Event event, TableId tableId, String typeToLog) {
    if (tableId != null && connectorConfig.getTableFilters().dataCollectionFilter().isIncluded(tableId)) {
        metrics.onErroneousEvent("source = " + tableId + ", event " + event);
        EventHeaderV4 eventHeader = event.getHeader();
        if (inconsistentSchemaHandlingMode == EventProcessingFailureHandlingMode.FAIL) {
            LOGGER.error(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), eventHeader.getPosition(),
                    eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
            throw new DebeziumException("Encountered change event for table " + tableId
                    + " whose schema isn't known to this connector");
        }
        else if (inconsistentSchemaHandlingMode == EventProcessingFailureHandlingMode.WARN) {
            LOGGER.warn(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "The event will be ignored.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), System.lineSeparator(),
                    eventHeader.getPosition(), eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
        }
        else {
            LOGGER.debug(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "The event will be ignored.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), System.lineSeparator(),
                    eventHeader.getPosition(), eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
        }
    }
    else {
        LOGGER.debug("Filtering {} event: {} for non-monitored table {}", typeToLog, event, tableId);
        metrics.onFilteredEvent("source = " + tableId);
    }
}
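This method is entered when an event arrives for a table whose previously recorded schema the connector cannot find. Because the Debezium option inconsistent.schema.handling.mode is set to FAIL (which is the default), an exception is thrown.
The exception is caught in the handleEvent method, which then clears all eventHandlers. This is why the whole Flink job stops processing events without actually stopping: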
// ---------------------other code------------------------
try {
    // Forward the event to the handler ...
    eventHandlers.getOrDefault(eventType, this::ignoreEvent).accept(event);
    // ---------------------other code------------------------
}
catch (RuntimeException e) {
    // There was an error in the event handler, so propagate the failure to Kafka Connect ...
    logStreamingSourceState();
    errorHandler.setProducerThrowable(new DebeziumException("Error processing binlog event", e));
    // Do not stop the client, since Kafka Connect should stop the connector on it's own
    // (and doing it here may cause problems the second time it is stopped).
    // We can clear the listeners though so that we ignore all future events ...
    eventHandlers.clear();
    LOGGER.info(
            "Error processing binlog event, and propagating to Kafka Connect so it stops this connector. Future binlog events read before connector is shutdown will be ignored.");
}
// ---------------------other code------------------------
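The effect of that catch block can be shown with a small standalone model. The sketch below is not Debezium code: the class ClearedHandlersDemo, the string event types, and the producerThrowable field are hypothetical stand-ins, and the only part mirrored from the excerpt is the pattern of looking up a handler with getOrDefault and calling eventHandlers.clear() on failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ClearedHandlersDemo {
    private final Map<String, Consumer<String>> eventHandlers = new HashMap<>();
    final List<String> processed = new ArrayList<>();
    Throwable producerThrowable; // hypothetical stand-in for errorHandler.setProducerThrowable(...)

    ClearedHandlersDemo() {
        eventHandlers.put("WRITE_ROWS", this::handleInsert);
    }

    // Simulates a handler that fails for a table whose schema is unknown.
    private void handleInsert(String table) {
        if (table.equals("test_10")) {
            throw new IllegalStateException("schema isn't known for table " + table);
        }
        processed.add(table);
    }

    private void ignoreEvent(String table) {
        // Intentionally does nothing, like a no-op fallback handler.
    }

    void handleEvent(String eventType, String table) {
        try {
            eventHandlers.getOrDefault(eventType, this::ignoreEvent).accept(table);
        } catch (RuntimeException e) {
            producerThrowable = e;   // the failure is recorded ...
            eventHandlers.clear();   // ... and every later event falls through to ignoreEvent
        }
    }

    public static void main(String[] args) {
        ClearedHandlersDemo reader = new ClearedHandlersDemo();
        reader.handleEvent("WRITE_ROWS", "test_00"); // processed
        reader.handleEvent("WRITE_ROWS", "test_10"); // throws, handlers are cleared
        reader.handleEvent("WRITE_ROWS", "test_00"); // silently ignored
        System.out.println(reader.processed);        // prints [test_00]
    }
}
```

After the failing event, the reader keeps receiving events but drops all of them, which matches the observed symptom: no more data from any table, yet the job keeps running.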
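Solution
If you do not want the whole Flink job to stall, add the debezium.inconsistent.schema.handling.mode option to the flink-mysql-cdc connector and set it to warn or another value other than fail. (Note: any Debezium option can be set by prefixing its name with debezium.)
Even with this option set, however, the binlog of test_10 still cannot be read; it merely no longer affects the other tables. The underlying reason is that the Flink job's state contains no information about this table.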
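To make the debezium. prefix convention concrete, the sketch below shows one way prefixed table options could be turned into plain Debezium properties. It is an illustration under assumptions: DebeziumPrefixDemo and extractDebeziumProperties are hypothetical names, not the connector's actual implementation, and only the option key and the value warn come from the text above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class DebeziumPrefixDemo {
    static final String DEBEZIUM_PREFIX = "debezium.";

    // Hypothetical helper: keeps only the options that start with "debezium." and strips the prefix.
    static Properties extractDebeziumProperties(Map<String, String> tableOptions) {
        Properties props = new Properties();
        for (Map.Entry<String, String> option : tableOptions.entrySet()) {
            if (option.getKey().startsWith(DEBEZIUM_PREFIX)) {
                String debeziumKey = option.getKey().substring(DEBEZIUM_PREFIX.length());
                props.setProperty(debeziumKey, option.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // The same options that would appear in the WITH (...) clause of the source table.
        Map<String, String> tableOptions = new LinkedHashMap<>();
        tableOptions.put("connector", "mysql-cdc");
        tableOptions.put("table-name", "test_\\d{2}");
        tableOptions.put("debezium.inconsistent.schema.handling.mode", "warn");

        Properties debeziumProps = extractDebeziumProperties(tableOptions);
        System.out.println(debeziumProps.getProperty("inconsistent.schema.handling.mode"));
    }
}
```

In the table definition itself, this corresponds to adding one more entry, 'debezium.inconsistent.schema.handling.mode' = 'warn', to the WITH clause.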
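Summary
For a table's information to be present in the flink-mysql-cdc state, one of the following must hold:
- The table already exists when the Flink job starts for the first time and initializes, and it falls within the configured table-name pattern.
- When the Flink job takes a savepoint, the table information in the current state is saved. If the job is later started from that savepoint, all tables are read from the savepoint, and MySQL is no longer queried for the list of tables.
- The Flink job is running or has run before (so a binlog offset already exists), and a new table that falls within the configured table-name pattern is then created. The binlog will contain the corresponding table-creation event, and Flink adds the table to its state when it reads that event.
However, if the table-name pattern configured for flink-mysql-cdc is changed so that an additional table newly falls within it, that table cannot be added to the state by any of these routes.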