问题描述

使用flink sql语法创建源表,使用flink-mysql-cdc读取mysql的binlog:

CREATE TABLE mysql_binlog (
    user_id STRING NOT NULL,
    birthday INT,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'host',
    'port' = '3306',
    'username' = 'root',
    'password' = 'root',
    'database-name' = 'test',
    'scan.startup.mode' = 'initial',
    'table-name' = 'test_0\d*'
)

注意其中的database-nametable-name配置确定了读取mysql中的哪些表,由于使用了正则匹配,因此该配置包含了多张表,如果修改这两个配置,并且将flink作业停止后从savepoint恢复,有以下现象:

  • 举例:将table-nametest_0\d*修改为test_\d{2}


配置修改前是否包含

配置修改后是否包含

现象

test_0


×

修改前读取,修改后不会读取

test_00



修改前后都读取该表的binlog

test_100

×

×

修改前后都不会读取该表的binlog

test_10

×


出现异常!作业失效但不会停止

前三种情况符合我们的预期,第四种情况按理来说应该是:修改前不会读取,修改后读取,但并非如此,修改配置后表test_10仍然无法读取binlog,并且其他表的binlog也不再读取了,整个flink作业失效

查看flink TaskManager日志,发现如下内容:

2022-07-08 14:30:41,695 ERROR io.debezium.connector.mysql.MySqlStreamingChangeEventSource  [] - Encountered change event 'Event{header=EventHeaderV4{timestamp=1657261841000, eventType=TABLE_MAP, serverId=21915752, headerLength=19, dataLength=43, nextPosition=178324, flags=0}, data=TableMapEventData{tableId=373, database='test', table='test_10', columnTypes=15, 10, columnMetadata=765, 0, columnNullability={1}, eventMetadata=null}}' at offset {transaction_id=null, file=mysql-bin.000062, pos=178190, gtids=23d27819-f2d7-11ec-a644-00163e0eca02:1-93299,3641f1c2-f2d7-11ec-97e8-00163e0e82b7:1-3, server_id=21915752, event=1} for table test.test_10 whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.
Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position=178262 --stop-position=178324 --verbose mysql-bin.000062
2022-07-08 14:30:41,695 ERROR io.debezium.connector.mysql.MySqlStreamingChangeEventSource  [] - Error during binlog processing. Last offset stored = null, binlog reader near position = mysql-bin.000062/178262
2022-07-08 14:30:41,696 WARN  com.ververica.cdc.connectors.mysql.debezium.task.context.MySqlErrorHandler [] - Schema for table test.test_10 is null
2022-07-08 14:30:41,696 INFO  io.debezium.connector.mysql.MySqlStreamingChangeEventSource  [] - Error processing binlog event, and propagating to Kafka Connect so it stops this connector. Future binlog events read before connector is shutdown will be ignored.

问题排查

通过阅读源码io.debezium.connector.mysql.MySqlStreamingChangeEventSource发现以下代码:

private void informAboutUnknownTableIfRequired(Event event, TableId tableId, String typeToLog) {
    if (tableId != null && connectorConfig.getTableFilters().dataCollectionFilter().isIncluded(tableId)) {
        metrics.onErroneousEvent("source = " + tableId + ", event " + event);
        EventHeaderV4 eventHeader = event.getHeader();

        if (inconsistentSchemaHandlingMode == EventProcessingFailureHandlingMode.FAIL) {
            LOGGER.error(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), eventHeader.getPosition(),
                    eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
            throw new DebeziumException("Encountered change event for table " + tableId
                    + " whose schema isn't known to this connector");
        }
        else if (inconsistentSchemaHandlingMode == EventProcessingFailureHandlingMode.WARN) {
            LOGGER.warn(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "The event will be ignored.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), System.lineSeparator(),
                    eventHeader.getPosition(), eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
        }
        else {
            LOGGER.debug(
                    "Encountered change event '{}' at offset {} for table {} whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.{}"
                            + "The event will be ignored.{}"
                            + "Use the mysqlbinlog tool to view the problematic event: mysqlbinlog --start-position={} --stop-position={} --verbose {}",
                    event, offsetContext.getOffset(), tableId, System.lineSeparator(), System.lineSeparator(),
                    eventHeader.getPosition(), eventHeader.getNextPosition(), offsetContext.getSource().binlogFilename());
        }
    }
    else {
        LOGGER.debug("Filtering {} event: {} for non-monitored table {}", typeToLog, event, tableId);
        metrics.onFilteredEvent("source = " + tableId);
    }
}

当发现一个表但却找不到这个表的过往状态的时候,会进入这个方法,并且由于debezium的参数inconsistent.schema.handling.mode设置成FAIL(默认就是FAIL),因此会抛出异常。

抛出的异常会在handleEvent方法被捕获,并且会清空所有的eventHandlers,这也就导致了整个flink作业失效且不会停止:

// ---------------------other code------------------------
        try {
            // Forward the event to the handler ...
            eventHandlers.getOrDefault(eventType, this::ignoreEvent).accept(event);
        // ---------------------other code------------------------
        }
        catch (RuntimeException e) {
            // There was an error in the event handler, so propagate the failure to Kafka Connect ...
            logStreamingSourceState();
            errorHandler.setProducerThrowable(new DebeziumException("Error processing binlog event", e));
            // Do not stop the client, since Kafka Connect should stop the connector on it's own
            // (and doing it here may cause problems the second time it is stopped).
            // We can clear the listeners though so that we ignore all future events ...
            eventHandlers.clear();
            LOGGER.info(
                    "Error processing binlog event, and propagating to Kafka Connect so it stops this connector. Future binlog events read before connector is shutdown will be ignored.");
        }
        // ---------------------other code------------------------

问题解决

如果不希望整个flink作业失效,可以在flink-mysql-cdc连接器里添加debezium.inconsistent.schema.handling.mode参数(注意:所有debezium的参数都可以通过添加debezium.参数设置),设置成warn或其他即可。

但即使设置了如上参数,test_10这个表的binlog依然会无法读取,只是不再影响别的表而已。究其原因是因为flink作业的状态中没有这个表的信息。

总结

如果希望flink-mysql-cdc的状态中有一个表的信息,有以下几种方式:

  • flink作业初次启动初始化的时候这个表已经存在,且属于配置的表名范围内
  • flink作业进行savepoint的时候会将当前的状态中的表信息保留起来,下次启动如果从savepoint启动那么就会从savepoint中读取所有的表,而不再会读取mysql中有哪些表
  • flink运行中或曾经启动过(已经有binlog的offset),然后再创建新的属于配置的表名范围内的表,那么binlog中将会有对应的建表日志,flink在读到建表日志后也会将该表读取到状态中

而如果修改了flink-mysql-cdc配置的表名范围,导致出现一个新的表,那么无论如何都无法将该表加入到状态中。