CoProcessFunction 简介

对于连接流ConnectedStreams的处理操作,需要分别定义对两条流的处理转换,因此接口中就会有两个相同的方法需要实现,用数字“1”“2”区分,在两条流中的数据到来时分别调用。我们把这种接口叫作“协同处理函数”(co-process function)。与CoMapFunction类似,如果是调用.flatMap()就需要传入一个CoFlatMapFunction,需要实现flatMap1()、flatMap2()两个方法;而调用.process()时,传入的则是一个CoProcessFunction。抽象类CoProcessFunction在源码中定义如下:

@PublicEvolving
public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {

    private static final long serialVersionUID = 1L;

    /**
     * This method is called for each element in the first of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     *     timers and querying the time. The context is only valid during the invocation of this
     *     method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     *     and go into recovery.
     */
    public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * This method is called for each element in the second of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     *     timers and querying the time. The context is only valid during the invocation of this
     *     method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     *     and go into recovery.
     */
    public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * Called when a timer set using {@link TimerService} fires.
     *
     * @param timestamp The timestamp of the firing timer.
     * @param ctx An {@link OnTimerContext} that allows querying the timestamp of the firing timer,
     *     querying the {@link TimeDomain} of the firing timer and getting a {@link TimerService}
     *     for registering timers and querying the time. The context is only valid during the
     *     invocation of this method, do not store it.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *     operation to fail and may trigger recovery.
     */
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}

    /**
     * Information available in an invocation of {@link #processElement1(Object, Context,
     * Collector)}/ {@link #processElement2(Object, Context, Collector)} or {@link #onTimer(long,
     * OnTimerContext, Collector)}.
     */
    public abstract class Context {

        /**
         * Timestamp of the element currently being processed or timestamp of a firing timer.
         *
         * <p>This might be {@code null}, for example if the time characteristic of your program is
         * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
         */
        public abstract Long timestamp();

        /** A {@link TimerService} for querying time and registering timers. */
        public abstract TimerService timerService();

        /**
         * Emits a record to the side output identified by the {@link OutputTag}.
         *
         * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
         * @param value The record to emit.
         */
        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }

    /**
     * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class OnTimerContext extends Context {
        /** The {@link TimeDomain} of the firing timer. */
        public abstract TimeDomain timeDomain();
    }
}

可以看到,很明显CoProcessFunction也是“处理函数”家族中的一员,用法非常相似。它需要实现的就是processElement1()、processElement2()两个方法,在每个数据到来时,会根据来源的流调用其中的一个方法进行处理。CoProcessFunction同样可以通过上下文ctx来访问timestamp、水位线,并通过TimerService注册定时器;另外也提供了.onTimer()方法,用于定义定时触发的处理操作。下面是CoProcessFunction的一个具体示例:我们可以实现一个实时对账的需求,也就是app的支付操作和第三方的支付操作的一个双流Join。App的支付事件和第三方的支付事件将会互相等待5秒钟,如果等不来对应的支付事件,那么就输出报警信息.

参考代码

/**
 * 实时对账 demo
 */
public class BillCheckExample0828 {
    public static void main(String[] args) throws Exception {
        //1、获取执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //1.1、便于测试,测试环境设置并行度为 1,生产环境记得设置为 kafka topic 的分区数
        env.setParallelism(1);
        //2、读取数据 并 声明水位线
        //2.1、模拟来自app 的数据 appStream
        SingleOutputStreamOperator<Tuple3<String, String, Long>> appStream = env.fromElements(
                Tuple3.of("order-1", "app", 1000L),
                Tuple3.of("order-2", "app", 2000L),
                Tuple3.of("order-3", "app", 3500L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                        return element.f2;
                    }
                }));

        //2.2、模拟来自第三方支付平台的数据  
        SingleOutputStreamOperator<Tuple4<String, String, String, Long>> thirdPartStream = env.fromElements(
                Tuple4.of("order-1", "third-party", "success", 3000L),
                Tuple4.of("order-3", "third-party", "success", 4000L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple4<String, String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple4<String, String, String, Long> element, long recordTimestamp) {
                        return element.f3;
                    }
                }));
        
        //3、调用实现 CoProcessFunction 的静态类 检查同一支付单,是否两条流种是否匹配
        appStream.connect(thirdPartStream).keyBy(data -> data.f0, data -> data.f0)
                .process(new OrderMatchResult0828())
                .print();


        env.execute();
    }

    /**
     * 自定义实现 CoProcessFunction
     */
    public static class OrderMatchResult0828 extends CoProcessFunction<Tuple3<String, String, Long>, Tuple4<String, String, String, Long>, String> {

        //定义状态,保存已经到达的状态
        private ValueState<Tuple3<String, String, Long>> appEventState;
        private ValueState<Tuple4<String, String, String, Long>> thirdPartyEventState;

        @Override
        public void open(Configuration parameters) throws Exception {
            appEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple3<String, String, Long>>("app-state", Types.TUPLE(Types.STRING, Types.STRING, Types.LONG))
            );

            thirdPartyEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple4<String, String, String, Long>>("thirt-party-state", Types.TUPLE(Types.STRING, Types.STRING, Types.STRING, Types.LONG))
            );
        }

        @Override
        public void processElement1(Tuple3<String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
            //来的时 app 数据,查看 第三方数据是否来过
            if (thirdPartyEventState.value() != null) {
                out.collect("对账成功:" + value + " " + thirdPartyEventState.value());
                //对账成功后可以清空状态
                thirdPartyEventState.clear();
            } else {
                //更新状态 更新 app
                appEventState.update(value);
                //定义注册定时器,等待另一条流的数据
                ctx.timerService().registerEventTimeTimer(value.f2 + 5000L); //等待 5s
            }
        }

        @Override
        public void processElement2(Tuple4<String, String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
            //来的时 app 数据,查看 第三方数据是否来过
            if (appEventState.value() != null) {
                out.collect("对账成功:" + appEventState.value() + " " + value);
                //对账成功后可以清空状态
                appEventState.clear();
            } else {
                //更新状态 更新 app
                thirdPartyEventState.update(value);
                //定义注册定时器,等待另一条流的数据
                ctx.timerService().registerEventTimeTimer(value.f3 + 5000L); //等待 5s
            }
        }

        //定时触发
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            //如果某个状态不为空,说明另一方流差数据
            if (appEventState.value() != null) {
                out.collect("对账失败 " + appEventState.value() + " 第三方差数据");
            }
            if (thirdPartyEventState.value() != null) {
                out.collect("对账失败 " + thirdPartyEventState.value() + " app差数据");
            }

            //清空数据
            appEventState.clear();
            thirdPartyEventState.clear();

        }
    }
}

运行效果

对账成功:(order-1,app,1000) (order-1,third-party,success,3000)
对账成功:(order-3,app,3500) (order-3,third-party,success,4000)
对账失败 (order-2,app,2000) 第三方差数据