Introduction to CoProcessFunction
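Processing a ConnectedStreams requires a separate transformation for each of the two streams, so the interface declares two analogous methods, distinguished by the suffixes "1" and "2". Each one is called when an element arrives on the corresponding stream. Such an interface is called a "co-process function". As with CoMapFunction, calling .flatMap() requires a CoFlatMapFunction that implements flatMap1() and flatMap2(), whereas calling .process() takes a CoProcessFunction. The abstract class CoProcessFunction is defined in the Flink source as follows: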
@PublicEvolving
public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {

    private static final long serialVersionUID = 1L;

    /**
     * This method is called for each element in the first of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     * {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     * timers and querying the time. The context is only valid during the invocation of this
     * method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     * and go into recovery.
     */
    public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * This method is called for each element in the second of the connected streams.
     *
     * <p>This function can output zero or more elements using the {@link Collector} parameter and
     * also update internal state or set timers using the {@link Context} parameter.
     *
     * @param value The stream element
     * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
     * {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
     * timers and querying the time. The context is only valid during the invocation of this
     * method, do not store it.
     * @param out The collector to emit resulting elements to
     * @throws Exception The function may throw exceptions which cause the streaming program to fail
     * and go into recovery.
     */
    public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out)
            throws Exception;

    /**
     * Called when a timer set using {@link TimerService} fires.
     *
     * @param timestamp The timestamp of the firing timer.
     * @param ctx An {@link OnTimerContext} that allows querying the timestamp of the firing timer,
     * querying the {@link TimeDomain} of the firing timer and getting a {@link TimerService}
     * for registering timers and querying the time. The context is only valid during the
     * invocation of this method, do not store it.
     * @param out The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     * operation to fail and may trigger recovery.
     */
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}

    /**
     * Information available in an invocation of {@link #processElement1(Object, Context,
     * Collector)}/ {@link #processElement2(Object, Context, Collector)} or {@link #onTimer(long,
     * OnTimerContext, Collector)}.
     */
    public abstract class Context {

        /**
         * Timestamp of the element currently being processed or timestamp of a firing timer.
         *
         * <p>This might be {@code null}, for example if the time characteristic of your program is
         * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
         */
        public abstract Long timestamp();

        /** A {@link TimerService} for querying time and registering timers. */
        public abstract TimerService timerService();

        /**
         * Emits a record to the side output identified by the {@link OutputTag}.
         *
         * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
         * @param value The record to emit.
         */
        public abstract <X> void output(OutputTag<X> outputTag, X value);
    }

    /**
     * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
     */
    public abstract class OnTimerContext extends Context {
        /** The {@link TimeDomain} of the firing timer. */
        public abstract TimeDomain timeDomain();
    }
}
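To make the two-method calling convention concrete, here is a minimal plain-Java sketch. It is not Flink code: TwoInputFunction, TaggingFunction and CoDispatchSketch are hypothetical names introduced only for illustration, and a plain List stands in for Flink's Collector. It shows only that one function instance receives elements from both inputs through separate methods, so both methods can read and update the same fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the two-method calling convention; not Flink code. */
final class CoDispatchSketch {

    /** Hypothetical stand-in for a co-function: one method per input stream. */
    interface TwoInputFunction<IN1, IN2, OUT> {
        void processElement1(IN1 value, List<OUT> out);

        void processElement2(IN2 value, List<OUT> out);
    }

    /** Counts elements across both inputs to show that the two methods share one instance. */
    static final class TaggingFunction implements TwoInputFunction<Integer, String, String> {
        private int seen = 0; // shared by both methods

        @Override
        public void processElement1(Integer value, List<String> out) {
            seen++;
            out.add("#" + seen + " from stream 1: " + value);
        }

        @Override
        public void processElement2(String value, List<String> out) {
            seen++;
            out.add("#" + seen + " from stream 2: " + value);
        }
    }

    /** Feeds a fixed interleaving of the two inputs through one function instance. */
    static List<String> demo() {
        TaggingFunction fn = new TaggingFunction();
        List<String> out = new ArrayList<>();
        fn.processElement1(1, out);   // an element arrives on the first stream
        fn.processElement2("a", out); // an element arrives on the second stream
        fn.processElement1(2, out);   // another element on the first stream
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In Flink the runtime decides which method to call based on the input an element arrived on, and shared data is normally kept in managed state rather than in a plain field; the field here only keeps the sketch short.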
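As the definition shows, CoProcessFunction is a member of the "process function" family and is used in much the same way. You implement processElement1() and processElement2(); for each arriving element, one of the two is called depending on which stream the element came from. Through the context ctx, CoProcessFunction can likewise read the element's timestamp and, via the TimerService, query the current watermark and register timers (timers require a keyed stream, which is why the example below applies keyBy). It also provides the .onTimer() method, which defines what happens when a timer fires.
The following is a concrete example of CoProcessFunction: a real-time reconciliation job, that is, a two-stream join between the app's payment events and the third-party platform's payment events. Each app payment event and each third-party payment event waits 5 seconds for its counterpart; if the matching event does not arrive within that time, an alert is emitted.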
Reference code
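import java.time.Duration;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

/**
 * Real-time reconciliation demo
 */
public class BillCheckExample0828 {
    public static void main(String[] args) throws Exception {
        // 1. Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // 1.1 Parallelism 1 keeps testing simple; in production, set it to the number of Kafka topic partitions
        env.setParallelism(1);
        // 2. Read the data and declare the watermark strategy
        // 2.1 Simulated data from the app: (order id, source, event time in ms)
        SingleOutputStreamOperator<Tuple3<String, String, Long>> appStream = env.fromElements(
                Tuple3.of("order-1", "app", 1000L),
                Tuple3.of("order-2", "app", 2000L),
                Tuple3.of("order-3", "app", 3500L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                        return element.f2;
                    }
                }));
        // 2.2 Simulated data from the third-party payment platform: (order id, source, status, event time in ms)
        SingleOutputStreamOperator<Tuple4<String, String, String, Long>> thirdPartStream = env.fromElements(
                Tuple4.of("order-1", "third-party", "success", 3000L),
                Tuple4.of("order-3", "third-party", "success", 4000L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple4<String, String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple4<String, String, String, Long> element, long recordTimestamp) {
                        return element.f3;
                    }
                }));
        // 3. Connect the two streams, key both by order id, and use the CoProcessFunction
        //    below to check whether each payment order appears in both streams
        appStream.connect(thirdPartStream).keyBy(data -> data.f0, data -> data.f0)
                .process(new OrderMatchResult0828())
                .print();
        env.execute();
    }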
    /**
     * Custom CoProcessFunction implementation
     */
    public static class OrderMatchResult0828 extends CoProcessFunction<Tuple3<String, String, Long>, Tuple4<String, String, String, Long>, String> {
        // Keyed state holding the event that has already arrived for the current order id
        private ValueState<Tuple3<String, String, Long>> appEventState;
        private ValueState<Tuple4<String, String, String, Long>> thirdPartyEventState;

        @Override
        public void open(Configuration parameters) throws Exception {
            appEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple3<String, String, Long>>("app-state", Types.TUPLE(Types.STRING, Types.STRING, Types.LONG))
            );
            thirdPartyEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple4<String, String, String, Long>>("third-party-state", Types.TUPLE(Types.STRING, Types.STRING, Types.STRING, Types.LONG))
            );
        }
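        @Override
        public void processElement1(Tuple3<String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
            // An app event arrived: check whether the third-party event is already here
            Tuple4<String, String, String, Long> thirdParty = thirdPartyEventState.value();
            if (thirdParty != null) {
                out.collect("Reconciliation succeeded: " + value + " " + thirdParty);
                // The pair is complete: cancel the timer set by the third-party event and clear its state
                ctx.timerService().deleteEventTimeTimer(thirdParty.f3 + 5000L);
                thirdPartyEventState.clear();
            } else {
                // Remember the app event
                appEventState.update(value);
                // Register a timer and wait 5 s (event time) for the other stream
                ctx.timerService().registerEventTimeTimer(value.f2 + 5000L);
            }
        }

        @Override
        public void processElement2(Tuple4<String, String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
            // A third-party event arrived: check whether the app event is already here
            Tuple3<String, String, Long> app = appEventState.value();
            if (app != null) {
                out.collect("Reconciliation succeeded: " + app + " " + value);
                // The pair is complete: cancel the timer set by the app event and clear its state
                ctx.timerService().deleteEventTimeTimer(app.f2 + 5000L);
                appEventState.clear();
            } else {
                // Remember the third-party event
                thirdPartyEventState.update(value);
                // Register a timer and wait 5 s (event time) for the other stream
                ctx.timerService().registerEventTimeTimer(value.f3 + 5000L);
            }
        }

        // Called when a timer fires
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            // A state that is still set means the other stream never delivered its event
            if (appEventState.value() != null) {
                out.collect("Reconciliation failed: " + appEventState.value() + " is missing its third-party record");
            }
            if (thirdPartyEventState.value() != null) {
                out.collect("Reconciliation failed: " + thirdPartyEventState.value() + " is missing its app record");
            }
            // Clear both states
            appEventState.clear();
            thirdPartyEventState.clear();
        }
    }
}
Output
Reconciliation succeeded: (order-1,app,1000) (order-1,third-party,success,3000)
Reconciliation succeeded: (order-3,app,3500) (order-3,third-party,success,4000)
Reconciliation failed: (order-2,app,2000) is missing its third-party record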
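The matching rule itself can be checked without a Flink cluster. The following is a rough plain-Java sketch of that rule, not a description of how Flink schedules timers internally. ReconcileSketch, Event, onEvent and advanceWatermark are hypothetical names, two maps stand in for the two keyed states, and the sketch assumes that the five events arrive merged in event-time order, that the watermark trails each arriving event by 1 ms, and that the end of the bounded input advances the watermark to Long.MAX_VALUE so that every remaining timer fires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the reconciliation rule; hypothetical names, no Flink dependency. */
final class ReconcileSketch {

    static final long WAIT_MS = 5000L;

    /** One payment event: order id, whether it came from the app stream, event time in ms. */
    static final class Event {
        final String orderId;
        final boolean fromApp;
        final long ts;

        Event(String orderId, boolean fromApp, long ts) {
            this.orderId = orderId;
            this.fromApp = fromApp;
            this.ts = ts;
        }
    }

    private final Map<String, Event> pendingApp = new HashMap<>();      // stands in for the app state
    private final Map<String, Event> pendingThird = new HashMap<>();    // stands in for the third-party state
    private final TreeMap<Long, List<String>> timers = new TreeMap<>(); // fire time -> order ids
    private final List<String> out = new ArrayList<>();

    /** Handles one arriving event from either stream. */
    void onEvent(Event e) {
        advanceWatermark(e.ts - 1); // assumption: the watermark trails the newest event by 1 ms
        Map<String, Event> mine = e.fromApp ? pendingApp : pendingThird;
        Map<String, Event> other = e.fromApp ? pendingThird : pendingApp;
        if (other.remove(e.orderId) != null) {
            out.add("matched " + e.orderId); // the counterpart was already waiting
        } else {
            mine.put(e.orderId, e);          // wait for the counterpart
            timers.computeIfAbsent(e.ts + WAIT_MS, k -> new ArrayList<>()).add(e.orderId);
        }
    }

    /** Fires every timer whose time is at or before the watermark. */
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            for (String orderId : timers.pollFirstEntry().getValue()) {
                if (pendingApp.remove(orderId) != null) {
                    out.add("unmatched " + orderId + ": no third-party event");
                }
                if (pendingThird.remove(orderId) != null) {
                    out.add("unmatched " + orderId + ": no app event");
                }
            }
        }
    }

    /** Replays the five sample events, then ends the input. */
    static List<String> demo() {
        ReconcileSketch s = new ReconcileSketch();
        s.onEvent(new Event("order-1", true, 1000L));
        s.onEvent(new Event("order-2", true, 2000L));
        s.onEvent(new Event("order-1", false, 3000L));
        s.onEvent(new Event("order-3", true, 3500L));
        s.onEvent(new Event("order-3", false, 4000L));
        s.advanceWatermark(Long.MAX_VALUE); // end of bounded input
        return s.out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Under these assumptions the sketch yields the same three outcomes as the job: order-1 and order-3 are matched, and order-2 is reported only once the input ends, because its timer at 7000 ms lies beyond the last event time in the sample data (4000 ms).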