Contents

  • I. inner join
  • II. sliding-inner-join
  • III. session-inner-join
  • IV. left-join
  • V. interval-join



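I. inner join

Elements from two streams are connected with join. The where and equalTo key selectors decide which elements match; elements whose keys are equal and that fall into the same window trigger the subsequent window operation.

1. Start nc

Open two ports to simulate two data sources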

nc -lp 8888
nc -lp 8899

2. Example

@Test
    public void joinTumblingTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // Stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Join the two streams
        stream1.join(stream2)
                // Key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Tumbling window, size 10 ms
                .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("join");
        env.execute("joinTumblingTest");
    }

3. Test

Stream 1

a,1
a,5
b,6
a,10

Stream 2

a,7
a,8
a,11

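Because the tumbling window size is 10 ms, the elements with timestamps in [0, 10) ms trigger the join when that window closes, which happens once both streams have received a timestamp of 10 ms or later (a,10 and a,11 here).

Result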

join> (a,1,7)
join> (a,1,8)
join> (a,5,7)
join> (a,5,8)
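
To make the window mechanics concrete, here is a minimal plain-Java sketch (no Flink dependency) that reproduces the result above. The class name TumblingJoinSketch and its helper methods are made up for illustration; the sketch only models window assignment and per-key pairing, not watermarks or state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: models a tumbling-window inner join.
final class TumblingJoinSketch {

    // Start of the tumbling window that contains ts (assumes non-negative timestamps).
    static long windowStart(long ts, long size) {
        return ts - (ts % size);
    }

    // Pairs {key, timestamp} records that share a key and fall into the window starting at start.
    static List<String> joinWindow(String[][] left, String[][] right, long start, long size) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            long lts = Long.parseLong(l[1]);
            if (windowStart(lts, size) != start) {
                continue; // left element belongs to another window
            }
            for (String[] r : right) {
                long rts = Long.parseLong(r[1]);
                if (windowStart(rts, size) == start && l[0].equals(r[0])) {
                    out.add("(" + l[0] + "," + lts + "," + rts + ")");
                }
            }
        }
        return out;
    }

    // The test input from this section, joined over the window [0, 10).
    static List<String> demo() {
        String[][] stream1 = {{"a", "1"}, {"a", "5"}, {"b", "6"}, {"a", "10"}};
        String[][] stream2 = {{"a", "7"}, {"a", "8"}, {"a", "11"}};
        return joinWindow(stream1, stream2, 0, 10);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running main prints the same four tuples as the Flink job. b,6 has no partner with key b, and a,10 and a,11 belong to the next window [10, 20), which has not closed yet.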

II. sliding-inner-join

The following tests an inner join over a sliding window.

1. Example

The sliding window size is 4 ms and the slide is 2 ms.

@Test
    public void joinSlidingTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // Stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Join the two streams
        stream1.join(stream2)
                // Key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Sliding window, size 4 ms, slide 2 ms
                .window(SlidingEventTimeWindows.of(Time.milliseconds(4),Time.milliseconds(2)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("sliding-inner-join");
        env.execute("joinSlidingTest");
    }

2. Test

Stream 1 input: a,2 and a,4
Stream 2 input: a,3 and a,4

Once the first 4 ms window [0, 4) closes, the join fires and prints sliding-inner-join> (a,2,3)
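
Why only (a,2,3) shows up becomes clearer when you list the windows each timestamp belongs to. The following is a small plain-Java sketch of sliding-window assignment; SlidingWindowSketch and windowsFor are hypothetical names used for illustration, and the sketch assumes non-negative timestamps and a window offset of 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: lists the sliding windows that contain a timestamp.
final class SlidingWindowSketch {

    // Returns every [start, end) window of the given size and slide that contains ts.
    static List<long[]> windowsFor(long ts, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = ts - (ts % slide); // latest window start that is <= ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        for (long ts : new long[] {2, 3, 4}) {
            StringBuilder sb = new StringBuilder("ts=" + ts + " ->");
            for (long[] w : windowsFor(ts, 4, 2)) {
                sb.append(" [").append(w[0]).append(",").append(w[1]).append(")");
            }
            System.out.println(sb);
        }
    }
}
```

With size 4 and slide 2, timestamps 2 and 3 both fall into [0, 4) and [2, 6), while timestamp 4 falls into [2, 6) and [4, 8). So the window [0, 4) holds only a,2 from stream 1 and a,3 from stream 2.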

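Next

Stream 1 input: a,5 and a,6

Stream 2 input: a,5 and a,6

Event time has now advanced by one slide interval (2 ms), so the next window, [2, 6) ms, closes and the data inside it triggers the join.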
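
Which windows have closed at this point can be worked out by hand. The sketch below is plain Java and rests on three assumptions about Flink's behavior as I understand it: a monotonous-timestamp watermark trails the largest timestamp seen by 1 ms, a two-input operator follows the smaller of its two input watermarks, and an event-time window [start, end) fires once the watermark reaches end - 1. The class and method names are invented for illustration.

```java
// Hypothetical helper, not part of Flink: models when an event-time window fires.
final class WatermarkFiringSketch {

    // Assumption: the watermark is one below the largest timestamp seen so far.
    static long watermark(long maxTimestampSeen) {
        return maxTimestampSeen - 1;
    }

    // Assumption: a two-input operator uses the smaller of its input watermarks.
    static long combined(long watermark1, long watermark2) {
        return Math.min(watermark1, watermark2);
    }

    // Assumption: a window [start, end) fires once the watermark reaches end - 1.
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        // Both streams have seen a,6 as their largest timestamp.
        long wm = combined(watermark(6), watermark(6));
        System.out.println("watermark = " + wm);
        System.out.println("[0,4) fired: " + fires(4, wm));
        System.out.println("[2,6) fired: " + fires(6, wm));
        System.out.println("[4,8) fired: " + fires(8, wm));
    }
}
```

Under these assumptions the watermark is 5, so [0, 4) and [2, 6) have fired while [4, 8) is still open and waits for a timestamp of 8 or later on both streams.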

III. session-inner-join

The following tests an inner join over a session window.

1. Example

Session window with a gap of 10 ms

@Test
    public void joinSessionTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // Stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Join the two streams
        stream1.join(stream2)
                // Key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Session window, gap 10 ms
                .window(EventTimeSessionWindows.withGap(Time.milliseconds(10)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("session-inner-join");
        env.execute("joinSessionTest");
    }

2. Test

Stream 1 input: a,3 and a,14
Stream 2 input: a,5 and a,20
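
How a 10 ms gap groups these timestamps can be sketched in plain Java. This is a simplified model, not Flink's implementation: it assumes that elements of both streams with the same key end up in the same set of session windows, that each element starts a window [ts, ts + gap), and that overlapping or touching windows are merged. SessionMergeSketch and sessions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: merges per-element windows into sessions.
final class SessionMergeSketch {

    // Each timestamp opens [ts, ts + gap); windows that overlap or touch are merged.
    static List<long[]> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);
        List<long[]> merged = new ArrayList<>();
        for (long ts : sorted) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && ts <= last[1]) {
                last[1] = Math.max(last[1], ts + gap); // extend the current session
            } else {
                merged.add(new long[] {ts, ts + gap}); // start a new session
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Timestamps of key "a": 3 and 14 from stream 1, 5 and 20 from stream 2.
        for (long[] s : sessions(new long[] {3, 14, 5, 20}, 10)) {
            System.out.println("[" + s[0] + "," + s[1] + ")");
        }
    }
}
```

Under this model the four timestamps chain into a single session [3, 30), because each one arrives less than 10 ms after the previous one. A session like that only closes after both streams receive a timestamp of 30 or later.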

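IV. left-join

A left join of the two data streams, implemented with coGroup because join only emits matched pairs.

1. Example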

@Test
    public void leftJoinTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
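        // Stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        // Parse "key,timestamp" as in the earlier examples
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                // Event-time windows need timestamps and watermarks
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // Stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        // Parse "key,timestamp" as in the earlier examples
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                // Event-time windows need timestamps and watermarks
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));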
        // Co-group the two streams
        stream1.coGroup(stream2)
                // Key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // Tumbling window, size 10 ms
                .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
                .apply(new CoGroupFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public void coGroup(Iterable<Tuple2<String, Integer>> first, Iterable<Tuple2<String, Integer>> second, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
                        // Left join: emit every left element at least once
                        for (Tuple2<String, Integer> left : first) {
                            boolean isJoin = false;
                            for (Tuple2<String, Integer> right : second) {
                                isJoin = true;
                                out.collect(new Tuple3<>(left.f0, left.f1, right.f1));
                            }
                            // No matching element on the right side
                            if (!isJoin) {
                                out.collect(new Tuple3<>(left.f0, left.f1, null));
                            }
                        }

                    }
                })
                .print("left join");
        env.execute("coGroupTest");
    }

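2. Test

Once the incoming timestamps advance past the end of a 10 ms window, that window closes and the left join output is emitted. Left elements without a match in the window are emitted with null on the right side.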
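
The pairing logic inside coGroup can be tried out without Flink. Below is a minimal plain-Java sketch of a left outer join over the elements that one key contributes to one window; LeftJoinSketch and leftJoin are hypothetical names, and the input values are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink: left outer join for one key in one window.
final class LeftJoinSketch {

    // Emits (left,right) for every pair, or (left,null) when the right side is empty.
    static List<String> leftJoin(List<Integer> left, List<Integer> right) {
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            if (right.isEmpty()) {
                out.add("(" + l + ",null)"); // no match on the right side
            }
            for (Integer r : right) {
                out.add("(" + l + "," + r + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up values: two left elements and one right element in the same window.
        System.out.println(leftJoin(Arrays.asList(1, 5), Arrays.asList(7)));
        // Made-up values: a left element whose key has no right-side data.
        System.out.println(leftJoin(Arrays.asList(6), Collections.<Integer>emptyList()));
    }
}
```

The first call prints [(1,7), (5,7)] and the second prints [(6,null)], which is the shape of output the CoGroupFunction above produces per key and window.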

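V. interval-join

Elements of two streams a and b are joined when
b.timestamp ∈ [a.timestamp + lowerBound; a.timestamp + upperBound]
or, equivalently,
a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound
That is, an element of b joins with an element of a when b's timestamp lies between the lower and upper bounds measured from a's timestamp.

1. Example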

@Test
    public void intervalJoinTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // Stream 1
        KeyedStream<Tuple2<String, Integer>, String> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1))
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                });
        // Stream 2
        KeyedStream<Tuple2<String, Integer>, String> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1))
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                });
        // Join the two streams
        stream1.intervalJoin(stream2)
                // Use event time
                .inEventTime()
                // Define the lower and upper bounds
                .between(Time.milliseconds(-2), Time.milliseconds(2))
                // Exclude the lower bound itself
                .lowerBoundExclusive()
                .process(new ProcessJoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String,Integer, Integer>>() {
                    @Override
                    public void processElement(Tuple2<String, Integer> left, Tuple2<String, Integer> right, Context ctx, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
                        out.collect(new Tuple3<>(left.f0, left.f1, right.f1));
                    }
                })
                .print("interval-join");
        env.execute("intervalJoinTest");
    }

2. Test

Stream 1 input

a,5

Stream 2 input

a,3
a,4
a,6
a,7
a,8

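With the lower and upper bounds set to -2 and 2, elements of stream 2 whose timestamps lie between 5-2 and 5+2 join with a,5. Because lowerBoundExclusive() excludes the lower bound itself, a,3 is left out, so a,4, a,6 and a,7 are joined.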
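
The bound check is easy to verify by hand. Here is a short plain-Java sketch of the interval condition applied to the test input; IntervalJoinSketch, matches and matching are hypothetical names, and the exclusivity flags mirror lowerBoundExclusive() from the example (the upper bound stays inclusive).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: evaluates the interval-join condition.
final class IntervalJoinSketch {

    // True if b lies between a + lower and a + upper, honoring the exclusivity flags.
    static boolean matches(long a, long b, long lower, long upper,
                           boolean lowerExclusive, boolean upperExclusive) {
        long lo = a + lower;
        long hi = a + upper;
        boolean aboveLower = lowerExclusive ? b > lo : b >= lo;
        boolean belowUpper = upperExclusive ? b < hi : b <= hi;
        return aboveLower && belowUpper;
    }

    // Right-side timestamps that join with left timestamp a for bounds -2 and 2.
    static List<Long> matching(long a, long[] rights) {
        List<Long> out = new ArrayList<>();
        for (long b : rights) {
            // Lower bound exclusive, upper bound inclusive, as in the example
            if (matches(a, b, -2, 2, true, false)) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matching(5, new long[] {3, 4, 6, 7, 8}));
    }
}
```

For a,5 this prints [4, 6, 7]: timestamp 3 sits exactly on the excluded lower bound and 8 is above the upper bound.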
