Table of Contents

1. Preface

2. Windows

2.1 Group Windows

2.1.1 Tumbling Windows

2.1.2 Sliding Windows

2.1.3 Session Windows

2.2 Over Windows

2.2.1 Unbounded Over Windows

2.2.2 Bounded Over Windows

2.3 Defining Windows in SQL

2.3.1 Group Windows

2.3.2 Over Windows

3. Code Summary

4. Code Exercise (Group Tumbling Window Example)


1. Preface

Time semantics only come into play in combination with window operations; their main use is, of course, opening windows and computing over time ranges. Below we look at how to use time attributes for window operations in the Table API and SQL. The Table API and SQL provide two main kinds of windows: Group Windows and Over Windows.


2. Windows

2.1 Group Windows

Group windows aggregate rows into finite groups based on time or row-count intervals and evaluate an aggregate function once per group. In the Table API, group windows are defined with the .window(w: GroupWindow) clause and must be given an alias with the as clause. To group the table by window, the window alias must be referenced in the groupBy clause just like a regular grouping field.

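A minimal sketch of this pattern, reusing the sensor table from the full example later in this article (dataTable with fields id and temp, and event-time attribute rt):

Table result = dataTable
        .window(Tumble.over("10.seconds").on("rt").as("tw"))  // define the window and give it the alias "tw"
        .groupBy("id, tw")                                    // reference the alias like a regular grouping field
        .select("id, id.count, temp.avg");                    // the aggregate runs once per group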

2.1.1 Tumbling Windows


Tumbling windows (Tumble) have a fixed length and do not overlap. The code looks like this:

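A sketch of the three Tumble variants (w, rowtime, and proctime are placeholder names for the window alias and the table's time attributes):

// Tumbling event-time window
.window(Tumble.over("10.minutes").on("rowtime").as("w"));

// Tumbling processing-time window
.window(Tumble.over("10.minutes").on("proctime").as("w"));

// Tumbling row-count window (requires a processing-time attribute)
.window(Tumble.over("10.rows").on("proctime").as("w"));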

2.1.2 Sliding Windows


Sliding windows (Slide) have a fixed length and advance by a specified slide interval, so they can overlap. The code looks like this:

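A sketch of the Slide variants, with the same placeholder names; over sets the window size and every sets the slide interval:

// Sliding event-time window
.window(Slide.over("10.minutes").every("5.minutes").on("rowtime").as("w"));

// Sliding processing-time window
.window(Slide.over("10.minutes").every("5.minutes").on("proctime").as("w"));

// Sliding row-count window
.window(Slide.over("10.rows").every("5.rows").on("proctime").as("w"));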

2.1.3 Session Windows


Session windows (Session) have no fixed length; a window closes once no data arrives for a specified gap. The code looks like this:

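A sketch of the Session variants, where withGap sets the inactivity gap that closes a window:

// Session event-time window
.window(Session.withGap("10.minutes").on("rowtime").as("w"));

// Session processing-time window
.window(Session.withGap("10.minutes").on("proctime").as("w"));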

2.2 Over Windows

Over windows can be defined on event time or processing time, over a range specified either as a time interval or as a row count. An unbounded over window is specified with a constant: UNBOUNDED_RANGE for a time-interval range, or UNBOUNDED_ROW for a row-count range. A bounded over window is specified by the size of the interval.

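The general pattern, as a sketch against the sensor table used later (fields id and temp, event-time attribute rt): the window is defined with the Over class, aliased, and then referenced with over in the select clause:

Table result = dataTable
        .window(Over.partitionBy("id").orderBy("rt").preceding("unbounded_range").as("w"))
        .select("id, temp.avg over w, temp.max over w");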

2.2.1 Unbounded Over Windows

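A sketch of the unbounded variants (rowtime and proctime again stand in for the table's time attributes):

// Unbounded event-time over window
.window(Over.partitionBy("id").orderBy("rowtime").preceding("unbounded_range").as("w"));

// Unbounded processing-time over window
.window(Over.partitionBy("id").orderBy("proctime").preceding("unbounded_range").as("w"));

// Unbounded event-time row-count over window
.window(Over.partitionBy("id").orderBy("rowtime").preceding("unbounded_row").as("w"));

// Unbounded processing-time row-count over window
.window(Over.partitionBy("id").orderBy("proctime").preceding("unbounded_row").as("w"));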

2.2.2 Bounded Over Windows

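A sketch of the bounded variants, where a concrete interval replaces the unbounded constant:

// Bounded event-time over window: 1 minute preceding the current row
.window(Over.partitionBy("id").orderBy("rowtime").preceding("1.minutes").as("w"));

// Bounded processing-time over window
.window(Over.partitionBy("id").orderBy("proctime").preceding("1.minutes").as("w"));

// Bounded event-time row-count over window: 10 rows preceding the current row
.window(Over.partitionBy("id").orderBy("rowtime").preceding("10.rows").as("w"));

// Bounded processing-time row-count over window
.window(Over.partitionBy("id").orderBy("proctime").preceding("10.rows").as("w"));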

2.3 Defining Windows in SQL

We have seen how windows are invoked in the Table API; in the same way, we can define and use windows directly in SQL.

2.3.1 Group Windows

Group windows are defined in the GROUP BY clause of a SQL query. Just like a query with a regular GROUP BY clause, a query with a group window computes a single result row per group.

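The window is expressed with a group window function in the GROUP BY clause: TUMBLE(time_attr, interval), HOP(time_attr, slide, size) for sliding windows, or SESSION(time_attr, gap). Helper functions such as TUMBLE_START and TUMBLE_END extract the window boundaries. A sketch against the sensor table registered in the next section:

// Tumbling window: one result row per id per 10-second window
Table tumbleResult = tableEnv.sqlQuery(
        "select id, count(id), tumble_end(rt, interval '10' second) " +
        "from sensor group by id, tumble(rt, interval '10' second)");

// Sliding window: 10-second windows evaluated every 5 seconds
Table hopResult = tableEnv.sqlQuery(
        "select id, count(id), hop_end(rt, interval '5' second, interval '10' second) " +
        "from sensor group by id, hop(rt, interval '5' second, interval '10' second)");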

2.3.2 Over Windows

Since OVER is syntax that SQL supports natively, this counts as a basic aggregation in SQL. All aggregates must be defined over the same window, that is, with the same partitioning, ordering, and range. Currently, only ranges that precede the current row are supported (both unbounded and bounded). Note that ORDER BY must be specified on a single time attribute.

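A sketch against the same sensor table; the window can be written inline or factored out with a WINDOW clause, as the full example below also shows:

// Inline OVER clause
Table overResult = tableEnv.sqlQuery(
        "select id, rt, count(id) over (partition by id order by rt rows between 2 preceding and current row) " +
        "from sensor");

// Equivalent form with a named window
Table namedOverResult = tableEnv.sqlQuery(
        "select id, rt, count(id) over w from sensor " +
        "window w as (partition by id order by rt rows between 2 preceding and current row)");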

3. Code Summary

import com.atguigu.apitest.beans.SensorReading;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.Over;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableTest5_TimeAndWindow {
    public static void main(String[] args) throws Exception {
        // 1. Create the environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // 2. Read file data into a DataStream
        DataStream<String> inputStream = env.readTextFile("D:\\Projects\\BigData\\FlinkTutorial\\src\\main\\resources\\sensor.txt");

        // 3. Convert to POJOs
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        })
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(2)) {
                    @Override
                    public long extractTimestamp(SensorReading element) {
                        return element.getTimestamp() * 1000L;
                    }
                });

        // 4. Convert the stream into a Table, defining the time attribute
//        Table dataTable = tableEnv.fromDataStream(dataStream, "id, timestamp as ts, temperature as temp, pt.proctime");
        Table dataTable = tableEnv.fromDataStream(dataStream, "id, timestamp as ts, temperature as temp, rt.rowtime");

        tableEnv.createTemporaryView("sensor", dataTable);

        // 5. Window operations
        // 5.1 Group Window
        // table API
        Table resultTable = dataTable.window(Tumble.over("10.seconds").on("rt").as("tw"))
                .groupBy("id, tw")
                .select("id, id.count, temp.avg, tw.end");

        // SQL
        Table resultSqlTable = tableEnv.sqlQuery("select id, count(id) as cnt, avg(temp) as avgTemp, tumble_end(rt, interval '10' second) " +
                "from sensor group by id, tumble(rt, interval '10' second)");

        // 5.2 Over Window
        // table API
        Table overResult = dataTable.window(Over.partitionBy("id").orderBy("rt").preceding("2.rows").as("ow"))
                .select("id, rt, id.count over ow, temp.avg over ow");

        // SQL
        Table overSqlResult = tableEnv.sqlQuery("select id, rt, count(id) over ow, avg(temp) over ow " +
                " from sensor " +
                " window ow as (partition by id order by rt rows between 2 preceding and current row)");

//        dataTable.printSchema();
//        tableEnv.toAppendStream(resultTable, Row.class).print("result");
//        tableEnv.toRetractStream(resultSqlTable, Row.class).print("sql");
        tableEnv.toAppendStream(overResult, Row.class).print("result");
        tableEnv.toRetractStream(overSqlResult, Row.class).print("sql");

        env.execute();
    }
}

4. Code Exercise (Group Tumbling Window Example)

Pulling together what we have learned, we can implement a concrete requirement with a complete piece of code: for example, open a tumbling window and count, for each sensor, the number of readings that appear within 10 seconds.

package com.ucas.windows;

import com.ucas.beans.SensorReading;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

/**
 * @author GONG
 * @version 1.0
 * @date 2021/3/14 23:01
 */
public class WindowDemo {
    public static void main(String[] args) throws Exception {
        // Register environment settings: use the Blink planner in streaming mode
        EnvironmentSettings mySetting = EnvironmentSettings
                .newInstance()
//                .useOldPlanner()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        //1. Set up the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Set the parallelism and use event-time semantics
        env.setParallelism(1).setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        //3. Create the stream table environment from the execution environment
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env, mySetting);


        //4. Path of the input file
        String filePath = "D:\\IDEA_WORK\\flinkstudy\\src\\main\\resources\\sensor.txt";


        //5. Read the text file with the stream execution environment
        DataStreamSource<String> inputStream = env.readTextFile(filePath);
        //6. Convert to Java beans and assign timestamps and watermarks
        SingleOutputStreamOperator<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(1)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                return sensorReading.getTimestamp() * 1000L;
            }
        });

        //7. Convert the DataStream into a Table
        Table sensorTable = tableEnvironment.fromDataStream(dataStream, "id,timestamp.rowtime as ts,temperature");

        //8. Windowed aggregation with the Table API
        Table selectTable = sensorTable.window(Tumble.over("10.seconds").on("ts").as("tw"))
                .groupBy("id,tw")
                .select("id,id.count,temperature.avg,tw.end");
        //9. The same query in SQL
        tableEnvironment.createTemporaryView("sensor", sensorTable);
        Table resultSqlTable = tableEnvironment.sqlQuery(
                "select id, count(id), avg(temperature), tumble_end(ts, interval '10' second) " + "from sensor group by id, tumble(ts, interval '10' second)");
        //10. Convert back to streams and print
        tableEnvironment.toAppendStream(selectTable, Row.class).print("select:");
        tableEnvironment.toAppendStream(resultSqlTable, Row.class).print("sql:");
        //11. Execute the job
        env.execute("time and window test");
    }
}
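
The SensorReading POJO used by both examples:
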
package com.ucas.beans;

// Data type for sensor temperature readings
public class SensorReading {
    // Fields: id, timestamp, temperature
    private String id;
    private Long timestamp;
    private Double temperature;

    public SensorReading() {
    }

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    public Double getTemperature() {
        return temperature;
    }

    public void setTemperature(Double temperature) {
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "SensorReading{" +
                "id='" + id + '\'' +
                ", timestamp=" + timestamp +
                ", temperature=" + temperature +
                '}';
    }
}