1. Time Attributes

Time-based operations (such as time windows) require defining the relevant time semantics and where the time information comes from. In the Table API and SQL, a table is given a dedicated logical time field that is used to indicate time within table programs.

So-called time attributes are simply part of each table's schema. A time attribute can be defined directly as a field in the CREATE TABLE DDL, or it can be declared when a DataStream is converted into a table. Once defined, it can be referenced like any ordinary field and used in time-based operations.
The data type of a time attribute is TIMESTAMP; it behaves like a regular timestamp and can be accessed and computed on directly.

Depending on the time semantics, time attributes can be defined in two ways: event time and processing time.

1.1 Event Time

The main purpose of event-time semantics is handling out-of-order or late events. Watermarks are used to mark the progress of event time, and a watermark can be generated from the maximum observed timestamp minus a configured delay. This way, even when data arrives out of order, processing can still produce correct results.

Defining it in the CREATE TABLE DDL

CREATE TABLE EventTable(
  `user` STRING,
  url STRING,
  ts TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  ...
);

Defining it when converting a DataStream into a table

// Method 1:
// The stream's element type is Tuple2 with two fields; the timestamp extraction
// and watermark generation have to be supplied by hand
DataStream<Tuple2<String, String>> stream =
    inputStream.assignTimestampsAndWatermarks(...);
// Declare an extra logical field as the event-time attribute
Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").rowtime());

// Method 2:
// The stream's element type is Tuple3, and the last field is the event timestamp
DataStream<Tuple3<String, String, Long>> stream =
    inputStream.assignTimestampsAndWatermarks(...);
// No extra field is declared; the last field itself becomes the event-time attribute
Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").rowtime());

A complete example combining both approaches:

public class TimeAndWindowTest01 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);

        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);
        // Define the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "user_name STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000))," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // Define the time attribute when converting a stream into a table
        SingleOutputStreamOperator<Event> stream = environment.addSource(new ClickSource())
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long l) {
                                return event.timestamp;
                            }
                        })
                );
        // Event time: declare "et" as an extra rowtime attribute
        Table table = tableEnvironment.fromDataStream(stream, $("user"), $("url"), $("timestamp").as("ts"),
                $("et").rowtime());
        tableEnvironment.toDataStream(table).print();
        table.printSchema();

        environment.execute();
    }
}

The schema and output printed under event-time semantics (screenshot omitted).

1.2 Processing Time

Processing time needs no watermarks or timestamp extraction; the time attribute is declared by calling the built-in PROCTIME() function, either as a computed column in the DDL or via .proctime() when converting a stream.

Defining it in the CREATE TABLE DDL:

CREATE TABLE EventTable(
  `user` STRING,
  url STRING,
  ts AS PROCTIME()
) WITH (
  ...
);

Defining it when converting a DataStream into a table:

Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").proctime());

2. Windows

  • Group windows (Group Window, legacy)
  • Windowing table-valued functions (Windowing TVFs, newer versions); a minimal syntax sketch follows this list
    Tumbling Windows
    Hop Windows (sliding)
    Cumulate Windows
    Session Windows
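
As a concrete reference for the TVF style, below is a minimal sketch that applies a tumbling windowing TVF without any aggregation; it reuses the clickTable, its event-time column et, and the tableEnvironment from the example in section 1 (those names are assumptions carried over from that example). The TVF simply appends window_start and window_end columns to every input row:

// Windowing TVF without aggregation: each input row is tagged with its window bounds
Table windowedRows = tableEnvironment.sqlQuery(
        "SELECT user_name, url, window_start, window_end " +
        "FROM TABLE(" +
        "  TUMBLE(TABLE clickTable, DESCRIPTOR(et), INTERVAL '10' SECOND)" +
        ")");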

A cumulative window (Cumulate Window) fires at a regular step while accumulating from a fixed start, up to a maximum window size (illustration omitted).

3. Aggregation Queries

This covers the aggregations specific to stream processing (mainly window aggregations) as well as SQL's native aggregation queries.

  • Group aggregation
  • Group window aggregation (legacy window syntax)
  • Window TVF aggregation

3.1 Group Aggregation

Table table1 = tableEnvironment.sqlQuery("select user_name, count(1) from clickTable group by user_name");
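
A group aggregation produces an updating (changelog) result, so it cannot be converted with toDataStream; a minimal sketch of printing it, assuming the tableEnvironment and table1 defined above:

// Group aggregation results are updating, so they must go through a changelog stream
tableEnvironment.toChangelogStream(table1).print();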

3.2 Group Window Aggregation (legacy)

Group aggregation combined with a tumbling group window:

Table table2 = tableEnvironment.sqlQuery("select " +
        "user_name, count(1) as cnt," +
        "TUMBLE_END(et, INTERVAL '10' SECOND) AS entT " +
        "from clickTable " +
        "group by user_name, " +
        "TUMBLE(et, INTERVAL '10' SECOND)");

3.3 Window TVF Aggregation

  • Tumbling window aggregation
  • Sliding (hop) window aggregation
  • Cumulate window aggregation (a sketch follows the two queries below)

// Window aggregation: tumbling window
Table tumbleWindowResultTable = tableEnvironment.sqlQuery("select user_name, count(1) as cnt," +
        "window_end as entT " +
        "from TABLE(" +
        "TUMBLE(TABLE clickTable, DESCRIPTOR(et), INTERVAL '10' SECOND)" +
        ") " +
        "GROUP BY user_name, window_end, window_start");
// Sliding (hop) window
Table hopWindowResultTable = tableEnvironment.sqlQuery("select user_name, count(1) as cnt," +
        "window_end as entT " +
        "from TABLE(" +
        "HOP(TABLE clickTable, DESCRIPTOR(et), INTERVAL '5' SECOND, INTERVAL '10' SECOND)" +
        ") " +
        "GROUP BY user_name, window_end, window_start");

4. TopN Examples

4.1 Plain Top 2

1. Plain TopN: toChangelogStream

public class TopNExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);

        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);
        // Define the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "`user` STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000))," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // TopN: rank users by click count and keep the top 2
        Table topNTable = tableEnvironment.sqlQuery("SELECT `user`, cnt, row_num " +
                "FROM (" +
                " SELECT *, ROW_NUMBER() OVER (" +
                " ORDER BY cnt DESC" +
                ") AS row_num FROM " +
                "( SELECT `user`, COUNT(url) AS cnt FROM clickTable GROUP BY `user`)" +
                ") WHERE row_num <= 2");
        tableEnvironment.toChangelogStream(topNTable).print();

        environment.execute();
    }
}

Top 2 result (screenshot omitted).

4.2 Windowed Top 2

2. Windowed TopN: toDataStream

public class TopNExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);

        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);
        // Define the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "`user` STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000))," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // Per-window click counts from a tumbling windowing TVF
        String subQuery = " SELECT `user`, COUNT(url) AS cnt, window_start, window_end" +
                " FROM TABLE (" +
                " TUMBLE(TABLE clickTable, DESCRIPTOR(et), INTERVAL '10' SECOND)" +
                ")" +
                " GROUP BY `user`, window_start, window_end";

        // Windowed TopN: rank within each window and keep the top 2
        Table windowTopNTable = tableEnvironment.sqlQuery("SELECT `user`, cnt, row_num, window_end " +
                "FROM (" +
                " SELECT *, ROW_NUMBER() OVER (" +
                " PARTITION BY window_start, window_end " +
                " ORDER BY cnt DESC" +
                ") AS row_num FROM " +
                "(" + subQuery + ")" +
                ") WHERE row_num <= 2");

        // A windowed TopN result is insert-only, so it can be converted with toDataStream
        tableEnvironment.toDataStream(windowTopNTable).print();

        environment.execute();
    }
}

Top 2 result per window (screenshot omitted).