Flink SQL and Table API (2)
Original work by 51CTO blog author 959_1x.
1. Time Attributes
Time-based operations (such as time windows) need information about the time semantics and about where the time data comes from. In the Table API and SQL, a table is given a dedicated logical time field that indicates time inside a table program.
So a time attribute is really part of each table's schema. It can be defined directly as a field in the CREATE TABLE DDL, or when a DataStream is converted into a table. Once defined, a time attribute can be referenced like an ordinary field and used in time-based operations.
The data type of a time attribute is TIMESTAMP; it behaves like a regular timestamp and can be accessed and computed with directly.
Depending on the time semantics, time attributes come in two flavors: event time and processing time.
1.1 Event Time
The main use of event-time semantics is handling out-of-order or late events. We track the progress of event time with watermarks, which are generated from the maximum event timestamp seen so far minus a fixed delay. Even when data arrives out of order, processing can then still produce correct results.
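That watermark rule can be modeled in a few lines of plain Java. This is an illustrative sketch, not Flink's actual WatermarkGenerator interface: the watermark always trails the largest timestamp seen so far by a fixed delay, and a late event never moves it backwards.

```java
// Minimal model of bounded-out-of-orderness watermarking (illustrative only).
public class BoundedWatermark {
    private final long delayMillis;
    private long maxTimestamp = Long.MIN_VALUE;

    public BoundedWatermark(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called for every event: track the maximum event timestamp seen so far.
    public void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Current watermark = max timestamp seen - fixed delay.
    public long currentWatermark() {
        return maxTimestamp - delayMillis;
    }
}
```

With a 5-second delay, an event at t=10s advances the watermark to 5s; a later event at t=7s (out of order) leaves it unchanged.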
Defining it in the CREATE TABLE DDL:
CREATE TABLE EventTable(
  `user` STRING,
  url STRING,
  ts TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  ...
);
Defining it when converting a DataStream into a table:
// Option 1:
// the stream's element type is Tuple2 with two fields; timestamps are
// extracted and watermarks generated with a custom strategy
DataStream<Tuple2<String, String>> stream =
    inputStream.assignTimestampsAndWatermarks(...);
// declare an extra logical field as the event-time attribute
Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").rowtime());

// Option 2:
// the stream's element type is Tuple3, and its last field is the event timestamp
DataStream<Tuple3<String, String, Long>> stream =
    inputStream.assignTimestampsAndWatermarks(...);
// no extra field is declared; the existing last field becomes the event-time attribute
Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").rowtime());
public class TimeAndWindowTest01 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);

        // defining the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "user_name STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000) )," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // defining the time attribute when converting a stream to a table
        SingleOutputStreamOperator<Event> stream = environment.addSource(new ClickSource())
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ZERO)
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long recordTimestamp) {
                                return event.timestamp;
                            }
                        })
                );
        // event-time attribute
        Table table = tableEnvironment.fromDataStream(stream, $("user"), $("url"), $("timestamp").as("ts"),
                $("et").rowtime());
        tableEnvironment.toDataStream(table).print();
        table.printSchema();
        environment.execute();
    }
}
Output printed according to the event-time semantics:
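The computed column et AS TO_TIMESTAMP(FROM_UNIXTIME(ts / 1000)) turns an epoch-millisecond BIGINT into a TIMESTAMP(3). The same conversion can be sketched in plain Java. Note one assumption: Flink's FROM_UNIXTIME formats in the session time zone, so this sketch takes the zone as an explicit parameter to stay deterministic.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class EpochToTimestamp {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Mirrors FROM_UNIXTIME(ts / 1000): epoch millis -> epoch seconds
    // (integer division drops the sub-second part) -> "yyyy-MM-dd HH:mm:ss".
    public static String fromUnixtime(long epochMillis, ZoneId zone) {
        long epochSeconds = epochMillis / 1000;
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSeconds), zone).format(FMT);
    }
}
```

Because of the integer division, sub-second precision of ts is lost, which is why the watermark delay is expressed in whole seconds here.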

1.2 Processing Time
Processing time needs no watermarks; we simply declare an extra field and fill it with the local system time via the built-in PROCTIME() function.
CREATE TABLE EventTable(
  `user` STRING,
  url STRING,
  ts AS PROCTIME()
) WITH (
  ...
);
When converting a DataStream into a table, append an extra field with .proctime():
Table table = tEnv.fromDataStream(stream, $("user"), $("url"),
    $("ts").proctime());
2. Windows
- Group windows (legacy API)
- Windowing TVFs (table-valued functions, newer API):
Tumbling windows
Hop windows (sliding)
Cumulate windows
Session windows

3. Aggregation Queries
This covers the aggregations specific to stream processing (mainly window aggregations) as well as the aggregation queries native to SQL.
3.1 Group Aggregation
Table table1 = tableEnvironment.sqlQuery("select user_name, count(1) from clickTable group by user_name");
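In streaming mode this group aggregation is stateful: the job keeps one counter per user_name and emits an updated result for every input row, producing an update-style changelog. A minimal plain-Java sketch of that state (illustrative only, not Flink's accumulator API):

```java
import java.util.HashMap;
import java.util.Map;

// State behind "SELECT user_name, count(1) FROM clickTable GROUP BY user_name":
// each incoming row increments a per-key counter and the updated count is emitted.
public class GroupCount {
    private final Map<String, Long> counts = new HashMap<>();

    // Process one row for the given key; the return value corresponds to the
    // updated ("+U") record a changelog stream would carry downstream.
    public long accumulate(String key) {
        return counts.merge(key, 1L, Long::sum);
    }
}
```

Each call both updates the state and yields the new aggregate, which is why such a query cannot be emitted as a plain insert-only stream.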
3.2 Group Window Aggregation (legacy)
Grouping combined with window aggregation, using the legacy group-window functions:
Table table2 = tableEnvironment.sqlQuery("select " +
        "user_name, count(1) as cnt, " +
        "TUMBLE_END(et, INTERVAL '10' SECOND) AS endT " +
        "from clickTable " +
        "group by user_name, " +
        "TUMBLE(et, INTERVAL '10' SECOND)");
3.3 Window TVF Aggregation
// window TVF aggregation: tumbling window
Table tumbleWindowResultTable = tableEnvironment.sqlQuery("select user_name, count(1) as cnt, " +
        "window_end as endT " +
        "from TABLE(" +
        "TUMBLE(TABLE clickTable, DESCRIPTOR(et), INTERVAL '10' SECOND)" +
        ") " +
        "GROUP BY user_name, window_end, window_start");
// hop (sliding) window
Table hopWindowResultTable = tableEnvironment.sqlQuery("select user_name, count(1) as cnt, " +
        "window_end as endT " +
        "from TABLE(" +
        "HOP(TABLE clickTable, DESCRIPTOR(et), INTERVAL '5' SECOND, INTERVAL '10' SECOND)" +
        ") " +
        "GROUP BY user_name, window_end, window_start");
4. Top-N Examples
4.1 Plain Top 2
1. Plain Top-N, emitted with toChangelogStream:
public class TopNExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);

        // defining the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "`user` STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000) )," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // `user` must stay backtick-quoted: USER is a reserved keyword in Flink SQL
        Table topNTable = tableEnvironment.sqlQuery("SELECT `user`, cnt, row_num " +
                "FROM (" +
                " SELECT *, ROW_NUMBER() OVER (" +
                " ORDER BY cnt DESC" +
                ") AS row_num FROM " +
                "( SELECT `user`, COUNT(url) AS cnt FROM clickTable GROUP BY `user` )" +
                ") WHERE row_num <= 2");
        tableEnvironment.toChangelogStream(topNTable).print();
        environment.execute();
    }
}
Top 2 output:
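Conceptually, the outer query ranks the per-user counts in descending order and keeps the first two row numbers. A plain-Java sketch of that Top-N step over a finished set of counts (the real streaming query re-ranks on every update, which is why it needs a changelog stream):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopN {
    // Emulates ROW_NUMBER() OVER (ORDER BY cnt DESC) ... WHERE row_num <= n:
    // sort the aggregated counts descending and keep the first n keys.
    public static List<String> topN(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```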

4.2 Window Top 2
2. Window Top-N, emitted with toDataStream:
public class TopNExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1);
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(environment);

        // defining the time attribute in the DDL
        String createDDL = "create table clickTable (" +
                "`user` STRING, " +
                "url STRING, " +
                "ts BIGINT, " +
                "et AS TO_TIMESTAMP( FROM_UNIXTIME(ts / 1000) )," +
                "WATERMARK FOR et AS et - INTERVAL '1' SECOND" +
                ") WITH (" +
                " 'connector' = 'filesystem'," +
                " 'path' = 'input/clicks.txt'," +
                " 'format' = 'csv')";
        tableEnvironment.executeSql(createDDL);

        // inner query: tumbling-window aggregation
        String subQuery = " SELECT `user`, COUNT(url) AS cnt, window_start, window_end" +
                " FROM TABLE (" +
                " TUMBLE(TABLE clickTable, DESCRIPTOR(et), INTERVAL '10' SECOND)" +
                ") " +
                "GROUP BY `user`, window_start, window_end";
        // outer query: rank within each window and keep the top two
        Table windowTopNTable = tableEnvironment.sqlQuery("SELECT `user`, cnt, row_num, window_end " +
                "FROM (" +
                " SELECT *, ROW_NUMBER() OVER (" +
                " PARTITION BY window_start, window_end " +
                " ORDER BY cnt DESC" +
                ") AS row_num FROM " +
                "(" + subQuery + ")" +
                ") WHERE row_num <= 2");
        tableEnvironment.toDataStream(windowTopNTable).print();
        environment.execute();
    }
}
Top 2 per window:
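The whole pipeline, tumbling-window counts followed by a per-window Top 2, can be modeled in plain Java for intuition. Arrays of user names and epoch-millisecond timestamps stand in for clickTable here; this is an illustrative sketch, not Flink code:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WindowTopN {
    // For each epoch-aligned tumbling window of windowSize millis, count clicks
    // per user and keep the n users with the highest counts -- the same result as
    // ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY cnt DESC)
    // with row_num <= n. users[i] clicked at epoch-millis ts[i].
    public static Map<Long, List<String>> windowTopN(String[] users, long[] ts,
                                                     long windowSize, int n) {
        Map<Long, Map<String, Long>> countsPerWindow = new TreeMap<>();
        for (int i = 0; i < users.length; i++) {
            long windowStart = ts[i] - (ts[i] % windowSize); // tumbling-window assignment
            countsPerWindow.computeIfAbsent(windowStart, k -> new HashMap<>())
                           .merge(users[i], 1L, Long::sum);
        }
        Map<Long, List<String>> top = new TreeMap<>();
        countsPerWindow.forEach((start, counts) -> top.put(start,
                counts.entrySet().stream()
                      .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                      .limit(n)
                      .map(Map.Entry::getKey)
                      .collect(Collectors.toList())));
        return top;
    }
}
```

Unlike the plain Top-N above, each window's ranking is final once the window closes, which is why the result can be emitted as an insert-only stream via toDataStream.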
