- Window Functions
- ReduceFunction
- AggregateFunction
- ProcessWindowFunction
- ProcessWindowFunction with Incremental Aggregation
- Incremental Window Aggregation with ReduceFunction
- Incremental Window Aggregation with AggregateFunction
- Using per-window state in ProcessWindowFunction
- WindowFunction (Legacy)
- Keyed Windows
Window Functions
ReduceFunction
val input: DataStream[(String, Long)] = ...

// Sum the second field of all elements that share a key within a window.
input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .reduce { (v1, v2) => (v1._1, v1._2 + v2._2) }
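To make the semantics of the reduce call concrete, here is a minimal plain-Scala sketch that does not use Flink at all. It applies the same two-argument function pairwise to the elements of one key within one window, so only a single running value is needed. The names `ReduceSketch`, `combine` and `reduceWindow` are hypothetical and exist only for this illustration.

```scala
// Plain-Scala illustration only; no Flink classes are involved.
object ReduceSketch {
  // Same combining logic as the reduce call above: keep the key, sum the Long field.
  def combine(v1: (String, Long), v2: (String, Long)): (String, Long) =
    (v1._1, v1._2 + v2._2)

  // Hypothetical helper: combines the elements of a single key and window pairwise.
  // Returns None for an empty input, since there is nothing to reduce.
  def reduceWindow(elements: Seq[(String, Long)]): Option[(String, Long)] =
    elements.reduceOption((a, b) => combine(a, b))
}
```

Because the input and output types of the combining function are identical, the result of one step can be fed straight into the next, which is what makes incremental evaluation possible.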
AggregateFunction
/**
 * The accumulator is used to keep a running sum and a count. The [getResult] method
 * computes the average.
 */
class AverageAggregate extends AggregateFunction[(String, Long), (Long, Long), Double] {
  override def createAccumulator() = (0L, 0L)

  override def add(value: (String, Long), accumulator: (Long, Long)) =
    (accumulator._1 + value._2, accumulator._2 + 1L)

  // Convert to Double first; Long / Long would be integer division.
  override def getResult(accumulator: (Long, Long)) = accumulator._1.toDouble / accumulator._2

  override def merge(a: (Long, Long), b: (Long, Long)) =
    (a._1 + b._1, a._2 + b._2)
}
val input: DataStream[(String, Long)] = ...

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .aggregate(new AverageAggregate)
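The accumulator contract above can be tried out without a Flink cluster. The following is a plain-Scala sketch, assuming the same (sum, count) accumulator layout. The object `AverageSketch` and the helper `accumulate` are hypothetical names introduced for illustration. It shows that merging two partial accumulators gives the same result as adding every element to one accumulator, and that the final division should be done on a Double.

```scala
// Plain-Scala illustration only; no Flink classes are involved.
object AverageSketch {
  type Acc = (Long, Long) // (running sum, element count)

  def createAccumulator(): Acc = (0L, 0L)

  def add(value: (String, Long), acc: Acc): Acc = (acc._1 + value._2, acc._2 + 1L)

  def merge(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 + b._2)

  // Convert to Double before dividing; Long / Long would truncate the average.
  def getResult(acc: Acc): Double = acc._1.toDouble / acc._2

  // Hypothetical helper: folds a batch of elements into one accumulator.
  def accumulate(elements: Seq[(String, Long)]): Acc =
    elements.foldLeft(createAccumulator())((acc, v) => add(v, acc))
}
```

Keeping only the sum and the count means the state stays at a constant size however many elements arrive in the window.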
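ProcessWindowFunction
A ProcessWindowFunction receives an Iterable containing all the elements of the window, together with a Context object that gives access to time and state information, which makes it more flexible than the other window functions. This flexibility comes at the cost of performance and resource consumption: elements cannot be aggregated incrementally and must instead be buffered internally until the window is considered ready for processing.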
abstract class ProcessWindowFunction[IN, OUT, KEY, W <: Window] extends Function {

  /**
    * Evaluates the window and outputs none or several elements.
    *
    * @param key      The key for which this window is evaluated.
    * @param context  The context in which the window is being evaluated.
    * @param elements The elements in the window being evaluated.
    * @param out      A collector for emitting elements.
    * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
    */
  def process(
      key: KEY,
      context: Context,
      elements: Iterable[IN],
      out: Collector[OUT])

  /** The context holding window metadata. */
  abstract class Context {
    /** Returns the window that is being evaluated. */
    def window: W

    /** Returns the current processing time. */
    def currentProcessingTime: Long

    /** Returns the current event-time watermark. */
    def currentWatermark: Long

    /** State accessor for per-key and per-window state. */
    def windowState: KeyedStateStore

    /** State accessor for per-key global state. */
    def globalState: KeyedStateStore
  }
}
val input: DataStream[(String, Long)] = ...

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .process(new MyProcessWindowFunction())

/* ... */
class MyProcessWindowFunction extends ProcessWindowFunction[(String, Long), String, String, TimeWindow] {
  def process(key: String, context: Context, input: Iterable[(String, Long)], out: Collector[String]) = {
    var count = 0L
    for (in <- input) {
      count = count + 1
    }
    out.collect(s"Window ${context.window} count: $count")
  }
}
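ProcessWindowFunction with Incremental Aggregation
A ProcessWindowFunction can be combined with a ReduceFunction or an AggregateFunction so that elements are aggregated incrementally as they arrive in the window. When the window closes, the ProcessWindowFunction receives the aggregated result.
Note: you can also use the legacy WindowFunction instead of ProcessWindowFunction for incremental window aggregation. However, a WindowFunction has no Context, so it cannot access the context information.
Incremental Window Aggregation with ReduceFunction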
val input: DataStream[SensorReading] = ...

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .reduce(
    // Incrementally keep the reading with the smallest value.
    (r1: SensorReading, r2: SensorReading) => { if (r1.value > r2.value) r2 else r1 },
    // Emit the window start time together with the pre-aggregated minimum.
    ( key: String,
      context: ProcessWindowFunction[_, _, _, TimeWindow]#Context,
      minReadings: Iterable[SensorReading],
      out: Collector[(Long, SensorReading)] ) =>
      {
        val min = minReadings.iterator.next()
        out.collect((context.window.getStart, min))
      }
  )
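The division of labour between the two functions can be sketched in plain Scala without Flink. In this hedged illustration the incremental step reduces the readings to a single minimum, and the final step sees an Iterable with exactly one element, to which it attaches the window start. The object `MinPerWindowSketch`, the helpers `minOf`, `emit` and `evaluate`, and the shape of the `SensorReading` case class (an `id` plus a `value`) are assumptions made for this example, not definitions taken from the snippet above.

```scala
// Plain-Scala illustration only; no Flink classes are involved.
object MinPerWindowSketch {
  // Hypothetical stand-in for the SensorReading type used above.
  final case class SensorReading(id: String, value: Double)

  // Incremental step: keep whichever of the two readings has the smaller value.
  def minOf(r1: SensorReading, r2: SensorReading): SensorReading =
    if (r1.value > r2.value) r2 else r1

  // Final step: receives only the pre-aggregated result and attaches the window start.
  def emit(windowStart: Long, minReadings: Iterable[SensorReading]): (Long, SensorReading) =
    (windowStart, minReadings.iterator.next())

  // Hypothetical driver for one key and one window; the window must be non-empty.
  def evaluate(windowStart: Long, readings: Seq[SensorReading]): (Long, SensorReading) =
    emit(windowStart, Iterable(readings.reduce((a, b) => minOf(a, b))))
}
```

The final step never iterates over the raw readings, which is why this combination avoids buffering the whole window while still exposing window metadata.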
Incremental Window Aggregation with AggregateFunction
The AggregateFunction used here is org.apache.flink.api.common.functions.AggregateFunction. If this class is not available in your project, add the required dependency to the pom file.
val input: DataStream[(String, Long)] = ...

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .aggregate(new AverageAggregate(), new MyProcessWindowFunction())

// Function definitions

/**
 * The accumulator is used to keep a running sum and a count. The [getResult] method
 * computes the average.
 */
class AverageAggregate extends AggregateFunction[(String, Long), (Long, Long), Double] {
  override def createAccumulator() = (0L, 0L)

  override def add(value: (String, Long), accumulator: (Long, Long)) =
    (accumulator._1 + value._2, accumulator._2 + 1L)

  // Convert to Double first; Long / Long would be integer division.
  override def getResult(accumulator: (Long, Long)) = accumulator._1.toDouble / accumulator._2

  override def merge(a: (Long, Long), b: (Long, Long)) =
    (a._1 + b._1, a._2 + b._2)
}

// Receives the single pre-aggregated average and emits it together with the key.
class MyProcessWindowFunction extends ProcessWindowFunction[Double, (String, Double), String, TimeWindow] {
  def process(key: String, context: Context, averages: Iterable[Double], out: Collector[(String, Double)]) = {
    val average = averages.iterator.next()
    out.collect((key, average))
  }
}
Using per-window state in ProcessWindowFunction
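In addition to accessing keyed state (as any rich function can), a ProcessWindowFunction can also use keyed state that is scoped to the window the function is currently processing. In this context it is important to understand which window the per-window state refers to. There are different "windows" involved:
- The window that was defined when specifying the windowed operation: this might be tumbling windows of 1 hour or sliding windows of 2 hours that slide by 1 hour.
- An actual instance of a defined window for a given key: this might be the time window from 12:00 to 13:00 for user-id xyz. This is based on the window definition, and there will be many such windows, depending on the number of keys the job is currently processing and on the time slots into which the events fall.
Per-window state is tied to the latter of the two. The Context object that a process() invocation receives has two methods that give access to the two types of state:
- globalState(), which allows access to keyed state that is not scoped to a window
- windowState(), which allows access to keyed state that is also scoped to the window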
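The difference between the two scopes can be illustrated with a small plain-Scala model that does not use Flink. Here two mutable maps stand in for the two state stores: one keyed by (key, window) and one keyed by key alone. The names `StateScopeSketch`, `WindowId` and `recordFiring` are hypothetical, and the maps are only an analogy for how the state is scoped, not how Flink stores it.

```scala
import scala.collection.mutable

// Plain-Scala illustration only; no Flink classes are involved.
object StateScopeSketch {
  // Hypothetical window identifier: (start, end) timestamps.
  type WindowId = (Long, Long)

  // Analogue of windowState: one counter per (key, window) pair.
  private val perWindow = mutable.Map.empty[(String, WindowId), Int].withDefaultValue(0)

  // Analogue of globalState: one counter per key, shared by all windows of that key.
  private val perKey = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Records one firing of `window` for `key`.
  // Returns (firings seen for this window, firings seen for this key overall).
  def recordFiring(key: String, window: WindowId): (Int, Int) = {
    perWindow((key, window)) += 1
    perKey(key) += 1
    (perWindow((key, window)), perKey(key))
  }
}
```

In this model a second firing of the same window sees its own earlier count, while a new window for the same key starts from zero in the per-window scope but continues the per-key count.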
WindowFunction (Legacy)
trait WindowFunction[IN, OUT, KEY, W <: Window] extends Function with Serializable {

  /**
    * Evaluates the window and outputs none or several elements.
    *
    * @param key    The key for which this window is evaluated.
    * @param window The window that is being evaluated.
    * @param input  The elements in the window being evaluated.
    * @param out    A collector for emitting elements.
    * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
    */
  def apply(key: KEY, window: W, input: Iterable[IN], out: Collector[OUT])
}
val input: DataStream[(String, Long)] = ...

input
    .keyBy(<key selector>)
    .window(<window assigner>)
    .apply(new MyWindowFunction())
Keyed Windows
stream
       .keyBy(...)                     <-  keyed versus non-keyed windows
       .window(...)                    <-  required: "assigner"
      [.trigger(...)]                  <-  optional: "trigger" (else default trigger)
      [.evictor(...)]                  <-  optional: "evictor" (else no evictor)
      [.allowedLateness(...)]          <-  optional: "lateness" (else zero)
      [.sideOutputLateData(...)]       <-  optional: "output tag" (else no side output for late data)
       .reduce/aggregate/fold/apply()  <-  required: "function"
      [.getSideOutput(...)]            <-  optional: "output tag"