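EventTime:
The time at which the event was created. It is usually carried as a timestamp inside the event itself; in collected log data, for example, every log line records when it was generated. Flink reads this timestamp through a timestamp assigner. Example: the moment a user clicks a link on a website.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
IngestionTime:
The time at which the source operator on a Flink node receives the record. Example: the moment a source consumes a record from Kafka.
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
ProcessingTime:
The local system time of whichever operator is executing a time-based operation, so it depends on the machine. In Flink releases before 1.12, processing time is the default time characteristic. Example: the moment a timeWindow operator receives a record.
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)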
package flink.chapter5WaterMark

import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic

object WaterMark_Demo {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Pass TimeCharacteristic.EventTime so that windows are driven by event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  }
}
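Event-time (watermark) windows and the code that creates them. These are not the same as timeWindow, so do not confuse the two. Each window is a left-closed, right-open interval, e.g. [a, 2000).
Tumbling   .window(TumblingEventTimeWindows.of(Time.seconds(5)))
Sliding    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
Session    .window(EventTimeSessionWindows.withGap(Time.seconds(5)))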
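To make the left-closed, right-open convention concrete, here is a minimal sketch in plain Scala (no Flink dependency) of how a timestamp maps to its tumbling window. The object and method names are hypothetical, and the arithmetic is an assumption modelled on the way Flink aligns windows to multiples of the window size with offset 0; it is only intended for non-negative timestamps in milliseconds.

```scala
object TumblingWindowSketch {
  // Start of the tumbling window that contains `timestamp` (milliseconds, offset 0).
  // Assumes a non-negative timestamp and a positive window size.
  def windowStart(timestamp: Long, size: Long): Long =
    timestamp - (timestamp + size) % size

  // The window as a half-open interval [start, end).
  def windowFor(timestamp: Long, size: Long): (Long, Long) = {
    val start = windowStart(timestamp, size)
    (start, start + size)
  }

  def main(args: Array[String]): Unit = {
    val size = 5000L // 5-second tumbling window
    Seq(0L, 4999L, 5000L, 7200L).foreach { ts =>
      val (start, end) = windowFor(ts, size)
      println(s"ts=$ts -> [$start, $end)")
    }
  }
}
```

A record stamped 4999 lands in [0, 5000), while a record stamped exactly 5000 already belongs to [5000, 10000), which is what "right-open" means in practice.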
Example: tumbling window with watermarks
package flink.chapter5WaterMark

import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import scala.collection.mutable

object TumblingWindow_Demo {
  def main(args: Array[String]): Unit = {
    // Create the entry point of the streaming program
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read input lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101", 9999)
    // Parse each line "<word> <timestamp in ms>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks; the constructor argument is the watermark delay
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
        // Extract the event timestamp from the incoming record
        override def extractTimestamp(t: (String, Long, Int)): Long = t._2
      })
    // Partition the stream by key (field 0, the word)
    val keyData: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print the keyed records
    keyData.print("keyed:")
    // Assign 2-second tumbling event-time windows
    val window: WindowedStream[(String, Long, Int), Tuple, TimeWindow] =
      keyData.window(TumblingEventTimeWindows.of(Time.seconds(2)))
    // Collect the timestamps that fall into each window
    val result: DataStream[mutable.HashSet[Long]] = window.fold(new mutable.HashSet[Long]()) {
      case (set, (word, ts, count)) => set += ts
    }
    // Print the window result
    result.print("window::")
    env.execute()
  }
}
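The interaction between the 2-second watermark delay and the 2-second window in the example above can be traced by hand. The following is a minimal sketch in plain Scala, not Flink code: the names are hypothetical, and it assumes that the watermark equals the highest timestamp seen so far minus the delay, and that an event-time window [start, end) fires once the watermark reaches end - 1. Flink emits watermarks periodically rather than after every record, so the sketch only shows the value a watermark would have at each point.

```scala
object WatermarkSketch {
  // Watermark value after each record: highest timestamp seen so far minus the delay.
  def watermarks(timestamps: Seq[Long], maxOutOfOrderness: Long): Seq[Long] =
    timestamps
      .scanLeft(Long.MinValue)((maxSoFar, ts) => math.max(maxSoFar, ts))
      .tail // drop the initial Long.MinValue seed
      .map(_ - maxOutOfOrderness)

  // A window [start, end) may fire once the watermark reaches its last covered timestamp.
  def canFire(windowEnd: Long, watermark: Long): Boolean =
    watermark >= windowEnd - 1

  def main(args: Array[String]): Unit = {
    val input = Seq(1000L, 3000L, 2500L, 4000L) // event timestamps in arrival order
    val delay = 2000L                            // same 2-second delay as the example
    input.zip(watermarks(input, delay)).foreach { case (ts, wm) =>
      println(s"ts=$ts watermark=$wm window [0, 2000) fires: ${canFire(2000L, wm)}")
    }
  }
}
```

Under these assumptions the window [0, 2000) does not fire when the record stamped 3000 arrives (watermark 1000); it fires only after the record stamped 4000 pushes the watermark to 2000. The out-of-order record stamped 2500 does not move the watermark backwards.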
Example: sliding window with watermarks
package flink.chapter5WaterMark

import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import scala.collection.mutable

object SlidingWindow_Demo {
  def main(args: Array[String]): Unit = {
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read input lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101", 9999)
    // Parse each line "<word> <timestamp in ms>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks; the constructor argument is the watermark delay
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
        // Extract the event timestamp from the incoming record
        override def extractTimestamp(t: (String, Long, Int)): Long = t._2
      })
    // Partition the stream by key (field 0, the word)
    val keyed: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print the incoming records
    keyed.print("keyed:")
    // Set the window type and size: first argument is the window size, second is the slide.
    // With size == slide (2s, 2s) the windows do not overlap and behave like tumbling windows.
    val windows: WindowedStream[(String, Long, Int), Tuple, TimeWindow] =
      keyed.window(SlidingEventTimeWindows.of(Time.seconds(2), Time.seconds(2)))
    // Collect the timestamps that fall into each window
    val result: DataStream[mutable.HashSet[Long]] = windows.fold(new mutable.HashSet[Long]()) {
      case (set, (word, ts, count)) => set += ts
    }
    // Print the collected timestamps
    result.print("windows:::")
    // Start the job
    env.execute()
  }
}
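Unlike a tumbling window, a sliding window can place one record in several windows at once. The sketch below, in plain Scala with hypothetical names, lists the windows a single timestamp belongs to. It is an assumption-based model of the assignment (offset 0, non-negative timestamps in milliseconds), not a copy of the Flink implementation.

```scala
object SlidingWindowSketch {
  // All windows [start, end) that contain `timestamp`, for a given size and slide.
  def windowsFor(timestamp: Long, size: Long, slide: Long): List[(Long, Long)] = {
    // Latest window start that is not after the timestamp (aligned to the slide).
    val lastStart = timestamp - (timestamp + slide) % slide
    Iterator
      .iterate(lastStart)(_ - slide)      // walk back one slide at a time
      .takeWhile(_ > timestamp - size)    // keep only windows that still cover the timestamp
      .map(start => (start, start + size))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // 10-second window sliding every 5 seconds: each record falls into two windows.
    println(windowsFor(7000L, 10000L, 5000L))
    // Size equal to slide, as in the example above: exactly one window per record.
    println(windowsFor(3000L, 2000L, 2000L))
  }
}
```

With a 10-second size and a 5-second slide, each record belongs to size / slide = 2 windows. With the (2s, 2s) setting used in the example above, each record belongs to exactly one window.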
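Example: session window with watermarks
A session window closes when the difference between the event times of two consecutive records exceeds the configured gap. When watermarks are used, firing is delayed: even after the gap condition is met, the window fires only once the watermark has reached the end of the window.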
package flink.chapter5WaterMark

import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

object SessionWindow_Demo {
  def main(args: Array[String]): Unit = {
    // Create the Flink streaming execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read input lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101", 9999)
    // Parse each line "<word> <timestamp in ms>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks with a 2-second watermark delay
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(
      new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
        // Extract the event timestamp from the incoming record
        override def extractTimestamp(t: (String, Long, Int)): Long = t._2
      })
    // Partition the stream by key (field 0, the word)
    val keyed: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print the incoming records
    keyed.print("keyed:")
    // Set the session gap to 2 seconds
    val window: WindowedStream[(String, Long, Int), Tuple, TimeWindow] =
      keyed.window(EventTimeSessionWindows.withGap(Time.seconds(2)))
    // Count the records in each session; 0L is only a placeholder for the timestamp field
    window.reduce((text1, text2) => {
      (text1._1, 0L, text1._3 + text2._3)
    }).map(_._3).print("window::")
    // Start the job
    env.execute()
  }
}
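Records whose event times are more than two seconds apart fall into separate session windows, and each window fires two seconds late because of the watermark delay. Session windows are also left-closed, right-open intervals.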
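The grouping rule can be illustrated without Flink. The sketch below, in plain Scala with hypothetical names, models session assignment for a single key: each record opens a window [ts, ts + gap), and windows that overlap or touch are merged. Treating touching windows as mergeable is an assumption about the merge rule, chosen so that only a difference strictly greater than the gap starts a new session.

```scala
object SessionWindowSketch {
  // Groups the timestamps of one key into merged session windows [start, end).
  def sessions(timestamps: Seq[Long], gap: Long): List[(Long, Long)] =
    timestamps.sorted.foldLeft(List.empty[(Long, Long)]) {
      // The record falls inside (or exactly at the end of) the current session: extend it.
      case ((start, end) :: rest, ts) if ts <= end =>
        (start, math.max(end, ts + gap)) :: rest
      // Otherwise the gap has been exceeded: open a new session.
      case (acc, ts) =>
        (ts, ts + gap) :: acc
    }.reverse

  def main(args: Array[String]): Unit = {
    // 2-second gap, as in the example above.
    println(sessions(Seq(1000L, 2000L, 2500L, 6000L), 2000L))
  }
}
```

Here the records stamped 1000, 2000 and 2500 merge into one session [1000, 4500), and the record stamped 6000 starts a new one. With the 2-second watermark delay from the example, the first session would fire only after a record with a timestamp of at least 6499 has been seen, assuming the firing condition is watermark >= end - 1.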