Managed Operator State
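State that Flink provides to operators on a keyed stream is called keyed state. State used by operators on non-keyed streams is collectively called Operator State. To use Operator State, a user function must implement either the general-purpose CheckpointedFunction interface or the ListCheckpointed interface.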
CheckpointedFunction
The CheckpointedFunction interface supports different redistribution strategies for non-keyed state. An implementation must provide the following two methods:
public interface CheckpointedFunction {
void snapshotState(FunctionSnapshotContext context) throws Exception;
void initializeState(FunctionInitializationContext context) throws Exception;
}
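- snapshotState: called by the system whenever it takes a checkpoint. The implementation typically copies the data that must be persisted into its state.
- initializeState: called automatically when the function starts for the first time, to initialize the state. It is also called when the system recovers from a failure, to restore the state.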
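Currently, Operator State supports list-style managed state. The state is expected to be a list of mutually independent, serializable objects, which is what makes it possible to redistribute them during failure recovery (for example, when the parallelism changes). Flink currently offers two redistribution schemes for Operator State: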
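- Even-split redistribution: each operator instance holds a list of state elements, so the Operator State is logically the concatenation of the lists of all parallel instances. When the state is restored and redistributed, the system splits this complete list evenly across the operator's current parallel instances. For example, if the operator's checkpointed state at parallelism 1 contains element1 and element2, then after the parallelism is increased to 2, element1 may be assigned to operator instance 0 and element2 to operator instance 1.
- Union redistribution: as above, each operator instance holds a list, and the Operator State is logically the concatenation of the lists of all parallel instances. On restore/redistribution, however, every operator instance receives the complete list of state elements.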
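The difference between the two schemes can be illustrated with a small plain-Java toy model. It does not use Flink. The class RedistributionSketch and its methods are hypothetical, and the exact order in which Flink assigns elements to instances may differ. The sketch only shows what each new instance receives under each scheme.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Flink code: how a list-style operator state could be
// handed to the instances of an operator restored with a new parallelism.
class RedistributionSketch {

    // Concatenate the lists of all old instances into one logical list.
    static <T> List<T> concat(List<List<T>> oldLists) {
        List<T> all = new ArrayList<>();
        for (List<T> part : oldLists) {
            all.addAll(part);
        }
        return all;
    }

    // Even-split: deal contiguous chunks of (almost) equal size to the new instances.
    static <T> List<List<T>> evenSplit(List<List<T>> oldLists, int newParallelism) {
        List<T> all = concat(oldLists);
        List<List<T>> result = new ArrayList<>();
        int base = all.size() / newParallelism;
        int extra = all.size() % newParallelism; // the first `extra` instances get one more element
        int start = 0;
        for (int i = 0; i < newParallelism; i++) {
            int len = base + (i < extra ? 1 : 0);
            result.add(new ArrayList<>(all.subList(start, start + len)));
            start += len;
        }
        return result;
    }

    // Union: every new instance receives the complete concatenated list.
    static <T> List<List<T>> union(List<List<T>> oldLists, int newParallelism) {
        List<T> all = concat(oldLists);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    public static void main(String[] args) {
        // Parallelism 1 holding [element1, element2], rescaled to parallelism 2.
        List<List<String>> old = List.of(List.of("element1", "element2"));
        System.out.println(evenSplit(old, 2)); // [[element1], [element2]]
        System.out.println(union(old, 2));     // [[element1, element2], [element1, element2]]
    }
}
```

With even-split, an instance can also end up with an empty list. For example, two elements split across three instances leave the third instance with nothing.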
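import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala._
import scala.collection.JavaConverters._
import scala.collection.mutable.ListBuffer

// Buffers incoming elements and prints them in batches of `threshold`;
// elements that are buffered but not yet printed are kept in operator state.
class UserDefineBufferSinkEvenSplit(threshold: Int = 0)
    extends SinkFunction[(String, Int)] with CheckpointedFunction {

  @transient
  private var checkpointedState: ListState[(String, Int)] = _
  private val bufferedElements = ListBuffer[(String, Int)]()

  // Write logic: buffer the element and flush the batch once the threshold is reached
  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
    bufferedElements += value
    if (bufferedElements.size >= threshold) {
      for (e <- bufferedElements) {
        println("element: " + e)
      }
      bufferedElements.clear()
    }
  }

  // Called on every checkpoint: copy the still-buffered elements into the state
  override def snapshotState(context: FunctionSnapshotContext): Unit = {
    checkpointedState.clear()
    checkpointedState.update(bufferedElements.asJava) // update() replaces the list contents
  }

  // Called on first start and on recovery: create the state and, when restoring, reload the buffer
  override def initializeState(context: FunctionInitializationContext): Unit = {
    val lsd = new ListStateDescriptor[(String, Int)]("liststate", createTypeInformation[(String, Int)])
    checkpointedState = context.getOperatorStateStore.getListState(lsd) // even-split redistribution on restore
    // checkpointedState = context.getOperatorStateStore.getUnionListState(lsd) // union (broadcast) redistribution on restore
    if (context.isRestored) { // recovery: reload the elements buffered at the last checkpoint
      bufferedElements.appendAll(checkpointedState.get().asScala.toList)
    }
  }
}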
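snapshotState() copies bufferedElements into the state so that buffered elements survive a failure. The following plain-Java model replays that scenario. BufferingSinkModel is a hypothetical class, and a plain list stands in for Flink's ListState. Two elements are buffered and checkpointed but not yet printed. The recovered instance still prints them once the threshold is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of UserDefineBufferSinkEvenSplit (hypothetical class, no Flink);
// the `state` field is a stand-in for Flink's ListState.
class BufferingSinkModel {
    private final int threshold;
    private final List<String> buffered = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>(); // every batch "printed" so far
    private List<String> state = new ArrayList<>();

    BufferingSinkModel(int threshold) {
        this.threshold = threshold;
    }

    // Like initializeState(): an empty list means a fresh start, otherwise a restore.
    void initializeState(List<String> restored) {
        state = new ArrayList<>(restored);
        buffered.addAll(state);
    }

    // Like invoke(): buffer the value and flush once the threshold is reached.
    void invoke(String value) {
        buffered.add(value);
        if (buffered.size() >= threshold) {
            flushed.add(new ArrayList<>(buffered));
            buffered.clear();
        }
    }

    // Like snapshotState(): the state is replaced by the current buffer contents.
    List<String> snapshotState() {
        state = new ArrayList<>(buffered);
        return new ArrayList<>(state);
    }

    List<List<String>> flushed() {
        return flushed;
    }

    public static void main(String[] args) {
        BufferingSinkModel first = new BufferingSinkModel(3);
        first.initializeState(List.of()); // fresh start
        first.invoke("a");
        first.invoke("b");
        List<String> checkpoint = first.snapshotState(); // [a, b], not printed yet
        // Simulated failure: a new instance is restored from the checkpoint.
        BufferingSinkModel recovered = new BufferingSinkModel(3);
        recovered.initializeState(checkpoint);
        recovered.invoke("c");
        System.out.println(recovered.flushed()); // [[a, b, c]]
    }
}
```

Without the copy in snapshotState(), the restored instance would start with an empty buffer and "a" and "b" would never be printed.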
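import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
import org.apache.flink.streaming.api.scala._

object FlinkWordCountValueStateCheckpoint {
  def main(args: Array[String]): Unit = {
    // 1. Create the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // RocksDB state backend writing checkpoints to HDFS; `true` enables incremental checkpoints
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-rocksdbcheckpoints", true))
    // Trigger a checkpoint every 5 s with exactly-once semantics
    env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE)
    // Abort any checkpoint that takes longer than 4 s
    env.getCheckpointConfig.setCheckpointTimeout(4000)
    // At least 2 s must pass between the completion of one checkpoint and the start
    // of the next; this takes precedence over the checkpoint interval
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    // Fail the job on any checkpoint failure (replaces setFailOnCheckpointingErrors(true))
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
    // What to do with the checkpoint data when the job is cancelled:
    // RETAIN_ON_CANCELLATION: keep the checkpoint data even if the job is cancelled without --savepoint
    // DELETE_ON_CANCELLATION: delete the checkpoint data automatically on cancellation (not recommended)
    env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
    // 2. Create the DataStream from a socket source
    val text = env.socketTextStream("CentOS", 9999)
    // 3. Apply the DataStream transformations
    val counts = text.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(0)
      .map(new WordCountMapFunction)
      .uid("wc-map")
    // 4. Print the results to the console through the buffering sink
    counts.addSink(new UserDefineBufferSinkEvenSplit(3))
      .uid("buffer-sink")
    // 5. Execute the streaming job
    env.execute("Stream WordCount")
  }
}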
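The interval, minimum pause, and timeout settings interact in a way that a simplified model can approximate. CheckpointScheduleModel is hypothetical and is not Flink's scheduler. It rests on three assumptions:

- A checkpoint starts as soon as both the interval since the previous trigger and the minimum pause since the previous completion have elapsed.
- The first checkpoint fires at t = interval.
- Every checkpoint finishes within the timeout.

```java
import java.util.Arrays;

// Simplified model (an assumption, not Flink's checkpoint coordinator): a checkpoint
// starts once BOTH the interval since the previous trigger AND the minimum pause
// since the previous completion have elapsed. All times are in milliseconds.
class CheckpointScheduleModel {

    static long[] triggerTimes(long interval, long minPause, long[] durations) {
        long[] triggers = new long[durations.length];
        long next = interval; // assumption: the first checkpoint fires at t = interval
        for (int i = 0; i < durations.length; i++) {
            triggers[i] = next;
            long completedAt = next + durations[i]; // assumes every checkpoint finishes before the timeout
            next = Math.max(next + interval, completedAt + minPause);
        }
        return triggers;
    }

    public static void main(String[] args) {
        // 5 s interval, 2 s min pause; the second checkpoint takes 3.9 s (< 4 s timeout).
        long[] t = triggerTimes(5000, 2000, new long[] {1000, 3900, 1000});
        System.out.println(Arrays.toString(t)); // [5000, 10000, 15900]
    }
}
```

Under these assumptions, a checkpoint that takes 3.9 s delays the following trigger from 15 s to 15.9 s, because the 2 s minimum pause wins over the 5 s interval.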
import org.apache.flink.api.common.functions.{RichMapFunction, RuntimeContext}
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

// Keyed word counter used by the job above; keeps one ValueState per word
class WordCountMapFunction extends RichMapFunction[(String, Int), (String, Int)] {
  var vs: ValueState[Int] = _

  override def open(parameters: Configuration): Unit = {
    // 1. Create the state descriptor
    val vsd = new ValueStateDescriptor[Int]("wordcount", createTypeInformation[Int])
    // 2. Get the RuntimeContext
    val context: RuntimeContext = getRuntimeContext
    // 3. Obtain the state of the requested type
    vs = context.getState(vsd)
  }

  override def map(value: (String, Int)): (String, Int) = {
    // Read the previous count (an unset Scala Int state reads as 0)
    val historyData = vs.value()
    // Update the state
    vs.update(historyData + value._2)
    // Return the latest count
    (value._1, vs.value())
  }
}
ListCheckpointed
The ListCheckpointed interface is a more limited variant of CheckpointedFunction: it supports only list-style state with the even-split redistribution strategy.
public interface ListCheckpointed<T extends Serializable> {
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
    void restoreState(List<T> state) throws Exception;
}
- snapshotState: called when the system takes a checkpoint. The function simply returns the list of objects to store.
- restoreState: called on recovery with the list of objects to restore from.
On snapshotState() the operator should return a list of objects to checkpoint, and restoreState() has to handle such a list upon recovery. If the state is not re-partitionable, snapshotState() can always return Collections.singletonList(MY_STATE). After a change in parallelism, restoreState() may then receive an empty list on some instances, or several elements when scaling down.
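The snapshotState()/restoreState() contract can be sketched in plain Java. To compile without Flink, the sketch declares a local interface, ListCheckpointedSketch, with the same two methods as ListCheckpointed. CountingFunction is a hypothetical record counter whose state is one non-splittable number. It returns a single-element list and, on restore, sums whatever elements it receives. The result stays correct whether even-split hands it zero, one, or several elements.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical record counter whose whole state is one non-splittable number.
class CountingFunction implements ListCheckpointedSketch<Long> {
    private long count = 0L;

    void process(String record) {
        count++;
    }

    long count() {
        return count;
    }

    @Override
    public List<Long> snapshotState(long checkpointId, long timestamp) {
        return Collections.singletonList(count); // one element: never split across instances
    }

    @Override
    public void restoreState(List<Long> state) {
        // Even-split may hand this instance zero, one, or several elements;
        // summing them keeps the total count correct in every case.
        count = 0L;
        for (Long c : state) {
            count += c;
        }
    }

    public static void main(String[] args) {
        CountingFunction a = new CountingFunction();
        CountingFunction b = new CountingFunction();
        a.process("x");
        a.process("y");
        b.process("z");
        // Scale down from 2 instances to 1: the survivor receives both lists.
        List<Long> merged = new ArrayList<>(a.snapshotState(1L, 0L));
        merged.addAll(b.snapshotState(1L, 0L));
        CountingFunction restored = new CountingFunction();
        restored.restoreState(merged);
        System.out.println(restored.count()); // 3
    }
}

// Local copy of the ListCheckpointed signature, declared only so this sketch
// compiles without Flink on the classpath.
interface ListCheckpointedSketch<T extends Serializable> {
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;

    void restoreState(List<T> state) throws Exception;
}
```

Summing fits a counter. For other state, such as a source offset, the merge rule on restore is a design decision of its own.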
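import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
import org.apache.flink.streaming.api.scala._

object FlinkCounterSource {
  def main(args: Array[String]): Unit = {
    // 1. Create the streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // RocksDB state backend writing checkpoints to HDFS; `true` enables incremental checkpoints
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-rocksdbcheckpoints", true))
    // Trigger a checkpoint every 5 s with exactly-once semantics
    env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE)
    // Abort any checkpoint that takes longer than 4 s
    env.getCheckpointConfig.setCheckpointTimeout(4000)
    // At least 2 s must pass between the completion of one checkpoint and the start
    // of the next; this takes precedence over the checkpoint interval
    env.getCheckpointConfig.setMinPauseBetweenCheckpoints(2000)
    // Fail the job on any checkpoint failure (replaces setFailOnCheckpointingErrors(true))
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)
    // What to do with the checkpoint data when the job is cancelled:
    // RETAIN_ON_CANCELLATION: keep the checkpoint data even if the job is cancelled without --savepoint
    // DELETE_ON_CANCELLATION: delete the checkpoint data automatically on cancellation (not recommended)
    env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
    // 2. Create the DataStream from the custom counter source
    val text = env.addSource(new UserDefineCounterSource)
      .uid("UserDefineCounterSource")
    // 3. Print each emitted offset
    text.print("offset")
    // 4. Execute the streaming job
    env.execute("Stream WordCount")
  }
}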
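import java.lang.{Long => JLong}
import java.util
import java.util.Collections
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed
import org.apache.flink.streaming.api.functions.source.{RichParallelSourceFunction, SourceFunction}
import scala.collection.JavaConverters._

class UserDefineCounterSource extends RichParallelSourceFunction[Long] with ListCheckpointed[JLong] {
  @volatile
  private var isRunning = true
  private var offset = 0L

  // Snapshot: return the state to store
  override def snapshotState(checkpointId: Long, timestamp: Long): util.List[JLong] = {
    println("snapshotState:" + offset)
    Collections.singletonList[JLong](offset) // a single-element list: the offset is never split
  }

  override def restoreState(state: util.List[JLong]): Unit = {
    println("restoreState:" + state.asScala)
    // Take the first element; after scaling out, an instance may receive an empty list
    state.asScala.headOption.foreach(v => offset = v)
  }

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    val lock = ctx.getCheckpointLock
    while (isRunning) {
      Thread.sleep(1000)
      // Emit and advance the offset atomically with respect to checkpoints
      lock.synchronized {
        ctx.collect(offset) // emit the current offset downstream
        offset += 1
      }
    }
  }

  override def cancel(): Unit = isRunning = false
}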
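run() updates offset inside lock.synchronized because of how the legacy SourceFunction contract works, as we understand it. A source with checkpointed state must emit records and update that state under the checkpoint lock, and the framework takes the snapshot under the same lock. As a result, a snapshot never sees a record that was emitted before its offset was advanced. The plain-Java model below checks that invariant with a concurrent emitter and snapshotter. LockedCounterSourceModel is hypothetical, and a private lock object stands in for ctx.getCheckpointLock.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of UserDefineCounterSource (hypothetical class, no Flink). `lock` stands in
// for ctx.getCheckpointLock: emitting + advancing the offset, and taking a snapshot,
// both happen inside synchronized(lock), so they never interleave.
class LockedCounterSourceModel {
    private final Object lock = new Object();
    private final List<Long> emitted = new ArrayList<>();
    private long offset = 0L;

    void emitNext() {
        synchronized (lock) {
            emitted.add(offset); // plays the role of ctx.collect(offset)
            offset += 1;
        }
    }

    // Returns {offset in the snapshot, number of records emitted so far}.
    long[] snapshot() {
        synchronized (lock) {
            return new long[] {offset, emitted.size()};
        }
    }

    // Emits `records` values on one thread while taking `snapshots` snapshots on the
    // calling thread; returns true if every snapshot agreed with the emitted output.
    static boolean runConsistencyCheck(int records, int snapshots) {
        LockedCounterSourceModel src = new LockedCounterSourceModel();
        Thread emitter = new Thread(() -> {
            for (int i = 0; i < records; i++) {
                src.emitNext();
            }
        });
        emitter.start();
        boolean consistent = true;
        for (int i = 0; i < snapshots; i++) {
            long[] s = src.snapshot();
            consistent &= s[0] == s[1];
        }
        try {
            emitter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return consistent && src.snapshot()[0] == records;
    }

    public static void main(String[] args) {
        System.out.println(runConsistencyCheck(100_000, 10_000)); // true
    }
}
```

Without the shared lock, a snapshot could capture an offset that lags behind the records already emitted. After recovery those records would be emitted a second time.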