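I. Introduction
In stream processing, an intermediate step that performs a time-consuming operation or exchanges data with an external system adds latency to the whole stream. Flink 1.2 introduced Asynchronous I/O, which supports asynchronous operations in order to improve the performance and throughput of Flink's interaction with external data systems.
In the figure, the brown bars represent waiting time; network waiting clearly hurts both throughput and latency. To solve the problem of synchronous access, the asynchronous mode handles multiple requests and replies concurrently. That is, you can send the requests for users a, b, c, and so on to the database back to back and process whichever reply comes back first, so consecutive requests never block waiting on each other, as shown on the right side of the figure. This is exactly how Async I/O works.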
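The difference between the two access patterns can be sketched in plain Java, without any Flink dependency. This is only an illustration under stated assumptions: `lookup` is a hypothetical stand-in for a database query, and its 100 ms sleep simulates the network wait.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SyncVsAsyncSketch {

    // Hypothetical stand-in for a remote lookup: about 100 ms of network wait.
    static String lookup(String user) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return user + "-profile";
    }

    // Synchronous access: each request waits for the previous reply.
    static List<String> sync(List<String> users) {
        List<String> out = new ArrayList<>();
        for (String user : users) {
            out.add(lookup(user));
        }
        return out;
    }

    // Asynchronous access: all requests are in flight at once, so the waits overlap.
    static List<String> async(List<String> users) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, users.size()));
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String user : users) {
                futures.add(CompletableFuture.supplyAsync(() -> lookup(user), pool));
            }
            List<String> out = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                out.add(future.join()); // collected in request order for simplicity
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("a", "b", "c");
        long t0 = System.nanoTime();
        sync(users);
        long t1 = System.nanoTime();
        async(users);
        long t2 = System.nanoTime();
        System.out.println("sync  ms: " + (t1 - t0) / 1_000_000);
        System.out.println("async ms: " + (t2 - t1) / 1_000_000);
    }
}
```

With three users, the synchronous variant pays the wait three times in a row, while the asynchronous one pays it roughly once.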
II. How It Works
2.1 API
import java.util.concurrent.TimeUnit
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.async.{AsyncFunction, ResultFuture}
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

/**
  * An implementation of the 'AsyncFunction' that sends requests and sets the callback.
  */
class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] {

  /** The database specific client that can issue concurrent requests with callbacks */
  lazy val client: DatabaseClient = new DatabaseClient(host, port, credentials)

  /** The context used for the future callbacks.
    * Executors here is Flink's own utility class (org.apache.flink.runtime.concurrent.Executors
    * in Flink releases of this era), not java.util.concurrent.Executors. */
  implicit lazy val executor: ExecutionContext = ExecutionContext.fromExecutor(Executors.directExecutor())

  override def asyncInvoke(str: String, resultFuture: ResultFuture[(String, String)]): Unit = {

    // issue the asynchronous request, receive a future for the result
    val resultFutureRequested: Future[String] = client.query(str)

    // set the callback to be executed once the request by the client is complete
    // the callback forwards the result (or the failure) to the result future
    // (onComplete replaces Future.onSuccess, which is deprecated since Scala 2.12)
    resultFutureRequested.onComplete {
      case Success(result) => resultFuture.complete(Iterable((str, result)))
      case Failure(error) => resultFuture.completeExceptionally(error)
    }
  }
}

// create the original stream
val stream: DataStream[String] = ...

// apply the async I/O transformation: unorderedWait selects the output mode, followed by
// the timeout and the maximum number of async requests in flight
val resultStream: DataStream[(String, String)] =
  AsyncDataStream.unorderedWait(stream, new AsyncDatabaseRequest(), 1000, TimeUnit.MILLISECONDS, 100)
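AsyncDataStream provides two methods, orderedWait and unorderedWait, which correspond to the ordered and unordered output modes.
Two output modes exist because the completion time of an asynchronous request is not deterministic: a request issued earlier may complete later than one issued after it. In ordered mode, results are emitted in exactly the order in which the messages arrived. In unordered mode, results are emitted in the order in which the requests complete, so whichever request finishes first has its result emitted first.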
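To make the two modes concrete, here is a small deterministic simulation in plain Java. It does not use Flink: the `Request` class and its completion times are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OutputModeSketch {

    // A request in arrival order, with a simulated completion time.
    static final class Request {
        final String id;
        final long completesAtMs;

        Request(String id, long completesAtMs) {
            this.id = id;
            this.completesAtMs = completesAtMs;
        }
    }

    // Ordered mode: results leave in arrival order, whatever the completion times are.
    static List<String> orderedOutput(List<Request> arrivals) {
        List<String> out = new ArrayList<>();
        for (Request r : arrivals) {
            out.add(r.id);
        }
        return out;
    }

    // In ordered mode a result cannot leave before all earlier arrivals have left,
    // so its emit time is the running maximum of the completion times.
    static List<Long> orderedEmitTimes(List<Request> arrivals) {
        List<Long> out = new ArrayList<>();
        long latest = 0;
        for (Request r : arrivals) {
            latest = Math.max(latest, r.completesAtMs);
            out.add(latest);
        }
        return out;
    }

    // Unordered mode: results leave in completion order (ties keep arrival order).
    static List<String> unorderedOutput(List<Request> arrivals) {
        List<Request> byCompletion = new ArrayList<>(arrivals);
        byCompletion.sort(Comparator.comparingLong((Request r) -> r.completesAtMs));
        List<String> out = new ArrayList<>();
        for (Request r : byCompletion) {
            out.add(r.id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Request> arrivals = Arrays.asList(
            new Request("a", 300), new Request("b", 100), new Request("c", 200));
        System.out.println(orderedOutput(arrivals));    // [a, b, c]
        System.out.println(orderedEmitTimes(arrivals)); // [300, 300, 300]
        System.out.println(unorderedOutput(arrivals));  // [b, c, a]
    }
}
```

The emit times show the price of ordering: the fast requests b and c are held back until the slow request a has completed.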
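Note
With event time, the unordered output mode still guarantees that watermarks are handled correctly: the results of messages between two watermarks may be emitted out of order, but a message that follows a watermark is never emitted before a message that precedes that watermark.
Because the completion time of an async request is unpredictable, you must set a request timeout and configure the maximum number of async requests that may be in flight at the same time.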
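The roles of the two settings can be sketched in plain Java. This is a simplified model, not Flink's implementation: the class `BoundedAsyncSketch` and its use of a `Semaphore` for capacity and a scheduled task for the timeout are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedAsyncSketch {

    private final int capacity;
    private final long timeoutMs;
    private final Semaphore permits;
    // Daemon timer thread, so the sketch never keeps the JVM alive.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "timeout-timer");
            t.setDaemon(true);
            return t;
        });

    BoundedAsyncSketch(int capacity, long timeoutMs) {
        this.capacity = capacity;
        this.timeoutMs = timeoutMs;
        this.permits = new Semaphore(capacity);
    }

    // Blocks the caller while `capacity` requests are already in flight (back pressure).
    CompletableFuture<String> submit(CompletableFuture<String> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<String> result = new CompletableFuture<>();
        // Fail the request if no reply has arrived within the timeout.
        ScheduledFuture<?> timeout = timer.schedule(
            () -> { result.completeExceptionally(new TimeoutException("request timed out")); },
            timeoutMs, TimeUnit.MILLISECONDS);
        // Forward the reply (or the failure) of the underlying request.
        request.whenComplete((value, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        });
        // Whichever outcome wins, cancel the timer and free the slot exactly once.
        return result.whenComplete((value, error) -> {
            timeout.cancel(false);
            permits.release();
        });
    }

    int inFlight() {
        return capacity - permits.availablePermits();
    }

    public static void main(String[] args) {
        BoundedAsyncSketch sketch = new BoundedAsyncSketch(100, 1000);
        System.out.println(sketch.submit(CompletableFuture.completedFuture("reply")).join());
    }
}
```

Without the timeout, a request that never returns would occupy its slot forever. Without the capacity limit, a slow external system would let pending requests pile up without bound.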
AsyncDataStream.orderedWait
def orderedWait[IN, OUT: TypeInformation](
    input: DataStream[IN],
    asyncFunction: AsyncFunction[IN, OUT],
    timeout: Long, // timeout
    timeUnit: TimeUnit,
    capacity: Int) // maximum number of async requests in flight
  : DataStream[OUT] = {
  val javaAsyncFunction = wrapAsJavaAsyncFunction(asyncFunction)
  val outType: TypeInformation[OUT] = implicitly[TypeInformation[OUT]]
  asScalaStream(JavaAsyncDataStream.orderedWait[IN, OUT](
    input.javaStream,
    javaAsyncFunction,
    timeout,
    timeUnit,
    capacity).returns(outType))
}
AsyncDataStream.unorderedWait
def unorderedWait[IN, OUT: TypeInformation](
    input: DataStream[IN],
    asyncFunction: AsyncFunction[IN, OUT],
    timeout: Long, // timeout
    timeUnit: TimeUnit,
    capacity: Int) // maximum number of async requests in flight
  : DataStream[OUT] = {
  val javaAsyncFunction = wrapAsJavaAsyncFunction(asyncFunction)
  val outType: TypeInformation[OUT] = implicitly[TypeInformation[OUT]]
  asScalaStream(JavaAsyncDataStream.unorderedWait[IN, OUT](
    input.javaStream,
    javaAsyncFunction,
    timeout,
    timeUnit,
    capacity).returns(outType))
}
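2.2 Implementation
At runtime, AsyncDataStream is translated into the AsyncWaitOperator operator, which is a subclass of AbstractUdfStreamOperator. Let's look at how AsyncWaitOperator is implemented. It uses a StreamElementQueue to guarantee message ordering, and the queue has two implementations: OrderedStreamElementQueue and UnorderedStreamElementQueue.
Basic principle
What most distinguishes AsyncWaitOperator from other operators is that its input and output are not synchronous. Internally it therefore uses a producer-consumer model in which a queue decouples the asynchronous computation from the emission of its results. StreamElementQueue provides the queue abstraction: a "consumer" thread, the Emitter, takes completed results from the queue and emits them to the downstream operator, while the asynchronous requests play the role of the queue's "producer". (The version of the code listed below has no separate Emitter thread; completed results are emitted through the MailboxExecutor instead.) The basic processing logic is shown in the figure below.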
public class AsyncWaitOperator<IN, OUT>
        extends AbstractUdfStreamOperator<OUT, AsyncFunction<IN, OUT>>
        implements OneInputStreamOperator<IN, OUT> {

    private static final long serialVersionUID = 1L;
    private static final String STATE_NAME = "_async_wait_operator_state_";

    // maximum capacity of the queue
    private final int capacity;
    // output mode: ordered or unordered
    private final AsyncDataStream.OutputMode outputMode;
    // timeout
    private final long timeout;
    // serializer for the stream elements written to snapshots
    private transient StreamElementSerializer<IN> inStreamElementSerializer;
    /** Recovered input stream elements. */
    private transient ListState<StreamElement> recoveredStreamElements;
    // output queue
    private transient StreamElementQueue<OUT> queue;
    /** Mailbox executor used to yield while waiting for buffers to empty. */
    private final transient MailboxExecutor mailboxExecutor;
    private transient TimestampedCollector<OUT> timestampedCollector;

    public AsyncWaitOperator(
            @Nonnull AsyncFunction<IN, OUT> asyncFunction,
            long timeout,
            int capacity,
            @Nonnull AsyncDataStream.OutputMode outputMode,
            @Nonnull MailboxExecutor mailboxExecutor) {
        super(asyncFunction);
        // TODO this is a temporary fix for the problems described under FLINK-13063 at the cost of breaking chains for
        // AsyncOperators.
        setChainingStrategy(ChainingStrategy.HEAD);
        Preconditions.checkArgument(capacity > 0, "The number of concurrent async operation should be greater than 0.");
        this.capacity = capacity;
        this.outputMode = Preconditions.checkNotNull(outputMode, "outputMode");
        this.timeout = timeout;
        this.mailboxExecutor = mailboxExecutor;
    }

    @Override
    public void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output) {
        super.setup(containingTask, config, output);
        this.inStreamElementSerializer = new StreamElementSerializer<>(
            getOperatorConfig().<IN>getTypeSerializerIn1(getUserCodeClassloader()));
        // Choose between ordered and unordered mode.
        // AsyncWaitOperator uses a StreamElementQueue to guarantee message ordering; it has two
        // implementations: OrderedStreamElementQueue and UnorderedStreamElementQueue.
        switch (outputMode) {
            case ORDERED:
                queue = new OrderedStreamElementQueue<>(capacity);
                break;
            case UNORDERED:
                queue = new UnorderedStreamElementQueue<>(capacity);
                break;
            default:
                throw new IllegalStateException("Unknown async mode: " + outputMode + '.');
        }
        this.timestampedCollector = new TimestampedCollector<>(output);
    }

    @Override
    public void open() throws Exception {
        super.open();
        // replay the elements recovered from the last snapshot, if any
        if (recoveredStreamElements != null) {
            for (StreamElement element : recoveredStreamElements.get()) {
                if (element.isRecord()) {
                    processElement(element.<IN>asRecord());
                }
                else if (element.isWatermark()) {
                    processWatermark(element.asWatermark());
                }
                else if (element.isLatencyMarker()) {
                    processLatencyMarker(element.asLatencyMarker());
                }
                else {
                    throw new IllegalStateException("Unknown record type " + element.getClass() +
                        " encountered while opening the operator.");
                }
            }
            recoveredStreamElements = null;
        }
    }

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        // add the element to the queue
        final ResultFuture<OUT> entry = addToWorkQueue(element);
        // once the async I/O has finished, resultHandler.complete() is called and the result is collected by the resultHandler
        final ResultHandler resultHandler = new ResultHandler(element, entry);
        // register the timeout timer
        if (timeout > 0L) {
            final long timeoutTimestamp = timeout + getProcessingTimeService().getCurrentProcessingTime();
            final ScheduledFuture<?> timeoutTimer = getProcessingTimeService().registerTimer(
                timeoutTimestamp,
                timestamp -> userFunction.timeout(element.getValue(), resultHandler));
            resultHandler.setTimeoutTimer(timeoutTimer);
        }
        // invoke the async I/O
        userFunction.asyncInvoke(element.getValue(), resultHandler);
    }

    @Override
    public void processWatermark(Watermark mark) throws Exception {
        addToWorkQueue(mark);
        // watermarks are always completed
        // if there is no prior element, we can directly emit them
        // this also avoids watermarks being held back until the next element has been processed
        outputCompletedElement();
    }

    @Override
    public void snapshotState(StateSnapshotContext context) throws Exception {
        super.snapshotState(context);
        ListState<StreamElement> partitionableState =
            getOperatorStateBackend().getListState(new ListStateDescriptor<>(STATE_NAME, inStreamElementSerializer));
        partitionableState.clear();
        try {
            partitionableState.addAll(queue.values());
        } catch (Exception e) {
            partitionableState.clear();
            throw new Exception("Could not add stream element queue entries to operator state " +
                "backend of operator " + getOperatorName() + '.', e);
        }
    }

    @Override
    public void initializeState(StateInitializationContext context) throws Exception {
        super.initializeState(context);
        recoveredStreamElements = context
            .getOperatorStateStore()
            .getListState(new ListStateDescriptor<>(STATE_NAME, inStreamElementSerializer));
    }

    @Override
    public void close() throws Exception {
        try {
            waitInFlightInputsFinished();
        }
        finally {
            super.close();
        }
    }

    // The next three methods belong to the inner class ResultHandler.
    @Override
    public void complete(Collection<OUT> results) {
        Preconditions.checkNotNull(results, "Results must not be null, use empty collection to emit nothing");
        // guard: only the first completion takes effect
        if (!completed.compareAndSet(false, true)) {
            return;
        }
        // hand the results over so they can be sent to the next operator
        processInMailbox(results);
    }

    private void processInMailbox(Collection<OUT> results) {
        // the results are emitted in the mailbox thread; processResults() does the actual handling
        mailboxExecutor.execute(
            () -> processResults(results),
            "Result in AsyncWaitOperator of input %s", results);
    }

    private void processResults(Collection<OUT> results) {
        // a result has arrived, so cancel the timeout timer
        if (timeoutTimer != null) {
            // canceling in mailbox thread avoids https://issues.apache.org/jira/browse/FLINK-13635
            timeoutTimer.cancel(true);
        }
        // update the entry in the queue
        resultFuture.complete(results);
        // emit the completed results from the queue
        outputCompletedElement();
    }

    // Back in AsyncWaitOperator: emit completed results downstream.
    private void outputCompletedElement() {
        if (queue.hasCompletedElements()) {
            // emit only one element to not block the mailbox thread unnecessarily
            queue.emitCompletedElement(timestampedCollector);
            // if there are more completed elements, emit them with subsequent mails
            if (queue.hasCompletedElements()) {
                mailboxExecutor.execute(this::outputCompletedElement, "AsyncWaitOperator#outputCompletedElement");
            }
        }
    }
}
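In processResults above, resultFuture.complete() updates the corresponding entry in the queue; outputCompletedElement() then emits the elements in the queue that have completed.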
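The interplay of the compareAndSet guard and the mailbox can be illustrated with a small stand-alone Java sketch. It is a simplified model rather than Flink code: `CompleteOnceSketch`, its `Handler`, and the queue of `Runnable` mails are hypothetical names standing in for ResultHandler and MailboxExecutor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class CompleteOnceSketch {

    // Mails may be enqueued from any thread and are run later by one "task" thread.
    final Queue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
    // Touched only by mails, that is, only by the thread calling drainMailbox().
    final List<String> emitted = new ArrayList<>();

    // Simplified stand-in for the operator's ResultHandler.
    final class Handler {
        private final AtomicBoolean completed = new AtomicBoolean(false);

        // Returns true only for the first caller; a later result or timeout is ignored.
        boolean complete(Collection<String> results) {
            if (!completed.compareAndSet(false, true)) {
                return false;
            }
            mailbox.add(() -> emitted.addAll(results));
            return true;
        }
    }

    Handler newHandler() {
        return new Handler();
    }

    // Runs all pending mails on the calling thread.
    void drainMailbox() {
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
        }
    }

    public static void main(String[] args) {
        CompleteOnceSketch operator = new CompleteOnceSketch();
        Handler handler = operator.newHandler();
        System.out.println(handler.complete(Arrays.asList("reply")));    // true
        System.out.println(handler.complete(Arrays.asList("too late"))); // false
        operator.drainMailbox();
        System.out.println(operator.emitted);                            // [reply]
    }
}
```

Because callbacks only enqueue work, the queue and the downstream collector are touched by a single thread, and a reply that races with its timeout is applied at most once.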
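Ordered: OrderedStreamElementQueue
OrderedStreamElementQueue implements the ordered mode. Its internal data structure is a Queue from the Java collections framework. An element is emitted if and only if the element at the head of the queue has completed.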
@Override
public boolean hasCompletedElements() {
    // the element at the head of the queue has completed and can be emitted
    return !queue.isEmpty() && queue.peek().isDone();
}

// emit an element
@Override
public void emitCompletedElement(TimestampedCollector<OUT> output) {
    // check whether the head element can be emitted
    if (hasCompletedElements()) {
        final StreamElementQueueEntry<OUT> head = queue.poll();
        head.emitResult(output);
    }
}
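The head-of-queue rule can be tried out with a minimal stand-alone model in Java. `OrderedQueueSketch` and its `Entry` class are simplified, hypothetical counterparts of OrderedStreamElementQueue and StreamElementQueueEntry, with strings in place of stream elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class OrderedQueueSketch {

    // One in-flight element; result stays null until the async request completes.
    static final class Entry {
        private String result;

        boolean isDone() {
            return result != null;
        }

        void complete(String value) {
            result = value;
        }
    }

    private final Queue<Entry> queue = new ArrayDeque<>();

    // Called in arrival order, once per input element.
    Entry put() {
        Entry entry = new Entry();
        queue.add(entry);
        return entry;
    }

    boolean hasCompletedElements() {
        return !queue.isEmpty() && queue.peek().isDone();
    }

    // Emits results only while the head of the queue is done.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (hasCompletedElements()) {
            out.add(queue.poll().result);
        }
        return out;
    }

    public static void main(String[] args) {
        OrderedQueueSketch q = new OrderedQueueSketch();
        Entry a = q.put();
        Entry b = q.put();
        b.complete("B");               // the later request finishes first
        System.out.println(q.drain()); // [] because the head is still pending
        a.complete("A");
        System.out.println(q.drain()); // [A, B]
    }
}
```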
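Unordered
UnorderedStreamElementQueue implements unordered emission. A single piece of logic covers both the processing-time and the event-time case.
Unordered processing means that the order in which messages enter the operator bears no necessary relation to the order in which the processed results flow to the next operator.
- In processing-time mode: the application is insensitive to message order, so strictly unordered processing is possible.
- In event-time mode: the application is sensitive to message order, and the order has a large effect on its statistical results. Watermarks are generated periodically and flow between tasks/operators. Messages between two watermarks can be reordered without harming the results, but if messages on opposite sides of a watermark are sent downstream in a different order from the one in which they were received, the results are likely to be wrong. Unordered processing in this mode therefore means only that messages between watermarks are processed out of order. Messages on either side of a watermark must respect it: those before the watermark are sent downstream before it, and those after it are sent downstream after it.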
static class Segment<OUT> {
    /** Unfinished input elements. */
    private final Set<StreamElementQueueEntry<OUT>> incompleteElements;
    /** Undrained finished elements. */
    private final Queue<StreamElementQueueEntry<OUT>> completedElements;
}

public final class UnorderedStreamElementQueue<OUT> implements StreamElementQueue<OUT> {
    private static final Logger LOG = LoggerFactory.getLogger(UnorderedStreamElementQueue.class);
    /** Capacity of this queue. */
    private final int capacity;
    /** Queue of queue entries segmented by watermarks. */
    private final Deque<Segment<OUT>> segments;

    // check whether the first segment holds completed elements
    @Override
    public boolean hasCompletedElements() {
        return !this.segments.isEmpty() && this.segments.getFirst().hasCompleted();
    }
}
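A Segment acts as a queue in its own right (a set of unfinished entries plus a queue of finished ones), and UnorderedStreamElementQueue wraps the segments in a further, outer queue.
This double-ended queue handles both the processing-time and the event-time unordered case.
Processing-time unordered: segments only ever holds a single Segment, so all elements live in one queue.
Event-time unordered: each time a watermark is added, an empty Segment is appended to segments, and all subsequent elements are added to that new queue. This guarantees that elements are reordered only between watermarks.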
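The segment mechanism can be modeled in a few dozen lines of plain Java. This is a simplified sketch, not Flink's implementation: `UnorderedQueueSketch` is a hypothetical name, strings stand in for stream elements, and it assumes a watermark gets a segment of its own before a fresh segment is opened for later records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

final class UnorderedQueueSketch {

    static final class Segment {
        final Set<Entry> incomplete = new HashSet<>();
        final Queue<Entry> completed = new ArrayDeque<>();
    }

    static final class Entry {
        final Segment segment;
        String result; // null until completed

        Entry(Segment segment) {
            this.segment = segment;
        }
    }

    private final Deque<Segment> segments = new ArrayDeque<>();

    // A record joins the newest segment.
    Entry put() {
        if (segments.isEmpty()) {
            segments.addLast(new Segment());
        }
        Entry entry = new Entry(segments.getLast());
        entry.segment.incomplete.add(entry);
        return entry;
    }

    // A watermark is complete at once and closes the current segment.
    void putWatermark(String label) {
        Segment own = new Segment();
        Entry watermark = new Entry(own);
        watermark.result = label;
        own.completed.add(watermark);
        segments.addLast(own);
        segments.addLast(new Segment()); // later records go to a fresh segment
    }

    void complete(Entry entry, String result) {
        entry.result = result;
        if (entry.segment.incomplete.remove(entry)) {
            entry.segment.completed.add(entry);
        }
    }

    // Emits only from the first segment; moves on once that segment is fully drained.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!segments.isEmpty()) {
            Segment first = segments.getFirst();
            while (!first.completed.isEmpty()) {
                out.add(first.completed.poll().result);
            }
            if (first.incomplete.isEmpty() && segments.size() > 1) {
                segments.removeFirst();
            } else {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        UnorderedQueueSketch q = new UnorderedQueueSketch();
        Entry a = q.put();
        Entry b = q.put();
        q.putWatermark("W1");
        Entry c = q.put();
        q.complete(c, "C");
        System.out.println(q.drain()); // [] because C must wait behind W1
        q.complete(b, "B");
        System.out.println(q.drain()); // [B], emitted ahead of the earlier record a
        q.complete(a, "A");
        System.out.println(q.drain()); // [A, W1, C]
    }
}
```

Within the first segment, B overtakes A, but nothing after the watermark is emitted until everything before it has gone downstream.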
Fault tolerance
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    super.snapshotState(context);
    ListState<StreamElement> partitionableState =
        getOperatorStateBackend().getListState(new ListStateDescriptor<>(STATE_NAME, inStreamElementSerializer));
    partitionableState.clear();
    try {
        // it is enough to store the elements currently in the queue in the state
        partitionableState.addAll(queue.values());
    } catch (Exception e) {
        partitionableState.clear();
        throw new Exception("Could not add stream element queue entries to operator state " +
            "backend of operator " + getOperatorName() + '.', e);
    }
}
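The snapshotState function saves the state information, which is the basis of state consistency.
Taking a snapshot of AsyncWaitOperator is very simple. As the code shows, it performs the following steps:
- Clear the previously stored state.
- Take all the elements out of the Queue and put them into the state store.
- Take the snapshot.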
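The snapshot steps, together with the replay that open() performs on recovery, can be sketched as a toy model in Java. The names are hypothetical: `SnapshotSketch` keeps pending inputs as strings, a plain `List` stands in for the operator's ListState, and the `issued` list stands in for calls to userFunction.asyncInvoke.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class SnapshotSketch {

    // Inputs whose async results have not been emitted yet, in arrival order.
    private final Deque<String> inFlight = new ArrayDeque<>();
    // Records every issued or re-issued request.
    final List<String> issued = new ArrayList<>();

    void processElement(String input) {
        inFlight.addLast(input);
        issued.add(input);
    }

    // The head's result has been emitted downstream, so it leaves the queue.
    void emitHead() {
        inFlight.pollFirst();
    }

    // Clear the old state, then store everything that is still in the queue.
    void snapshotState(List<String> state) {
        state.clear();
        state.addAll(inFlight);
    }

    // Recovery: replay every stored element through processElement.
    static SnapshotSketch restore(List<String> state) {
        SnapshotSketch operator = new SnapshotSketch();
        for (String input : state) {
            operator.processElement(input);
        }
        return operator;
    }

    public static void main(String[] args) {
        SnapshotSketch operator = new SnapshotSketch();
        for (String input : Arrays.asList("a", "b", "c")) {
            operator.processElement(input);
        }
        operator.emitHead(); // "a" was emitted before the checkpoint
        List<String> state = new ArrayList<>(Arrays.asList("stale"));
        operator.snapshotState(state);
        System.out.println(state);                 // [b, c]
        System.out.println(restore(state).issued); // [b, c]
    }
}
```

Only requests that were still pending at checkpoint time are stored, and after a failure exactly those requests are issued again.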
III. Example
import java.util.concurrent.{ExecutorService, Executors, ThreadLocalRandom, TimeUnit}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala.{AsyncDataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.scala.async.{ResultFuture, RichAsyncFunction}
import org.apache.flink.util.ExecutorUtils

object AsyncIOExample {

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(10)
    import org.apache.flink.api.scala._
    val inputStream = env.addSource(new CustomNonParallelSourceFunction)
    // run the same async function in both output modes to compare the emission order
    val result1 = AsyncDataStream.orderedWait(inputStream, new SampleAsyncFunction, 1000, TimeUnit.MILLISECONDS, 20)
    val result2 = AsyncDataStream.unorderedWait(inputStream, new SampleAsyncFunction, 1000, TimeUnit.MILLISECONDS, 20)
    result1.print("result1")
    result2.print("result2")
    env.execute("AsyncIOExample")
  }

  // emits an increasing counter once per second
  class CustomNonParallelSourceFunction extends SourceFunction[Long] {
    var count = 0L
    // written by cancel() from another thread, hence volatile
    @volatile var isRunning = true

    override def run(sourceContext: SourceFunction.SourceContext[Long]): Unit = {
      while (isRunning) {
        sourceContext.collect(count)
        count += 1
        Thread.sleep(1000)
      }
    }

    override def cancel(): Unit = {
      isRunning = false
    }
  }

  // thread pool that simulates the asynchronous client
  val executorService: ExecutorService = Executors.newFixedThreadPool(30)

  class SampleAsyncFunction extends RichAsyncFunction[Long, String] {
    val failRatio = 0.001f
    val sleepFactor = 1000L
    val shutdownWaitTS = 20000L

    override def open(parameters: Configuration): Unit = {
      super.open(parameters)
    }

    override def close(): Unit = {
      super.close()
      ExecutorUtils.gracefulShutdown(shutdownWaitTS, TimeUnit.MILLISECONDS, executorService)
    }

    override def asyncInvoke(input: Long, resultFuture: ResultFuture[String]): Unit = {
      executorService.submit(new Runnable {
        override def run(): Unit = {
          // simulate a request that takes a random time of up to sleepFactor milliseconds
          val sleep = (ThreadLocalRandom.current().nextFloat() * sleepFactor).toLong
          try {
            Thread.sleep(sleep)
            if (ThreadLocalRandom.current().nextFloat() < failRatio) {
              // simulate an occasional failure
              resultFuture.completeExceptionally(new Exception("lilili"))
            } else {
              resultFuture.complete(List("key-" + input))
            }
          } catch {
            case e: Exception =>
              // emit nothing for this input
              resultFuture.complete(List())
              e.printStackTrace()
          }
        }
      })
    }
  }
}
References
http://wuchong.me/blog/2017/05/17/flink-internals-async-io/
WeChat official account
WeChat ID: bigdata_limeng