Stopping Flink Java jobs / Flink job submission

Reposted

智能领航员 2023-10-13 22:41:56


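Contents

1. StreamExecutionEnvironment: the execution environment of a streaming job
2. Creating the StreamGraph
3. Asynchronously creating a JobClient

1. Creating an executor
2. Converting the pipeline to a JobGraph

1. Activating the configuration (preparing the JobGraph's configuration)
2. The translator (performing the JobGraph conversion)
Asynchronously submitting the job to the cluster and obtaining the job client

1. LocalExecutor construction
2. RemoteExecutor
RestServerEndpoint: communicating with the server
3. EmbeddedExecutor construction

4. Returning the JobClient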

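Setting aside the details of graph conversion and the various ways a client can submit a job, this article analyzes the most traditional case: a streaming (DataStream) job.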

[Figure: overall flow of job submission]

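Start with an overall picture of the flow. When an application submits a job, five main steps are involved:

Creating the Environment
Creating the StreamGraph
Converting the StreamGraph into a JobGraph
Returning the JobClient
Waiting asynchronously for the result

Take a moment to study the flow chart above.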

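1. StreamExecutionEnvironment: the execution environment of a streaming job

This is the layer developers know best, and the one through which they operate and launch Flink directly.
It is also where users set global configuration, some of which was covered in the configuration chapter.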

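2. Creating the StreamGraph

The StreamGraph is the graph generated from the user's original operators (transformations). It is the input from which the JobGraph is produced before the scheduler schedules any tasks. The execution environment obtains the finished StreamGraph by creating a StreamGraph generator.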

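The StreamGraph holds a lot of important information, for example:

streamNodes: the user-defined operators, converted from transformations. Each StreamNode stores its inEdges, outEdges, parallelism, and the relevant type serializers, which the later JobGraph generation relies on.
ExecutionConfig: according to the official description, this is the configuration that every execution environment (executor) needs; it is global in scope.
CheckpointConfig: the checkpoint settings, including the checkpointing mode, the checkpoint interval, the checkpoint timeout, and so on.

TimeCharacteristic: the time characteristic. For most developers this mainly matters when describing time windows. It deserves a separate article later.
scheduleMode: the scheduling mode, which determines how tasks are scheduled.

There are three values: LAZY_FROM_SOURCES, LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST, and EAGER. LAZY_FROM_SOURCES is not implemented in 1.11, though, so in practice there are only two scheduling modes.
EAGER deploys all tasks right away, which suits stream processing. LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST triggers a downstream task once its input data is ready (buffered).
org.apache.flink.runtime.executiongraph.SchedulingUtils#schedule is the scheduling utility that triggers a different scheduling mechanism depending on the mode.

slotSharingGroup and colocationGroup: these need further study.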
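
The difference between eager and lazy scheduling described above can be sketched in plain Java. This is only a simplified model under stated assumptions, not Flink's scheduler: SketchScheduleMode, ScheduleSketch, and initialTasks are hypothetical names, and treating both lazy variants identically is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mirror of the three mode names; not Flink's own enum.
enum SketchScheduleMode { LAZY_FROM_SOURCES, LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST, EAGER }

final class ScheduleSketch {

    /** Returns the tasks that are deployed immediately for the given mode. */
    static List<String> initialTasks(SketchScheduleMode mode, List<String> sources, List<String> downstream) {
        List<String> deployed = new ArrayList<>(sources); // sources always start first
        switch (mode) {
            case EAGER:
                // Eager: every task is deployed up front, which suits streaming jobs.
                deployed.addAll(downstream);
                break;
            case LAZY_FROM_SOURCES:
            case LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST:
                // Lazy (assumption: both variants behave alike here):
                // downstream tasks wait until their input is ready.
                break;
            default:
                throw new IllegalStateException("Unknown schedule mode: " + mode);
        }
        return deployed;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("source");
        List<String> downstream = Arrays.asList("map", "sink");
        System.out.println(initialTasks(SketchScheduleMode.EAGER, sources, downstream));
        System.out.println(initialTasks(SketchScheduleMode.LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST, sources, downstream));
    }
}
```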

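3. Asynchronously creating a JobClient

The JobClient lets the user submit or cancel a job. The figure below shows the whole process of obtaining a JobClient, which we now examine in depth.

[Figure: the process of obtaining a JobClient]

1. Creating an executor

Creating the client starts with the executor factory. The executor (Executor) implements an execute method that submits the client's request.

Logically, Flink uses SPI to find the executor factory that matches the current environment and uses it to create the executor, so that each environment gets an executor with the appropriate behavior. The logic is shown below.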

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#executeAsync(org.apache.flink.streaming.api.graph.StreamGraph)

final PipelineExecutorFactory executorFactory =
			executorServiceLoader.getExecutorFactory(configuration);
CompletableFuture<JobClient> jobClientFuture = executorFactory
			.getExecutor(configuration)
			.execute(streamGraph, configuration);

The factory is obtained from executorServiceLoader, so to find the core lookup logic we follow the new DefaultExecutorServiceLoader() call in the constructor.

public PipelineExecutorFactory getExecutorFactory(final Configuration configuration) {
		checkNotNull(configuration);

		final ServiceLoader<PipelineExecutorFactory> loader =
				ServiceLoader.load(PipelineExecutorFactory.class);

		final List<PipelineExecutorFactory> compatibleFactories = new ArrayList<>();
		final Iterator<PipelineExecutorFactory> factories = loader.iterator();
		// Find the factory that matches the deployment target in the configuration
		// (configuration.get(DeploymentOptions.TARGET)), e.g. remote/local/collection/embedded
		while (factories.hasNext()) {
			final PipelineExecutorFactory factory = factories.next();
			if (factory != null && factory.isCompatibleWith(configuration)) {
				compatibleFactories.add(factory);
			}
		}
		// More than one matching factory is an error
		// (configStr is a string rendering of the configuration; its definition is omitted from this excerpt)
		if (compatibleFactories.size() > 1) {
			throw new IllegalStateException("Multiple compatible client factories found for:\n" + configStr + ".");
		}
		// No matching factory is also an error
		if (compatibleFactories.isEmpty()) {
			throw new IllegalStateException("No ExecutorFactory found to execute the application.");
		}
		// Exactly one factory matched
		return compatibleFactories.get(0);
	}

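Depending on the configuration parameters, the matching executor factory is found, and the executor is obtained from it.

There are many executors; I list the four that cover 99% of use cases, which makes them easy to remember.

In other words, when a user submits a job in remote mode, the value of DeploymentOptions.TARGET in the configuration object is remote, so the corresponding factory is found.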
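
To make the "exactly one compatible factory" rule concrete, here is a minimal, self-contained sketch. It does not use Flink or ServiceLoader: FactoryLookupSketch and SketchFactory are hypothetical stand-ins, the candidate list is passed in by hand, and the key "execution.target" is assumed to be the string form of DeploymentOptions.TARGET.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FactoryLookupSketch {

    /** Hypothetical stand-in for an executor factory; not Flink's interface. */
    static final class SketchFactory {
        final String target;

        SketchFactory(String target) {
            this.target = target;
        }

        /** A factory is compatible when the configured target names it. */
        boolean isCompatibleWith(Map<String, String> conf) {
            return target.equalsIgnoreCase(conf.get("execution.target")); // assumed key
        }
    }

    /** Returns the single compatible factory, or fails if there are zero or several. */
    static SketchFactory select(List<SketchFactory> candidates, Map<String, String> conf) {
        List<SketchFactory> compatible = new ArrayList<>();
        for (SketchFactory factory : candidates) {
            if (factory.isCompatibleWith(conf)) {
                compatible.add(factory);
            }
        }
        if (compatible.size() > 1) {
            throw new IllegalStateException("Multiple compatible factories found for " + conf);
        }
        if (compatible.isEmpty()) {
            throw new IllegalStateException("No factory found for " + conf);
        }
        return compatible.get(0);
    }

    public static void main(String[] args) {
        List<SketchFactory> candidates = Arrays.asList(
                new SketchFactory("local"), new SketchFactory("remote"), new SketchFactory("embedded"));
        Map<String, String> conf = new HashMap<>();
        conf.put("execution.target", "remote");
        System.out.println("Selected factory: " + select(candidates, conf).target);
    }
}
```

Failing loudly on both the empty and the ambiguous case means a misconfigured target is caught at submission time rather than silently falling back to some default executor.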

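The next step is for each factory to create its executor.

Here is the inheritance structure of the executors:

[Figure: inheritance hierarchy of the executors]

Every executor implements the PipelineExecutor interface. What does this interface do?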

public interface PipelineExecutor {
	CompletableFuture<JobClient> execute(final Pipeline pipeline, final Configuration configuration) throws Exception;
}

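In other words, the pipeline executor interface exposes an execute method that every implementing class must provide. It takes a Pipeline and a configuration, and returns a CompletableFuture of JobClient. The Future signals that execute is asynchronous. Callers can follow up asynchronously through this interface, for example by blocking on get() until a JobClient instance is returned. That is how Flink itself does it, as shown below.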

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment#executeAsync(org.apache.flink.streaming.api.graph.StreamGraph)

CompletableFuture<JobClient> jobClientFuture = executorFactory
			.getExecutor(configuration)
			.execute(streamGraph, configuration);
JobClient jobClient = jobClientFuture.get();
jobListeners.forEach(jobListener -> jobListener.onJobSubmitted(jobClient, null));
return jobClient;
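
The two ways of consuming such a future, blocking and non-blocking, can be sketched with the JDK alone. This is an illustrative model, not Flink code: AsyncSubmitSketch, SketchJobClient, and the "job-" ID format are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSubmitSketch {

    /** Hypothetical stand-in for a job client. */
    static final class SketchJobClient {
        final String jobId;

        SketchJobClient(String jobId) {
            this.jobId = jobId;
        }
    }

    /** Simulates an asynchronous submission that eventually yields a client. */
    static CompletableFuture<SketchJobClient> execute(String pipelineName) {
        return CompletableFuture.supplyAsync(() -> new SketchJobClient("job-" + pipelineName));
    }

    public static void main(String[] args) throws Exception {
        // Non-blocking style: register a continuation that runs once the client exists.
        CompletableFuture<String> idFuture = execute("wordcount").thenApply(client -> client.jobId);

        // Blocking style, as in the excerpt above: wait until the client is available.
        SketchJobClient client = execute("wordcount").get();

        System.out.println(client.jobId + " / " + idFuture.get());
    }
}
```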

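Initializing the JobClient is where the Pipeline is actually converted into a JobGraph. That is the subject of the next section.

2. Converting the pipeline to a JobGraph

Here we mainly look at the AbstractSessionClusterExecutor class among the executors. Its execute method shows that the conversion follows a fairly fixed procedure, with the PipelineExecutorUtils utility class performing the final conversion. Before the conversion:

The configuration must be activated; options passed on the user's command line override the system defaults.
A translator is chosen according to the pipeline type to translate it into a JobGraph.

1. Activating the configuration (preparing the JobGraph's configuration)

While the executor generates the JobGraph, the configuration is activated. In this step, information passed in by the user's script overrides the defaults. In other words, script parameters take priority over system parameters!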

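Here we can clearly see that the system configuration is injected into the JobGraph. There is one JobGraph instance per job, which makes it a good place to hold both the global configuration and the activated configuration. Next, we look at the configuration that is actually activated.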
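
The precedence rule (user-supplied options win over system defaults) amounts to overlaying one map on another. The sketch below assumes configuration can be modeled as string key/value pairs; ConfigOverlaySketch and effectiveConfig are hypothetical names, and the keys are used only as examples.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigOverlaySketch {

    /** Starts from the defaults, then lets every user-supplied option overwrite them. */
    static Map<String, String> effectiveConfig(Map<String, String> defaults, Map<String, String> userOptions) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(userOptions); // user options take priority on key collisions
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("parallelism.default", "1");
        defaults.put("execution.target", "local");

        Map<String, String> userOptions = new HashMap<>();
        userOptions.put("parallelism.default", "4");

        // parallelism.default becomes 4; execution.target keeps its default value.
        System.out.println(effectiveConfig(defaults, userOptions));
    }
}
```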

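2. The translator (performing the JobGraph conversion)

Along with activating the configuration, Flink uses its built-in pipeline translation utility class to provide a uniform abstraction. It takes three parameters: the Pipeline, the optimizer configuration (Configuration), and the parallelism, and uses the translator that matches the pipeline to produce a new JobGraph. The translation itself will be covered in detail in the article on StreamGraph-to-JobGraph conversion. For now it is enough to know that this step converts a Pipeline into a JobGraph.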

org.apache.flink.client.FlinkPipelineTranslationUtil#getJobGraph

public static JobGraph getJobGraph(
		Pipeline pipeline,
		Configuration optimizerConfiguration,
		int defaultParallelism) {

	FlinkPipelineTranslator pipelineTranslator = getPipelineTranslator(pipeline);

	return pipelineTranslator.translateToJobGraph(pipeline,
			optimizerConfiguration,
			defaultParallelism);
}

This is where the JobGraph is generated. The source confirms that when the user passes a Plan (batch) pipeline, a PlanTranslator converts the graph; when the user passes a non-Plan pipeline (such as a StreamGraph), a StreamGraphTranslator converts the graph nodes.

private static FlinkPipelineTranslator getPipelineTranslator(Pipeline pipeline) {
		PlanTranslator planTranslator = new PlanTranslator();
		// If the pipeline is a Plan, return the plan translator directly
		if (planTranslator.canTranslate(pipeline)) {
			return planTranslator;
		}
		// Otherwise try the stream translator
		StreamGraphTranslator streamGraphTranslator = new StreamGraphTranslator();

		if (streamGraphTranslator.canTranslate(pipeline)) {
			return streamGraphTranslator;
		}

		throw new RuntimeException("Translator " + streamGraphTranslator + " cannot translate "
			+ "the given pipeline " + pipeline + ".");
	}

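Asynchronously submitting the job to the cluster and obtaining the job client

We need to revisit the Executor here. So far we have only said that an Executor is created and that it returns a JobClient, without detailing what the executor actually does.

Each type of Executor is created by its own factory. The constructor parameters give a rough idea of which objects an Executor typically holds.

1. LocalExecutor construction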

public static LocalExecutor create(Configuration configuration) {
		return new LocalExecutor(configuration, MiniCluster::new);
	}

private LocalExecutor(Configuration configuration, Function<MiniClusterConfiguration, MiniCluster> miniClusterFactory) {
	this.configuration = configuration;
	this.miniClusterFactory = miniClusterFactory;
}

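a) The configuration object.

b) miniClusterFactory: by default the factory is the constructor reference MiniCluster::new. It creates the MiniCluster, a small cluster that provides an environment in which jobs can run, along with the client side that interacts with that cluster.

We now consider PerJobMiniClusterFactory, the most complex submission mode.
Once the JobGraph has been generated, it must be submitted to the Flink runtime cluster. What Flink starts in this step depends on the executor. How exactly is the job submitted? Let us look at the source, taking PerJobMiniClusterFactory, the cluster factory used internally by LocalExecutor, as the example.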

org.apache.flink.client.program.PerJobMiniClusterFactory

/**
 * Starts a {@link MiniCluster} and submits a job.
 */
public CompletableFuture<JobClient> submitJob(JobGraph jobGraph) throws Exception {
	// 1. Build the cluster configuration
	MiniClusterConfiguration miniClusterConfig = getMiniClusterConfig(jobGraph.getMaximumParallelism());
	// 2. Create the mini cluster through the factory; the factory is the Java 8
	//    constructor reference MiniCluster::new passed in by LocalExecutor.create
	MiniCluster miniCluster = miniClusterFactory.apply(miniClusterConfig);
	// 3. Start the cluster
	miniCluster.start();
	// 4. Submit the job to the mini cluster
	return miniCluster
		.submitJob(jobGraph)
		.thenApply(result -> new PerJobMiniClusterJobClient(result.getJobID(), miniCluster))
		.whenComplete((ignored, throwable) -> {
			if (throwable != null) {
				// We failed to create the JobClient and must shutdown to ensure cleanup.
				shutDownCluster(miniCluster);
			}
		})
		.thenApply(Function.identity());
}
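
Two ideas in the method above are worth isolating: a constructor reference used as a factory, and a whenComplete stage that cleans up only when submission fails. The following is a minimal sketch of that shape using only the JDK. MiniClusterSketch and SketchCluster are hypothetical stand-ins, and the rule that an empty job name fails is an assumption made purely so the failure path can be exercised.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class MiniClusterSketch {

    /** Hypothetical stand-in for a local cluster. */
    static final class SketchCluster {
        final int slots;
        boolean running;

        SketchCluster(int slots) {
            this.slots = slots;
        }

        void start() {
            running = true;
        }

        void close() {
            running = false;
        }

        /** Assumption for illustration: an empty job name is rejected. */
        CompletableFuture<String> submit(String job) {
            CompletableFuture<String> result = new CompletableFuture<>();
            if (job.isEmpty()) {
                result.completeExceptionally(new IllegalArgumentException("empty job"));
            } else {
                result.complete("id-" + job);
            }
            return result;
        }
    }

    /** Creates a cluster through the factory, starts it, submits, and cleans up on failure. */
    static CompletableFuture<String> submitJob(
            Function<Integer, SketchCluster> clusterFactory, int parallelism, String job) {
        SketchCluster cluster = clusterFactory.apply(parallelism);
        cluster.start();
        return cluster.submit(job)
                .thenApply(id -> "client-for-" + id)
                .whenComplete((ignored, error) -> {
                    if (error != null) {
                        cluster.close(); // no client was created, so nobody else would shut it down
                    }
                });
    }

    public static void main(String[] args) {
        // The constructor reference plays the role of the factory.
        System.out.println(submitJob(SketchCluster::new, 2, "wordcount").join());
    }
}
```

On the success path the cluster is deliberately left running: in this shape, whoever holds the returned client becomes responsible for shutting it down.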

2. RemoteExecutor

The executor used when submitting a job remotely.

public RemoteExecutor() {
	super(new StandaloneClientFactory());
}

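a) It is built with a StandaloneClientFactory. What is this object? Its main job is to create a descriptor that refers to a RestClusterClient, the client that talks to the remote cluster.

RestClusterClient holds cluster-related information, the associated services, and state information.

RestServerEndpoint: communicating with the server

The server-side counterpart of RestClusterClient is RestServerEndpoint. In start(), the endpoint uses Netty to set up one boss thread and a default number of worker threads to open the service, which is generally an internal service of the Flink cluster. The endpoint's handlers do two main things:

They cache JAR files uploaded from the web page and load the various handlers accordingly.
The router dispatches on the request type (POST/GET/DELETE) and runs the corresponding handling. What kind of business do these handlers deal with?

Did the job a user submitted succeed (jobSubmit)? Was the job cancelled (jobCanceled)? How are such job changes kept in sync between client and server? Flink wraps this job state in a RouterHandler. The job is the JobGraph we submitted, and the state of these graphs is held by these routers. The code below confirms this: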
org.apache.flink.runtime.rest.RestServerEndpoint

public final void start() throws Exception {

    final Router router = new Router();
    
    ...
    ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {

				@Override
				protected void initChannel(SocketChannel ch) {
					RouterHandler handler = new RouterHandler(router, responseHeaders);

					// SSL should be the first handler in the pipeline
					if (isHttpsEnabled()) {
						ch.pipeline().addLast("ssl",
							new RedirectingSslHandler(restAddress, restAddressFuture, sslHandlerFactory));
					}

					ch.pipeline()
						.addLast(new HttpServerCodec())
						.addLast(new FileUploadHandler(uploadDir))
						.addLast(new FlinkHttpObjectAggregator(maxContentLength, responseHeaders))
						.addLast(new ChunkedWriteHandler())
						.addLast(handler.getName(), handler)
						.addLast(new PipelineErrorHandler(log, responseHeaders));
				}
			};

}

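You can also look at the RestClusterClientTest#testJobSubmitCancel() method yourself.

Note: the handlers deal with the various TCP sticky-packet and half-packet (message framing) issues.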
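
The routing idea (pick a handler by HTTP method and path) can be illustrated without Netty. This is a deliberately tiny model, not Flink's Router or RouterHandler: RouterSketch is a hypothetical class, it matches paths exactly with no path parameters, and the paths in main are only illustrative.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class RouterSketch {

    // Key is "METHOD path"; value maps a request body to a response string.
    private final Map<String, Function<String, String>> routes = new HashMap<>();

    /** Registers a handler for an HTTP method and an exact path. */
    RouterSketch add(String method, String path, Function<String, String> handler) {
        routes.put(key(method, path), handler);
        return this;
    }

    /** Dispatches a request to the registered handler, or reports that none exists. */
    String route(String method, String path, String body) {
        Function<String, String> handler = routes.get(key(method, path));
        return handler == null ? "404 Not Found" : handler.apply(body);
    }

    private static String key(String method, String path) {
        return method.toUpperCase(Locale.ROOT) + " " + path;
    }

    public static void main(String[] args) {
        RouterSketch router = new RouterSketch()
                .add("POST", "/jobs", body -> "submitted " + body)
                .add("DELETE", "/jobs/demo", body -> "cancelled demo");

        System.out.println(router.route("POST", "/jobs", "wordcount"));
        System.out.println(router.route("DELETE", "/jobs/demo", ""));
        System.out.println(router.route("GET", "/unknown", ""));
    }
}
```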

3. EmbeddedExecutor construction

We use EmbeddedExecutorFactory to see how an EmbeddedExecutor is built.

EmbeddedExecutorFactory

public PipelineExecutor getExecutor(final Configuration configuration) {
		checkNotNull(configuration);
		return new EmbeddedExecutor(
				submittedJobIds,
				dispatcherGateway,
				jobId -> {
					final Time timeout = Time.milliseconds(configuration.get(ClientOptions.CLIENT_TIMEOUT).toMillis());
					return new EmbeddedJobClient(jobId, dispatcherGateway, retryExecutor, timeout);
				});
	}

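The code above uses a factory function to create an embedded EmbeddedJobClient that holds the job ID, the dispatcher gateway, the retry executor, and the timeout.

Core components

a) Dispatcher gateway: this gateway sends requests to the Dispatcher. The Dispatcher receives job submissions, spawns JobManagers to run the jobs, recovers them after a failure, and knows the current state of the session cluster. At the implementation level, as the figure below shows, the Dispatcher implements this gateway interface.

[Figure: Dispatcher implementing the DispatcherGateway interface]

b) JobClient creator
Flink provides no named implementation of this interface here. Why? The likely reason is that the implementation is meant to come from the deployment side, for example Flink-compatible code supplied when the user runs on a resource manager such as YARN. (In the factory code above it is passed in as a lambda that builds an EmbeddedJobClient.) Either way, as the two previous executors show, the client is simply what the user uses to make requests about a job.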

public interface EmbeddedJobClientCreator {

	/**
	 * Creates a {@link JobClient} that is adequate for the context in which the job is executed.
	 * @param jobId the job id of the job associated with the returned client.
	 * @return the job client.
	 */
	JobClient getJobClient(final JobID jobId);
}
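
Because the interface has a single method, a lambda is enough to implement it, which is how the factory above supplies it. The sketch below shows that pattern with hypothetical stand-in types (ClientCreatorSketch, SketchJobClient, SketchJobClientCreator); it is not Flink's EmbeddedJobClientCreator, and the describe method exists only to make the result observable.

```java
import java.time.Duration;

final class ClientCreatorSketch {

    /** Hypothetical stand-in for a job client. */
    interface SketchJobClient {
        String describe();
    }

    /** Hypothetical single-method creator, shaped like the interface above. */
    @FunctionalInterface
    interface SketchJobClientCreator {
        SketchJobClient getJobClient(String jobId);
    }

    /** Builds a creator whose clients all share the timeout captured by the lambda. */
    static SketchJobClientCreator creatorWithTimeout(Duration timeout) {
        return jobId -> () -> "client for " + jobId + " (timeout " + timeout.toMillis() + " ms)";
    }

    public static void main(String[] args) {
        SketchJobClientCreator creator = creatorWithTimeout(Duration.ofSeconds(60));
        System.out.println(creator.getJobClient("job-1").describe());
    }
}
```

Capturing shared context (here, the timeout) in the lambda keeps the creator's signature down to the one thing that varies per call, the job ID.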

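The figure below summarizes how each Executor creates its own JobClient.

[Figure: how each Executor creates its JobClient]

4. Returning the JobClient

The result of the job is then obtained through the JobClient.
This completes the flow.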
