Flink download page: https://flink.apache.org/downloads.html

Flink does not yet ship a build integrated with Hadoop 2.9, so install the Hadoop 2.7 stable build (compatible with this cluster).



Perform the following steps on every node in the cluster.

Extract and rename

$ tar -zxvf flink-1.7.1-bin-hadoop27-scala_2.11.tgz -C /opt/core

$ cd /opt/core
$ mv flink-1.7.1 flink
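Since these steps must run on every node, the extracted directory can be pushed from the first node to the workers in one loop. A sketch, assuming passwordless SSH and the worker hostnames used later in this guide; DRY_RUN defaults to true so it only prints the commands:

```shell
# Sketch: push the Flink install dir from hadoop001 to the worker nodes.
# Assumes passwordless SSH is set up. DRY_RUN defaults to true and only
# prints the commands -- set DRY_RUN=false on a real cluster.
DRY_RUN=${DRY_RUN:-true}

distribute_flink() {
    for node in "$@"; do
        cmd="scp -r /opt/core/flink ${node}:/opt/core/"
        if [ "$DRY_RUN" = true ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

distribute_flink hadoop002 hadoop003
```

Adjust the hostname list to match your own slaves file.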

Add environment variables

vi /opt/conf/env

#FLINK
export FLINK_HOME=/opt/core/flink
export PATH=$PATH:$FLINK_HOME/bin

Apply the configuration file

source /opt/conf/env

Flink configuration file: masters

Configure this with the hostname of the cluster's master node:

hadoop001:8082

Flink configuration file: slaves

Configure this with the hostname of each node in the cluster:

hadoop001
hadoop002
hadoop003
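The two files above can be written in one step. A sketch that targets a temporary directory by default so it is safe to try anywhere; point CONF_DIR at $FLINK_HOME/conf on a real node:

```shell
# Write the masters and slaves files. CONF_DIR defaults to a temp dir for
# safe testing; set CONF_DIR=$FLINK_HOME/conf on a real node.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}

cat > "$CONF_DIR/masters" <<'EOF'
hadoop001:8082
EOF

cat > "$CONF_DIR/slaves" <<'EOF'
hadoop001
hadoop002
hadoop003
EOF

echo "wrote $(wc -l < "$CONF_DIR/slaves") worker entries to $CONF_DIR/slaves"
```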

Flink configuration file: flink-conf.yaml

vi conf/flink-conf.yaml

Basic settings

| Parameter                     | Value     | Description                                                       |
| ----------------------------- | --------- | ----------------------------------------------------------------- |
| jobmanager.rpc.address        | hadoop001 | Node the JobManager runs on                                       |
| jobmanager.rpc.port           | 6123      | JobManager RPC port (default 6123)                                |
| jobmanager.heap.size          | 2048m     | Heap memory available to the JobManager                           |
| taskmanager.heap.size         | 4096m     | Heap memory available to each TaskManager; size it to the cluster |
| taskmanager.numberOfTaskSlots | 3         | Task slots (parallelism) per TaskManager (keep within 5)          |
| parallelism.default           | 6         | Default parallelism of a job (the total CPU slots the job uses)   |
| rest.port                     | 8082      | Flink web UI port; the default 8081 conflicts with Spark, so change it to 8082 |
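Collected as they would appear in flink-conf.yaml, the basic settings above look like this. CONF defaults to a temporary file so the snippet is safe to run as-is; set it to $FLINK_HOME/conf/flink-conf.yaml on a real node:

```shell
# Append the basic settings to flink-conf.yaml. CONF defaults to a temp
# file for safe testing; use CONF=$FLINK_HOME/conf/flink-conf.yaml on a node.
CONF=${CONF:-$(mktemp)}

cat >> "$CONF" <<'EOF'
jobmanager.rpc.address: hadoop001
jobmanager.rpc.port: 6123
jobmanager.heap.size: 2048m
taskmanager.heap.size: 4096m
taskmanager.numberOfTaskSlots: 3
parallelism.default: 6
rest.port: 8082
EOF
```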

History server settings

| Parameter                                 | Value                                   | Description                                                       |
| ----------------------------------------- | --------------------------------------- | ----------------------------------------------------------------- |
| jobmanager.archive.fs.dir                 | hdfs://hsotname001/var/log/hadoop-flink | Archive directory for completed jobs; because HA is configured, the HDFS nameservice (hsotname001) is used instead of a NameNode host |
| historyserver.web.address                 | hadoop001                               | History server web UI address (the hostname mapping must be present in the local hosts file) |
| historyserver.web.port                    | 18082                                   | History server web UI port                                        |
| historyserver.archive.fs.dir              | hdfs://hsotname001/var/log/hadoop-flink | Keep identical to jobmanager.archive.fs.dir                       |
| historyserver.archive.fs.refresh-interval | 10000                                   | Refresh interval of the history server pages, in milliseconds     |
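The history-server settings can likewise be appended in one step. CONF again defaults to a temporary file so the snippet is safe to try; the follow-up commands for creating the archive directory and starting the history server are shown as comments:

```shell
# Append the history-server settings. CONF defaults to a temp file for
# safe testing; use CONF=$FLINK_HOME/conf/flink-conf.yaml on a real node.
CONF=${CONF:-$(mktemp)}

cat >> "$CONF" <<'EOF'
jobmanager.archive.fs.dir: hdfs://hsotname001/var/log/hadoop-flink
historyserver.web.address: hadoop001
historyserver.web.port: 18082
historyserver.archive.fs.dir: hdfs://hsotname001/var/log/hadoop-flink
historyserver.archive.fs.refresh-interval: 10000
EOF

# On a live cluster, create the archive dir and start the history server:
#   hdfs dfs -mkdir -p /var/log/hadoop-flink
#   $FLINK_HOME/bin/historyserver.sh start
```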

Add jar dependencies

cd $FLINK_HOME/lib

Add the following dependency jars:

flink-hadoop-compatibility_2.11-1.7.1.jar
javax.ws.rs-api-2.0.1.jar
jersey-common-2.27.jar
jersey-core-1.9.jar

If these jars are missing from the lib directory, running Flink on YARN fails with the exception below; it reports that the jersey classes cannot be found, so check that the jars above were added correctly:

18/08/25 17:29:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/08/25 17:29:28 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        ...
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
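Before submitting to YARN, the presence of the four jars can be verified with a small check. A sketch; check_flink_libs is a hypothetical helper, not part of Flink:

```shell
# check_flink_libs: report which of the required jars are present in a
# Flink lib directory (hypothetical helper, not part of Flink itself).
check_flink_libs() {
    dir=$1
    for jar in flink-hadoop-compatibility_2.11-1.7.1.jar \
               javax.ws.rs-api-2.0.1.jar \
               jersey-common-2.27.jar \
               jersey-core-1.9.jar; do
        if [ -f "$dir/$jar" ]; then
            echo "OK $jar"
        else
            echo "MISSING $jar"
        fi
    done
}

check_flink_libs "${FLINK_HOME:-/opt/core/flink}/lib"
```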

Start Flink Cluster

[wuhuan@hadoop001~]$ sh $FLINK_HOME/bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host master.
Starting taskexecutor daemon on host slave.
Starting taskexecutor daemon on host slave1.

[wuhuan@hadoop001~]$ jps
4153 StandaloneSessionClusterEntrypoint
3863 TaskManagerRunner
4207 Jps

[wuhuan@hadoop002~]$ jps
6109 TaskManagerRunner
7421 Jps
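The jps output above can also be checked mechanically. A small sketch that counts Flink daemons in jps-style output; a captured sample is used here so it runs anywhere, but on a node you would pipe real `jps` output through it:

```shell
# count_flink_daemons: count JobManager/TaskManager processes in jps output.
# Pipe real `jps` output through it on each node; a captured sample is used
# here so the snippet runs anywhere.
count_flink_daemons() {
    grep -cE 'StandaloneSessionClusterEntrypoint|TaskManagerRunner'
}

printf '4153 StandaloneSessionClusterEntrypoint\n3863 TaskManagerRunner\n4207 Jps\n' \
    | count_flink_daemons
```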

Add a JobManager or TaskManager instance to the running cluster

[wuhuan@hadoop001~]$ jobmanager.sh start
[wuhuan@hadoop001~]$ taskmanager.sh start

Note: the start-local.sh script was removed in Flink 1.6.

Flink cluster setup reference: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/cluster_setup.html

Flink Web UI

View the Flink web UI at: http://hadoop001:8082


View the Flink history server web UI at: http://hadoop001:18082



After a Flink job finishes, its history can be viewed under Completed Jobs:



Flink job jars can also be uploaded through the web UI:


Flink submission modes

Flink supports two submission modes; when none is specified, jobs are submitted in client (attached) mode.

To submit in cluster (detached) mode, pass -d or --detached on the submit command line:
-d,--detached    If present, runs the job in detached mode

Client (attached) submission:
    $FLINK_HOME/bin/flink run -c com.daxin.batch.App flinkwordcount.jar
    The client keeps an extra CliFrontend process alive, which acts as the driver process.

Detached submission:
    $FLINK_HOME/bin/flink run -d -c com.daxin.batch.App flinkwordcount.jar
    Once the job is submitted the client exits and no longer prints job progress.
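Because the client exits after a detached submit, later job control goes through the flink CLI (flink list, flink cancel). A dry-run sketch of the usual follow-up commands; DRY_RUN defaults to true so only the commands are printed, and JOB_ID is a placeholder to replace with a real job ID:

```shell
# After a detached submit, manage the job through the CLI. DRY_RUN defaults
# to true and only prints the commands; JOB_ID is a placeholder.
DRY_RUN=${DRY_RUN:-true}
JOB_ID=${JOB_ID:-'<jobid>'}

run() {
    if [ "$DRY_RUN" = true ]; then
        echo "$@"
    else
        "$@"
    fi
}

run "${FLINK_HOME:-/opt/core/flink}/bin/flink" list              # show running jobs
run "${FLINK_HOME:-/opt/core/flink}/bin/flink" cancel "$JOB_ID"  # cancel a job
```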

Flink on YARN

First make sure the Hadoop HDFS and YARN cluster services are running correctly, then verify that Flink on YARN works:

cd $FLINK_HOME

Submit a Flink job to YARN:

$ ./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar

Flink on YARN reference: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html