conf/flink-conf.yaml configuration:
# Required
jobmanager.rpc.address: localhost  # your own host address
# Enable the following settings
state.checkpoints.dir: hdfs:///flink/flink-checkpoints
state.savepoints.dir: hdfs:///flink/flink-savepoints
rest.port: 8081
rest.address: 0.0.0.0
jobmanager.archive.fs.dir: hdfs:///flink/completed-jobs/
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
historyserver.archive.fs.dir: hdfs:///flink/completed-jobs/
# Enable others as needed
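Note that state.checkpoints.dir only sets the default location for checkpoint data; checkpointing itself still has to be switched on per job. A minimal Scala sketch (the 10-second interval and the FsStateBackend choice are assumptions, not part of the original setup):

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// take an exactly-once checkpoint every 10 seconds (interval is an assumption)
env.enableCheckpointing(10000, CheckpointingMode.EXACTLY_ONCE)
// keep checkpoint state on HDFS, matching state.checkpoints.dir above
env.setStateBackend(new FsStateBackend("hdfs:///flink/flink-checkpoints"))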
Hadoop environment variables (required; without them Flink cannot find the HDFS/YARN classes and fails at startup):
export HADOOP_HOME=/usr/hdp/2.6.1.0-129/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
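With these variables exported, the cluster and the history server configured above can be started with the stock scripts shipped in the Flink distribution (run from the Flink installation root; start-cluster.sh only applies to standalone setups):

bin/start-cluster.sh
bin/historyserver.sh start  # serves archived jobs on port 8082, per the config above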
Flink Table API code
POM configuration:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- coordinates match the jar name used in the run command below -->
    <groupId>com.duoduo</groupId>
    <artifactId>CollectData</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.10.0</version>
        </dependency>
    </dependencies>
</project>
Note that the Flink dependencies above have their scope set to provided. This means they are needed to compile against, but should not be packaged into the application JAR that the project produces: they are Flink core dependencies, which are already present in any Flink installation. Keeping them in the provided scope is strongly recommended. If they are not set to provided, the best case is that the resulting JAR becomes bloated, because it then also contains all of the Flink core classes. The worst case is that the Flink core dependencies packaged into the application JAR conflict with some of your own dependency versions (something normally avoided through Flink's inverted classloading).

A note about IntelliJ: to run the application inside IntelliJ IDEA, the scope must be set to compile (or the scope lines commented out) rather than provided. Otherwise IntelliJ does not put these dependencies on the classpath, and execution fails with a NoClassDefFoundError. To avoid having to switch the scope to compile, you can instead tick "Include dependencies with 'Provided' scope" in the IntelliJ run configuration.
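One more POM note: the run command at the end of this post submits CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar, so the project also needs a build section that compiles the Scala sources and produces a fat JAR. The original build section is not shown; below is a minimal sketch using scala-maven-plugin and maven-assembly-plugin (the plugin versions are assumptions):

<build>
    <plugins>
        <!-- compile the Scala sources -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- build the *-jar-with-dependencies.jar used by the run command -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                    <configuration>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>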
Full code:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{FileSystem, OldCsv, Schema}

/**
 * Author z
 * Date 2020-11-13 14:04:20
 */
object Myfirsttest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val inputStream = env.fromCollection(List(
      Student(1, "张飞", 88),
      Student(1, "赵云", 880),
      Student(1, "王八", 18),
      Student(1, "刘备", 68)
    ))
    // create the table execution environment
    val tableEnv = StreamTableEnvironment.create(env)
    // convert the data stream into a table
    val dataTable = tableEnv.fromDataStream(inputStream)
    // SQL query
    val resultTable = tableEnv
      .sqlQuery("select id, name, age from " + dataTable)
    // convert the Table back into an append stream
    val resultStream = resultTable
      .toAppendStream[(Double, String, Double)]
    // print the append stream so the conversion above has a consumer
    resultStream.print()
    // output: register a CSV file sink and write the result into it
    tableEnv.connect(new FileSystem().path("/tmp/out.txt"))
      .withFormat(new OldCsv)
      .withSchema(
        new Schema()
          .field("id", DataTypes.DOUBLE())
          .field("name", DataTypes.STRING())
          .field("age", DataTypes.DOUBLE())
      ).createTemporaryTable("outtable")
    resultTable.insertInto("outtable")
    env.execute()
  }
}

case class Student(id: Double, name: String, age: Double)
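A side note on planners: the POM above pulls in both flink-table-planner and flink-table-planner-blink. In Flink 1.10, StreamTableEnvironment.create(env) defaults to the legacy planner; to opt into the Blink planner explicitly, a minimal sketch (variable names are illustrative):

import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.scala.StreamTableEnvironment

// build settings that select the Blink planner in streaming mode
val settings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inStreamingMode()
  .build()
val blinkTableEnv = StreamTableEnvironment.create(env, settings)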
Local run result (with the provided scopes commented out):
The output file is written to the project's root directory.
Packaging and running on the server (with the provided scopes enabled):
Run command:
bin/flink run \
  -m yarn-cluster \
  -c com.duoduo.Myfirsttest \
  ./CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar
Server run result:
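Finally, because state.savepoints.dir is configured above, a long-running job deployed this way can also be savepointed and later resumed. A sketch with placeholders (<jobId> and <savepointPath> are illustrative, not output from this job; depending on the deployment you may also need to point the CLI at the YARN application with -yid):

# trigger a savepoint for a running job
bin/flink savepoint <jobId> hdfs:///flink/flink-savepoints
# resume from the path printed by the command above
bin/flink run -s <savepointPath> -m yarn-cluster -c com.duoduo.Myfirsttest \
  ./CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar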