Flink installation and configuration

 

conf/flink-conf.yaml configuration:

# required: set to your own host
jobmanager.rpc.address: localhost

# enable these
state.checkpoints.dir: hdfs:///flink/flink-checkpoints
state.savepoints.dir: hdfs:///flink/flink-checkpoints
rest.port: 8081
rest.address: 0.0.0.0
jobmanager.archive.fs.dir: hdfs:///flink/completed-jobs/
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
historyserver.archive.fs.dir: hdfs:///flink/completed-jobs/

# enable others as needed
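
The directory settings above only tell Flink where to store checkpoints; checkpointing itself is usually enabled per job. Below is a minimal sketch for the Flink 1.10 Scala API that reuses the checkpoint path configured above (the 60-second interval and exactly-once mode are illustrative assumptions):

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointConfigSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // take a checkpoint every 60 seconds with exactly-once guarantees
    env.enableCheckpointing(60000, CheckpointingMode.EXACTLY_ONCE)
    // store checkpoints in the directory from flink-conf.yaml
    env.setStateBackend(new FsStateBackend("hdfs:///flink/flink-checkpoints"))
    // ... define sources, transformations and sinks, then call env.execute()
  }
}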

Hadoop environment variables (required; otherwise Flink fails with errors):

export HADOOP_HOME=/usr/hdp/2.6.1.0-129/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`

 

Flink Table API code

 

POM configuration:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <!-- parent POM omitted -->
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>CollectData</artifactId>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.10.0</version>
        </dependency>
    </dependencies>
 
</project>
Note that the Flink core dependencies above have their scope set to provided. This means they are needed to compile against, but they should not be packaged into the application JAR that the project produces: these are Flink core dependencies, which are already available in any Flink installation. It is strongly recommended to keep them in provided scope. If they are not set to provided, the best case is that the resulting JAR becomes excessively large, because it also contains all of the Flink core classes. The worst case is that the Flink core dependencies bundled into the application JAR conflict with some of your own dependency versions (which is normally avoided through Flink's inverted classloading).

A note on IntelliJ:

 

To run the application inside IntelliJ IDEA, the scope of these dependencies needs to be set to compile (or the scope lines commented out) instead of provided; otherwise IntelliJ does not put them on the classpath and execution fails with a NoClassDefFoundError. To avoid having to flip the scope back and forth, recent IDEA versions also let you tick "Include dependencies with 'Provided' scope" in the run configuration instead.
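
One more note before the code: the POM declares both flink-table-planner and flink-table-planner-blink, but in Flink 1.10 StreamTableEnvironment.create(env) picks the old planner by default, which is what the example below relies on. If you want the Blink planner instead, here is a minimal sketch of selecting it explicitly (the object name is illustrative):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.scala.StreamTableEnvironment

object BlinkPlannerSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // explicitly request the Blink planner in streaming mode
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val tableEnv = StreamTableEnvironment.create(env, settings)
    // tableEnv is then used the same way as in the example below
  }
}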

 

Full code:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{FileSystem, OldCsv, Schema}
/**
 * Author z
 * Date 2020-11-13 14:04:20
 */
object Myfirsttest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1) 
    val inputStream = env.fromCollection(List(
      Student(1, "张飞", 88),
      Student(1, "赵云", 880),
      Student(1, "王八", 18),
      Student(1, "刘备", 68)
    ))
    // create the table execution environment
    val tableEnv = StreamTableEnvironment.create(env)
    // convert the DataStream into a Table
    val dataTable = tableEnv.fromDataStream(inputStream)
    // SQL query
    val resultTable = tableEnv
      .sqlQuery("select id,name,age from " + dataTable)
    // convert the Table back into a DataStream
    val resultStream = resultTable
      .toAppendStream[(Double, String, Double)]
    // print the converted stream
    resultStream.print()

    tableEnv.connect(new FileSystem().path("/tmp/out.txt"))
      .withFormat(new OldCsv)
      .withSchema(
        new Schema()
          .field("id", DataTypes.DOUBLE())
          .field("name", DataTypes.STRING())
          .field("age", DataTypes.DOUBLE())
      ).createTemporaryTable("outtable")
    resultTable.insertInto("outtable") 
    env.execute()
  }
}
case class Student(id: Double, name: String, age: Double)
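
For comparison, the same projection can also be written with the Table API instead of a SQL string. A small self-contained sketch (the object name and sample rows are illustrative; it reuses the Student case class above):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._

object TableApiQuerySketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val input = env.fromCollection(List(Student(1, "张飞", 88), Student(2, "赵云", 95)))
    val tableEnv = StreamTableEnvironment.create(env)
    val dataTable = tableEnv.fromDataStream(input)
    // same as "select id,name,age from ..." expressed with the Table API,
    // plus a filter as an extra illustration
    val result = dataTable
      .select('id, 'name, 'age)
      .filter('age > 60)
    result.toAppendStream[(Double, String, Double)].print()
    env.execute("table api sketch")
  }
}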

Local run result (with provided commented out):

 

(screenshot: local run output)

 

The file is in the root directory of the project.

 

Switch the scope back to provided before packaging and deploying to the server.

 

(screenshot)

 

Run command:

bin/flink run \
-m yarn-cluster \
-c com.duoduo.Myfirsttest \
./CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar

 

Run result on the server:

 

(screenshot: job output on YARN)

 

(screenshot: job output on YARN)