conf/flink-conf.yaml configuration:
# Required
jobmanager.rpc.address: localhost  # your own host address
# Enable the following settings
state.checkpoints.dir: hdfs:///flink/flink-checkpoints
state.savepoints.dir: hdfs:///flink/flink-savepoints
rest.port: 8081
rest.address: 0.0.0.0
jobmanager.archive.fs.dir: hdfs:///flink/completed-jobs/
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
historyserver.archive.fs.dir: hdfs:///flink/completed-jobs/
# Enable others as needed
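Note that state.checkpoints.dir only sets the default location for checkpoint data; checkpointing itself still has to be switched on per job. A minimal Scala sketch (the 10-second interval and the FsStateBackend choice are assumptions, not part of the original setup):

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// take an exactly-once checkpoint every 10 seconds (interval is an assumption)
env.enableCheckpointing(10000, CheckpointingMode.EXACTLY_ONCE)
// keep checkpoint state on HDFS, matching state.checkpoints.dir above
env.setStateBackend(new FsStateBackend("hdfs:///flink/flink-checkpoints"))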
Hadoop environment variables (required; without them Flink cannot find the HDFS/YARN classes and fails at startup):
export HADOOP_HOME=/usr/hdp/2.6.1.0-129/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`
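With these variables exported, the cluster and the history server configured above can be started with the stock scripts shipped in the Flink distribution (run from the Flink installation root; start-cluster.sh only applies to standalone setups):

bin/start-cluster.sh
bin/historyserver.sh start  # serves archived jobs on port 8082, per the config above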
Flink Table API code
POM configuration:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <!-- coordinates match the jar name used in the run command below -->
    <groupId>com.duoduo</groupId>
    <artifactId>CollectData</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-filesystem_2.11</artifactId>
            <version>1.10.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.10.0</version>
        </dependency>
    </dependencies>
</project>
Note that the Flink dependencies above have their scope set to provided. This means they are needed to compile against, but should not be packaged into the application JAR that the project produces: they are Flink core dependencies, which are already present in any Flink installation. Keeping them in the provided scope is strongly recommended. If they are not set to provided, the best case is that the resulting JAR becomes bloated, because it then also contains all of the Flink core classes. The worst case is that the Flink core dependencies packaged into the application JAR conflict with some of your own dependency versions (something normally avoided through Flink's inverted classloading).

A note about IntelliJ: to run the application inside IntelliJ IDEA, the scope must be set to compile (or the scope lines commented out) rather than provided. Otherwise IntelliJ does not put these dependencies on the classpath, and execution fails with a NoClassDefFoundError. To avoid having to switch the scope to compile, you can instead tick "Include dependencies with 'Provided' scope" in the IntelliJ run configuration.
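One more POM note: the run command at the end of this post submits CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar, so the project also needs a build section that compiles the Scala sources and produces a fat JAR. The original build section is not shown; below is a minimal sketch using scala-maven-plugin and maven-assembly-plugin (the plugin versions are assumptions):

<build>
    <plugins>
        <!-- compile the Scala sources -->
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <!-- build the *-jar-with-dependencies.jar used by the run command -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                    <configuration>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>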
Full code:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{FileSystem, OldCsv, Schema}

/**
 * Author z
 * Date 2020-11-13 14:04:20
 */
object Myfirsttest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val inputStream = env.fromCollection(List(
      Student(1, "张飞", 88),
      Student(1, "赵云", 880),
      Student(1, "王八", 18),
      Student(1, "刘备", 68)
    ))
    // create the table execution environment
    val tableEnv = StreamTableEnvironment.create(env)
    // convert the data stream into a table
    val dataTable = tableEnv.fromDataStream(inputStream)
    // SQL query
    val resultTable = tableEnv
      .sqlQuery("select id, name, age from " + dataTable)
    // convert the Table back into an append stream
    val resultStream = resultTable
      .toAppendStream[(Double, String, Double)]
    // print the append stream so the conversion above has a consumer
    resultStream.print()
    // output: register a CSV file sink and write the result into it
    tableEnv.connect(new FileSystem().path("/tmp/out.txt"))
      .withFormat(new OldCsv)
      .withSchema(
        new Schema()
          .field("id", DataTypes.DOUBLE())
          .field("name", DataTypes.STRING())
          .field("age", DataTypes.DOUBLE())
      ).createTemporaryTable("outtable")
    resultTable.insertInto("outtable")
    env.execute()
  }
}

case class Student(id: Double, name: String, age: Double)
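A side note on planners: the POM above pulls in both flink-table-planner and flink-table-planner-blink. In Flink 1.10, StreamTableEnvironment.create(env) defaults to the legacy planner; to opt into the Blink planner explicitly, a minimal sketch (variable names are illustrative):

import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.scala.StreamTableEnvironment

// build settings that select the Blink planner in streaming mode
val settings = EnvironmentSettings.newInstance()
  .useBlinkPlanner()
  .inStreamingMode()
  .build()
val blinkTableEnv = StreamTableEnvironment.create(env, settings)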
Local run result (with the provided scopes commented out):
The output file is written to the project's root directory.
Packaging and running on the server (with the provided scopes enabled):
Run command:
bin/flink run \
  -m yarn-cluster \
  -c com.duoduo.Myfirsttest \
  ./CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar
Server run result:
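Finally, because state.savepoints.dir is configured above, a long-running job deployed this way can also be savepointed and later resumed. A sketch with placeholders (<jobId> and <savepointPath> are illustrative, not output from this job; depending on the deployment you may also need to point the CLI at the YARN application with -yid):

# trigger a savepoint for a running job
bin/flink savepoint <jobId> hdfs:///flink/flink-savepoints
# resume from the path printed by the command above
bin/flink run -s <savepointPath> -m yarn-cluster -c com.duoduo.Myfirsttest \
  ./CollectData-1.0-SNAPSHOT-jar-with-dependencies.jar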