1. Create a Maven project and configure the Java SDK and Scala SDK, as shown below:

[Figure 1: creating the Maven project and configuring the JDK and Scala SDK in IDEA]

[Figure 2: creating the Maven project and configuring the JDK and Scala SDK in IDEA]

JDK 1.8 and Scala 2.12 are used here.

 

2. Add the pom dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.leboop</groupId>
    <artifactId>com.leboop.www</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <!-- Scala binary version and Flink version shared by the dependencies below -->
        <scala.version>2.12</scala.version>
        <flink.version>1.9.3</flink.version>
    </properties>

    <dependencies>
        <!-- Flink Scala API (DataSet / batch) -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
        <!-- Flink streaming Scala API (DataStream) -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>

The Scala binary version and the Flink version used here are 2.12 and 1.9.3, respectively.

 

3. BatchWordCount

The batch WordCount program is as follows:

package wordcount

// The batch (DataSet) API only needs this wildcard import; it also brings in
// the implicit TypeInformation required by flatMap/map over Scala types.
import org.apache.flink.api.scala._

object BatchWordCount {

  def main(args: Array[String]): Unit = {
    // Batch execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // Read the input file as a DataSet of lines
    val text: DataSet[String] = env.readTextFile(filePath)
    // Split lines into words, emit (word, 1), group by the word (field 0) and sum the counts (field 1)
    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .groupBy(0)
      .sum(1)

    counts.print()
  }
}

The program reads the file word.txt and counts how often each word occurs. The contents of word.txt are as follows:

hello world
hello java
hello scala

The output is as follows:

(scala,1)
(world,1)
(hello,3)
(java,1)
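
If you just want to try the job without creating word.txt on disk, the same pipeline can read an in-memory collection via fromElements. A minimal sketch of that variant; the object name BatchWordCountInMemory and the sample lines are illustrative choices, not part of the original program:

package wordcount

import org.apache.flink.api.scala._

// Hypothetical variant: same word count, but with the input built in memory,
// so it can be run without creating word.txt first.
object BatchWordCountInMemory {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // In-memory input instead of readTextFile
    val text: DataSet[String] = env.fromElements("hello world", "hello java", "hello scala")

    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .groupBy(0)
      .sum(1)

    counts.print()
  }
}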

 

4. StreamingWordCount

Counting word frequencies with the streaming API; the code is as follows:

package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Streaming execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // Read the input file as a DataStream of lines
    val text: DataStream[String] = env.readTextFile(filePath)
    // Split lines into words, emit (word, 1), key by the word (field 0) and sum the counts (field 1)
    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    counts.print()
    // A streaming job only starts running when execute() is called
    env.execute("Streaming Count")
  }
}

The output is as follows (the number before ">" is the index of the parallel subtask that printed the record):

3> (hello,1)
1> (scala,1)
5> (world,1)
3> (hello,2)
3> (hello,3)
2> (java,1)
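
Because the prefix depends on the machine's default parallelism, the exact numbers will differ from run to run. If you prefer a single, prefix-free output stream while experimenting, parallelism can be forced to 1. A minimal sketch of that variant; the object name and job name are illustrative:

package wordcount

import org.apache.flink.streaming.api.scala._

// Hypothetical variant of StreamingWordCount that runs everything with parallelism 1,
// so the "N>" subtask prefix disappears and records arrive in a single stream.
object StreamingWordCountSingleTask {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1) // force a single parallel subtask for all operators

    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    val text: DataStream[String] = env.readTextFile(filePath)

    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    counts.print()
    env.execute("Streaming Count (parallelism 1)")
  }
}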

 

Word frequencies can also be counted from a socket by listening on a port; the code is as follows:

package wordcount

import org.apache.flink.streaming.api.scala._

/**
  * Created by leboop on 2020/5/19.
  */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Read a text stream from a socket instead of a file
    val text: DataStream[String] = env.socketTextStream("192.168.128.111", 6666)
    // Same word-count pipeline as before
    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    counts.print()
    env.execute("Streaming Count")
  }
}

Run the following command on the host 192.168.128.111 to open the port:

nc -lk 6666

Then start the program, which reads from the port, as shown below:

[Figure 3: running the socket WordCount job]
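
In practice, hard-coding the host and port is inconvenient. One common approach, sketched below, is to read them from the program arguments with Flink's ParameterTool; the object name, argument names, and default values here are assumptions, not part of the original program:

package wordcount

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala._

// Hypothetical variant: host and port come from program arguments,
// e.g. --host 192.168.128.111 --port 6666
object SocketWordCount {
  def main(args: Array[String]): Unit = {
    val params = ParameterTool.fromArgs(args)
    val host = params.get("host", "localhost") // default values are assumptions
    val port = params.getInt("port", 6666)

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val text: DataStream[String] = env.socketTextStream(host, port)
    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .keyBy(0)
      .sum(1)

    counts.print()
    env.execute("Socket Streaming Count")
  }
}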

 

5. Differences between batch and streaming

(1) Different execution environments

batch:

val env = ExecutionEnvironment.getExecutionEnvironment

streaming:

val env = StreamExecutionEnvironment.getExecutionEnvironment

(2) Different ways of starting the job

streaming:

env.execute("Streaming Count")

batch:

Simply running the program is enough; in the DataSet (batch) API, print() itself triggers job execution, so no explicit execute() call is needed (see the sketch below).
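
If the batch job writes to a sink other than print(), for example a file, it does need an explicit execute() call as well. A minimal sketch of such a variant; the output path and the object name BatchWordCountToFile are assumptions:

package wordcount

import org.apache.flink.api.scala._

// Hypothetical variant of BatchWordCount that writes the result to a file instead of printing it.
// Unlike print(), a file sink does not trigger execution, so execute() must be called explicitly.
object BatchWordCountToFile {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val text = env.readTextFile("G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt")
    val counts = text.flatMap(_.toLowerCase.split("\\W+"))
      .map((_, 1))
      .groupBy(0)
      .sum(1)

    // Output path is an assumption; adjust it to your environment
    counts.writeAsCsv("G:\\idea_workspace\\myflink\\out\\wordcount.csv")
    env.execute("Batch Word Count To File")
  }
}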