1. Create a Maven project and configure the Java SDK and Scala SDK, as shown in the figure:
Here JDK 1.8 and Scala 2.12 are used.
2. Add the pom dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.leboop</groupId>
    <artifactId>com.leboop.www</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.12</scala.version>
        <flink.version>1.9.3</flink.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_${scala.version}</artifactId>
            <version>${flink.version}</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
</project>
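The Scala and Flink versions are 2.12 and 1.9.3, respectively. Note that scala.version here holds the Scala binary version, which is appended to the Flink artifact IDs (for example flink-scala_2.12).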
3. BatchWordCount
The batch WordCount program is as follows:
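package wordcount

// org.apache.flink.api.scala._ brings in DataSet, ExecutionEnvironment
// and the implicit TypeInformation that flatMap/map need
import org.apache.flink.api.scala._

object BatchWordCount {
  def main(args: Array[String]): Unit = {
    // batch (DataSet API) execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment
    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // get input data
    val text: DataSet[String] = env.readTextFile(filePath)
    // split lines into lower-case words, drop empty tokens, count each word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .groupBy(0)
      .sum(1)
    // print() on a DataSet triggers execution of the batch job
    counts.print()
  }
}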
The program reads the file word.txt and counts how often each word occurs. The contents of word.txt are:
hello world
hello java
hello scala
The output is:
(scala,1)
(world,1)
(hello,3)
(java,1)
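To see what the Flink operators compute without starting a Flink job, the sketch below reproduces the same pipeline with ordinary Scala collections. It is an illustration only, not Flink code: the object name LocalWordCount is hypothetical, and collection groupBy(_._1) plus a sum over each group stands in for Flink's groupBy(0).sum(1). It also shows why empty tokens are filtered: splitting a line that starts with a non-word character on \W+ yields a leading empty string.

```scala
// Hypothetical plain-Scala model of BatchWordCount (no Flink dependency).
object LocalWordCount {
  // Same tokenization as the Flink job: lower-case, split on runs of
  // non-word characters, and drop the empty token a leading delimiter produces.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // flatMap -> map((_, 1)) -> groupBy(0) -> sum(1), expressed with collections.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "hello java", "hello scala")
    // Map iteration order is unspecified, just like Flink's output order.
    wordCount(lines).foreach(println)
  }
}
```

As with the Flink job, the totals are what matter; the order in which the pairs are printed is not guaranteed.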
4. StreamingWordCount
Counting word frequencies with stream processing:
package wordcount

import org.apache.flink.streaming.api.scala._

/**
 * Created by leboop on 2020/5/19.
 */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // streaming (DataStream API) execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val filePath = "G:\\idea_workspace\\myflink\\src\\main\\resources\\word.txt"
    // get input data
    val text: DataStream[String] = env.readTextFile(filePath)
    // split into lower-case words, drop empty tokens, keep a running count per word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    // print() only registers a sink; execute() actually starts the job
    counts.print()
    env.execute("Streaming Count")
  }
}
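The output is shown below. The number before > is the index of the parallel sink subtask that printed the record, so it depends on the parallelism of your machine. Because sum(1) on a keyed stream emits an updated total for every incoming record, hello appears three times, with running counts 1, 2 and 3: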
3> (hello,1)
1> (scala,1)
5> (world,1)
3> (hello,2)
3> (hello,3)
2> (java,1)
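The repeated hello records come from how a keyed sum behaves on a stream: a running total is kept per key and the new total is emitted for every record that arrives. The plain-Scala sketch below models that behavior for a single parallel instance. RunningKeyedSum is a hypothetical name, and the sketch deliberately ignores the interleaving across subtasks that the N> prefixes reveal in the real output.

```scala
import scala.collection.mutable

// Hypothetical model of keyBy(0).sum(1) on a DataStream (no Flink dependency).
object RunningKeyedSum {
  // For each incoming word, update that word's running total and emit it.
  def runningCounts(words: Seq[String]): Seq[(String, Int)] = {
    val state = mutable.Map.empty[String, Int] // per-key running totals
    words.map { w =>
      val updated = state.getOrElse(w, 0) + 1
      state(w) = updated
      (w, updated)
    }
  }

  def main(args: Array[String]): Unit = {
    // Words of word.txt in reading order.
    val words = Seq("hello", "world", "hello", "java", "hello", "scala")
    runningCounts(words).foreach(println)
  }
}
```

Unlike the batch version, nothing marks the "final" result: the last record emitted for a key is simply its latest total.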
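To count words read from a socket port instead of a file, change the source as follows (if you keep both programs in the project, give this object a different name):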
package wordcount

import org.apache.flink.streaming.api.scala._

/**
 * Created by leboop on 2020/5/19.
 */
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // get input data: connect to the socket opened by nc on 192.168.128.111:6666
    val text: DataStream[String] = env.socketTextStream("192.168.128.111", 6666)
    // split into lower-case words, drop empty tokens, keep a running count per word
    val counts = text.flatMap(_.toLowerCase().split("\\W+"))
      .filter(_.nonEmpty)
      .map((_, 1))
      .keyBy(0)
      .sum(1)
    counts.print()
    // the job keeps running until it is cancelled or the socket closes
    env.execute("Streaming Count")
  }
}
On host 192.168.128.111, run the following command to listen on port 6666:
nc -lk 6666
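Then start the program. It connects to that port, and every line typed into the nc session is counted, as shown in the figure: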
5. Differences between batch and stream
(1) Different execution environments
batch:
val env = ExecutionEnvironment.getExecutionEnvironment
stream:
val env = StreamExecutionEnvironment.getExecutionEnvironment
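(2) Different ways of starting the job
streaming:
env.execute("Streaming Count")
batch:
Just run the program. On a DataSet, print() triggers execution of the job by itself, whereas on a DataStream print() only adds a sink, so env.execute() must be called to start the job.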