Parsing aa.txt with Spark

1. aa.txt

[Screenshot: the contents of aa.txt]
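
The original screenshot of the file is not reproduced here. For the rest of this walkthrough, assume aa.txt holds a few lines of space-separated words, for example (hypothetical contents):

spark hello spark
hadoop spark
hello world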


2. pom.xml

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
    <!-- Spark's bundled netty-all is excluded and a single version is pinned
         below, avoiding a Netty version conflict with hadoop-client -->
    <exclusions>
      <exclusion>
        <groupId>io.netty</groupId>
        <artifactId>netty-all</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.17.Final</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.2</version>
  </dependency>
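
The original post shows only the dependencies. If the project is also built with Maven from the command line (rather than compiled only by the IDE), the Scala sources need the scala-maven-plugin; a minimal sketch, with an assumed plugin version:

  <build>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <!-- version is an assumption; any recent release should work -->
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>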

3. Add the Scala SDK

[Screenshots: adding the Scala SDK to the project in the IDE]

4. WordCount.scala

[Screenshots: creating the WordCount.scala source file]

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {

    // 1. Create the SparkConf, set the application name, and run in local mode
    val conf: SparkConf = new SparkConf().setAppName("WordCount").setMaster("local[*]")

    // 2. Create the SparkContext
    val sc = new SparkContext(conf)

    // 3. Build an RDD from the input file and run the transformations and the action
    val lineRDD = sc.textFile(args(0))
    val wordCount: Array[(String, Int)] = lineRDD
      .flatMap(_.split(" "))   // split each line into words
      .map((_, 1))             // pair each word with a count of 1
      .reduceByKey(_ + _)      // sum the counts per word
      .collect()               // bring the result back to the driver
    wordCount.foreach(println)

    // 4. Shut down the SparkContext
    sc.stop()
  }
}
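
For intuition, this is how the hypothetical aa.txt from step 1 moves through the pipeline:

// textFile:     "spark hello spark", "hadoop spark", "hello world"
// flatMap:      "spark", "hello", "spark", "hadoop", "spark", "hello", "world"
// map:          ("spark",1), ("hello",1), ("spark",1), ("hadoop",1), ...
// reduceByKey:  ("spark",3), ("hello",2), ("hadoop",1), ("world",1)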

5. Run/debug configuration
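
The screenshots below show the IDE run configuration. The essential setting is the program argument: the code reads its input path from args(0), so the path to aa.txt must be passed there. Equivalently, a packaged jar can be launched from the command line (a sketch; the jar name and input path are hypothetical):

spark-submit --class WordCount target/wordcount-1.0.jar /path/to/aa.txt

Since the code hard-codes setMaster("local[*]"), no --master flag is needed here.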

[Screenshots: IDE run/debug configuration]

6. Output

[Screenshot: word-count console output]
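
The screenshot is not reproduced; with the hypothetical aa.txt from step 1, the printed tuples would look like this (the order of the pairs is not significant):

(spark,3)
(hello,2)
(hadoop,1)
(world,1)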

Reference: https://cloud.tencent.com/developer/article/1733758