1. Create a new Maven project in IDEA.


Give the project a name of your choice.


2. Edit the pom file. It contains the Spark, Scala, and Hadoop version properties, the dependencies, and the build plugins.


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>sparktest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.4.5-hw-ei-302002</spark.version>
        <hadoop.version>3.1.1-hw-ei-302002</hadoop.version>
        <!-- assumed property name; this version is not referenced below -->
        <hbase.version>2.2.3-hw-ei-302002</hbase.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <configuration>
                    <recompileMode>modified-only</recompileMode>
                </configuration>
                <executions>
                    <execution>
                        <id>main-scalac</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
        <directory>target</directory>
        <outputDirectory>target/classes</outputDirectory>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
    </build>
</project>
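Note that the spark-core_2.11 artifact must line up with the Scala 2.11.x library pinned in scala.version. The -hw-ei-302002 suffixes look like vendor (Huawei FusionInsight) builds, which resolve only if the matching vendor Maven repository is configured; on a stock Apache cluster, swap in plain Apache version numbers instead.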

 

Create a directory named scala. Its color differs from the java folder's: it is not yet marked as a sources root, so you cannot add classes to it.


3. Click reimport (the small circle icon in the Maven panel); the directory then becomes a sources root. (Alternatively, right-click the directory and choose Mark Directory as > Sources Root.)


4. Create a Scala class and write the program.


Create an object and give it a name of your choice.


5. Once the program is written, double-click package in the Maven panel to build the jar.
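If you prefer the command line to the IDEA Maven panel, the equivalent is the standard Maven invocation, run from the project root:

mvn clean package

The jar lands in the target directory as sparktest-1.0-SNAPSHOT.jar, matching the artifactId and version in the pom.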


package cn.study.spark.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and set the application name
    val conf = new SparkConf().setAppName("ScalaWordCount")
    // Create the entry point for Spark execution
    val sc = new SparkContext(conf)
    // The whole job can also be written as a one-liner:
    // sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false).saveAsTextFile(args(1))

    // Read the input file into an RDD (Resilient Distributed Dataset)
    val lines: RDD[String] = sc.textFile(args(0))
    // Split each line into words and flatten
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Pair each word with the count 1
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))
    // Aggregate the counts by key
    val reduced: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // Sort by count in descending order
    val sorted: RDD[(String, Int)] = reduced.sortBy(_._2, false)
    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}
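For a quick smoke test before packaging, the same pipeline can be run in local mode. The sketch below is an assumption for local testing only: the object name WordCountLocal, the local[*] master, and the input file words.txt are hypothetical and not part of the cluster job.

package cn.study.spark.scala

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process with one worker thread per core (testing only)
    val conf = new SparkConf().setAppName("ScalaWordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.textFile("words.txt")              // hypothetical local input file
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()                          // bring the results to the driver
      .foreach(println)                   // print instead of writing to HDFS
    sc.stop()
  }
}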

6. Upload the jar to a cluster node, and upload the text file containing the words to HDFS.
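A sketch of the HDFS upload, assuming the local input file is named t1.txt and using the /tmp/test paths that the spark-submit command in step 7 expects:

hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put t1.txt /tmp/test/t1.txt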



7. Run the program. The two trailing arguments are the HDFS input file (args(0)) and the output directory (args(1)); the output directory must not already exist, or saveAsTextFile will fail.

spark-submit --master yarn --deploy-mode client --class cn.study.spark.scala.WordCount sparktest-1.0-SNAPSHOT.jar /tmp/test/t1.txt /tmp/test/t2

8. Check the results.

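To inspect the output from the shell, assuming Spark's default part-file naming in the output directory:

hdfs dfs -ls /tmp/test/t2
hdfs dfs -cat /tmp/test/t2/part-00000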