1. Create a new Maven project in IDEA.
Give it a name of your choice.
2. Edit the pom file: the properties block with the Spark, Scala, and Hadoop version numbers, the dependencies, and the plugins.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>sparktest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.4.5-hw-ei-302002</spark.version>
        <hadoop.version>3.1.1-hw-ei-302002</hadoop.version>
        <!-- a fourth version property, 2.2.3-hw-ei-302002, appeared here in the
             original; its property name was lost when the XML tags were stripped -->
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <configuration>
                    <recompileMode>modified-only</recompileMode>
                </configuration>
                <executions>
                    <execution>
                        <id>main-scalac</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
        <directory>target</directory>
        <outputDirectory>target/classes</outputDirectory>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
    </build>
</project>
Create a directory named scala. Note that its color differs from the java folder: it is not yet a source root, so classes cannot be added to it.
3. Click reimport (the small circular icon) so that the scala directory becomes a source root.
4. Create a Scala class and write the program.
Create an object and give it a name of your choice.
5. Once the program is written, double-click package in the Maven panel to build the jar.
package cn.study.spark.scala

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the Spark configuration and set the application name
    val conf = new SparkConf().setAppName("ScalaWordCount")
    // Create the Spark execution entry point
    val sc = new SparkContext(conf)
    // Read the input file into an RDD (resilient distributed dataset)
    // One-liner equivalent:
    // sc.textFile(args(0)).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false).saveAsTextFile(args(1))
    val lines: RDD[String] = sc.textFile(args(0))
    // Split each line into words and flatten
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Pair each word with a 1
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))
    // Aggregate the counts by key
    val reduced: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    // Sort by count, descending
    val sorted: RDD[(String, Int)] = reduced.sortBy(_._2, false)
    // Save the result to HDFS
    sorted.saveAsTextFile(args(1))
    // Release resources
    sc.stop()
  }
}
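The RDD pipeline above (flatMap, map, reduceByKey, sortBy) can be checked without a cluster by running the same transformations on plain Scala collections, where reduceByKey corresponds to a groupBy plus a sum. This is only an illustrative sketch; LocalWordCount and its sample input are hypothetical names, not part of the original project:

```scala
object LocalWordCount {
  // Same word-count logic as the Spark job, on an in-memory Seq
  def count(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))                          // split each line and flatten
      .map((_, 1))                                    // pair each word with a 1
      .groupBy(_._1)                                  // group pairs by word (stands in for reduceByKey)
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the 1s per word
      .toSeq
      .sortBy(-_._2)                                  // descending by count, like sortBy(_._2, false)

  def main(args: Array[String]): Unit =
    println(count(Seq("a b a", "b a")))
}
```

Running it prints the same (word, count) pairs, highest count first, that the Spark job would write to its output files.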
6. Upload the jar to a cluster node, and upload the file containing the words to HDFS.
7. Run the program:
spark-submit --master yarn --deploy-mode client --class cn.study.spark.scala.WordCount sparktest-1.0-SNAPSHOT.jar /tmp/test/t1.txt /tmp/test/t2
8. Check the results.