1. Scala Installation


Download the Scala 2.10.4 tarball:
http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

## Run from /home/hadoop (the app/ directory must already exist)
tar -zxvf scala-2.10.4.tgz -C app/
cd app
ln -s scala-2.10.4 scala
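
To sanity-check the install (a quick check, assuming the /home/hadoop/app path used throughout these notes):

/home/hadoop/app/scala/bin/scala -version
# prints something like: Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL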




2. Spark Installation


## Again from /home/hadoop
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C app
cd app


ln -s spark-1.4.0-bin-hadoop2.6 spark




## Create spark-env.sh from the bundled template, then edit it
# cd /home/hadoop/app/spark/conf
# cp spark-env.sh.template spark-env.sh
# vim spark-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_76
export SCALA_HOME=/home/hadoop/app/scala
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0
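
Standalone-mode resources can optionally be tuned in the same file. The settings below are illustrative values, not requirements:

export SPARK_MASTER_IP=192.168.2.20     # bind the master to this cluster's master address
export SPARK_WORKER_CORES=1             # CPU cores each worker may use (illustrative)
export SPARK_WORKER_MEMORY=1g           # memory each worker may use (illustrative)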




## List of worker node hostnames or IP addresses, one per line
# cp slaves.template slaves
# vim slaves

192.168.2.20
192.168.2.33
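
Hostnames work here too, as long as every node can resolve them. A hypothetical /etc/hosts mapping (worker01 is a made-up name for illustration; mycluster is the master's hostname seen in the logs below):

192.168.2.20   mycluster
192.168.2.33   worker01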



## Activate the bundled logging configuration
# mv log4j.properties.template log4j.properties
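
Optionally, console noise from spark-shell can be reduced by lowering the root log level; the property name below matches Spark's bundled template:

# vim log4j.properties
log4j.rootCategory=WARN, console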






## Run on the master node


cd $SPARK_HOME/sbin    ## start-all.sh lives in sbin, not bin



./start-all.sh
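
After startup, verify the daemons on each node with jps (note that Hadoop also ships a start-all.sh, so invoking the script via its full sbin path avoids picking up the wrong one):

jps
# the master node should list a "Master" process; each worker node a "Worker" process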




3. Configure System Environment Variables


vim /etc/profile
export SCALA_HOME=/home/hadoop/app/scala
export SPARK_HOME=/home/hadoop/app/spark
## $HIVE_HOME and $HBASE_HOME are assumed to have been defined earlier in this profile
export PATH=$PATH:$HIVE_HOME/bin:$HBASE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin






source /etc/profile
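
A quick check that the new variables took effect:

echo $SPARK_HOME        # /home/hadoop/app/spark
which spark-shell       # should resolve to /home/hadoop/app/spark/bin/spark-shell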




4. Testing
## Standalone master web UI
http://192.168.2.20:8080/




## First change to the Spark home directory: cd $SPARK_HOME




(1) Local mode
# Run the spark-shell command



bin/spark-shell
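
The test below reads /home/hadoop/wc.txt, assumed to be a tab-delimited text file. A hypothetical sample can be created beforehand, e.g. from another terminal:

printf 'hello\tspark\nhello\tscala\n' > /home/hadoop/wc.txt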



# Test: a simple word count



sc.textFile("/home/hadoop/wc.txt").flatMap(line => line.split("\t")).map(word => (word, 1)).reduceByKey(_ + _).collect



# Verify in the application UI (available only while the shell session is running)



http://192.168.2.20:4040/





(2) YARN mode (HDFS and YARN must already be running)



cd $SPARK_HOME
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  lib/spark-examples*.jar 10
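
In yarn-cluster mode the driver runs inside the ApplicationMaster, so SparkPi's "Pi is roughly ..." result is not printed to the local console. Assuming YARN log aggregation is enabled, it can be pulled from the aggregated logs once the job finishes, using the application ID from the submission output (shown in the log below):

yarn logs -applicationId application_1440995865051_0005 | grep "Pi is roughly"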









Log output from the run. Note that the first attempt fails because of a typo ("-executor-cores" with a single dash instead of "--executor-cores"); the corrected command follows:



[hadoop@mycluster spark]$ bin/spark-submit  --class  org.apache.spark.examples.SparkPi \
> --master yarn-cluster \
> --num-executors 3 \
> --driver-memory 1g \
> --executor-memory 1g \
> -executor-cores 1 \
> lib/spark-examples*.jar  10
Error: Unrecognized option '-executor-cores'.
Run with --help for usage help or --verbose for debug output
[hadoop@mycluster spark]$ bin/spark-submit  --class  org.apache.spark.examples.SparkPi \
> --master yarn-cluster \
> --num-executors 3 \
> --driver-memory 1g \
> --executor-memory 1g \
> --executor-cores 1 \
> lib/spark-examples*.jar  10
15/08/30 22:53:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/30 22:53:29 INFO RMProxy:  Connecting to ResourceManager at mycluster/192.168.2.20:8032
15/08/30 22:53:29 INFO Client:  Requesting a new application from cluster with 1 NodeManagers
15/08/30 22:53:29 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/08/30 22:53:29 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/08/30 22:53:29 INFO Client: Setting up container launch context for our AM
15/08/30 22:53:29 INFO Client: Preparing resources for our AM container
15/08/30 22:53:30 INFO Client: Uploading resource file:/home/hadoop/app/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/spark-assembly-1.4.0-hadoop2.6.0.jar
15/08/30 22:53:33 INFO Client: Uploading resource file:/home/hadoop/app/spark-1.4.0-bin-hadoop2.6/lib/spark-examples-1.4.0-hadoop2.6.0.jar -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/spark-examples-1.4.0-hadoop2.6.0.jar
15/08/30 22:53:39 INFO Client: Uploading resource file:/tmp/spark-ecb5f2dc-f66b-42e6-a8ae-befce75074c0/__hadoop_conf__846873578807129658.zip -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/__hadoop_conf__846873578807129658.zip
15/08/30 22:53:40 INFO Client: Setting up the launch environment for our AM container
15/08/30 22:53:40 INFO SecurityManager: Changing view acls to: hadoop
15/08/30 22:53:40 INFO SecurityManager: Changing modify acls to: hadoop
15/08/30 22:53:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/08/30 22:53:40 INFO Client: Submitting application 5 to ResourceManager
15/08/30 22:53:40 INFO YarnClientImpl: Submitted application application_1440995865051_0005
15/08/30 22:53:41 INFO Client: Application report for application_1440995865051_0005 (state: ACCEPTED)
15/08/30 22:53:41 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1441000420286
         final status: UNDEFINED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:53:43 INFO Client: Application report for application_1440995865051_0005 (state: ACCEPTED)
         ... (repeated ACCEPTED status lines omitted) ...
15/08/30 22:54:03 INFO Client: Application report for application_1440995865051_0005 (state: ACCEPTED)
15/08/30 22:54:04 INFO Client: Application report for application_1440995865051_0005 (state: RUNNING)
15/08/30 22:54:04 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.2.20
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1441000420286
         final status: UNDEFINED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:54:05 INFO Client: Application report for application_1440995865051_0005 (state: RUNNING)
         ... (repeated RUNNING status lines omitted) ...
15/08/30 22:54:42 INFO Client: Application report for application_1440995865051_0005 (state: RUNNING)
15/08/30 22:54:43 INFO Client: Application report for application_1440995865051_0005 (state: FINISHED)
15/08/30 22:54:43 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.2.20
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1441000420286
         final status: SUCCEEDED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:54:43 INFO Utils: Shutdown hook called
15/08/30 22:54:43 INFO Utils: Deleting directory /tmp/spark-ecb5f2dc-f66b-42e6-a8ae-befce75074c0








Common problems:



Running spark-submit in YARN mode fails with the following error:



[hadoop@mycluster spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi   --master yarn-cluster   --master yarn-cluster 10
Exception in thread "main" java.lang.Exception: When running with master 'yarn-cluster' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:239)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:216)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:103)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/08/30 22:25:45 INFO Utils: Shutdown hook called






Solution: set HADOOP_CONF_DIR in spark-env.sh:



cd $SPARK_HOME/conf



vim spark-env.sh
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0/etc/hadoop
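
spark-env.sh is read on every launch, so after the change the earlier submit should get past argument validation. To confirm:

cd $SPARK_HOME
bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster lib/spark-examples*.jar 10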