I. Install hadoop-2.5.0-cdh5.3.6
----------------------------------------------
1. Download the installation package from http://archive.cloudera.com/cdh5/cdh/5/
2. Extract the hadoop package: tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz
3. Move the hadoop directory into /soft and create a symlink:
mv /package/hadoop-2.5.0-cdh5.3.6 /soft
ln -s /soft/hadoop-2.5.0-cdh5.3.6/ /soft/hadoop
4. Configure the hadoop environment variables
nano ~/.bashrc
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bashrc
5. Create the /data/hadoop directory: mkdir -p /data/hadoop
6. Edit core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://s101:9000</value>
</property>
7. Edit hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/datanode</value>
</property>
<property>
<name>dfs.tmp.dir</name>
<value>/data/hadoop/tmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>s105:9000</value>
</property>
8. Edit mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
9. Edit yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s101</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
10. Configure the slaves file
s102
s103
s104
II. Write shell scripts
---------------------------------------------
[xcall.sh]
#!/bin/bash
# run the given command on every node (s101-s105)
params="$@"
for (( i = 101; i <= 105; i++ )); do
    tput setaf 2
    echo ============= s$i =============
    tput setaf 7
    ssh -4 s$i "source /etc/profile ; $params"
done
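Usage example (assumes passwordless ssh from s101 to s101-s105 is already set up):
xcall.sh "jps"
xcall.sh "mkdir -p /data/hadoop"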
[xcopy.sh]
#!/bin/bash
#################
# x copy: copy a file or directory to s102-s105, keeping the same path
#################
# fewer than 1 argument: nothing to copy
if [ $# -lt 1 ]
then
    echo no args!
    exit
fi
# get first argument: the file or directory to copy
arg1=$1
# get current user name
cuser=`whoami`
# get file name
fname=`basename "$arg1"`
# get directory part
dir=`dirname "$arg1"`
# dir = . or dir = /xxx/xx ; resolve to an absolute path
if [ "$dir" = "." ]
then
    dir=`pwd`
fi
for (( i = 102; i <= 105; i++ ))
do
    echo ---- copying $arg1 to s$i ----
    if [ -d "$arg1" ]
    then
        scp -r "$arg1" $cuser@s$i:"$dir"
    else
        scp "$arg1" $cuser@s$i:"$dir"
    fi
    echo
done
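Usage example (assumes the parent directory already exists on s102-s105):
xcopy.sh /soft/hadoop-2.5.0-cdh5.3.6
xcopy.sh ~/.bashrc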
[xrm.sh]
#!/bin/bash
#################
# x rm: remove a file or directory locally and on s102-s105
#################
# fewer than 1 argument: nothing to remove
if [ $# -lt 1 ]
then
    echo no args!
    exit
fi
# get first argument: the file or directory to remove
arg1=$1
# get current user name
cuser=`whoami`
# get file name
fname=`basename "$arg1"`
# get directory part
dir=`dirname "$arg1"`
# dir = . or dir = /xxx/xx ; resolve to an absolute path
if [ "$dir" = "." ]
then
    dir=`pwd`
fi
echo ---- removing $arg1 from localhost ----
rm -rf "$arg1"
echo
for (( i = 102; i <= 105; i++ ))
do
    echo ---- removing $arg1 from s$i ----
    ssh s$i rm -rf "$dir/$fname"
    echo
done
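Usage example (the path is only an illustration):
xrm.sh /tmp/test.log
This removes /tmp/test.log on the local machine and the same path on s102-s105.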
III. Copy hadoop to the other machines
------------------------------------------------
1. Copy the hadoop directory with xcopy.sh
2. Copy the environment variable file (~/.bashrc)
3. Create the /data/hadoop directory on the other machines:
xcall.sh "mkdir -p /data/hadoop"
4. Start the hadoop cluster (a startup-and-check sketch follows this list)
a. Format the namenode: on s101, run hdfs namenode -format
b. Start the hdfs cluster: start-dfs.sh
c. Verify startup: jps on each node, and the NameNode web UI on port 50070
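Since yarn-site.xml above also configures a ResourceManager, and the Spark steps in section X submit jobs to YARN, a minimal startup-and-check sketch (assuming the scripts and hostnames from the configuration above) might look like:
start-dfs.sh                  # NameNode on s101, DataNodes on s102-s104, SecondaryNameNode on s105
start-yarn.sh                 # ResourceManager and NodeManagers, needed later for Spark on YARN
xcall.sh "jps"                # every node should list its expected daemons
hdfs dfsadmin -report         # all three DataNodes should be registered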
IV. Install hive-0.13.1-cdh5.3.6
--------------------------------------------
1. Upload hive-0.13.1-cdh5.3.6.tar.gz (provided with the course) to s101 with WinSCP
2. Extract the hive package: tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz
3. Rename the hive directory: mv hive-0.13.1-cdh5.3.6 hive
4. Configure the hive environment variables
nano /etc/profile
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
5. Edit the configuration files
[Configure hive-site.xml]
mv hive-default.xml.template hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://s101:3306/hive_metadata?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
[Configure hive-env.sh and hive-config.sh]
mv hive-env.sh.template hive-env.sh
nano /soft/hive/bin/hive-config.sh
export JAVA_HOME=/soft/jdk
export HIVE_HOME=/soft/hive
export HADOOP_HOME=/soft/hadoop
6. Verify that hive is installed correctly (a smoke test sketch follows)
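A minimal smoke test, assuming the MySQL metastore from section V is already set up (hive_test is just an example table name):
hive -e "show databases;"
hive -e "create table hive_test(id int); show tables; drop table hive_test;"
If both commands succeed, the metastore connection and HDFS access are working.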
V. Install mysql on centos-1 with yum
-------------------------------------------------
1. Set up the mysql yum repository on centos-1.
CentOS 7 does not ship a mysql yum repository, so download the repo rpm first:
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
Then install the downloaded rpm package:
rpm -ivh mysql-community-release-el7-5.noarch.rpm
2. With the repository in place, install mysql server with yum, start it, and enable it at boot.
yum install -y mysql-server
service mysqld start
chkconfig mysqld on
3. Install the mysql connector with yum
yum install -y mysql-connector-java
4. Copy the mysql connector jar into hive's lib directory
cp /usr/share/java/mysql-connector-java-5.1.17.jar /soft/hive/lib
5. In mysql, create the hive metastore database and the hive account, and grant privileges
create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'s101' identified by 'hive';
flush privileges;
use hive_metadata;
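A quick way to confirm the grants before starting Hive (user/password hive/hive as created above):
mysql -h s101 -u hive -phive -e "show databases;"     # hive_metadata should appear in the list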
VI. Set up ZooKeeper (zk)
--------------------------------------
1. Install zk
a. Copy zookeeper-3.4.5-cdh5.3.6.tar.gz (provided with the course) to the /usr/local directory on sparkproject1 with WinSCP.
b. Extract zookeeper-3.4.5-cdh5.3.6.tar.gz: tar -zxvf zookeeper-3.4.5-cdh5.3.6.tar.gz
c. Rename the zookeeper directory: mv zookeeper-3.4.5-cdh5.3.6 zk
d. Configure the zookeeper environment variables
vi ~/.bashrc
export ZOOKEEPER_HOME=/soft/zk
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source ~/.bashrc
2. Configure zk
cd zk/conf
mv zoo_sample.cfg zoo.cfg
vi zoo.cfg
Modify: dataDir=/usr/local/zk/data
Add:
server.0=sparkproject1:2888:3888
server.1=sparkproject2:2888:3888
server.2=sparkproject3:2888:3888
3. Set the zk node id
cd zk
mkdir data
cd data
vi myid
0
4. Build the zk cluster
a. Configure ZooKeeper on the other two nodes following the steps above; copying the zk directory and .bashrc to sparkproject2 and sparkproject3 with scp is enough.
b. The only difference: set the node ids on sparkproject2 and sparkproject3 to 1 and 2 respectively (in data/myid).
5. Start the ZooKeeper cluster
a. On each of the three machines, run: zkServer.sh start
b. Check the ZooKeeper state: zkServer.sh status should show one leader and two followers
c. jps: each of the three nodes should have a QuorumPeerMain process
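A sketch of checking all three nodes at once (hostnames as configured in zoo.cfg above; assumes JAVA_HOME is set in /etc/profile as elsewhere in these notes):
for h in sparkproject1 sparkproject2 sparkproject3; do
    echo "== $h =="
    ssh $h "source /etc/profile ; /soft/zk/bin/zkServer.sh status"   # expect one "Mode: leader" and two "Mode: follower"
done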
VII. Install scala
-------------------------------------
1. Copy scala-2.11.4.tgz (provided with the course) to the /usr/local directory on sparkproject1 with WinSCP.
2. Extract scala-2.11.4.tgz: tar -zxvf scala-2.11.4.tgz
3. Rename the scala directory: mv scala-2.11.4 scala
4. Configure the scala environment variables
vi ~/.bashrc
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
source ~/.bashrc
5. Check that scala installed successfully: scala -version
6. Install scala on sparkproject2 and sparkproject3 following the steps above; copying the scala directory and .bashrc to the other two machines with scp is enough (a sketch follows).
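A minimal sketch of step 6, assuming the same install path on all three nodes and the root user (adjust user and paths as needed):
scp -r /usr/local/scala root@sparkproject2:/usr/local/
scp -r /usr/local/scala root@sparkproject3:/usr/local/
scp ~/.bashrc root@sparkproject2:~/
scp ~/.bashrc root@sparkproject3:~/
ssh sparkproject2 "source ~/.bashrc ; scala -version"
ssh sparkproject3 "source ~/.bashrc ; scala -version"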
VIII. Install kafka
-------------------------------------
1. Copy kafka_2.9.2-0.8.1.tgz (provided with the course) to the /usr/local directory on sparkproject1 with WinSCP.
2. Extract kafka_2.9.2-0.8.1.tgz: tar -zxvf kafka_2.9.2-0.8.1.tgz
3. Rename the kafka directory: mv kafka_2.9.2-0.8.1 kafka
4. Configure kafka
vi /usr/local/kafka/config/server.properties
broker.id: an increasing integer (0, 1, 2), the unique id of each broker in the cluster
zookeeper.connect=192.168.1.105:2181,192.168.1.106:2181,192.168.1.107:2181
5. Install slf4j
Upload slf4j-1.7.6.zip (provided with the course) to the /usr/local directory
unzip slf4j-1.7.6.zip
Copy slf4j-nop-1.7.6.jar from the slf4j directory into kafka's libs directory
6. Start the kafka cluster
a. Fix the kafka "Unrecognized VM option 'UseCompressedOops'" error
vi /usr/local/kafka/bin/kafka-run-class.sh
if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true"
fi
Remove -XX:+UseCompressedOops from that line.
b. On each of the three machines, run: nohup /soft/kafka/bin/kafka-server-start.sh /soft/kafka/config/server.properties &
c. Use jps to check that the Kafka broker started
7. Test the kafka cluster
Use the basic commands below to check that kafka is working (extra sanity checks follow them)
/soft/kafka/bin/kafka-topics.sh --zookeeper s101:2181,s102:2181,s103:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
/soft/kafka/bin/kafka-console-producer.sh --broker-list s101:9092,s102:9092,s103:9092 --topic TestTopic
/soft/kafka/bin/kafka-console-consumer.sh --zookeeper s101:2181,s102:2181,s103:2181 --topic TestTopic --from-beginning
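A couple of extra sanity checks (same hosts as in the create command above):
/soft/kafka/bin/kafka-topics.sh --zookeeper s101:2181,s102:2181,s103:2181 --list
/soft/kafka/bin/kafka-topics.sh --zookeeper s101:2181,s102:2181,s103:2181 --describe --topic TestTopic
Lines typed into the console producer should show up in the console consumer running in another terminal.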
IX. Install flume
------------------------------------------------
1. Install flume
a. Copy flume-ng-1.5.0-cdh5.3.6.tar.gz (provided with the course) to the /usr/local directory on sparkproject1 with WinSCP.
b. Extract flume: tar -zxvf flume-ng-1.5.0-cdh5.3.6.tar.gz
c. Rename the flume directory: mv apache-flume-1.5.0-cdh5.3.6-bin flume
d. Configure the flume environment variables
vi ~/.bashrc
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
source ~/.bashrc
2. Configure flume
vi /usr/local/flume/conf/flume-conf.properties
# agent1 is the agent name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# configure source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/usr/local/logs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader = false
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp
# configure channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/usr/local/logs_tmp_cp
agent1.channels.channel1.dataDirs=/usr/local/logs_tmp
# configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://s101:9000/logs
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
3. Test flume
flume-ng agent -n agent1 -c conf -f /soft/flume/conf/flume-conf.properties -Dflume.root.logger=DEBUG,console
Create a new file and move it into the /usr/local/logs directory; flume will automatically upload it to the /logs directory on HDFS (a sketch follows).
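A quick end-to-end check, assuming the spoolDir and HDFS path from the configuration above (the file name is only an example):
mkdir -p /usr/local/logs
echo "hello flume" > /tmp/flume_test.log
mv /tmp/flume_test.log /usr/local/logs/
hdfs dfs -ls /logs        # a dated file should appear once the sink rolls it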
X. Install Spark
-----------------------------------------
1. Install spark
a. Upload spark-1.5.1-bin-hadoop2.4.tgz to the /usr/local directory with WinSCP.
b. Extract the spark package: tar -zxvf spark-1.5.1-bin-hadoop2.4.tgz
c. Rename the spark directory: mv spark-1.5.1-bin-hadoop2.4 spark
d. Modify the spark environment variables
vi ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
source ~/.bashrc
2. Edit the spark-env.sh file
a. cd /usr/local/spark/conf
b. cp spark-env.sh.template spark-env.sh
c. vi spark-env.sh
export JAVA_HOME=/soft/jdk
export SCALA_HOME=/soft/scala
export HADOOP_HOME=/soft/hadoop
export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
3. Submit a spark job in yarn-client mode
/soft/spark/bin/spark-submit \
--class org.apache.spark.examples.JavaSparkPi \
--master yarn-client \
--num-executors 1 \
--driver-memory 10m \
--executor-memory 10m \
--executor-cores 1 \
/soft/spark/lib/spark-examples-1.5.1-hadoop2.4.0.jar
4. Submit a spark job in yarn-cluster mode
/soft/spark/bin/spark-submit \
--class org.apache.spark.examples.JavaSparkPi \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 10m \
--executor-memory 10m \
--executor-cores 1 \
/soft/spark/lib/spark-examples-1.5.1-hadoop2.4.0.jar
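In yarn-cluster mode the driver runs inside the cluster, so the Pi result is not printed in the local console. A sketch of pulling it from the YARN logs (the application id below is only a placeholder):
yarn application -list
yarn logs -applicationId application_1234567890123_0001 | grep "Pi is roughly"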