Prerequisites:

A Linux environment, jdk-7u79-linux-i586.tar.gz, basic shell knowledge, and hadoop-2.6.0.tar.gz



1. Configure hostnames

vim /etc/hosts ----- edit on all 3 nodes

192.168.8.201           h201

192.168.8.202           h202

192.168.8.203           h203
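Equivalently, the entries can be appended non-interactively (run as root on each node; IPs and hostnames as above):

```shell
# Append the cluster hostname mappings to /etc/hosts (run once per node).
cat >> /etc/hosts <<'EOF'
192.168.8.201           h201
192.168.8.202           h202
192.168.8.203           h203
EOF
```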



2. If you are working as root, go straight to step 3; otherwise create a dedicated user first

(this guide runs the cluster as a non-root user)

Create a bigdata user, password: 123 ------ on all 3 nodes
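A sketch of the user creation (run as root on each node; note that `passwd --stdin` is RHEL/CentOS-specific):

```shell
# Create the bigdata user and set the password used throughout this guide.
useradd bigdata
echo '123' | passwd --stdin bigdata
```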



3. Install the JDK

  • Remove the distribution's bundled JDK:
for x in $(rpm -qa | grep java); do rpm -e --nodeps $x; done

Explanation: rpm -qa | grep java lists the RPM-installed packages and filters for those containing the keyword "java";

the for loop then force-removes each of them with rpm -e --nodeps

  • Upload jdk-7u79-linux-i586.tar.gz and extract it to /usr

Command:

tar -zxvf jdk-7u79-linux-i586.tar.gz -C /usr
  • After extraction, switch to root and edit the system environment variables in /etc/profile

Command:

vim /etc/profile
export JAVA_HOME=/usr/jdk1.7.0_79
export JAVA_BIN=/usr/jdk1.7.0_79/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH
  • Save and exit, then run source /etc/profile to apply the new variables
  • Verify the installation with java -version




4. Configure passwordless SSH

Switch back to the bigdata user

ssh-keygen -t rsa

--- just press Enter at every prompt


ssh-copy-id -i /home/bigdata/.ssh/id_rsa.pub h201 

ssh-copy-id -i /home/bigdata/.ssh/id_rsa.pub h202

ssh-copy-id -i /home/bigdata/.ssh/id_rsa.pub h203

--- this sends the public key to the other nodes and to the current node itself
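Passwordless login can then be verified from h201; each command should print the remote hostname without a password prompt:

```shell
# BatchMode makes ssh fail immediately instead of falling back to a password.
for h in h201 h202 h203; do
  ssh -o BatchMode=yes "$h" hostname
done
```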



5. Extract hadoop-2.6.0.tar.gz

After extraction, add the environment variables (append them to /etc/profile as root)
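The extraction itself works the same way as for the JDK (a sketch, assuming the tarball was uploaded to the bigdata user's home directory):

```shell
# Unpack Hadoop 2.6.0, creating /home/bigdata/hadoop-2.6.0.
tar -zxvf /home/bigdata/hadoop-2.6.0.tar.gz -C /home/bigdata
```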

Commands:

HADOOP_HOME=/home/bigdata/hadoop-2.6.0

HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

PATH=$HADOOP_HOME/bin:$PATH

export HADOOP_HOME HADOOP_CONF_DIR PATH

source /etc/profile

Then go into etc/hadoop under the extracted directory, where the .xml configuration files live




Note: all of the steps above must be executed on every node. The ZooKeeper installation and further YARN details will be covered in a follow-up post.



6. Edit the Hadoop configuration file core-site.xml

<configuration>
<!-- Default filesystem URI; "masters" is the HA nameservice defined in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>h201:2181,h202:2181,h203:2181</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>



7. Edit the Hadoop configuration file hdfs-site.xml

<configuration>
<!-- Name the HDFS nameservice "masters"; it must match the authority used by fs.defaultFS in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- The "masters" nameservice contains two NameNodes, h201 and h202 -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>h201,h202</value>
</property>
<!-- RPC address of NameNode h201 -->
<property>
<name>dfs.namenode.rpc-address.masters.h201</name>
<value>h201:9000</value>
</property>
<!-- HTTP address of NameNode h201 -->
<property>
<name>dfs.namenode.http-address.masters.h201</name>
<value>h201:50070</value>
</property>
<!-- RPC address of NameNode h202 -->
<property>
<name>dfs.namenode.rpc-address.masters.h202</name>
<value>h202:9000</value>
</property>
<!-- HTTP address of NameNode h202 -->
<property>
<name>dfs.namenode.http-address.masters.h202</name>
<value>h202:50070</value>
</property>
<!-- Where the NameNode metadata is kept on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://h201:8485;h202:8485;h203:8485/masters</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/journal</value>
</property>
<!-- Enable automatic failover when the active NameNode fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Proxy provider that clients use to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, tried in order: sshfence, then an always-succeed fallback -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)</value>
</property>
<!-- sshfence requires passwordless SSH; this is the key generated with ssh-keygen -t rsa -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>~/.ssh/id_rsa</value>
</property>
</configuration>



8. Edit the Hadoop configuration file mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>

<property>
<name>mapreduce.jobhistory.address</name>
<value>h201:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>h201:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>



9. Edit the Hadoop configuration file yarn-site.xml

<configuration>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id for the RM pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<!-- Logical names of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Host of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>h201</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>h202</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>h201:2181,h202:2181,h203:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>h201:8132</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>h202:8132</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>h201:8130</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>h202:8130</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>h201:8131</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>h202:8131</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>h201:8188</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>h202:8188</value>
</property>
</configuration>
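Before copying the configuration to the other nodes, it is worth checking that every edited file is still well-formed XML (a sketch; assumes xmllint from libxml2 is installed and HADOOP_HOME is set as in step 5):

```shell
# Report any XML syntax error introduced while editing the config files.
cd "$HADOOP_HOME/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f OK"
done
```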



10. Edit the Hadoop configuration file hadoop-env.sh

Add:

export JAVA_HOME=/usr/jdk1.7.0_79




11. Edit the Hadoop configuration file slaves

h201
h202
h203



Note: once steps 6-11 are done, copy the configured Hadoop directory to the other nodes:

scp -r /home/bigdata/hadoop-2.6.0 bigdata@h202:/home/bigdata/
scp -r /home/bigdata/hadoop-2.6.0 bigdata@h203:/home/bigdata/



12. Format HDFS

hdfs namenode -format

--- (the older hadoop namenode -format spelling still works in 2.6.0 but is deprecated)
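Because this configuration uses HA with JournalNodes and ZooKeeper-based failover, the format has to happen in a particular order. A sketch of the usual sequence (assumes ZooKeeper is already installed and running on all three nodes):

```shell
# 1) On every node: start the JournalNode so the shared edits dir is reachable.
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode

# 2) On h201 only: format HDFS and the failover znode, then start the NameNode.
hdfs namenode -format
hdfs zkfc -formatZK
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

# 3) On the second NameNode host (h202): pull the formatted metadata across.
hdfs namenode -bootstrapStandby
```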



13. Start the cluster

From $HADOOP_HOME/sbin, run

./start-all.sh

If jps on the master node shows processes similar to the following, the cluster is up:

7318 NameNode
7598 ResourceManager
7459 DataNode
7521 JournalNode
7612 DFSZKFailoverController
7859 NodeManager
7584 QuorumPeerMain
7844 Jps

The standby NameNode/ResourceManager host (h202) should show the same set of daemons; the remaining worker (h203) shows DataNode, JournalNode, NodeManager and QuorumPeerMain (plus Jps) but no NameNode, ResourceManager or DFSZKFailoverController.
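Rather than eyeballing the jps output, a small helper function (hypothetical, not part of Hadoop) can assert that every expected daemon is present:

```shell
# check_daemons "<jps output>" daemon1 daemon2 ...
# Prints "all present" if every daemon name appears in the output,
# otherwise names the first missing daemon and returns non-zero.
check_daemons() {
  local out="$1"; shift
  local d
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all present"
}

# Example use on a running node:
#   check_daemons "$(jps)" NameNode DataNode JournalNode QuorumPeerMain
```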

Note: the firewall (and SELinux) must be disabled on every node, otherwise heartbeats may be delayed or blocked
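On a CentOS 6-era system (an assumption; adjust for firewalld/systemd distributions), that amounts to:

```shell
# Stop the firewall now and keep it off across reboots.
service iptables stop
chkconfig iptables off

# Put SELinux into permissive mode for this boot...
setenforce 0
# ...and disable it permanently for future boots.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
```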

Final result: the NameNode web UI on port 50070 (the original article's screenshot is for reference only and is not from this exact configuration).


If the installation fails, check the logs and troubleshoot step by step. Persistence pays off!