The Hadoop cluster-mode (fully distributed) installation continues from the pseudo-distributed installation; some configuration details refer back to that earlier document.
This document does not cover advanced topics such as Security or High Availability.
HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is to be used, then the MapReduce Job History Server will also be running. For large installations, these are generally running on separate hosts.
One note: the official documentation gives no detailed instructions for WebAppProxy (per the docs, when not launched standalone it runs as part of the ResourceManager), so it is left unconfigured here for now.
Overview
▲Hardware preparation
Physical machine, virtual machines, MAC address setup
▲Network and environment variable setup
Hostname configuration, IP address setup
▲Hadoop cluster configuration
Files to edit: slaves, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml
▲Verify the installation
Visit the HDFS web UI, the YARN web UI, and the historyserver web UI
▲Summary of cluster start/stop commands
Whole-cluster switch, plus individual namenode, datanode, resourcemanager, and nodemanager switches
I. Hardware Preparation
1. Before installing, a quick look at the hardware: one laptop with a quad-core i5 and 4 GB of RAM, in use for ××× years already.
2. Running three Linux VMs on the laptop at the same time, responsiveness is acceptable; the trick is to shut down every service on the laptop that can be shut down, freeing as much memory as possible for the VMs.
3. Adjust the VM MAC addresses
When one Linux VM is cloned into three, each clone's MAC address must be adjusted. Two places are involved: the VM configuration file *.vmx, and the network configuration file inside Linux, /etc/sysconfig/network-scripts/ifcfg-eth***.
A MAC address cannot be changed arbitrarily; usually only the last two hex digits are modified. See the documentation online for details.
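As a rough sketch of where those two edits go, assuming VMware (the ethernet0.* key names) and a made-up MAC value:
# in the clone's *.vmx file (VMware key names; the MAC value here is hypothetical)
ethernet0.generatedAddress = "00:0c:29:a1:b2:02"
# inside the guest, keep the MAC in the interface file consistent with it
HWADDR=00:0c:29:a1:b2:02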
II. Network and Environment Variable Setup
1. Set environment variables
Set these as the official documentation requires. Their effect is not obvious yet, but they may well be needed later, so set them anyway.
[root@localhost ~]# vi /etc/profile
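A minimal sketch of the lines typically appended, assuming the JDK sits at /usr/java/jdk1.8.0 (a hypothetical path) and Hadoop is unpacked under /home/hadoop/hadoop-2.7.2 as elsewhere in this document:
export JAVA_HOME=/usr/java/jdk1.8.0        # hypothetical JDK location
export HADOOP_PREFIX=/home/hadoop/hadoop-2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin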
2. Edit the hosts file so that the machines can be reached by hostname
[root@localhost hadoop]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# entries added below
192.168.1.201 hadoop01
192.168.1.202 hadoop02
192.168.1.203 hadoop03
3. Set the IP address, giving each of the three VMs a different one. The configuration file name varies from machine to machine; ifcfg-eth3 below is only an example.
[root@localhost hadoop]# vi /etc/sysconfig/network-scripts/ifcfg-eth3
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=eth3
UUID=ddf5b32f-428d-4a5b-af6c-bf8d0d232946
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes
IPADDR=192.168.1.201
NETMASK=255.255.255.0
GATEWAY=192.168.1.254
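On this generation of Red Hat systems the new address can be applied without a reboot by restarting the network service:
[root@localhost hadoop]# service network restart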
4. Configure the hostname; on Red Hat systems this is the file to edit.
[hadoop@hadoop02 ~]$ vi /etc/sysconfig/network
NETWORKING=yes
#HOSTNAME=localhost.localdomain
HOSTNAME=hadoop02
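That file is only read at boot; to apply the new name immediately, set it with the hostname command as root:
[root@hadoop02 ~]# hostname hadoop02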
5. Test SSH connectivity
After all three machines are up and the network settings are complete, verify passwordless SSH for the hadoop user between every pair of machines, so that the cluster can be started from any one of them.
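A minimal sketch of setting up passwordless SSH from hadoop01, to be repeated on each machine (ssh-copy-id assumes OpenSSH is installed):
[hadoop@hadoop01 ~]$ ssh-keygen -t rsa               # accept the defaults, empty passphrase
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop01     # the start scripts also ssh to the local host
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop02
[hadoop@hadoop01 ~]$ ssh-copy-id hadoop@hadoop03
[hadoop@hadoop01 ~]$ ssh hadoop02 hostname           # should print hadoop02 with no password prompt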
III. Hadoop Cluster Configuration
1. Configure the hostnames of the datanode machines. If hadoop01 were included, a datanode would also be started on hadoop01 automatically; that is, hadoop01 would then run both the namenode and a datanode.
[hadoop@localhost ~]$ vi ./hadoop-2.7.2/etc/hadoop/slaves
#localhost    #fix 20160627
#hadoop01
hadoop02
hadoop03
2. Create the temporary directory
[hadoop@localhost ~]$ mkdir /home/hadoop/tmp
3. Configure core-site.xml. The key property is fs.defaultFS, which points the filesystem at the namenode host hadoop01.
[hadoop@localhost ~]$ vi ./hadoop-2.7.2/etc/hadoop/core-site.xml
<!-- add start 20160623 -->
<property>
    <name>fs.defaultFS</name>
    <!-- modify start 20160627
    <value>hdfs://localhost:9000</value>
    -->
    <value>hdfs://hadoop01:9000</value>
    <!-- modify end -->
</property>
<!-- add end 20160623 -->
<!-- add start by 20160627 -->
<property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
</property>
<!-- add end by 20160627 -->
</configuration>
4. Create the data directories
[hadoop@localhost ~]$ cd /home/hadoop
[hadoop@localhost ~]$ mkdir dfs        ---- these directories would also be created automatically
[hadoop@localhost ~]$ mkdir dfs/name
[hadoop@localhost ~]$ mkdir dfs/data
5. Configure hdfs-site.xml, mainly the replication factor and the storage directories. With two datanodes, a replication factor of 2 means every block is stored on both of them.
[hadoop@localhost ~]$ vi ./hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<!-- add start 20160623 -->
<property>
    <name>dfs.replication</name>
    <!-- modify start 20160627
    <value>1</value>
    -->
    <value>2</value>
    <!-- modify end 20160627 -->
</property>
<!-- add end 20160623 -->
<!-- add start 20160627 -->
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/dfs/data</value>
</property>
<!-- add end 20160627 -->
</configuration>
6. Configure yarn-site.xml, setting the resourcemanager's host and port addresses
[hadoop@localhost ~]$ vi ./hadoop-2.7.2/etc/hadoop/yarn-site.xml
<!-- Site specific YARN configuration properties -->
<!-- add start 20160627 -->
<property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop01:8032</value>
</property>
<property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop01:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop01:8031</value>
</property>
<property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop01:8033</value>
</property>
<property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop01:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- add end 20160627 -->
</configuration>
7. Configure the jobhistory access addresses
[hadoop@localhost ~]$ vi ./hadoop-2.7.2/etc/hadoop/mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<!-- add start 20160627 -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
    <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<!-- add end 20160627 -->
</configuration>
8. Note
All of the files above can be edited on one machine and then pushed to the other two with scp:
[hadoop@hadoop01 etc]$ cd /home/hadoop/hadoop-2.7.2/etc/
[hadoop@hadoop01 etc]$ scp -r hadoop hadoop02:$PWD
[hadoop@hadoop01 etc]$ scp -r hadoop hadoop03:$PWD
IV. Verifying the Installation
1. Connect to hadoop01 and format the namenode
[hadoop@localhost ~]$ $HADOOP_PREFIX/bin/hdfs namenode -format
2. Start the namenode, datanodes, secondarynamenode, resourcemanager, and nodemanagers
[hadoop@localhost ~]$ start-all.sh
16/03/16 10:38:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-localhost.localdomain.out
hadoop02: starting datanode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-localhost.localdomain.out
hadoop03: starting datanode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
16/03/16 10:38:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.2//logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
hadoop03: starting nodemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
hadoop02: starting nodemanager, logging to /home/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
3. Start the historyserver
[hadoop@hadoop01 ~]$ mr-jobhistory-daemon.sh start historyserver
4. Visit http://192.168.1.201:8088; the ResourceManager state shows STARTED.
5. Visit http://192.168.1.201:50070 and check the namenode; its state is active.
6. Visit http://192.168.1.201:50070 and check the datanodes; hadoop02 and hadoop03 are both shown as In Service.
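The same status is visible from the shell; a quick sanity check might be:
[hadoop@hadoop01 ~]$ jps                                       # lists the Java daemons running on this machine
[hadoop@hadoop01 ~]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -report  # should list hadoop02 and hadoop03 as live datanodes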
7. Check the historyserver at http://192.168.1.201:19888.
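To exercise YARN end to end and give the historyserver something to display, the example jar bundled with the release can be run; a small sketch, assuming the stock 2.7.2 layout:
[hadoop@hadoop01 ~]$ $HADOOP_PREFIX/bin/hadoop jar \
    $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10
# the finished job should then appear in the historyserver web UI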
V. Summary of Cluster Start/Stop Commands
Start/stop the namenode, datanodes, secondarynamenode, resourcemanager, and nodemanagers in one step
[hadoop@hadoop01 ~]$ start-all.sh
[hadoop@hadoop01 ~]$ stop-all.sh
Start/stop the namenode and datanodes together (HDFS only)
[hadoop@hadoop01 ~]$ start-dfs.sh
[hadoop@hadoop01 ~]$ stop-dfs.sh
Start/stop the resourcemanager and nodemanagers together (YARN only)
[hadoop@hadoop01 ~]$ start-yarn.sh
[hadoop@hadoop01 ~]$ stop-yarn.sh
Start/stop the namenode
[hadoop@hadoop01 ~]$ hadoop-daemon.sh --script hdfs start namenode
[hadoop@hadoop01 ~]$ hadoop-daemon.sh --script hdfs stop namenode
Start/stop a datanode; this command only affects the daemon on the local machine
[hadoop@hadoop01 ~]$ hadoop-daemon.sh --script hdfs start datanode
[hadoop@hadoop01 ~]$ hadoop-daemon.sh --script hdfs stop datanode
Start/stop the resourcemanager
[hadoop@hadoop01 ~]$ yarn-daemon.sh start resourcemanager
[hadoop@hadoop01 ~]$ yarn-daemon.sh stop resourcemanager
Start/stop a nodemanager; this command only affects the daemon on the local machine
[hadoop@hadoop01 ~]$ yarn-daemon.sh start nodemanager
[hadoop@hadoop01 ~]$ yarn-daemon.sh stop nodemanager
Start/stop the historyserver
[hadoop@hadoop01 ~]$ mr-jobhistory-daemon.sh start historyserver
[hadoop@hadoop01 ~]$ mr-jobhistory-daemon.sh stop historyserver