Installing and Deploying ZooKeeper and Hadoop
ZooKeeper Installation and Deployment
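ZooKeeper is a coordination service for distributed applications. A ZooKeeper cluster (ensemble) is itself a leader-based cluster. It usually consists of one Leader and several Followers, and it keeps serving requests only while a majority of its servers are running.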
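The majority rule determines how many failures an ensemble survives. The sketch below uses two made-up helper names and computes the quorum size and the number of tolerated failures for n servers. It shows why a fourth server adds no fault tolerance over three.

```shell
#!/bin/sh
# Hypothetical helpers: ZooKeeper needs a strict majority (more than n/2)
# of the n servers alive to keep working.
zk_quorum() {
  local n="$1"
  echo $(( n / 2 + 1 ))
}

zk_tolerated_failures() {
  local n="$1"
  # servers that may fail while a majority is still alive
  echo $(( (n - 1) / 2 ))
}
```

For example, zk_tolerated_failures prints 1 for both 3 and 4 servers, and 2 for 5 servers.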
1. Download the installation package
2. Upload the package to the server
3. Extract the package
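Steps 1 to 3 come down to tar and mv. This is a minimal sketch that assumes the official binary tarball name apache-zookeeper-3.5.9-bin.tar.gz, since 3.5.x binary releases carry a -bin suffix. It renames the extracted folder to the zookeeper-3.5.9 path used later in this guide. The function name is illustrative; adjust the names if your package differs.

```shell
#!/bin/sh
# Sketch: unpack a ZooKeeper binary tarball and rename the folder to
# zookeeper-<version>. The apache-zookeeper-<version>-bin naming is an
# assumption based on official 3.5.x binary releases.
install_zookeeper() {
  local tarball="$1" dest="$2" version="$3"
  mkdir -p "$dest" || return 1
  tar -zxf "$tarball" -C "$dest" || return 1
  mv "$dest/apache-zookeeper-${version}-bin" "$dest/zookeeper-${version}"
}
```

Usage: install_zookeeper apache-zookeeper-3.5.9-bin.tar.gz /export/software 3.5.9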
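4. Configure ZooKeeper
(1) Configure zoo.cfg (in the conf directory of the ZooKeeper installation)
Set the data directory, where ZooKeeper persists its data: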
dataDir=/export/data/zookeeper/zkdata
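List each server of the ensemble as server.N=hostname:2888:3888. N is the server ID and must match that host's myid. Port 2888 is used by followers to connect to the leader, and 3888 is the leader-election port. Use your own hostnames here; the Hadoop section below, for example, expects ZooKeeper on hadoop-1, hadoop-2 and hadoop-3 (see ha.zookeeper.quorum).
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888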
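Only the changed lines are shown above. One way to build the complete file is to start from conf/zoo_sample.cfg, which ships with defaults such as tickTime, initLimit, syncLimit and clientPort=2181. The sketch replaces the sample's dataDir line and appends the server.N entries. The function name and argument order are illustrative.

```shell
#!/bin/sh
# Sketch: build zoo.cfg from zoo_sample.cfg.
# $1 = ZooKeeper conf dir, $2 = dataDir, remaining args = hostnames in myid order.
write_zoo_cfg() {
  local conf="$1" datadir="$2"
  shift 2
  # keep the sample's defaults but drop its dataDir line
  grep -v '^dataDir=' "$conf/zoo_sample.cfg" > "$conf/zoo.cfg" || return 1
  echo "dataDir=$datadir" >> "$conf/zoo.cfg"
  local id=1 host
  for host in "$@"; do
    # server.<myid>=<host>:<follower-to-leader port>:<leader-election port>
    echo "server.$id=$host:2888:3888" >> "$conf/zoo.cfg"
    id=$((id + 1))
  done
}
```

Usage: write_zoo_cfg /export/software/zookeeper-3.5.9/conf /export/data/zookeeper/zkdata master slave1 slave2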
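(2) Create the myid file
Create the data directory:
mkdir -p /export/data/zookeeper/zkdata
In that directory, create a file named myid that contains this server's ID, which is the N of its server.N line. Use 1 on master, 2 on slave1 and 3 on slave2:
vi /export/data/zookeeper/zkdata/myid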
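Because the ID differs per machine, it can be derived from the hostname. This is a minimal sketch with illustrative function names. It assumes the master/slave1/slave2 names from zoo.cfg above.

```shell
#!/bin/sh
# Sketch: write this machine's myid from its hostname.
# The mapping mirrors the server.N lines in zoo.cfg (assumed hostnames).
zk_myid_for() {
  case "$1" in
    master) echo 1 ;;
    slave1) echo 2 ;;
    slave2) echo 3 ;;
    *) return 1 ;;
  esac
}

write_myid() {
  local datadir="$1" host="$2" id
  id=$(zk_myid_for "$host") || return 1
  mkdir -p "$datadir" || return 1
  echo "$id" > "$datadir/myid"
}
```

Usage on each node: write_myid /export/data/zookeeper/zkdata "$(hostname)"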
(3) Configure environment variables
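The variables themselves are not listed here. A common setup adds ZOOKEEPER_HOME and its bin directory to PATH in /etc/profile. The sketch below does that only once and takes the profile path as an argument so it can be tried on a scratch file first. ZOOKEEPER_HOME is a naming convention, not something zkServer.sh requires, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: append ZooKeeper variables to a profile file once.
# $1 = profile file (e.g. /etc/profile), $2 = ZooKeeper install dir.
add_zk_env() {
  local profile="$1" zk_home="$2"
  # skip if a ZOOKEEPER_HOME export is already present
  if grep -q '^export ZOOKEEPER_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo "export ZOOKEEPER_HOME=$zk_home"
    echo 'export PATH=$PATH:$ZOOKEEPER_HOME/bin'
  } >> "$profile"
}
```

Usage (as root): add_zk_env /etc/profile /export/software/zookeeper-3.5.9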
(4) Distribute ZooKeeper to the other nodes
scp -r /export/software/zookeeper-3.5.9 slave1:/export/software/
scp -r /export/software/zookeeper-3.5.9 slave2:/export/software/
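The data directory and myid live outside the copied folder, so each receiving node still needs its own. The loop below is a minimal sketch and uses an illustrative function name. It assumes passwordless SSH (set up in the Hadoop section of this guide) and the IDs slave1=2 and slave2=3 from zoo.cfg.

```shell
#!/bin/sh
# Sketch: copy ZooKeeper to other nodes and create their data dir and myid.
# Arguments are "host:id" pairs, e.g. slave1:2 slave2:3 (assumed IDs).
ZK_DIR=/export/software/zookeeper-3.5.9
ZK_DATA=/export/data/zookeeper/zkdata

distribute_zk() {
  local pair host id
  for pair in "$@"; do
    host=${pair%%:*}
    id=${pair#*:}
    scp -r "$ZK_DIR" "$host:/export/software/" || return 1
    ssh "$host" "mkdir -p $ZK_DATA && echo $id > $ZK_DATA/myid" || return 1
  done
}
```

Usage: distribute_zk slave1:2 slave2:3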
(5) Load the environment variables
source /etc/profile
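5. Start and stop ZooKeeper
Run on every node:
zkServer.sh start
To see whether a node is the leader or a follower, or to stop it:
zkServer.sh status
zkServer.sh stop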
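After startup, zkServer.sh status reports "Mode: leader" on exactly one node and "Mode: follower" on the others. The sketch below starts the ensemble over SSH and counts the leaders. Non-interactive SSH sessions do not read /etc/profile, so it uses the full script path from this guide instead of relying on PATH. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: start ZooKeeper on every host, then count leaders via "zkServer.sh status".
ZK_BIN=/export/software/zookeeper-3.5.9/bin/zkServer.sh

start_ensemble() {
  local host
  for host in "$@"; do
    ssh "$host" "$ZK_BIN start" || return 1
  done
}

count_leaders() {
  local host leaders=0
  for host in "$@"; do
    # status output contains a line like "Mode: leader" or "Mode: follower"
    if ssh "$host" "$ZK_BIN status" 2>&1 | grep -q '^Mode: leader'; then
      leaders=$((leaders + 1))
    fi
  done
  echo "$leaders"
}
```

Usage: run start_ensemble master slave1 slave2, wait a few seconds, then count_leaders master slave1 slave2 should print 1.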
Building a Highly Available Hadoop Cluster
Processes planned for each node:
| hadoop-1 | hadoop-2 | hadoop-3 |
|---|---|---|
| NodeManager | NodeManager | NodeManager |
| NameNode | NameNode | |
| DataNode | DataNode | DataNode |
| DFSZKFailoverController | DFSZKFailoverController | |
| JournalNode | JournalNode | JournalNode |
| ResourceManager | ResourceManager | |
| QuorumPeerMain | QuorumPeerMain | QuorumPeerMain |
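I. Configure the Virtual Machines
1. Create three virtual machines: hadoop-1, hadoop-2 and hadoop-3
2. Map the hostnames to IP addresses in /etc/hosts (on all three machines):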
192.168.95.3 hadoop-1
192.168.95.4 hadoop-2
192.168.95.5 hadoop-3
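The sketch below appends each mapping only if that hostname is not already present, so it can be rerun safely. The hosts-file path is a parameter so it can be tested on a copy first, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: add "IP hostname" pairs to a hosts file without duplicating entries.
# $1 = hosts file; remaining args alternate IP and hostname.
add_hosts() {
  local file="$1" ip name
  shift
  while [ "$#" -ge 2 ]; do
    ip="$1"; name="$2"; shift 2
    # append only if no line already contains this hostname as a whole word
    if ! grep -Eq "[[:space:]]$name([[:space:]]|\$)" "$file" 2>/dev/null; then
      echo "$ip $name" >> "$file"
    fi
  done
}
```

Usage (as root): add_hosts /etc/hosts 192.168.95.3 hadoop-1 192.168.95.4 hadoop-2 192.168.95.5 hadoop-3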
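3. Set up passwordless SSH between the servers (on all 3 machines)
(1) Generate a key pair (RSA, because the HDFS fencing configuration below expects /root/.ssh/id_rsa)
ssh-keygen -t rsa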
(2) Copy the public key to all three hosts, including the local one
ssh-copy-id hadoop-1
ssh-copy-id hadoop-2
ssh-copy-id hadoop-3
(3) Check that each login works without a password prompt
ssh hadoop-1
ssh hadoop-2
ssh hadoop-3
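An interactive login can hide a password prompt. With -o BatchMode=yes, ssh fails instead of prompting, so a loop like this sketch reports exactly which hosts still need the key. Run it on each of the three machines. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: report hosts that cannot be reached without a password.
# BatchMode=yes makes ssh fail rather than ask for a password.
check_passwordless() {
  local host failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: check_passwordless hadoop-1 hadoop-2 hadoop-3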
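II. Install Hadoop
1. Download the Hadoop package
2. Extract it
tar -zxvf hadoop-2.4.1.tar.gz
3. Move it to /export/software/
mv hadoop-2.4.1 /export/software/
4. Edit the configuration files (in /export/software/hadoop-2.4.1/etc/hadoop/)
(1) Edit core-site.xml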
<configuration>
<!-- Default file system: the HA nameservice "cluster" defined by dfs.nameservices in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/software/hadoop-2.4.1/tmp</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
</property>
<!-- ZooKeeper ensemble (host:clientPort) used for automatic failover. Use an odd number of servers, at least three -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop-1:2181,hadoop-2:2181,hadoop-3:2181</value>
</property>
</configuration>
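Once core-site.xml is in place, hdfs getconf -confKey prints the value Hadoop actually resolves for a key, which quickly catches typos. This minimal sketch uses it to check that ha.zookeeper.quorum lists an odd number of servers, at least three. It assumes the hdfs command is on the PATH, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: verify ha.zookeeper.quorum has an odd number (>= 3) of entries.
check_zk_quorum() {
  local value count
  value=$(hdfs getconf -confKey ha.zookeeper.quorum) || return 1
  # entries are comma-separated host:port pairs
  count=$(echo "$value" | tr ',' '\n' | grep -c .)
  echo "ha.zookeeper.quorum has $count server(s): $value"
  [ "$count" -ge 3 ] && [ $((count % 2)) -eq 1 ]
}
```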
(2) Edit hdfs-site.xml
<configuration>
<!-- Number of HDFS block replicas; must not exceed the number of DataNodes -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Logical name (nameservice ID) of the NameNode pair; referenced by fs.defaultFS -->
<property>
<name>dfs.nameservices</name>
<value>cluster</value>
</property>
<!-- IDs of the NameNodes in the nameservice "cluster" -->
<property>
<name>dfs.ha.namenodes.cluster</name>
<value>nn01,nn02</value>
</property>
<!-- RPC address of NameNode nn01, used by clients and DataNodes -->
<property>
<name>dfs.namenode.rpc-address.cluster.nn01</name>
<value>hadoop-1:9000</value>
</property>
<!-- HTTP address of NameNode nn01, used by the web UI -->
<property>
<name>dfs.namenode.http-address.cluster.nn01</name>
<value>hadoop-1:50070</value>
</property>
<!-- RPC address of NameNode nn02, used by clients and DataNodes -->
<property>
<name>dfs.namenode.rpc-address.cluster.nn02</name>
<value>hadoop-2:9000</value>
</property>
<!-- HTTP address of NameNode nn02, used by the web UI -->
<property>
<name>dfs.namenode.http-address.cluster.nn02</name>
<value>hadoop-2:50070</value>
</property>
<!-- JournalNodes through which the NameNodes share the edit log -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop-1:8485;hadoop-2:8485;hadoop-3:8485/cluster</value>
</property>
<!-- Local directory where each JournalNode stores the edits -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/export/data/hadoop/journaldata</value>
</property>
<!-- Fail over to the other NameNode automatically when the active one fails -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Class HDFS clients use to find the currently active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
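<!-- Fencing methods tried in order during failover: sshfence first, then shell(/bin/true) so failover can proceed even if the old active host is unreachable -->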
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)
</value>
</property>
<!-- Private key used by sshfence to log in to the other NameNode -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- sshfence SSH connect timeout, in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/export/software/hadoop-2.4.1/tmp/dfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/export/software/hadoop-2.4.1/tmp/dfs/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
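The dfs.replication comment says the value must not exceed the number of machines. This sketch checks that rule. It compares the resolved value from hdfs getconf -confKey with the number of hosts in the slaves file, which lives at $HADOOP_HOME/etc/hadoop/slaves in Hadoop 2.x. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: ensure dfs.replication <= number of DataNode hosts in the slaves file.
# $1 = path to the slaves file
check_replication() {
  local slaves="$1" repl nodes
  repl=$(hdfs getconf -confKey dfs.replication) || return 1
  # count non-empty, non-comment lines
  nodes=$(grep -cv -e '^[[:space:]]*$' -e '^#' "$slaves")
  echo "dfs.replication=$repl, datanodes=$nodes"
  [ "$repl" -le "$nodes" ]
}
```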
(3) Edit yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Auxiliary service run by each NodeManager; MapReduce jobs need mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Cluster ID of the ResourceManager pair, used for leader election -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- IDs of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Host of rm1 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop-1</value>
</property>
<!-- Host of rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop-2</value>
</property>
<!-- ZooKeeper ensemble used by the ResourceManagers -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop-1:2181,hadoop-2:2181,hadoop-3:2181</value>
</property>
</configuration>
(4) Create mapred-site.xml (it does not exist by default; copy mapred-site.xml.template or create the file by hand)
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
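Hadoop 2.x ships etc/hadoop/mapred-site.xml.template, so a common approach is to copy it and then add the property above. This minimal sketch copies the template only when mapred-site.xml does not exist yet, so it never overwrites edits. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: create mapred-site.xml from the bundled template if it is missing.
# $1 = Hadoop conf dir, e.g. /export/software/hadoop-2.4.1/etc/hadoop
ensure_mapred_site() {
  local conf="$1"
  if [ ! -f "$conf/mapred-site.xml" ]; then
    cp "$conf/mapred-site.xml.template" "$conf/mapred-site.xml"
  fi
}
```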
(5) Edit the slaves file (one DataNode/NodeManager host per line)
hadoop-1
hadoop-2
hadoop-3
(6) Set the JDK location in hadoop-env.sh and yarn-env.sh
export JAVA_HOME=/export/software/jdk1.8.0_161
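Both scripts normally already contain an export JAVA_HOME line, and in yarn-env.sh it may be commented out. This sketch replaces any such line, commented or not, with the explicit JDK path from this guide, and appends the line if none exists. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: pin JAVA_HOME in hadoop-env.sh and yarn-env.sh.
# $1 = Hadoop conf dir, $2 = JDK path.
set_java_home() {
  local conf="$1" jdk="$2" f
  for f in "$conf/hadoop-env.sh" "$conf/yarn-env.sh"; do
    if grep -Eq '^[#[:space:]]*export JAVA_HOME=' "$f"; then
      # rewrite the existing (possibly commented-out) line in place
      sed -i.bak -E "s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$f" || return 1
      rm -f "$f.bak"
    else
      echo "export JAVA_HOME=$jdk" >> "$f"
    fi
  done
}
```

Usage: set_java_home /export/software/hadoop-2.4.1/etc/hadoop /export/software/jdk1.8.0_161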
(7) Configure environment variables
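The variables are not listed here. A common choice is HADOOP_HOME with both bin (hadoop, hdfs, yarn) and sbin (start-all.sh, hadoop-daemon.sh) on the PATH, since later steps use commands from both. This minimal sketch writes them once to a profile file passed as an argument. The function name is illustrative; run source /etc/profile afterwards.

```shell
#!/bin/sh
# Sketch: append Hadoop variables to a profile file once.
# $1 = profile file (e.g. /etc/profile), $2 = Hadoop install dir.
add_hadoop_env() {
  local profile="$1" hadoop_home="$2"
  grep -q '^export HADOOP_HOME=' "$profile" 2>/dev/null && return 0
  cat >> "$profile" <<EOF
export HADOOP_HOME=$hadoop_home
export PATH=\$PATH:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
}
```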
(8) Copy Hadoop to the other two machines
scp -r /export/software/hadoop-2.4.1 hadoop-2:/export/software/
scp -r /export/software/hadoop-2.4.1 hadoop-3:/export/software/
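The copy can also be written as a loop over the target hosts. Copying into the parent directory /export/software/ avoids a nested hadoop-2.4.1/hadoop-2.4.1 when the target folder already exists. This is a minimal sketch with an illustrative function name.

```shell
#!/bin/sh
# Sketch: copy the configured Hadoop directory to each target host.
HADOOP_DIR=/export/software/hadoop-2.4.1

distribute_hadoop() {
  local host
  for host in "$@"; do
    # copy into the parent dir so the result is /export/software/hadoop-2.4.1
    scp -r "$HADOOP_DIR" "$host:/export/software/" || return 1
  done
}
```

Usage: distribute_hadoop hadoop-2 hadoop-3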
Starting Hadoop (ZooKeeper must already be running on all three nodes)
1. Start the JournalNodes (on all three machines)
hadoop-daemon.sh start journalnode
2. Format HDFS (on hadoop-1 only, and only for the first start)
hadoop namenode -format
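3. Copy the tmp directory in the Hadoop folder to hadoop-2, so that the standby NameNode starts with the same metadata (running hdfs namenode -bootstrapStandby on hadoop-2 achieves the same)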
scp -r /export/software/hadoop-2.4.1/tmp hadoop-2:/export/software/hadoop-2.4.1/
4. Initialize the HA state in ZooKeeper (once, on hadoop-1), then start Hadoop
hdfs zkfc -formatZK
start-all.sh
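In Hadoop 2.x, start-all.sh starts a ResourceManager only on the machine where it runs. If the ResourceManager on hadoop-2 is not running, start it there separately: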
yarn-daemon.sh start resourcemanager
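On a fresh cluster, steps 1 to 4 must run in this order. The sketch below drives them from hadoop-1 over SSH. It assumes passwordless SSH and the paths from this guide, and it uses hdfs namenode -format, the non-deprecated form of hadoop namenode -format. Non-interactive SSH sessions do not read /etc/profile, so it calls scripts by full path. It is meant to run only once, because formatting the NameNode or ZooKeeper again wipes existing state. The function name is illustrative.

```shell
#!/bin/sh
# Sketch of the first-time HA startup, run on hadoop-1.
# WARNING: do not rerun on a live cluster; "namenode -format" and
# "zkfc -formatZK" discard existing metadata.
HADOOP_DIR=/export/software/hadoop-2.4.1

first_start() {
  local host
  # 1. JournalNodes on all three nodes (they must be up before formatting)
  for host in hadoop-1 hadoop-2 hadoop-3; do
    ssh "$host" "$HADOOP_DIR/sbin/hadoop-daemon.sh start journalnode" || return 1
  done
  # 2. Format HDFS on hadoop-1 only
  "$HADOOP_DIR/bin/hdfs" namenode -format || return 1
  # 3. Give the standby NameNode on hadoop-2 the same metadata
  scp -r "$HADOOP_DIR/tmp" "hadoop-2:$HADOOP_DIR/" || return 1
  # Create the znode used by the ZKFC failover controllers
  "$HADOOP_DIR/bin/hdfs" zkfc -formatZK || return 1
  # 4. Start HDFS and YARN, then the second ResourceManager on hadoop-2
  "$HADOOP_DIR/sbin/start-all.sh" || return 1
  ssh hadoop-2 "$HADOOP_DIR/sbin/yarn-daemon.sh start resourcemanager"
}
```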
5. Run jps on each node to check which processes are running
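The comparison with the expected processes can be scripted. This sketch lists expected names that jps does not show. The names are the Java main-class names jps prints, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: list expected Java processes that jps does not show on this node.
# Arguments are process names as printed by jps (e.g. NameNode, DataNode).
missing_processes() {
  local running name missing=0
  # jps prints "<pid> <MainClassName>"; keep only the names
  running=$(jps | awk '{print $2}')
  for name in "$@"; do
    if ! echo "$running" | grep -qx "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Given the configuration above, hadoop-1 would be checked with missing_processes NameNode DataNode JournalNode DFSZKFailoverController ResourceManager NodeManager QuorumPeerMain, and hadoop-3 with missing_processes DataNode JournalNode NodeManager QuorumPeerMain.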
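6. View cluster information in the web UI at http://hadoop-1:50070 and http://hadoop-2:50070 (the HTTP addresses from hdfs-site.xml). One NameNode is shown as active and the other as standby.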
7. Test NameNode HA
A. On hadoop-1, kill the NameNode process, then open hadoop-2 in the browser. Its state changes to active, which shows that HA works.
B. Restart the NameNode on hadoop-1 (start-dfs.sh) and open hadoop-1 in the browser. Its state is now standby.
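The HA state can also be checked from the command line. hdfs haadmin -getServiceState prints active or standby for a NameNode ID from dfs.ha.namenodes.cluster. yarn rmadmin -getServiceState does the same for the IDs in yarn.resourcemanager.ha.rm-ids. This sketch prints each state to stderr and checks that exactly one member of each pair is active. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: verify exactly one active NameNode (nn01/nn02) and one active RM (rm1/rm2).
count_active() {
  # $1 = admin command ("hdfs haadmin" or "yarn rmadmin"), rest = service IDs
  local cmd="$1" id state active=0
  shift
  for id in "$@"; do
    # $cmd is intentionally unquoted so it splits into command + subcommand
    state=$($cmd -getServiceState "$id" 2>/dev/null) || state=unreachable
    echo "$id: $state" >&2
    [ "$state" = active ] && active=$((active + 1))
  done
  echo "$active"
}

check_ha() {
  [ "$(count_active 'hdfs haadmin' nn01 nn02)" = 1 ] &&
    [ "$(count_active 'yarn rmadmin' rm1 rm2)" = 1 ]
}
```

Run check_ha before and after killing the NameNode on hadoop-1. Afterwards nn01 should be reported as unreachable and nn02 as active.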