Tags: Hadoop
Installing Distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (offline install from CDH 5.14.2 tarballs)
- Installing Distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (offline install from CDH 5.14.2 tarballs)
- Host environment
- Software environment
- Host planning
- Pre-installation preparation
- Installing JDK 1.8
- Installing ZooKeeper
- Installing Hadoop
- Configuring HDFS
- Configuring YARN
- Cluster initialization
- Starting HDFS
- Starting YARN
- Startup order for the whole cluster
- Start
- Stop
- Installing HBase
- Installing Hive
Host environment
Basic configuration:

| Item | Value |
| --- | --- |
| Number of nodes | 5 |
| Operating system | CentOS Linux release 7.5.1804 (Core) |
| Memory | 8GB |

Recommended configuration:

| Item | Value |
| --- | --- |
| Number of nodes | 5 |
| Operating system | CentOS Linux release 7.5.1804 (Core) |
| Memory | 16GB |

Note: in production, allocate memory according to the actual workload. If you are only building the cluster on VMware virtual machines for testing, 1-2GB per host is enough.
Software environment

| Software | Version | Download |
| --- | --- | --- |
| jdk | jdk-8u172-linux-x64 | |
| hadoop | hadoop-2.6.0-cdh5.14.2 | |
| zookeeper | zookeeper-3.4.5-cdh5.14.2 | |
| hbase | hbase-1.2.0-cdh5.14.2 | |
| hive | hive-1.1.0-cdh5.14.2 | |

Note: all CDH5 software can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/
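If the cluster nodes have no internet access, the CDH tarballs can be fetched on any machine that does and then copied to CDHNode1. A minimal sketch, assuming the tarballs are published directly under the archive directory above (the JDK tarball jdk-8u172-linux-x64.tar.gz has to be downloaded from Oracle separately):
base=http://archive.cloudera.com/cdh5/cdh/5
for pkg in hadoop-2.6.0-cdh5.14.2 zookeeper-3.4.5-cdh5.14.2 hbase-1.2.0-cdh5.14.2 hive-1.1.0-cdh5.14.2; do
  wget ${base}/${pkg}.tar.gz   # download each CDH component tarball
done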
Host planning
The roles of the 5 nodes are planned as follows:

| Role | CDHNode1 | CDHNode2 | CDHNode3 | CDHNode4 | CDHNode5 |
| --- | --- | --- | --- | --- | --- |
| IP | 192.168.223.201 | 192.168.223.202 | 192.168.223.203 | 192.168.223.204 | 192.168.223.205 |
| namenode | yes | yes | no | no | no |
| datanode | no | no | yes | yes | yes |
| resourcemanager | yes | yes | no | no | no |
| journalnode | yes | yes | yes | yes | yes |
| zookeeper | yes | yes | yes | no | no |
| hmaster (hbase) | yes | yes | no | no | no |
| regionserver (hbase) | no | no | yes | yes | yes |
| hive (hiveserver2) | no | no | yes | yes | yes |

Note: keep an odd number of JournalNode and ZooKeeper instances, and use at least 3 of each for high availability. The reasons will be covered in detail later.
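All of the configuration below refers to nodes by hostname, so every node must be able to resolve CDHNode1-5. A minimal /etc/hosts sketch based on the planning table above (run as root on every node; adjust if your IPs differ):
cat >> /etc/hosts << 'EOF'
192.168.223.201 CDHNode1
192.168.223.202 CDHNode2
192.168.223.203 CDHNode3
192.168.223.204 CDHNode4
192.168.223.205 CDHNode5
EOF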
Pre-installation preparation
- Disable SELinux on all nodes
sed -i 's/^SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
- Stop and disable the firewalld/iptables firewall on all nodes
systemctl disable firewalld
systemctl stop firewalld
systemctl disable iptables
systemctl stop iptables
- Enable time synchronization (ntpdate) on all nodes
echo "*/5 * * * * /usr/sbin/ntpdate asia.pool.ntp.org | logger -t NTP" >> /var/spool/cron/root
- Set the locale and time zone on all nodes
echo 'export TZ=Asia/Shanghai' >> /etc/profile
echo 'export LANG=en_US.UTF-8' >> /etc/profile
. /etc/profile
- Create the hadoop user on all nodes
useradd -m hadoop
echo '123456' | passwd --stdin hadoop
# set the prompt (PS1) and some safety aliases for the hadoop user
su - hadoop
echo 'export PS1="\u@\h:\$PWD>"' >> ~/.bash_profile
echo "alias mv='mv -i'
alias rm='rm -i'" >> ~/.bash_profile
. ~/.bash_profile
- Set up passwordless SSH between the hadoop users. First generate a key pair on CDHNode1:
su - hadoop
ssh-keygen -t rsa # press Enter at every prompt to generate the hadoop user's public/private key pair
cd .ssh
vi id_rsa.pub # remove the trailing hostname comment hadoop@CDHNode1 from the public key
cat id_rsa.pub > authorized_keys
chmod 600 authorized_keys
Archive the .ssh directory:
su - hadoop
zip -r ssh.zip .ssh
Then distribute ssh.zip to the hadoop user's home directory on CDHNode2-5 and unzip it there (see the sketch below); after that, passwordless login works between all nodes.
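A minimal sketch of the distribution step, assuming password-based SSH still works at this point and unzip is installed on every node:
# run as the hadoop user on CDHNode1
for node in CDHNode2 CDHNode3 CDHNode4 CDHNode5; do
  scp ~/ssh.zip ${node}:/home/hadoop/
  ssh ${node} "cd /home/hadoop && unzip -o ssh.zip && rm -f ssh.zip"
done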
- Tune the kernel parameters, the maximum number of open files, the maximum number of processes, and similar limits on every host. Suitable values differ from host to host, so no specific settings are given here, but if the Hadoop environment is used in production these limits must be tuned; the Linux defaults can severely limit cluster performance.
- On the datanode nodes (CDHNode3-5), mount a 15GB data disk at /chunk1; after mounting, the directory must be chowned to the hadoop user.
Note: the mount is performed as the root user, but HDFS runs as the hadoop user, so /chunk1 must end up owned by hadoop.
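A minimal sketch of this step, assuming the data disk appears as /dev/sdb (a hypothetical device name; adjust the device and filesystem to your environment):
# run as root on CDHNode3-5
mkfs.ext4 /dev/sdb                                    # format the data disk (destroys any existing data)
mkdir -p /chunk1
echo '/dev/sdb /chunk1 ext4 defaults 0 0' >> /etc/fstab
mount /chunk1
chown -R hadoop:hadoop /chunk1                        # hand the mount point over to the hadoop user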
Installing JDK 1.8
Install on all nodes in the same way. Extract jdk-8u172-linux-x64.tar.gz:
tar zxvf jdk-8u172-linux-x64.tar.gz
mkdir -p /home/hadoop/app
mv jdk1.8.0_172 /home/hadoop/app/jdk
rm -f jdk-8u172-linux-x64.tar.gz
Configure environment variables with vi ~/.bash_profile
and add the following:
#java
export JAVA_HOME=/home/hadoop/app/jdk
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
Load the environment variables:
. ~/.bash_profile
Check the installation with java -version:
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
If you see output like the above, the installation succeeded.
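Since every node needs the same JDK at the same path, one option (a sketch, relying on the passwordless SSH set up earlier) is to copy the already-extracted directory and the profile from CDHNode1 instead of repeating the extraction on each node:
# run as the hadoop user on CDHNode1
for node in CDHNode2 CDHNode3 CDHNode4 CDHNode5; do
  ssh ${node} "mkdir -p /home/hadoop/app"
  scp -pr /home/hadoop/app/jdk ${node}:/home/hadoop/app
  scp ~/.bash_profile ${node}:/home/hadoop
done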
Installing ZooKeeper
Install on CDHNode1 first.
Extract zookeeper-3.4.5-cdh5.14.2.tar.gz:
tar zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz
mv zookeeper-3.4.5-cdh5.14.2 /home/hadoop/app/zookeeper
rm -f zookeeper-3.4.5-cdh5.14.2.tar.gz
Set environment variables with vi ~/.bash_profile
and add the following:
#zk
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Load the environment variables:
. ~/.bash_profile
Create the configuration file vi /home/hadoop/app/zookeeper/conf/zoo.cfg
and add the following content:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# data directory and transaction log directory
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# server.<id>=<hostname>:<port for follower/leader synchronization>:<port for leader election>
server.1=CDHNode1:2888:3888
server.2=CDHNode2:2888:3888
server.3=CDHNode3:2888:3888
# When the membership changes, just add or remove the corresponding server entries here (the config on every node must be updated), then start the new nodes or stop the removed ones
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Create the required directories:
mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
mkdir -p /home/hadoop/app/zookeeper/logs
Add the myid file: vim /home/hadoop/data/zookeeper/zkdata/myid
and put in it:
1
Note: this value is the number after "server." in the server.1=CDHNode1:2888:3888 line of zoo.cfg, so CDHNode2 uses 2 and CDHNode3 uses 3.
Configure the log directory: vim /home/hadoop/app/zookeeper/libexec/zkEnv.sh
and change the following settings to:
ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
Note: /home/hadoop/app/zookeeper/bin/zkServer.sh reads /home/hadoop/app/zookeeper/libexec/zkEnv.sh first and falls back to /home/hadoop/app/zookeeper/bin/zkEnv.sh only when the libexec copy does not exist.
Then edit vim /home/hadoop/app/zookeeper/conf/log4j.properties
and change the following settings to:
zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/hadoop/app/zookeeper/logs
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
Copy ZooKeeper to CDHNode2-3:
scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp -pr /home/hadoop/app/zookeeper CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/zookeeper CDHNode3:/home/hadoop/app
ssh CDHNode2 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode2 "echo 2 > /home/hadoop/data/zookeeper/zkdata/myid"
ssh CDHNode3 "mkdir -p /home/hadoop/data/zookeeper/zkdata;mkdir -p /home/hadoop/data/zookeeper/zkdatalog;mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode3 "echo 3 > /home/hadoop/data/zookeeper/zkdata/myid"
Start ZooKeeper on all 3 nodes:
/home/hadoop/app/zookeeper/bin/zkServer.sh start
Check each node's status:
/home/hadoop/app/zookeeper/bin/zkServer.sh status
If one node reports leader and the other two report follower, ZooKeeper is installed and working.
Check the running processes:
jps
The QuorumPeerMain process is ZooKeeper.
Stop ZooKeeper:
/home/hadoop/app/zookeeper/bin/zkServer.sh stop
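Because zkServer.sh only manages the local node, a small helper loop (a sketch, relying on the passwordless SSH and paths set up above) can run start, status, or stop on all three nodes from CDHNode1:
# run as the hadoop user on CDHNode1; replace status with start or stop as needed
for node in CDHNode1 CDHNode2 CDHNode3; do
  echo "== ${node} =="
  ssh ${node} "source ~/.bash_profile && /home/hadoop/app/zookeeper/bin/zkServer.sh status"
done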
Installing Hadoop
Install on CDHNode1 first, then copy to the other nodes. Extract hadoop-2.6.0-cdh5.14.2.tar.gz:
tar zxvf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 /home/hadoop/app/hadoop
rm -f hadoop-2.6.0-cdh5.14.2.tar.gz
Set environment variables with vi ~/.bash_profile
and add the following:
#hadoop
HADOOP_HOME=/home/hadoop/app/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Load the environment variables:
. ~/.bash_profile
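As a quick sanity check (not part of the original steps), the hadoop command should now be on the PATH and report the CDH build:
hadoop version   # the first line should read: Hadoop 2.6.0-cdh5.14.2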
Configuring HDFS
Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh
and change the following line:
export JAVA_HOME=/home/hadoop/app/jdk
Edit /home/hadoop/app/hadoop/etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster1</value>
</property>
<!-- The default HDFS filesystem URI; the nameservice is named cluster1 -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/tmp</value>
</property>
<!-- Hadoop temporary directory; multiple directories are comma separated, and the data directory must be created by hand -->
<property>
<name>ha.zookeeper.quorum</name>
<value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181</value>
</property>
<!-- ZooKeeper quorum used to manage HDFS HA -->
</configuration>
Edit /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block replication factor of 3 -->
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/data/hdfs/name</value>
</property>
<!-- NameNode metadata directory; multiple directories are comma separated -->
<property>
<name>dfs.data.dir</name>
<value>/chunk1</value>
</property>
<!-- DataNode data directory; multiple directories are comma separated -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- HDFS permission checking disabled (false) -->
<property>
<name>dfs.nameservices</name>
<value>cluster1</value>
</property>
<!-- Nameservice ID; must match the value of fs.defaultFS. With NameNode HA there are two NameNodes, and cluster1 is the single entry point exposed to clients -->
<property>
<name>dfs.ha.namenodes.cluster1</name>
<value>CDHNode1,CDHNode2</value>
</property>
<!-- The NameNodes belonging to nameservice cluster1; these are logical names that can be chosen freely as long as they are unique -->
<property>
<name>dfs.namenode.rpc-address.cluster1.CDHNode1</name>
<value>CDHNode1:9000</value>
</property>
<!-- RPC address of the NameNode on CDHNode1 -->
<property>
<name>dfs.namenode.http-address.cluster1.CDHNode1</name>
<value>CDHNode1:50070</value>
</property>
<!-- HTTP address of the NameNode on CDHNode1 -->
<property>
<name>dfs.namenode.rpc-address.cluster1.CDHNode2</name>
<value>CDHNode2:9000</value>
</property>
<!-- RPC address of the NameNode on CDHNode2 -->
<property>
<name>dfs.namenode.http-address.cluster1.CDHNode2</name>
<value>CDHNode2:50070</value>
</property>
<!-- HTTP address of the NameNode on CDHNode2 -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://CDHNode1:8485;CDHNode2:8485;CDHNode3:8485;CDHNode4:8485;CDHNode5:8485/cluster1</value>
</property>
<!-- JournalNode quorum that stores the shared edits -->
<property>
<name>dfs.client.failover.proxy.provider.cluster1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- The class clients use to determine the active NameNode of cluster1 and perform failover -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/journaldata/jn</value>
</property>
<!-- Local disk path where each JournalNode stores the shared NameNode edits -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>