1. Create the hadoop user
Create a hadoop user on every machine in the cluster, using the same username and password everywhere.
[root@localhost ho]# useradd hadoop
[root@localhost ho]# echo "hadoop"|passwd --stdin hadoop # echo "yourpassword"|passwd --stdin username
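The two commands above can be wrapped in a small dry-run sketch that prints the per-host commands instead of executing them; the hostnames are the examples used throughout this guide, and `create_user_cmds` is a hypothetical helper:

```shell
# Dry run: print the user-creation command for each host.
# Pipe an individual line to bash (as root) to actually run it.
create_user_cmds() {
  local user="$1" pass="$2"; shift 2
  for host in "$@"; do
    echo "ssh root@${host} \"id ${user} >/dev/null 2>&1 || useradd ${user}; echo '${pass}' | passwd --stdin ${user}\""
  done
}
create_user_cmds hadoop hadoop hadoop1 hadoop2
```

The `id` guard makes the command safe to re-run on a host where the user already exists.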
2. Edit the network file to configure the hostname
Change each machine's hostname to the name assigned to it in the cluster.
[root@localhost ho]# vim /etc/sysconfig/network
Edit /etc/sysconfig/network to read:
NETWORKING=yes
HOSTNAME=hadoop1 # set to this machine's hostname
Also edit /etc/hostname, replacing its contents with the new hostname:
hadoop1 # hadoop2 on the second machine, and so on
Reboot after making the change.
3. Configure the network
The hosts file handles name resolution; it maps each hostname to its IP address.
Use ifconfig to find each machine's IP address. Every machine in the cluster should have identical /etc/hosts contents, listing every hostname and IP address.
Edit /etc/hosts to read:
192.168.159.130 hadoop1
192.168.159.129 hadoop2
# substitute your own IP addresses
After the network files are configured, test whether the machines can ping each other:
[hadoop@hadoop2 ~]$ ping hadoop1 -c 1
PING hadoop1 (192.168.159.130) 56(84) bytes of data.
64 bytes from hadoop1 (192.168.159.130): icmp_seq=1 ttl=64 time=0.490 ms
--- hadoop1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.490/0.490/0.490/0.000 ms # ping succeeded
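Rather than pinging each node by hand, a short loop can confirm that every hostname resolves through /etc/hosts; this is a sketch, with `check_resolution` a hypothetical helper and hadoop1/hadoop2 the example names from above:

```shell
# Check that each cluster hostname resolves (via /etc/hosts or DNS).
check_resolution() {
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "$host: resolves"
    else
      echo "$host: UNRESOLVED"
    fi
  done
}
check_resolution hadoop1 hadoop2
```

`getent hosts` consults the same resolver order that ssh and Hadoop will use, so it catches typos in /etc/hosts before they surface as connection errors.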
4. Configure the Java environment
Download the JDK archive, extract it, and configure the environment variables.
JDK archive: https://pan.baidu.com/s/1kUZTVfk8YHksSBeqtHozvw  extraction code: xhr7
Extract the archive:
[hadoop@hadoop1 ~]$ tar -zxvf jdk-8u211-linux-x64.tar.gz
# List the JDKs already installed
[root@hadoop1 hadoop]# rpm -qa|grep jdk
java-1.7.0-openjdk-headless-1.7.0.221-2.6.18.1.el7.x86_64
copy-jdk-configs-3.3-10.el7_5.noarch
java-1.7.0-openjdk-1.7.0.221-2.6.18.1.el7.x86_64
java-1.8.0-openjdk-1.8.0.222.b03-1.el7.x86_64
java-1.8.0-openjdk-headless-1.8.0.222.b03-1.el7.x86_64
# Remove every JDK in the list with yum -y remove <package>
[root@hadoop1 hadoop]# yum -y remove java-1.7.0-openjdk-headless-1.7.0.221-2.6.18.1.el7.x86_64
[root@hadoop1 hadoop]# yum -y remove java-1.7.0-openjdk-1.7.0.221-2.6.18.1.el7.x86_64
[root@hadoop1 hadoop]# yum -y remove java-1.8.0-openjdk-1.8.0.222.b03-1.el7.x86_64
[root@hadoop1 hadoop]# yum -y remove java-1.8.0-openjdk-headless-1.8.0.222.b03-1.el7.x86_64
# Verify the preinstalled JDKs have been removed completely
[root@hadoop1 hadoop]# rpm -qa|grep jdk
copy-jdk-configs-3.3-10.el7_5.noarch # removal complete
Once the preinstalled JDKs are completely removed, edit /etc/profile and append the following (adjust the two paths to wherever you extracted the JDK):
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
export JRE_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
Reload the environment variables:
[root@localhost hadoop]# source /etc/profile
Then verify the configuration; if the output shown below appears, the setup succeeded:
[root@localhost hadoop]# java -version
# configuration succeeded
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
Every machine in the cluster needs this configuration.
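Since this must be repeated on every node, a quick per-machine sanity check helps; `check_java_home` is a hypothetical helper, not part of the JDK:

```shell
# Report whether a java executable actually exists under a JAVA_HOME.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "ok: $1"
  else
    echo "missing java under $1"
  fi
}
check_java_home /home/hadoop/Downloads/java/jdk1.8.0_211
```

Running it right after editing /etc/profile catches a mistyped path before `source /etc/profile` silently exports a broken JAVA_HOME.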
5. Passwordless SSH
Note: perform all of these steps as the ordinary (hadoop) user.
Set up passwordless login on the master node first:
[hadoop@hadoop1 ~]$ ssh hadoop1
At this point logging in still requires a password.
[hadoop@hadoop1 ~]$ ssh-keygen -t rsa # generate an ssh key pair
# just press Enter at every prompt
[hadoop@hadoop1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # authorize the key
[hadoop@hadoop1 ~]$ chmod 600 ~/.ssh/authorized_keys # the permissions sshd requires
Check whether passwordless login works:
[hadoop@hadoop1 ~]$ ssh hadoop1
# logging in without a password confirms the setup worked
First generate an ssh key on each slave node (same method as on the master). Then copy the master's authorized_keys to the slave, and append the slave's own public key to the received file:
[hadoop@hadoop1 ~]$ scp ~/.ssh/authorized_keys hadoop@hadoop2:~/.ssh/
[hadoop@hadoop2 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Next, copy the slave's id_rsa.pub to the master and append it to the master's authorized_keys:
[hadoop@hadoop2 ~]$ scp ~/.ssh/id_rsa.pub hadoop@hadoop1:~/
[hadoop@hadoop1 ~]$ cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
The configuration is now complete. Test passwordless login in both directions; if no password is requested, it worked:
[hadoop@hadoop1 ~]$ ssh hadoop2
Last login: Fri Apr 10 15:01:18 2020 from hadoop2
[hadoop@hadoop2 ~]$ ssh hadoop1
Last login: Fri Apr 10 14:59:04 2020 from hadoop1
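A frequent cause of "it still asks for a password" is wrong permissions: sshd expects 700 on ~/.ssh and 600 on authorized_keys. A small check, sketched with a hypothetical `check_ssh_perms` helper (GNU stat assumed, as on CentOS):

```shell
# Verify the permissions sshd requires for key-based login.
check_ssh_perms() {
  local dir="$1" dmode kmode
  dmode=$(stat -c %a "$dir" 2>/dev/null)
  kmode=$(stat -c %a "$dir/authorized_keys" 2>/dev/null)
  if [ "$dmode" = "700" ] && [ "$kmode" = "600" ]; then
    echo "permissions ok"
  else
    echo "fix needed: dir=$dmode authorized_keys=$kmode"
  fi
}
check_ssh_perms ~/.ssh
```

Run it on every node after distributing the keys; sshd silently ignores an authorized_keys file that is writable by others.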
6. Install ZooKeeper
ZooKeeper archive: https://pan.baidu.com/s/15WbQ99c3222TiNrdg_-pSg  extraction code: 5mii
Download and extract the archive, then add the ZooKeeper environment variables to /etc/profile:
export ZK_HOME=/home/hadoop/Downloads/hadoop/zookeeper
export PATH=$ZK_HOME/bin:$PATH
Copy zoo_sample.cfg (in ZooKeeper's conf directory) to zoo.cfg, then edit zoo.cfg:
dataDir=/home/hadoop/Downloads/hadoop/zookeeper/data/ # change to wherever you want the data stored
dataLogDir=/home/hadoop/Downloads/hadoop/zookeeper/dataLog/
# the port at which the clients will connect
clientPort=2181
# the ensemble members; on each machine, that machine's own entry uses 0.0.0.0
server.1=0.0.0.0:2888:3888
server.2=192.168.159.129:2888:3888
server.3=192.168.159.131:2888:3888
Create the data and dataLog directories:
[hadoop@hadoop1 ~]$ mkdir ~/Downloads/hadoop/zookeeper/data
[hadoop@hadoop1 ~]$ mkdir ~/Downloads/hadoop/zookeeper/dataLog
[hadoop@hadoop1 ~]$ cd ~/Downloads/hadoop/zookeeper/data
[hadoop@hadoop1 data]$ vim myid
# write the matching server number, e.g. 1 on hadoop1
Copy ~/Downloads/hadoop/zookeeper/ to the other machines:
[hadoop@hadoop1 ~]$ scp -r /home/hadoop/Downloads/hadoop/zookeeper/ hadoop@hadoop2:/home/hadoop/Downloads/hadoop/zookeeper/
[hadoop@hadoop1 ~]$ scp -r /home/hadoop/Downloads/hadoop/zookeeper/ hadoop@hadoop3:/home/hadoop/Downloads/hadoop/zookeeper/
Note that every machine must edit its own myid file to hold its own number.
Start the ZooKeeper service on each machine and check that it comes up normally:
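Since the example hostnames end in the server number (hadoop1, hadoop2, hadoop3), writing each myid can be scripted; this sketch assumes that naming convention and uses a hypothetical `write_myid` helper:

```shell
# Write the ZooKeeper myid file, deriving the id from the hostname's
# trailing digits (hadoop2 -> 2). datadir must match dataDir in zoo.cfg.
write_myid() {
  local datadir="$1" host="${2:-$(hostname)}"
  local id="${host##*[!0-9]}"   # keep only the trailing digits
  echo "$id" > "$datadir/myid"
  echo "wrote myid=$id to $datadir/myid"
}

demo=$(mktemp -d)               # stand-in for the real dataDir
write_myid "$demo" hadoop1
```

On a real node you would call it with the actual data directory and no hostname argument, letting it read `hostname` itself.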
[hadoop@hadoop1 ~]$ zkServer.sh start
[hadoop@hadoop2 ~]$ zkServer.sh start
[hadoop@hadoop3 ~]$ zkServer.sh start
[hadoop@hadoop1 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/Downloads/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@hadoop2 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/Downloads/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Once a Mode line appears on every node, the configuration succeeded.
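To check all three nodes from one terminal, the Mode line can be extracted from each node's status output; `zk_mode` is a hypothetical helper, and the ssh loop is shown as a comment because it needs the live cluster:

```shell
# Pull the Mode: line out of `zkServer.sh status` output.
zk_mode() { grep -o 'Mode: .*'; }

# On the running cluster, for each node one would run:
#   ssh hadoopN zkServer.sh status 2>&1 | zk_mode
printf 'JMX enabled by default\nUsing config: zoo.cfg\nMode: leader\n' | zk_mode
```

Exactly one node should report leader and the rest follower; anything else means the ensemble has not formed a quorum.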
7. Install and configure Hadoop
Hadoop archive: https://pan.baidu.com/s/1lO-CT5QZVzkLzqIdlAkc-Q  extraction code: 4q2v
Download and extract the archive, then edit the configuration files.
Add the following to /etc/profile and to ~/.bash_profile:
export HADOOP_HOME=/home/hadoop/Downloads/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
Note: after all of the configuration is finished, a reboot is still needed for it to fully take effect.
[hadoop@hadoop1 ~]$ source /etc/profile
[hadoop@hadoop1 ~]$ source ~/.bash_profile
Note: this step is important; it is what lets you run HDFS commands from any directory.
Add the following content to the files below.
hadoop-env.sh
[hadoop@hadoop1 ~]$ cd ~/Downloads/hadoop/hadoop/etc/hadoop
[hadoop@hadoop1 hadoop]$ vim hadoop-env.sh
Add the following:
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
export HADOOP_HOME=/home/hadoop/Downloads/hadoop/hadoop
core-site.xml
[hadoop@hadoop1 hadoop]$ vim core-site.xml
# add this, replacing the file's empty <configuration> element
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop</value> <!-- logical nameservice name; choose freely, but it must match dfs.nameservices in hdfs-site.xml -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/Downloads/hadoop/hadoop/data/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/Downloads/hadoop/hadoop/journal</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/Downloads/hadoop/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/Downloads/hadoop/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.ha.namenodes.hadoop</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop.nn1</name>
<value>hadoop1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop.nn2</name>
<value>hadoop2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop.nn1</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop.nn2</name>
<value>hadoop2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/hadoop</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.hadoop</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
mapred-env.sh
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-env.sh
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Configure the slaves file with all the worker hostnames:
hadoop1
hadoop2
hadoop3
Copy the configured Hadoop directory to the other machines:
[hadoop@hadoop1 ~]$ scp -r ~/Downloads/hadoop/hadoop/ hadoop@hadoop2:~/Downloads/hadoop/hadoop/
[hadoop@hadoop1 ~]$ scp -r ~/Downloads/hadoop/hadoop/ hadoop@hadoop3:~/Downloads/hadoop/hadoop/
8. Start the cluster
The scripts used below live in hadoop/sbin or hadoop/bin.
Format the failover namespace in ZooKeeper:
[hadoop@hadoop1 bin]$ ./hdfs zkfc -formatZK
Start the JournalNode (on every machine):
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start journalnode
[hadoop@hadoop1 sbin]$ jps # check that the JournalNode is running
Format the master NameNode and start the service:
[hadoop@hadoop1 bin]$ ./hadoop namenode -format hadoop
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start namenode
# copy the formatted metadata directory to hadoop2
[hadoop@hadoop1 ~]$ scp -r /home/hadoop/Downloads/hadoop/hadoop/dfs/name/ hadoop@hadoop2:/home/hadoop/Downloads/hadoop/hadoop/dfs/name/
# start the NameNode on hadoop2
[hadoop@hadoop2 sbin]$ ./hadoop-daemon.sh start namenode
Start zkfc on hadoop1 and hadoop2:
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start zkfc
[hadoop@hadoop2 sbin]$ ./hadoop-daemon.sh start zkfc
Start the DataNode (on every machine):
[hadoop@hadoop1 sbin]$ ./hadoop-daemon.sh start datanode
[hadoop@hadoop2 sbin]$ ./hadoop-daemon.sh start datanode
[hadoop@hadoop3 sbin]$ ./hadoop-daemon.sh start datanode
Start YARN:
[hadoop@hadoop1 sbin]$ ./start-yarn.sh
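The whole startup sequence above can be captured in a dry-run script that just prints the commands in order; `startup_plan` is a hypothetical helper, the paths and hostnames are the ones used in this guide, and nothing is executed:

```shell
# Dry run: print the HA startup sequence in the order this section uses.
SBIN='~/Downloads/hadoop/hadoop/sbin'
startup_plan() {
  echo "bin/hdfs zkfc -formatZK              # once, on hadoop1"
  for h in hadoop1 hadoop2 hadoop3; do
    echo "ssh $h $SBIN/hadoop-daemon.sh start journalnode"
  done
  echo "bin/hadoop namenode -format hadoop   # once, on hadoop1"
  echo "ssh hadoop1 $SBIN/hadoop-daemon.sh start namenode"
  echo "ssh hadoop2 $SBIN/hadoop-daemon.sh start namenode"
  echo "ssh hadoop1 $SBIN/hadoop-daemon.sh start zkfc"
  echo "ssh hadoop2 $SBIN/hadoop-daemon.sh start zkfc"
  for h in hadoop1 hadoop2 hadoop3; do
    echo "ssh $h $SBIN/hadoop-daemon.sh start datanode"
  done
  echo "ssh hadoop1 $SBIN/start-yarn.sh"
}
startup_plan
```

Having the order written down in one place helps when restarting the cluster later, since the formatting steps must never be repeated on a running cluster.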
9. Open the web UI (http://192.168.159.130:50070/) to view the Hadoop status.
If the page loads successfully, Hadoop has been installed and configured correctly.
10. Test
First upload a file to HDFS, then run the wordcount example on it:
[hadoop@hadoop1 ~]$ hadoop fs -put /home/hadoop/words /
[hadoop@hadoop1 ~]$ /home/hadoop/Downloads/hadoop/hadoop/bin/hadoop jar /home/hadoop/Downloads/hadoop/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar wordcount /words /output
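To know what counts to expect in /output, the same word counting can be approximated locally with standard tools; this sketch builds a throwaway input file rather than using the real /home/hadoop/words:

```shell
# Build a tiny input file and count words locally for comparison.
words=$(mktemp)
printf 'hello world\nhello hadoop\n' > "$words"
tr -s ' ' '\n' < "$words" | sort | uniq -c | awk '{print $2"\t"$1}'
# prints: hadoop 1, hello 2, world 1 (tab-separated)
```

Comparing this against `hadoop fs -cat /output/part-r-00000` for the same input gives a quick end-to-end correctness check of the cluster.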
If this runs without problems, the Hadoop distributed cluster is complete.
For everyday use, go into sbin and start the cluster with start-all.sh and stop it with stop-all.sh.
[hadoop@hadoop1 sbin]$ ./start-all.sh
[hadoop@hadoop1 sbin]$ ./stop-all.sh
Corrections for any mistakes are welcome.