1. Prerequisites

All of the following steps are based on Docker, which must be installed in advance.
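
To confirm Docker is available before starting (a quick check; the systemctl line assumes a systemd-based host such as CentOS):

[xiaokang@hadoop ~]$ sudo docker --version
[xiaokang@hadoop ~]$ sudo systemctl status docker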

2. Write the Dockerfile

[xiaokang@hadoop ~]$ sudo vim Hadoop-Single-Dockerfile
# Use centos:centos7.7.1908 as the base image
FROM centos:centos7.7.1908
# Image maintainer information
MAINTAINER "xiaokang<xiaokang.188@qq.com>"
# Descriptive labels
LABEL name="Hadoop-Single" \
build_date="2020-04-16 11:24:12" \
wechat="xk1181259634" \
personal_site="https://www.xiaokang.cool/"
# Commands to run while building the image:
# install the openssh-server, openssh-clients, sudo, vim, and net-tools packages
RUN yum -y install openssh-server openssh-clients sudo vim net-tools
# Generate the SSH host key files (-N "" sets an empty passphrase so the build stays non-interactive)
RUN ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -t ecdsa -N "" -f /etc/ssh/ssh_host_ecdsa_key
RUN ssh-keygen -t ed25519 -N "" -f /etc/ssh/ssh_host_ed25519_key
# Create the xiaokang user and grant it root privileges via passwordless sudo
RUN groupadd -g 1124 bigdata && useradd -m -u 1124 -g bigdata xiaokang
RUN echo "xiaokang:xiaokang" | chpasswd
RUN echo "xiaokang ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# Create the module and software directories (their ownership is adjusted after the files are copied in below)
RUN mkdir /opt/software && mkdir /opt/moudle
# Copy files from the host into the image (ADD automatically extracts tar archives)
COPY copyright.txt /home/xiaokang/copyright.txt
ADD jdk-8u191-linux-x64.tar.gz /opt/moudle
ADD hadoop-2.7.7.tar.gz /opt/software
RUN chown -R xiaokang:bigdata /opt/moudle && chown -R xiaokang:bigdata /opt/software
# Set environment variables
ENV CENTOS_DEFAULT_HOME /root
ENV JAVA_HOME /opt/moudle/jdk1.8.0_191
ENV HADOOP_HOME /opt/software/hadoop-2.7.7
ENV JRE_HOME ${JAVA_HOME}/jre
ENV CLASSPATH ${JAVA_HOME}/lib:${JRE_HOME}/lib
ENV PATH ${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
# Default working directory when logging in
WORKDIR $CENTOS_DEFAULT_HOME
# Expose port 22 and run the sshd service in the foreground
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
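
Before building, the files referenced by COPY and ADD (copyright.txt, jdk-8u191-linux-x64.tar.gz, and hadoop-2.7.7.tar.gz) must sit next to the Dockerfile in the build context. A quick way to check:

[xiaokang@hadoop ~]$ ls -lh Hadoop-Single-Dockerfile copyright.txt jdk-8u191-linux-x64.tar.gz hadoop-2.7.7.tar.gz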

3. Build the Image

The files the Dockerfile needs are read from the build context, which defaults to the current directory. Note that the trailing period at the end of the command is required: it is what specifies the build context.

[xiaokang@hadoop ~]$ sudo docker build -f Hadoop-Single-Dockerfile -t xiaokangxxs/hadoop-single:2.7.7 .

Once the build succeeds, check the image information.
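
For example (docker images accepts a repository name to filter by):

[xiaokang@hadoop ~]$ sudo docker images xiaokangxxs/hadoop-single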

4. Modify the Configuration

4.1 Create a Custom Network

By default, Docker containers attach to the bridge network created when Docker is installed. On that network, containers are assigned IP addresses in order every time they restart, so a container's IP can change across restarts. To keep the IP fixed, we create a custom network:

[xiaokang@hadoop ~]$ sudo docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
7a8469ab926c        bridge              bridge              local
eddaf7096efc        host                host                local
62d73bc5bf8f        none                null                local
[xiaokang@hadoop ~]$ sudo docker network inspect 7a8469ab926c

# Create a custom network and assign it the subnet 172.24.0.0/24
[xiaokang@hadoop ~]$ sudo docker network create --subnet=172.24.0.0/24 xiaokang-network
f460e6d2ad6ce77890c1b49d358ce6ca9ce38bbab857227e145296ae2dfdef1e
[xiaokang@hadoop ~]$ sudo docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
7a8469ab926c        bridge              bridge              local
eddaf7096efc        host                host                local
62d73bc5bf8f        none                null                local
f460e6d2ad6c        xiaokang-network    bridge              local
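
To confirm the new network got the intended subnet, docker network inspect also accepts a Go-template --format/-f option:

[xiaokang@hadoop ~]$ sudo docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}' xiaokang-network
172.24.0.0/24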

4.2 Create and Start the Container

Note: the container must join the custom network created above, otherwise its IP will not stay fixed. The image ID here (343c54a5a1f3) is the one produced by the build; the tag xiaokangxxs/hadoop-single:2.7.7 works equally well. -P publishes the exposed SSH port to a random host port, while the -p flags map the HDFS (50070), YARN (8088), and JobHistory (19888) web UI ports:

[xiaokang@hadoop ~]$ sudo docker run -d --name hadoop --hostname hadoop --net xiaokang-network --ip 172.24.0.2 -P -p 50070:50070 -p 8088:8088 -p 19888:19888 343c54a5a1f3

Confirm the container started successfully (for example with sudo docker ps).

Check the container's IP:

[xiaokang@hadoop ~]$ sudo docker inspect 2c209e844385 | grep -i ip
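
Alternatively, pull the address out directly with a Go template instead of grep (the container ID is the one shown by docker ps):

[xiaokang@hadoop ~]$ sudo docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' 2c209e844385
172.24.0.2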


4.3 Configure Inside the Container

Enter the container interactively to carry out the remaining steps:

[xiaokang@hadoop ~]$ sudo docker exec -it --privileged=true 2c209e844385 /bin/bash
# Re-confirm the owner and group of the directories under /opt/software and /opt/moudle
[xiaokang@hadoop hadoop]$ chown -R xiaokang:bigdata /opt/moudle
[xiaokang@hadoop hadoop]$ chown -R xiaokang:bigdata /opt/software

Configure passwordless SSH login as the xiaokang user (if the exec shell dropped you in as root, switch first, e.g. with su - xiaokang):

[xiaokang@hadoop ~]$ ssh-keygen -t rsa -C "xiaokang.188@qq.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/xiaokang/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/xiaokang/.ssh/id_rsa.
Your public key has been saved in /home/xiaokang/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:gFnr0eagWRaTKq2bcM7TEmEaWK2fCFt/5iNjKD5Xz10 xiaokang.188@qq.com
The key's randomart image is:
+---[RSA 2048]----+
| . +. |
| . .+.= |
|.. oo.B o |
|+ * o* * |
| B Bo.. S |
|+ = +.o E |
| = *.+o . . |
|..O.= oo . |
|.oo+ o . |
+----[SHA256]-----+
[xiaokang@hadoop ~]$
[xiaokang@hadoop ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/xiaokang/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
xiaokang@hadoop's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'hadoop'"
and check to make sure that only the key(s) you wanted were added.

# Test the passwordless login
[xiaokang@hadoop ~]$ ssh hadoop
Last login: Thu Apr 16 09:49:49 2020 from hadoop

Create the directories Hadoop needs:

[xiaokang@hadoop ~]$ mkdir /opt/software/hadoop-2.7.7/tmp
[xiaokang@hadoop ~]$ mkdir -p /opt/software/hadoop-2.7.7/dfs/namenode_data
[xiaokang@hadoop ~]$ mkdir -p /opt/software/hadoop-2.7.7/dfs/datanode_data

Go into the ${HADOOP_HOME}/etc/hadoop/ directory and modify the following configuration files:

[xiaokang@hadoop ~]$ cd ${HADOOP_HOME}/etc/hadoop

1. hadoop-env.sh

# line 25: export JAVA_HOME
export JAVA_HOME=/opt/moudle/jdk1.8.0_191
# line 33: export HADOOP_CONF_DIR
export HADOOP_CONF_DIR=/opt/software/hadoop-2.7.7/etc/hadoop
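
The same two edits can be scripted, e.g. with sed; this sketch matches on the export lines themselves rather than on line numbers, which can shift between releases:

[xiaokang@hadoop hadoop]$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/moudle/jdk1.8.0_191|' hadoop-env.sh
[xiaokang@hadoop hadoop]$ sed -i 's|^export HADOOP_CONF_DIR=.*|export HADOOP_CONF_DIR=/opt/software/hadoop-2.7.7/etc/hadoop|' hadoop-env.sh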

2. core-site.xml

<configuration>
    <!-- Name of the default file system -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
    </property>
    <!-- Temporary directory used by HDFS at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-2.7.7/tmp</value>
    </property>
</configuration>

3. hdfs-site.xml

Set the replication factor and HDFS permission checking:

<configuration>
    <property>
        <!-- Number of replicas HDFS keeps for each block, including the original; default is 3 -->
        <!-- In pseudo-distributed mode this must be 1 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <!-- Where the NameNode stores its metadata; multiple comma-separated
             directories may be given for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/opt/software/hadoop-2.7.7/dfs/namenode_data</value>
    </property>
    <property>
        <!-- Where the DataNode stores its data blocks -->
        <name>dfs.datanode.data.dir</name>
        <value>/opt/software/hadoop-2.7.7/dfs/datanode_data</value>
    </property>
    <property>
        <!-- HDFS permission checking; false lets any user operate on HDFS files -->
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

4. mapred-site.xml

Note: the ${HADOOP_HOME}/etc/hadoop directory ships with only a mapred-site.xml.template file; copy it to mapred-site.xml and edit the copy.

[xiaokang@hadoop hadoop]$ cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <!-- Run MapReduce on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <!-- JobHistory server address -->
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop:10020</value>
    </property>
    <property>
        <!-- JobHistory server web UI address -->
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop:19888</value>
    </property>
</configuration>

5. yarn-site.xml

<configuration>
    <property>
        <!-- Hostname of the YARN ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value>
    </property>
    <property>
        <!-- Auxiliary service NodeManagers run to serve shuffle data -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Enable log aggregation -->
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <!-- Retain aggregated logs for 7 days (604800 seconds) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

6. slaves

List the hostnames or IP addresses of all worker (slave) nodes. Since this is a single-node setup, just specify the local host:

hadoop
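
Equivalently, write the file in one command (this overwrites the stock localhost entry):

[xiaokang@hadoop hadoop]$ echo "hadoop" > ${HADOOP_HOME}/etc/hadoop/slaves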

4.4 Turn Off the Firewall

Leaving the firewall running may make Hadoop's web UI pages unreachable:

# Check the firewall status
systemctl status firewalld
# Stop the firewall
sudo systemctl stop firewalld.service
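# Optionally keep firewalld from starting again at boot
sudo systemctl disable firewalld.service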

4.5 Initialization

The first time Hadoop is started it must be initialized by formatting the NameNode; on success, the output should include a line containing "successfully formatted":

[xiaokang@hadoop ~]$ hdfs namenode -format

4.6 Start HDFS and YARN

Start HDFS, YARN, and the JobHistory server in turn:

[xiaokang@hadoop ~]$ start-dfs.sh
[xiaokang@hadoop ~]$ start-yarn.sh
[xiaokang@hadoop ~]$ mr-jobhistory-daemon.sh start historyserver
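
When everything needs to be shut down later, the matching stop scripts do the reverse:

[xiaokang@hadoop ~]$ mr-jobhistory-daemon.sh stop historyserver
[xiaokang@hadoop ~]$ stop-yarn.sh
[xiaokang@hadoop ~]$ stop-dfs.sh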

4.7 Verify the Startup

Option 1: run jps and check that the NameNode, DataNode, and JobHistoryServer processes (among others) have started:

[xiaokang@hadoop ~]$ jps
1254 NameNode
1354 DataNode
1547 SecondaryNameNode
1692 ResourceManager
2172 JobHistoryServer
1789 NodeManager
2205 Jps

Option 2: check the web UIs published from the container: the HDFS web UI is served on port 50070, the YARN web UI on port 8088, and the JobHistory server web UI on port 19888 (the ports mapped with -p when the container was started).
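
As a final smoke test, run one of the bundled example jobs and a few HDFS commands (a sketch: the examples jar ships under share/hadoop/mapreduce in the 2.7.7 tarball, and copyright.txt was copied into the image earlier):

# Estimate pi with 2 map tasks and 10 samples each
[xiaokang@hadoop ~]$ hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
# Basic HDFS round trip
[xiaokang@hadoop ~]$ hdfs dfs -mkdir -p /user/xiaokang
[xiaokang@hadoop ~]$ hdfs dfs -put ~/copyright.txt /user/xiaokang
[xiaokang@hadoop ~]$ hdfs dfs -ls /user/xiaokang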