There is plenty of material online, but tutorials that actually get a truly distributed Hadoop 3.X + HBase 2.X cluster running on Docker are scattered and full of pitfalls. This guide consolidates that experience so you can avoid them.
I. Installing the Docker Hadoop 3.X Distributed Cluster
1. Machine Environment
The distributed cluster is deployed across three machines:
192.168.1.101 hadoop1 (Docker manager node)
192.168.1.102 hadoop2
192.168.1.103 hadoop3
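So the nodes can reach one another by hostname, make sure each machine resolves all three names (a minimal sketch, assuming no existing DNS entries; run on all three hosts):
# Append the cluster hostnames to /etc/hosts
cat >> /etc/hosts <<'EOF'
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3
EOF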
2. Download the Docker Hadoop Configuration Files
URL: https://github.com/big-data-europe/docker-hadoop/tree/2.0.0-hadoop3.1.3-java8
Switch branches/tags to pick the version you need; this guide uses Hadoop 3.1.3.
3. Install Docker
Refer to an earlier tutorial for the installation itself; the version used here is docker-ce-3:20.10.8-3.el7.x86_64.
4. System Configuration
# Stop the firewall
systemctl stop firewalld
# Disable it permanently
systemctl disable firewalld
# Restart docker (required after changing the network environment)
systemctl restart docker
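If you would rather keep firewalld running, an alternative sketch is to open just the ports Docker Swarm needs (per the Docker documentation; verify against your environment):
# Cluster management, node-to-node communication, and overlay (VXLAN) traffic
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload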
5. Install Docker Compose
# Download Docker Compose (note: the 1.x release assets use a capitalized OS name)
curl -SL https://github.com/docker/compose/releases/download/1.29.0/docker-compose-Linux-x86_64 -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Check the version
[root@localhost ~]# docker-compose version
docker-compose version 1.29.0, build 07737305
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
6. Pull the Images
1) Pull the Hadoop images
# Extract docker-hadoop-2.0.0-hadoop3.1.3-java8 (it is a zip archive)
unzip docker-hadoop-2.0.0-hadoop3.1.3-java8.zip
# Run docker-compose from the extracted directory to pull the Hadoop images
docker-compose up
Once it succeeds you will see the corresponding container instances. This run is only needed to pull the images, so remove the containers afterwards:
docker rm $(docker ps -aq)
Remove the volumes:
docker volume rm $(docker volume ls -q)
Remove the network:
docker network rm docker-hadoop-200-hadoop313-java8_default
2) Pull the traefik image
traefik is a networking tool that provides reverse proxying and load balancing inside the container network.
docker pull traefik:2.9.10
3) Pull the zookeeper image
docker pull zookeeper:3.4.10
Perform all of the above on each of the three machines, so that the Docker environment and the images are ready everywhere.
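Optionally confirm that the required images are present on every node:
docker images | grep -E 'bde2020|traefik|zookeeper'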
7. Configure the Docker Swarm Environment
On the manager node, run:
[root@hadoop1 ~]# docker swarm init --advertise-addr 192.168.1.101
Swarm initialized: current node (swfdinosstcc5h9k1wkz1bp9l) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-1xlri07uvjsjscxalipmtcqrfzk6bh9rasrh1mnx0xt2trq20h-6h1szze1p8d7ag6in1ejxc6wi 192.168.1.101:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
On the other two nodes, run the generated join command to add them to the swarm:
docker swarm join --token SWMTKN-1-1xlri07uvjsjscxalipmtcqrfzk6bh9rasrh1mnx0xt2trq20h-6h1szze1p8d7ag6in1ejxc6wi 192.168.1.101:2377
After they have joined, verify:
[root@hadoop1 ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
swfdinosstcc5h9k1wkz1bp9l * hadoop1 Ready Active Leader 20.10.8
o3v4fekz682vl7whgzov9nybd hadoop2 Ready Active 20.10.8
1cptz9kcz4d65llkq8j7kwrnn hadoop3 Ready Active 20.10.8
8. Configure the Cluster Network
1) Create the internal overlay network for the hbase cluster:
# To pin a specific subnet: docker network create --driver overlay --attachable --subnet 10.20.0.0/24 hbase
docker network create -d overlay --attachable hbase
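A quick sanity check that the overlay network exists on the manager (worker nodes only list it once a container attached to it is scheduled there):
docker network ls --filter name=hbase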
2) Label the swarm worker nodes as data nodes (datanode):
# Two data nodes are configured here; adjust to your situation. The label is referenced in the docker-compose yml files below:
docker node update --label-add hadoop-datanode=datanode hadoop2
docker node update --label-add hadoop-datanode=datanode hadoop3
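To confirm the labels were applied, inspect the nodes (a sketch using the Go-template output of docker node inspect):
docker node inspect hadoop2 --format '{{ .Spec.Labels }}'
docker node inspect hadoop3 --format '{{ .Spec.Labels }}'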
9. Configure the Hadoop docker-compose File
Here the manager node runs one namenode, one resourcemanager and one historyserver, and each of the two worker nodes runs one datanode and one nodemanager.
Enter the directory:
cd /usr/local/hadoop-hbase/docker-hadoop-2.0.0-hadoop3.1.3-java8
# Rename the configuration file
mv docker-compose-v3.yml docker-compose-hadoop.yml
Edit docker-compose-hadoop.yml as follows (the SERVICE_PRECONDITION values are consumed by each image's entrypoint, which waits for every listed host:port to come up before starting the service):
version: '3'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 19870:9870
      - 19000:9000
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 9870
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 19864:9864
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    environment:
      SERVICE_PRECONDITION: "namenode:9870"
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.labels.hadoop-datanode == datanode
      labels:
        traefik.docker.network: hbase
        traefik.port: 9864
  resourcemanager:
    image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18088:8088
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864"
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 8088
    healthcheck:
      disable: true
  nodemanager:
    image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18042:8042
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
        max_attempts: 3
      placement:
        constraints:
          - node.labels.hadoop-datanode == datanode
      labels:
        traefik.docker.network: hbase
        traefik.port: 8042
  historyserver:
    image: bde2020/hadoop-historyserver:2.0.0-hadoop3.1.3-java8
    networks:
      - hbase
    ports:
      - 18188:8188
    volumes:
      - hadoophistoryserver:/hadoop/yarn/timeline
    environment:
      SERVICE_PRECONDITION: "namenode:9000 datanode:9864 nodemanager:8042 resourcemanager:8088"
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 8188
volumes:
  datanode:
  namenode:
  hadoophistoryserver:
networks:
  hbase:
    external:
      name: hbase
10. Deploy the Hadoop Cluster
docker stack deploy -c docker-compose-hadoop.yml hadoop
Once the stack starts successfully, the manager node shows the namenode, resourcemanager and historyserver instances, and each of the other two nodes shows a nodemanager and a datanode instance.
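You can check the state of the deployed services from the manager node:
# List the services in the stack and their replica counts
docker stack services hadoop
# Show which node each task was scheduled on
docker stack ps hadoop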
If the deployment fails, remove the stack, fix the issue, and redeploy:
docker stack rm hadoop
11. Access the Web UIs
After a successful deployment, the web UIs can be reached directly through the hosts' published ports.
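Given the port mappings above (published through the swarm routing mesh, so any node's IP should work), for example:
# NameNode UI
http://192.168.1.101:19870
# YARN ResourceManager UI
http://192.168.1.101:18088
# History/Timeline server UI
http://192.168.1.101:18188
# DataNode UI
http://192.168.1.101:19864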
II. Installing the Docker HBase 2.X Distributed Cluster
1. Build a Custom HBase Image
1) Create the Dockerfile:
Create and enter the build directory:
mkdir -p /usr/local/hadoop-hbase/docker-hbase-master/hbase_base
cd /usr/local/hadoop-hbase/docker-hbase-master/hbase_base
Dockerfile:
FROM debian:9
MAINTAINER Mirson <mirson.ho@gmail.com>
RUN echo > /etc/apt/sources.list
RUN echo "deb http://mirrors.aliyun.com/debian/ stretch main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch main non-free contrib \ndeb http://mirrors.aliyun.com/debian-security stretch/updates main \ndeb-src http://mirrors.aliyun.com/debian-security stretch/updates main \ndeb http://mirrors.aliyun.com/debian/ stretch-updates main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch-updates main non-free contrib \ndeb http://mirrors.aliyun.com/debian/ stretch-backports main non-free contrib \ndeb-src http://mirrors.aliyun.com/debian/ stretch-backports main non-free contrib" > /etc/apt/sources.list
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
openjdk-8-jdk \
net-tools \
curl \
netcat \
gnupg \
libtinfo5 \
vim \
&& rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
ENV HBASE_VERSION 2.3.5
ENV HBASE_URL http://archive.apache.org/dist/hbase/$HBASE_VERSION/hbase-$HBASE_VERSION-bin.tar.gz
RUN set -x \
&& curl -fSL "$HBASE_URL" -o /tmp/hbase.tar.gz \
&& curl -fSL "$HBASE_URL.asc" -o /tmp/hbase.tar.gz.asc \
&& tar -xvf /tmp/hbase.tar.gz -C /opt/ \
&& rm /tmp/hbase.tar.gz*
RUN ln -s /opt/hbase-$HBASE_VERSION/conf /etc/hbase
RUN mkdir /opt/hbase-$HBASE_VERSION/logs
COPY core-site.xml /opt/hbase-$HBASE_VERSION/conf
COPY hdfs-site.xml /opt/hbase-$HBASE_VERSION/conf
RUN mkdir /hadoop-data
ENV HBASE_PREFIX=/opt/hbase-$HBASE_VERSION
ENV HBASE_CONF_DIR=/etc/hbase
ENV USER=root
ENV PATH $HBASE_PREFIX/bin/:$PATH
ADD entrypoint.sh /entrypoint.sh
RUN chmod a+x /entrypoint.sh
EXPOSE 16000 16010 16020 16030
ENTRYPOINT ["/entrypoint.sh"]
Copy the Hadoop configuration files into the current directory (b1e6 here is the ID of a running Hadoop container, e.g. the namenode):
docker cp b1e6:/opt/hadoop-3.1.3/etc/hadoop/core-site.xml .
docker cp b1e6:/opt/hadoop-3.1.3/etc/hadoop/hdfs-site.xml .
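If you don't have the container ID at hand, look it up first; with a stack deployment the namenode container's name starts with hadoop_namenode:
docker ps --filter name=namenode --format '{{.ID}} {{.Names}}'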
2) Create the entrypoint.sh script, which manages HBase's configuration from environment variables:
#!/bin/bash

# Append a <property> element to the given *-site.xml file.
function addProperty() {
  local path=$1
  local name=$2
  local value=$3

  local entry="<property><name>$name</name><value>${value}</value></property>"
  local escapedEntry=$(echo $entry | sed 's/\//\\\//g')
  sed -i "/<\/configuration>/ s/.*/${escapedEntry}\n&/" $path
}

# Turn every ${envPrefix}_* environment variable into a property in
# /etc/hbase/<module>-site.xml (name mapping: ___ -> '-', __ -> '_', _ -> '.').
function configure() {
  local path=$1
  local module=$2
  local envPrefix=$3

  local var
  local value

  echo "Configuring $module"
  for c in `printenv | perl -sne 'print "$1 " if m/^${envPrefix}_(.+?)=.*/' -- -envPrefix=$envPrefix`; do
    name=`echo ${c} | perl -pe 's/___/-/g; s/__/_/g; s/_/./g'`
    var="${envPrefix}_${c}"
    value=${!var}
    echo " - Setting $name=$value"
    addProperty /etc/hbase/$module-site.xml $name "$value"
  done
}

configure /etc/hbase/hbase-site.xml hbase HBASE_CONF

# Block until service:port answers, retrying every $retry_seconds.
function wait_for_it()
{
  local serviceport=$1
  local service=${serviceport%%:*}
  local port=${serviceport#*:}
  local retry_seconds=5
  local max_try=100
  let i=1

  nc -z $service $port
  result=$?

  until [ $result -eq 0 ]; do
    echo "[$i/$max_try] check for ${service}:${port}..."
    echo "[$i/$max_try] ${service}:${port} is not available yet"
    if (( $i == $max_try )); then
      echo "[$i/$max_try] ${service}:${port} is still not available; giving up after ${max_try} tries. :/"
      exit 1
    fi

    echo "[$i/$max_try] try in ${retry_seconds}s once again ..."
    let "i++"
    sleep $retry_seconds

    nc -z $service $port
    result=$?
  done
  echo "[$i/$max_try] $service:${port} is available."
}

# SERVICE_PRECONDITION is a space-separated "host:port" list; leave the
# expansion unquoted so it word-splits into individual entries.
for i in ${SERVICE_PRECONDITION[@]}
do
  wait_for_it ${i}
done

exec "$@"
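As an illustration of the naming convention the configure function implements, an environment variable such as
HBASE_CONF_hbase_zookeeper_quorum=zoo1,zoo2,zoo3
is appended to /etc/hbase/hbase-site.xml as
<property><name>hbase.zookeeper.quorum</name><value>zoo1,zoo2,zoo3</value></property>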
3) Build the image
From the Dockerfile directory, run:
docker build -f ./Dockerfile -t bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8 .
Note the trailing dot "." (the build context). If downloading the HBase tarball is too slow, you can download it in advance, place it next to the Dockerfile, and replace the curl steps with ADD:
...
ENV HBASE_VERSION 2.3.5
ADD hbase-2.3.5-bin.tar.gz /opt/
RUN ln -s /opt/hbase-$HBASE_VERSION/conf /etc/hbase
...
Unlike COPY, ADD extracts the tarball automatically.
4) Distribute the generated image to the other nodes
# Export the image
docker save bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8 > hbase_image.tar
# Copy it to the other nodes
scp hbase_image.tar root@192.168.1.102:/root
scp hbase_image.tar root@192.168.1.103:/root
# Import the image (run on each of the other nodes)
docker load -i hbase_image.tar
2. Deploy ZooKeeper
A three-node ZooKeeper ensemble is set up here.
1) docker-compose-zookeeper.yml cluster configuration:
version: '3'
services:
  zoo1:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo1_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop1
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  zoo2:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo2_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop2
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
  zoo3:
    image: zookeeper:3.4.10
    networks:
      - hbase
    volumes:
      - zoo3_data:/data
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == hadoop3
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
volumes:
  zoo1_data:
  zoo2_data:
  zoo3_data:
networks:
  hbase:
    external:
      name: hbase
2) Deploy the ZooKeeper cluster:
docker stack deploy -c docker-compose-zookeeper.yml zookeeper
3) Check the containers
[root@hadoop1 docker-hbase-master]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fe84630dce2d zookeeper:3.4.10 "/docker-entrypoint.…" 48 seconds ago Up 47 seconds 2181/tcp, 2888/tcp, 3888/tcp zookeeper_zoo1.1.r98xzm9bklsdau1ydu4eug5d3
...
When the deployment succeeds, each node runs one zookeeper instance.
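To verify the ensemble is healthy, ask each instance for its role (a sketch; zkServer.sh is on the PATH in the zookeeper:3.4.10 image, and across the three nodes you should see one leader and two followers):
# Run on each node against its local zookeeper container
docker exec -it $(docker ps -q -f name=zookeeper_zoo) zkServer.sh status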
If it fails, remove the stack and redeploy:
docker stack rm zookeeper
3. Deploy traefik
This component is required: it handles name-based routing between the hbase nodes (the traefik.* labels in the compose files rely on it). Pin the image to the traefik:2.9.10 tag pulled earlier:
docker service create --name traefik --constraint node.hostname==hadoop1 --publish 18880:80 --publish 18080:8080 --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock --network hbase traefik:2.9.10 --api.insecure=true --providers.docker
Once it is up, docker service ls shows the service, and the dashboard (published on port 18080 above) is reachable at http://192.168.1.101:18080.
4. Configure the HBase docker-compose File
Here the manager node runs one HMaster, and each of the other two nodes runs a regionserver.
1) docker-compose-hbase.yml configuration:
version: '3.2'
services:
  HMaster:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16000
        published: 16000
        protocol: tcp
        mode: host
      - target: 16010
        published: 16010
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env
    command: /opt/hbase-2.3.5/bin/hbase master start
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop1
      labels:
        traefik.docker.network: hbase
        traefik.port: 16010
  RegionServer1:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16020
        published: 26020
        protocol: tcp
        mode: host
      - target: 16030
        published: 26030
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env
    command: /opt/hbase-2.3.5/bin/hbase regionserver start
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop2
    environment:
      HBASE_CONF_hbase_regionserver_hostname: RegionServer1
  RegionServer2:
    image: bde2020/hadoop-hmaster:2.0.0-hmaster2.3.5-java8
    networks:
      - hbase
    ports:
      - target: 16020
        published: 36020
        protocol: tcp
        mode: host
      - target: 16030
        published: 36030
        protocol: tcp
        mode: host
    env_file:
      - ./hbase.env
    command: /opt/hbase-2.3.5/bin/hbase regionserver start
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
      restart_policy:
        condition: none
      placement:
        constraints:
          - node.hostname == hadoop3
    environment:
      HBASE_CONF_hbase_regionserver_hostname: RegionServer2
networks:
  hbase:
    external:
      name: hbase
2) hbase.env configuration file, which manages the HBase settings:
HBASE_CONF_hbase_rootdir=hdfs://namenode:9000/hbase
HBASE_CONF_hbase_cluster_distributed=true
HBASE_CONF_hbase_zookeeper_quorum=zoo1,zoo2,zoo3
HBASE_CONF_hbase_master=HMaster:16000
HBASE_CONF_hbase_master_hostname=HMaster
HBASE_CONF_hbase_master_port=16000
HBASE_CONF_hbase_master_info_port=16010
HBASE_CONF_hbase_regionserver_port=16020
HBASE_CONF_hbase_regionserver_info_port=16030
HBASE_MANAGES_ZK=false
5. Deploy the HBase Cluster
docker stack deploy -c docker-compose-hbase.yml hbase
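Before opening the UI, confirm that the HMaster and both regionservers are running:
docker stack ps hbase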
Once the services are up, the HBase web UI (the HMaster info port published above) is available at http://192.168.1.101:16010.
6. Verify the HBase Environment
# Enter the HMaster container (6a5b here is the HMaster container's ID)
docker exec -it 6a5b bash
# Start the HBase shell
hbase shell
# Create a table with three column families
hbase(main):001:0> create 'mirson','country','address','email'
Created table mirson
Took 1.2737 seconds
# List tables
hbase(main):002:0> list
TABLE
mirson
1 row(s)
Took 0.0209 seconds
=> ["mirson"]
The management UI now shows the newly created table.
At this point, the Hadoop 3.X + HBase 2.X distributed cluster environment is complete.
7. FAQ
1) If the hbase cluster reports on startup: There are 2 datanode(s) running and 2 node(s) are excluded in this operation…
Copy Hadoop's core-site.xml and hdfs-site.xml into HBase's configuration directory (the Dockerfile above already does this with COPY).
2) If you see: port published with ingress mode can't be used with dnsrr mode
In the docker-compose file, switch the port mappings to host mode:
ports:
  - target: 3000
    published: 3000
    protocol: tcp
    mode: host