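Contents

1. Introduction to Docker

2. Installing Docker

2.1 Installing Docker on CentOS

2.2 Installing Docker on Ubuntu (recommended)

2.3 Installing Docker on macOS

2.4 Installing Docker on Windows (not recommended)

3. Preparing the Container

3.1 Start Docker

3.2 Pull the image

3.3 Create and start the container

3.4 Enter the container

4. Preparing the Environment

4.1 Install required software

4.2 Configure passwordless SSH login

4.3 Set the time zone

4.4 Disable the firewall

4.5 Time synchronization, static IP, and host mapping

4.6 Connect to the container from a shell

5. Installing MySQL

5.1 Upload and extract the installation package

5.2 Install required dependencies

5.3 Install the server and client

5.4 Start and configure MySQL

6. Installing the JDK

6.1 Upload and extract

6.2 Configure environment variables

6.3 Check the version

7. Installing Hadoop

7.1 Upload and extract

7.2 Edit the configuration

7.3 Add environment variables

7.4 Format HDFS

7.5 Start the Hadoop services

7.6 Check the web UIs

8. Installing Hive

8.1 Upload and extract

8.2 Edit the configuration

8.3 Add dependency jars

8.4 Add environment variables

8.5 Start the services

8.6 Check with jps

9. Installing Sqoop

9.1 Upload and extract

9.2 Edit sqoop-env.sh

9.3 Add dependency jars

9.4 Add environment variables

9.5 Check the version

10. Installing Flume

10.1 Upload and extract

10.2 Remove the conflicting dependency

10.3 Add environment variables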


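1. Introduction to Docker

Docker is an open-source application container engine, written in Go and released under the Apache 2.0 license.

Docker lets developers package an application and its dependencies into a lightweight, portable container that can then be deployed on any mainstream Linux machine. It can also be used for virtualization.

Containers are fully sandboxed and expose no interfaces to one another (much like iPhone apps). More importantly, their performance overhead is very low.

Since version 17.03, Docker has been split into CE (Community Edition) and EE (Enterprise Edition).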

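2. Installing Docker

Enable IP forwarding. If ip_forward=0, the Docker containers inside the virtual machine become unreachable after the virtual machine restarts.

echo 'net.ipv4.ip_forward=1' >> /usr/lib/sysctl.d/50-default.conf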

sysctl -p /usr/lib/sysctl.d/50-default.conf
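
To confirm that the setting took effect, the kernel's current value can be read back from /proc. The sketch below wraps that check in a small helper. The function name ip_forward_enabled and its optional path argument are illustrative and not part of Docker or sysctl.

```shell
# Return success (0) if IPv4 forwarding is enabled.
# $1 is an optional path to read; it defaults to the kernel's live value.
ip_forward_enabled() {
    local path="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$path" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
    echo "IPv4 forwarding is enabled"
else
    echo "IPv4 forwarding is disabled"
fi
```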

2.1 Installing Docker on CentOS

# The download is fairly large, so use a stable network connection
# --mirror Aliyun selects the Aliyun mirror
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

2.2 Installing Docker on Ubuntu (recommended)

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

2.3 Installing Docker on macOS

# Download the installer, then drag and drop to install
https://hub.docker.com/editions/community/docker-ce-desktop-mac/

2.4 Installing Docker on Windows (not recommended)

# Windows 10 Home (reference)
https://docs.docker.com/docker-for-windows/install-windows-home/

# Windows 10 Pro, Enterprise, or Education (reference)
https://docs.docker.com/docker-for-windows/install/

3. Preparing the Container

3.1 Start Docker

systemctl start docker

3.2 Pull the image

docker pull image_name:version

# Example
docker pull centos:7

List images

docker images

Remove an image

docker rmi centos:7

3.3 Create and start the container

docker run -itd --privileged --name singleNode -h singleNode \
-p 2222:22 \
-p 3306:3306 \
-p 8020:8020 \
-p 9870:9870 \
-p 19888:19888 \
-p 8088:8088 \
-p 9083:9083 \
-p 10000:10000 \
-p 2181:2181 \
-p 9092:9092 \
-p 8091:8091 \
-p 8080:8080 \
-p 16010:16010 \
-p 4000:4000 \
-p 3000:3000 \
centos:7 /usr/sbin/init

# Port mappings used above
2222:22 # SSH
3306:3306 # MySQL
8020:8020 # HDFS RPC
9870:9870 # HDFS web UI
19888:19888 # YARN job history
8088:8088 # YARN web UI
9083:9083 # Hive metastore
10000:10000 # HiveServer2
2181:2181 # ZooKeeper
9092:9092 # Kafka
8091:8091 # Flink
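
The long list of -p options can also be generated from a list of host:container pairs, which makes it easier to add or drop a service later. This is a minimal sketch: build_port_flags is a hypothetical helper name, and the final line only prints the resulting command rather than running it.

```shell
# Print "-p host:container" for every mapping given as an argument.
build_port_flags() {
    local mapping flags=""
    for mapping in "$@"; do
        flags="$flags -p $mapping"
    done
    # Strip the leading space before printing
    echo "${flags# }"
}

# A subset of the mappings used in the docker run command above
port_flags="$(build_port_flags 2222:22 3306:3306 9870:9870 8088:8088)"
echo "docker run -itd --privileged --name singleNode -h singleNode $port_flags centos:7 /usr/sbin/init"
```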

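List containers

docker ps -a
# -a shows all containers, including those that have exited

Enter a container

docker exec -it container_name|container_ID bash

Stop a container

docker stop container_name|container_ID
# Example
docker stop test

Remove a container

docker rm test

3.4 Enter the container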

docker exec -it singleNode /bin/bash

4. Preparing the Environment

4.1 Install required software

yum clean all    # clear the yum cache inside the container
yum -y install unzip bzip2-devel vim bashname
yum install kde-l10n-Chinese -y
yum install glibc-common -y
localedef -c -f UTF-8 -i zh_CN zh_CN.utf8
echo "export LANG=zh_CN.UTF-8" >> /etc/locale.conf

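4.2 Configure passwordless SSH login

# Change the root password
passwd root  # enter the password twice

# Install the required SSH packages
yum install -y openssh openssh-server openssh-clients openssl openssl-devel
# Generate a key pair
ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''

# Option 1:
# Append the public key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# Option 2:
# Start the SSH service, then copy the key with ssh-copy-id
systemctl start sshd
ssh-copy-id singleNode

# Verify that login no longer asks for a password
ssh singleNode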
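
If the login still prompts for a password, file permissions are a common cause: with sshd's default StrictModes setting, keys are ignored when ~/.ssh or authorized_keys is writable by other users. The sketch below tightens both. The helper name fix_ssh_permissions and its directory argument are illustrative.

```shell
# Restrict permissions on an SSH directory and its authorized_keys file.
# $1 is the SSH directory; it defaults to ~/.ssh.
fix_ssh_permissions() {
    local ssh_dir="${1:-$HOME/.ssh}"
    chmod 700 "$ssh_dir" || return 1
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}

# Example (not run here): fix_ssh_permissions ~/.ssh
```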

4.3 Set the time zone

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

4.4 Disable the firewall

yum -y install firewalld
systemctl stop firewalld
systemctl disable firewalld

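4.5 Time synchronization, static IP, and host mapping

  • Because this course builds the environment in Docker, a static IP and host mapping do not need to be configured
  • Because this course builds a single-node, pseudo-distributed cluster, time synchronization does not need to be configured
  • All three must be configured when building a multi-node, fully distributed cluster on physical machines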

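4.6 Connect to the container from a shell

Connect over SSH to the host machine: use the host's IP address and the host port mapped to the container's SSH port (2222 in the docker run command above).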
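
With the mapping from section 3.3 (host port 2222 to container port 22), the connection command takes the shape sketched below. The address 192.168.1.100 is a placeholder for the real host IP, and container_ssh_command is a hypothetical helper that only prints the command.

```shell
# Print the ssh command for reaching the container through the host.
# $1 = host IP, $2 = host port mapped to the container's port 22 (default 2222)
container_ssh_command() {
    local host_ip="$1" host_port="${2:-2222}"
    echo "ssh -p $host_port root@$host_ip"
}

# 192.168.1.100 is a placeholder; substitute the real host IP
container_ssh_command 192.168.1.100
```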

5. Installing MySQL

5.1 Upload and extract the installation package

cd /opt/software/
tar xvf MySQL-5.5.40-1.linux2.6.x86_64.rpm-bundle.tar -C /opt/install

5.2 Install required dependencies

yum -y install libaio perl

5.3 Install the server and client

cd /opt/install

rpm -ivh MySQL-server-5.5.40-1.linux2.6.x86_64.rpm
rpm -ivh MySQL-client-5.5.40-1.linux2.6.x86_64.rpm

5.4 Start and configure MySQL

Option 1 (recommended)

# Start the service
systemctl start mysql
# Set the MySQL root password
/usr/bin/mysqladmin -u root password 'root'
# Log in to MySQL and configure access privileges
mysql -uroot -proot
> update mysql.user set host='%' where host='localhost';
> delete from mysql.user where host<>'%' or user='';
> flush privileges;

Option 2

# Start the service
systemctl start mysql
# Run the MySQL initialization script
/usr/bin/mysql_secure_installation
# Press Enter once, then type the same new password twice to set it
# Remove anonymous users? [Y/n] answer n
# Disallow root login remotely? [Y/n] answer y
# Remove test database and access to it? [Y/n] answer n
# Reload privilege tables now? [Y/n] answer y

# Log in to MySQL and configure access privileges
mysql -uroot -proot
> update mysql.user set host='%' where host='localhost';
> delete from mysql.user where host<>'%' or user='';
> flush privileges;

6. Installing the JDK

6.1 Upload and extract

tar zxvf /opt/software/jdk-8u171-linux-x64.tar.gz -C /opt/install/

# Create a symbolic link
ln -s /opt/install/jdk1.8.0_171 /opt/install/java

6.2 Configure environment variables

The environment variables go in ~/.bashrc.

vim ~/.bashrc
-------------------------------------------
export JAVA_HOME=/opt/install/java
export PATH=$JAVA_HOME/bin:$PATH
-------------------------------------------
source ~/.bashrc
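
When these steps are scripted rather than typed into vim, appending each line only if it is missing keeps ~/.bashrc free of duplicates on repeated runs. This is a minimal sketch: append_line_once is a hypothetical helper, and the demo writes to a temporary file instead of the real ~/.bashrc.

```shell
# Append a line to a file unless an identical line is already present.
append_line_once() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file; pass ~/.bashrc instead to apply it for real
demo_rc="$(mktemp)"
append_line_once "$demo_rc" 'export JAVA_HOME=/opt/install/java'
append_line_once "$demo_rc" 'export PATH=$JAVA_HOME/bin:$PATH'
append_line_once "$demo_rc" 'export JAVA_HOME=/opt/install/java'  # no effect: already present
cat "$demo_rc"
```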

6.3 Check the version

java -version

7. Installing Hadoop

7.1 Upload and extract

tar zxvf hadoop-3.2.1.tar.gz -C /opt/install/
ln -s /opt/install/hadoop-3.2.1/ /opt/install/hadoop

7.2 Edit the configuration

# Go to the configuration directory
cd /opt/install/hadoop/etc/hadoop/

7.2.1 Configure core-site.xml

vim core-site.xml
-------------------------------------------
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://singleNode:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/install/hadoop/data</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>
-------------------------------------------

7.2.2 Configure hdfs-site.xml

vim hdfs-site.xml
-------------------------------------------
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>singleNode:9868</value>
    </property>
</configuration>
-------------------------------------------

7.2.3 Configure mapred-site.xml

vim mapred-site.xml
-------------------------------------------
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>singleNode:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>singleNode:19888</value>
  </property>
</configuration>
-------------------------------------------

7.2.4 Configure yarn-site.xml

vi yarn-site.xml
-------------------------------------------
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>singleNode</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>  
        <name>yarn.log.server.url</name>  
        <value>http://${yarn.timeline-service.webapp.address}/applicationhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.hostname</name>
        <value>${yarn.resourcemanager.hostname}</value>
    </property>
    <property>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
    </property>
</configuration>
-------------------------------------------

7.2.5 Configure hadoop-env.sh

vim hadoop-env.sh 
-------------------------------------------
export JAVA_HOME=/opt/install/java
-------------------------------------------

7.2.6 Configure mapred-env.sh

vim mapred-env.sh 
-------------------------------------------
export JAVA_HOME=/opt/install/java
-------------------------------------------

7.2.7 Configure yarn-env.sh

vim yarn-env.sh
-------------------------------------------
export JAVA_HOME=/opt/install/java
-------------------------------------------

7.2.8 Configure workers

vi workers
-------------------------------------------
singleNode
-------------------------------------------
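
After editing the XML files in 7.2.1 to 7.2.4, it can be handy to read a single property back and check it for typos. The sketch below does this with awk. get_hadoop_property is a hypothetical helper, and it assumes each <name> and <value> sits on its own line, as in the files above.

```shell
# Print the <value> of a named property from a Hadoop-style XML config file.
# Assumes <name> and <value> each sit on their own line.
get_hadoop_property() {
    local file="$1" key="$2"
    awk -v key="$key" '
        /<name>/ {
            name = $0
            gsub(/.*<name>|<\/name>.*/, "", name)
        }
        /<value>/ && name == key {
            value = $0
            gsub(/.*<value>|<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$file"
}

# Demo against a small sample file
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://singleNode:8020</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
get_hadoop_property "$sample" fs.defaultFS
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value that Hadoop itself resolves, which is the more authoritative check.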

7.3 Add environment variables

vim ~/.bashrc
------------------------------------------------
export HADOOP_HOME=/opt/install/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
------------------------------------------------
source ~/.bashrc

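# Add the lines below to both of the following scripts (in $HADOOP_HOME/sbin)
# Important: put them near the top of each script, not at the end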
start-dfs.sh
stop-dfs.sh
------------------------------------------------
HDFS_NAMENODE_USER=root 
HDFS_DATANODE_USER=root 
HDFS_SECONDARYNAMENODE_USER=root 
YARN_RESOURCEMANAGER_USER=root 
YARN_NODEMANAGER_USER=root
------------------------------------------------

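# Add the lines below to both of the following scripts (in $HADOOP_HOME/sbin)
# Important: put them near the top of each script, not at the end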
start-yarn.sh
stop-yarn.sh
------------------------------------------------
YARN_RESOURCEMANAGER_USER=root 
HADOOP_SECURE_DN_USER=yarn 
YARN_NODEMANAGER_USER=root
------------------------------------------------
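
Because the variables have to appear near the top of each script, one way to script the edit is to insert them directly after the first (shebang) line. This is a sketch under that assumption. insert_after_first_line is a hypothetical helper, and it is worth backing up the sbin scripts before running anything like it.

```shell
# Insert the given lines right after the first line of a file.
# $1 = file to edit; remaining arguments = lines to insert.
insert_after_first_line() {
    local file="$1" tmp
    shift
    tmp="$(mktemp)" || return 1
    {
        head -n 1 "$file"
        printf '%s\n' "$@"
        tail -n +2 "$file"
    } > "$tmp"
    # Overwrite in place so the script keeps its original permissions
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (not run here):
# insert_after_first_line "$HADOOP_HOME/sbin/start-dfs.sh" \
#     HDFS_NAMENODE_USER=root HDFS_DATANODE_USER=root HDFS_SECONDARYNAMENODE_USER=root \
#     YARN_RESOURCEMANAGER_USER=root YARN_NODEMANAGER_USER=root
```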

7.4 Format HDFS

hdfs namenode -format

7.5 Start the Hadoop services

# Start HDFS
start-dfs.sh
# Start YARN
start-yarn.sh
# Start the job history server
mapred --daemon start historyserver

7.6 Check the web UIs

Open port 9870 on the host in a browser to see the HDFS web UI.

Open port 8088 on the host in a browser to see the YARN web UI.
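
Before opening a browser, it can help to confirm from inside the container that the two web ports are actually listening. This is a minimal sketch using bash's /dev/tcp redirection. port_open is a hypothetical helper name, and the check requires bash rather than plain sh.

```shell
# Return success (0) if a TCP connection to host:port can be opened.
# Relies on bash's built-in /dev/tcp redirection.
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

for port in 9870 8088; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port is listening"
    else
        echo "port $port is not reachable yet"
    fi
done
```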

8. Installing Hive

8.1 Upload and extract

tar zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/install/
ln -s /opt/install/apache-hive-3.1.2-bin/ /opt/install/hive

8.2 Edit the configuration

# Go to the configuration directory
cd /opt/install/hive/conf/

8.2.1 Edit hive-site.xml

cp hive-default.xml.template hive-site.xml
vi hive-site.xml
# Note: delete the original default content first (in vi, dG deletes from the current line to the last line)
-------------------------------------------
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://singleNode:3306/metastore?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://singleNode:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>singleNode</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
-------------------------------------------
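
Since hive-site.xml is XML, a literal & inside a value has to be written as the entity &amp;, which matters for JDBC URLs that carry several query parameters. The sketch below shows one way to escape a value before pasting it into the file. xml_escape is a hypothetical helper.

```shell
# Escape the characters that are special in XML text: & first, then < and >.
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

xml_escape 'jdbc:mysql://singleNode:3306/metastore?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8'
```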

8.2.2 Edit hive-env.sh

cp hive-env.sh.template hive-env.sh
vi hive-env.sh
-------------------------------------------
HADOOP_HOME=/opt/install/hadoop
-------------------------------------------

8.3 Add dependency jars

cp /opt/software/mysql-connector-java-5.1.31.jar /opt/install/hive/lib/

8.4 Add environment variables

vi ~/.bashrc
------------------------------------------------
export HIVE_HOME=/opt/install/hive
export PATH=$HIVE_HOME/bin:$PATH
------------------------------------------------
source ~/.bashrc

8.5 Start the services

# Initialize the metastore schema
schematool -dbType mysql -initSchema
# Start the metastore service
nohup hive --service metastore &
# Start the HiveServer2 service
nohup hive --service hiveserver2 &

# If you see: Exception in thread "main" java.lang.NoSuchMethodError
# This is a jar conflict: remove the older guava jar and copy in the version shipped with Hadoop
rm -rf /opt/install/hive/lib/guava-19.0.jar
cp /opt/install/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /opt/install/hive/lib/
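
The conflict arises because Hive and Hadoop ship different guava versions. Listing the guava jars in each lib directory makes a mismatch easy to spot. The sketch below does that for any directories passed to it. list_guava_jars is a hypothetical helper.

```shell
# Print the guava jar file names found in each given directory.
list_guava_jars() {
    local dir jar
    for dir in "$@"; do
        for jar in "$dir"/guava-*.jar; do
            # Skip the unexpanded pattern when a directory has no guava jar
            [ -e "$jar" ] || continue
            echo "$dir: $(basename "$jar")"
        done
    done
}

list_guava_jars /opt/install/hive/lib /opt/install/hadoop/share/hadoop/common/lib
```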

8.6 Check with jps

jps
# The metastore and HiveServer2 processes are listed as RunJar
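
Scanning jps output by eye gets tedious once several services are running. The sketch below reads jps output on stdin and prints any expected process name that is absent. missing_daemons is a hypothetical helper, and the PIDs in the demo are made up.

```shell
# Read jps output on stdin and print every expected process name that is missing.
# Returns success (0) only when nothing is missing.
missing_daemons() {
    local output name status=0
    output="$(cat)"
    for name in "$@"; do
        if ! printf '%s\n' "$output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Demo with captured jps output (the PIDs are made up)
printf '%s\n' '101 NameNode' '202 DataNode' '303 ResourceManager' |
    missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager ||
    echo "some daemons are missing"
```

Against the live container, the call would be jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer RunJar.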

9. Installing Sqoop

9.1 Upload and extract

tar zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/install/
ln -s /opt/install/sqoop-1.4.7.bin__hadoop-2.6.0/ /opt/install/sqoop

9.2 Edit sqoop-env.sh

cd /opt/install/sqoop/conf/
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
-------------------------------------------
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/install/hadoop

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/install/hadoop

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/install/hive
-------------------------------------------

9.3 Add dependency jars

cp /opt/software/mysql-connector-java-5.1.31.jar /opt/install/sqoop/lib/
cp /opt/software/commons-lang-2.6.jar /opt/install/sqoop/lib/
cp /opt/software/java-json.jar /opt/install/sqoop/lib/

9.4 Add environment variables

vi ~/.bashrc
------------------------------------------------
export SQOOP_HOME=/opt/install/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
------------------------------------------------
source ~/.bashrc

9.5 Check the version

sqoop version

10. Installing Flume

10.1 Upload and extract

tar zxvf /opt/software/apache-flume-1.9.0-bin.tar.gz -C /opt/install/
cd /opt/install/
ln -s apache-flume-1.9.0-bin/ flume

10.2 Remove the conflicting dependency

cd flume/
rm -rf lib/guava-11.0.2.jar

10.3 Add environment variables

vi ~/.bashrc
---------------------
export FLUME_HOME=/opt/install/flume
export PATH=$FLUME_HOME/bin:$PATH
---------------------
source ~/.bashrc