I recently set out to learn Hive, only to find that getting the environment up took a surprising amount of time, even though most people never need to build the environment themselves at work. So I am sharing the virtual machine I have already set up (Java & Hadoop & MySQL & Hive) for anyone who wants to use it directly, along with my notes on the setup process below.
- System download --> Baidu Netdisk
Because of the netdisk's limits, the files are uploaded as a split (multi-volume) archive.
The OVF directory contains the exported virtual-machine files and requires reconfiguring the network adapter; the VirtualBox_VMs and Virtual_Machines directories contain the complete working directories of the Linux virtual machine created in VirtualBox and VMware Workstation respectively, and should not need any network configuration. All passwords inside the system are Hadoop. Hadoop runs in pseudo-distributed mode, so the archive contains only one virtual machine. The environment was built in VMware Workstation 16; in theory the VM can be loaded by both VMware Workstation and Oracle VM VirtualBox.
- Environment details --> Ubuntu 20.04
| Software | Version | Software | Version |
| --- | --- | --- | --- |
| Java | 1.8 | Hadoop | 2.7.1 |
| MySQL | 8.0 | Hive | 2.3.8 |
- Setup process
Table of contents
- Garbled arrow keys
- Install JDK 1.8
- Download & install
- Configure the environment
- Test the Java environment
- Hadoop
- Create a Hadoop user (optional)
- Configure passwordless SSH login
- Download & install Hadoop
- Configure Hadoop
- 1. Configure hadoop-env.sh
- 2. Configure core-site.xml
- 3. Configure hdfs-site.xml
- 4. Configure mapred-site.xml
- 5. Configure yarn-site.xml
- Start Hadoop
- Hive
- Install Hive 2.3.8
- Start Hive
- Initialize the default Derby database (skip this step if you use MySQL)
- Connect to a MySQL (8.0) database
Garbled arrow keys
sudo gedit /etc/vim/vimrc.tiny
- Add the following settings
set nocompatible
set backspace=2
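If you prefer a per-user fix instead of editing the system-wide file, the same two settings can be appended to your own ~/.vimrc (an assumption here is that your vim build reads ~/.vimrc, which the stock Ubuntu packages do):
echo "set nocompatible" >> ~/.vimrc
echo "set backspace=2" >> ~/.vimrc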
Install JDK 1.8
Download & install
- Download, as the root user (# prompt), in the directory /opt:
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz
- Extract, as the root user (# prompt), in the directory /opt:
tar -zxvf jdk-8u131-linux-x64.tar.gz # extract
mv jdk-8u131-linux-x64/ jdk # rename
Configure the environment
- Edit profile, as the root user (# prompt):
vi /etc/profile
- Add the following settings
export JAVA_HOME=/opt/jdk
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
- Make the environment variables take effect, as the root user (# prompt)
source /etc/profile
Test the Java environment
java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
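As an extra sanity check (not part of the original steps), confirm that the variables from /etc/profile point at the new JDK:
echo $JAVA_HOME # should print /opt/jdk
$JAVA_HOME/bin/javac -version # should report javac 1.8.0_131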
Hadoop
Create a Hadoop user (optional)
The later steps (passwordless login, granting permissions, and so on) all use the user name hadoop as the example, so it is mentioned here; if you use a different user name, remember to substitute it wherever permissions are granted below.
sudo useradd -m hadoop -s /bin/bash # create a user named hadoop
sudo passwd hadoop # set the hadoop user's password
sudo adduser hadoop sudo # grant the hadoop user administrator (sudo) privileges
sudo chown -R hadoop /opt # give the hadoop user read/write access to /opt
Configure passwordless SSH login
- Install the ssh service
sudo apt-get install openssh-server
- Test logging in from localhost, as the hadoop user ($ prompt); at this point a password should still be required
ssh localhost
exit
logout
Connection to localhost closed.
- Create a key pair, as the hadoop user ($ prompt), in the home directory ~ (a note on file permissions follows the commands)
cd ~/.ssh/
ssh-keygen -t rsa
cat ./id_rsa.pub >> ./authorized_keys
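If the next login test still asks for a password, a common cause is overly permissive file modes on the key files; tightening them as below (standard OpenSSH requirements, not part of the original notes) usually fixes it:
chmod 700 ~/.ssh # ssh ignores the directory if it is group/world writable
chmod 600 ~/.ssh/authorized_keys # the key list must be writable only by its owner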
- Test logging in from localhost again, as the hadoop user ($ prompt); this time you should get in without entering a password
ssh localhost
exit
Download & install Hadoop
- Download, as the root user (# prompt), in the directory /opt (other versions are available from the same Apache archive):
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -zxvf hadoop-2.7.1.tar.gz # extract
mv hadoop-2.7.1/ hadoop # rename
rm -f hadoop-2.7.1.tar.gz # delete the downloaded archive
chown -R hadoop ./hadoop # change the directory ownership
- Create directories, as the hadoop user ($ prompt)
mkdir /opt/hadoop/tmp # create the directory
mkdir /opt/hadoop/hdfs
mkdir /opt/hadoop/hdfs/data
mkdir /opt/hadoop/hdfs/name
- If the directories were created as the root user, grant the hadoop user read/write access
chown -R hadoop /opt/hadoop
- Set environment variables, as the hadoop user ($ prompt) (these variables only take effect for the hadoop user)
vi ~/.bash_profile
- Add the following configuration
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
- Make the environment variables take effect, as the hadoop user ($ prompt)
source ~/.bash_profile
- Test whether the environment variables took effect
hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /opt/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
Configure Hadoop
All of the following can be run as the hadoop user ($ prompt)
1. Configure hadoop-env.sh
vi /opt/hadoop/etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME}
to the absolute path of the JDK: export JAVA_HOME=/opt/jdk
You can leave this unset, but starting the NameNode then sometimes fails with
Error: JAVA_HOME is not set and could not be found.
so it is better to set it here.
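If you prefer to make this change non-interactively, a one-line sed does the same edit (a sketch, assuming the default export JAVA_HOME=${JAVA_HOME} line is still present in the file):
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk|' /opt/hadoop/etc/hadoop/hadoop-env.sh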
2. Configure core-site.xml
vi /opt/hadoop/etc/hadoop/core-site.xml
Add the following content
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>HDFS URI: filesystem://namenode-host:port</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
<description>Local Hadoop temporary directory on the namenode</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
3. Configure hdfs-site.xml
vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the following content
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/opt/hadoop/hdfs/name</value>
<description>Where the namenode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hdfs/data</value>
<description>Physical storage location of data blocks on the datanode</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Replication factor; the default is 3, and it should not exceed the number of datanodes</description>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
4. Configure mapred-site.xml
vi /opt/hadoop/etc/hadoop/mapred-site.xml
Add the following content
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. Configure yarn-site.xml
vi /opt/hadoop/etc/hadoop/yarn-site.xml
Add the following content
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start Hadoop
- Format the NameNode
/opt/hadoop/bin/hdfs namenode -format
21/06/05 07:41:01 INFO common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
21/06/05 07:41:01 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/06/05 07:41:01 INFO util.ExitUtil: Exiting with status 0
21/06/05 07:41:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
- Start HDFS (NameNode, DataNode, and SecondaryNameNode)
/opt/hadoop/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop/logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-hadoop-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/logs/hadoop-hadoop-secondarynamenode-ubuntu.out
- Start YARN
/opt/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/yarn-hadoop-resourcemanager-ubuntu.out
localhost: starting nodemanager, logging to /opt/hadoop/logs/yarn-hadoop-nodemanager-ubuntu.out
- Check that everything started correctly
jps
49121 NodeManager
49329 Jps
48546 DataNode
48995 ResourceManager
48730 SecondaryNameNode
48395 NameNode
- In a browser, http://localhost:50070 shows information about the NameNode, DataNodes, and HDFS.
- In a browser, http://localhost:8088 shows the status of running jobs (a quick command-line check is sketched below).
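If the VM has no desktop browser, a rough command-line check of the same two web UIs (an optional extra, not in the original steps) is:
curl -s http://localhost:50070 | head -n 5 # NameNode web UI, should return HTML
curl -s http://localhost:8088/cluster | head -n 5 # ResourceManager UI (port 8088 redirects to /cluster)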
Hive
Install Hive 2.3.8
- Download links for other versions
- Download & install, as the root user (# prompt), in the directory /opt:
wget http://archive.apache.org/dist/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
tar -zxvf apache-hive-2.3.8-bin.tar.gz # extract
mv apache-hive-2.3.8-bin/ hive # rename
rm -f apache-hive-2.3.8-bin.tar.gz # delete the downloaded tarball
chown -R hadoop /opt/hive # give the hadoop user read/write access
- Configure; the hadoop user is sufficient ($ prompt)
mv /opt/hive/conf/hive-env.sh.template /opt/hive/conf/hive-env.sh
vi /opt/hive/conf/hive-env.sh
- Append the following two lines, i.e. the Hadoop path and the Hive configuration directory
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
Start Hadoop (do not forget to start Hadoop first)
/opt/hadoop/sbin/start-dfs.sh
/opt/hadoop/sbin/start-yarn.sh
- Create the required HDFS directories and grant the relevant permissions (this step must be run after Hadoop has started); a quick check follows the commands below
/opt/hadoop/bin/hadoop fs -mkdir /tmp
/opt/hadoop/bin/hadoop fs -mkdir -p /user/hive/warehouse
/opt/hadoop/bin/hadoop fs -chmod g+w /tmp
/opt/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
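A quick way to confirm that the directories exist with the expected permissions (an optional check, not in the original steps):
/opt/hadoop/bin/hadoop fs -ls / # should list /tmp and /user
/opt/hadoop/bin/hadoop fs -ls /user/hive # should show the warehouse directory with group write permission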
Start Hive
Initialize the default Derby database (skip this step if you use MySQL)
/opt/hive/bin/schematool -initSchema -dbType derby
/opt/hive/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
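To confirm that Hive can actually create objects on HDFS, a minimal smoke test at the hive> prompt looks like this (the table name smoke_test is just an example, not from the original notes):
show databases;
create table smoke_test (id int);
show tables;
drop table smoke_test;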
Connect to a MySQL (8.0) database
- Install MySQL (a quick sanity check is sketched after the install steps)
wget https://dev.mysql.com/get/mysql-apt-config_0.8.17-1_all.deb
sudo dpkg -i mysql-apt-config_0.8.17-1_all.deb
In the package-configuration dialog, choose MySQL Server & Cluster (Currently selected: mysql-8.0); leave the other options at OK.
sudo apt update
sudo apt install mysql-server
When asked for the default authentication plugin, select: Use Legacy Authentication Method (Retain MySQL 5.x ...
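Before wiring Hive to MySQL, it is worth checking that the server is running and that the root password you plan to put into hive-site.xml works (an optional check, not part of the original steps):
sudo systemctl status mysql --no-pager # should show active (running)
mysql -u root -p -e "SELECT VERSION();" # enter the root password when prompted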
- Configure the Metastore to use MySQL
vi /opt/hive/conf/hive-site.xml
Add the following content; note that every & inside the JDBC URL must be written as &amp; in the XML file (an alternative to connecting as root is sketched after the configuration)
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false&amp;serverTimezone=GMT&amp;createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
<description>password to use against metastore database</description>
</property>
</configuration>
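The configuration above connects as MySQL's root user with the password hadoop. If you would rather use a dedicated metastore account, a sketch is below (the user name hive and the password hive_password are only examples; if you use this, change ConnectionUserName and ConnectionPassword accordingly):
mysql -u root -p -e "CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive_password';"
mysql -u root -p -e "GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost'; FLUSH PRIVILEGES;"
Because the JDBC URL sets createDatabaseIfNotExist=true, the account needs enough privileges to create the hive database, which the grant above provides.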
- Install the MySQL JDBC driver, in the /opt/hive directory
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.11.tar.gz
tar -zxvf mysql-connector-java-8.0.11.tar.gz
mv /opt/hive/mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar /opt/hive/lib/mysql-connector-java-8.0.11.jar
rm -f /opt/hive/mysql-connector-java-8.0.11.tar.gz
rm -rf /opt/hive/mysql-connector-java-8.0.11
- Initialize the metastore schema, in the /opt/hive/bin directory (a verification query is sketched after the output)
./schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=GMT&createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
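To double-check that the schema really landed in MySQL (an optional verification), list the metastore tables:
mysql -u root -p -e "USE hive; SHOW TABLES;" | head # should list tables such as DBS, TBLS, COLUMNS_V2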
- Set environment variables, as the hadoop user ($ prompt) (these variables only take effect for the hadoop user)
vi ~/.bash_profile
- Add the following configuration
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HIVE_HOME/bin
- Make the environment variables take effect, as the hadoop user ($ prompt)
source ~/.bash_profile
- Start Hive
hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
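As a final end-to-end check, create a table at the hive> prompt and confirm it shows up in the MySQL metastore (the table name test_mysql_metastore is just an example):
create table test_mysql_metastore (id int, name string);
show tables;
Then, from another terminal, mysql -u root -p -e "SELECT TBL_NAME FROM hive.TBLS;" should include the new table.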