Contents
- Notes
- Integrating Hadoop and Hive
- Setting up Hive
- 1. Install MySQL
- 2. Install Hive
- 3. Start Hive
- 4. Connect to Hive from DataGrip
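Notes
- Hive is a data warehouse built on Hadoop. Whichever way the Hive Metastore is configured, the servers' base environment must be working and the Hadoop cluster must be healthy and available first.
- Server base environment
- Cluster time synchronization, firewall disabled, hostname mappings in the hosts file, passwordless SSH login, JDK installed
- A healthy, available Hadoop cluster
- Start the Hadoop cluster before starting Hive
- In particular, wait until HDFS has left safe mode before starting Hive
- Hive itself is not installed or run as a distributed system. Its distributed capabilities, both storage and computation, come mainly from Hadoop.
- Metastore service configuration mode
- This guide configures remote mode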
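The safe-mode check in the notes above can be scripted. The following is a minimal sketch, assuming the `hdfs` command is on the PATH and that `hdfs dfsadmin -safemode get` prints a line containing "Safe mode is OFF" once safe mode has ended. The function names and the default retry values are illustrative, not part of Hadoop.

```shell
# Return 0 when HDFS reports that safe mode is off, non-zero otherwise.
# Relies on the "Safe mode is OFF" text printed by `hdfs dfsadmin -safemode get`.
hdfs_safemode_is_off() {
    hdfs dfsadmin -safemode get 2>/dev/null | grep -q "Safe mode is OFF"
}

# Poll until safe mode is off, giving up after a number of attempts.
# $1: maximum attempts (illustrative default 30)
# $2: seconds to sleep between attempts (illustrative default 5)
wait_for_hdfs() {
    attempts=${1:-30}
    interval=${2:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if hdfs_safemode_is_off; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

A possible use is `wait_for_hdfs && echo "HDFS is ready, Hive can be started"`. Hadoop also offers `hdfs dfsadmin -safemode wait`, which blocks until safe mode ends; the polling version here simply adds an upper bound on the wait.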
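Integrating Hadoop and Hive
- Hive stores its data on HDFS and uses MapReduce as the execution engine to process it
- Hadoop therefore needs some extra configuration properties so that Hive can run on top of it
- Edit core-site.xml in Hadoop, sync the file to every node in the Hadoop cluster, and restart the cluster for the change to take effect
<!-- Proxy-user settings for Hive integration -->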
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
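After syncing core-site.xml, it can be useful to confirm on each node that the two proxy-user keys are actually visible to Hadoop. The sketch below uses `hdfs getconf -confKey`, which reads the configuration files on the node where it runs, so it checks the local files rather than a running daemon. The function names are illustrative.

```shell
# Print the effective value of one Hadoop configuration key (empty if unset).
hadoop_conf_value() {
    hdfs getconf -confKey "$1" 2>/dev/null
}

# Check that both proxy-user keys are set for the given user (default: root).
# Prints key=value for each key, or "missing: <key>" and returns 1.
check_proxyuser() {
    user=${1:-root}
    for suffix in hosts groups; do
        key="hadoop.proxyuser.${user}.${suffix}"
        value=$(hadoop_conf_value "$key") || value=""
        if [ -z "$value" ]; then
            echo "missing: $key"
            return 1
        fi
        echo "$key=$value"
    done
}
```

Running `check_proxyuser root` on node1 and on every other node after the sync shows whether any node still has the old file.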
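Setting up Hive
1. Install MySQL
MySQL only needs to be installed on one machine, and it must allow remote access.
I chose to install MySQL on node1.
- Remove the mariadb packages that ship with CentOS 7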
[root@node1 ~]# rpm -qa|grep mariadb
mariadb-libs-5.5.64-1.el7.x86_64
You have new mail in /var/spool/mail/root
[root@node1 ~]# rpm -e --nodeps mariadb-libs-5.5.64-1.el7.x86_64
[root@node1 ~]# rpm -qa|grep mariadb
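rpm -qa | grep xxx
| is a pipe: it feeds the output of the command on its left to the command on its right. grep is a text search tool.
In other words: search the package list returned by rpm -qa for xxx and print only the matching lines
rpm -e --nodeps <rpm packagename>
Removes the package without checking its dependencies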
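The two commands explained above can be combined so that every matching package is removed in one go. This is a minimal sketch; the function name `remove_rpms_matching` is made up for illustration, and because `--nodeps` skips dependency checks it should only be pointed at packages you really intend to drop.

```shell
# Remove every installed RPM whose name matches a pattern (case-insensitive),
# without checking dependencies.
remove_rpms_matching() {
    pattern=$1
    # grep exits non-zero when nothing matches; treat that as "nothing to do".
    packages=$(rpm -qa | grep -i -- "$pattern") || return 0
    for pkg in $packages; do
        echo "removing $pkg"
        rpm -e --nodeps "$pkg" || return 1
    done
}
```

For the case in this guide the call would be `remove_rpms_matching mariadb`, followed by `rpm -qa | grep mariadb` to confirm that nothing is left.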
- Install MySQL
[root@node1 ~]# mkdir /export/software/mysql
# Upload mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar to the directory above, then extract it
[root@node1 ~]# cd /export/software/mysql
[root@node1 mysql]# ls
mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar
[root@node1 mysql]# tar -xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar
[root@node1 mysql]# yum -y install libaio
[root@node1 mysql]# rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm
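- Initialize MySQL
# Initialize the data directory
[root@node1 mysql]# mysqld --initialize
# Change the owner and group of the data directory
[root@node1 mysql]# chown mysql:mysql /var/lib/mysql -R
# Start MySQL
[root@node1 mysql]# systemctl start mysqld.service
# Look up the generated temporary root password
[root@node1 mysql]# cat /var/log/mysqld.log
# Look for a line like the one below; here >kl-lOa!i6FB is the generated temporary root password
[Note] A temporary password is generated for root@localhost: >kl-lOa!i6FB
chown mysql:mysql /var/lib/mysql -R
Recursively changes the owner and group of /var/lib/mysql and everything under it to the mysql user and the mysql group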
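Instead of reading through the whole log with cat, the temporary password can be pulled out directly. The sketch below assumes the log line has the shape shown above, with the password as the last whitespace-separated field and no spaces inside the password itself. The function name is illustrative.

```shell
# Print the most recent temporary root password recorded in a MySQL error log.
# $1: path to the log file (defaults to /var/log/mysqld.log)
mysql_temp_password() {
    log_file=${1:-/var/log/mysqld.log}
    # Keep only the last matching line, then print its final field.
    grep 'temporary password' "$log_file" | tail -n 1 | awk '{print $NF}'
}
```

Calling `mysql_temp_password` with no argument reads the default log location used in this guide.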
- Change the root password, grant remote access, and enable start on boot
[root@node1 mysql]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.29
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
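# Change the root password to hadoop
# user() returns the current user
mysql> alter user user() identified by 'hadoop';
# Grant remote access
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
# Reload the privilege tables
mysql> FLUSH PRIVILEGES;
# Ctrl+D exits the MySQL client
# Stopping, checking the status of, and starting MySQL (these commands are worth memorizing)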
[root@node1 mysql]# systemctl stop mysqld
[root@node1 mysql]# systemctl status mysqld
[root@node1 mysql]# systemctl start mysqld
# Enable MySQL to start on boot
[root@node1 mysql]# systemctl enable mysqld
# Check that it worked; a line showing enabled is what you want
[root@node1 mysql]# systemctl list-unit-files | grep mysqld
mysqld.service enabled
- Addendum: cleanly uninstalling MySQL 5.7 on CentOS 7
# Stop the MySQL service
[root@node1 ~]# systemctl stop mysqld.service
# Find the installed MySQL rpm packages
[root@node1 ~]# rpm -qa | grep -i mysql
mysql-community-libs-5.7.29-1.el7.x86_64
mysql-community-common-5.7.29-1.el7.x86_64
mysql-community-client-5.7.29-1.el7.x86_64
mysql-community-server-5.7.29-1.el7.x86_64
# Uninstall them
[root@node1 ~]# yum remove mysql-community-libs-5.7.29-1.el7.x86_64 mysql-community-common-5.7.29-1.el7.x86_64 mysql-community-client-5.7.29-1.el7.x86_64 mysql-community-server-5.7.29-1.el7.x86_64
# Check that nothing is left
[root@node1 ~]# rpm -qa | grep -i mysql
# Find MySQL-related directories and delete them
[root@node1 ~]# find / -name mysql
/var/lib/mysql
/var/lib/mysql/mysql
/usr/share/mysql
[root@node1 ~]# rm -rf /var/lib/mysql
[root@node1 ~]# rm -rf /var/lib/mysql/mysql
[root@node1 ~]# rm -rf /usr/share/mysql
# Delete the default configuration file and the log
[root@node1 ~]# rm -rf /etc/my.cnf
[root@node1 ~]# rm -rf /var/log/mysqld.log
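2. Install Hive
Installing it on node1 is enough: although Hive is not itself distributed software, it gains distributed capabilities from Hadoop and other distributed compute engines.
- Upload and extract the installation package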
[root@node1 mysql]# cd ../../server/
[root@node1 server]# pwd
/export/server
[root@node1 server]# ls
apache-hive-3.1.2-bin.tar.gz hadoop-3.3.0 jdk1.8.0_241
[root@node1 server]# tar -zxvf apache-hive-3.1.2-bin.tar.gz
- Resolve the guava version mismatch between Hive and Hadoop
[root@node1 server]# cd apache-hive-3.1.2-bin/
[root@node1 apache-hive-3.1.2-bin]# rm -rf lib/guava-19.0.jar
[root@node1 apache-hive-3.1.2-bin]# cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
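To confirm that the swap worked, the guava jars on both sides can be compared by file name. This is a minimal sketch that only looks at jar names in the two lib directories; the function names are illustrative.

```shell
# Print the file names of all guava jars found in a directory, one per line.
list_guava_jars() {
    for jar in "$1"/guava-*.jar; do
        # When the glob matches nothing it stays literal, so check existence.
        [ -e "$jar" ] && basename "$jar"
    done
    return 0
}

# Succeed only when the first directory has at least one guava jar and both
# directories hold exactly the same guava jar name(s).
guava_versions_match() {
    [ -n "$(list_guava_jars "$1")" ] &&
        [ "$(list_guava_jars "$1")" = "$(list_guava_jars "$2")" ]
}
```

With the paths from this guide, the check would be `guava_versions_match /export/server/apache-hive-3.1.2-bin/lib "$HADOOP_HOME/share/hadoop/common/lib"`.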
- Edit Hive's configuration files
- hive-env.sh
[root@node1 apache-hive-3.1.2-bin]# cd conf/
[root@node1 conf]# mv hive-env.sh.template hive-env.sh
[root@node1 conf]# vim hive-env.sh
export HADOOP_HOME=/export/server/hadoop-3.3.0
export HIVE_CONF_DIR=/export/server/apache-hive-3.1.2-bin/conf
export HIVE_AUX_JARS_PATH=/export/server/apache-hive-3.1.2-bin/lib
- hive-site.xml (new file)
<configuration>
<!-- MySQL settings for metadata storage -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node1:3306/hive3?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hadoop</value>
</property>
<!-- Host that HiveServer2 (HS2) binds to -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>node1</value>
</property>
<!-- Metastore address for remote-mode deployment -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://node1:9083</value>
</property>
<!-- Disable authorization for the metastore notification API -->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
</configuration>
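One detail that is easy to get wrong in hive-site.xml is the JDBC URL: its query parameters are separated by `&`, and inside an XML value a literal `&` has to be written as `&amp;`, otherwise the file is not well-formed. The helper below is a small sketch for escaping a raw value before pasting it into the file; the function name is illustrative, and it expects input that has not been escaped already.

```shell
# Escape the characters that XML treats specially inside element content.
# The ampersand is handled first so the other replacements are not re-escaped.
# Input must be raw text; already-escaped text would be escaped a second time.
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

For example, `xml_escape 'jdbc:mysql://node1:3306/hive3?createDatabaseIfNotExist=true&useSSL=false'` prints the URL with `&amp;` in place of `&`. If `xmllint` happens to be installed, `xmllint --noout hive-site.xml` is one way to check that the finished file is well-formed.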
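- Upload the MySQL JDBC driver to Hive's lib directory
mysql-connector-java-5.1.32.jar
- Initialize the metadata schema (make sure the Hadoop cluster is already started and healthy)
[root@node1 conf]# cd ../
[root@node1 apache-hive-3.1.2-bin]# bin/schematool -initSchema -dbType mysql -verbose
# The following two lines indicate that initialization succeeded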
Initialization script completed
schemaTool completed
# A successful initialization creates 74 tables in MySQL
- Create the Hive storage directories on HDFS (skip this if they already exist)
[root@node1 conf]# hadoop fs -mkdir /tmp
[root@node1 conf]# hadoop fs -mkdir -p /user/hive/warehouse
[root@node1 conf]# hadoop fs -chmod g+w /tmp
[root@node1 conf]# hadoop fs -chmod g+w /user/hive/warehouse
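Because the directories may already exist, the step above can be written so that it is safe to run repeatedly. The following sketch uses `hadoop fs -test -d` to check for a directory before creating it; the function names are illustrative.

```shell
# Create an HDFS directory if it is missing, then make it group-writable.
ensure_hdfs_dir() {
    target=$1
    if ! hadoop fs -test -d "$target"; then
        hadoop fs -mkdir -p "$target" || return 1
    fi
    hadoop fs -chmod g+w "$target"
}

# Prepare the two directories used in this guide.
prepare_hive_dirs() {
    for hive_dir in /tmp /user/hive/warehouse; do
        ensure_hdfs_dir "$hive_dir" || return 1
    done
}
```

Running `prepare_hive_dirs` a second time only repeats the chmod calls, which leaves existing directories as they are.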
3. Start Hive
- 1. Start the Metastore service
# Start in the foreground; stop with Ctrl+C
[root@node1 conf]# /export/server/apache-hive-3.1.2-bin/bin/hive --service metastore
# Start in the foreground with debug logging enabled
[root@node1 conf]# /export/server/apache-hive-3.1.2-bin/bin/hive --service metastore --hiveconf hive.root.logger=DEBUG,console
# Start in the background so the process keeps running; to stop it, find the PID with jps and use kill -9
[root@node1 conf]# nohup /export/server/apache-hive-3.1.2-bin/bin/hive --service metastore &
[root@node1 apache-hive-3.1.2-bin]# jps
# An additional RunJar process appears
14958 RunJar
# For a background start, the log is written to nohup.out in the directory the command was started from
[root@node1 apache-hive-3.1.2-bin]# cat nohup.out
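- 2. Start the HiveServer2 service
Hive ships with two clients:
Old: bin/hive
New: bin/beeline
The Metastore service must be started before HiveServer2. After starting, a service needs some time before it can actually serve requests.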
[root@node1 conf]# nohup /export/server/apache-hive-3.1.2-bin/bin/hive --service hiveserver2 &
# Two RunJar processes now appear
[root@node1 bin]# jps
3157 RunJar
2990 RunJar
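Since both services show up as RunJar, plain jps output does not say which PID belongs to which service. One way to tell them apart is `jps -ml`, which also prints the main class or jar and the arguments passed to it. The sketch below assumes that the class names HiveMetaStore and HiveServer2 appear in those arguments, which is worth confirming on your own installation; the function names are illustrative.

```shell
# Print the PID(s) of JVMs whose `jps -ml` line mentions the given text.
hive_service_pids() {
    jps -ml | awk -v name="$1" 'index($0, name) > 0 { print $1 }'
}

# Convenience wrappers; the class names are assumptions about what
# `jps -ml` shows for the two Hive services.
metastore_pids()   { hive_service_pids "HiveMetaStore"; }
hiveserver2_pids() { hive_service_pids "HiveServer2"; }
```

With these, stopping only the Metastore could look like `kill $(metastore_pids)`; trying a plain kill before falling back to kill -9 gives the process a chance to shut down cleanly.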
- 3. Connecting with the beeline client
- Copy the installation from node1 to the machine that will run the beeline client (node3)
[root@node1 bin]# scp -r /export/server/apache-hive-3.1.2-bin/ root@node3:/export/server
- 4. Access with the first-generation client
[root@node3 ~]# /export/server/apache-hive-3.1.2-bin/bin/hive
Logging initialized using configuration in jar:file:/export/server/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = ef5421b2-af79-486e-bd6f-d44ac5cd561f
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
- 5. Access with the second-generation client
[root@node3 ~]# /export/server/apache-hive-3.1.2-bin/bin/beeline
beeline> !connect jdbc:hive2://node1:10000
Enter username for jdbc:hive2://node1:10000: root
Enter password for jdbc:hive2://node1:10000: (no password needed, just press Enter)
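The same connection can be made without the interactive prompts, which is handy for a quick smoke test. The sketch below uses beeline's `-u` (JDBC URL), `-n` (user name) and `-e` (statement to run) options and assumes `beeline` is on the PATH; the function names and defaults are illustrative and simply mirror the host, port and user used in this guide.

```shell
# Build the JDBC URL for a HiveServer2 instance.
# $1: host (default node1), $2: port (default 10000)
hs2_jdbc_url() {
    printf 'jdbc:hive2://%s:%s\n' "${1:-node1}" "${2:-10000}"
}

# Run one statement through beeline without the interactive prompt.
# $1: host, $2: port, $3: user (default root), $4: statement to execute
hs2_run() {
    beeline -u "$(hs2_jdbc_url "$1" "$2")" -n "${3:-root}" -e "$4"
}
```

For example, `hs2_run node1 10000 root 'show databases;'` should list the databases once HiveServer2 has finished starting.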
- 6. If the following error occurs
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node1:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=08S01,code=0)
- Fix
Add the following properties to Hadoop's core-site.xml, sync the file to every node, and restart Hadoop:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
4. Connect to Hive from DataGrip
The driver needs to be reconfigured in DataGrip: import the Hive JDBC driver.