hadoop上安装hive2.3.2学习总结—hive安装+mysql以及碰到坑点

原创

wx604f04a92c6fd 2022-09-27 11:28:08 博主文章分类：hadoop ©著作权

文章标签 hive2.3安装 hadoop hive2.3部署 hive2.3部署 hive2.3 mysql 安装hive2.3 文章分类 后端开发

©著作权归作者所有：来自51CTO博客作者wx604f04a92c6fd的原创作品，请联系作者获取转载授权，否则将追究法律责任

一、环境准备

hapdoop版本：基于hadoop2.8.1，本教程是基于hadoop2.8.1上安装的hive

hive版本：apache-hive-2.3.2-bin.tar.gz

二、安装配置hive+mysql（远程模式）

1、首先安装好mysql数据库

2、去hive官网下载hive安装包：apache-hive-2.3.2-bin.tar.gz

tar -zxvf apache-hive-2.3.2-bin.tar.gz
cd apache-hive-2.3.2-bin
[chenxun@chen.local 22:13 ~/software/apache-hive-2.3.2-bin]$ll
total 6840
-rw-r--r--    1 chenxun  staff    20798 Nov 10 00:26 LICENSE
-rw-r--r--    1 chenxun  staff      230 Nov 10 00:26 NOTICE
-rw-r--r--    1 chenxun  staff     1979 Nov 10 00:58 RELEASE_NOTES.txt
drwxr-xr-x   13 chenxun  staff      442 Jan 28 01:14 bin
drwxr-xr-x   21 chenxun  staff      714 Jan 25 22:08 binary-package-licenses
drwxr-xr-x   13 chenxun  staff      442 Jan 28 01:47 conf
drwxr-xr-x    4 chenxun  staff      136 Jan 25 22:08 examples
drwxr-xr-x    7 chenxun  staff      238 Jan 25 22:08 hcatalog
drwxr-xr-x    3 chenxun  staff      102 Jan 25 22:08 jdbc
drwxr-xr-x  256 chenxun  staff     8704 Jan 25 22:41 lib
drwxr-xr-x    9 chenxun  staff      306 Nov 22 15:31 mysql-connector-java-5.1.45
-rw-r--r--@   1 chenxun  staff  3467861 Nov 22 14:31 mysql-connector-java-5.1.45.tar.gz
drwxr-xr-x    4 chenxun  staff      136 Jan 25 22:08 scripts
drwxr-xr-x   16 chenxun  staff      544 Jan 28 02:03

与 Hadoop 类似，Hive 也有 3 种运行模式：

内嵌模式
将元数据保存在本地内嵌的 Derby 数据库中，这是使用 Hive 最简单的方式。但是这种方式缺点也比较明显，因为一个内嵌的 Derby 数据库每次只能访问一个数据文件，这也就意味着它不支持多会话连接。
本地模式
这种模式是将元数据保存在本地独立的数据库中（一般是 MySQL），这用就可以支持多会话和多用户连接了。
远程模式
此模式应用于 Hive 客户端较多的情况。把 MySQL 数据库独立出来，将元数据保存在远端独立的 MySQL 服务中，避免了在每个客户端都安装 MySQL 服务从而造成冗余浪费的情况。

hive的配置文件在conf目录下面：

cp hive-default.xml.template hive-site.xml

vim hive-site.xml文件：

1、把{system:java.io.tmpdir} 改成 /Users/chenxun/software/apache-hive-2.3.2-bin/tmp

/Users/chenxun/software/apache-hive-2.3.2-bin/tmp 这个路劲是自己建立一个路劲

2、把所有{system:user.name} 改成 {user.name}

3、配置mysql

找到下面的xml选项依次修改value

其中连接mysql的数据库的密码和用户都是hive

其中hivedb是创建的数据库名字，特别注意配置前后要一直

<name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://172.16.110.140:3306/hivedb?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false</value>

<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>

<name>javax.jdo.option.ConnectionUserName</name>
 <value>hive</value>

<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>

在mysql端执行如下命令授权hive用户以及密码登陆访问权限;

create database
grant all on hivedb.* to hive@'%'  identified by 'hive';
grant all on hivedb.* to hive@'localhost'  identified by 'hive';
flush privileges;

下载jdbc connector：(mysql-connector-java-5.1.45-bin.jar放到lib目录下) https://dev.mysql.com/downloads/connector/j/

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz

tar -zxvf mysql-connector-java-5.1.45

mv mysql-connector-java-5.1.45-bin.jar  ../lib

执行初始化工作：在bin目录下执行下面的命令

schematool

然后在mysql端可以查看是否成功：

use hivedb

show tables

如果你按照本文下面的方法把t1和t2表创建成功你可以查看其中的元数据的一些信息：

mysql> select * from TBLS;
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER   | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1516893558 |     1 |                0 | chenxun |         0 |     1 | t1       | MANAGED_TABLE | NULL               | NULL               |                    |
|      6 |  1516969402 |     1 |                0 | chenxun |         0 |     6 | t2       | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+---------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
2 rows in set (0.00 sec)

mysql> select * from DBS;
+-------+-----------------------+-----------------------------------------------------------+--------------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                                           | NAME         | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+-----------------------------------------------------------+--------------+------------+------------+
|     1 | Default Hive database | hdfs://localhost:9000/user/hive/warehouse                 | default      | public     | ROLE       |
|     6 | NULL                  | hdfs://localhost:9000/user/hive/warehouse/db_hive_test.db | db_hive_test | chenxun    | USER       |
+-------+-----------------------+-----------------------------------------------------------+--------------+------------+------------+

在hive端建立表t1和t2

hive>CREATE TABLE t1(id int); // 创建内部表t1，只有一个int类型的id字段

hive>CREATE TABLE t2(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; // 创建内部表t2，有两个字段，它们之间通过tab分隔

其中t1.txt和t2.txt文件内容是：（在本地建立好这两个两个文件）

[root@master temp]# cat t1.txt
1      
2      
3      
4      
5      
6      
7      
9

[root@master temp]# cat t2.txt
1   a
2   b
3   c
9

把t2.txt传到hadoop目录下：

[chenxun@chen.local 23:12 ~]$hadoop fs -put -f /Users/chenxun/t2.txt /
[chenxun@chen.local 23:12 ~]$hadoop fs -ls /
Found 3 items
-rw-r--r--   1 chenxun supergroup         16 2018-01-28 23:09

加载数据到hive:

hive>LOAD DATA LOCAL INPATH '/Users/chenxun/t1.txt' INTO TABLE t1; // 从本地文件加载
hive>LOAD DATA INPATH '/t2.txt' INTO TABLE t1; // 从HDFS中加载

三、启动服务hiveserver2（如果碰到问题和错误后面有解决方法）

配置hive中hiveserver2选项：

<name>hive.server2.thrift.bind.host</name>
 <value>127.0.0.1<value/>

<name>hive.server2.thrift.port</name>
 <value>10000</value>

在后台启动hiveserver2服务：

hive --service hiveserver2 &

启动beeline：只用!connect jdbc:hive2://127.0.0.1:10000连接hive

[chenxun@chen.local 23:24 ~/software/apache-hive-2.3.2-bin/bin]$beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/chenxun/software/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/chenxun/software/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.2 by Apache Hive
beeline> !connect jdbc:hive2://127.0.0.1:10000 
Connecting to jdbc:hive2://127.0.0.1:10000
Enter username for jdbc:hive2://127.0.0.1:10000: 
Enter password for jdbc:hive2://127.0.0.1:10000: 
Connected to: Apache Hive (version 2.3.2)
Driver: Hive JDBC (version 2.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://127.0.0.1:10000> show databases;
OK
+----------------+
| database_name  |
+----------------+
| db_hive_test   |
| default        |
+----------------+
2 rows selected (2.367 seconds)
0: jdbc:hive2://127.0.0.1:10000> select * from t1
. . . . . . . . . . . . . . . .> ;
OK
+--------+
| t1.id  |
+--------+
| 1      |
| 2      |
| 3      |
| 4      |
| 5      |
| 6      |
| 7      |
| 9      |
| NULL   |
+--------+
9 rows selected (2.416 seconds)
0: jdbc:hive2://127.0.0.1:10000> select * from t2;
OK
+--------+----------+
| t2.id  | t2.name  |
+--------+----------+
| 1      | a        |
| 2      | b        |
| 3      | c        |
| 9      | x        |
+--------+----------+

此时可以用一些简单的查询语句来查询hive,但是为了生成MapReduce作业,我们将语句写得稍微复杂些:

0: jdbc:hive2://127.0.0.1:10000> select t2.name from t1 left join t2 on t1.id = t2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = chenxun_20180128232948_92e7d150-7613-43e0-9e14-fefcecdd3aff
Total jobs = 1
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/chenxun/software/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/chenxun/software/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/chenxun/software/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/chenxun/software/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2018-01-28 23:29:55 Starting to launch local task to process map join;  maximum memory = 477626368
2018-01-28 23:29:55 Starting to launch local task to process map join;  maximum memory = 477626368
2018-01-28 23:29:56 Dump the side-table for tag: 1 with group count: 4 into file: file:/Users/chenxun/software/apache-hive-2.3.2-bin/tmp/chenxun/cc45ef96-0ff8-4482-aca9-8552cba0b159/hive_2018-01-28_23-29-48_650_9212450484633339980-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2018-01-28 23:29:57 Uploaded 1 File to: file:/Users/chenxun/software/apache-hive-2.3.2-bin/tmp/chenxun/cc45ef96-0ff8-4482-aca9-8552cba0b159/hive_2018-01-28_23-29-48_650_9212450484633339980-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (344 bytes)
2018-01-28 23:29:57 End of local task; Time Taken: 1.493 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
2018-01-28 23:29:56 Dump the side-table for tag: 1 with group count: 4 into file: file:/Users/chenxun/software/apache-hive-2.3.2-bin/tmp/chenxun/cc45ef96-0ff8-4482-aca9-8552cba0b159/hive_2018-01-28_23-29-48_650_9212450484633339980-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2018-01-28 23:29:57 Uploaded 1 File to: file:/Users/chenxun/software/apache-hive-2.3.2-bin/tmp/chenxun/cc45ef96-0ff8-4482-aca9-8552cba0b159/hive_2018-01-28_23-29-48_650_9212450484633339980-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (344 bytes)
2018-01-28 23:29:57 End of local task; Time Taken: 1.493 sec.
Starting Job = job_1517075946129_0001, Tracking URL = http://chen.local:8088/proxy/application_1517075946129_0001/
Kill Command = /Users/chenxun/software/hadoop-2.8.1/bin/hadoop job  -kill job_1517075946129_0001
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2018-01-28 23:30:17,415 Stage-3 map = 0%,  reduce = 0%
2018-01-28 23:30:26,953 Stage-3 map = 100%,  reduce = 0%
Ended Job = job_1517075946129_0001
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1   HDFS Read: 5584 HDFS Write: 218 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
+----------+
| t2.name  |
+----------+
| a        |
| b        |
| c        |
| NULL     |
| NULL     |
| NULL     |
| NULL     |
| x        |
| NULL     |
+----------+
9 rows selected (40.835 seconds)
0: jdbc:hive2://127.0.0.1:10000>

三、碰到的坑点：

1、使用HiveServer2 and Beeline模式运行时，启动好HiveServer2后运行碰到下面的错误

java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): 
User root is not allowed to impersonate anonymous 错误。

在hadoop目录etc/hadoop目录下core-site.xml中添加下面的内容赋予用户权限

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

因为我的用户是root所以是hadoop.proxyuser.root.hosts和hadoop.proxyuser.root.groups

如果提示是其他用户比如说是chen 那么就要把上面内容中的root改成chen

2、错误2：!connect jdbc:hive2://127.0.0.1:10000出现错误

Error: Could not open client transport with JDBC Uri: jdbc:hive2://127.0.0.1:10000: Failed to open new session:

Permission denied: user=anonymous, access=EXECUTE, inode=”/tmp”
由于Hive没有hdfs:/tmp目录的权限，赋权限即可：
hadoop fs  -chmod -R 777

beeline> !connect jdbc:hive2://127.0.0.1:10000 
Connecting to jdbc:hive2://127.0.0.1:10000
Enter username for jdbc:hive2://127.0.0.1:10000: 
Enter password for jdbc:hive2://127.0.0.1:10000: 
18/01/28 01:59:20 [main]: WARN jdbc.HiveConnection: Failed to connect to 127.0.0.1:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://127.0.0.1:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":chenxun:supergroup:drwx------
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:310)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:271)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:206)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:189)
  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:499)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1603)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1621)
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:542)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:110)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2929)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1106)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:858)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455) (state=08S01,code=0)

上一篇：go lang学习笔记——channel机理及调度理解

下一篇：Spark RDD使用详解--RDD原理

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯