hive数据库系统表 hive数据库开发规范

转载

karen 2023-07-13 15:30:14

文章标签 hive数据库系统表大数据 hive 元数据 mysql 文章分类 Hive 大数据

文章目录

一.Hive简介
二.Hive的特点
三.Hive使用场景
四.Hive命令
五.Hive参数

5.1 Hivevar参数示例
5.2 Hiveconf参数示例
5.3 System变量
5.4 Env变量

六.Hive系统架构

6.1 Hive接口
6.2 元数据存储
6.3 驱动器（Driver）

一.Hive简介

Hive是基于Hadoop的一个数据仓库工具。
可以将结构化的数据文件映射为一张数据库表，并提供简单的类SQL(HQL)查询功能，可以将HQL语句转换为MapReduce任务进行运行。
学习成本低，可以通过类SQL语句快速实现简单的MapReduce统计，不必开发与门的MapReduce应用。
适合数据仓库的ETL和统计分析。

二.Hive的特点

简单易用
基于SQL表达式语法，兼容大部分SQL-92语义和部分SQL-2003扩展语义
可扩展
Hive基于Hadoop实现，可以自由的扩展集群的规模，一般情况下丌需要重启服务
延展性
Hive支持用户自定义函数，用户可以根据自己的需求来实现自己的函数
容错性
Hadoop良好的容错性，节点出现问题SQL仍可完成执行

三.Hive使用场景

最佳使用场合
大数据集的批处理作业，例如：网络日志分析
不适用于
不能在大规模数据集上实现低延迟快速的查询，例如：Hive 在几百MB 的数据集上执行查询一般有分钟级的时间延迟。
不支持联机事务处理（OLTP）
Hive 不提供基于行级的数据更新操作（2.0版本开始支持Update）

四.Hive命令

hive命令如下图所示:

hive数据库系统表 hive数据库开发规范_hive数据库系统表

测试记录:

hive> set mapred.reduce.tasks=32;
  hive> set;
  hive> select a.* from tab1;
  hive> !ls;
  hive> dfs -ls;

Hive CLI使用

[root@hp1 libext]# hive -H
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)

Hive执行sql脚本

hive -e 'select count(*) from test.emp'
hive -f a.sql

测试记录:

[root@hp1 ~]# more a.sql 
select count(*) from test.emp;
[root@hp1 ~]# 
[root@hp1 ~]# hive -f a.sql
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false
Query ID = root_20201127172235_292c9382-f5e4-46fd-ab9b-5660eeba4f44
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/11/27 17:22:38 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0003, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0003/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0003
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2020-11-27 17:22:46,594 Stage-1 map = 0%,  reduce = 0%
2020-11-27 17:22:51,815 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 1.99 sec
2020-11-27 17:22:52,850 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.91 sec
2020-11-27 17:22:58,031 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.93 sec
MapReduce Total cumulative CPU time: 5 seconds 930 msec
Ended Job = job_1606444720955_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 5.93 sec   HDFS Read: 14178 HDFS Write: 102 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 930 msec
OK
14
Time taken: 23.234 seconds, Fetched: 1 row(s)

五.Hive参数

hive参数概述:

hive数据库系统表 hive数据库开发规范_元数据_02

5.1 Hivevar参数示例

代码:

hive -d foo=bar
set foo;
set hivevar:empno=7369;
select * from test.emp where empno=${hivevar:empno};

测试记录:

[root@hp1 ~]# hive -d foo=bar
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> set foo;
foo=bar
    > set hivevar:empno=7369;
hive> select * from test.emp where empno=${hivevar:empno};
Query ID = root_20201127173211_9f830148-5427-4b15-b356-32d2ab9b3c74
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/27 17:32:12 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0004, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0004/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0004
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-11-27 17:32:20,448 Stage-1 map = 0%,  reduce = 0%
2020-11-27 17:32:26,683 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.61 sec
MapReduce Total cumulative CPU time: 6 seconds 610 msec
Ended Job = job_1606444720955_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.61 sec   HDFS Read: 11789 HDFS Write: 231 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 610 msec
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 17.64 seconds, Fetched: 1 row(s)

5.2 Hiveconf参数示例

代码:
···
hive --hiveconf hive.cli.print.header=true
select * from test.emp where empno=7369;
set hive.cli.print.header;
set hive.cli.print.header=false;
set hive.cli.print.header;
select * from test.emp where empno=7369;
···

测试记录:

[root@hp1 ~]# hive --hiveconf hive.cli.print.header=true
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> 
    > select * from test.emp where empno=7369;
Query ID = root_20201127173938_ca672a76-b8d5-4281-86a1-f8305b202ad8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/27 17:39:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0005, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0005/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0005
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-11-27 17:39:48,730 Stage-1 map = 0%,  reduce = 0%
2020-11-27 17:39:54,961 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.4 sec
MapReduce Total cumulative CPU time: 6 seconds 400 msec
Ended Job = job_1606444720955_0005
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.4 sec   HDFS Read: 11789 HDFS Write: 231 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 400 msec
OK
emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 17.515 seconds, Fetched: 1 row(s)
hive> 
    > set hive.cli.print.header;
hive.cli.print.header=true
hive> set hive.cli.print.header=false;
hive> set hive.cli.print.header;
hive.cli.print.header=false
hive> select * from test.emp where empno=7369;
Query ID = root_20201127174246_26ec846e-1d16-4cf2-a9d4-6436632dc587
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/27 17:42:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0006, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0006/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0006
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-11-27 17:42:53,878 Stage-1 map = 0%,  reduce = 0%
2020-11-27 17:43:01,096 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.51 sec
MapReduce Total cumulative CPU time: 6 seconds 510 msec
Ended Job = job_1606444720955_0006
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.51 sec   HDFS Read: 11969 HDFS Write: 231 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 510 msec
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 15.72 seconds, Fetched: 1 row(s)
hive>

5.3 System变量

代码:

set system:user.name;
set system:user.name=7369;
set system:user.name;
select * from test.emp where empno=${system:user.name};

测试记录:

[root@hp1 ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> set system:user.name;
system:user.name=root
hive> set system:user.name=7369;
hive> set system:user.name;
system:user.name=7369
hive> select * from test.emp where empno=${system:user.name};
Query ID = 7369_20201127180045_9f902d3a-070d-4e12-8641-38063197eafc
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/27 18:00:47 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0010, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0010/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0010
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-11-27 18:00:55,955 Stage-1 map = 0%,  reduce = 0%
2020-11-27 18:01:02,281 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 3.11 sec
2020-11-27 18:01:03,313 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.29 sec
MapReduce Total cumulative CPU time: 6 seconds 290 msec
Ended Job = job_1606444720955_0010
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.29 sec   HDFS Read: 11789 HDFS Write: 231 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 290 msec
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 19.033 seconds, Fetched: 1 row(s)
hive>

5.4 Env变量

代码:

export empno=7369
hive
set env:empno;
select * from test.emp where empno=${env:empno};

测试记录:

[root@hp1 ~]# export empno=7369
[root@hp1 ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/jars/hive-common-2.1.1-cdh6.3.1.jar!/hive-log4j2.properties Async: false

WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> set env:empno;
env:empno=7369
hive> select * from test.emp where empno=${env:empno};
Query ID = root_20201127180306_e2344df7-49f4-4746-9c60-6f62f81d36bb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/27 18:03:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1606444720955_0011, Tracking URL = http://hp3:8088/proxy/application_1606444720955_0011/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606444720955_0011
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-11-27 18:03:16,404 Stage-1 map = 0%,  reduce = 0%
2020-11-27 18:03:22,675 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.41 sec
MapReduce Total cumulative CPU time: 6 seconds 410 msec
Ended Job = job_1606444720955_0011
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 6.41 sec   HDFS Read: 11789 HDFS Write: 231 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 410 msec
OK
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 17.48 seconds, Fetched: 1 row(s)
hive>

六.Hive系统架构

Hive系统架构图

hive数据库系统表 hive数据库开发规范_元数据_03

6.1 Hive接口

用户接口主要有三个：CLI，ThriftServer和HWI。
最常用的是CLI，Cli启动的时候，会同时启动一个Hive Driver。
ThriftServer是以Thrift协议封装的Hive服务化接口，可以提供跨语言的访问如Python、C++等。并实现了JDBC/ODBC协议
HWI 提供了一个基于浏览器访问Hive的途径。

6.2 元数据存储

Hive 将元数据存储在数据库中，如mysql、oracle、derby。
Hive 中的元数据包括表的名字，表的列和分区及其属性，表的属性（是否为外部表等），表的数据所在目录等

6.3 驱动器（Driver）

编译器
完成词法分析、语法分析，将HQL查询解析成AST
AST生成逻辑执行计划
逻辑执行计划生成物理MR执行计划
优化器
对逻辑执行计划进行优化
对物理执行计划进行优化
执行器
生成的物理执行计划转变成MR Job
提交到Hadoop上面执行

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。

上一篇：cluster可靠性 redis redis keys*性能问题解决方案

下一篇：hive是什么网 hive 什么意思

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯