1.Hive Lateral View

Lateral View 用于和 UDTF 函数(explode、split)结合来使用。
首先通过 UDTF 函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
主要解决在 select 使用 UDTF 做查询过程中,查询只能包含单个 UDTF,不能包含其 他字段、以及多个 UDTF 的问题
语法:

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)
hive> select explode(likes) from person7;
OK
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
hive> select id,name,mycol1,mycol2,mycol3 from person7 lateral view explode(likes) mytb1 as mycol1 lateral view explode(address) mytb2 as mycol2,mycol3;
OK
1 小明 1 lol beijing xisanqi
1 小明 1 lol shanghai pudong
1 小明 1 book beijing xisanqi
1 小明 1 book shanghai pudong
1 小明 1 movie beijing xisanqi
1 小明 1 movie shanghai pudong
2 小明 2 lol beijing xisanqi
2 小明 2 lol shanghai pudong
2 小明 2 book beijing xisanqi
2 小明 2 book shanghai pudong
2 小明 2 movie beijing xisanqi
2 小明 2 movie shanghai pudong
3 小明 3 lol beijing xisanqi
3 小明 3 lol shanghai pudong
3 小明 3 book beijing xisanqi
3 小明 3 book shanghai pudong
3 小明 3 movie beijing xisanqi
3 小明 3 movie shanghai pudong
4 小明 4 lol beijing xisanqi
4 小明 4 lol shanghai pudong
4 小明 4 book beijing xisanqi
4 小明 4 book shanghai pudong
4 小明 4 movie beijing xisanqi
4 小明 4 movie shanghai pudong
5 小明 5 lol beijing xisanqi
hive> select count(distinct(mycol2)),count(distinct(mycol3)) from person7 lateral view explode(address) mytb2 as mycol2,mycol3;
Query ID = root_20211112124842_626f1f8e-0764-4306-ab9d-dc404771b732
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1636690643978_0001, Tracking URL = http://node3:8088/proxy/application_1636690643978_0001/
Kill Command = /opt/hadoop-2.6.5/bin/hadoop job -kill job_1636690643978_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-11-12 12:49:07,938 Stage-1 map = 0%, reduce = 0%
2021-11-12 12:49:29,804 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.54 sec
2021-11-12 12:49:46,263 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.42 sec
MapReduce Total cumulative CPU time: 5 seconds 420 msec
Ended Job = job_1636690643978_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.42 sec HDFS Read: 12719 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 420 msec
OK
2 3
Time taken: 66.565 seconds, Fetched: 1 row(s)
hive> create table yyy(
> id int,
> name string);
OK
Time taken: 2.269 seconds

2.Hive 视图

Hive Lateral View、视图与索引_hdfs
和关系型数据库中的普通视图一样,hive 也支持视图
特点:
 不支持物化视图
 只能查询,不能做加载数据操作
 视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
 view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT 语句操作,view 当中定义的优先级更高
 view 支持迭代视图

#mysql 中支持视图删除:
CREATE VIEW v_users AS SELECT * FROM myusers;
DELETE FROM v_users WHERE id = '1316403900579872';

创建视图:

hive> create view v_psn as select * from person5;
OK
Time taken: 0.275 seconds

查询视图:
在对应元数据库中的 TBLS 中多出一条记录:
Hive Lateral View、视图与索引_hive_02
删除视图:

hive> drop view v_psn;
OK
Time taken: 1.22 seconds

3.Hive 索引

目的:优化查询以及检索性能
创建索引:

hive> create index t1_index
> on table person5(name)
> as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler"
> with deferred rebuild
> in table t1_index_table
> comment "table person5 name index comment";
OK
Time taken: 1.732 seconds

as:指定索引器;
in table:指定索引表,若不指定默认生成在 default__person5_t1_index__表中

重建索引(建立索引之后必须重建索引才能生效)

ALTER INDEX t1_index ON person5 REBUILD;

重建完毕之后,再次查询有索引数据:select * from t1_index_table;

hive> select * from t1_index_table;
OK
NULL hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [513] 10
小明 1 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [0] 10
小明 2 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [59] 10
小明 3 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [118] 10
小明 4 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [177] 10
小明 5 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [236] 10
小明 6 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [290] 10
小明 7 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [349] 10
小明 8 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [402] 10
小明 9 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [455] 10

删除索引

DROP INDEX IF EXISTS t1_index ON person5;