Hive Lateral View、视图与索引

原创

wx62be9d88ce294 2022-07-01 20:47:56 博主文章分类：大数据 ©著作权

文章标签 hive hadoop 数据仓库 hdfs mapreduce 文章分类 Hadoop 大数据

©著作权归作者所有：来自51CTO博客作者wx62be9d88ce294的原创作品，请联系作者获取转载授权，否则将追究法律责任

1.Hive Lateral View

Lateral View 用于和 UDTF 函数（explode、split）结合来使用。
首先通过 UDTF 函数拆分成多行，再将多行结果组合成一个支持别名的虚拟表。
主要解决在 select 使用 UDTF 做查询过程中，查询只能包含单个 UDTF，不能包含其他字段、以及多个 UDTF 的问题
语法：

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

hive> select explode(likes) from person7;
OK
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
hive> select id,name,mycol1,mycol2,mycol3 from person7 lateral view explode(likes) mytb1 as mycol1 lateral view explode(address) mytb2 as mycol2,mycol3;
OK
1 小明 1  lol beijing xisanqi
1 小明 1  lol shanghai  pudong 
1 小明 1  book  beijing xisanqi
1 小明 1  book  shanghai  pudong 
1 小明 1  movie beijing xisanqi
1 小明 1  movie shanghai  pudong 
2 小明 2  lol beijing xisanqi
2 小明 2  lol shanghai  pudong 
2 小明 2  book  beijing xisanqi
2 小明 2  book  shanghai  pudong 
2 小明 2  movie beijing xisanqi
2 小明 2  movie shanghai  pudong 
3 小明 3  lol beijing xisanqi
3 小明 3  lol shanghai  pudong 
3 小明 3  book  beijing xisanqi
3 小明 3  book  shanghai  pudong 
3 小明 3  movie beijing xisanqi
3 小明 3  movie shanghai  pudong 
4 小明 4  lol beijing xisanqi
4 小明 4  lol shanghai  pudong 
4 小明 4  book  beijing xisanqi
4 小明 4  book  shanghai  pudong 
4 小明 4  movie beijing xisanqi
4 小明 4  movie shanghai  pudong 
5 小明 5  lol beijing xisanqi
hive> select count(distinct(mycol2)),count(distinct(mycol3)) from person7 lateral view explode(address) mytb2 as mycol2,mycol3;
Query ID = root_20211112124842_626f1f8e-0764-4306-ab9d-dc404771b732
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1636690643978_0001, Tracking URL = http://node3:8088/proxy/application_1636690643978_0001/
Kill Command = /opt/hadoop-2.6.5/bin/hadoop job  -kill job_1636690643978_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-11-12 12:49:07,938 Stage-1 map = 0%,  reduce = 0%
2021-11-12 12:49:29,804 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.54 sec
2021-11-12 12:49:46,263 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.42 sec
MapReduce Total cumulative CPU time: 5 seconds 420 msec
Ended Job = job_1636690643978_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.42 sec   HDFS Read: 12719 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 420 msec
OK
2 3
Time taken: 66.565 seconds, Fetched: 1 row(s)
hive> create table yyy(
    > id int,
    > name string);
OK
Time taken: 2.269 seconds

2.Hive 视图

Hive Lateral View、视图与索引_hdfs
和关系型数据库中的普通视图一样，hive 也支持视图
特点：
 不支持物化视图
 只能查询，不能做加载数据操作
 视图的创建，只是保存一份元数据，查询视图时才执行对应的子查询
 view定义中若包含了ORDER BY/LIMIT语句，当查询视图时也进行ORDER BY/LIMIT 语句操作，view 当中定义的优先级更高
 view 支持迭代视图

#mysql 中支持视图删除：
 CREATE VIEW v_users AS SELECT * FROM myusers; 
 DELETE FROM v_users WHERE id = '1316403900579872';

创建视图：

hive> create view v_psn as select * from person5;
OK
Time taken: 0.275 seconds

查询视图：
在对应元数据库中的 TBLS 中多出一条记录：
Hive Lateral View、视图与索引_hive_02
删除视图：

hive> drop view v_psn;
OK
Time taken: 1.22 seconds

3.Hive 索引

目的：优化查询以及检索性能
创建索引：

hive> create index t1_index
    > on table person5(name)
    > as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler"
    > with deferred rebuild
    > in table t1_index_table
    > comment "table person5 name index comment";
OK
Time taken: 1.732 seconds

as：指定索引器；
in table：指定索引表，若不指定默认生成在 default__person5_t1_index__表中

重建索引（建立索引之后必须重建索引才能生效）

ALTER INDEX t1_index ON person5 REBUILD;

重建完毕之后，再次查询有索引数据：select * from t1_index_table;

hive> select * from t1_index_table;
OK
NULL  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [513] 10
小明 1  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [0] 10
小明 2  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [59]  10
小明 3  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [118] 10
小明 4  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [177] 10
小明 5  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [236] 10
小明 6  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [290] 10
小明 7  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [349] 10
小明 8  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [402] 10
小明 9  hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [455] 10

删除索引

DROP INDEX IF EXISTS t1_index ON person5;

上一篇：字符流实现文本文件的拷贝处理

下一篇：浅拷贝和深拷贝

提问和评论都可以，用心的回复会被更多人看到评论

发布评论

相关文章

官方博客	全部文章	热门标签	班级博客
了解我们	网站地图	意见反馈

鸿蒙开发者社区	51CTO学堂
51CTO	软考资讯