Hive Partitioned Tables and Bucketing

Original article by 吃果冻不吐果冻皮, 2022-02-17 17:17:17


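© Copyright belongs to the author: this is an original work by 51CTO blog author 吃果冻不吐果冻皮. Please contact the author for permission to repost; unauthorized reproduction may result in legal action.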

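Partitioned Tables

By default, a Hive SELECT query scans the entire table, which wastes a lot of time on work the query does not need.

A partitioned table is a table for which partition columns are declared with partitioned by when the table is created. Queries that filter on those columns only need to read the matching partitions.

Partition Syntax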

create table tablename (
name string
)
partitioned by (key type, …)

create table if not exists employees(
name string,
salary string,
subordinates array<string>,
deductions map<string,float>,
address struct<street:string,city:string,state:string,zip:int>
)
partitioned by (dt string,type string)
row format delimited fields terminated by '\t' 
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n' 
stored as textfile
;
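
As a rough illustration of why partitioning helps, the sketch below loads a file into one partition of the employees table and then filters on the partition columns. The local path /tmp/employees.txt is a hypothetical example file, not something from the original post; it is assumed to match the row format declared above.

```sql
-- Assumption: /tmp/employees.txt is a hypothetical tab-delimited file
-- whose layout matches the row format declared for employees.
load data local inpath '/tmp/employees.txt'
overwrite into table employees
partition (dt='20140715', type='test');

-- Filtering on the partition columns lets Hive read only the
-- dt=20140715/type=test partition instead of scanning the whole table.
select name, salary
from employees
where dt = '20140715' and type = 'test';
```

Each partition is stored as its own subdirectory under the table's location, named after the partition values (here dt=20140715/type=test), which is what makes skipping the other partitions possible.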


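Partitioned Table Operations

Adding a Partition

General form (the bracketed part is optional; country and state stand in for the table's own partition columns):

alter table employees add if not exists partition(country='xxx'[,state='yyyy'])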

Alter table employees add if not exists partition(dt='20140715',type='test');
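
To check that the partition was registered, something like the following can be run. It is a small sketch using the employees table from this article.

```sql
-- Lists every partition of the table, one per line,
-- in the form dt=20140715/type=test
show partitions employees;

-- Optionally restrict the listing to partitions matching a partial spec
show partitions employees partition (dt='20140715');
```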


Dropping a Partition

alter table employees drop if exists partition(country='xxx'[,state='yyyy'])
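
Applied to the employees table from this article, whose partition columns are dt and type, a concrete drop statement might look like this sketch:

```sql
-- Drops the dt=20140715/type=test partition; "if exists" avoids an
-- error when the partition is not there.
alter table employees drop if exists partition (dt='20140715', type='test');

-- The dropped partition should no longer appear in the listing
show partitions employees;
```

For a managed (non-external) table, dropping a partition also deletes the partition's data; for an external table only the metadata is removed.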

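Hive Bucketing

For each table or partition, Hive can further organize the data into buckets; in other words, buckets divide the data at a finer granularity.

Hive buckets a table on a specific column.

Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record is stored in.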
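
The hash-then-modulo idea can be approximated with Hive's built-in hash() and pmod() functions. This is only an illustration of the principle: the hash Hive applies internally depends on the column type and the Hive version, so the numbers below are not guaranteed to match the bucket files Hive actually writes. The table testtable and its addr column are the ones used in the insert statement later in this article.

```sql
-- Illustration only: hash() and pmod() are built-in Hive functions.
-- pmod keeps the result non-negative even when hash() is negative,
-- so with 4 buckets every row maps to a value in 0..3.
select addr,
       pmod(hash(addr), 4) as bucket_no
from testtable;
```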

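Benefits

More efficient query processing. For example, a join on the bucketing column between two tables bucketed in the same way can be processed bucket by bucket.

More efficient sampling, because a sample can be taken from a subset of the buckets instead of the whole table.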

Bucketing Syntax

create table bucketed_user(
id string ,
name string
)
clustered by (id) sorted by (name) into 4 buckets
row format delimited fields terminated by '\t' 
stored as textfile;

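Settings

Before Hive 2.0, bucketing has to be enforced explicitly so that inserts produce the declared number of buckets. From Hive 2.0 onward it is always enforced and this property has been removed.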

set hive.enforce.bucketing = true;

Inserting Data

insert overwrite table bucketed_user select addr ,name from testtable;
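
Once the table is populated, the sampling benefit mentioned earlier can be tried with tablesample. The query below is a minimal sketch against the bucketed_user table defined above.

```sql
-- Samples bucket 1 of the 4 buckets declared by
-- "clustered by (id) ... into 4 buckets", roughly a quarter of the rows.
select id, name
from bucketed_user
tablesample (bucket 1 out of 4 on id);
```

Because the sample is taken on the same column and bucket count the table was clustered by, Hive can read a single bucket rather than scanning the full table.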


Comparing Hive Partitioning and Bucketing

[Figure: comparison of partitioning and bucketing in Hive]
