hive compact major: Hive partitioned tables

Reposted

mob64ca13f63f2c 2023-10-02 19:34:56


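Contents

1. Partitioned tables

1.1 Creating a partitioned table and loading data
1.2 Adding and dropping multiple partitions; listing a table's partitions
1.3 Two-level partitions
1.4 Uploading data directly to a partition directory and associating it with the partitioned table
1.5 Dynamic partitions

2. Bucketed tables

2.1 Concept
2.2 Creating a bucketed table and importing data

3. Sampling queries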

1. Partitioned tables

1.1 Creating a partitioned table and loading data

(1) Syntax for creating a partitioned table

create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (day string)
row format delimited fields terminated by '\t';

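Note: a partition column must not be a column that already exists in the table. It can be treated as a pseudo-column of the table.
(2) Load a local data file into a specific partition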

load data local inpath '/home/hdfs/data/dept/dept_20200401.log' into table dept_partition partition(day='20200401');
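Once a partition is loaded, filtering on the partition column limits the query to that partition. The following is a minimal sketch that reuses the dept_partition table above; the second partition (day='20200402') is an assumption and is taken to have been loaded the same way.

```sql
-- Query a single partition of the table.
select * from dept_partition where day='20200401';

-- Query several partitions at once
-- (assumes a day='20200402' partition was also loaded).
select * from dept_partition where day='20200401' or day='20200402';
```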

1.2 Adding and dropping multiple partitions; listing a table's partitions

(1) Add partitions

alter table dept_partition add partition(day='20200404');

alter table dept_partition add partition(day='20200406') partition(day='20200405');

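Note: when adding several partitions in one statement, separate the partition clauses with spaces, not commas.
(2) Drop partitions (here the partition clauses are separated by commas)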

alter table dept_partition drop partition(day='20200406'),partition(day='20200405');

(3) List the partitions of a partitioned table

show partitions dept_partition;
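To inspect more than the partition names, or to make add and drop statements safe to re-run, the variants below may help. This is a sketch against the dept_partition table above; the partition value '20200407' is chosen only for illustration.

```sql
-- Show table metadata, including the partition columns.
desc formatted dept_partition;

-- Add a partition only if it does not exist yet.
alter table dept_partition add if not exists partition(day='20200407');

-- Drop it again without an error if it is already gone.
alter table dept_partition drop if exists partition(day='20200407');
```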


1.3 Two-level partitions

(1) Syntax for creating a partitioned table

create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (day string, hour string)
row format delimited fields terminated by '\t';

(2) Load a local data file into a specific partition

load data local inpath '/home/hdfs/data/dept/dept_20200403.log' into table dept_partition2 partition(day='20200403',hour='12');

(3) Query the partition's data

select * from dept_partition2 where day='20200403' and hour='12';


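1.4 Uploading data directly to a partition directory and associating it with the partitioned table

The msck repair table <table name> command repairs a table's partitions. It is typically used after partition data has been copied by hand into the Hive table's location: such a partition is not recorded in the Hive metastore, so queries against it return no data.

For example:
Upload the data: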

dfs -mkdir -p /apps/hive/warehouse/dept_partition2/day=20200402/hour=12;

dfs -put /home/hdfs/data/dept/dept_20200402.log /apps/hive/warehouse/dept_partition2/day=20200402/hour=12;

At this point, querying the new partition returns no rows.

Run the repair command:

msck repair table dept_partition2;


After the repair, querying the partition returns the uploaded data.
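As an alternative to msck repair, a partition uploaded by hand can be registered explicitly with alter table ... add partition. The sketch below assumes the same warehouse path as the example above; the day=20200404 partition and the file dept_20200404.log are hypothetical and used only for illustration.

```sql
-- Upload a file into a new partition directory (hypothetical file and partition).
dfs -mkdir -p /apps/hive/warehouse/dept_partition2/day=20200404/hour=12;
dfs -put /home/hdfs/data/dept/dept_20200404.log /apps/hive/warehouse/dept_partition2/day=20200404/hour=12;

-- Register the partition in the metastore.
alter table dept_partition2 add partition(day='20200404', hour='12');

-- The partition can now be queried.
select * from dept_partition2 where day='20200404' and hour='12';
```

Registering a single known partition this way avoids scanning the whole table location, which is what msck repair has to do.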

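1.5 Dynamic partitions

In a relational database, when data is inserted into a partitioned table, the database automatically routes each row to the matching partition based on the value of the partition column. Hive provides a similar mechanism, called dynamic partitioning (Dynamic Partition).

(1) Parameters for enabling dynamic partitioning

Enable dynamic partitioning (default: true, i.e. enabled)

set hive.exec.dynamic.partition=true;

Set non-strict mode (the dynamic partition mode defaults to strict, which requires at least one partition column to be static; nonstrict allows all partition columns to be dynamic).

set hive.exec.dynamic.partition.mode=nonstrict;

Maximum number of dynamic partitions that can be created in total across all nodes running the MR job. Default: 1000

set hive.exec.max.dynamic.partitions=1000;

Maximum number of dynamic partitions that can be created on each node running the MR job. Set this according to the actual data. For example, if the source data covers one year, so that the day column has 365 distinct values, this parameter must be greater than 365; with the default of 100 the job fails.

set hive.exec.max.dynamic.partitions.pernode=400;

Maximum number of HDFS files that can be created by the whole MR job. Default: 100000

set hive.exec.max.created.files=100000;

Whether to throw an exception when an empty partition is generated. This usually does not need to be set. Default: false

set hive.error.on.empty.partition=false;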

(2) Example

Create the target partitioned table

create table dept_partition_dy(id int, name string) partitioned by (loc int) row format delimited fields terminated by '\t';

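Insert data using dynamic partitioning (the partition value is taken from the last column of the select list, loc)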

insert into table dept_partition_dy partition(loc) select deptno, dname, loc from dept;


Check the partitions of the target partitioned table

show partitions dept_partition_dy;
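In strict mode, at least one partition column must be given a static value. The sketch below shows a mixed static and dynamic insert into the two-level table dept_partition2 from section 1.3; the source table dept_hourly and its hr column are hypothetical and do not appear in the original.

```sql
-- Hypothetical source table that carries an hour value (hr) per row.
create table dept_hourly(deptno int, dname string, loc string, hr string)
row format delimited fields terminated by '\t';

set hive.exec.dynamic.partition.mode=strict;

-- day is static, hour is dynamic: this is allowed in strict mode.
-- The dynamic partition value is taken from the last select column (hr).
insert into table dept_partition2 partition(day='20200407', hour)
select deptno, dname, loc, hr from dept_hourly;
```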


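2. Bucketed tables

2.1 Concept

Not every dataset can be partitioned sensibly. For a table or a partition, Hive can further organize the data into buckets, which gives a finer-grained division of the data. Partitioning applies to the data's storage path (directories); bucketing applies to the data files.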

2.2 Creating a bucketed table and importing data

Create the bucketed table:

create table stu_bucket(id int, name string)
clustered by(id) 
into 4 buckets
row format delimited fields terminated by '\t';

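Loading data into a bucketed table directly with a load statement succeeds, but the data is not bucketed. Bucketing works by hashing the specified column and writing each row to the file that corresponds to its hash value, so data must be inserted into a bucketed table through an MR job, and the number of reducers must equal the number of buckets.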

insert into table stu_bucket select * from stu_insert;
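The insert above reads from stu_insert, which the original does not define. The sketch below fills in the surrounding steps under stated assumptions: the stu_insert definition and the file path /home/hdfs/data/student.txt are hypothetical, hive.enforce.bucketing applies only to Hive versions before 2.0, and the warehouse path follows the one used in section 1.4.

```sql
-- Staging table (assumed definition), loaded with an ordinary load.
create table stu_insert(id int, name string)
row format delimited fields terminated by '\t';
load data local inpath '/home/hdfs/data/student.txt' into table stu_insert;

-- Hive 1.x only: uncomment so that inserts honor the bucket definition.
-- set hive.enforce.bucketing=true;

-- Let Hive choose the reducer count instead of forcing a fixed value.
set mapreduce.job.reduces=-1;

insert into table stu_bucket select * from stu_insert;

-- The table directory is expected to hold up to 4 files, one per bucket.
dfs -ls /apps/hive/warehouse/stu_bucket;
```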


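3. Sampling queries

For very large datasets, users sometimes need a representative sample of the results rather than the full result. Hive supports this by sampling a table. Syntax: TABLESAMPLE(BUCKET x OUT OF y)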

select * from stu_bucket tablesample(bucket 1 out of 4 on id);
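In TABLESAMPLE(BUCKET x OUT OF y ON col), rows are assigned to y sample buckets by hashing col, and x selects which of them to return. The variants below sketch how this interacts with the 4 buckets of stu_bucket; the comments describe the expected behavior and are not captured output.

```sql
-- y = 2: returns about half of the rows, which for a table clustered
-- into 4 buckets on id corresponds to buckets 1 and 3.
select * from stu_bucket tablesample(bucket 1 out of 2 on id);

-- y = 8: returns about one eighth of the rows, i.e. roughly half of bucket 1.
select * from stu_bucket tablesample(bucket 1 out of 8 on id);

-- x must not be larger than y; a statement like this one is rejected.
-- select * from stu_bucket tablesample(bucket 5 out of 4 on id);
```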


