How to count rows per partition in Hive: querying Hive partitioned tables

Reposted

mob6454cc73e9a6 2023-09-18 16:54:45

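Contents

1. Concept
2. Create a partitioned table
3. Load data into the partitioned table
4. Query partitioned data
5. Add partitions
6. Drop partitions
7. View how many partitions a partitioned table has
8. Two-level partitioned tables
9. Three ways to associate data uploaded directly to a partition directory with the partitioned table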

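1. Concept

Each partition of a Hive partitioned table corresponds to a separate directory on HDFS, and that directory holds all of the partition's data files. In other words, partitioning in Hive means splitting data into directories: a large dataset is divided into smaller ones according to business needs. When the WHERE clause of a query selects only the partitions it needs, Hive reads just those directories instead of scanning the whole table, which makes the query much more efficient.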

2. Create a partitioned table

create table dept_partition(
id int, name string
)
partitioned by (month string)
row format delimited fields terminated by '\t'
stored as textfile;

3. Load data into the partitioned table

The source file /home/hive/srcdata.txt contains one record per line, with the id and name fields separated by a tab to match the table definition.

Load the data:

load data local inpath '/home/hive/srcdata.txt' into table default.dept_partition partition(month='202001');
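
To make the "a partition is a directory" idea concrete, the load can be checked from inside the Hive CLI by listing the table's directory. This is only a sketch: the warehouse path below assumes Hive's default location (/user/hive/warehouse) and the default database, so adjust it if your installation differs.

```sql
-- Run inside the Hive CLI.
-- Assumption: the table lives under the default warehouse directory.
-- The first listing should show one sub-directory per partition (month=202001).
dfs -ls /user/hive/warehouse/dept_partition;

-- The second listing should show the data file that was just loaded.
dfs -ls /user/hive/warehouse/dept_partition/month=202001;
```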

4. Query partitioned data

Query a single partition:

select * from dept_partition where month = '202001';

Query across multiple partitions:

select * from dept_partition where month = '202001'
 union
 select * from dept_partition where month = '202002';
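
As an alternative sketch, the same two partitions can usually be read with a single IN predicate on the partition column. Note one difference in behavior: union removes duplicate rows from the combined result, while the query below returns every row as stored.

```sql
-- Read two partitions in one statement by filtering on the partition column.
-- Unlike union, this does not deduplicate rows.
select *
from dept_partition
where month in ('202001', '202002');
```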

5. Add partitions

Add a single partition:

alter table dept_partition add partition(month='202008');

Add multiple partitions:

alter table dept_partition add partition(month='202005') partition(month='202006');

6. Drop partitions

Drop a single partition:

alter table dept_partition drop partition (month='202008');

Drop multiple partitions:

alter table dept_partition drop partition(month='202005'),partition(month='202006');

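Note:

When dropping multiple partitions, the partition specs must be separated by commas; when adding multiple partitions, they are separated by spaces only, with no commas.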
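
Adding a partition that already exists, or dropping one that does not, normally raises an error. A minimal sketch of the same statements made safe to re-run, using Hive's optional IF NOT EXISTS and IF EXISTS clauses:

```sql
-- Adding: specs are separated by spaces; already-existing partitions are skipped.
alter table dept_partition add if not exists
  partition(month='202005') partition(month='202006');

-- Dropping: specs are separated by commas; missing partitions are ignored.
alter table dept_partition drop if exists
  partition(month='202005'), partition(month='202006');
```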

7. View how many partitions a partitioned table has

show partitions dept_partition;
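
show partitions lists the partitions but not how many rows each one holds. One straightforward way to count rows per partition is to group by the partition column, sketched below. Partitions that contain no rows do not appear in the result.

```sql
-- Count the rows in each partition of dept_partition.
select month, count(*) as row_count
from dept_partition
group by month
order by month;
```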

To view the table's structure, including its partition column:

desc formatted dept_partition;

8. Two-level partitioned tables

Create a two-level partitioned table:

create table dept_partition2(
id int, name string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t'
stored as textfile;

Load data into a partition:

load data local inpath '/home/hive/srcdata.txt' into table dept_partition2 partition(month='202011', day='01');

Query the partition's data:

select * from dept_partition2 where month='202011' and day='01';
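
With two partition levels, a filter does not have to name both columns. As a sketch, filtering on month alone reads every day sub-partition under that month, and show partitions accepts a partial partition spec to list them:

```sql
-- Read all day partitions that belong to one month.
select * from dept_partition2 where month='202011';

-- List only the partitions under that month.
show partitions dept_partition2 partition(month='202011');
```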

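9. Three ways to associate data uploaded directly to a partition directory with the partitioned table

1. Upload the data, then repair

First create the partitioned table: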

create table if not exists dept_partition2(
id int, name string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t'
stored as textfile;

Add the partition:

alter table dept_partition2 add partition(month='202001',day='01');

Upload the data:

hadoop fs -put srcdata.txt /user/hive/warehouse/dept_partition2/month=202001/day=01

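Because the partition was registered before the upload, no metadata repair is needed and the data can be queried directly:

select * from dept_partition2 where month='202001' and day='01';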
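
The heading of this method mentions a repair step. For reference, here is a sketch of how that usually looks when the partition directory is created and filled without running add partition first: Hive's msck repair table command scans the table's directory and registers any partition directories missing from the metastore. The partition values month=202004 and day=04 are hypothetical, chosen only for illustration, and the warehouse path assumes the default location.

```sql
-- Run inside the Hive CLI.
-- Hypothetical partition values (202004 / 04), used only for illustration.
dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202004/day=04;
dfs -put /home/hive/srcdata.txt /user/hive/warehouse/dept_partition2/month=202004/day=04;

-- At this point the metastore does not know the partition yet.
-- Repair the metadata so the new directory is registered as a partition.
msck repair table dept_partition2;

-- The uploaded rows should now be visible.
select * from dept_partition2 where month='202004' and day='04';
```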

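2. Upload the data, then add the partition

Alternatively, following the same pattern as the first method, create the directory at the target location first, then upload the data, and add the partition last. The dfs command below is run inside the Hive CLI, while hadoop fs is run from the operating system shell: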

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202002/day=02;

hadoop fs -put srcdata.txt /user/hive/warehouse/dept_partition2/month=202002/day=02/

alter table dept_partition2 add partition(month='202002',day='02');

3. Create the partition directory, then load data into the partition

alter table dept_partition2 add partition(month='202003',day='03');

load data local inpath '/home/hive/srcdata.txt' into table dept_partition2 partition(month='202003',day='03');
