Hive partition variables: partitions in Hive

Reposted

mob6454cc636c54 2023-07-12 18:49:21


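Hive partitions:

1. A Hive SELECT query normally scans the entire table, which wastes a lot of time on unnecessary work. Often only a subset of the table's data is of interest, so Hive lets you define partitions when a table is created.

2. A partitioned table is a table whose partition space is specified at creation time.

3. To create a partitioned table, use the optional PARTITIONED BY clause in the CREATE TABLE statement; see the table-creation syntax for details.

Creating partitions (managed table): specify them at table creation with partitioned by (fieldName dataType)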

hive> create table partition_test(id int,foo_id int) partitioned by(ds string) row format delimited fields terminated by ' ' stored as textfile;
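A table can also be partitioned by more than one column. The following is a minimal sketch, not taken from the original article: the table name page_view_demo and its columns are hypothetical, chosen to mirror the (dt, country) partition spec that appears later in this post.

```sql
-- Hypothetical table, for illustration only.
-- Partition columns are declared in PARTITIONED BY and must not be
-- repeated in the regular column list.
CREATE TABLE IF NOT EXISTS page_view_demo (
  view_time INT,
  user_id   BIGINT,
  page_url  STRING
)
PARTITIONED BY (dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```

With two partition columns the directories nest, in the form dt=<value>/country=<value> under the table directory.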

Load data from HDFS:

hive> load data inpath '/input/data/price.txt' overwrite into table partition_test partition(ds='20151105');

View the data:

hive> select * from partition_test;

Check how the directory changed:

[root@hadoop02 ~]# hadoop fs -ls /user/hive/warehouse

drwxr-xr-x - root supergroup 0 2015-11-04 18:53 /user/hive/warehouse/partition_test

One level deeper:

[root@hadoop02 ~]# hadoop fs -ls /user/hive/warehouse/partition_test
drwxr-xr-x - root supergroup 0 2015-11-04 18:53 /user/hive/warehouse/partition_test/ds=20151105

The listing shows that creating the partition simply added a directory named ds=20151105 under partition_test. Next, look at what is inside it:

[root@hadoop02 jdk]# hadoop fs -ls /user/hive/warehouse/partition_test/ds=20151105

-rwxr-xr-x 2 root supergroup 70 2015-10-13 22:57 /user/hive/warehouse/partition_test/ds=20151105/price.txt

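The price.txt data sits under the partition directory. A partition is therefore nothing more than a new subdirectory inside the table's directory, and the data file keeps the name of the file that was specified when loading.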

View the data:

[root@hadoop02 jdk]# hadoop fs -text /user/hive/warehouse/partition_test/ds=20151105/price.txt

Query in Hive:

hive> select * from partition_test p where ds='20151105';
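To confirm from inside Hive what the directory listings above show, the partition metadata can be inspected directly. This is a sketch that assumes the partition_test table and the ds='20151105' partition created above; the exact output layout depends on the Hive version.

```sql
-- List the partitions registered for the table, one line per partition.
SHOW PARTITIONS partition_test;

-- Show details for one partition, including its Location on HDFS.
DESCRIBE FORMATTED partition_test PARTITION (ds='20151105');

-- Show the query plan; filtering on the partition column lets Hive
-- read only the matching partition directory instead of the whole table.
EXPLAIN SELECT * FROM partition_test WHERE ds='20151105';
```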

Partitions, part 2 (external table):

hive> create external table partition_test2(id int,context string) partitioned by (ds string) row format delimited fields terminated by ' ' stored as textfile;

Load data (the LOCAL keyword means the path is on the local file system, not HDFS):

hive> load data local inpath '/root/Downloads/hive_big2' overwrite into table partition_test2 partition (ds='20151105');

Checking the directory changes works the same way as for the managed table above.

Query the data in Hive:

hive> select * from partition_test2;


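Partitioning (splitting one table's data into partitions)

Prerequisites: 1. A table to split

2. A partitioned table to receive the data

Create the table to split: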

hive> create external table p_test3(id int,context string) row format delimited fields terminated by ' ' stored as textfile;

Load data:

hive> load data local inpath '/root/Downloads/hive_big2' overwrite into table p_test3;

Create the receiving table:

hive> create table p_test4(id int,context string) partitioned by (ds string) row format delimited fields terminated by ' ' stored as textfile;

Insert data into the receiving table:

hive> insert into table p_test4 partition(ds='1') select id, context from p_test3 where id=1;

hive> insert into table p_test4 partition(ds='2') select id, context from p_test3 where id=2;

Query the receiving table:
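Writing one INSERT per partition value becomes tedious when there are many values. Hive's dynamic partitioning can derive the partition from the data instead. The sketch below assumes the p_test3 and p_test4 tables created above; deriving ds from id is an assumption made for illustration, and it reproduces the two static inserts.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
-- nonstrict mode allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the SELECT list.
INSERT INTO TABLE p_test4 PARTITION (ds)
SELECT id, context, CAST(id AS STRING) AS ds
FROM p_test3
WHERE id IN (1, 2);
```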

hive> select * from p_test4;

Check how p_test4's directory changed:

[root@hadoop02 Downloads]# hadoop fs -ls /user/hive/warehouse/p_test4

Partition operations:

Syntax for adding partitions (from the official documentation):

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] PARTITION partition_spec [LOCATION 'location2'] ...;

partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)

The location must be a directory inside of which data files reside. (ADD PARTITION changes the table metadata, but does not load data. If the data does not exist in the partition's location, queries will not return any results.)

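Usage: once a partition has been added with alter table table_name add partition, data can be loaded into it. (The location of the partition directory is not obvious, and drop partition deletes that directory.)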
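The point that ADD PARTITION only changes metadata can be checked with a short experiment. This is a sketch assuming the managed table p_test4 from above; the partition value '6' is hypothetical and chosen only because nothing has been loaded into it.

```sql
-- Register a new, empty partition (metadata only, no data is loaded).
ALTER TABLE p_test4 ADD IF NOT EXISTS PARTITION (ds='6');

-- The partition now appears in the metadata...
SHOW PARTITIONS p_test4;

-- ...but the query returns no rows until data files are placed
-- in the partition's location.
SELECT * FROM p_test4 WHERE ds='6';
```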

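Example of loading data into a new partition:

hive> load data inpath '/input/data/add_partition.txt' into table p_test4 partition(ds='5');

The directory change can be seen with:

[root@hadoop02 Downloads]# hadoop fs -ls /user/hive/warehouse/p_test4

The listing shows the newly created directory ds=5.

Note the steps for adding partitions to an external table (adding them the same way as above causes problems; for an external table the location must be specified first):

Upload data to HDFS for the steps below: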

[root@hadoop02 Downloads]# hadoop fs -mkdir -p /data/input

[root@hadoop02 Downloads]# hadoop fs -put add_partition.txt /data/input/

Create the external table:

hive> create external table p_t3(id int,context string) partitioned by (dt string) row format delimited fields terminated by ' ' stored as textfile location '/input/data';

A query returns no data at this point, because no partition has been added yet.

Now add the partitions:

hive> alter table p_t3 add partition(dt='20151107') location '/input/data/';

hive> alter table p_t3 add partition(dt='20151108') location '/data/input/';

Check how the directory changed:

[root@hadoop02 Downloads]# hadoop fs -ls /user/hive/warehouse

There is no p_t3 directory. This shows that an external table does not move the data; it only creates a mapping to it.

View the data on HDFS:

[root@hadoop02 Downloads]# hadoop fs -ls /data/input/

-rw-r--r-- 2 root supergroup 21 2015-11-06 05:03 /data/input/add_partition.txt

[root@hadoop02 Downloads]# hadoop fs -ls /input/data

-rw-r--r-- 2 root supergroup 21 2015-11-06 04:53 /input/data/add_partition.txt

The listings above also show that the data was not moved.

Add multiple partitions at once:
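The mapping between an external partition and its HDFS path can also be inspected from Hive. This is a sketch assuming the p_t3 table and the dt='20151108' partition added above.

```sql
-- The Location field in the output shows the directory the partition
-- points to (the path given in ADD PARTITION ... LOCATION).
DESCRIBE FORMATTED p_t3 PARTITION (dt='20151108');

-- Rows are read in place from that directory; no files are copied
-- into the warehouse directory.
SELECT * FROM p_t3 WHERE dt='20151108';
```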

ALTER TABLE page_view ADD PARTITION (dt= '2008-08-08' , country= 'us' ) location '/path/to/us/part080808'

PARTITION (dt= '2008-08-09' , country= 'us' ) location '/path/to/us/part080809' ;

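Recover all partitions (this statement is the Amazon EMR syntax; it registers partitions found on storage and does not remove anything):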

ALTER TABLE table_name RECOVER PARTITIONS;
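RECOVER PARTITIONS is specific to Amazon EMR's version of Hive. In Apache Hive the equivalent command is MSCK REPAIR TABLE. The sketch below uses a hypothetical external table named events_ext, assumed to have partition directories in the dt=<value> form under its table location that are not yet registered in the metastore.

```sql
-- Hypothetical table name, for illustration only.
-- Scans the table location and adds metastore entries for partition
-- directories that exist on storage but are not yet registered.
MSCK REPAIR TABLE events_ext;

-- Verify which partitions are registered afterwards.
SHOW PARTITIONS events_ext;
```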

Drop specific partitions:

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
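A concrete instance of the syntax above, as a sketch that assumes the p_test4 (managed) and p_t3 (external) tables from earlier in this post. The behaviour described in the comments is the usual default and may differ with trash or purge settings.

```sql
-- Managed table: drops the partition metadata and its data directory.
ALTER TABLE p_test4 DROP IF EXISTS PARTITION (ds='5');

-- External table: by default only the metadata is dropped; the files
-- at the partition's location stay on HDFS.
ALTER TABLE p_t3 DROP IF EXISTS PARTITION (dt='20151107');
```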

Rename a partition:

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
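A concrete instance of the rename syntax, as a sketch that assumes the p_test4 table and its ds='1' partition from earlier; the new value '01' is hypothetical.

```sql
-- Change the partition value from '1' to '01'.
ALTER TABLE p_test4 PARTITION (ds='1') RENAME TO PARTITION (ds='01');

-- Confirm the new partition name.
SHOW PARTITIONS p_test4;
```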
