Hive LOAD and External Tables: The LOAD Command in Hive

Reposted

mob64ca140d61c6 2023-09-08 14:45:48

Tags: Hive LOAD and external tables, Hive, big data, HDFS. Category: Hive, Big Data

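Contents:

Chapter 5 DML Data Operations

5.1 Importing Data

5.1.1 Loading Data into a Table (Load)

5.1.1.1 Syntax
5.1.1.2 Hands-On Examples

5.1.2 Inserting Data into a Table from a Query (Insert)
5.1.3 Creating a Table and Loading Data from a Query (As Select)
5.1.4 Specifying the Data Path with Location When Creating a Table
5.1.5 Importing Data into a Hive Table (Import)

5.2 Exporting Data

5.2.1 Exporting with Insert
5.2.2 Exporting to the Local File System with Hadoop Commands (Common)
5.2.3 Exporting with the Hive Shell
5.2.4 Exporting to HDFS with Export
5.2.5 Exporting with Sqoop
5.2.6 Clearing Table Data (Truncate)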

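Chapter 5 DML Data Operations

5.1 Importing Data

There are five ways to import data. The methods in Section 5.1.1 (Load) and Section 5.1.2 (Insert) are the most commonly used; the others only need a passing familiarity.

5.1.1 Loading Data into a Table (Load)

5.1.1.1 Syntax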

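hive> load data [local] inpath 'path/to/data' [overwrite] into table student [partition (partcol1=val1,…)];

(1) local: load the data from the local file system into the Hive table; without it, the data is loaded from HDFS
(2) overwrite: replace the data already in the table; without it, the data is appended
(3) partition: load the data into the specified partition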
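The hands-on examples in this chapter do not exercise the partition clause, so here is a minimal sketch of it. The table name student_par and the partition column month are assumptions made for illustration; the file path reuses datas/student.txt from the examples that follow.

```sql
-- Hypothetical partitioned table, used only for this illustration
create table if not exists student_par(id int, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- Append a local file into a single partition
load data local inpath 'datas/student.txt'
into table student_par partition (month='201709');

-- Replace the contents of that partition only; other partitions are untouched
load data local inpath 'datas/student.txt'
overwrite into table student_par partition (month='201709');

-- List the partitions that now exist
show partitions student_par;
```

With overwrite and a partition clause together, only the named partition should be replaced, which is usually what you want when reloading one month of data.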

5.1.1.2 Hands-On Examples

(0) Create a table: student

hive (default)> create table student(id int, name string)
              > row format delimited fields terminated by '\t';

(1) Load a local file into Hive (the file is on the local file system)

hive (default)> load data local inpath 'datas/student.txt' into table student;

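(2) Load an HDFS file into Hive (the file is on HDFS)

Upload the file to HDFS

hive (default)> dfs -put datas/student.txt /user/xqzhao/hive;

Load the data from HDFS

hive (default)> load data inpath '/user/xqzhao/hive/student.txt' into table student;

(3) Load data and overwrite the data already in the table

Loading from HDFS moves the file into the table's directory, so upload it again before this step.

hive (default)> dfs -put datas/student.txt /user/xqzhao/hive;
hive (default)> load data inpath '/user/xqzhao/hive/student.txt' overwrite into table student;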
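A point that is easy to miss is that load data inpath (without local) moves the HDFS file rather than copying it. The sketch below is one way to observe this from the Hive CLI. It assumes the file has been uploaded to /user/xqzhao/hive and that the table lives under the default warehouse path /user/hive/warehouse/student, which is the path used later in Section 5.2.2.

```sql
-- Before the load: the file sits in the upload directory
dfs -ls /user/xqzhao/hive;

load data inpath '/user/xqzhao/hive/student.txt' into table student;

-- After the load: the file should be gone from the upload directory...
dfs -ls /user/xqzhao/hive;

-- ...and should appear under the table's directory instead
dfs -ls /user/hive/warehouse/student;
```

A load with local, by contrast, copies the local file, so the original stays where it was.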

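5.1.2 Inserting Data into a Table from a Query (Insert)

(1) Create a table

hive (default)> create table student2(id int, name string)
              > row format delimited fields terminated by '\t';

(2) Basic insert

hive (default)> insert into table student2
              > values(1,'xqzhao'),(2,'xinge');

(3) Basic-mode insert (from the result of a query on a single table) (commonly used)

The examples in (3) and (4) filter on a month column, so they assume a source table partitioned by a string column month. It is called student_par here for illustration; the student table created in 5.1.1.2 has no month column.

hive (default)> insert overwrite table student2
              > select id, name from student_par where month='201709';

insert into: appends the data to the table or partition; existing data is not deleted
insert overwrite: replaces the data already in the table or partition
Note: in older Hive versions, insert cannot populate only a subset of columns; newer versions accept an explicit column list

(4) Multi-table (multi-partition) insert mode (from the results of several queries) (rarely useful in practice)

hive (default)> from student_par
              > insert overwrite table student_par partition(month='201707')
              > select id, name where month='201709'
              > insert overwrite table student_par partition(month='201706')
              > select id, name where month='201709';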
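To make the difference between insert into and insert overwrite concrete, the following sketch runs both against student2 and checks the row count after each. The inserted row (3, 'lisi') is made-up sample data, and the counts you see depend on what your tables already contain.

```sql
-- insert into: appends; the rows already in student2 are kept
insert into table student2 values (3, 'lisi');
select count(*) from student2;

-- insert overwrite: student2 now holds only the rows returned by the query
insert overwrite table student2
select id, name from student;
select count(*) from student2;
```

After the second statement, the count for student2 should match the count for student, because the earlier rows were replaced rather than kept.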

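5.1.3 Creating a Table and Loading Data from a Query (As Select)

See Section 4.5.1 on creating tables.
Create a table from a query result (the rows returned by the query are written to the new table)

hive (default)> create table if not exists student3
              > as
              > select id, name from student;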

5.1.4 Specifying the Data Path with Location When Creating a Table

(1) Upload the data to HDFS

`Create the student4 directory`
[xqzhao@hadoop100 hive]$ hadoop fs -mkdir /student4

`Upload student.txt to the student4 directory`
[xqzhao@hadoop100 hive]$ hadoop fs -put datas/student.txt /student4

(2) Create the table and specify its location on HDFS

hive (default)> create external table if not exists student4(id int,name string)
              > row format delimited fields terminated by '\t'
              > location '/student4';

(3) Query the data

hive (default)> select * from student4;
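Because student4 is an external table, Hive only records where the data lives and does not own the files. The sketch below is one way to see that, assuming the table and the /student4 directory were created as in the steps above.

```sql
-- Confirm the table type and location (look for EXTERNAL_TABLE and /student4)
describe formatted student4;

-- Dropping an external table removes only its metadata
drop table student4;

-- The data file should still be listed on HDFS
dfs -ls /student4;

-- Re-creating the table over the same location makes the rows queryable again
create external table if not exists student4(id int, name string)
row format delimited fields terminated by '\t'
location '/student4';
select * from student4;
```

Dropping a managed table would delete the files as well, which is the main practical reason to prefer external tables for data that other tools also use.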

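5.1.5 Importing Data into a Hive Table (Import)

Note: the data must first be exported with export before it can be imported. If the target table already exists, it must be empty.

hive (default)> import table student2
              > from '/user/hive/warehouse/export/student';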

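5.2 Exporting Data

5.2.1 Exporting with Insert

(1) Export the query result to the local file system

hive (default)> insert overwrite local directory 'datas/export/student'
              > select * from student;

(2) Export the query result to the local file system with formatting

hive (default)> insert overwrite local directory 'datas/export/student'
              > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
              > select * from student;

(3) Export the query result to HDFS (without local) (not often used)

hive (default)> insert overwrite directory '/student'
              > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
              > select * from student;

Note: exporting a query result to HDFS this way goes through MapReduce, which is inefficient. Since a table's data already lives on HDFS, it can be copied to another HDFS path directly with the cp command, so this form is not often used.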
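The cp alternative mentioned in the note can be sketched from the Hive CLI as follows. The target directory /backup/student is a hypothetical path chosen for illustration, and the source path assumes the default warehouse location used elsewhere in this chapter.

```sql
-- Create a target directory on HDFS (hypothetical path)
dfs -mkdir -p /backup/student;

-- Copy the table's data file directly, without running a query
dfs -cp /user/hive/warehouse/student/student.txt /backup/student;

-- Check that the copy arrived
dfs -ls /backup/student;
```

This copies the file as stored, so it only helps when you want the whole table in its existing format; filtering or reformatting still needs an insert overwrite directory query.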

5.2.2 Exporting to the Local File System with Hadoop Commands (Common)

hive (default)> dfs -get /user/hive/warehouse/student/student.txt /opt/module/data/export/student3.txt;

5.2.3 Exporting with the Hive Shell

Basic syntax: (hive -f/-e statement-or-script > file)

[xqzhao@hadoop100 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/hive/data/export/student4.txt;

5.2.4 Exporting to HDFS with Export

hive (default)> export table default.student
              > to '/user/hive/warehouse/export/student';

export and import are mainly used to migrate Hive tables between two Hadoop clusters.
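A round trip on a single cluster shows how the two commands fit together. This is a sketch only: student_imported is a hypothetical table name that must not already exist, and the export path is the one used above. For a real migration, the exported directory would first be copied to the other cluster.

```sql
-- Export writes the table's metadata and data files under the target path
export table default.student
to '/user/hive/warehouse/export/student';

-- Inspect what was written
dfs -ls /user/hive/warehouse/export/student;

-- Import into a new table; Hive creates it from the exported metadata
import table student_imported
from '/user/hive/warehouse/export/student';

select * from student_imported;
```

Importing into a new table name avoids the restriction that an existing target table must be empty.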

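5.2.5 Exporting with Sqoop

Covered separately in a later lesson.

5.2.6 Clearing Table Data (Truncate)

Note: Truncate can only clear the data of managed tables; it cannot delete the data in external tables.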

hive (default)> truncate table student;
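For an external table, one possible way to clear the data is to remove the files yourself. The sketch below assumes the external table student4 from Section 5.1.4, with its data file at /student4/student.txt; the exact behavior of truncate on external tables may vary by Hive version.

```sql
-- Expected to be rejected, because student4 is an external table
truncate table student4;

-- One alternative: delete the underlying file on HDFS directly
dfs -rm /student4/student.txt;

-- The table definition remains, but the query should now return no rows
select * from student4;
```

Deleting files this way bypasses Hive entirely, so it is worth double-checking the path before running it.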
