Problem analysis:

  1. Under the hood, a Hive partitioned table is just a set of HDFS directories, one per partition, each holding data files; exporting Hive data essentially means exporting those HDFS files.
  2. In a Hive partitioned table the (static) partition column is stored in the directory name, not in the data files, so Sqoop cannot export the partition column directly.

Approach: create a temporary table in Hive, copy the partitioned table into it so that the partition column becomes a regular column, then export the tmp table with Sqoop.
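
The HDFS layout makes both points concrete. A quick look at the table's directory (paths follow the warehouse location used in step 8; run this after the loads in step 2; -ls output abbreviated to just the paths):

$ hdfs dfs -ls /user/hive/warehouse/hive_db.db/dept_partition
.../dept_partition/month=2019-10-19
.../dept_partition/month=2019-10-20

$ hdfs dfs -cat /user/hive/warehouse/hive_db.db/dept_partition/month=2019-10-19/*
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700

The month value lives only in the directory name, which is why exporting the files alone cannot carry it.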

The steps are as follows:

1. Create the partitioned table



hive> CREATE TABLE `dept_partition`(
        `deptno` int,
        `dname` string,
        `loc` string)
      PARTITIONED BY (`month` string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


1.1 View the table structure



hive> show create table dept_partition;



+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE TABLE `dept_partition`(                     |
|   `deptno` int,                                    |
|   `dname` string,                                  |
|   `loc` string)                                    |
| PARTITIONED BY (                                   |
|   `month` string)                                  |
(output truncated)


2. Load the data



hive> load data inpath '/user/hive/hive_db/data/dept.txt' into table dept_partition partition(month='2019-10-19');

(A statically partitioned table needs an explicit partition clause in LOAD DATA. The month='2019-10-20' partition queried below was loaded the same way; note that LOAD DATA INPATH moves the source file, so dept.txt must be re-uploaded to HDFS before loading the second partition.)



Contents of dept.txt (tab-separated):

10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700
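
To confirm that both partitions are in place before moving on (output shown assuming both loads above succeeded):

hive> show partitions dept_partition;

+--------------------+--+
|     partition      |
+--------------------+--+
| month=2019-10-19   |
| month=2019-10-20   |
+--------------------+--+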


3. Query the dept_partition table



hive> select * from dept_partition;



+------------------------+-----------------------+---------------------+-----------------------+--+
| dept_partition.deptno  | dept_partition.dname  | dept_partition.loc  | dept_partition.month  |
+------------------------+-----------------------+---------------------+-----------------------+--+
| 10                     | ACCOUNTING            | 1700                | 2019-10-19            |
| 20                     | RESEARCH              | 1800                | 2019-10-19            |
| 30                     | SALES                 | 1900                | 2019-10-19            |
| 40                     | OPERATIONS            | 1700                | 2019-10-19            |
| 10                     | ACCOUNTING            | 1700                | 2019-10-20            |
| 20                     | RESEARCH              | 1800                | 2019-10-20            |
| 30                     | SALES                 | 1900                | 2019-10-20            |
| 40                     | OPERATIONS            | 1700                | 2019-10-20            |
+------------------------+-----------------------+---------------------+-----------------------+--+
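
A side note on why the directory layout matters for reads as well: because each partition is its own directory, a filter on month prunes the scan to just that directory instead of reading the whole table:

hive> select * from dept_partition where month='2019-10-19';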


4. Create the temporary table tmp_dept_partition



hive> create table tmp_dept_partition as select * from dept_partition;
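
Two details of this CTAS matter later. select * carries the partition column month along, so it becomes an ordinary column in tmp_dept_partition; and a CTAS table created without an explicit row format uses Hive's default field delimiter '\001', which is exactly what step 8's --input-fields-terminated-by "\001" matches. If a readable delimiter is preferred, it can be set in the CTAS (a variant of the command above, not what the export in step 8 expects; the Sqoop flag would then have to be "\t" instead):

hive> create table tmp_dept_partition
      row format delimited fields terminated by '\t'
      as select * from dept_partition;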


5. Query the temporary table



hive> select * from tmp_dept_partition;



+----------------------------+---------------------------+-------------------------+---------------------------+--+
| tmp_dept_partition.deptno  | tmp_dept_partition.dname  | tmp_dept_partition.loc  | tmp_dept_partition.month  |
+----------------------------+---------------------------+-------------------------+---------------------------+--+
| 10                         | ACCOUNTING                | 1700                    | 2019-10-19                |
| 20                         | RESEARCH                  | 1800                    | 2019-10-19                |
| 30                         | SALES                     | 1900                    | 2019-10-19                |
| 40                         | OPERATIONS                | 1700                    | 2019-10-19                |
| 10                         | ACCOUNTING                | 1700                    | 2019-10-20                |
| 20                         | RESEARCH                  | 1800                    | 2019-10-20                |
| 30                         | SALES                     | 1900                    | 2019-10-20                |
| 40                         | OPERATIONS                | 1700                    | 2019-10-20                |
+----------------------------+---------------------------+-------------------------+---------------------------+--+


6. View the table structure (at this point the table is no longer partitioned; month has become a regular column)



hive> show create table tmp_dept_partition;



+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE TABLE `tmp_dept_partition`(                 |
|   `deptno` int,                                    |
|   `dname` string,                                  |
|   `loc` string,                                    |
|   `month` string)                                  |
(output truncated)
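
Before exporting, the delimiter actually written into the tmp table's files can be double-checked; od -c makes the invisible '\001' bytes visible. The file name 000000_0 is an assumption here (Hive conventionally names CTAS output files this way, but list the directory first to get the real name):

$ hdfs dfs -cat /user/hive/warehouse/hive_db.db/tmp_dept_partition/000000_0 | od -c | head -2
0000000   1   0 001   A   C   C   O   U   N   T   I   N   G 001   1   7
0000020   0   0 001   2   0   1   9   -   1   0   -   1   9  \n   2   0

Each 001 is the default delimiter that the Sqoop command in step 8 must be told about.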


7. Create the dept_partition table in MySQL



mysql> drop table if exists dept_partition;
mysql> create table dept_partition(
         `deptno` int,
         `dname` varchar(20),
         `loc` varchar(20),
         `month` varchar(50));
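
Note that this table must live in the partitionTb database referenced by the Sqoop JDBC URL in step 8; if it does not exist yet, create and select it first:

mysql> create database if not exists partitionTb;
mysql> use partitionTb;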


8. Export to MySQL with Sqoop



bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/partitionTb \
--username root \
--password 123456 \
--table dept_partition \
--num-mappers 1 \
--export-dir /user/hive/warehouse/hive_db.db/tmp_dept_partition \
--input-fields-terminated-by "\001"
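
For reference, Sqoop 1.4.4+ can also export through its HCatalog integration, which reads the Hive metastore directly and treats partition keys as ordinary columns, so the temp-table copy is unnecessary. A sketch, assuming HCatalog is configured on the cluster (--hcatalog-table replaces --export-dir; the two options are mutually exclusive):

bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/partitionTb \
--username root \
--password 123456 \
--table dept_partition \
--hcatalog-database hive_db \
--hcatalog-table dept_partition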


9. Query MySQL to verify the export succeeded



mysql> select * from dept_partition;



+--------+------------+------+------------+
| deptno | dname      | loc  | month      |
+--------+------------+------+------------+
|     10 | ACCOUNTING | 1700 | 2019-10-19 |
|     20 | RESEARCH   | 1800 | 2019-10-19 |
|     30 | SALES      | 1900 | 2019-10-19 |
|     40 | OPERATIONS | 1700 | 2019-10-19 |
|     10 | ACCOUNTING | 1700 | 2019-10-20 |
|     20 | RESEARCH   | 1800 | 2019-10-20 |
|     30 | SALES      | 1900 | 2019-10-20 |
|     40 | OPERATIONS | 1700 | 2019-10-20 |
+--------+------------+------+------------+
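
As a last sanity check, per-partition row counts can be compared with what Hive returned in step 3:

mysql> select `month`, count(*) as cnt from dept_partition group by `month`;

+------------+-----+
| month      | cnt |
+------------+-----+
| 2019-10-19 |   4 |
| 2019-10-20 |   4 |
+------------+-----+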