Problem analysis:

  1. Under the hood, a Hive partitioned table is just a set of HDFS directories, one per partition, each holding data files; exporting Hive data essentially means exporting those HDFS files.
  2. In a Hive partitioned table the (static) partition column is stored in the directory name, not in the data files, so Sqoop cannot export the partition column directly.

Approach: create a temporary table in Hive, copy the partitioned table into it so that the partition column becomes a regular column, then export the tmp table with Sqoop.
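
The HDFS layout makes both points concrete. A quick look at the table's directory (paths follow the warehouse location used in step 8; run this after the loads in step 2; -ls output abbreviated to just the paths):

$ hdfs dfs -ls /user/hive/warehouse/hive_db.db/dept_partition
.../dept_partition/month=2019-10-19
.../dept_partition/month=2019-10-20

$ hdfs dfs -cat /user/hive/warehouse/hive_db.db/dept_partition/month=2019-10-19/*
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700

The month value lives only in the directory name, which is why exporting the files alone cannot carry it.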

The steps are as follows:

1. Create the partitioned table



hive> CREATE TABLE `dept_partition`(
        `deptno` int,
        `dname` string,
        `loc` string)
      PARTITIONED BY (`month` string)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';


1.1 View the table structure



hive> show create table dept_partition;



+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE TABLE `dept_partition`(                     |
|   `deptno` int,                                    |
|   `dname` string,                                  |
|   `loc` string)                                    |
| PARTITIONED BY (                                   |
|   `month` string)                                  |
(output truncated)


2. Load the data



hive> load data inpath '/user/hive/hive_db/data/dept.txt' into table dept_partition partition(month='2019-10-19');

(A statically partitioned table needs an explicit partition clause in LOAD DATA. The month='2019-10-20' partition queried below was loaded the same way; note that LOAD DATA INPATH moves the source file, so dept.txt must be re-uploaded to HDFS before loading the second partition.)



Contents of dept.txt (tab-separated):

10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700
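
To confirm that both partitions are in place before moving on (output shown assuming both loads above succeeded):

hive> show partitions dept_partition;

+--------------------+--+
|     partition      |
+--------------------+--+
| month=2019-10-19   |
| month=2019-10-20   |
+--------------------+--+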


3. Query the dept_partition table



hive> select * from dept_partition;



+------------------------+-----------------------+---------------------+-----------------------+--+
| dept_partition.deptno  | dept_partition.dname  | dept_partition.loc  | dept_partition.month  |
+------------------------+-----------------------+---------------------+-----------------------+--+
| 10                     | ACCOUNTING            | 1700                | 2019-10-19            |
| 20                     | RESEARCH              | 1800                | 2019-10-19            |
| 30                     | SALES                 | 1900                | 2019-10-19            |
| 40                     | OPERATIONS            | 1700                | 2019-10-19            |
| 10                     | ACCOUNTING            | 1700                | 2019-10-20            |
| 20                     | RESEARCH              | 1800                | 2019-10-20            |
| 30                     | SALES                 | 1900                | 2019-10-20            |
| 40                     | OPERATIONS            | 1700                | 2019-10-20            |
+------------------------+-----------------------+---------------------+-----------------------+--+
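
A side note on why the directory layout matters for reads as well: because each partition is its own directory, a filter on month prunes the scan to just that directory instead of reading the whole table:

hive> select * from dept_partition where month='2019-10-19';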


4. Create the temporary table tmp_dept_partition



hive> create table tmp_dept_partition as select * from dept_partition;
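
Two details of this CTAS matter later. select * carries the partition column month along, so it becomes an ordinary column in tmp_dept_partition; and a CTAS table created without an explicit row format uses Hive's default field delimiter '\001', which is exactly what step 8's --input-fields-terminated-by "\001" matches. If a readable delimiter is preferred, it can be set in the CTAS (a variant of the command above, not what the export in step 8 expects; the Sqoop flag would then have to be "\t" instead):

hive> create table tmp_dept_partition
      row format delimited fields terminated by '\t'
      as select * from dept_partition;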


5. Query the temporary table



hive> select * from tmp_dept_partition;



+----------------------------+---------------------------+-------------------------+---------------------------+--+
| tmp_dept_partition.deptno  | tmp_dept_partition.dname  | tmp_dept_partition.loc  | tmp_dept_partition.month  |
+----------------------------+---------------------------+-------------------------+---------------------------+--+
| 10                         | ACCOUNTING                | 1700                    | 2019-10-19                |
| 20                         | RESEARCH                  | 1800                    | 2019-10-19                |
| 30                         | SALES                     | 1900                    | 2019-10-19                |
| 40                         | OPERATIONS                | 1700                    | 2019-10-19                |
| 10                         | ACCOUNTING                | 1700                    | 2019-10-20                |
| 20                         | RESEARCH                  | 1800                    | 2019-10-20                |
| 30                         | SALES                     | 1900                    | 2019-10-20                |
| 40                         | OPERATIONS                | 1700                    | 2019-10-20                |
+----------------------------+---------------------------+-------------------------+---------------------------+--+


6. View the table structure (at this point the table is no longer partitioned; month has become a regular column)



hive> show create table tmp_dept_partition;



+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE TABLE `tmp_dept_partition`(                 |
|   `deptno` int,                                    |
|   `dname` string,                                  |
|   `loc` string,                                    |
|   `month` string)                                  |
(output truncated)
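
Before exporting, the delimiter actually written into the tmp table's files can be double-checked; od -c makes the invisible '\001' bytes visible. The file name 000000_0 is an assumption here (Hive conventionally names CTAS output files this way, but list the directory first to get the real name):

$ hdfs dfs -cat /user/hive/warehouse/hive_db.db/tmp_dept_partition/000000_0 | od -c | head -2
0000000   1   0 001   A   C   C   O   U   N   T   I   N   G 001   1   7
0000020   0   0 001   2   0   1   9   -   1   0   -   1   9  \n   2   0

Each 001 is the default delimiter that the Sqoop command in step 8 must be told about.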


7. Create the dept_partition table in MySQL



mysql> drop table if exists dept_partition;
mysql> create table dept_partition(
         `deptno` int,
         `dname` varchar(20),
         `loc` varchar(20),
         `month` varchar(50));
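
Note that this table must live in the partitionTb database referenced by the Sqoop JDBC URL in step 8; if it does not exist yet, create and select it first:

mysql> create database if not exists partitionTb;
mysql> use partitionTb;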


8. Export to MySQL with Sqoop



bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/partitionTb \
--username root \
--password 123456 \
--table dept_partition \
--num-mappers 1 \
--export-dir /user/hive/warehouse/hive_db.db/tmp_dept_partition \
--input-fields-terminated-by "\001"
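
For reference, Sqoop 1.4.4+ can also export through its HCatalog integration, which reads the Hive metastore directly and treats partition keys as ordinary columns, so the temp-table copy is unnecessary. A sketch, assuming HCatalog is configured on the cluster (--hcatalog-table replaces --export-dir; the two options are mutually exclusive):

bin/sqoop export \
--connect jdbc:mysql://hadoop01:3306/partitionTb \
--username root \
--password 123456 \
--table dept_partition \
--hcatalog-database hive_db \
--hcatalog-table dept_partition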


9. Query MySQL to verify the export succeeded



mysql> select * from dept_partition;



+--------+------------+------+------------+
| deptno | dname      | loc  | month      |
+--------+------------+------+------------+
|     10 | ACCOUNTING | 1700 | 2019-10-19 |
|     20 | RESEARCH   | 1800 | 2019-10-19 |
|     30 | SALES      | 1900 | 2019-10-19 |
|     40 | OPERATIONS | 1700 | 2019-10-19 |
|     10 | ACCOUNTING | 1700 | 2019-10-20 |
|     20 | RESEARCH   | 1800 | 2019-10-20 |
|     30 | SALES      | 1900 | 2019-10-20 |
|     40 | OPERATIONS | 1700 | 2019-10-20 |
+--------+------------+------+------------+
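
As a last sanity check, per-partition row counts can be compared with what Hive returned in step 3:

mysql> select `month`, count(*) as cnt from dept_partition group by `month`;

+------------+-----+
| month      | cnt |
+------------+-----+
| 2019-10-19 |   4 |
| 2019-10-20 |   4 |
+------------+-----+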