Using Sqoop

DanielMaster 2022-03-01 14:24:42

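© Copyright belongs to the author: this is an original work by 51CTO blog author DanielMaster. Contact the author for permission to reprint; otherwise legal action will be pursued.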

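Contents

Data import

1. MySQL -> HDFS

1.1 Import to the default HDFS path
1.2 Specify the HDFS path and delimiter
1.3 Specify columns
1.4 Specify a filter condition
1.5 Import with a SQL query

2. MySQL -> Hive

2.1 Import into the default Hive database and table
2.2 Specify the Hive database and table

3. MySQL -> HBase
4. Incremental import

Data export

1. HDFS -> MySQL
2. Hive -> MySQL
3. HBase -> MySQL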

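Data import

1. MySQL -> HDFS

1.1 Import to the default HDFS path

Import the help_keyword table from MySQL into the default HDFS path.
Default HDFS path: /user/hadoop/help_keyword/part-m-00000 (that is, /user/<user>/<table>, where hadoop is the user running Sqoop)
Default field delimiter: ,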

sqoop import \
--connect jdbc:mysql://hadoop03:3306/mysql \
--username root  \
--password 123456   \
--table help_keyword   \
-m 1
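
As a rough illustration of the default layout described above, the following sketch prints the files a run would be expected to leave in HDFS. The function name `default_part_files` is hypothetical, and it assumes one `part-m-NNNNN` file per map task under `/user/<user>/<table>`, which matches the path given above for `-m 1`.

```shell
#!/bin/sh
# Hypothetical helper: list the part files expected under Sqoop's default
# target directory (/user/<user>/<table>) for a given number of map tasks.
default_part_files() {
    user=$1; table=$2; mappers=$3
    i=0
    while [ "$i" -lt "$mappers" ]; do
        printf '/user/%s/%s/part-m-%05d\n' "$user" "$table" "$i"
        i=$((i + 1))
    done
}

# One map task (-m 1) yields a single part file.
default_part_files hadoop help_keyword 1
```

Calling it with `2` as the last argument lists both `part-m-00000` and `part-m-00001`, which is what to look for after a run with `-m 2`.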

1.2 Specify the HDFS path and delimiter

target-dir: the target path in HDFS
fields-terminated-by: the delimiter between fields

sqoop import   \
--connect jdbc:mysql://hadoop03:3306/mysql   \
--username root  \
--password 123456   \
--table help_keyword   \
--target-dir /user/sqoop/data/my_help_keyword1  \
--fields-terminated-by '\t'  \
-m 2

1.3 Specify columns

columns: the MySQL table columns to import

sqoop import   \
--connect jdbc:mysql://hadoop03:3306/mysql   \
--username root  \
--password 123456   \
--columns "name" \
--table help_keyword  \
--target-dir /user/sqoop/data/my_help_keyword2  \
-m 1

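1.4 Specify a filter condition

where: the filter condition applied on the MySQL side
For example, the import command in this section corresponds to this SQL: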

select name from help_keyword where help_keyword_id>150;

sqoop import   \
--connect jdbc:mysql://hadoop03:3306/mysql   \
--username root  \
--password 123456   \
--columns "name"   \
--where "help_keyword_id>150" \
--table help_keyword   \
--target-dir /user/sqoop/data/my_help_keyword3  \
-m 1

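1.5 Import with a SQL query

query: the SQL query to import; its WHERE clause must contain the $CONDITIONS placeholder
split-by: the column used to split the data among map tasks (prefer the primary key)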

sqoop import   \
--connect jdbc:mysql://hadoop03:3306/mysql  \
--username root  \
--password 123456   \
--target-dir /user/sqoop/data/my_help_keyword4  \
--query 'select help_keyword_id,name from help_keyword where help_keyword_id>200 and $CONDITIONS' \
--split-by  help_keyword_id \
--fields-terminated-by '\t'  \
-m 4
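
To make the role of `$CONDITIONS` and `--split-by` more concrete, here is a simplified sketch of how a free-form query could be turned into one query per map task. It assumes an integer split column, known minimum and maximum values, and evenly sized ranges. The function `split_queries` and the boundary values 201 and 400 are hypothetical, and Sqoop's actual splitter may choose different boundaries.

```shell
#!/bin/sh
# The query is single-quoted so the shell passes the literal token
# $CONDITIONS through instead of expanding it as a variable.
QUERY='select help_keyword_id,name from help_keyword where help_keyword_id>200 and $CONDITIONS'

# Hypothetical helper: split [min, max] into at most n ranges and substitute
# each range predicate for $CONDITIONS, printing one query per map task.
split_queries() {
    col=$1; min=$2; max=$3; n=$4
    step=$(( (max - min + n) / n ))   # ceil((max - min + 1) / n)
    lo=$min
    while [ "$lo" -le "$max" ]; do
        hi=$((lo + step - 1))
        if [ "$hi" -gt "$max" ]; then
            hi=$max
        fi
        cond="$col >= $lo AND $col <= $hi"
        printf '%s\n' "$QUERY" | sed 's/[$]CONDITIONS/'"$cond"'/'
        lo=$((hi + 1))
    done
}

# Assumed bounds for illustration only: ids 201..400 split across 4 tasks.
split_queries help_keyword_id 201 400 4
```

If the query were wrapped in double quotes instead, the placeholder would need to be written as `\$CONDITIONS` so that the shell does not expand it to an empty string.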

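2. MySQL -> Hive

2.1 Import into the default Hive database and table

Internally, the import proceeds as follows:

First, the data is imported into the default HDFS path, /user/hadoop/help_keyword.
A table is created in Hive, in the default database unless specified otherwise.
The HDFS data is loaded into the Hive table; by default a table named help_keyword (the same name as in MySQL) is created automatically in default.
Because the first step writes to /user/hadoop/help_keyword, delete the directory left by the earlier import before running this one: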
hadoop fs -rm -r -f /user/hadoop/help_keyword

sqoop import \
--connect jdbc:mysql://hadoop03:3306/mysql \
--username root \
--password 123456 \
--table help_keyword \
--hive-import \
-m 1

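2.2 Specify the Hive database and table

The database must already exist in Hive; the table can be created automatically.

hive-overwrite: overwrite the existing data in the Hive table
create-hive-table: create the table in Hive (the job fails if the table already exists)
hive-database: the Hive database name
hive-table: the Hive table name
delete-target-dir: delete the target directory in HDFS if it already exists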

sqoop import  \
--connect jdbc:mysql://hadoop03:3306/mysql  \
--username root  \
--password 123456  \
--table help_keyword  \
--fields-terminated-by "\t"  \
--lines-terminated-by "\n"  \
--hive-import  \
--hive-overwrite  \
--create-hive-table  \
--delete-target-dir \
--hive-database  mydb_test \
--hive-table new_help_keyword

3. MySQL -> HBase

The HBase table must be created manually beforehand, in the HBase shell:

create "test_sqoop","info"

hbase-table: the HBase table name
column-family: the column family to write to
hbase-row-key: the MySQL column whose value becomes the HBase row key

sqoop import \
--connect jdbc:mysql://hadoop03:3306/mysql \
--username root \
--password 123456 \
--table help_keyword \
--hbase-table test_sqoop \
--column-family info \
--hbase-row-key help_keyword_id
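
One way to picture the mapping described above: each MySQL row becomes one HBase row whose key is the `--hbase-row-key` column, and each remaining column becomes a cell in the `info` column family. The sketch below prints an equivalent HBase shell `put` command for a comma-separated `id,name` row. The function `row_to_puts` and the sample row are hypothetical, and it assumes the row key column is not stored again as a cell.

```shell
#!/bin/sh
# Hypothetical helper: show one "id,name" row as an HBase shell put command.
# Assumes the first field is the row key and is not repeated as a cell.
row_to_puts() {
    table=$1; family=$2; row=$3
    key=${row%%,*}      # text before the first comma: help_keyword_id
    name=${row#*,}      # text after the first comma: name
    printf "put '%s', '%s', '%s:name', '%s'\n" "$table" "$key" "$family" "$name"
}

# Made-up sample row, not taken from the real help_keyword table.
row_to_puts test_sqoop info '1,SAMPLE_KEYWORD'
```

After the import, running `scan 'test_sqoop'` in the HBase shell is a quick way to compare the stored cells against this picture.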

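4. Incremental import

Full import: every run imports all the data in the table.
Incremental import: every run imports only the newly added data.
check-column: the reference column for detecting new rows, usually the primary key
incremental: the incremental mode; append imports rows whose check-column value is greater than last-value
last-value: the largest check-column value from the previous import
This is similar to the following SQL: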

select * from help_keyword where help_keyword_id>300;

sqoop import   \
--connect jdbc:mysql://hadoop03:3306/mysql   \
--username root  \
--password 123456   \
--table help_keyword  \
--target-dir /user/sqoop/data/my_help_keyword5  \
--incremental  append  \
--check-column  help_keyword_id \
--last-value 300  \
-m 1
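
Since each incremental run needs the final value from the previous run, one option is to keep it in a small state file between runs. The sketch below only assembles and prints the command rather than running it. The state file path and the functions `read_last_value` and `build_incremental_cmd` are hypothetical. Sqoop also offers saved jobs (`sqoop job`) that keep track of this value between runs.

```shell
#!/bin/sh
# Hypothetical state file holding the last imported help_keyword_id.
STATE_FILE=${STATE_FILE:-/tmp/help_keyword.last_value}

# Print the stored value, or 0 when no previous run has been recorded.
read_last_value() {
    if [ -s "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo 0
    fi
}

# Print (not run) the incremental import command for the given last value.
build_incremental_cmd() {
    printf '%s ' sqoop import \
        --connect jdbc:mysql://hadoop03:3306/mysql \
        --username root --password 123456 \
        --table help_keyword \
        --target-dir /user/sqoop/data/my_help_keyword5 \
        --incremental append \
        --check-column help_keyword_id \
        --last-value "$1" \
        -m 1
    printf '\n'
}

build_incremental_cmd "$(read_last_value)"
# After a successful run, write the new maximum id to "$STATE_FILE".
```

Printing the command first makes it easy to confirm the `--last-value` being used before any data is appended to the target directory.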

Data export

1. HDFS -> MySQL

The MySQL database and table must both be created manually beforehand:

create database test_sqoop;
use test_sqoop;
CREATE TABLE sqoopstudent ( 
   id INT NOT NULL PRIMARY KEY, 
   name VARCHAR(20), 
   sex VARCHAR(20),
   age INT,
   department VARCHAR(20)
);

export-dir: the path of the HDFS file to export
fields-terminated-by: the delimiter between fields in the HDFS file

sqoop export \
--connect jdbc:mysql://hadoop03:3306/test_sqoop  \
--username root \
--password 123456 \
--table sqoopstudent \
--export-dir /hive_data/student.txt \
--fields-terminated-by ','
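
Because the export maps each comma-separated field to a column of `sqoopstudent`, it can help to check a local copy of the file before exporting. The sketch below counts lines that do not have exactly five fields or whose `id` is not an integer. The function `count_bad_rows` and the sample rows are made up for illustration and are not taken from the original `student.txt`.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that do not fit the sqoopstudent
# layout (id,name,sex,age,department): five fields with an integer id.
count_bad_rows() {
    awk -F',' 'NF != 5 || $1 !~ /^[0-9]+$/ { bad++ } END { print bad + 0 }'
}

# Made-up sample rows; the second one is missing the department field.
printf '%s\n' \
    '1,alice,F,20,CS' \
    '2,bob,M,21' \
    '3,carol,F,22,MA' | count_bad_rows
```

A non-zero count suggests fixing the file first, since rows that do not match the table layout are a common cause of failed export tasks.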

2. Hive -> MySQL

Create the MySQL table manually:

CREATE TABLE uv_info ( 
   id INT NOT NULL PRIMARY KEY, 
   name VARCHAR(200)
);

export-dir: the HDFS storage path of the Hive table
input-fields-terminated-by: the field delimiter used in the Hive table's HDFS files

sqoop export \
--connect jdbc:mysql://hadoop03:3306/test_sqoop \
--username root \
--password 123456 \
--table uv_info \
--export-dir /user/hive/warehouse/mydb_test.db/new_help_keyword \
--input-fields-terminated-by '\t'