本文主要描述了HDFS和RDBMS数据库之间的数据迁移。 一、创建Link 这里主要描述下Oracle、PostgreSQL、MySQL以及HDFS的link创建。 1.1 JDBC驱动安装 从官网下载Oracle和PostgreSQL的jdbc驱动,拷贝到$SQOOP_HOME/server/lib,然后重启sqoop2服务器。我这里用到的Oracle jdbc驱动为ojdbc6.jar,PostgreSQL为postgresql-42.1.4.jar,MySQL为mysql-connector-java-5.1.44-bin.jar。 1.2 创建HDFS Link

sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: HDFS_Link

HDFS cluster

URI: hdfs://192.168.120.96:9000
Conf directory: /u01/hadoop/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name HDFS_Link

1.3 创建MySQL Link

sqoop:000> create link -c  generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: MySQL_Link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.120.92:3306/smsqw?useSSL=false
Username: root
Password: *********
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose: 
New link was successfully created with validation status OK and name MySQL_Link

1.4 创建PostgreSQL link

sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: PostgreSQL_Link

Database connection

Driver class: org.postgresql.Driver
Connection String: jdbc:postgresql://192.168.120.93:5432/rhndb
Username: rhnuser
Password: ******
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose:  
New link was successfully created with validation status OK and name PostgreSQL_Link

1.5 创建Oracle Link

sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: Oracle_link

Database connection

Driver class: oracle.jdbc.driver.OracleDriver
Connection String: jdbc:oracle:thin:@192.168.120.121:1521:rhndb
Username: spwuser
Password: ******
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose:                                            
New link was successfully created with validation status OK and name Oracle_link

> 注意:关系型数据库在Identifier enclose这里,必须先空格再回车。 二、创建关系型数据库到HDFS作业 2.1 MySQL To HDFS作业

sqoop:000> create job -f MySQL_Link -t HDFS_Link
Creating job for links with from name MySQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: MySQL_To_HDFS

Database source

Schema name: smsqw
Table name: tbMessage
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec: 
Output directory: /user/DataSource
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name MySQL_To_HDFS

2.2 PostgreSQL To HDFS作业

sqoop:000> create job -f PostgreSQL_Link -t HDFS_Link
Creating job for links with from name PostgreSQL_Link and to name HDFS_Link
Please fill following values to create new job object
Name: PostgreSQL_to_HDFS

Database source

Schema name: rhndb
Table name: rhnpackagechangelogdata
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: id					--如果表没有主键,就指定一个字段
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec: 
Output directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name PostgreSQL_to_HDFS

2.3 创建Oracle To HDFS作业

sqoop:000> create job -f Oracle_link -t HDFS_Link
Creating job for links with from name Oracle_link and to name HDFS_Link
Please fill following values to create new job object
Name: Oracle_To_HDFS

Database source

Schema name: spwuser
Table name: RHNCHANNELPACKAGE
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 1
Custom codec: 
Output directory: /user/DataSource/Oracle/RHNCHANNELPACKAGE
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name Oracle_To_HDFS

> 注意:如果表没有主键,则需在"Partition column: "处指定一个字段。 三、创建HDFS To 关系型数据库作业 3.1 HDFS To MySQL作业

sqoop:000> create job -f HDFS_Link -t MySQL_Link
Creating job for links with from name HDFS_Link and to name MySQL_Link
Please fill following values to create new job object
Name: HDFS_To_MySQL

Input configuration

Input directory: /user/DataSource/MySQL/tbDeliverMsg
Override null value: 
Null value: 

Incremental import

Incremental type: 
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date: 

Database target

Schema name: smsqw
Table name: tbDeliverMsg2
Column names: 
There are currently 0 values in the list:
element# 
Staging table: 
Clear stage table: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name HDFS_To_MySQL

3.2 HDFS To PostgreSQL作业

sqoop:000> create job -f HDFS_Link -t PostgreSQL_Link
Creating job for links with from name HDFS_Link and to name PostgreSQL_Link
Please fill following values to create new job object
Name: HDFS_To_PostgreSQL

Input configuration

Input directory: /user/DataSource/PostgreSQL/RHNPACKAGECHANGELOGDATA
Override null value: 
Null value: 

Incremental import

Incremental type: 
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date: 

Database target

Schema name: public
Table name: rhnpackagechangelogdata
Column names: 
There are currently 0 values in the list:
element# 
Staging table: 
Clear stage table: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name HDFS_To_PostgreSQL

3.3 HDFS To Oracle作业

sqoop:000> create job -f HDFS_Link -t Oracle_link
Creating job for links with from name HDFS_Link and to name Oracle_link
Please fill following values to create new job object
Name: HDFS_To_Oracle

Input configuration

Input directory: /user/DataSource/Oracle/QRTZ_CRON_TRIGGERS
Override null value: 
Null value: 

Incremental import

Incremental type: 
  0 : NONE
  1 : NEW_FILES
Choose: 0
Last imported date: 

Database target

Schema name: spwuser
Table name: QRTZ_CRON_TRIGGERS
Column names: 
There are currently 0 values in the list:
element# 
Staging table: 
Clear stage table: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name create job -f HDFS_Link -t Oracle_link

另外,可以查看指定作业的详细信息,如下: 四、运行作业 4.1 MySQL_To_HDFS 4.2 PostgreSQL_To_HDFS 4.3 Oracle_To_HDFS 4.4 HDFS_To_MySQL 4.5 HDFS_To_PostgreSQL 4.6 HDFS_To_Oracle 参考文献: 1、Sqoop2 User Guide