Table of Contents
- Sqoop parameters
- Creating a table in MySQL
- Viewing detailed help for Sqoop commands
- Listing the databases in the local VM's MySQL instance
- Listing the tables in a database on the local VM's MySQL instance
- Importing from MySQL into HDFS and inspecting the data
- Importing from MySQL into HBase and inspecting the data
- Importing from MySQL into Hive and inspecting the data
- Exporting data from HDFS/Hive/HBase to MySQL with Sqoop
Sqoop parameters
-m 　指定启动 number of map tasks to launch; the default is 4
--delete-target-dir 　delete the target directory if it already exists
--mapreduce-job-name 　name for the MapReduce job
--target-dir 　import into the specified directory
--fields-terminated-by 　field delimiter to use between columns
--null-string 　replacement value written when a string column is NULL
--null-non-string 　replacement value written when a non-string column is NULL
--columns 　import only the listed columns of the table
--where 　import only the rows matching a condition
--query 　import the result of a SQL statement; cannot be combined with --table or --columns
--options-file 　read options from a file
>>>>>Exporting HDFS data (or Hive data stored on HDFS) to MySQL
--table 　name of the target table to export into
--input-fields-terminated-by 　field delimiter of the files on HDFS; the default is a comma
--export-dir 　directory containing the data to export
--columns 　columns to export
>>>Importing into Hive
--create-hive-table 　create the target table; fails if it already exists
--hive-database 　Hive database to use
--hive-import 　import into Hive (without this flag the data goes to HDFS only)
--hive-overwrite 　overwrite existing data in the Hive table
--hive-table 　name of the Hive table; defaults to the name of the imported table
--hive-partition-key 　partition column of the Hive table
--hive-partition-value 　partition value to import into
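As a concrete sketch of combining these flags, the hypothetical command below (host hadoop01 and database sqoopdb follow the examples later in this article) uses --query. A free-form query must keep the literal token $CONDITIONS in its WHERE clause, because Sqoop replaces it with each map task's split predicate, and --target-dir becomes mandatory:

```shell
# Hypothetical free-form query import, assembled as a string so the sketch can
# be read without a live cluster. --split-by names the column used to divide
# the work among map tasks; -P prompts for the password interactively.
QUERY='SELECT ID, ACCOUNT FROM user WHERE ID <= 100 AND $CONDITIONS'
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --query \"$QUERY\" \
  --split-by ID \
  --delete-target-dir \
  --target-dir /sqoopdb/query_out"
echo "$SQOOP_CMD"
```

Note that the single quotes around QUERY keep $CONDITIONS literal; Sqoop itself performs the substitution at run time.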
Creating a table in MySQL
#List the databases in MySQL
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb |
| mysql |
| performance_schema |
+--------------------+
4 rows in set (0.11 sec)
#Create a database
mysql> create database sqoopdb;
Query OK, 1 row affected (0.04 sec)
#Verify the database was created
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb |
| mysql |
| performance_schema |
| sqoopdb |
+--------------------+
5 rows in set (0.00 sec)
#Switch to the database (i.e. the database in which to create the table)
mysql> use sqoopdb;
#The "Database changed" message means tables can now be created in it
Database changed
#Create the table (keep the case of the table name consistent to avoid problems later; learned from experience)
mysql> create table user( ID INT(4) NOT NULL AUTO_INCREMENT, ACCOUNT VARCHAR(255) DEFAULT NULL, PASSWD VARCHAR(255) DEFAULT NULL, PRIMARY KEY(ID) );
Query OK, 0 rows affected (0.15 sec)
#Insert data
mysql> INSERT INTO user VALUES("1","admin",'admin');
Query OK, 1 row affected (0.03 sec)
mysql> INSERT INTO user VALUES("2","root",'root');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO user VALUES("3","zfx",'zfx');
Query OK, 1 row affected (0.00 sec)
#View the inserted rows
mysql> select * from user;
+----+---------+--------+
| ID | ACCOUNT | PASSWD |
+----+---------+--------+
|  1 | admin   | admin  |
|  2 | root    | root   |
|  3 | zfx     | zfx    |
+----+---------+--------+
3 rows in set (0.00 sec)
Viewing detailed help for Sqoop commands
[root@hadoop01 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
# As the output suggests, run "sqoop help COMMAND" for details on a specific command
Listing the databases in the local VM's MySQL instance
[root@hadoop01 ~]# sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:26:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:26:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:26:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hivedb
mysql
performance_schema
sqoopdb
Listing the tables in a database on the local VM's MySQL instance
[root@hadoop01 ~]# sqoop list-tables --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:27:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:27:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:27:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
user
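Besides list-databases and list-tables, sqoop eval can run a one-off SQL statement against MySQL and print the result, which is handy for verifying a row count before committing to a full import. A sketch using the same connection details as above, assembled as a string for illustration:

```shell
# Hypothetical pre-flight check with sqoop eval; on a live cluster the command
# would be run directly instead of echoed.
EVAL_CMD="sqoop eval \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --query 'SELECT COUNT(*) FROM user'"
echo "$EVAL_CMD"
```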
Importing from MySQL into HDFS and inspecting the data
Sqoop syntax:
sqoop import
--connect jdbc:mysql://ip:3306/databasename #JDBC URL; databasename is the database (in MySQL or Oracle) to read from
--table tablename #table in the database to read
--username root #user name
--password 123456 #password
--target-dir /path #HDFS directory (note: a directory) where the imported table is stored
--fields-terminated-by '\t' #field delimiter for the imported data; the default is a comma
--lines-terminated-by '\n' #line delimiter for the imported data
-m 1 #number of concurrent map tasks; if unset, 4 map tasks run by default and a split column must be specified to divide the work among them
--where 'condition' #import only the rows matching the condition (a subset of the table)
--incremental append #incremental import
--check-column column_id #reference column for the incremental import
--last-value num #last value of column_id imported by the previous run
--null-string '' #replacement value written when an imported field is NULL
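The three incremental flags work together. A hedged sketch of a follow-up run that appends only rows whose ID exceeds the last imported value (table and directory names follow the examples in this article), assembled as a string for inspection:

```shell
# Hypothetical incremental import: after a first run ended at ID=3, only rows
# with ID > --last-value are appended to the existing target directory.
LAST_VALUE=3
INCR_CMD="sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --table user \
  -m 1 \
  --target-dir /sqoopdb/ \
  --incremental append \
  --check-column ID \
  --last-value $LAST_VALUE"
echo "$INCR_CMD"
```

On a real schedule, Sqoop prints the new last-value at the end of each run, and a saved sqoop job can track it automatically.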
Command walkthrough
sqoop import #import
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
--table user #table in the database to read
--columns 'id ,ACCOUNT,PASSWD' #columns of the MySQL table to import (all columns here)
-m 1 #number of map tasks to launch
--target-dir '/sqoopdb/' #target HDFS directory
Hands-on:
Goal: import the MySQL table user into the /sqoopdb directory on HDFS, then inspect the imported data with HDFS commands
sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table user --columns 'id ,ACCOUNT,PASSWD' -m 1 --target-dir '/sqoopdb/'
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:39:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:39:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:39:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/05/20 09:39:34 INFO tool.CodeGenTool: Beginning code generation
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/app/hadoop
Note: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/05/20 09:39:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.jar
20/05/20 09:39:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/05/20 09:39:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/05/20 09:39:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/05/20 09:39:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/05/20 09:39:40 INFO mapreduce.ImportJobBase: Beginning import of user
20/05/20 09:39:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/05/20 09:39:42 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/05/20 09:39:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.40.128:8032
20/05/20 09:39:49 INFO db.DBInputFormat: Using read commited transaction isolation
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: number of splits:1
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1589936728054_0001
20/05/20 09:39:50 INFO impl.YarnClientImpl: Submitted application application_1589936728054_0001
20/05/20 09:39:50 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1589936728054_0001/
20/05/20 09:39:50 INFO mapreduce.Job: Running job: job_1589936728054_0001
20/05/20 09:40:03 INFO mapreduce.Job: Job job_1589936728054_0001 running in uber mode : false
20/05/20 09:40:03 INFO mapreduce.Job: map 0% reduce 0%
20/05/20 09:40:34 INFO mapreduce.Job: map 100% reduce 0%
20/05/20 09:40:36 INFO mapreduce.Job: Job job_1589936728054_0001 completed successfully
20/05/20 09:40:36 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=124476
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=36
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=24069
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=24069
Total vcore-seconds taken by all map tasks=24069
Total megabyte-seconds taken by all map tasks=24646656
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=52
CPU time spent (ms)=610
Physical memory (bytes) snapshot=84381696
Virtual memory (bytes) snapshot=2078760960
Total committed heap usage (bytes)=16961536
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=36
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Transferred 36 bytes in 54.4531 seconds (0.6611 bytes/sec)
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Inspecting the data with HDFS commands
hadoop fs -ls #lists the current user's home directory
hadoop fs -ls / #lists the HDFS root directory
hdfs dfs -cat (or -text) #prints file contents
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/
Found 2 items
-rw-r--r-- 3 root supergroup 0 2020-05-20 09:40 /sqoopdb/_SUCCESS
-rw-r--r-- 3 root supergroup 36 2020-05-20 09:40 /sqoopdb/part-m-00000
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/part-m-00000
-rw-r--r-- 3 root supergroup 36 2020-05-20 09:40 /sqoopdb/part-m-00000
#This is the data imported into HDFS
[root@hadoop01 ~]# hdfs dfs -cat /sqoopdb/part-m-00000
1,admin,admin
2,root,root
3,zfx,zfx
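The part file is plain text using Sqoop's default comma delimiter. A quick local simulation of the same three rows shows how the fields can be split back out; on the cluster you would pipe `hdfs dfs -cat /sqoopdb/part-m-00000` instead of reading the local file:

```shell
# Recreate the imported part file locally to illustrate its format, then
# count the records and extract the second (ACCOUNT) column.
printf '1,admin,admin\n2,root,root\n3,zfx,zfx\n' > part-m-00000
RECORDS=$(wc -l < part-m-00000)
ACCOUNTS=$(cut -d',' -f2 part-m-00000 | tr '\n' ' ')
echo "records=$RECORDS accounts=$ACCOUNTS"
```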
Importing from MySQL into HBase and inspecting the data
Command walkthrough
sqoop import #import data
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
--table MYUSER #MySQL table to import into HBase
--columns 'ID ,ACCOUNT,PASSWD' #columns of the MySQL table to import
-m 1 #number of map tasks to launch
--hbase-table user #name of the HBase table
--hbase-row-key ID #column to use as the row key, typically the MySQL primary key
--column-family info #HBase column family to write into
Hands-on:
Goal:
Import the MySQL table MYUSER into the HBase table user (which must be created in HBase beforehand), then inspect it from the HBase shell
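Since the target HBase table must exist before the import, it can be created from the HBase shell first. A sketch, with the DDL held in a variable for illustration (table and column family names match the import command):

```shell
# Hypothetical pre-creation of the HBase table 'user' with column family 'info'.
HBASE_DDL="create 'user', 'info'"
echo "$HBASE_DDL"
# Live cluster equivalent: echo "$HBASE_DDL" | hbase shell
```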
[root@hadoop01 ~]# sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID ,ACCOUNT,PASSWD' -m 1 --hbase-table user --hbase-row-key ID --column-family info
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
................
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 167.842 seconds (0 bytes/sec)
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Retrieved 2 records.
Inspecting the data from the HBase shell
#List the tables in HBase
hbase(main):003:0> list
TABLE
scores
user
2 row(s) in 6.7600 seconds
=> ["scores", "user"]
#View the data imported from MySQL into HBase via Sqoop
hbase(main):004:0> scan 'user'
ROW COLUMN+CELL
1 column=info:ACCOUNT, timestamp=1589971903320, value=admin
1 column=info:PASSWD, timestamp=1589971903320, value=admin
2 column=info:ACCOUNT, timestamp=1589971903320, value=ROOT
2 column=info:PASSWD, timestamp=1589971903320, value=ROOT
2 row(s) in 5.8630 seconds
Importing from MySQL into Hive and inspecting the data
Command walkthrough
sqoop import #import data
--hive-import #import into Hive
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
--table MYUSER #MySQL table to import into Hive
--columns 'ID,ACCOUNT,PASSWD' #columns of the MySQL table to import (note the space after --columns; omitting it causes the parse error shown below)
-m 1 #number of map tasks to launch
--hive-table myuser #name of the Hive table; created automatically if it does not exist
Hands-on
Goal: import the MySQL table MYUSER into the Hive table myuser, then inspect it with Hive commands
[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns'ID,ACCOUNT,PASSWD' -m 1 --hive-table myuser
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/21 12:31:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/21 12:31:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --columnsID,ACCOUNT,PASSWD
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: USER
Try --help for usage instructions.
# The first attempt failed because there was no space between --columns and its value; rerun with the space added:
[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID,ACCOUNT,PASSWD' -m 1 --hive-table USER
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Transferred 26 bytes in 36.2284 seconds (0.7177 bytes/sec)
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Retrieved 2 records.
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table MYUSER
20/05/21 12:33:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `MYUSER` AS t LIMIT 1
20/05/21 12:33:27 INFO hive.HiveImport: Loading uploaded data into Hive
20/05/21 12:33:45 INFO hive.HiveImport:
20/05/21 12:33:45 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/hive-common-1.2.1.jar!/hive-log4j.properties
.......
20/05/21 12:33:58 INFO hive.HiveImport: OK
20/05/21 12:33:58 INFO hive.HiveImport: Time taken: 2.803 seconds
20/05/21 12:33:58 INFO hive.HiveImport: Loading data to table default.user
20/05/21 12:33:59 INFO hive.HiveImport: Table default.user stats: [numFiles=1, totalSize=26]
20/05/21 12:33:59 INFO hive.HiveImport: OK
20/05/21 12:33:59 INFO hive.HiveImport: Time taken: 1.299 seconds
20/05/21 12:34:00 INFO hive.HiveImport: Hive import complete.
20/05/21 12:34:00 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
Inspecting the imported data in Hive
#List the tables
hive> show tables;
OK
mysql1
myuser
stu2
student
user
Time taken: 0.035 seconds, Fetched: 5 row(s)
#View the rows
hive> select * from myuser;
OK
1 admin admin
2 ROOT ROOT
Time taken: 0.74 seconds, Fetched: 2 row(s)
hive>
Exporting data from HDFS/Hive/HBase to MySQL with Sqoop
sqoop export #export
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
-m 1 #number of map tasks to launch
--table user1 #MySQL table to export into (must be created beforehand)
--export-dir /sqoop/user #HDFS directory containing the data to export
sqoop export --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root -m 1 --table user1 --export-dir /sqoop/user
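Two things commonly trip up exports: the MySQL target table must already exist with a matching schema, and --input-fields-terminated-by must match the delimiter actually used in the files under --export-dir. A hedged sketch, assembled as a string; the comma delimiter here matches the default output of the import examples above:

```shell
# Hypothetical export command; on a live cluster it would be run directly.
EXPORT_CMD="sqoop export \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  -m 1 \
  --table user1 \
  --export-dir /sqoop/user \
  --input-fields-terminated-by ','"
echo "$EXPORT_CMD"
```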