1.  References


Reference documentation:

http://sqoop.apache.org/

http://sqoop.apache.org/docs/1.99.7/admin/Installation.html

2.  Installation Environment

2.1.  Environment

A pseudo-distributed environment: CentOS 7.4 with Hadoop 2.7.5.

Hostname                  | NameNode     | SecondaryNameNode | DataNodes
--------------------------+--------------+-------------------+-------------
centoshadoop.smartmap.com | 192.168.1.80 | 192.168.1.80      | 192.168.1.80

Hadoop installation directory: /opt/hadoop/hadoop-2.7.5

3.  Installation

Sqoop 2 consists of two parts: a client and a server. The server is installed on a single node of the cluster, and that node then serves as the entry point for all Sqoop clients; Hadoop must be installed on the server node. Clients can be installed on any number of machines, and Hadoop is not required on client-only machines.
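Clients talk to this single entry point over HTTP. As a rough sketch only (assuming the default port 12000 and webapp name sqoop that are used later in this guide, and assuming the server's /version REST endpoint is exposed), a client-side reachability check might look like:

```python
# Minimal client-side reachability check for a Sqoop 2 server.
# Assumptions: default port 12000, webapp path "sqoop", and the
# /version REST endpoint -- verify against your server's REST API docs.
from urllib.request import urlopen
from urllib.error import URLError

def server_url(host: str, port: int = 12000, webapp: str = "sqoop") -> str:
    """Build the base URL that sqoop2-shell's `set server` points at."""
    return f"http://{host}:{port}/{webapp}/"

def is_reachable(host: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers its version endpoint."""
    try:
        with urlopen(server_url(host) + "version", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```

A check like this is handy before starting the interactive shell, since a wrong host or a stopped server otherwise only surfaces as an error on the first shell command.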


3.1.  Download Sqoop

http://sqoop.apache.org/

3.2.  Unpack Sqoop

Extract the downloaded sqoop-1.99.7-bin-hadoop200.tar.gz into /opt/hadoop/sqoop-1.99.7.

4.  Configuration

4.1.  Update the Hadoop Configuration

4.1.1.  Edit core-site.xml

Add proxy-user entries so that the sqoop2 and root users may impersonate other users:

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml

<property>
    <name>hadoop.proxyuser.sqoop2.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.sqoop2.groups</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
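The four blocks above all follow Hadoop's hadoop.proxyuser.&lt;user&gt;.{hosts,groups} naming convention. As a convenience sketch (our own helper, not part of Sqoop or Hadoop), the snippets can be generated for any list of proxy users:

```python
# Convenience sketch (not a Sqoop/Hadoop utility): generate the
# proxy-user <property> blocks for core-site.xml. Property names follow
# Hadoop's hadoop.proxyuser.<user>.{hosts,groups} convention.
def proxyuser_properties(users, hosts="*", groups="*"):
    """Return core-site.xml <property> snippets for each proxy user."""
    blocks = []
    for user in users:
        for suffix, value in (("hosts", hosts), ("groups", groups)):
            blocks.append(
                "<property>\n"
                f"    <name>hadoop.proxyuser.{user}.{suffix}</name>\n"
                f"    <value>{value}</value>\n"
                "</property>"
            )
    return "\n".join(blocks)

print(proxyuser_properties(["sqoop2", "root"]))
```

Restricting `hosts`/`groups` to something narrower than `*` is generally preferable on a shared cluster; `*` matches what this guide uses.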


 


4.1.2.  Edit container-executor.cfg

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/container-executor.cfg

allowed.system.users=sqoop2,hadoop,root

4.1.3.  Edit yarn-site.xml

[root@centoshadoop hadoop]# vi /opt/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml

4.2.  Create a Directory for Third-Party JDBC Drivers

[root@centoshadoop sqoop-1.99.7]# mkdir /opt/hadoop/sqoop-1.99.7/jdbcDriver
[root@centoshadoop sqoop-1.99.7]# chmod g+rwx,o+rwx /opt/hadoop/sqoop-1.99.7/jdbcDriver

4.3.  Upload the JDBC Driver

Copy mysql-connector-java-5.1.46.jar into /opt/hadoop/sqoop-1.99.7/jdbcDriver.

4.4.  Edit the profile File

vi /etc/profile

# export JAVA_HOME=/opt/java/jdk1.7.0_45
export JAVA_HOME=/opt/java/jdk1.8.0_171
# export CLASSPATH=.:$JAVA_HOME/lib
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PIG_HOME=/opt/hadoop/pig-0.17.0
export HIVE_HOME=/opt/hadoop/hive-2.3.3

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH=$PATH:$PIG_HOME/bin
export PATH=$PATH:$HIVE_HOME/bin

export SQOOP_HOME=/opt/hadoop/sqoop-1.99.7
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/server/lib:$SQOOP_HOME/shell/lib:$SQOOP_HOME/tools/lib
export SQOOP_SERVER_EXTRA_LIB=/opt/hadoop/sqoop-1.99.7/jdbcDriver/

# Append rather than reassign, so the Sqoop entries above are not discarded
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$HIVE_HOME/lib
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:/usr/local/lib:$HADOOP_HOME/lib/native
export JAVA_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_LIBRARY_PATH
export HADOOP_USER_NAME=hadoop

4.5.  Upgrade the JDK to Version 1.8

Switch the JDK to version 1.8 and update every variable that references JAVA_HOME.

4.6.  Edit the Sqoop Configuration Files

Point Sqoop's MapReduce submission engine at the Hadoop configuration directory:

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/conf/sqoop.properties

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/hadoop-2.7.5/etc/hadoop/

Then set the Hadoop component home directories in the launcher script:

[root@centoshadoop bin]# vi /opt/hadoop/sqoop-1.99.7/bin/sqoop.sh

export HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
export HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
export HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn

5.  Start Hadoop

5.1.  Start YARN and HDFS

cd /opt/hadoop/hadoop-2.7.5/sbin
start-all.sh

5.2.  Start the History Server

cd /opt/hadoop/hadoop-2.7.5/sbin
mr-jobhistory-daemon.sh start historyserver

6.  Verify the Sqoop Installation and Configuration

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool verify
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
        Version: 1.99.7
        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
1   [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
6   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

7.  Initialize the Sqoop Repository

[root@centoshadoop sqoop-1.99.7]# sqoop2-tool upgrade
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Sqoop tool executor:
        Version: 1.99.7
        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
0   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hive-2.3.3/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
[root@centoshadoop sqoop-1.99.7]#

8.  Using the Sqoop Tools

8.1.  Server

8.1.1.  Start and Stop sqoop2-server

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/
[root@centoshadoop bin]# sqoop2-server start
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Starting the Sqoop2 server...

[root@centoshadoop bin]# sqoop2-server stop
Setting conf dir: /opt/hadoop/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/hadoop/sqoop-1.99.7
Stopping the Sqoop2 server...
Sqoop2 server stopped.

[root@centoshadoop bin]# jps
29780 NameNode
30085 SecondaryNameNode
30247 ResourceManager
31767 SqoopJettyServer
29913 DataNode
32107 Jps
30364 NodeManager
29245 JobHistoryServer
[root@centoshadoop bin]#

8.2.  Client

8.2.1.  Start the Interactive Sqoop Shell

[root@centoshadoop bin]# cd /opt/hadoop/sqoop-1.99.7/bin/
[root@centoshadoop bin]# sqoop2-shell
sqoop:000> :exit

8.2.2.  Enable Verbose Error Output

sqoop:000> set option --name verbose --value true
Verbose option was changed to true

8.2.3.  Connect to the Sqoop Server

sqoop:000> set server -host 192.168.1.80
Server is set successfully

# To specify all options explicitly: set server --host ubuntu02 --port 12000 --webapp sqoop

sqoop:000> show version --all
client version:
 Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
 Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
server version:
 Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
 Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]

8.2.4.  Create a Link Object for MySQL

sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>

sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql-link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
Username: root
Password: ***
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect

Identifier enclose:       # enter a space here
New link was successfully created with validation status OK and name mysql-link
sqoop:000>

8.2.5.  Create a Link Object for HDFS

sqoop:000> create link -connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link

HDFS cluster

URI: hdfs://192.168.1.80:9000
Conf directory:
Additional configs: /opt/hadoop/hadoop-2.7.5/etc/hadoop
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs-link
sqoop:000>

8.2.6.  Show the Link Objects

sqoop:000> show link --all
2 link(s) to show:
link with name mysql-link (Enabled: true, Created by root at 5/23/18 3:48 PM, Updated by root at 5/23/18 3:48 PM)
Using Connector generic-jdbc-connector with name {1}
  Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://192.168.1.80:3306/sqooptest?useSSL=false
    Username: root
    Password:
    Fetch Size:
    Connection Properties:
      protocol = tcp
  SQL Dialect
    Identifier enclose: `
link with name hdfs-link (Enabled: true, Created by root at 5/23/18 4:00 PM, Updated by root at 5/23/18 4:00 PM)
Using Connector hdfs-connector with name {1}
  HDFS cluster
    URI: hdfs://192.168.1.80:9000
    Conf directory:
    Additional configs::
sqoop:000>

8.2.7.  Create a Job That Imports MySQL Data into HDFS

8.2.7.1. Prepare the Data

create database sqooptest character set utf8;

CREATE TABLE topic (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(50) DEFAULT NULL,
    address varchar(100) DEFAULT NULL,
    time datetime DEFAULT NULL,
    content varchar(500) DEFAULT NULL,
    PRIMARY KEY(id)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=UTF8;

INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', 'hello world');
INSERT INTO topic(name, address, time, content) VALUES ('11', '111', '2018-05-01 08:00:00', NULL);

8.2.7.2. Create the Job (MySQL → HDFS)

sqoop:000> create job -f mysql-link -t hdfs-link
Creating job for links with from name mysql-link and to name hdfs-link
Please fill following values to create new job object
Name: mysql2hdfs

Database source

Schema name: sqooptest   # required: the database name
Table name: topic   # required: the table name
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column: id
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /sqoop/sqooptest/mysqltable/    # required
Append mode:

Throttling resources

Extractors:
Loaders:

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name mysql2hdfs
sqoop:000>
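The Partition column (id above) is what lets Sqoop parallelize the import: the range between the column's minimum and maximum value is divided among the extractors, each of which queries its own slice. The following is a simplified illustration of that splitting idea, not Sqoop's actual implementation:

```python
# Simplified sketch of splitting a partition column's [lo, hi] range
# into per-extractor query ranges. This mirrors the idea behind the
# generic JDBC connector's partitioning, not its actual code.
def split_range(lo: int, hi: int, extractors: int):
    """Return (start, end) pairs that together cover [lo, hi]."""
    total = hi - lo + 1
    base, extra = divmod(total, extractors)
    ranges, start = [], lo
    for i in range(extractors):
        size = base + (1 if i < extra else 0)  # spread the remainder
        if size == 0:
            break
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# e.g. ids 1..10 shared by 3 extractors
print(split_range(1, 10, 3))  # → [(1, 4), (5, 7), (8, 10)]
```

This is also why the partition column should be evenly distributed and ideally non-null: a skewed column leaves some extractors with most of the work.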


 


8.2.7.3. Start the Job

Create the output directory on HDFS first and make it writable:

[root@centoshadoop sbin]# hadoop fs -mkdir -p /sqoop/sqooptest/mysqltable
[root@centoshadoop sbin]# hadoop fs -chmod -R 777 /sqoop

sqoop:000> show job
+----+------------+-------------------------------------+----------------------------+---------+
| Id |    Name    |           From Connector            |        To Connector        | Enabled |
+----+------------+-------------------------------------+----------------------------+---------+
| 1  | mysql2hdfs | mysql-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true    |
+----+------------+-------------------------------------+----------------------------+---------+
sqoop:000>

sqoop:000> set option --name verbose --value true
Verbose option was changed to true
sqoop:000> start job -name mysql2hdfs
Submission details
Job Name: mysql2hdfs
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2018-05-24 08:57:14 CST
Lastly updated by: root
External ID: job_1527123411152_0002
        http://centoshadoop:8088/proxy/application_1527123411152_0002/
Source Connector schema: Schema{name=  sqooptest  .  topic  ,columns=[
        FixedPoint{name=id,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
        Text{name=name,nullable=true,type=TEXT,charSize=null},
        Text{name=address,nullable=true,type=TEXT,charSize=null},
        Date{name=time,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
        Text{name=content,nullable=true,type=TEXT,charSize=null}]}
2018-05-24 08:57:14 CST: BOOTING  - Progress is not available
sqoop:000>

8.2.7.4. Check the Result

[root@centoshadoop sbin]# hadoop fs -ls /sqoop/sqooptest/mysqltable
Found 1 items
-rw-r--r--   1 root supergroup         97 2018-05-24 08:57 /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt
[root@centoshadoop sbin]# hadoop fs -cat /sqoop/sqooptest/mysqltable/363e7159-49c3-40b2-937e-897acf2ad80b.txt
3,'11','111','2018-05-01 08:00:00.000','hello world'
4,'11','111','2018-05-01 08:00:00.000',NULL
[root@centoshadoop sbin]#
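In the TEXT_FILE format chosen above, each exported row is comma-separated, text and datetime fields are single-quoted, and SQL NULL is written as the bare token NULL. A small sketch (our own parser, not a Sqoop utility) that reads such lines back into Python values:

```python
import csv

# Parse rows exported in Sqoop's TEXT_FILE format as shown above:
# comma-separated fields, single-quoted text values, and SQL NULL
# written as the bare token NULL. Illustrative only, not a Sqoop API.
def parse_export_line(line: str):
    fields = next(csv.reader([line], quotechar="'"))
    return [None if f == "NULL" else f for f in fields]

row = parse_export_line("3,'11','111','2018-05-01 08:00:00.000','hello world'")
print(row)  # → ['3', '11', '111', '2018-05-01 08:00:00.000', 'hello world']
```

Using a CSV reader with a single-quote quotechar, rather than a plain split on commas, keeps text fields that themselves contain commas intact.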