Sqoop安装、使用手册
1 Sqoop简介
Sqoop是一款开源的工具,主要用于在Hadoop(Hive)与传统的数据库(mysql、postgresql...)间进行数据的传递,可以将一个关系型数据库(例如: MySQL ,Oracle ,Postgres等)中的数据导进到Hadoop的HDFS中,也可以将HDFS的数据导进到关系型数据库中。
2 准备
2.1 环境说明
1 Sqoop安装采用tar包的方式.
2 Hadoop版本为hadoop-2.5.0-cdh5.3.3,Hive版本为hive-0.13.1-cdh5.3.3
3 Sqoop下载地址:
a) CDH版
http://archive-primary.cloudera.com/cdh5/cdh/5/
b) Apache版
http://archive.apache.org/dist/sqoop/
2.1.1虚拟机里Linux系统版本
[root@localhost ranger-0.5.0-usersync]# cat /etc/issue | grep Linux
Red Hat Enterprise Linux Server release 6.5 (Santiago)
2.1.2 JDK版本
[root@localhost native]# java -version
java version "1.7.0_67"
Java(TM) SE RuntimeEnvironment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-BitServer VM (build 24.65-b04, mixed mode)
2.1.3mysql版本
[root@localhost native]# mysql -uroot -proot-e"select version()";
Warning: Using a password onthe command line interface can be insecure.
+-----------+
| version() |
+-----------+
| 5.6.14 |
+-----------+
注:
1 Mysql 驱动为mysql-connector-java-5.1.31-bin.jar
2 Oracle驱动为ojdbc14.jar
3
3.1安装Sqoop
1)安装mysql相关服务
rpm tar –zxvf sqoop-1.4.5-cdh5.3.3.tar.gz
vi ~/.bash_profile
#新增环境变量
exportSQOOP_HOME=/root/sqoop-1.4.5-cdh5.3.3
PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin
source ~/.bash_profile
2)配置数据库驱动
#拷贝mysql驱动到sqoop的lib内
cp mysql-connector-java-5.1.31-bin.jar $SQOOP_HOME/lib/
#拷贝oracle驱动到sqoop的lib内
cp ojdbc14.jar $SQOOP_HOME/lib/
3.2 安装Sqoop插件(Oraoop)
1)方法一) sqoop-site里配置参数
<property>
<name>sqoop.connection.factories</name>
<value>org.apache.sqoop.manager.oracle.OraOopManagerFactory,com.cloudera.sqoop.manager.DefaultManagerFactory</value>
</property>
#测试sqoop import时不指定split-by 但m为2个以上时是否报错。
2) 方法二) 如果方法一未生效需要按照插件的方式安装。
tar -zxvf oraoop-1.6.0-cdh4.tgz
chmod +x install.sh
./install.sh
#按照提示完成sqoop安装
#测试通方法一
3.3 安装问题
暂无.
4 配置
4.1 Oraoop插件oralce侧配置
配置主要是在启用插件的sqoop。这里需要在oracle那边开启访问系统表的权限.
#oracle里给如下权限或则角色,推荐采用方法二
方法一)
createsession
selectany dictionary(权限)
方法二)
createsession
select_catalog_role(角色)
方法三)
selecton v_$instance
selecton dba_tables
selecton dba_tab_columns
selecton dba_objects
selecton dba_extents
selecton dba_segments — Required for Sqoop imports only
selecton v_$database — Required for Sqoop imports only
selecton v_$parameter — Required for Sqoop imports only
.
5 使用
5.1 Sqoop常用命令
1) 显示mysql数据库表的信息,一般sqoop安装测试用
sqoop list-databases --connect jdbc:mysql://192.168.56.10:3306/FenSheng_ling_test-username root --password 123456
2)sqoop import导入表到hdfs
sqoop import --verbose --fields-terminated-by ','--connect jdbc:mysql://192.168.56.10:3306/test --username root --password123456 --table test_shen --target-dir /home/hadoop/john/ --split-by'NAME' -m 2
3)sqoop导入视图到hdfs
sqoop import --verbose --fields-terminated-by '~'--connect jdbc:mysql://192.168.56.10:3306/Test --username root --password123456
--query 'SELECT * FROM view_a WHERE $CONDITIONS' -m1 target-dir /home/hadoop/test --delete-target-dir
4)sqoop导入hbase
sqoop import --connect jdbc:oracle:thin:@192.56.56.10:1521:STBIGDB --username BIG --password 123456 --query "select ROWKEY, "a" from(select id||'_'||order_id as ROWKEY, area_id ||'^|'||cust_id '^|' || saler as "a" from BIG.TB_HB where \$CONDITIONS)" --split-by ROWKEY --hbase-create-table --hbase-table test_biguser2 --column-family f1 --hbase-row-key ROWKEY
5)sqoop rac方式导入表至hdfs
sqoop import -D oraoop.oracle.rac.service.name=ods.ahdx-Dmapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"--connect jdbc:oracle:thin:@"(DESCRIPTION = (ADDRESS_LIST = (ADDRESS= (PROTOCOL = TCP)(HOST = 192.168.56.11)(PORT = 1521)) (ADDRESS = (PROTOCOL =TCP)(HOST = 192.168.56.10)(PORT = 1521)) (LOAD_BALANCE = yes) (FAILOVER = on) )(CONNECT_DATA = (SERVICE_NAME = xman.orcl) (SRVR = DEDICATED) ))"--username INNER --password 123456 --table PUT.ORDER --delete-target-dir--target-dir /user/shenl/order --fetch-size 5000 --fields-terminated-by '|'--lines-terminated-by '\001' --null-string '\\N' --null-non-string '\\N' --direct-m 21;
6)sqoop oracle null值转义、特殊字符(\r,\n,\0)过滤
sqoop import --connectjdbc:oracle:thin:@192.168.56.10:1521:EDW --username EDA --password edwedaw--table SHENL -m 1 --warehouse-dir /user/hive/warehouse/ --hive-import--hive-table shenl --as-textfile --hive-drop-import-delims --null-string '\\N'--null-non-string '\\N'
7)sqoop export HDFS数据到oracle
sqoop export -Doraoop.nologging=true-Dsqoop.export.records.per.statement=10 -Dsqoop.export.statements.per.transaction=10--connect jdbc:oracle:thin:@192.168.56.10:1521:EDW --username INT --password123456 --table SHENL --export-dir /user/shenl/ --input-fields-terminated-by'\001' --lines-terminated-by '\006' --input-null-non-string '\\N' --input-null-string'\\N' -m 4
8)sqoop export部分字段到关系型数据库
sqoop export --connect 'jdbc:sqlserver://192.168.56.10;username=sa;password=sasa;database=Test'--table test_shen --export-dir /home/hadoop/test/code.txt--input-fields-terminated-by '~' --columns date,NAME
9)sqoop eval验证工具,可执行SQL返回结果集
sqoop eval --connect"jdbc:oracle:thin:@192.168.56.10:1521:orcl" --username shenl--password shenl -query "select count(*) from ORDER "
10)sqoop oraoop插件之指定oralce分区导入
sqoop import -Dmapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"-Doraoop.import.partitions=T1,T2,T3 --connectjdbc:oracle:thin:@"(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS =(PROTOCOL = TCP)(HOST = 192.168.56.10)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME= orcl)
)
)
" --username shenl --password shenl --table MYTEST--delete-target-dir --target-dir /sqoopTest2 --fields-terminated-by ','--lines-terminated-by '\n' --fetch-size 5000 --hive-drop-import-delims--null-string '\\N' --null-non-string '\\N' -m 2
6 总结
仔细跟踪log,具体问题具体分析.