Introduction to Data Pump (Data Extraction)

Reposted

mb5fdb0f269f12c 2016-11-16 22:51:00

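Starting with 10g, Oracle provides the more efficient Data Pump (that is, expdp/impdp) for importing and exporting data. The older exp/imp utilities still work, but their use is no longer recommended. Note: expdp/impdp and exp/imp are not compatible with each other. A file exported by exp can only be imported with imp, and a file exported by expdp can only be imported with impdp.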

Components of Data Pump

Data Pump consists of the following three components:

Client tools: expdp/impdp
Data Pump API (DBMS_DATAPUMP)
Metadata API (DBMS_METADATA)

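We usually equate expdp/impdp with Data Pump, but as the list above shows, the client tools are only one component. The real work is done by the two APIs; they sit in the background and rarely get noticed. If you run into puzzling errors (such as internal errors), a frequent cause is that these two APIs have been corrupted or invalidated, and rerunning the scripts that recompile them usually fixes the problem.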
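Before recompiling anything, it can help to confirm whether the two API packages are actually invalid. The query below is a minimal sketch that assumes the session can read DBA_OBJECTS; it only reports status and does not repair anything.

```sql
-- List Data Pump API and Metadata API package objects that are not VALID.
-- Assumption: the current user has SELECT access to DBA_OBJECTS.
SELECT owner, object_name, object_type, status
FROM   dba_objects
WHERE  object_name IN ('DBMS_DATAPUMP', 'DBMS_METADATA')
AND    object_type IN ('PACKAGE', 'PACKAGE BODY')
AND    status <> 'VALID'
ORDER  BY owner, object_name, object_type;
```

If the query returns no rows, both packages are valid and the error probably has another cause.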

Data Pump Roles

By default, users can export and import data in their own schema. To export or import data in other schemas, the user must be granted the following two roles:

DATAPUMP_EXP_FULL_DATABASE
DATAPUMP_IMP_FULL_DATABASE

The SYS and SYSTEM accounts and the DBA role have both of these roles by default.
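Granting the two roles might look like the sketch below. It assumes it is run by a privileged account and uses hr, the example user that appears later in this article, as the grantee. The final query is just one way to confirm the result.

```sql
-- Run as a privileged user (for example SYSTEM).
-- hr is only an example grantee; substitute the real user.
GRANT DATAPUMP_EXP_FULL_DATABASE TO hr;
GRANT DATAPUMP_IMP_FULL_DATABASE TO hr;

-- Confirm which Data Pump roles the user now holds.
SELECT grantee, granted_role
FROM   dba_role_privs
WHERE  grantee = 'HR'
AND    granted_role LIKE 'DATAPUMP%';
```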

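Data Pump Data Import Methods

Data file copying: this is the fastest method. The dump file contains only metadata, and the data files are copied at the operating system level. Related parameters: TRANSPORT_TABLESPACES, TRANSPORTABLE=ALWAYS
Direct path load: this is the fastest method apart from file copying, and it is used unless it is not possible (for example, with BFILE columns)
External tables: used only when neither of the first two methods is available
Conventional path load: used only when none of the methods above is available; its performance is poor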

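Data Pump Job

Running expdp/impdp actually starts a job that performs the export or import. A Data Pump job consists of the following three parts:

Master process: controls the entire job and acts as its coordinator.
Master table: records metadata about the database objects in the dump file. expdp writes it into the dump file when it finishes, and impdp reads it at startup to learn what the dump file contains.
Worker processes: perform the actual export and import work. Several worker processes are created automatically as needed and run in parallel, but never more than the number set by the PARALLEL parameter.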

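Monitoring Job Status

The screen output and the log file both show how the current Data Pump job is progressing. Inside the database you can also query the views DBA_DATAPUMP_JOBS, USER_DATAPUMP_JOBS, and DBA_DATAPUMP_SESSIONS.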
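As an illustration of querying these views, the sketch below lists the current jobs and then maps each job to its database sessions. It assumes the session can read the DBA views and V$SESSION; the column selection is just one reasonable choice.

```sql
-- Current Data Pump jobs with their state and degree of parallelism.
SELECT owner_name, job_name, operation, job_mode, state,
       degree, attached_sessions
FROM   dba_datapump_jobs
ORDER  BY owner_name, job_name;

-- Sessions attached to each job, joined to V$SESSION on the session address.
SELECT d.owner_name, d.job_name, s.sid, s.serial#
FROM   dba_datapump_sessions d
JOIN   v$session s ON s.saddr = d.saddr
ORDER  BY d.owner_name, d.job_name, s.sid;
```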

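For long-running jobs, the dynamic view V$SESSION_LONGOPS shows how much of the job is complete and an estimate of how long the rest will take. The relevant columns have the following meanings: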

USERNAME - job owner
OPNAME - job name
TARGET_DESC - job operation
SOFAR - megabytes transferred thus far during the job
TOTALWORK - estimated number of megabytes in the job
UNITS - megabytes (MB)
MESSAGE - a formatted status message of the form:
'job_name: operation_name : nnn out of mmm MB done'
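A progress query built on the columns above could look like the following sketch. Matching OPNAME against the job names in DBA_DATAPUMP_JOBS is an assumption made here to filter out unrelated long operations, and pct_done is a derived column, not part of the view.

```sql
-- Progress of running Data Pump jobs.
-- Assumption: OPNAME holds the job name, as described in the column list above.
SELECT username, opname, target_desc,
       sofar, totalwork, units,
       ROUND(sofar / totalwork * 100, 1) AS pct_done,
       time_remaining,                    -- estimated seconds remaining
       message
FROM   v$session_longops
WHERE  totalwork > 0                      -- avoid division by zero
AND    sofar < totalwork                  -- only unfinished operations
AND    opname IN (SELECT job_name FROM dba_datapump_jobs);
```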

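Creating a Directory

Unlike exp/imp, which can write its files on the client machine, a Data Pump job always runs on the server, and every file it generates is stored on the server. You must therefore create a directory object in Oracle first. For example: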

SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';

After creating the directory object, grant read and write privileges on it to the user who will run Data Pump, as follows:

SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;
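To check that the directory object and the grant are in place, something like the sketch below can be used. It assumes access to the DBA dictionary views. The names are written in upper case because unquoted identifiers are stored that way.

```sql
-- Confirm the directory object and the OS path it points to.
SELECT directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'DPUMP_DIR1';

-- Confirm the READ and WRITE privileges granted to hr on that directory.
SELECT grantee, privilege
FROM   dba_tab_privs
WHERE  table_name = 'DPUMP_DIR1'
AND    grantee = 'HR';
```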

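Export Modes

There are five export modes. They are mutually exclusive and cannot be combined. Note: some schemas cannot be exported, such as SYS, ORDSYS, and MDSYS.

Full Mode

Set FULL=y (the default is n) to export the entire database.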
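Because expdp is a client for the Data Pump API, a full-mode export can also be started from PL/SQL. The block below is a minimal sketch, not the article's original example: the job name FULL_EXP_DEMO and the file names expfull.dmp and expfull.log are hypothetical, and it assumes the dpump_dir1 directory object created earlier and a user holding DATAPUMP_EXP_FULL_DATABASE.

```sql
DECLARE
  h NUMBER;  -- handle for the Data Pump job
BEGIN
  -- Open an export job in FULL mode (job name is hypothetical).
  h := DBMS_DATAPUMP.OPEN(
         operation => 'EXPORT',
         job_mode  => 'FULL',
         job_name  => 'FULL_EXP_DEMO');

  -- Dump file, written to the server-side directory object.
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h,
    filename  => 'expfull.dmp',
    directory => 'DPUMP_DIR1');

  -- Log file in the same directory.
  DBMS_DATAPUMP.ADD_FILE(
    handle    => h,
    filename  => 'expfull.log',
    directory => 'DPUMP_DIR1',
    filetype  => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

  -- Start the job, then detach; the job keeps running on the server.
  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.DETACH(h);
END;
/
```

After detaching, the job can be followed through DBA_DATAPUMP_JOBS or the log file.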

Commands available in the interactive-command mode of Data Pump Export:

Activity	Command Used
Add additional dump files.	ADD_FILE
Exit interactive mode and enter logging mode.	CONTINUE_CLIENT
Stop the export client session, but leave the job running.	EXIT_CLIENT
Redefine the default size to be used for any subsequent dump files.	FILESIZE
Display a summary of available commands.	HELP
Detach all currently attached client sessions and terminate the current job.	KILL_JOB
Increase or decrease the number of active worker processes for the current job. This command is valid only in the Enterprise Edition of Oracle Database 11g.	PARALLEL
Restart a stopped job to which you are attached.	START_JOB
Display detailed status for the current job and/or set status interval.	STATUS
Stop the current job for later restart.	STOP_JOB

Commands available in the interactive-command mode of Data Pump Import:

Activity	Command Used
Exit interactive-command mode.	CONTINUE_CLIENT
Stop the import client session, but leave the current job running.	EXIT_CLIENT
Display a summary of available commands.	HELP
Detach all currently attached client sessions and terminate the current job.	KILL_JOB
Increase or decrease the number of active worker processes for the current job. This command is valid only in Oracle Database Enterprise Edition.	PARALLEL
Restart a stopped job to which you are attached.	START_JOB
Display detailed status for the current job.	STATUS
Stop the current job.	STOP_JOB