If you run EXPDP with PARALLEL=4, you should supply at least four dump files, otherwise the requested parallelism cannot actually be used. EXPDP dedicates one worker process to exporting metadata while the remaining workers export table data at the same time. For each table data object, Data Pump divides the estimated size by 250 MB and rounds the result down to decide how many parallel execution (PX) processes to use, so a table of about 500 MB gets two and one of about 1000 MB gets four; if the result is 0 or 1, the table is unloaded by a single process. In practice, include the %U substitution variable in DUMPFILE so that multiple files are generated.

IMPDP works somewhat differently: one worker process first imports the metadata, and only then do multiple workers start loading data, so early in the job you will only see a single worker importing metadata. IMPDP with PARALLEL=4 likewise needs >= 4 dump files, and %U can be used for the import as well.

nohup expdp system/**** PARALLEL=4 JOB_NAME=full_bak_job full=y dumpfile=exptest:back_%U.dmp logfile=exptest:back.log &
impdp system/*** PARALLEL=4 EXCLUDE=STATISTICS JOB_NAME=full_imp cluster=no full=y dumpfile=test:back_%U.dmp logfile=test:back_imp.log
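
If a job was started with too few dump files and some workers sit idle, you can attach to the running job and add files or adjust the parallelism interactively. A minimal sketch, assuming the FULL_BAK_JOB job above (back_extra_%U.dmp is a hypothetical new file template; ADD_FILE, PARALLEL, and CONTINUE_CLIENT are standard interactive-mode commands):

expdp system/**** ATTACH=full_bak_job
Export> ADD_FILE=exptest:back_extra_%U.dmp
Export> PARALLEL=4
Export> CONTINUE_CLIENT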

Starting with 11gR2, EXPDP and IMPDP worker processes can be started on multiple RAC instances, so the DIRECTORY must reside on shared storage; if shared storage is not available, specify cluster=no to keep the job on one instance and avoid errors.
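
On RAC this means the directory object must map to a path that every instance can see, such as ACFS or NFS. A minimal sketch (the /acfs/expdp path is hypothetical):

CREATE OR REPLACE DIRECTORY exptest AS '/acfs/expdp';
GRANT READ, WRITE ON DIRECTORY exptest TO system;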

Watching the EXPDP/IMPDP workers with the interactive STATUS command looks like this:

Import> status
Job: FULL_IMP
  Operation: IMPORT                        
  Mode: FULL                          
  State: EXECUTING                     
  Bytes Processed: 150,300,713,536
  Percent Done: 80
  Current Parallelism: 6
  Job Error Count: 0
  Dump File: /expdp/back_%u.dmp
  Dump File: /expdp/back_01.dmp
  Dump File: /expdp/back_02.dmp
  Dump File: /expdp/back_03.dmp
  Dump File: /expdp/back_04.dmp
  Dump File: /expdp/back_05.dmp
  Dump File: /expdp/back_06.dmp
  Dump File: /expdp/back_07.dmp
  Dump File: /expdp/back_08.dmp
 
Worker 1 Status:
  Process Name: DW00
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: T_PLY_UNDRMSG
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 3
  Completed Rows: 3,856,891
  Completed Bytes: 1,134,168,200
  Percent Done: 83
  Worker Parallelism: 1
 
Worker 2 Status:
  Process Name: DW01
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: T_FIN_PAYDUE
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 5
  Completed Rows: 2,646,941
  Completed Bytes: 1,012,233,224
  Percent Done: 93
  Worker Parallelism: 1
 
Worker 3 Status:
  Process Name: DW02
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: MLOG$_T_FIN_CLMDUE
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 6
  Completed Bytes: 382,792,584
  Worker Parallelism: 1
 
Worker 4 Status:
  Process Name: DW03
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: T_PAY_CONFIRM_INFO
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 5
  Completed Rows: 2,443,790
  Completed Bytes: 943,310,104
  Percent Done: 83
  Worker Parallelism: 1
 
Worker 5 Status:
  Process Name: DW04
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: T_PLY_TGT
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 6
  Completed Rows: 2,285,353
  Completed Bytes: 822,501,496
  Percent Done: 64
  Worker Parallelism: 1
 
Worker 6 Status:
  Process Name: DW05
  State: EXECUTING                     
  Object Schema: ACRUN
  Object Name: T_FIN_PREINDRCT_CLMFEE
  Object Type: DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
  Completed Objects: 5
  Completed Rows: 6,042,384
  Completed Bytes: 989,435,088
  Percent Done: 79
  Worker Parallelism: 1
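
The same job can also be watched from another SQL*Plus session through the standard dictionary views, for example:

SELECT owner_name, job_name, operation, state, degree
  FROM dba_datapump_jobs;

SELECT s.sid, s.serial#, s.program
  FROM v$session s, dba_datapump_sessions d
 WHERE s.saddr = d.saddr;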

The passage from the Oracle documentation on which this is based reads as follows:

For Data Pump Export, the value that is specified for the parallel parameter should be less than or equal to the number of files in the dump file set. Each worker or Parallel Execution Process requires exclusive access to the dump file, so having fewer dump files than the degree of parallelism will mean that some workers or PX processes will be unable to write the information they are exporting. If this occurs, the worker processes go into an idle state and will not be doing any work until more files are added to the job. See the explanation of the DUMPFILE parameter in the Database Utilities guide for details on how to specify multiple dump files for a Data Pump export job.

For Data Pump Import, the workers and PX processes can all read from the same files. However, if there are not enough dump files, the performance may not be optimal because multiple threads of execution will be trying to access the same dump file. The performance impact of multiple processes sharing the dump files depends on the I/O subsystem containing the dump files. For this reason, Data Pump Import should not have a value for the PARALLEL parameter that is significantly larger than the number of files in the dump file set.

In a typical export that includes both data and metadata, the first worker process will unload the metadata: tablespaces, schemas, grants, roles, tables, indexes, and so on. This single worker unloads the metadata, and all the rest unload the data, all at the same time. If the metadata worker finishes and there are still data objects to unload, it will start unloading the data too. The examples in this document assume that there is always one worker busy unloading metadata while the rest of the workers are busy unloading table data objects.

If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data object. It does this by dividing the estimated size of the table data object by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used to unload the table.
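
For example, a table data object estimated at 1000 MB gets floor(1000/250) = 4 PX processes and one of about 500 MB gets 2, while a 400 MB table yields floor(400/250) = 1, so it is unloaded by a single worker without PX processes.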

The PARALLEL parameter works a bit differently in Import than Export. Because there are various dependencies that exist when creating objects during import, everything must be done in order. For Import, no data loading can occur until the tables are created because data cannot be loaded into tables that do not yet exist.