The Data Pump Export and Import utilities are designed especially for very large databases. If your database contains far more data than metadata, then you should see faster data unload and load operations than with the original Export and Import utilities. (Performance of metadata extraction and database object creation in Data Pump Export and Import remains essentially equivalent to that of the original Export and Import utilities.)
The improved performance of the Data Pump Export and Import utilities is attributable to several factors, including the following:
Multiple worker processes can perform intertable and interpartition parallelism to load and unload tables in multiple, parallel, direct-path streams.
For very large tables and partitions, single worker processes can choose intrapartition parallelism through multiple parallel queries and parallel DML I/O server processes when the external tables method is used to access data.
Data Pump uses parallelism to build indexes and load package bodies.
Dump files are read and written directly by the server and, therefore, do not require any data movement to the client.
The dump file storage format is the internal stream format of the direct path API. This format is very similar to the format stored in Oracle database data files inside of tablespaces. Therefore, no client-side conversion to INSERT statement bind variables is performed.
The supported data access methods, direct path and external tables, are faster than conventional SQL. The direct path API provides the fastest single-stream performance. The external tables feature makes efficient use of the parallel queries and parallel DML capabilities of the Oracle database.
Metadata and data extraction can be overlapped during export.
Data Pump is designed to fully use all available resources to maximize throughput and minimize elapsed job time. For this to happen, a system must be well balanced across CPU, memory, and I/O. In addition, standard performance tuning principles apply. For example, for maximum performance you should ensure that the files that are members of a dump file set reside on separate disks, because the dump files are written and read in parallel. Also, the disks should not be the same ones on which the source or target tablespaces reside.
Any performance tuning activity involves making trade-offs between performance and resource consumption.
The Data Pump Export and Import utilities let you dynamically increase and decrease resource consumption for each job. This is done using the Data Pump PARALLEL parameter to specify a degree of parallelism for the job. For maximum throughput, do not set PARALLEL to much more than twice the number of CPUs (two workers for each CPU).
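The guideline above can be sketched in a few lines of Python. This is only an illustration: the helper name `suggested_max_parallel` is hypothetical, and `os.cpu_count()` reports the CPUs of the machine running the script, which is assumed here to be the database server.

```python
import os


def suggested_max_parallel(cpu_count=None, workers_per_cpu=2):
    """Return a rule-of-thumb ceiling for the Data Pump PARALLEL parameter.

    Hypothetical helper that applies the "two workers for each CPU"
    guideline; it does not query the database in any way.
    """
    if cpu_count is None:
        # os.cpu_count() can return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return cpu_count * workers_per_cpu


if __name__ == "__main__":
    # Print a value that could be placed in a Data Pump parameter file.
    print(f"PARALLEL={suggested_max_parallel()}")
```

Treat the result as an upper bound rather than a target, because memory and I/O bandwidth may limit the useful degree of parallelism before CPU does.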
As you increase the degree of parallelism, CPU usage, memory consumption, and I/O bandwidth usage also increase. You must ensure that adequate amounts of these resources are available. If necessary, you can distribute files across different disk devices or channels to get the needed I/O bandwidth.
To maximize parallelism, you must supply at least one file for each degree of parallelism. The simplest way of doing this is to use substitution variables in your file names (for example, file%u.dmp). However, depending upon your disk setup (for example, simple, non-striped disks), you might not want to put all dump files on one device. In this case, it is best to specify multiple file names using substitution variables, with each in a separate directory resolving to a separate disk. Even with fast CPUs and fast disks, the path between the CPU and the disk may be the constraining factor in the degree of parallelism that can be sustained.
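As a minimal sketch of the multiple-directory approach, the following Python helper assembles a DUMPFILE value with one substitution-variable template per directory object. The helper name `build_dumpfile_parameter` and the directory object names DPUMP_DIR1 and DPUMP_DIR2 are hypothetical; each directory object is assumed to already exist and to resolve to a separate disk.

```python
def build_dumpfile_parameter(directory_objects, base_name="file"):
    """Build a DUMPFILE value with one file name template per directory.

    directory_objects: names of Oracle directory objects, each assumed to
    map to a separate disk (the names used below are hypothetical).
    """
    if not directory_objects:
        raise ValueError("at least one directory object is required")
    # directory_object:file_name places a dump file in a specific directory.
    # The %u substitution variable lets Data Pump generate as many files as
    # the job needs from each template.
    templates = [
        f"{directory}:{base_name}{index}_%u.dmp"
        for index, directory in enumerate(directory_objects, start=1)
    ]
    return "DUMPFILE=" + ",".join(templates)


if __name__ == "__main__":
    print(build_dumpfile_parameter(["DPUMP_DIR1", "DPUMP_DIR2"]))
    # prints: DUMPFILE=DPUMP_DIR1:file1_%u.dmp,DPUMP_DIR2:file2_%u.dmp
```

Giving each template a distinct prefix keeps the generated file names unambiguous when the dump file set is later collected for import.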
The Data Pump PARALLEL parameter is valid only in the Enterprise Edition of Oracle Database 11g or later.
The use of Data Pump parameters related to compression and encryption can have a positive effect on performance, particularly in the case of jobs performed in network mode. But you should be aware that there can also be a negative effect on performance because of the additional CPU resources required to perform transformations on the raw data. There are trade-offs on both sides.
Data Pump Export dump files that are created with a release prior to 12.1, and that contain large amounts of statistics data, can cause an import operation to use large amounts of memory. To avoid running out of memory during the import operation, be sure to allocate enough memory before beginning the import. The exact amount of memory needed will depend upon how much data you are importing, the platform you are using, and other variables unique to your configuration.
One way to avoid this problem altogether is to set the Data Pump EXCLUDE=STATISTICS parameter on either the export or import operation. You can then use the DBMS_STATS PL/SQL package to regenerate the statistics on the target database after the import has completed.
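The two steps described above might look like the following Python sketch, which only generates text and does not connect to a database. The helper names, the directory object DPUMP_DIR1, the dump file name, and the HR schema are all hypothetical. Using `DBMS_STATS.GATHER_SCHEMA_STATS` is one possible way to regenerate statistics; whether it suits your environment is an assumption to verify.

```python
import re

# Conservative pattern for a simple, unquoted Oracle identifier.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def import_parfile_lines(directory, dumpfile, schema):
    """Return Data Pump Import parameter file lines that skip statistics."""
    if not _IDENTIFIER.match(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    return [
        f"DIRECTORY={directory}",
        f"DUMPFILE={dumpfile}",
        f"SCHEMAS={schema}",
        "EXCLUDE=STATISTICS",  # do not import optimizer statistics
    ]


def gather_stats_statement(schema):
    """Return a PL/SQL block that regathers statistics for one schema."""
    if not _IDENTIFIER.match(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    return f"BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(ownname => '{schema}'); END;"


if __name__ == "__main__":
    print("\n".join(import_parfile_lines("DPUMP_DIR1", "file%u.dmp", "HR")))
    # Run this statement on the target database after the import completes.
    print(gather_stats_statement("HR"))
```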
See also:
Oracle Database SQL Tuning Guide for information about manual statistics collection using the DBMS_STATS PL/SQL package
Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_STATS PL/SQL package
The Data Pump Export EXCLUDE parameter
The Data Pump Import EXCLUDE parameter
The settings for certain Oracle Database initialization parameters can affect the performance of Data Pump Export and Import. In particular, you can try using the following settings to improve performance, although the effect may not be the same on all platforms.
The following initialization parameters must have values set high enough to allow for maximum parallelism:
UNDO_TABLESPACE initialization parameters should be generously sized. The exact values depend upon the size of your database.
Oracle Data Pump uses Streams functionality to communicate between processes. If the SGA_TARGET initialization parameter is set, then the STREAMS_POOL_SIZE initialization parameter is automatically set to a reasonable value.
If the SGA_TARGET initialization parameter is not set and the STREAMS_POOL_SIZE initialization parameter is not defined, then the size of the streams pool automatically defaults to 10% of the size of the shared pool.
When the streams pool is created, the required SGA memory is taken from memory allocated to the buffer cache, reducing the size of the cache to less than what was specified by the DB_CACHE_SIZE initialization parameter. This means that if the buffer cache was configured with only the minimal required SGA, then Data Pump operations may not work properly. A minimum size of 10 MB is recommended for STREAMS_POOL_SIZE to ensure successful Data Pump operations.
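The sizing rules above can be expressed as a small Python sketch. The function names are hypothetical, sizes are in megabytes, and the logic is a simplification of the text: when SGA_TARGET is set, the function returns None because the database chooses the value itself and it cannot be predicted from these inputs.

```python
MIN_RECOMMENDED_STREAMS_POOL_MB = 10  # minimum recommended in the text


def estimated_streams_pool_mb(shared_pool_mb, sga_target_mb=None,
                              streams_pool_mb=None):
    """Estimate the streams pool size in MB from the rules described above.

    Returns None when SGA_TARGET is set, because the database then sizes
    the streams pool automatically.
    """
    if sga_target_mb is not None:
        return None
    if streams_pool_mb is not None:
        return streams_pool_mb
    # Neither parameter is set: default to 10% of the shared pool.
    return shared_pool_mb / 10


def meets_recommended_minimum(size_mb):
    """Check a known streams pool size against the 10 MB recommendation."""
    return size_mb >= MIN_RECOMMENDED_STREAMS_POOL_MB


if __name__ == "__main__":
    size = estimated_streams_pool_mb(shared_pool_mb=80)
    print(size, meets_recommended_minimum(size))  # prints: 8.0 False
```

In this example an 80 MB shared pool yields an 8 MB default, which falls below the recommended minimum, so setting STREAMS_POOL_SIZE explicitly would be worth considering.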