Oracle Data Pump technology enables very high-speed movement of data and metadata from one database to another. Oracle Data Pump is available only on Oracle Database 10g release 1 (10.1) and later.
This chapter discusses the following topics:
Oracle Data Pump is made up of three distinct parts:
The command-line clients,
DBMS_DATAPUMP PL/SQL package (also known as the Data Pump API)
DBMS_METADATA PL/SQL package (also known as the Metadata API)
The Data Pump clients,
impdp, invoke the Data Pump Export utility and Data Pump Import utility, respectively. They provide a user interface that closely resembles the original export (
exp) and import (
impdp clients use the procedures provided in the
DBMS_DATAPUMP PL/SQL package to execute export and import commands, using the parameters entered at the command-line. These parameters enable the exporting and importing of data and metadata for a complete database or subsets of a database.
Note:All Data Pump Export and Import processing, including the reading and writing of dump files, is done on the system (server) selected by the specified database connect string. This means that, for nonprivileged users, the database administrator (DBA) must create directory objects for the Data Pump files that are read and written on that server file system. For privileged users, a default directory object is available. See Default Locations for Dump, Log, and SQL Files for more information about directory objects.
When data is moved, Data Pump automatically uses either direct path load (or unload) or the external tables mechanism, or a combination of both. When metadata is moved, Data Pump uses functionality provided by the
DBMS_METADATA PL/SQL package. The
DBMS_METADATA package provides a centralized facility for the extraction, manipulation, and resubmission of dictionary metadata.
DBMS_METADATA PL/SQL packages can be used independently of the Data Pump clients.
See Also:Oracle Database PL/SQL Packages and Types Reference for descriptions of the
The new Data Pump Export and Import utilities (invoked with the
impdp commands, respectively) have a similar look and feel to the original Export (
exp) and Import (
imp) utilities, but they are completely separate. Dump files generated by the new Data Pump Export utility are not compatible with dump files generated by the original Export utility. Therefore, files generated by the original Export (
exp) utility cannot be imported with the Data Pump Import (
Oracle recommends that you use the new Data Pump Export and Import utilities because they support all Oracle Database 10g features, except for XML schemas and XML schema-based tables. Original Export and Import support the full set of Oracle database release 9.2 features. Also, the design of Data Pump Export and Import results in greatly enhanced data movement performance over the original Export and Import utilities.
Note:See Chapter 19, "Original Export and Import" for information about situations in which you should still use the original Export and Import utilities.
The following are the major new features that provide this increased performance, as well as enhanced ease of use:
The ability to specify the maximum number of threads of active execution operating on behalf of the Data Pump job. This enables you to adjust resource consumption versus elapsed time. See PARALLEL for information about using this parameter in export. See PARALLEL for information about using this parameter in import. (This feature is available only in the Enterprise Edition of Oracle Database 10g.)
The ability to detach from and reattach to long-running jobs without affecting the job itself. This allows DBAs and other operations personnel to monitor jobs from multiple locations. The Data Pump Export and Import utilities can be attached to only one job at a time; however, you can have multiple clients or jobs running at one time. (If you are using the Data Pump API, the restriction on attaching to only one job at a time does not apply.) You can also have multiple clients attached to the same job. See ATTACH for information about using this parameter in export. See ATTACH for information about using this parameter in import.
Support for export and import operations over the network, in which the source of each operation is a remote instance. See NETWORK_LINK for information about using this parameter in export. See NETWORK_LINK for information about using this parameter in import.
The ability, in an import job, to change the name of the source datafile to a different name in all DDL statements where the source datafile is referenced. See REMAP_DATAFILE.
Enhanced support for remapping tablespaces during an import operation. See REMAP_TABLESPACE.
Support for filtering the metadata that is exported and imported, based upon objects and object types. For information about filtering metadata during an export operation, see INCLUDE and EXCLUDE. For information about filtering metadata during an import operation, see INCLUDE and EXCLUDE.
Support for an interactive-command mode that allows monitoring of and interaction with ongoing jobs. See Commands Available in Export's Interactive-Command Mode and Commands Available in Import's Interactive-Command Mode.
The ability to estimate how much space an export job would consume, without actually performing the export. See ESTIMATE_ONLY.
The ability to specify the version of database objects to be moved. In export jobs,
VERSION applies to the version of the database objects to be exported. See VERSION for more information about using this parameter in export.
In import jobs,
VERSION applies only to operations over the network. This means that
VERSION applies to the version of database objects to be extracted from the source database. See VERSION for more information about using this parameter in import.
For additional information about using different versions, see Moving Data Between Different Database Versions.
Most Data Pump export and import operations occur on the Oracle database server. (This contrasts with original export and import, which were primarily client-based.) See Default Locations for Dump, Log, and SQL Files for information about some of the implications of server-based operations.
The remainder of this chapter discusses Data Pump technology as it is implemented in the Data Pump Export and Import utilities. To make full use of Data Pump technology, you must be a privileged user. Privileged users have the
IMP_FULL_DATABASE roles. Nonprivileged users have neither.
Privileged users can do the following:
Export and import database objects owned by others
Export and import nonschema-based objects such as tablespace and schema definitions, system privilege grants, resource plans, and so forth
Attach to, monitor, and control Data Pump jobs initiated by others
Perform remapping operations on database datafiles
Perform remapping operations on schemas other than their own
Data Pump supports two access methods to load and unload table row data: direct path and external tables. Because both methods support the same external data representation, data that is unloaded with one method can be loaded using the other method. Data Pump automatically chooses the fastest method appropriate for each table.
Note:Data Pump will not load tables with disabled unique indexes. If the data needs to be loaded into the table, the indexes must be either dropped or reenabled.
The Oracle database has provided direct path unload capability for export operations since Oracle release 7.3 and a direct path loader API for OCI since Oracle8i. Data Pump technology enhances direct path technology in the following ways:
Improved performance through elimination of unnecessary conversions. This is possible because the direct path internal stream format is used as the format stored in the Data Pump dump files.
Support of additional datatypes and transformations.
The default method that Data Pump uses for loading and unloading data is direct path, when the structure of a table allows it. Note that if the table has any columns of datatype
LONG, then direct path must be used.
The following sections describe situations in which direct path cannot be used for loading and unloading.
Situations in Which Direct Path Load Is Not Used
A global index on multipartition tables exists during a single-partition load. This includes object tables that are partitioned.
A domain index exists for a LOB column.
A table is in a cluster.
There is an active trigger on a pre-existing table.
Fine-grained access control is enabled in insert mode on a pre-existing table.
A table contains
BFILE columns or columns of opaque types.
A referential integrity constraint is present on a pre-existing table.
A table contains
VARRAY columns with an embedded opaque type.
The table has encrypted columns
The table into which data is being imported is a pre-existing table and at least one of the following conditions exists:
There is an active trigger
The table is partitioned
fgac is in insert mode
A referential integrity constraint exists
A unique index exists
Supplemental logging is enabled and the table has at least 1 LOB column.
Situations in Which Direct Path Unload Is Not Used
If any of the following conditions exist for a table, Data Pump uses the external table method to unload data, rather than direct path:
Fine-grained access control for
SELECT is enabled.
The table is a queue table.
The table contains one or more columns of type
BFILE or opaque, or an object type containing opaque columns.
The table contains encrypted columns.
A column of an evolved type that needs upgrading.
A column of type
LONG RAW that is not last.
The Oracle database has provided an external tables capability since Oracle9i that allows reading of data sources external to the database. As of Oracle Database 10g, the external tables feature also supports writing database data to destinations external to the database. Data Pump provides an external tables access driver (
ORACLE_DATAPUMP) that reads and writes files. The format of the files is the same format used with the direct path method. This allows for high-speed loading and unloading of database tables. Data Pump uses external tables as the data access mechanism in the following situations:
Loading and unloading very large tables and partitions in situations where parallel SQL can be used to advantage
Loading tables with global or domain indexes defined on them, including partitioned object tables
Loading tables with active triggers or clustered tables
Loading and unloading tables with encrypted columns
Loading tables with fine-grained access control enabled for inserts
Loading tables that are partitioned differently at load time and unload time
Note:When Data Pump uses external tables as the data access mechanism, it uses the
When you perform an export over a database link, the data from the source database instance is written to dump files on the connected database instance. In addition, the source database can be a read-only database.
When you perform an import over a database link, the import source is a database, not a dump file set, and the data is imported to the connected database instance.
Because the link can identify a remotely networked database, the terms database link and network link are used interchangeably.
Data Pump jobs use a master table, a master process, and worker processes to perform the work and keep track of progress.
For every Data Pump Export job and Data Pump Import job, a master process is created. The master process controls the entire job, including communicating with the clients, creating and controlling a pool of worker processes, and performing logging operations.
While the data and metadata are being transferred, a master table is used to track the progress within a job. The master table is implemented as a user table within the database. The specific function of the master table for export and import jobs is as follows:
For export jobs, the master table records the location of database objects within a dump file set. Export builds and maintains the master table for the duration of the job. At the end of an export job, the content of the master table is written to a file in the dump file set.
For import jobs, the master table is loaded from the dump file set and is used to control the sequence of operations for locating objects that need to be imported into the target database.
The master table is created in the schema of the current user performing the export or import operation. Therefore, that user must have sufficient tablespace quota for its creation. The name of the master table is the same as the name of the job that created it. Therefore, you cannot explicitly give a Data Pump job the same name as a preexisting table or view.
For all operations, the information in the master table is used to restart a job.
The master table is either retained or dropped, depending on the circumstances, as follows:
Upon successful job completion, the master table is dropped.
If a job is stopped using the
STOP_JOB interactive command, the master table is retained for use in restarting the job.
If a job is killed using the
KILL_JOB interactive command, the master table is dropped and the job cannot be restarted.
If a job terminates unexpectedly, the master table is retained. You can delete it if you do not intend to restart the job.
If a job stops before it starts running (that is, it is in the Defining state), the master table is dropped.
See Also:JOB_NAME for more information about how job names are formed.
Within the master table, specific objects are assigned attributes such as name or owning schema. Objects also belong to a class of objects (such as
DIRECTORY). The class of an object is called its object type. You can use the
INCLUDE parameters to restrict the types of objects that are exported and imported. The objects can be based upon the name of the object or the name of the schema that owns the object. You can also specify data-specific filters to restrict the rows that are exported and imported.
When you are moving data from one database to another, it is often useful to perform transformations on the metadata for remapping storage between tablespaces or redefining the owner of a particular set of objects. This is done using the following Data Pump Import parameters:
To improve throughput of a job, you can use the
PARALLEL parameter to set a degree of parallelism that takes maximum advantage of current conditions. For example, to limit the effect of a job on a production system, the database administrator (DBA) might wish to restrict the parallelism. The degree of parallelism can be reset at any time during a job. For example,
PARALLEL could be set to 2 during production hours to restrict a particular job to only two degrees of parallelism, and during nonproduction hours it could be reset to 8. The parallelism setting is enforced by the master process, which allocates work to be executed to worker processes that perform the data and metadata processing within an operation. These worker processes operate in parallel. In general, the degree of parallelism should be set to more than twice the number of CPUs on an instance.
Note:The ability to adjust the degree of parallelism is available only in the Enterprise Edition of Oracle Database.
The worker processes are the ones that actually unload and load metadata and table data in parallel. Worker processes are created as needed until the number of worker processes is equal to the value supplied for the
PARALLEL command-line parameter. The number of active worker processes can be reset throughout the life of a job.
Note:The value of
When a worker process is assigned the task of loading or unloading a very large table or partition, it may choose to use the external tables access method to make maximum use of parallel execution. In such a case, the worker process becomes a parallel execution coordinator. The actual loading and unloading work is divided among some number of parallel I/O execution processes (sometimes called slaves) allocated from the instancewide pool of parallel I/O execution processes.
The Data Pump Export and Import utilities can be attached to a job in either interactive-command mode or logging mode. In logging mode, real-time detailed status about the job is automatically displayed during job execution. The information displayed can include the job and parameter descriptions, an estimate of the amount of data to be exported, a description of the current operation or item being processed, files used during the job, any errors encountered, and the final job state (Stopped or Completed).
Job status can be displayed on request in interactive-command mode. The information displayed can include the job description and state, a description of the current operation or item being processed, files being written, and a cumulative status.
A log file can also be optionally written during the execution of a job. The log file summarizes the progress of the job, lists any errors that were encountered along the way, and records the completion status of the job.
An alternative way to determine job status or to get other information about Data Pump jobs, would be to query the
DBA_DATAPUMP_SESSIONS views. See Oracle Database SQL Reference for descriptions of these views.
Data Pump operations that transfer table data (export and import) maintain an entry in the
V$SESSION_LONGOPS dynamic performance view indicating the job progress (in megabytes of table data transferred). The entry contains the estimated transfer size and is periodically updated to reflect the actual amount of data transferred.
Note:The usefulness of the estimate value for export operations depends on the type of estimation requested when the operation was initiated, and it is updated as required if exceeded by the actual transfer amount. The estimate value for import operations is exact.
SESSION_LONGOPS columns that are relevant to a Data Pump job are as follows:
USERNAME - job owner
OPNAME - job name
TARGET_DESC - job operation
SOFAR - megabytes (MB) transferred thus far during the job
TOTALWORK - estimated number of megabytes (MB) in the job
UNITS - 'MB'
MESSAGE - a formatted status message of the form:
'<job_name>: <operation_name> : nnn out of mmm MB done'
There are three types of files managed by Data Pump jobs:
Dump files to contain the data and metadata that is being moved
Log files to record the messages associated with an operation
SQL files to record the output of a SQLFILE operation. A SQLFILE operation is invoked using the Data Pump Import
SQLFILE parameter and results in all of the SQL DDL that Import will be executing based on other parameters, being written to a SQL file. See SQLFILE for more information.
An understanding of how Data Pump allocates and handles these files will help you to use Export and Import to their fullest advantage.
For export operations, you can specify dump files at the time the job is defined, as well as at a later time during the operation. For example, if you discover that space is running low during an export operation, you can add additional dump files by using the Data Pump Export
ADD_FILE command in interactive mode.
For import operations, all dump files must be specified at the time the job is defined.
Log files and SQL files will overwrite previously existing files. Dump files will never overwrite previously existing files. Instead, an error will be generated.
Because Data Pump is server-based, rather than client-based, dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires you to specify directory paths as directory objects. A directory object maps a name to a directory path on the file system.
For example, the following SQL statement creates a directory object named
dpump_dir1 that is mapped to a directory located at
SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';
The reason that a directory object is required is to ensure data security and integrity. For example:
If you were allowed to specify a directory path location for an input file, you might be able to read data that the server has access to, but to which you should not.
If you were allowed to specify a directory path location for an output file, the server might overwrite a file that you might not normally have privileges to delete.
On Unix and Windows NT systems, a default directory object,
DATA_PUMP_DIR, is created at database creation or whenever the database dictionary is upgraded. By default, it is available only to privileged users.
If you are not a privileged user, before you can run Data Pump Export or Data Pump Import, a directory object must be created by a database administrator (DBA) or by any user with the
After a directory is created, the user creating the directory object needs to grant
WRITE permission on the directory to other users. For example, to allow the Oracle database to read and write files on behalf of user
hr in the directory named by
dpump_dir1, the DBA must execute the following command:
SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;
WRITE permission to a directory object only means that the Oracle database will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.
Data Pump Export and Import use the following order of precedence to determine a file's location:
If a directory object is specified as part of the file specification, then the location specified by that directory object is used. (The directory object must be separated from the filename by a colon.)
If a directory object is not specified for a file, then the directory object named by the
DIRECTORY parameter is used.
If a directory object is not specified, and if no directory object was named by the
DIRECTORY parameter, then the value of the environment variable,
DATA_PUMP_DIR, is used. This environment variable is defined using operating system commands on the client system where the Data Pump Export and Import utilities are run. The value assigned to this client-based environment variable must be the name of a server-based directory object, which must first be created on the server system by a DBA. For example, the following SQL statement creates a directory object on the server system. The name of the directory object is
DUMP_FILES1, and it is located at
SQL> CREATE DIRECTORY DUMP_FILES1 AS '/usr/apps/dumpfiles1';
Then, a user on a UNIX-based client system using
csh can assign the value
DUMP_FILES1 to the environment variable
DIRECTORY parameter can then be omitted from the command line. The dump file
employees.dmp, as well as the log file
export.log, will be written to
%setenv DATA_PUMP_DIR DUMP_FILES1 %expdp hr/hr TABLES=employees DUMPFILE=employees.dmp
If none of the previous three conditions yields a directory object and you are a privileged user, then Data Pump attempts to use the value of the default server-based directory object,
DATA_PUMP_DIR. This directory object is automatically created at database creation or when the database dictionary is upgraded. You can use the following SQL query to see the path definition for
SQL> SELECT directory_name, directory_path FROM dba_directories 2 WHERE directory_name='DATA_PUMP_DIR';
If you are not a privileged user, access to the
DATA_PUMP_DIR directory object must have previously been granted to you by a DBA.
Do not confuse the default
DATA_PUMP_DIR directory object with the client-based environment variable of the same name.
If you use Data Pump Export or Import with Automatic Storage Management (ASM) enabled, you must define the directory object used for the dump file so that the ASM disk-group name is used (instead of an operating system directory path). A separate directory object, which points to an operating system directory path, should be used for the log file. For example, you would create a directory object for the ASM dump file as follows:
SQL> CREATE or REPLACE DIRECTORY dpump_dir as '+DATAFILES/';
Then you would create a separate directory object for the log file:
SQL> CREATE or REPLACE DIRECTORY dpump_log as '/homedir/user1/';
To enable user
hr to have access to these directory objects, you would assign the necessary privileges, for example:
SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir TO hr; SQL> GRANT READ, WRITE ON DIRECTORY dpump_log TO hr;
You would then use the following Data Pump Export command:
> expdp hr/hr DIRECTORY=dpump_dir DUMPFILE=hr.dmp LOGFILE=dpump_log:hr.log
For export and import operations, the parallelism setting (specified with the
PARALLEL parameter) should be less than or equal to the number of dump files in the dump file set. If there are not enough dump files, the performance will not be optimal because multiple threads of execution will be trying to access the same dump file.
PARALLEL parameter is valid only in the Enterprise Edition of Oracle Database 10g.
Instead of, or in addition to, listing specific filenames, you can use the
DUMPFILE parameter during export operations to specify multiple dump files, by using a substitution variable (
%U) in the filename. This is called a dump file template. The new dump files are created as they are needed, beginning with
%U, then using
03, and so on. Enough dump files are created to allow all processes specified by the current setting of the
PARALLEL parameter to be active. If one of the dump files becomes full because its size has reached the maximum size specified by the
FILESIZE parameter, it is closed, and a new dump file (with a new generated name) is created to take its place.
If multiple dump file templates are provided, they are used to generate dump files in a round-robin fashion. For example, if
expc%U were all specified for a job having a parallelism of 6, the initial dump files created would be
For import and SQLFILE operations, if dump file specifications
expc%U are specified, then the operation will begin by attempting to open the dump files
dmp. If the dump file containing the master table is not found in this set, the operation expands its search for dump files by incrementing the substitution variable and looking up the new filenames (for example,
dmp). The search continues until the dump file containing the master table is located. If a dump file does not exist, the operation stops incrementing the substitution variable for the dump file specification that was in error. For example, if
dmp are found but
dmp is not found, then no more files are searched for using the
expb%U specification. Once the master table is found, it is used to determine whether all dump files in the dump file set have been located.
Because most Data Pump operations are performed on the server side, if you are using any version of the database other than
COMPATIBLE, you must provide the server with specific version information. Otherwise, errors may occur. To specify version information, use the
Keep the following information in mind when you are using Data Pump Export and Import to move data between different database versions:
If you specify a database version that is older than the current database version, certain features may be unavailable. For example, specifying
VERSION=10.1 will cause an error if data compression is also specified for the job because compression was not supported in 10.1.
On a Data Pump export, if you specify a database version that is older than the current database version, then a dump file set is created that you can import into that older version of the database. However, the dump file set will not contain any objects that the older database version does not support. For example, if you export from a version 10.2 database to a version 10.1 database, comments on indextypes will not be exported into the dump file set.
Data Pump Import can always read dump file sets created by older versions of the database.
Data Pump Import cannot read dump file sets created by a database version that is newer than the current database version, unless those dump file sets were created with the version parameter set to the version of the target database. Therefore, the best way to perform a downgrade is to perform your Data Pump export with the
VERSION parameter set to the version of the target database.
When operating across a network link, Data Pump requires that the remote database version be either the same as the local database or one version older, at the most. For example, if the local database is version 10.2, the remote database must be either version 10.1 or 10.2. If the local database is version 10.1, then 10.1 is the only version supported for the remote database
If you are familiar with the original Export (
exp) and Import (
imp) utilities, it is important to understand that many of the concepts behind them do not apply to Data Pump Export (
expdp) and Data Pump Import (
impdp). In particular:
Data Pump Export and Import operate on a group of files called a dump file set rather than on a single sequential dump file.
Data Pump Export and Import access files on the server rather than on the client. This results in improved performance. It also means that directory objects are required when you specify file locations.
The Data Pump Export and Import modes operate symmetrically, whereas original export and import did not always exhibit this behavior.
For example, suppose you perform an export with
FULL=Y, followed by an import using
SCHEMAS=HR. This will produce the same results as if you performed an export with
SCHEMAS=HR, followed by an import with
Data Pump Export and Import use parallel execution rather than a single stream of execution, for improved performance. This means that the order of data within dump file sets and the information in the log files is more variable.
Data Pump Export and Import represent metadata in the dump file set as XML documents rather than as DDL commands. This provides improved flexibility for transforming the metadata at import time.
Data Pump Export and Import are self-tuning utilities. Tuning parameters that were used in original Export and Import, such as
RECORDLENGTH, are neither required nor supported by Data Pump Export and Import.
At import time there is no option to perform interim commits during the restoration of a partition. This was provided by the
COMMIT parameter in original Import.
There is no option to merge extents when you re-create tables. In original Import, this was provided by the
COMPRESS parameter. Instead, extents are reallocated according to storage parameters for the target table.
Sequential media, such as tapes and pipes, are not supported.
The Data Pump method for moving data between different database versions is different than the method used by original Export/Import. With original Export, you had to run an older version of Export (
exp) to produce a dump file that was compatible with an older database version. With Data Pump, you can use the current Export (
expdp) version and simply use the
VERSION parameter to specify the target database version. See Moving Data Between Different Database Versions.
When you are importing data into an existing table using either
TRUNCATE, if any row violates an active constraint, the load is discontinued and no data is loaded. This is different from original Import, which logs any rows that are in violation and continues with the load.
Data Pump Export and Import consume more undo tablespace than original Export and Import. This is due to additional metadata queries during export and some relatively long-running master table queries during import. As a result, for databases with large amounts of metadata, you may receive an
ORA-01555: snapshot too old error. To avoid this, consider adding additional undo tablespace or increasing the value of the
UNDO_RETENTION initialization parameter for the database.
If a table has compression enabled, Data Pump Import attempts to compress the data being loaded. Whereas, the original Import utility loaded data in such a way that if a even table had compression enabled, the data was not compressed upon import.
Data Pump supports character set conversion for both direct path and external tables. Most of the restrictions that exist for character set conversions in the original Import utility do not apply to Data Pump. The one case in which character set conversions are not supported under the Data Pump is when using transportable tablespaces. [This info was added per mail from Simon Law/Bill Fisher on 2/23/05]
See Also:For a comparison of Data Pump Export and Import parameters to the parameters of original Export and Import, see the following: