Importing Large Data Sets

The topics in this section provide tips on improving performance when importing large data sets to the directory server. By default, the server imports data with a fixed set of parameters. You can change the default behavior in two ways:

- Specify certain options when you run the import-ldif command.
  
  For more information, see Setting the Import Options.
- Use the dsjavaproperties command to set the appropriate Java arguments before running the import-ldif command.
  
  For more information, see Tuning the JVM and Java Arguments.

Setting the Import Options

The following options of the import-ldif command are useful when you are importing particularly large databases:

--skipDNValidation

This option significantly speeds up a large import because no DN validation or database loading is performed during the first phase of the import. The DNs in the LDIF file are treated as regular indexes and are written to a scratch index file that is loaded in phase two of the import.

During the second phase of the import, limited DN parental checking is performed. During this evaluation, the DNs in the LDIF file are examined to make sure that each DN has a correct parent DN. When a DN is detected without a parent, a dummy entry is written to the reject file.

If the --skipDNValidation option is specified, no duplicate DN checking is performed.

The server does not remove bad entry IDs from the index database during phase two of the import. It is therefore essential that the LDIF import file is correct if the --skipDNValidation option is specified. Correct LDIF files are generally those that are generated by using the make-ldif command, LDIF files exported from an LDAP server, or LDIF files created by scripts that are historically known to generate correct LDIF files.

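
Because no duplicate DN checking is performed with --skipDNValidation, it can help to scan the LDIF file for repeated DNs before the import. The following is a minimal sketch, not part of the product: the function name find_duplicate_dns is hypothetical, and the check ignores base64-encoded DNs (dn::) and DNs folded across continuation lines.

```shell
#!/bin/sh
# List DNs that occur more than once in an LDIF file (case-insensitive).
# Limitations of this sketch: base64-encoded DNs ("dn::") and folded DN
# lines are ignored, and DNs are compared as plain lowercased strings
# rather than with LDAP matching rules.
find_duplicate_dns() {
    grep -i '^dn: ' "$1" \
        | sed 's/^[Dd][Nn]: *//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort \
        | uniq -d
}

# Hypothetical usage: an empty result means no duplicates were found.
# find_duplicate_dns /path/to/data.ldif
```

An empty result does not prove that the file is correct. It only rules out the simplest kind of duplicate.
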
--threadCount

This option speeds up a large import by enabling you to specify that more threads are dedicated to the import process. By default, two threads per CPU are used for an import operation.

Increasing the --threadCount value also increases the buffer space that is required in phase one of the LDIF import.

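
As a rough illustration, the default of two threads per CPU can be computed for the current machine before you choose a larger value. This sketch assumes that getconf _NPROCESSORS_ONLN is available (as on Linux and macOS). The doubled value, the LDIF path, and the userRoot backend ID are placeholders for illustration, not recommendations. The command is printed rather than executed.

```shell
#!/bin/sh
# Number of online CPUs; fall back to 1 if getconf cannot report it.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# The documented default is two threads per CPU.
default_threads=$((cpus * 2))

# Arbitrary starting point for an experiment: double the default.
# More threads also need more phase-one buffer space.
threads=$((default_threads * 2))

echo "CPUs: $cpus, default threads: $default_threads, trying: $threads"

# Print the command instead of running it (placeholder path and backend ID).
echo "bin/import-ldif --backendID userRoot --ldifFile /path/to/data.ldif --threadCount $threads"
```
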
--tmpDirectory

In the first phase of the import, the server parses the LDIF file, sorts the index records, and writes the records to temporary files. By default, the temporary index files are written to install-dir/import-tmp. If you are importing particularly large index files, you might want to specify another location that has more disk space.

The amount of space required for the temporary index files depends on the following factors:
- The number of entries in the LDIF file.
- The size of the entries in the LDIF file.
  
  Entries with large numbers of attributes that require indexing will require more space in the temporary directory location, and in the database directory.
- The number of indexes that are configured.
  
  The more indexes that are configured, the more disk space is required in the temporary directory location, and in the database directory. Substring indexes require more temporary disk space to process than other types of indexes.
- Increasing the index-entry-limit for all indexes, or for individual indexes, requires more disk space.
  
  This is especially true for substring indexes. If you are importing an LDIF file with a large number of entries, you should turn off all substring indexing to prevent a large number of index records from hitting the index-entry-limit.
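
Before choosing a location for --tmpDirectory, you can compare the free space of the candidate file systems. The following sketch uses the POSIX df -Pk output format. The helper name free_kb, the DEFAULT_TMP and CANDIDATE_TMP variables, the LDIF path, and the userRoot backend ID are all hypothetical placeholders, and the import command is printed rather than executed.

```shell
#!/bin/sh
# Print the free space, in kilobytes, of the file system that holds a directory.
# df -P selects the portable POSIX output format; -k reports 1024-byte blocks.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Placeholder locations: point these at the default import-tmp directory
# and at the alternative location that you are considering.
default_tmp=${DEFAULT_TMP:-.}
candidate_tmp=${CANDIDATE_TMP:-/tmp}

echo "default:   $default_tmp has $(free_kb "$default_tmp") KB free"
echo "candidate: $candidate_tmp has $(free_kb "$candidate_tmp") KB free"

# If the candidate has more room, pass it to the import (printed, not run).
echo "bin/import-ldif --backendID userRoot --ldifFile /path/to/data.ldif --tmpDirectory $candidate_tmp"
```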

Tuning the JVM and Java Arguments

Tuning the JVM heap is essential to the performance of the import-ldif command. Although the import-ldif command attempts to limit the amount of JVM heap that it requires, you should allocate as large a JVM heap as possible to import-ldif if you are importing a large number of entries.

The following JVM tuning considerations have specific impact on the import-ldif operation:

- Performing an online import uses the JVM settings that were specified when the server was started. If you plan to import a large LDIF file by using the online import, you should provide extra JVM heap when the server is started. In general, if you need to import a large LDIF file, the best option is to perform an offline import.
- The 32-bit JVM generally performs better for smaller LDIF files and for most larger LDIF files.
  
  You should always try this JVM first, with as large a heap as can be spared. A minimum heap of 2 Gbytes is recommended.
- You might require a 64-bit JVM with a large JVM heap (greater than 4 Gbytes) for extremely large LDIF files, depending on the size of the entries and the indexes configured.
  
  The 64-bit JVM does not generally perform as well as the 32-bit JVM.
- The default heap size that JVM ergonomics selects might be too small on some JVMs and can seriously degrade performance.
  
  Take note of the default ergonomic values for your JVM (these values differ by vendor and by operating system).
- If you are using replication, you should budget additional JVM heap, particularly if you plan to do a full initialization of the other replicas in the topology after an online import.
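
One way to take note of the default ergonomic heap values is to ask the JVM to print its final flag settings. The following sketch assumes a HotSpot JVM that supports the -XX:+PrintFlagsFinal option; other vendors' JVMs might not accept it. The helper name heap_defaults_mb is hypothetical.

```shell
#!/bin/sh
# Read "java -XX:+PrintFlagsFinal" output on stdin and print the default
# initial and maximum heap sizes in megabytes.
heap_defaults_mb() {
    awk '$2 == "InitialHeapSize" || $2 == "MaxHeapSize" {
        printf "%s %d MB\n", $2, $4 / 1048576
    }'
}

# Query the local JVM only if java is on the PATH (HotSpot-specific flag).
if command -v java >/dev/null 2>&1; then
    java -XX:+PrintFlagsFinal -version 2>/dev/null | heap_defaults_mb
fi
```

If the reported maximum is well below the heap that you have budgeted for the import, set the heap explicitly as described in the following steps.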

When you have calculated the memory requirement, perform the following steps:

Edit the java.properties file and set the following values:

overwrite-env-java-args=true
import-ldif.offline.java-args=-Xms2560M -Xmx2560M

Run the dsjavaproperties command:
```
$ bin/dsjavaproperties
```

Note - Running the dsjavaproperties command, or setting the OPENDS_JAVA_ARGS environment variable, only has a performance impact if the import is offline. If the server is already running and you perform an online import, changing the Java arguments has no impact on the import performance because the import is performed by the server JVM.
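
The overall offline sequence might look like the following sketch. It assumes that the server must be stopped for an offline import, that the commands are run from the directory that contains bin, and that the LDIF path and the userRoot backend ID are placeholders. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Dry-run wrapper: print each command, and execute it only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

offline_import() {
    ldif_file=$1
    backend_id=$2
    run bin/stop-ds             # assumption: offline import needs the server stopped
    run bin/dsjavaproperties    # apply the edited java.properties
    run bin/import-ldif --backendID "$backend_id" --ldifFile "$ldif_file"
    run bin/start-ds
}

# Placeholder arguments for illustration.
offline_import /path/to/data.ldif userRoot
```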

	Oracle Fusion Middleware Administration Guide for Oracle Unified Directory 11g Release 1 (11.1.1)