SQL*Loader Conventional and Direct Path Loads
This chapter describes SQL*Loader's conventional and direct path load methods. The following topics are covered:
- Overview of Data Loading Methods
- Conventional Path Load Method
For an example of loading using the direct path load method, see Case 6. The other cases use the conventional path load method.
- Maximizing Performance of Direct Path Loads
Note: You can use the direct path load method with Trusted Oracle7 Server just as you can with the standard Oracle7 Server.
Data Loading Methods
SQL*Loader provides two methods for loading data:
- the conventional path, which uses standard SQL INSERT processing
- the direct path, which formats data blocks and writes them directly to the database files
Direct path loads can be significantly faster than conventional path loads. Direct path loads achieve this performance gain by eliminating much of the Oracle database overhead and writing directly to the database files. Because a direct load does not compete with other users for database resources, it can usually load data at near disk speed. Certain considerations inherent to this method of access to database files, such as security and backup implications, are discussed in this chapter.
Conventional Path Loads
Conventional path loads (the default) use the SQL command INSERT and a bind array buffer to load data into database tables. This method is used by all Oracle tools and applications.
When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL commands are generated, passed to Oracle, and processed.
Oracle looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.
When to Use a Conventional Path Load
Because the direct path is many times faster than the conventional path, it is highly desirable to use the direct path. But there are times when the conventional path is preferred. You should use the conventional path in the following situations:
- When accessing an indexed table concurrently with the load, or when applying inserts or updates to a non-indexed table concurrently with the load.
To use the direct path (except for parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read-write access to any indexes.
- When loading data with SQL*Net.
You cannot load data through the direct path with SQL*Net unless both systems belong to the same family of computers and both are using the same character set. Even then, load performance can be significantly impaired by network overhead.
- When loading data into a clustered table.
Clustered tables cannot be loaded through the direct path.
- When loading a relatively small number of rows into a large indexed table.
On the direct path, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by loading the data with the direct path.
- When loading a relatively small number of rows into a large table with referential and column-check integrity constraints.
Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.
- When you want to apply SQL functions to data fields.
SQL functions are not available on the direct path. For more information on SQL functions, see "Applying SQL Operators to Fields".
Direct Path Loads
Direct path loads are optimized for maximum data loading capability. Like the conventional path method, SQL*Loader's direct path method provides full support for media recovery.
Instead of filling a bind array buffer and passing it to Oracle with a SQL INSERT command, the direct path option creates data blocks that are already in Oracle database block format. These database blocks are then written directly to the database.
Internally, multiple buffers are used for the formatted data. While one buffer is being filled, others can be written to the database if asynchronous I/O is available on the host platform. This overlap increases load performance.
Figure 8 - 1 shows how conventional and direct path loads perform database writes.
Figure 8 - 1. Database Writes on Direct Path and Conventional Path
Although direct path loads minimize database processing, a few fast calls to Oracle are made at the beginning and end of the load. Tables are locked at the start of the load, and the locks are released at the end. Also, during the load, space management routines are used to get new extents when needed and to adjust the high-water mark. The high-water mark is described in "Data Saves".
Oracle calls are also used to sort the data and build the index. No SQL calls are performed at any time during the load itself.
Advantages of Direct Path Loads
The direct path method is faster than the conventional path for the following reasons:
- Partial blocks are not used, so no reads are needed to find them and fewer writes are performed.
- SQL INSERT commands are not generated by SQL*Loader, and therefore, processing load on the Oracle database is reduced.
- The bind-array buffer is not used -- formatted database blocks are written directly.
- The direct path method calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. The conventional path calls Oracle once for each array of rows to process a SQL INSERT statement.
- Unlike conventional path loads, direct path loads use asynchronous I/O, if available, to perform these operations in parallel:
- filling buffers with formatted data
- writing filled buffers to the database files
- Processes using the direct path perform their own write I/O, instead of using Oracle's buffer cache in contention with other Oracle users. Therefore, the direct path does not contend for free buffers in the buffer cache.
- The direct path's pre-sorting option allows you to use high-performance sort routines that are native to your system or installation.
- When the table to be loaded is empty, the pre-sorting option eliminates the sort and merge phases of index-building -- the index is simply filled in as data arrives.
- Protection against instance failure does not require redo log file entries during direct path loads. Therefore, if Oracle is operating in NOARCHIVELOG mode, no time is required to log the load. See "Instance Recovery with the Direct Path" .
When to Use a Direct Path Load
You should use a direct path load in the following situations:
- You have a large amount of data to load quickly. A direct path can quickly load and index large amounts of data. It can also load data into either an empty or non-empty table.
- You want to load data in PARALLEL for maximum performance. See page 8 - 21.
- You want to load data in a character set that cannot be supported in your current session, or when the conventional conversion to the database character set would cause errors.
Conditions for Using Direct Path Loads
In addition to the general load conditions described previously, the following conditions must be satisfied to use the direct path load method:
- Tables are not clustered.
- Tables to be loaded do not have any active transactions pending.
To check for this condition, use the SQL*DBA command MONITOR TABLE to find the object ID for the table(s) you want to load. Then use the command MONITOR LOCK to see if there are any locks on the table.
- SQL strings are not used in the control file.
- If the table(s) is indexed, there are no current SELECT statements on the table(s).
It may be necessary to log off users who have SELECT statements open on the table(s). You can use the SQL*DBA command MONITOR TABLE to see which Oracle users are accessing the table.
All integrity constraints are enforced during direct path loads, although not necessarily at the same time. All constraints that can be checked without referring to other rows or tables, such as the NOT NULL constraint, are enforced during the load. Records that fail these constraints are rejected.
Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be re-enabled afterwards. If REENABLE is specified, SQL*Loader can re-enable them automatically at the end of the load. When the constraints are re-enabled, the entire table is checked. Any rows that fail this check are reported in the specified error log. See the section in this chapter called "Direct Loads, Integrity Constraints, and Triggers".
Field Defaults on the Direct Path
DEFAULT column specifications defined in the database are not available when loading on the direct path. Fields for which default values are desired must be specified with the DEFAULTIF clause. If a DEFAULTIF clause is not specified and the field is NULL, then a NULL value is inserted into the database.
Loading into Synonyms
You can load data into a synonym for a table during a direct path load, but the synonym must point directly to a table. It cannot be a synonym for a view or a synonym for another synonym.
Exact Version Requirement
A SQL*Loader direct load can be performed only into a database of the same version. For example, you cannot use a SQL*Loader Version 7.1.2 direct path load to load data into an Oracle Version 7.1.3 database.
Using Direct Path Load
This section explains how to use SQL*Loader's direct path load.
Setting Up for Direct Path Loads
To prepare the database for direct path loads, you must run the setup script CATLDR.SQL to create the necessary views. You need only run this script once for each database into which you plan to do direct loads. This script can be run during database installation if you know at that time that you will be doing direct loads.
Specifying a Direct Path Load
To start SQL*Loader in direct load mode, the parameter DIRECT must be set to TRUE on the command line or in the parameter file, if one is used.
See Case 6 for an example.
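For example, assuming a control file named emp.ctl and the username/password scott/tiger (both hypothetical), a direct path load might be invoked from the command line as follows (the exact name of the SQL*Loader executable varies by operating system):

```
sqlldr scott/tiger CONTROL=emp.ctl DIRECT=TRUE
```

The same setting can instead be placed in the parameter file as the line DIRECT=TRUE.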
During a direct path load, performance is improved by using temporary storage. After the data is loaded into the table, the new keys are copied to a temporary segment and sorted. The old index and the new keys are then merged to create the new index. The old index, temporary segment, and new index all require storage until the merge is complete. Then the old index and temporary segment are removed.
Note that, during a conventional path load, every time a row is inserted the index is updated. This method does not require temporary storage space, but it does add processing time.
The SINGLEROW Option
Performance on systems with limited memory can also be improved by using the SINGLEROW option. For more information see page 5 - 36.
Note: If, during a direct load, you have specified that the data is to be pre-sorted and the existing index is empty, a temporary segment is not required, and no merge occurs--the keys are put directly into the index. See "Maximizing Performance of Direct Path Loads" for more information.
When multiple indexes are built, the temporary segments corresponding to each index exist simultaneously, in addition to the old indexes. The new keys are then merged with the old indexes, one index at a time. As each new index is created, the old index and the corresponding temporary segment are removed.
Index Storage Requirements
The formula for calculating the amount of space needed for storing the index itself can be found in Chapter 8 "Managing Database Files" of the Oracle7 Server Administrator's Guide. Remember that two indexes exist until the load is complete: the old index and the new index.
Temporary Segment Storage Requirements
The amount of temporary segment space needed for storing the new index keys (in bytes) can be estimated using the following formula:
1.3 * key_storage
key_storage = (number_of_rows) *
( 10 + sum_of_column_sizes + number_of_columns )
The columns included in this formula are the columns in the index. There is one length byte per column, and 10 bytes per row are used for a ROWID and additional overhead.
The constant 1.3 reflects the average amount of extra space needed for sorting. This value is appropriate for most randomly ordered data. If the data arrives in exactly opposite order, twice the key-storage space is required for sorting, and the value of this constant would be 2.0. That is the worst case.
If the data is fully sorted, only enough space to store the index entries is required, and the value of this constant reduces to 1.0. See "Pre-sorting Data for Faster Indexing" for more information.
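As a worked illustration of this formula, suppose 100,000 rows (a hypothetical figure) are to be loaded into an index of three columns whose sizes total 30 bytes:

```
key_storage = 100000 * (10 + 30 + 3) = 4,300,000 bytes
1.3 * key_storage = 5,590,000 bytes (about 5.6 MB)
```

With fully sorted data, the factor drops to 1.0 and the estimate falls to 4,300,000 bytes.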
Indexes Left in Direct Load State
SQL*Loader may leave indexes in direct load state if a direct path load does not complete successfully.
Any SQL statement that tries to use an index that is in direct load state returns an error. The following conditions cause the direct path option to leave an index in direct load state:
- SQL*Loader runs out of space for the index.
- The data is not in the order specified by the SORTED INDEXES clause.
- There is an instance failure while building the index.
- There are duplicate keys in a unique index.
To determine whether an index is in direct load state, you can execute a simple query:
SELECT INDEX_NAME, STATUS
FROM USER_INDEXES
WHERE TABLE_NAME = 'tablename';
If you are not the owner of the table, then search ALL_INDEXES or DBA_INDEXES instead of USER_INDEXES.
Data Saves
You can use data saves to protect against loss of data due to instance or media failure. All data loaded up to the last data save is protected against instance failure. To continue the load after an instance failure, determine how many rows from the input file were processed before the failure, then use the SKIP option to skip those processed rows. If there were any indexes on the table, drop them before continuing the load, then re-create them after the load. See "Recovery" for more information on media and instance failure.
Note: Indexes are not protected by a data save, because SQL*Loader usually does not build indexes until after data loading completes. (The only time indexes are built during the load is when pre-sorted data is loaded into an empty table -- but these indexes are also unprotected.)
Using the ROWS Parameter
The parameter ROWS determines when data saves occur during a direct path load. The value you specify for ROWS is the number of rows you want SQL*Loader to read from the input file before saving inserts in the database.
The number of rows you specify for a data save is an approximate number. Direct loads always act on full data buffers that match the format of Oracle database blocks. So, the actual number of data rows saved is rounded up to a multiple of the number of rows in a database block.
SQL*Loader always reads the number of rows needed to fill a database block. Discarded and rejected records are then removed, and the remaining records are inserted into the database. So the actual number of rows inserted before a save is the value you specify, rounded up to the number of rows in a database block, minus the number of discarded and rejected records.
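As an illustration of this rounding, suppose 50 rows fit in a database block (a hypothetical figure) and ROWS=120 is specified:

```
120 rows, rounded up to a multiple of 50   =  150 rows read per save
150 rows read - rejected/discarded records =  rows actually inserted per save
```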
Data Save Versus Commit
In a conventional load, ROWS is the number of rows to read before a commit. A direct load data save is similar to a conventional load commit, but it is not identical. The similarities are:
- A data save makes the rows visible to other users.
- Rows cannot be rolled back after a data save.
The major difference is that the indexes will be unusable (in direct load state) until the load completes.
Recovery
SQL*Loader provides full support for data recovery when using the direct path option. There are two main types of recovery:
Media recovery is recovering from the loss of a database file. You must operate in ARCHIVELOG mode to recover after a file has been lost.
Instance recovery is recovering from a system failure in which in-memory data was changed (but not written to disk) before the failure occurred. Oracle can always recover from instance failures, even if redo log files are not archived.
See the Oracle7 Server Administrator's Guide for more information about recovery.
Instance Recovery and Direct Path Loads
Because SQL*Loader writes directly to the database files, all rows inserted up to the last data save will automatically be present in the database files if the instance is restarted. Changes do not need to be recorded in the redo log file to make instance recovery possible.
If an instance failure occurs, the indexes being built may be left in direct load state. Drop and re-create any affected indexes before using the table or continuing the load. See "Indexes Left in Direct Load State" for more information on how to determine if an index has been left in direct load state.
Media Recovery and Direct Path Loads
If redo log file archiving is enabled (you are operating in ARCHIVELOG mode), SQL*Loader logs loaded data when using the direct path, making media recovery possible. If redo log archiving is not enabled (you are operating in NOARCHIVELOG mode), then media recovery is not possible.
To recover a database file that was lost while it was being loaded, use the same method that you use to recover data loaded with the conventional path:
1. Restore the most recent backup of the affected database file.
Loading LONG Data Fields
Data that is longer than SQL*Loader's maximum buffer size can be loaded on the direct path either with the PIECED option or by specifying the number of READBUFFERS. This section describes these two options.
Loading Data as PIECED
The data can be loaded in sections with the PIECED option if it is the last column of the logical record. The syntax for this specification is given in Chapter 5, "SQL*Loader Control File Reference."
Declaring a column as PIECED informs the direct path loader that the field may be processed in pieces, one buffer at a time.
The following restrictions apply when declaring a column as PIECED:
- This option is only valid on the direct path.
- Only one field per table may be PIECED.
- The PIECED field must be the last field in the logical record.
- The PIECED field may not be used in any WHEN, NULLIF, or DEFAULTIF clauses.
- The PIECED field's region in the logical record must not overlap with any other field's region.
- The database column corresponding to the PIECED field may not be part of an index.
- It may not be possible to load a rejected record from the bad file if it contains a PIECED field.
For example, a PIECED field could span three records. SQL*Loader loads the piece from the first record and then reuses the buffer for the second piece. After loading the second piece, the buffer is reused for the third record. If an error is then discovered, only the third record is placed in the bad file, because the first two pieces no longer exist in the buffer. As a result, the record in the bad file would not be valid.
Using the READBUFFERS Keyword
For data that is not divided into separate sections, or that is not in the last column, you can specify READBUFFERS. With READBUFFERS, a buffer transfer area can be allocated that is large enough to hold the entire logical record at one time.
READBUFFERS specifies the number of buffers to use during a direct path load. (A LONG can span multiple buffers.) The default value is four buffers. If the number of read buffers is too small, the following error results:
ORA-02374 ... No more slots for read buffer queue
Note: Do not specify a value for READBUFFERS unless it becomes necessary, as indicated by ORA-02374. Values of READBUFFERS larger than necessary do not enhance performance; instead, they unnecessarily increase system overhead.
Maximizing Performance of Direct Path Loads
You can control the time and temporary storage used during direct path loads.
To minimize time:
- Pre-allocate storage space.
- Perform infrequent data saves.
To minimize space:
- Disable archiving of redo log files.
- When sorting data before the load, sort data on the index that requires the most temporary storage space.
- Drop indexes and recreate them after the load.
Allocating I/O Buffers
When doing a direct path load, it is advisable to specify a large number of buffers, if your operating system provides for that capacity. Buffers are allocated with the I/O processing options string, described in your Oracle operating system-specific documentation.
On some systems, as many as 200 buffers are needed to keep the CPU busy. Otherwise, the CPU spends most of its time idle, waiting for I/O to complete. If you can measure CPU utilization, you will achieve maximum performance of direct loads when the number of buffers allows the CPU to operate at 95% to 98% utilization.
Additional Information: The procedure for allocating additional I/O buffers depends on your operating system. See your Oracle operating system-specific documentation for more information.
Pre-allocating Storage for Faster Loading
SQL*Loader automatically adds extents to the table if necessary, but this process takes time. For faster loads into a new table, allocate the required extents when the table is created.
To calculate the space required by a table, see Chapter 8 "Managing Database Files" in the Oracle7 Server Administrator's Guide. Then use the INITIAL or MINEXTENTS clause in the SQL command CREATE TABLE to allocate the required space.
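For example, the following statement (the table name, columns, and storage figures are hypothetical) pre-allocates a large initial extent so that the load does not pause to add extents:

```
CREATE TABLE emp_history
  (empno    NUMBER(4),
   ename    VARCHAR2(10),
   hiredate DATE)
  STORAGE (INITIAL 100M NEXT 10M MINEXTENTS 1);
```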
Pre-sorting Data for Faster Indexing
You can improve the performance of direct path loads by pre-sorting your data on indexed columns. Pre-sorting minimizes temporary storage requirements during the load. Pre-sorting also allows you to take advantage of high-performance sorting routines that are optimized for your operating system or application.
If the data is pre-sorted and the existing index is not empty, then pre-sorting minimizes the amount of temporary segment space needed for the new keys. The sort routine appends each new key to the key list. Instead of requiring extra space for sorting, only space for the keys is needed. To calculate the amount of storage needed, use a sort factor of 1.0 instead of 1.3. For more information on estimating storage requirements, see "Temporary Segment Storage Requirements" on page 8 - 8.
If pre-sorting is specified and the existing index is empty, then maximum efficiency is achieved. The sort routines are completely bypassed, along with the merge phase of index creation. The new keys are simply inserted into the index. Instead of a temporary segment and a new index existing simultaneously with the empty old index, only the new index exists. So temporary storage is not required, and time is saved.
SORTED INDEXES Statement
The SORTED INDEXES statement identifies the indexes on which the data is presorted. This statement is allowed only for direct path loads. See Chapter 5, "SQL*Loader Control File Reference," for the syntax. See Case 6 for an illustration.
Generally, you specify only one index in the SORTED INDEXES statement because data that is sorted for one index is not usually in the right order for another index. When the data is in the same order for multiple indexes, however, all of the indexes can be specified at once.
All indexes listed in the SORTED INDEXES statement must be created before you start the direct path load.
If you specify an index in the SORTED INDEXES statement, and the data is not sorted for that index, then the index is left in direct load state at the end of the load. The data is present, but any attempt to use the index results in an error. Any index left in direct load state must be dropped and re-created after the load.
Multiple Column Indexes
If you specify a multiple-column index in the SORTED INDEXES statement, the data should be sorted so that it is ordered first on the first column in the index, next on the second column in the index, and so on.
For example, if the first column of the index is city, and the second column is last name, then the data should be ordered by last name within each city, as in the following list:
Choosing the Best Sort Order
For the best overall performance of direct path loads, you should presort the data based on the index that requires the most temporary segment space. For example, if the primary key is one numeric column, and the secondary key consists of three text columns, then you can minimize both sort time and storage requirements by pre-sorting on the secondary key.
To determine the index that requires the most storage space, use the following procedure:
1. For each index, add up the widths of all columns in that index.
2. For a single-table load, pick the index with the largest overall width.
3. For each table in a multiple-table load, identify the index with the largest overall width. If the same number of rows are to be loaded into each table, then again pick the index with the largest overall width. Usually, the same number of rows are loaded into each table.
4. If a different number of rows are to be loaded into the indexed tables in a multiple-table load, then multiply the width of each index identified in step 3 by the number of rows to be loaded into that index, and pick the index with the largest result.
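As a worked example of this procedure (all figures hypothetical), consider a single-table load with a numeric primary key and a three-column text secondary key:

```
primary key:    NUMBER(10)                      width = 10 bytes
secondary key:  CHAR(20) + CHAR(20) + CHAR(30)  width = 70 bytes
```

Both indexes receive the same number of rows, so the secondary key, with the larger overall width, requires the most temporary segment space and is the better choice for pre-sorting.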
Infrequent Data Saves
Frequent data saves resulting from a small ROWS value adversely affect the performance of a direct path load. Because direct path loads can be many times faster than conventional loads, the value of ROWS should be considerably higher for a direct load than it would be for a conventional load.
During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety. It is a good idea to determine the average time to load a row by loading a few thousand rows. Then you can use that value to select a good value for ROWS.
For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 10 minutes of work after an interruption, then set ROWS to be 200,000 (20,000 rows/minute * 10 minutes).
Minimizing Use of the Redo Log
One way to speed a direct load dramatically is to minimize use of the redo log. There are two ways to do this. You can disable archiving, or you can specify that the load is UNRECOVERABLE. This section discusses both methods.
Use UNRECOVERABLE to save time and space in the redo log file. An UNRECOVERABLE load does not record loaded data in the redo log file.
Therefore, media recovery is disabled for the loaded table, although database changes by other users may continue to be logged.
Note: Because the data load is not logged, you may want to make a backup of the data after loading.
If media recovery becomes necessary on data that was loaded with the UNRECOVERABLE phrase, the data blocks that were loaded are marked as logically corrupted.
To recover the data, you must drop and re-create it. It is a good idea to take backups immediately after the load to preserve the otherwise unrecoverable data.
By default, a direct path load is RECOVERABLE. See "Data Definition Language Syntax" for information on RECOVERABLE and UNRECOVERABLE.
For both the conventional path and the direct path, SQL*Loader builds all existing indexes for a table. The only way to avoid building an index is to drop it before the load and re-create it afterwards.
Dropping and re-creating indexes is one way to save temporary storage while using the direct load facility. This action minimizes the amount of space required during the load, for the following reasons:
- You can build multiple indexes one at a time, saving the temporary segment space that would otherwise be needed for each index.
- Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.
This approach is quite reasonable when the number of rows to be loaded is large compared to the size of the table. But if relatively few rows are added to a large table, then the time required to re-sort the indexes may be excessive. In such cases, it is usually better to use the conventional path.
Direct Loads, Integrity Constraints, and Triggers
With the conventional path, arrays of data are inserted with standard SQL statements -- integrity constraints and insert triggers are automatically applied. But when loading data on the direct path, some integrity constraints and all database triggers are disabled. This section discusses the implications of using direct path loads with respect to these features.
Integrity Constraints
During a direct path load, some integrity constraints are automatically disabled; others are not. For a description of the constraints, see Chapter 5, "Maintaining Data Integrity," of the Oracle7 Server Application Developer's Guide.
The constraints that remain in force are NOT NULL, UNIQUE, and PRIMARY KEY. NOT NULL constraints are checked at insertion time; any row that violates this constraint is rejected. UNIQUE constraints are verified when indexes are rebuilt at the end of the load, and the index is left in direct load state if a violation is detected. (Direct load state is explained in "Indexes Left in Direct Load State".) A primary key constraint is simply a unique constraint on a NOT NULL column.
The following constraints are disabled:
- referential constraints (foreign keys)
- CHECK constraints
When the load completes, the integrity constraints will be re-enabled automatically if the REENABLE clause is specified. The syntax for this clause is given in Chapter 5, "SQL*Loader Control File Reference."
The optional keyword DISABLED_CONSTRAINTS is provided for readability. If the EXCEPTIONS clause is included, the table must already exist, and you must be able to insert into it. This table contains the ROWIDs of all rows that violated one of the integrity constraints. It also contains the name of the constraint that was violated. See the Oracle7 Server SQL Reference for instructions on how to create an exceptions table.
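For instance, a control file might contain a clause of the following form (the exceptions table name emp_exc is hypothetical; see Chapter 5 for the complete syntax):

```
REENABLE DISABLED_CONSTRAINTS EXCEPTIONS emp_exc
```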
If the REENABLE clause is not used, then the constraints must be re-enabled manually. All rows in the table are verified then. If Oracle finds any errors in the new data, error messages are produced. The names of violated constraints and the ROWIDs of the bad data are placed in an exceptions table, if one is specified. See ENABLE in the Oracle7 Server SQL Reference.
The SQL*Loader log file describes the constraints that were disabled, the ones that were re-enabled and what error, if any, prevented re-enabling of each constraint. It also contains the name of the exceptions table specified for each loaded table.
Warning: As long as bad data remains in the table, the integrity constraint cannot be successfully re-enabled.
Suggestion: Because referential integrity must be reverified for the entire table, performance may be improved by using the conventional path, instead of the direct path, when a small number of rows are to be loaded into a very large table.
Database Insert Triggers
Table insert triggers are also disabled when a direct path load begins. After the rows are loaded and indexes rebuilt, any triggers that were disabled are automatically re-enabled. The log file lists all triggers that were disabled for the load. There should not be any errors re-enabling triggers.
Unlike integrity constraints, insert triggers are not reapplied to the whole table when they are enabled. As a result, insert triggers do not fire for any rows loaded on the direct path. When using the direct path, the application must ensure that any behavior associated with insert triggers is carried out for the new rows.
Replacing Insert Triggers with Integrity Constraints
Applications commonly use insert triggers to implement integrity constraints. Most of these application insert triggers are simple enough that they can be replaced with Oracle's automatic integrity constraints.
When Automatic Constraints Cannot Be Used
Sometimes an insert trigger cannot be replaced with Oracle's automatic integrity constraints. For example, if an integrity check is implemented with a table lookup in an insert trigger, then automatic check constraints cannot be used, because the automatic constraints can only reference constants and columns in the current row. This section describes two methods for duplicating the effects of such a trigger.
Before either method can be used, the table must be prepared. Use the following general guidelines to prepare the table:
1. Before the load, add a one-character column to the table that marks rows as "old data" or "new data".
2. Let the value of null for this column signify "old data", because null columns do not take up space.
3. When loading, flag all loaded rows as "new data" with SQL*Loader's CONSTANT clause.
After following this procedure, all newly loaded rows are identified, making it possible to operate on the new data without affecting the old rows.
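As a sketch of this preparation (the table and column names are illustrative), the flag column could be added as follows:

```sql
-- One-character flag column; existing rows are left null ("old data").
ALTER TABLE emp ADD (load_flag CHAR(1));
```

In the SQL*Loader control file, a field specification such as load_flag CONSTANT 'N' then marks every loaded row as "new data".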
Using An Update Trigger
Generally, you can use a database update trigger to duplicate the effects of an insert trigger. This method is the simplest. It can be used whenever the insert trigger does not raise any exceptions.
1. Create an update trigger that duplicates the effects of the insert trigger.
Copy the trigger. Change all occurrences of "new.column_name" to "old.column_name".
2. Replace the current update trigger, if it exists, with the new one.
3. Update the table, changing the "new data" flag to null, thereby firing the update trigger.
4. Restore the original update trigger, if there was one.
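As a hypothetical sketch of step 1, suppose the original insert trigger wrote an audit row for each new employee (the emp_audit table, trigger name, and flag column are illustrative, not part of any actual schema):

```sql
-- Fires when the "new data" flag is cleared back to null (step 3).
-- The body is copied from the insert trigger, with "new" references
-- changed to "old".
CREATE OR REPLACE TRIGGER emp_load_trig
BEFORE UPDATE OF load_flag ON emp
FOR EACH ROW
WHEN (old.load_flag = 'N' AND new.load_flag IS NULL)
BEGIN
    INSERT INTO emp_audit (empno, action)
    VALUES (:old.empno, 'INSERTED');
END;
```

The WHEN clause restricts the trigger to exactly the update performed in step 3, so ordinary updates to other columns do not fire it.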
Note: Depending on the behavior of the trigger, it may be necessary to have exclusive update access to the table during this operation, so that other users do not inadvertently apply the trigger to rows they modify.
Duplicating the Effects of Exception Conditions
If the insert trigger can raise an exception, then more work is required to duplicate its effects. Raising an exception would prevent the row from being inserted into the table. To duplicate that effect with an update trigger, it is necessary to mark the loaded row for deletion.
The "new data" column cannot be used for a delete flag, because an update trigger cannot modify the column(s) that caused it to fire. So another column must be added to the table. This column marks the row for deletion. A null value means the row is valid. Whenever the insert trigger would raise an exception, the update trigger can mark the row as invalid by setting a flag in the additional column.
Summary: When an insert trigger can raise an exception condition, its effects can be duplicated by an update trigger, provided:
- two columns (which are usually null) are added to the table
- the table can be updated exclusively (if necessary)
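Under these conditions, the update trigger might look like the following sketch, which assumes the original insert trigger validated deptno against a lookup table and raised an exception on failure (all names are illustrative):

```sql
-- Second flag column; null means the row is valid.
ALTER TABLE emp ADD (bad_flag CHAR(1));

CREATE OR REPLACE TRIGGER emp_load_trig
BEFORE UPDATE OF load_flag ON emp
FOR EACH ROW
WHEN (old.load_flag = 'N' AND new.load_flag IS NULL)
DECLARE
    v_count NUMBER;
BEGIN
    SELECT COUNT(*) INTO v_count
      FROM dept
     WHERE deptno = :old.deptno;
    -- Where the insert trigger would raise an exception,
    -- mark the row for deletion instead.
    IF v_count = 0 THEN
        :new.bad_flag := 'Y';
    END IF;
END;
```

After the update in step 3, rows with bad_flag set can be deleted from the table.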
Using a Stored Procedure
The following procedure always works, but it is more complex to implement. It can be used even when the insert trigger raises exceptions. It does not require a second additional column, and, because it does not replace the update trigger, it can be used without exclusive access to the table.
1. Create a stored procedure that duplicates the effects of the insert trigger. Follow the general outline given below. (For implementation details, see the PL/SQL User's Guide and Reference for more information about cursor management.)
- declare a cursor for the table, selecting all the new rows
- open it and fetch rows, one at a time, in a processing loop
- perform the operations contained in the insert trigger
- if the operations succeed, change the "new data" flag to null
- if the operations fail, change the "new data" flag to "bad data"
2. Execute the stored procedure using an administration tool such as Server Manager.
3. After running the procedure, check the table for any rows marked "bad data".
4. Update or remove the bad rows.
5. Re-enable the insert trigger.
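The outline above might be sketched as a stored procedure like the following, again assuming an illustrative deptno lookup check, emp table, and load_flag column:

```sql
CREATE OR REPLACE PROCEDURE apply_insert_trigger IS
    -- Declare a cursor that selects all the new rows.
    CURSOR new_rows IS
        SELECT deptno FROM emp
         WHERE load_flag = 'N'
           FOR UPDATE;
    v_count NUMBER;
BEGIN
    FOR r IN new_rows LOOP
        -- Perform the operations contained in the insert trigger.
        SELECT COUNT(*) INTO v_count
          FROM dept
         WHERE deptno = r.deptno;
        IF v_count > 0 THEN
            -- Success: change the "new data" flag to null.
            UPDATE emp SET load_flag = NULL
             WHERE CURRENT OF new_rows;
        ELSE
            -- Failure: mark the row as "bad data".
            UPDATE emp SET load_flag = 'B'
             WHERE CURRENT OF new_rows;
        END IF;
    END LOOP;
    COMMIT;
END;
```

The FOR UPDATE cursor and WHERE CURRENT OF clause keep each update targeted at the row just fetched; see the PL/SQL User's Guide and Reference for cursor management details.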
Permanently Disabled Triggers and Constraints
To disable triggers and constraints, SQL*Loader needs to acquire several locks on the table to be loaded. If a competing process is enabling triggers or constraints at the same time that SQL*Loader is trying to disable them for that table, then SQL*Loader may not be able to acquire exclusive access to the table.
SQL*Loader attempts to handle this situation as gracefully as possible. It attempts to re-enable disabled triggers and constraints before exiting. However, the same table-locking problem that made it impossible for SQL*Loader to continue may also have made it impossible for SQL*Loader to finish enabling triggers and constraints. In such cases, triggers and constraints will remain permanently disabled until they are manually enabled.
Although such a situation is unlikely, it is possible. The best way to prevent it is to make sure that no applications are running that could enable triggers or constraints for the table, while the direct load is in progress.
If a direct load is aborted due to failure to acquire the proper locks, carefully check the log. It will show every trigger and constraint that was disabled, and each attempt to re-enable them. Any triggers or constraints that were not re-enabled by SQL*Loader should be manually enabled with the ENABLE clause described in the Oracle7 Server SQL Reference.
Alternative: Partitioned Load
If triggers or integrity constraints pose a problem, but you want faster loading, you should consider a partitioned load. A partitioned load works on a multiple-CPU system: divide the data set into separate partitions, then load each partition in its own conventional path session, one per CPU. The resulting load is faster than a single-CPU conventional load, although possibly not as fast as a direct load. But triggers fire, and integrity constraints are applied to the loaded rows.
Parallel Data Loading
SQL*Loader now permits multiple, concurrent sessions to perform a direct path load into the same table. Multiple SQL*Loader sessions improve the performance of a direct path load given the available resources on your system.
To load a table in parallel, the table must not be indexed.
You can only use parallel load to append rows. REPLACE, TRUNCATE, and INSERT should not be used. If you must truncate a table before a parallel load, you must do it manually.
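For example, the table could be truncated manually from SQL*Plus before the parallel sessions are started (the table name is illustrative):

```sql
-- Truncate manually; TRUNCATE cannot be specified in a parallel load.
TRUNCATE TABLE emp;
```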
Initiating Multiple SQL*Loader Sessions
Each SQL*Loader session takes a different source file as input. In all sessions executing a direct load on the same table, you must set PARALLEL to TRUE. The syntax is:

PARALLEL = {TRUE | FALSE}
PARALLEL can be specified on the command line or in a parameter file. It can also be specified in the control file with the OPTIONS clause.
For example, to invoke three SQL*Loader direct path load sessions on the same table, you would execute the following commands at the operating system prompt:
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
The previous commands must be executed in separate sessions, or if permitted on your operating system, as separate background jobs. Note the use of multiple control files. This allows you to be flexible in specifying the files to use for the direct path load (see the example of one of the control files below).
Note: Indexes are not created during a parallel load. Any indexes must be created manually after the load completes. You can use the parallel index creation feature to speed the creation of large indexes after a parallel load.
When you perform a PARALLEL load, SQL*Loader creates temporary segments for each concurrent session and then merges the segments upon completion. The segment created from the merge is then added to the existing table in the database, above the table's high-water mark. The last extent used by each loader session is trimmed of any free space before being combined with the extents from the other SQL*Loader sessions.
It is recommended that each concurrent session use files located on different disks to allow for the maximum I/O throughput. You can specify the filename of any valid datafile in the table's tablespace with the FILE keyword of the OPTIONS clause. The following example illustrates a portion of one of the control files used for the SQL*Loader sessions in the previous example:
INSERT INTO TABLE emp
(empno POSITION(01:04) INTEGER EXTERNAL NULLIF empno=BLANKS
Note: The FILE keyword is specified in a table-specific OPTIONS clause. That is, the OPTIONS clause is placed within an INTO TABLE statement, rather than before.
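A fuller sketch of such a control file might look as follows. The datafile name and field layout are illustrative, and APPEND is used because a parallel load can only append rows:

```
LOAD DATA
INFILE 'load1.dat'
APPEND
INTO TABLE emp
OPTIONS (FILE='/dbf/data1.dbf')
(empno POSITION(01:04) INTEGER EXTERNAL NULLIF empno=BLANKS,
 ename POSITION(06:15) CHAR)
```

Because the OPTIONS clause follows INTO TABLE emp, the FILE specification applies only to that table's temporary segments.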
Specifying Temporary Segments
You can specify the database file from which the temporary segments are allocated with the FILE keyword in the OPTIONS clause for each table in the control file. You can also specify the FILE parameter on the command line of each concurrent SQL*Loader session, but then it applies globally to all tables being loaded in that session.
Enabling Constraints After A Parallel Direct Path Load
If REENABLE is set to TRUE, each SQL*Loader session attempts to re-enable constraints on a table after a direct path load. Each SQL*Loader session holds a share lock on the table, however, so that another session cannot enable constraints before it is finished. When the last session completes, its attempt to re-enable constraints succeeds.
Warning: There is a danger that some constraints may not be re-enabled after a direct path load. You should check the status of each constraint after completing the load to ensure that it was enabled properly.
PRIMARY and UNIQUE KEY constraints
PRIMARY KEY and UNIQUE KEY constraints create indexes on a table when they are enabled, and can therefore take a significantly long time to enable after a direct path load if the table is very large.
You should consider enabling these constraints manually after a load, rather than using the automatic enable feature. This allows you to create the required indexes in parallel before enabling the constraints, saving time. See the Oracle7 Server Administrator's Guide for more information about creating indexes in parallel.
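For example, assuming the primary key of emp is empno (the index name and degree of parallelism are illustrative):

```sql
-- Build the unique index in parallel first.
CREATE UNIQUE INDEX emp_pk ON emp (empno)
    PARALLEL (DEGREE 4);

-- Enabling the constraint can then use the existing index,
-- avoiding a lengthy serial index build.
ALTER TABLE emp ENABLE PRIMARY KEY;
```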