This chapter explains the basic concepts of loading data into an Oracle database with SQL*Loader. This chapter covers the following topics:
- Direct Path and Conventional Path Load Methods
- Mapping the Data to Oracle Format
- Discarded and Rejected Records
SQL*Loader moves data from external files into tables in an Oracle database. It has many features of the DB2 Load Utility from IBM. It also has several other features that add power and flexibility.
SQL*Loader loads data in a variety of formats, performs filtering (selectively loading records based upon the data values), and loads multiple tables simultaneously.
During execution, SQL*Loader produces a detailed log file with statistics about the load. It may also produce a bad file (containing records rejected because of incorrect data) and a discard file (containing records that did not meet the specified selection criteria).
- load data from multiple datafiles of different file types
- handle fixed-format, delimited-format, and variable-length records
- manipulate data fields with SQL functions before inserting the data into database columns
- support a wide range of datatypes, including DATE, BINARY, PACKED DECIMAL, and ZONED DECIMAL
- load multiple tables during the same run, loading selected rows into each table
- combine multiple physical records into a single logical record
- treat a single physical record as multiple logical records
- generate unique, sequential key values in specified columns
- use your operating system's file or record management system to access datafiles
- load data from disk or tape
- provide thorough error reporting capabilities, so you can easily adjust and load all records
- use high-performance "direct" loads to load data directly into database files without Oracle processing. This feature is discussed in Chapter 8, "SQL*Loader Conventional and Direct Path Loads".
Direct Path Load vs. Conventional Path Load Method
SQL*Loader can use one of two methods to load data: conventional path (which uses the bind array) and direct path (which stores data directly into the database).
During conventional path loads, multiple data records are read in and placed in a bind array. When the bind array is full (or there is no more data left to read), it is passed to Oracle for insertion. Conventional path uses the Oracle SQL interface with the array option.
For more information on conventional path loads, see the section "Data Loading Methods" . For information on the bind array, see page 5 - 63.
A direct path load creates data blocks that are already in Oracle database block format. These database blocks are then written directly to the database bypassing most RDBMS processing.
This path is much faster than the conventional load, but entails several restrictions. For more information on the direct path, see the section "Data Loading Methods" .
Mapping the Data to Oracle Format
Data to be loaded into the Oracle database must exist in files on disk or on tape. These datafiles require mapping (translation to Oracle format) to be loaded by SQL*Loader. You specify how SQL*Loader interprets the data via data definitions contained in control files. The control file also is the repository for certain file management information.
The Concept of Mapping Data
SQL*Loader must be told where and in what format the data to be loaded is and how to map the data to Oracle format. Definitions in the control file will include:
- specifications for loading logical records into tables
- field condition specifications
- column and field specifications
- data field position specifications
- bind array size specifications
- specifications for setting columns to null or zero
- specifications for loading all-blank fields
- specifications for trimming blanks and tabs
- specifications to preserve whitespace
To define the specifications listed above you will use the SQL*Loader Data Definition Language (DDL).
- specifications for applying SQL operators to fields
The Data Definition Language (DDL)
The SQL*Loader data definition language (DDL) is used to specify exactly how SQL*Loader should interpret the data you are loading into the Oracle database. DDL is used to create DDL definitions which are the map that SQL*Loader uses to translate the loaded data into Oracle format.
The syntax and semantics of DDL is explained in detail in Chapter 5.
SQL*Loader DDL is upwardly compatible with the DB2 Load Utility from IBM. If you have a control file for the DB2 Load Utility, you can also use it with SQL*Loader. See Appendix C, "Notes for DB2/DXT Users," for differences in syntax.
DDL definitions can serve several purposes. Certain definitions specify data location or format. Other DDL definitions specify how SQL*Loader should map specific objects in the loaded data to comparable objects in an Oracle database. Other definitions deal with column definitions, datatype mapping and field specifications. DDL definitions are stored in control files which are read by SQL*Loader on startup.
A single DDL definition is composed of one or more keywords and the arguments and options that modify that keyword's functionality. An example of a control file containing several definitions defining how SQL*Loader should interpret a simple datafile might look like this:
INTO TABLE emp
(empno POSITION(01:04) INTEGER EXTERNAL,
ename POSITION(06:15) CHAR,
job POSITION(17:25) CHAR,
mgr POSITION(27:30) INTEGER EXTERNAL,
sal POSITION(32:39) DECIMAL EXTERNAL,
comm POSITION(41:48) DECIMAL EXTERNAL,
The keywords are LOAD DATA, INFILE, INTO TABLE, POSITION, etc. (shown in all capital letters).
The Control File
Control files contain DDL definitions. You can create a control file using your system text editor.
How you store control files depends on how your operating system organizes data. For example, in UNIX environments, control files are stored in files; in MVS environments, they can be stored as members in a partitioned dataset. They must be located where SQL*Loader has access to them.
Some DDL definitions are mandatory, such as where to find the data and how it corresponds to the database tables. However, many options are also available to describe and manipulate the file data. For example, the instructions can include directions on how to format or filter the data, or to generate unique ID numbers.
A control file can also contain the data itself as well as the DDL definitions, as shown in Case 1 , or in separate files, as shown in Case 2 . Detailed information on creating control files using DDL definitions is found .
Control File Guidelines
- The file is written in free format. That is, statements can continue from line to line with new lines beginning at any word.
- The file is written in uppercase or lowercase. Case is not significant except in strings specified with single or double quotation marks.
- Comments can be included by prefixing them with two hyphens, as in the following example:
--This is a comment
- The double hyphens can appear anywhere on a line; SQL*Loader ignores anything from the double hyphens to the end of line. Comments should not appear in the datafile or the data portion of the control file. If a double dash appears in this area, it is treated as data.
- SQL*Loader reserved words must be enclosed in quotation marks if you want to use them for table or column names. (See Appendix B for a complete list of SQL*Loader reserved words.)
SQL*Loader can load data stored in various formats.
Binary versus Character Format Data
SQL*Loader can load numeric data in binary or character format. Character format is sometimes referred to as numeric external format.
Binary data is one example of native datatypes -- datatypes that are implemented differently on different operating systems. For more information on these and other native datatypes, see "Native Datatypes" . SQL*Loader cannot handle binary data in variable record format. See also the section called "Loading Data Across Different Operating Systems" .
Data in character format can be included in both fixed-format and variable-format files. For more information on the character datatypes, see "Character Datatypes" .
Fixed versus Variable Format
Data records may be in fixed or variable format. In fixed format, the data is contained in records that all have the same (fixed) format. That is, the records have a fixed length, and the data fields in those records have fixed length, type, and position, as shown in Figure 3 - 1.
Figure 3 - 1. Fixed Format Records
In this example, columns 1 to 6 contain a character variable while columns 7 to 10 contain an integer for all the records. The fields are the same size in each record, regardless of the length of the data. The fields are fixed length, rather than variable length. In consequence, the record size is also fixed at 10 characters for each of the records.
In variable format (sometimes called stream format), each record is only as long as necessary to contain the data. Figure 3 - 2 shows variable length records containing one varying-length character fields and one fixed length integer field.
Figure 3 - 2. Variable Format Records
In addition, the type of data in each record may vary. One record may contain a character string, the next may contain seven integers, the third may contain three decimals and a float, and so on. Operating systems use a record terminator character (such as newline) to mark where variable records end.
Data in records is divided into fields. Fields can be specified with specific positions and lengths, or their position and length can vary based on delimiters.
There are two types of delimited fields: terminated and enclosed. Terminated fields are followed by a specified character (called a termination delimiter), such as the commas in the following example:
Enclosed fields are both preceded and followed by specified characters (called enclosure delimiters), such as the quotation marks in the following example:
Case 2 shows fixed-length records. Case 1 shows delimited fields. For more details on delimited data, see "Specifying Delimiters" .
Logical versus Physical Records
A final distinction concerns the difference between logical and physical records. A record or line in a file (either of fixed length or terminated) is referred to as a physical record. An operating system-dependent file/record management system, such as DEC's Record Management System (RMS) or IBM's Sequential Access Method (SAM) returns physical records.
Logical records, on the other hand, correspond to a row in a database table. Sometimes the logical and physical records are equivalent. Such is the case when only a few short columns are being loaded. However, sometimes several physical records must be combined to make one logical record. For example, you could have a file containing 24 10-character columns in a format of 80-character, fixed-length records. In this case, three physical records would constitute a single logical record.
SQL*Loader allows you to compose logical records from multiple physical records using continuation fields. Physical records are combined into a single, logical record when some condition of the continuation field is true. You can specify that a logical record should be composed of multiple, physical records in the following ways:
- A fixed number of physical records are concatenated to form a logical record (no continuation field is used).
- Physical records are appended if the continuation field contains a specified string (or another test, such as "not equal", succeeds when applied to the continuation field).
Case 4 uses continuation fields to form one logical record from multiple physical records.
- Physical records are appended if they contain a specified character as their last non-blank character.
When SQL*Loader begins execution, it creates a log file. If it cannot create a log file, execution terminates. The log file contains a detailed summary of the load, including a description of any errors that occurred during the load. For details on the information contained in the log file, see page 7 - 1. All of the case studies in Chapter 4 also contain sample log files.
To load data using the conventional load method, the tables to receive the data must already exist in the database. There are no special requirements for these tables. The tables may be clustered or indexed, or they may actually be a view for which you have insert privileges. Tables may already contain data, or they may be empty.
The following privileges are required for a conventional load:
- You must have INSERT privileges on the table to be loaded.
In addition to the above privileges, you must have write access to all labels you are loading data into a Trusted Oracle7 Server database. See the Trusted Oracle7 Server Administrator's Guide.
- You must have DELETE privilege on the table to be loaded, when using the REPLACE option to empty out the table's old data before loading the new data in its place.
Discarded and Rejected Records
Records that are read from the input file might not be inserted into the database. Figure 3 - 3 shows the stages at which records may be rejected or discarded.
Figure 3 - 3. Record Filtering
The Bad File
The bad file contains records that are rejected, either by SQL*Loader or by Oracle. Some of the possible reasons for rejection are discussed in the next sections.
Records are rejected by SQL*Loader when the input format is invalid. For example, if the second enclosure delimiter is missing, or if a delimited field exceeds its maximum length, SQL*Loader rejects the record. Rejected records are placed in the bad file. For details on how to specify the bad file, see "Specifying the Bad File" .
After a record is accepted for processing by SQL*Loader, a row is sent to Oracle for insertion. If Oracle determines that the row is valid, then the row is inserted into the database. If not, the record is rejected, and SQL*Loader puts it in the bad file. The row may be rejected, for example, because a key is not unique, because a required field is null, or because the field contains invalid data for the Oracle datatype.
The bad file is written in the same format as the datafile. So the rejected data can be loaded with the existing control file, after any necessary corrections are made.
Case 4 is an example of the use of a bad file.
As SQL*Loader executes, it may create a file called the discard file. This file is created only when it is needed, and only if you have specified that a discard file should be enabled. The discard file contains records that were filtered out of the load because they did not match any of the record-selection criteria specified in the control file.
The discard file therefore contains records that were not inserted into any table, up to a specifiable maximum. If a record's data is written to any table, it is not written to the discard file.
The discard file is written in the same format as the datafile. The discard data can be loaded with the existing control file, after any necessary editing or correcting.
Case 4 shows how the discard file is used. For more details, see "Specifying the Discard File" .
Data Conversion and Datatype Specification
Figure 3 - 4 shows the stages in which fields in the datafile are converted into columns in the database during a conventional path load. The top of the diagram shows a data record containing one or more fields. The bottom shows the database column in which the data winds up. It is important to understand the intervening steps when using SQL*Loader.
Figure 3 - 4 depicts the "division of labor" between SQL*Loader and the Oracle7 Server. The field specifications tell SQL*Loader how to interpret the format of the datafile. The Oracle7 Server then converts that data and inserts it into the database columns, using the column datatypes as a guide.
Figure 3 - 4. Field to Column Translation
Keep in mind the distinction between a field (in a datafile) and a column (in the database). It is also important to remember that the field datatypes defined in a SQL*Loader control file are not the same as the column datatypes.
SQL*Loader uses the field specifications in the control file to recognize data and creates a SQL insert statement using that data. The insert statement is then passed to the Oracle7 Server to be stored in the table. The Oracle7 Server uses the datatype of the column to convert the data into its final, stored form.
In actuality, there are two conversion steps:
1. SQL*Loader identifies a field in the datafile, interprets the data, and passes it to the Oracle7 Server.
2. The Oracle7 Server accepts the data and stores it in the database.
In Figure 3 - 5, two CHAR fields are defined for a data record. The field specifications are contained in the control file. Note that the control file CHAR specification is not the same as the database CHAR specification. A data field defined as CHAR in the control file merely tells SQL*Loader how to create the row insert. The data could then be inserted into a CHAR, VARCHAR2, or even a NUMBER column in the database, with the Oracle7 Server handling any necessary conversions.
Figure 3 - 5. Example of Field Conversion
By default, SQL*Loader removes trailing spaces from CHAR data before passing it to the database. So, in Figure 3 - 5, both field A and field B are passed to the database as three-column fields. When the data is inserted into the table, however, there is a difference.
Column A is defined in the database as a fixed-length CHAR column of length 5. So the data (aaa) is left justified in that column, which remains five characters wide. The extra space on the right is padded with blanks. Column B, however, is defined as a varying length field with a maximum length of five characters. The data for that column (bbb) is left-justified as well, but the length remains three characters.
The name of the field tells SQL*Loader what column to insert the data into. Because the first data field has been specified with the name "A" in the control file, SQL*Loader knows to insert the data into column A of the target database table.
It will be useful to keep the following points in mind:
- The name of the data field corresponds to the name of the table column into which the data is loaded.
- The datatype of the field tells SQL*Loader how to read the data in the datafile. It is not the same as the column datatype.
- Data is converted from the datatype specified in the control file to the datatype of the column in the database.