|Oracle9i Database Utilities
Release 2 (9.2)
Part Number A96652-01
The Oracle9i external tables feature is a complement to existing SQL*Loader functionality. It allows you to access data in external sources as if it were in a table in the database.
External tables are read-only. No data manipulation language (DML) operations or index creation is allowed on an external table. Therefore, SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. See Behavior Differences Between SQL*Loader and External Tables for more information on how load behavior differs between SQL*Loader and external tables.
To use the external tables feature, you must have some knowledge of the file format and record format of the datafiles on your platform. You must also know enough about SQL to be able to create an external table and perform queries against it.
This chapter discusses the following topics:
An external table describes how the external table layer must present the data to the server. The access driver and the external table layer transform the data in the datafile to match the external table definition.
When you create an external table of a particular type, you provide access parameters that describe the external data source. If you do not specify a type for the external table, then the
ORACLE_LOADER type is used as a default. For a description of the access parameters for the
ORACLE_LOADER type, see Chapter 12.
The description of the data in the data source is separate from the definition of the external table. This means that:
The access driver ensures that data from the data source is processed so that it matches the definition of the external table.
In the following example, a traditional table named
emp is defined along with an external table named
CREATE TABLE emp (emp_no CHAR(6), last_name CHAR(25), first_name CHAR(20), middle_initial CHAR(1)); CREATE TABLE emp_load (employee_number CHAR(5), employee_last_name CHAR(20), employee_first_name CHAR(15), employee_middle_name CHAR(15)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (RECORDS FIXED 62 FIELDS (employee_number INTEGER(2), employee_dob CHAR(20), employee_last_name CHAR(18), employee_first_name CHAR(11), employee_middle_name CHAR(11))) LOCATION ('foo.dat')); INSERT INTO emp (emp_no, first_name, middle_initial, last_name) (SELECT employee_number, employee_first_name, substr(employee_middle_name, 1, 1), employee_last_name FROM emp_load);
Note the following in the preceding example:
employee_numberfield in the datafile is converted to a character string for the
employee_numberfield in the external table.
employee_dobfield that is not loaded into any field in the table.
substrfunction is used on the
employee_middle_namecolumn in the external table to generate the value for
This section lists what external tables does not do and also describes some processing restrictions.
The access driver runs inside of the database server. This is different from SQL*Loader, which is a client program that sends the data to be loaded over to the server. This difference has the following implications:
The access driver does not allow you to specify random names for a file. This is because the server may have access to files that you do not, and allowing you to read this data would affect security. Similarly, you cannot specify a location for an output file, because the server could overwrite a file that you might not normally have privileges to delete.
Instead, you are required to specify directory objects as the locations from which to read files and write files. A directory object maps a name to a directory name on the file system. For example, the following statement creates a directory object named
Directory objects can be created by DBAs or by any user with the
DIRECTORY privilege. After a directory is created, the user creating the directory object needs to grant
WRITE permission on the directory to other users. For example, to allow the server to read files on behalf of user
scott in the directory named by
load_src, the user who created the directory object must execute the following command:
The name of the directory object can appear in the following places in a
CREATE TABLE...ORGANIZATION EXTERNAL statement:
LOCATIONclause, which lists all of the datafiles for the external table. The files are named in the form
directoryportion is optional. If it is missing, the default directory is used as the directory for the file.
directoryportion is optional. If it is missing, the default directory is used as the directory for the file. Syntax in the access parameters allows you to indicate that a particular output file should not be created. This is useful if you do not care about the output files or if you do not have write access to any directory objects.
SYS user is the only user that can own directory objects, but the
SYS user can grant other users the privilege to create directory objects. Note that
WRITE permission to a directory object only means that the Oracle database server will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database server unless you have the appropriate operating system privileges. Similarly, the Oracle database server requires permission from the operating system to read and write files in the directories.
The main use for external tables is as a row source for loading data into a real table in the database. After you create an external table, you can issue a
CREATE TABLE AS SELECT or
INSERT INTO... AS SELECT statement using the external table as the source of the
SELECT clause. Remember that external tables are read-only, so you cannot insert into them or update records in them.
When the external table is accessed through a SQL statement, the fields of the external table can be used just like any other field in a normal table. In particular, the fields can be used as arguments for any SQL built-in function, PL/SQL function, or Java function. This allows you to manipulate the data from the external source.
Although external tables cannot contain a column object, you can use constructor functions to build a column object from attributes in the external table. For example, assume a table in the database is defined as follows:
CREATE TYPE student_type AS object ( student_no CHAR(5), name CHAR(20)) / CREATE TABLE roster ( student student_type, grade CHAR(2));
Also assume there is an external table defined as follows:
CREATE TABLE roster_data ( student_no CHAR(5), name CHAR(20), grade CHAR(2)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY ',') LOCATION ('foo.dat'));
To load table
roster_data, you would specify something similar to the following:
INSERT INTO roster (student, grade) (SELECT student_type(student_no, name), grade FROM roster_data);
To enable external table support of parallel processing on the datafiles, use the
PARALLEL clause when you create the external table. The access driver attempts to divide large datafiles into chunks that can be processed separately.
The following file, record, and data characteristics make it impossible for a file to be processed in parallel:
This restriction does not apply to any datafile with a fixed number of bytes per record.
When you monitor performance, the most important measurement is the elapsed time for a load. Other important measurements are CPU usage, memory usage, and I/O rates.
You can alter performance by increasing or decreasing the degree of parallelism. The degree of parallelism indicates the number of access drivers that can be started to process the datafiles. The degree of parallelism allows you to choose on a scale between slower load with little resource usage and faster load with all resources utilized. The access driver cannot automatically tune itself, because it cannot determine how many resources you want to dedicate to the access driver.
Performance can also sometimes be increased with use of date cache functionality. By using the date cache to specify the number of unique dates anticipated during the load, you can reduce the number of date conversions done when many duplicate date or timestamp values are present in the input data. The date cache functionality provided by external tables is identical to the date cache functionality provided by SQL*Loader. See Specifying a Value for the Date Cache for a detailed description.
In addition to changing the degree of parallelism and using the date cache to improve performance, consider the following information:
DEFAULTIF) slow down processing.
The access driver takes advantage of multithreading to streamline the work as much as possible.
This section describes important differences between loading data with external tables as opposed to loading data with SQL*Loader conventional and direct path loads.
If there are multiple primary input datafiles with SQL*Loader loads, a bad file and a discard file are created for each input datafile. With external table loads, there is only one bad file and one discard file for all input datafiles. If parallel access drivers are used for the external table load, each access driver has its own bad file and discard file.
The following are not supported with external table loads:
CONCATENATEto combine multiple physical records into a single logical record.
GRAPHIC EXTERNAL, and
LONGs, nested tables,
REFs, primary key
With SQL*Loader, if the
SEQUENCE parameter is used and there are rejected rows, the rejected row still updates the sequence number value. With external tables, if the
SEQUENCE parameter is used, rejected rows do not update the sequence number value. For example, suppose you load 5 rows with sequence numbers beginning with 1 and incrementing by 1. In SQL*Loader, if rows 2 and 4 are rejected, the successfully loaded rows are assigned the sequence numbers 1, 3, and 5. In an external table load, the successfully loaded rows are assigned the sequence numbers 1, 2, and 3.
With SQL*Loader, if a primary datafile uses a Unicode character set (UTF8 or UTF16) and it also contains a byte-order mark (BOM), then the byte-order mark is written at the beginning of the corresponding bad and discard files. With external table loads, the byte-order mark is not written at the beginning of the bad and discard files.
For fields in a datafile, it is the client's NLS environment that determines the default character set and date masks. For fields in external tables, it is the server's NLS environment that determines the default character set and date masks.