|Oracle9i Database Utilities
Release 1 (9.0.1)
Part Number A90192-01
The Oracle9i external tables feature is a complement to existing SQL*Loader functionality. It allows you to access data in external sources as if it were in a table in the database.
External tables are read-only. No data manipulation language (DML) operations or index creation is allowed on an external table. Therefore, SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table.
To use the external tables feature, you must have some knowledge of the file format and record format of the datafiles on your platform. You must also know enough about SQL to be able to create an external table and execute DML statements that access the external table.
This chapter discusses the following topics:
An external table describes how the external table layer must present the data to the server. This is accomplished using the external table API, which is implemented through an access driver. The access driver and the external table layer transform the data in the datafile to match the external table definition.
When you create an external table of a particular type, you provide access parameters that describe the external data source. See Chapter 12 for descriptions of these access parameters.
The description of the data in the data source is separate from the definition of the external table. This means that:
The access driver ensures that data from the data source is processed so that it matches the definition of the external table.
In the following example, a traditional table named
emp is defined along with an external table named
CREATE TABLE emp (emp_no CHAR(6), last_name CHAR(25), first_name CHAR(20), middle_initial CHAR(1)); CREATE TABLE emp_load (employee_number CHAR(5), employee_last_name CHAR(20), employee_first_name CHAR(15), employee_middle_name CHAR(15)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIXED 62 FIELDS (employee_number INTEGER(2), employee_dob CHAR(20), employee_last_name CHAR(18), employee_first_name CHAR(11), employee_middle_name CHAR(11)) LOCATION ('foo.dat')); INSERT INTO emp (emp_no, first_name, middle_initial, last_name) AS SELECT employee_number, employee_first_name, substr(employee_middle_initial, 1, 1), employee_last_name FROM emp_load;
Note the following in the preceding example:
numfield in the datafile is converted to a character string for the
employee_dobfield that is not loaded into any field in the table.
The access driver runs inside of the database server. This is different from SQL*Loader in that SQL*Loader is a client program that sends the data to be loaded over to the server. This difference has the following implications:
The access driver does not allow you to specify random names for a file. This is because the server may have access to files that you do not, and allowing you to read this data would affect security. Similarly, you cannot specify a location for an output file, because the server could overwrite a file that you might not normally have privileges to delete.
Instead, you are required to specify directory objects as the locations from which to read files and write files. A directory object maps a name to a directory name on the file system. Once a directory object is created by the system user, then the system user can grant
WRITE permission on the object. For example, the following statement creates a directory object
After a directory is created, the user creating the directory object needs to grant
WRITE permission on the directory to other users. For example, to allow the server to read files on behalf of user
scott in the directory named by
load_src, the user who created the directory object must execute the following command:
The name of the directory object can appear in the following places in a
CREATE TABLE...ORGANIZATION EXTERNAL statement:
LOCATIONclause, which lists all of the datafiles for the external table. The files are named in the form
directoryportion is optional. If it is missing, the default directory is used as the directory for the file.
directoryportion is optional. If it is missing, the default directory is used as the directory for the file. There is syntax in the access parameters that allows you to indicate that a particular output file should not be created. This is useful if you do not care about the output files or if you do not have write access to any directory objects.
SYSTEM user is the only user that can create directory objects, but the
SYSTEM user can grant other users the privilege to create directory objects. Note that
WRITE permission to a directory object only means that the Oracle database server will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database server unless you have the operating system privileges to do so. Similarly, the Oracle database server requires permission from the operating system to read and write files in the directories.
The main use for external tables is as a row source for loading data into a real table in the database. After you create an external table, you can issue a
CREATE TABLE AS SELECT or
INSERT INTO... AS SELECT statement using the external table as the source of the
SELECT clause. Remember that external tables are read-only, so you cannot insert into them or update records in them.
When the external table is accessed through a SQL statement, the fields of the external table can be used just like any other field in a normal table. In particular, the fields can be used as arguments for any SQL built-in function, PL/SQL function, or Java function. This allows you to manipulate the data from the external source.
Although external tables cannot contain a column object, you can use constructor functions to build a column object from attributes in the external table. For example, assume a table in the database is defined as follows:
CREATE TYPE student_type AS object ( student_no CHAR(5), name CHAR(20); CREATE TABLE roster ( student student_type, grade CHAR(2));
Also assume there is an external table defined as follows:
CREATE TABLE roster_data ( student_no CHAR(5), name CHAR(20), grade CHAR(2)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY ,) LOCATION ('foo.dat');
To load table
roster_data, you would specify something similar to the following:
INSERT INTO roster (student, grade) AS SELECT (student_type(student_no, name), grade) FROM roster_data;
The external table feature supports parallel processing on the datafiles. You control the degree of parallelism using the standard parallel hints and with the
PARALLEL clause when creating the external table. The degree of parallelism indicates the number of access drivers that can be started to process the datafiles. The access driver attempts to divide large datafiles into chunks that can be processed separately. The chunks are not too small (so that the overhead of managing the chunks is small) and not too big (so that all of the parallel access drivers have a chance to keep equally busy).
The following file, record, and data characteristics make it impossible for a file to be processed in parallel:
This restriction does not apply to any datafile with a fixed number of bytes per record.
When you monitor performance, the most important measurement is the elapsed time for a load. Other important measurements are CPU usage, memory usage, and I/O rates.
You can alter performance by increasing or decreasing the degree of parallelism. The degree of parallelism allows you to choose on a scale between slower load with little resource usage and faster load with all resources utilized. The access driver cannot automatically tune itself, because it cannot determine how many resources you want to dedicate to the access driver.
In addition to changing the degree of parallelism to improve performance, consider the following information:
DEFAULTIF) slow down processing.
The access driver takes advantage of multithreading to streamline the work as much as possible.