Oracle9i Database Utilities Release 1 (9.0.1) Part Number A90192-01 |
|
The Oracle9i external tables feature is a complement to existing SQL*Loader functionality. It allows you to access data in external sources as if it were in a table in the database.
External tables are read-only. No data manipulation language (DML) operations or index creation is allowed on an external table. Therefore, SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table.
To use the external tables feature, you must have some knowledge of the file format and record format of the datafiles on your platform. You must also know enough about SQL to be able to create an external table and execute DML statements that access the external table.
This chapter discusses the following topics:
An external table describes how the external table layer must present the data to the server. This is accomplished using the external table API, which is implemented through an access driver. The access driver and the external table layer transform the data in the datafile to match the external table definition.
When you create an external table of a particular type, you provide access parameters that describe the external data source. See Chapter 12 for descriptions of these access parameters.
The description of the data in the data source is separate from the definition of the external table. This means that:
The access driver ensures that data from the data source is processed so that it matches the definition of the external table.
In the following example, a traditional table named emp
is defined along with an external table named emp_load
.
CREATE TABLE emp (emp_no CHAR(6), last_name CHAR(25), first_name CHAR(20), middle_initial CHAR(1)); CREATE TABLE emp_load (employee_number CHAR(5), employee_last_name CHAR(20), employee_first_name CHAR(15), employee_middle_name CHAR(15)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIXED 62 FIELDS (employee_number INTEGER(2), employee_dob CHAR(20), employee_last_name CHAR(18), employee_first_name CHAR(11), employee_middle_name CHAR(11)) LOCATION ('foo.dat')); INSERT INTO emp (emp_no, first_name, middle_initial, last_name) AS SELECT employee_number, employee_first_name, substr(employee_middle_initial, 1, 1), employee_last_name FROM emp_load;
Note the following in the preceding example:
num
field in the datafile is converted to a character string for the employee_number
field.
employee_dob
field that is not loaded into any field in the table.
The access driver runs inside of the database server. This is different from SQL*Loader in that SQL*Loader is a client program that sends the data to be loaded over to the server. This difference has the following implications:
The access driver does not allow you to specify random names for a file. This is because the server may have access to files that you do not, and allowing you to read this data would affect security. Similarly, you cannot specify a location for an output file, because the server could overwrite a file that you might not normally have privileges to delete.
Instead, you are required to specify directory objects as the locations from which to read files and write files. A directory object maps a name to a directory name on the file system. Once a directory object is created by the system user, then the system user can grant READ
or WRITE
permission on the object. For example, the following statement creates a directory object load_src
.
create directory load_src as '/usr/apps/datafiles';
After a directory is created, the user creating the directory object needs to grant READ
or WRITE
permission on the directory to other users. For example, to allow the server to read files on behalf of user scott
in the directory named by load_src
, the user who created the directory object must execute the following command:
GRANT READ ON DIRECTORY load_src TO scott;
The name of the directory object can appear in the following places in a CREATE TABLE...ORGANIZATION EXTERNAL
statement:
LOCATION
clause, which lists all of the datafiles for the external table. The files are named in the form directory:file
. The directory
portion is optional. If it is missing, the default directory is used as the directory for the file.
directory:file
. The directory
portion is optional. If it is missing, the default directory is used as the directory for the file. There is syntax in the access parameters that allows you to indicate that a particular output file should not be created. This is useful if you do not care about the output files or if you do not have write access to any directory objects.
The SYSTEM
user is the only user that can create directory objects, but the SYSTEM
user can grant other users the privilege to create directory objects. Note that READ
or WRITE
permission to a directory object only means that the Oracle database server will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database server unless you have the operating system privileges to do so. Similarly, the Oracle database server requires permission from the operating system to read and write files in the directories.
The main use for external tables is as a row source for loading data into a real table in the database. After you create an external table, you can issue a CREATE TABLE AS SELECT
or INSERT INTO... AS SELECT
statement using the external table as the source of the SELECT
clause. Remember that external tables are read-only, so you cannot insert into them or update records in them.
When the external table is accessed through a SQL statement, the fields of the external table can be used just like any other field in a normal table. In particular, the fields can be used as arguments for any SQL built-in function, PL/SQL function, or Java function. This allows you to manipulate the data from the external source.
Although external tables cannot contain a column object, you can use constructor functions to build a column object from attributes in the external table. For example, assume a table in the database is defined as follows:
CREATE TYPE student_type AS object ( student_no CHAR(5), name CHAR(20); CREATE TABLE roster ( student student_type, grade CHAR(2));
Also assume there is an external table defined as follows:
CREATE TABLE roster_data ( student_no CHAR(5), name CHAR(20), grade CHAR(2)) ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY ext_tab_dir ACCESS PARAMETERS (FIELDS TERMINATED BY ,) LOCATION ('foo.dat');
To load table roster
from roster_data
, you would specify something similar to the following:
INSERT INTO roster (student, grade) AS SELECT (student_type(student_no, name), grade) FROM roster_data;
The external table feature supports parallel processing on the datafiles. You control the degree of parallelism using the standard parallel hints and with the PARALLEL
clause when creating the external table. The degree of parallelism indicates the number of access drivers that can be started to process the datafiles. The access driver attempts to divide large datafiles into chunks that can be processed separately. The chunks are not too small (so that the overhead of managing the chunks is small) and not too big (so that all of the parallel access drivers have a chance to keep equally busy).
The following file, record, and data characteristics make it impossible for a file to be processed in parallel:
This restriction does not apply to any datafile with a fixed number of bytes per record.
VAR
format
When you monitor performance, the most important measurement is the elapsed time for a load. Other important measurements are CPU usage, memory usage, and I/O rates.
You can alter performance by increasing or decreasing the degree of parallelism. The degree of parallelism allows you to choose on a scale between slower load with little resource usage and faster load with all resources utilized. The access driver cannot automatically tune itself, because it cannot determine how many resources you want to dedicate to the access driver.
In addition to changing the degree of parallelism to improve performance, consider the following information:
WHEN
, NULLIF
, and DEFAULTIF
) slow down processing.
The access driver takes advantage of multithreading to streamline the work as much as possible.
|
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved. |
|