The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.
14.1 How Are External Tables Created?
External tables are created using the SQL
TABLE...ORGANIZATION EXTERNAL statement.
Note that SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. See Behavior Differences Between SQL*Loader and External Tables for more information about how load behavior differs between SQL*Loader and external tables.
As of Oracle Database 12c Release 2 (18.104.22.168), you can partition data contained in external tables, which allows you to take advantage of the same performance improvements provided when you partition tables stored in a database (for example, partition pruning).
Oracle Database Administrator’s Guide for additional information about creating and managing external tables, and about partitioning them.
External tables can be used as inline external tables in SQL statements, thus eliminating the need to create an external table as a persistent database object in the data dictionary. For additional information, see Oracle Database SQL Language Reference.
When you create an external table, you specify the following attributes:
TYPE— specifies the type of external table. Each type of external table is supported by its own access driver.
ORACLE_LOADER— this is the default access driver. It loads data from external tables to internal tables. The data must come from text data files. (The
ORACLE_LOADERaccess driver cannot perform unloads; that is, it cannot move data from an internal table to an external table.)
ORACLE_DATAPUMP— this access driver can perform both loads and unloads. The data must come from binary dump files. Loads to internal tables from external tables are done by fetching from the binary dump files. Unloads from internal tables to external tables are done by populating the binary dump files of the external table. The
ORACLE_DATAPUMPaccess driver can write dump files only as part of creating an external table with the SQL
CREATE TABLE AS SELECTstatement. Once the dump file is created, it can be read any number of times, but it cannot be modified (that is, no DML operations can be performed).
ORACLE_HDFS— extracts data stored in a Hadoop Distributed File System (HDFS).
ORACLE_HIVE— extracts data stored in Apache HIVE.
DIRECTORY— specifies the default directory to use for all input and output files that do not explicitly name a directory object. The location is specified with a directory object, not a directory path. You must create the directory object before you create the external table; otherwise, an error is generated. See Location of Data Files and Output Files for more information.
PARAMETERS— describe the external data source and implement the type of external table that was specified. Each type of external table has its own access driver that provides access parameters unique to that type of external table. Access parameters are optional. See Access Parameters.
LOCATION— specifies the data files for the external table.
ORACLE_DATAPUMP, the files are named in the form
directoryportion is optional. If it is missing, then the default directory is used as the directory for the file. If you are using the
ORACLE_LOADERaccess driver, then you can use wildcards in the file name: an asterisk (*) signifies multiple characters, a question mark (?) signifies a single character.
LOCATIONclause is a list of Uniform Resource Identifiers (URIs) for a directory or for a file. There is no directory object associated with a URI.
LOCATIONclause is not used. Instead, the Hadoop HCatalog table is read to obtain information about the location of the data source (which could be a file or another database).
The following examples briefly show the use of attributes for each of the access drivers.
Example 14-1 Specifying Attributes for the ORACLE_LOADER Access Driver
The following example uses the
ORACLE_LOADER access driver to show the use of each of these attributes (it assumes that the default directory
def_dir1 already exists):
SQL> CREATE TABLE emp_load 2 (employee_number CHAR(5), 3 employee_dob CHAR(20), 4 employee_last_name CHAR(20), 5 employee_first_name CHAR(15), 6 employee_middle_name CHAR(15), 7 employee_hire_date DATE) 8 ORGANIZATION EXTERNAL 9 (TYPE ORACLE_LOADER 10 DEFAULT DIRECTORY def_dir1 11 ACCESS PARAMETERS 12 (RECORDS DELIMITED BY NEWLINE 13 FIELDS (employee_number CHAR(2), 14 employee_dob CHAR(20), 15 employee_last_name CHAR(18), 16 employee_first_name CHAR(11), 17 employee_middle_name CHAR(11), 18 employee_hire_date CHAR(10) date_format DATE mask "mm/dd/yyyy" 19 ) 20 ) 21 LOCATION ('info.dat') 22 ); Table created.
The information you provide through the access driver ensures that data from the data source is processed so that it matches the definition of the external table. The fields listed after
emp_load are actually defining the metadata for the data in the
dat source file.
Example 14-2 Specifying Attributes for the ORACLE_DATAPUMP Access Driver
This example creates an external table named
inventories_xt and populates the dump file for the external table with the data from table
inventories in the
oe sample schema.
SQL> CREATE TABLE inventories_xt 2 ORGANIZATION EXTERNAL 3 ( 4 TYPE ORACLE_DATAPUMP 5 DEFAULT DIRECTORY def_dir1 6 LOCATION ('inv_xt.dmp') 7 ) 8 AS SELECT * FROM inventories; Table created.
Example 14-3 Specifying Attributes for the ORACLE_HDFS Access Driver
CREATE TABLE sales_external ( time_id DATE NOT NULL, … amount_sold NUMBER(10,2) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_HDFS ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1) LOCATION (“hdfs:/usr/sales_1.csv”, “hdfs:/usr/my_sales_*.csv”) )
Example 14-4 Specifying Attributes for the ORACLE_HIVE Access Driver
CREATE TABLE sales_external ( time_id DATE NOT NULL, … amount_sold NUMBER(10,2) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1 com.oracle.bigdata.tablename=default.ratings_hive_table) );
14.2 Location of Data Files and Output Files
Data files and output files must be located on the server. You must have a directory object that specifies the location from which to read and write files.
Note:The information in this section about directory objects does not apply to data files for the
ORACLE_HDFSaccess driver or
ORACLE_HIVEaccess driver. With the
ORACLE_HDFSdriver, the location of data is specified with a list of URIs for a directory or for a file, and there is no directory object associated with a URI. The
ORACLE_HIVEdriver does not specify a data source location; it reads the Hive metastore table to get that information, so no directory object is needed.
The access driver runs inside the database server. This behavior is different from SQL*Loader, which is a client program that sends the data to be loaded over to the server. This difference has the following implications:
The server requires access to files that the access driver can load.
The server must create and write the output files created by the access driver: the log file, bad file, discard file, and also any dump files created by the
To specify the location from which to read and write files, the access driver requires that you use a directory object. A directory object maps a name to a directory name on the file system. For example, the following statement creates a directory object named
ext_tab_dir that is mapped to a directory located at
CREATE DIRECTORY ext_tab_dir AS '/usr/apps/datafiles';
DBAs or any user can create directory objects with the
To use external tables in an Oracle Real Applications Cluster (Oracle RAC) configuration, you must ensure that the directory object path is on a cluster-wide file system.
After a directory is created, the user creating the directory object must grant
WRITE privileges on the directory to other users. These privileges must be explicitly granted, rather than assigned by using roles. For example, to allow the server to read files on behalf of user
scott in the directory named by
ext_tab_dir, the user who created the directory object must execute the following command:
GRANT READ ON DIRECTORY ext_tab_dir TO scott;
SYS user is the only user that can own directory objects, but the
SYS user can grant other users the privilege to create directory objects.
WRITE permission to a directory object means only that Oracle Database reads or writes that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.
14.3 Access Parameters for External Tables
To modify the default behavior of the access driver for external tables, specify access parameters.
When you create an external table of a particular type, you can specify access parameters to modify the default behavior of the access driver. Each access driver has its own syntax for access parameters. Oracle provides the following access drivers for use with external tables:
These access parameters are collectively referred to as the
opaque_format_spec in the SQL
TABLE...ORGANIZATION EXTERNAL statement. The
parameter clause allows SQL comments.
14.4 Data Type Conversion During External Table Use
If source and target data types do not match, then conversion errors can occur when Oracle Database reads from external tables, and when it writes to external tables.
Conversion Errors When Reading External Tables
When you select rows from an external table, the access driver performs any transformations necessary to make the data from the data source match the data type of the corresponding column in the external table. Depending on the data and the types of transformations required, the transformation can encounter errors.
To illustrate the types of data conversion problems that can occur when reading from
an external table, suppose you create the following external table,
KV_TAB_XT, with two columns:
KEY, whose data
VAL, whose data type is
SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(4), VAL NUMBER) 2 ORGANIZATION EXTERNAL 3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’));
The external table
KV_TAB_XT uses default values for the access
parameters. The following is therefore true:
- Records are delimited by new lines.
- The data file and the database have the same character set.
- The fields in the data file have the same name and are in the same order as the columns in the external table.
- The data type of the field is
- Data for each field is terminated by a comma.
The records in the data file for the
KV_TAB_XT external table
should have the following:
- A string, up to 4 bytes long. If the string is empty, then the value for the field
- A terminating comma.
- A string of numeric characters. If the string is empty, then the value for this
- An optional terminating comma.
When the access driver reads a record from the data file, it verifies that the
length of the value of the
KEY field in the data file is less than
or equal to 4, and it attempts to convert the value of the
field in the data file to an Oracle Database number.
If the length of the value of the
KEY field is greater than 4, or
if there is a non-numeric character in the value for
VAL, then the
ORACLE_LOADER access driver rejects the row. The result is that
a copy of the row is written to the bad file, and an error message is written to the
All access drivers must handle conversion from the data type of fields in the source for the external table and the data type for the columns of the external tables. The following are some examples of the types of conversions and checks that access drivers perform:
- Convert character data from character set used by the source data to the character set used by the database.
- Convert from character data to numeric data.
- Convert from numeric data to character data.
- Convert from character data to a date or timestamp.
- Convert from a date or timestamp to character data.
- Convert from character data to an interval data type.
- Convert from an interval data type to a character data.
- Verify that the length of data value for a character column does not exceed the length limits of that column.
When the access driver encounters an error doing the required conversion or
verification, it can decide how to handle the error. When the
drivers encounter errors, they reject the record, and write an error message to the
log file. In that event it is as if that record were not in the data source. When
ORACLE_HIVE access drivers
encounter errors, the value of the field in which the error is encountered is set to
NULL. This action is consistent with the behavior of how Hive
handles errors in Hadoop.
Even after the access driver has converted the data from the data source to match the
data type of the external table columns, the SQL statement that is accessing the
external table could require additional data type conversions. If any of these
additional conversions encounter an error, then the entire statement fails. (The
exception to this is if you use the DML error logging feature in the SQL statement
to handle these errors.) These conversions are the same as any that typically can be
required when running a SQL statement. For example, suppose you change the
definition of the
KV_TAB_XT external table to only have columns
with character data types, and then you run an
INSERT statement to
load data from the external table into another table that has a
NUMBER data type for column
SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(20), VAL VARCHAR2(20)) 2 ORGANIZATION EXTERNAL 3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’)); 4 CREATE TABLE KV_TAB (KEY VARCHAR2(4), VAL NUMBER); 5 INSERT INTO KV_TAB SELECT * FROM KV_TAB_XT;
In this example, the access driver will not reject a record if the data for
VAL contains a non-numeric character, because the data type of
VAL in the external table is now
NUMBER). However, SQL processing now must handle the
conversion from character data type in
KV_TAB_XT to number data
KV_TAB. If there is a non-numeric character in the value
VAL in the external table, then SQL raises a conversion error,
and rolls back any rows that were inserted. To avoid conversion errors in SQL Oracle
recommends that you make the data types of the columns in the external table match
the data types expected by other tables or functions that will be using the values
of those columns.
Conversion Errors When Writing to External Tables
ORACLE_DATAPUMP access driver allows you to use a
TABLE AS SELECT statement to unload data into an external table. Data
conversion occurs if the data type of a column in the
expression does not match the data type of the column in the external table. If SQL
encounters an error while converting the data type, then SQL stops the statement,
and the data file will not be readable.
To avoid problems with conversion errors that cause the operation to fail, the data
type of the column in the external table should match the data type of the column in
the source table or expression used to write to the external table. This is not
always possible, because external tables do not support all data types. In these
cases, the unsupported data types in the source table must be converted into a data
type that the external table can support. The following
TABLE statement shows an example of this conversion:
CREATE TABLE LONG_TAB_XT (LONG_COL CLOB) ORGANIZATION EXTERNAL...SELECT TO_LOB(LONG_COL) FROM LONG_TAB;
The source table named
LONG_TAB has a
Because of that, the corresponding column in the external table being created,
LONG_TAB_XT, must be a
CLOB, and the
SELECT subquery that is used to populate the external table
must use the
TO_LOB operator to load the column.
All forms of
LONG data types (
VARRAW) were deprecated in Oracle8i Release 8.1.6. For succeeding
LONG data type was provided for backward
compatibility with existing applications. In new applications developed with
later releases, Oracle strongly recommends that you use
NCLOB data types for large amounts of character data.