The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.
14.1 How Are External Tables Created?
External tables are created using the SQL
TABLE...ORGANIZATION EXTERNAL statement.
Note that SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. See Behavior Differences Between SQL*Loader and External Tables for more information about how load behavior differs between SQL*Loader and external tables.
As of Oracle Database 12c Release 2 (126.96.36.199), you can partition data contained in external tables, which allows you to take advantage of the same performance improvements provided when you partition tables stored in a database (for example, partition pruning).
Oracle Database Administrator’s Guide for additional information about creating and managing external tables, and about partitioning them.
External tables can be used as inline external tables in SQL statements, thus eliminating the need to create an external table as a persistent database object in the data dictionary. For additional information, see Oracle Database SQL Language Reference.
When you create an external table, you specify the following attributes:
TYPE— specifies the type of external table. Each type of external table is supported by its own access driver.
ORACLE_LOADER— this is the default access driver. It loads data from external tables to internal tables. The data must come from text data files. (The
ORACLE_LOADERaccess driver cannot perform unloads; that is, it cannot move data from an internal table to an external table.)
ORACLE_DATAPUMP— this access driver can perform both loads and unloads. The data must come from binary dump files. Loads to internal tables from external tables are done by fetching from the binary dump files. Unloads from internal tables to external tables are done by populating the binary dump files of the external table. The
ORACLE_DATAPUMPaccess driver can write dump files only as part of creating an external table with the SQL
CREATE TABLE AS SELECTstatement. Once the dump file is created, it can be read any number of times, but it cannot be modified (that is, no DML operations can be performed).
ORACLE_HDFS— extracts data stored in a Hadoop Distributed File System (HDFS).
ORACLE_HIVE— extracts data stored in Apache HIVE.
DIRECTORY— specifies the default directory to use for all input and output files that do not explicitly name a directory object. The location is specified with a directory object, not a directory path. You must create the directory object before you create the external table; otherwise, an error is generated. See Location of Data Files and Output Files for more information.
PARAMETERS— describe the external data source and implement the type of external table that was specified. Each type of external table has its own access driver that provides access parameters unique to that type of external table. Access parameters are optional. See Access Parameters.
LOCATION— specifies the data files for the external table.
ORACLE_DATAPUMP, the files are named in the form
directoryportion is optional. If it is missing, then the default directory is used as the directory for the file. If you are using the
ORACLE_LOADERaccess driver, then you can use wildcards in the file name: an asterisk (*) signifies multiple characters, a question mark (?) signifies a single character.
LOCATIONclause is a list of Uniform Resource Identifiers (URIs) for a directory or for a file. There is no directory object associated with a URI.
LOCATIONclause is not used. Instead, the Hadoop HCatalog table is read to obtain information about the location of the data source (which could be a file or another database).
The following examples briefly show the use of attributes for each of the access drivers.
Example 14-1 Specifying Attributes for the ORACLE_LOADER Access Driver
The following example uses the
ORACLE_LOADER access driver to show the use of each of these attributes (it assumes that the default directory
def_dir1 already exists):
SQL> CREATE TABLE emp_load 2 (employee_number CHAR(5), 3 employee_dob CHAR(20), 4 employee_last_name CHAR(20), 5 employee_first_name CHAR(15), 6 employee_middle_name CHAR(15), 7 employee_hire_date DATE) 8 ORGANIZATION EXTERNAL 9 (TYPE ORACLE_LOADER 10 DEFAULT DIRECTORY def_dir1 11 ACCESS PARAMETERS 12 (RECORDS DELIMITED BY NEWLINE 13 FIELDS (employee_number CHAR(2), 14 employee_dob CHAR(20), 15 employee_last_name CHAR(18), 16 employee_first_name CHAR(11), 17 employee_middle_name CHAR(11), 18 employee_hire_date CHAR(10) date_format DATE mask "mm/dd/yyyy" 19 ) 20 ) 21 LOCATION ('info.dat') 22 ); Table created.
The information you provide through the access driver ensures that data from the data source is processed so that it matches the definition of the external table. The fields listed after
emp_load are actually defining the metadata for the data in the
dat source file.
Example 14-2 Specifying Attributes for the ORACLE_DATAPUMP Access Driver
This example creates an external table named
inventories_xt and populates the dump file for the external table with the data from table
inventories in the
oe sample schema.
SQL> CREATE TABLE inventories_xt 2 ORGANIZATION EXTERNAL 3 ( 4 TYPE ORACLE_DATAPUMP 5 DEFAULT DIRECTORY def_dir1 6 LOCATION ('inv_xt.dmp') 7 ) 8 AS SELECT * FROM inventories; Table created.
Example 14-3 Specifying Attributes for the ORACLE_HDFS Access Driver
CREATE TABLE sales_external ( time_id DATE NOT NULL, … amount_sold NUMBER(10,2) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_HDFS ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1) LOCATION (“hdfs:/usr/sales_1.csv”, “hdfs:/usr/my_sales_*.csv”) )
Example 14-4 Specifying Attributes for the ORACLE_HIVE Access Driver
CREATE TABLE sales_external ( time_id DATE NOT NULL, … amount_sold NUMBER(10,2) ) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1 com.oracle.bigdata.tablename=default.ratings_hive_table) );
14.2 Location of Data Files and Output Files
Data files and output files must be located on the server. You must have a directory object that specifies the location from which to read and write files.
Note:The information in this section about directory objects does not apply to data files for the
ORACLE_HDFSaccess driver or
ORACLE_HIVEaccess driver. With the
ORACLE_HDFSdriver, the location of data is specified with a list of URIs for a directory or for a file, and there is no directory object associated with a URI. The
ORACLE_HIVEdriver does not specify a data source location; it reads the Hive metastore table to get that information, so no directory object is needed.
The access driver runs inside the database server. This behavior is different from SQL*Loader, which is a client program that sends the data to be loaded over to the server. This difference has the following implications:
The server requires access to files that the access driver can load.
The server must create and write the output files created by the access driver: the log file, bad file, discard file, and also any dump files created by the
To specify the location from which to read and write files, the access driver requires that you use a directory object. A directory object maps a name to a directory name on the file system. For example, the following statement creates a directory object named
ext_tab_dir that is mapped to a directory located at
CREATE DIRECTORY ext_tab_dir AS '/usr/apps/datafiles';
DBAs or any user can create directory objects with the
To use external tables in an Oracle Real Applications Cluster (Oracle RAC) configuration, you must ensure that the directory object path is on a cluster-wide file system.
After a directory is created, the user creating the directory object must grant
WRITE privileges on the directory to other users. These privileges must be explicitly granted, rather than assigned by using roles. For example, to allow the server to read files on behalf of user
scott in the directory named by
ext_tab_dir, the user who created the directory object must execute the following command:
GRANT READ ON DIRECTORY ext_tab_dir TO scott;
SYS user is the only user that can own directory objects, but the
SYS user can grant other users the privilege to create directory objects.
WRITE permission to a directory object means only that Oracle Database reads or writes that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.
14.3 Access Parameters for External Tables
To modify the default behavior of the access driver for external tables, specify access parameters.
When you create an external table of a particular type, you can specify access parameters to modify the default behavior of the access driver. Each access driver has its own syntax for access parameters. Oracle provides the following access drivers for use with external tables:
These access parameters are collectively referred to as the
opaque_format_spec in the SQL
TABLE...ORGANIZATION EXTERNAL statement. The
parameter clause allows SQL comments.
14.4 Data Type Conversion During External Table Use
Conversion errors can occur when external tables are read from and when they are written to.
Conversion Errors When Reading External Tables
When you select rows from an external table, the access driver performs any transformations necessary to make the data from the data source match the data type of the corresponding column in the external table. Depending on the data and the types of transformations required, the transformation can encounter errors.
To illustrate the types of data conversion problems that can occur when reading from an external table, suppose you create the following external table
KV_TAB_XT with two columns:
KEY whose data type is
VAL whose data type is
SQL> CREATE TABLE KV_TAB_XT (KEY, VARCHAR2(4), VAL NUMBER) 2 ORGANIZATION EXTERNAL 3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’));
The external table
KV_TAB_XT uses default values for the access parameters, which means:
Records are delimited by new lines.
The data file and the database have the same character set.
The fields in the data file have the same name and are in the same order as the columns in the external table.
The data type of the field is
Data for each field is terminated by a comma.
The records in the data file for the
KV_TAB_XT external table should be:
A string, up to 4 bytes long. If the string is empty, then the value for the field is
A terminating comma.
A string of numeric characters. If the string is empty, then the value for this field is
An optional terminating comma.
When the access driver reads a record from the data file, it verifies that the length of the value of the
KEY field in the data file is less than or equal to 4, and it attempts to convert the value of the
VAL field in the data file to an Oracle number.
If the length of the value of the
KEY field is greater than 4 or if there is a non-numeric character in the value for
VAL, then the
ORACLE_LOADER access driver rejects the row which results in a copy of the row being written to the bad file and an error message being written to the log file.
All access drivers have to handle conversion from the data type of fields in the source for the external table and the data type for the columns of the external tables. The following are some examples of the types of conversions and checks that access drivers perform:
Convert character data from character set used by the source data to the character set used by the database.
Convert from character data to numeric data.
Convert from numeric data to character data.
Convert from character data to a date or timestamp.
Convert from a date or timestamp to character data.
Convert from character data to an interval data type.
Convert from an interval data type to a character data.
Verify that the length of data value for a character column does not exceed the length limits of that column.
When the access driver encounters an error doing the required conversion or verification, it can decide how to handle the error. When the
ORACLE_DATAPUMP access drivers encounter errors, they reject the record and write an error message to the log file. It is as if that record were not in the data source. When the
ORACLE_HIVE access drivers encounter errors, the value of the field in which the error is encountered is set to NULL. This is in keeping with the behavior of how Hive handles errors in Hadoop.
Even after the access driver has converted the data from the data source to match the data type of the external table columns, the SQL statement that is accessing the external table could require additional data type conversions. If any of these additional conversions encounter an error, then the entire statement fails. (The exception to this is if you use the DML error logging feature in the SQL statement to handle these errors.) These conversions are the same as any that might normally be required when executing a SQL statement. For example, suppose you change the definition of the
KV_TAB_XT external table to only have columns with character data types, and then you execute an
INSERT statement to load data from the external table into another table that has a
NUMBER data type for column
SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(20), VAL VARCHAR2(20)) 2 ORGANIZATION EXTERNAL 3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’)); 4 CREATE TABLE KV_TAB (KEY VARCHAR2(4), VAL NUMBER); 5 INSERT INTO KV_TAB SELECT * FROM KV_TAB_XT;
In this example, the access driver will not reject a record if the data for
VAL contains a non-numeric character because the data type of
VAL in the external table is now
VARCHAR2 (instead of
NUMBER). However, SQL execution now needs to handle the conversion from character data type in
KV_TAB_XT to number data type in
KV_TAB. If there is a non-numeric character in the value for
VAL in the external table, then SQL raises a conversion error and rolls back any rows that were inserted. To avoid conversion errors in SQL execution, try to make the data types of the columns in the external table match the data types expected by other tables or functions that will be using the values of those columns.
Conversion Errors When Writing to External Tables
ORACLE_DATAPUMP access driver allows you to use a
CREATE TABLE AS SELECT statement to unload data into an external table. Data conversion occurs if the data type of a column in the
SELECT expression does not match the data type of the column in the external table. If SQL encounters an error converting the data type, then SQL aborts the statement and the data file will not be readable.
To avoid problems with conversion errors that cause the operation to fail, the data type of the column in the external table should match the data type of the column in the source table or expression used to write to the external table. This is not always possible because external tables do not support all data types. In these cases, the unsupported data types in the source table must be converted into a data type that the external table can support. The following
CREATE TABLE statement shows an example of this:
CREATE TABLE LONG_TAB_XT (LONG_COL CLOB) ORGANIZATION EXTERNAL...SELECT TO_LOB(LONG_COL) FROM LONG_TAB;
The source table named
LONG_TAB has a
LONG column, therefore the corresponding column in the external table being created,
LONG_TAB_XT, must be a
CLOB and the
SELECT subquery that is used to populate the external table must use the
TO_LOB operator to load the column.