14 External Tables Concepts

The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.

Note that SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. See "Behavior Differences Between SQL*Loader and External Tables" for more information about how load behavior differs between SQL*Loader and external tables.

As of Oracle Database 12c Release 2 (12.2.0.1), you can partition data contained in external tables, which allows you to take advantage of the same performance improvements provided when you partition tables stored in a database (for example, partition pruning).
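
For example, each partition of a partitioned external table can map to its own data file, so that queries that filter on the partitioning column read only the matching files. The following is a minimal sketch (it assumes that the directory object def_dir1 exists; the table and file names are hypothetical):

CREATE TABLE sales_xt
  (loc_id       NUMBER,
   prod_id      NUMBER,
   amount_sold  NUMBER(10,2))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_LOADER
   DEFAULT DIRECTORY def_dir1
   ACCESS PARAMETERS (FIELDS TERMINATED BY ','))
REJECT LIMIT UNLIMITED
PARTITION BY RANGE (loc_id)
  (PARTITION p1 VALUES LESS THAN (1000) LOCATION ('sales_1.csv'),
   PARTITION p2 VALUES LESS THAN (2000) LOCATION ('sales_2.csv'));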

See Also:

Oracle Database Administrator's Guide for additional information about creating and managing external tables, and about partitioning them.

14.1 How Are External Tables Created?

External tables are created using the SQL CREATE TABLE...ORGANIZATION EXTERNAL statement. When you create an external table, you specify the following attributes:

  • TYPE — specifies the type of external table. Each type of external table is supported by its own access driver.

    • ORACLE_LOADER — this is the default access driver. It loads data from external tables to internal tables. The data must come from text data files. (The ORACLE_LOADER access driver cannot perform unloads; that is, it cannot move data from an internal table to an external table.)

    • ORACLE_DATAPUMP — this access driver can perform both loads and unloads. The data must come from binary dump files. Loads to internal tables from external tables are done by fetching from the binary dump files. Unloads from internal tables to external tables are done by populating the binary dump files of the external table. The ORACLE_DATAPUMP access driver can write dump files only as part of creating an external table with the SQL CREATE TABLE AS SELECT statement. Once the dump file is created, it can be read any number of times, but it cannot be modified (that is, no DML operations can be performed).

    • ORACLE_HDFS — extracts data stored in a Hadoop Distributed File System (HDFS).

    • ORACLE_HIVE — extracts data stored in Apache Hive.

  • DEFAULT DIRECTORY — specifies the default directory to use for all input and output files that do not explicitly name a directory object. The location is specified with a directory object, not a directory path. You must create the directory object before you create the external table; otherwise, an error is generated. See "Location of Data Files and Output Files" for more information.

  • ACCESS PARAMETERS — describe the external data source and implement the type of external table that was specified. Each type of external table has its own access driver that provides access parameters unique to that type of external table. Access parameters are optional. See "Access Parameters".

  • LOCATION — specifies the data files for the external table.

    • For ORACLE_LOADER and ORACLE_DATAPUMP, the files are named in the form directory:file. The directory portion is optional. If it is missing, then the default directory is used as the directory for the file. If you are using the ORACLE_LOADER access driver, then you can use wildcards in the file name: an asterisk (*) signifies multiple characters, and a question mark (?) signifies a single character, as shown in the sketch after this list.

    • For ORACLE_HDFS, the LOCATION clause is a list of Uniform Resource Identifiers (URIs) for a directory or for a file. There is no directory object associated with a URI.

    • For ORACLE_HIVE, the LOCATION clause is not used. Instead, the Hadoop HCatalog table is read to obtain information about the location of the data source (which could be a file or another database).
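
For instance, with the ORACLE_LOADER access driver, a LOCATION clause such as the following sketch (the file names are hypothetical) reads every file that matches sales_*.dat, along with any file named rev_ followed by a single character and .dat:

LOCATION ('sales_*.dat', 'rev_?.dat')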

The following examples briefly show the use of attributes for each of the access drivers.

Example 14-1 Specifying Attributes for the ORACLE_LOADER Access Driver

The following example uses the ORACLE_LOADER access driver to show the use of each of these attributes (it assumes that the default directory def_dir1 already exists):

SQL> CREATE TABLE emp_load
  2    (employee_number      CHAR(5),
  3     employee_dob         CHAR(20),
  4     employee_last_name   CHAR(20),
  5     employee_first_name  CHAR(15),
  6     employee_middle_name CHAR(15),
  7     employee_hire_date   DATE)
  8  ORGANIZATION EXTERNAL
  9    (TYPE ORACLE_LOADER
 10     DEFAULT DIRECTORY def_dir1
 11     ACCESS PARAMETERS
 12       (RECORDS DELIMITED BY NEWLINE
 13        FIELDS (employee_number      CHAR(2),
 14                employee_dob         CHAR(20),
 15                employee_last_name   CHAR(18),
 16                employee_first_name  CHAR(11),
 17                employee_middle_name CHAR(11),
 18                employee_hire_date   CHAR(10) date_format DATE mask "mm/dd/yyyy"
 19               )
 20       )
 21     LOCATION ('info.dat')
 22    );
 
Table created.

The information you provide through the access driver ensures that data from the data source is processed so that it matches the definition of the external table. The fields listed after CREATE TABLE emp_load are actually defining the metadata for the data in the info.dat source file.
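
Once created, the external table can be queried with ordinary SQL; for example, the following statement reads and converts the contents of info.dat each time it runs:

SELECT * FROM emp_load;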

Example 14-2 Specifying Attributes for the ORACLE_DATAPUMP Access Driver

This example creates an external table named inventories_xt and populates the dump file for the external table with the data from table inventories in the oe sample schema.

SQL> CREATE TABLE inventories_xt
  2  ORGANIZATION EXTERNAL
  3    (
  4     TYPE ORACLE_DATAPUMP
  5     DEFAULT DIRECTORY def_dir1
  6     LOCATION ('inv_xt.dmp')
  7    )
  8  AS SELECT * FROM inventories;

Table created.
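
Because the dump file written by ORACLE_DATAPUMP can be read any number of times, another external table can later be defined over the same file. The following is a sketch (it assumes that the column list matches the columns of oe.inventories):

CREATE TABLE inventories_xt2
  (product_id        NUMBER,
   warehouse_id      NUMBER,
   quantity_on_hand  NUMBER)
ORGANIZATION EXTERNAL
  (TYPE ORACLE_DATAPUMP
   DEFAULT DIRECTORY def_dir1
   LOCATION ('inv_xt.dmp'));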

Example 14-3 Specifying Attributes for the ORACLE_HDFS Access Driver

CREATE TABLE sales_external
(   time_id           DATE NOT NULL, …
    amount_sold       NUMBER(10,2)
)
ORGANIZATION EXTERNAL 
(  TYPE ORACLE_HDFS
   ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1)
   LOCATION ('hdfs:/usr/sales_1.csv', 'hdfs:/usr/my_sales_*.csv')
)

Example 14-4 Specifying Attributes for the ORACLE_HIVE Access Driver

CREATE TABLE sales_external
(   time_id           DATE NOT NULL, …
    amount_sold       NUMBER(10,2)
)
ORGANIZATION EXTERNAL 
(   TYPE ORACLE_HIVE
    ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1        
                       com.oracle.bigdata.tablename=default.ratings_hive_table)
);

14.1.1 Location of Data Files and Output Files

Note:

The information in this section about directory objects does not apply to data files for the ORACLE_HDFS access driver or ORACLE_HIVE access driver. With the ORACLE_HDFS driver, the location of data is specified with a list of URIs for a directory or for a file, and there is no directory object associated with a URI. The ORACLE_HIVE driver does not specify a data source location; it reads the Hive metastore table to get that information, so no directory object is needed.

The access driver runs inside the database server. This is different from SQL*Loader, which is a client program that sends the data to be loaded over to the server. This difference has the following implications:

  • The server must have access to any files to be loaded by the access driver.

  • The server must create and write the output files created by the access driver: the log file, bad file, discard file, and also any dump files created by the ORACLE_DATAPUMP access driver.

The access driver requires that a directory object be used to specify the location from which to read and write files. A directory object maps a name to a directory name on the file system. For example, the following statement creates a directory object named ext_tab_dir that is mapped to a directory located at /usr/apps/datafiles.

CREATE DIRECTORY ext_tab_dir AS '/usr/apps/datafiles';

Directory objects can be created by DBAs or by any user with the CREATE ANY DIRECTORY privilege.

Note:

To use external tables in an Oracle Real Application Clusters (Oracle RAC) configuration, you must ensure that the directory object path is on a cluster-wide file system.

After a directory is created, the user creating the directory object needs to grant READ and WRITE privileges on the directory to other users. These privileges must be explicitly granted, rather than assigned through the use of roles. For example, to allow the server to read files on behalf of user scott in the directory named by ext_tab_dir, the user who created the directory object must execute the following command:

GRANT READ ON DIRECTORY ext_tab_dir TO scott;
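
If the access driver must also write files to the directory on behalf of that user (for example, the log file, bad file, or dump files), then a corresponding WRITE grant is needed as well:

GRANT WRITE ON DIRECTORY ext_tab_dir TO scott;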

The SYS user is the only user that can own directory objects, but the SYS user can grant other users the privilege to create directory objects. Note that READ or WRITE permission to a directory object means only that the Oracle database will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.

14.1.2 Access Parameters

When you create an external table of a particular type, you can specify access parameters to modify the default behavior of the access driver. Each access driver has its own syntax for access parameters. Oracle provides the following access drivers for use with external tables: ORACLE_LOADER, ORACLE_DATAPUMP, ORACLE_HDFS, and ORACLE_HIVE.
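
For example, with the ORACLE_LOADER access driver you can override the default field handling and name the log and bad files explicitly. The following is a minimal sketch (it assumes the directory object def_dir1 and a data file emp.dat; the table and file names are hypothetical):

CREATE TABLE emp_ext (emp_id NUMBER, last_name VARCHAR2(20))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_LOADER
   DEFAULT DIRECTORY def_dir1
   ACCESS PARAMETERS
     (RECORDS DELIMITED BY NEWLINE
      BADFILE 'emp_ext.bad'
      LOGFILE 'emp_ext.log'
      FIELDS TERMINATED BY '|')
   LOCATION ('emp.dat'));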

Note:

These access parameters are collectively referred to as the opaque_format_spec in the SQL CREATE TABLE...ORGANIZATION EXTERNAL statement.

14.2 Data Type Conversion During External Table Use

Conversion errors can occur when external tables are read from and when they are written to.

Conversion Errors When Reading External Tables

When you select rows from an external table, the access driver performs any transformations necessary to make the data from the data source match the data type of the corresponding column in the external table. Depending on the data and the types of transformations required, the transformation can encounter errors.

To illustrate the types of data conversion problems that can occur when reading from an external table, suppose you create the following external table KV_TAB_XT with two columns: KEY whose data type is VARCHAR2(4) and VAL whose data type is NUMBER.

SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(4), VAL NUMBER)
  2  ORGANIZATION EXTERNAL
  3    (DEFAULT DIRECTORY DEF_DIR1 LOCATION ('key_val.csv'));

The external table KV_TAB_XT uses default values for the access parameters, which means:

  • Records are delimited by new lines.

  • The data file and the database have the same character set.

  • The fields in the data file have the same name and are in the same order as the columns in the external table.

  • The data type of the field is CHAR(255).

  • Data for each field is terminated by a comma.
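
Taken together, these defaults correspond roughly to the following explicit access parameter specification (shown only for illustration; it does not need to appear in the CREATE TABLE statement above):

ACCESS PARAMETERS
  (RECORDS DELIMITED BY NEWLINE
   FIELDS TERMINATED BY ','
     (KEY CHAR(255), VAL CHAR(255)))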

Each record in the data file for the KV_TAB_XT external table should consist of:

  • A string, up to 4 bytes long. If the string is empty, then the value for the field is NULL.

  • A terminating comma.

  • A string of numeric characters. If the string is empty, then the value for this field is NULL.

  • An optional terminating comma.

When the access driver reads a record from the data file, it verifies that the length of the value of the KEY field in the data file is less than or equal to 4, and it attempts to convert the value of the VAL field in the data file to an Oracle number.

If the length of the value of the KEY field is greater than 4, or if there is a non-numeric character in the value for VAL, then the ORACLE_LOADER access driver rejects the row, which results in a copy of the row being written to the bad file and an error message being written to the log file.
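
For example, given the following hypothetical contents of key_val.csv, the first record would be loaded, and the second and third would be rejected:

abcd,1234
toolong,99
key1,12x

The second record fails because its KEY value is longer than 4 bytes; the third fails because 12x cannot be converted to a number.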

All access drivers must handle conversion between the data type of the fields in the data source and the data type of the columns of the external table. The following are some examples of the types of conversions and checks that access drivers perform:

  • Convert character data from character set used by the source data to the character set used by the database.

  • Convert from character data to numeric data.

  • Convert from numeric data to character data.

  • Convert from character data to a date or timestamp.

  • Convert from a date or timestamp to character data.

  • Convert from character data to an interval data type.

  • Convert from an interval data type to character data.

  • Verify that the length of a data value for a character column does not exceed the length limit of that column.

When the access driver encounters an error doing the required conversion or verification, it can decide how to handle the error. When the ORACLE_LOADER and ORACLE_DATAPUMP access drivers encounter errors, they reject the record and write an error message to the log file. It is as if that record were not in the data source. When the ORACLE_HDFS and ORACLE_HIVE access drivers encounter errors, the value of the field in which the error is encountered is set to NULL, which is consistent with how Hive handles errors in Hadoop.

Even after the access driver has converted the data from the data source to match the data type of the external table columns, the SQL statement that is accessing the external table could require additional data type conversions. If any of these additional conversions encounter an error, then the entire statement fails. (The exception to this is if you use the DML error logging feature in the SQL statement to handle these errors.) These conversions are the same as any that might normally be required when executing a SQL statement. For example, suppose you change the definition of the KV_TAB_XT external table to only have columns with character data types, and then you execute an INSERT statement to load data from the external table into another table that has a NUMBER data type for column VAL:

SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(20), VAL VARCHAR2(20))
  2  ORGANIZATION EXTERNAL
  3    (DEFAULT DIRECTORY DEF_DIR1 LOCATION ('key_val.csv'));

SQL> CREATE TABLE KV_TAB (KEY VARCHAR2(4), VAL NUMBER);

SQL> INSERT INTO KV_TAB SELECT * FROM KV_TAB_XT;

In this example, the access driver will not reject a record if the data for VAL contains a non-numeric character because the data type of VAL in the external table is now VARCHAR2 (instead of NUMBER). However, SQL execution now needs to handle the conversion from character data type in KV_TAB_XT to number data type in KV_TAB. If there is a non-numeric character in the value for VAL in the external table, then SQL raises a conversion error and rolls back any rows that were inserted. To avoid conversion errors in SQL execution, try to make the data types of the columns in the external table match the data types expected by other tables or functions that will be using the values of those columns.
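
If the data types cannot be made to match, then the DML error logging feature mentioned above can capture the failing rows instead of rolling back the insert. The following is a minimal sketch (ERR$_KV_TAB is the default error log table name generated by the DBMS_ERRLOG package):

EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('KV_TAB');

INSERT INTO KV_TAB SELECT * FROM KV_TAB_XT
  LOG ERRORS INTO ERR$_KV_TAB ('load_1') REJECT LIMIT UNLIMITED;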

Conversion Errors When Writing to External Tables

The ORACLE_DATAPUMP access driver allows you to use a CREATE TABLE AS SELECT statement to unload data into an external table. Data conversion occurs if the data type of a column in the SELECT expression does not match the data type of the column in the external table. If SQL encounters an error converting the data type, then SQL aborts the statement and the data file will not be readable.

To avoid problems with conversion errors that cause the operation to fail, the data type of the column in the external table should match the data type of the column in the source table or expression used to write to the external table. This is not always possible because external tables do not support all data types. In these cases, the unsupported data types in the source table must be converted into a data type that the external table can support. The following CREATE TABLE statement shows an example of this:

CREATE TABLE LONG_TAB_XT (LONG_COL CLOB) 
ORGANIZATION EXTERNAL...SELECT TO_LOB(LONG_COL) FROM LONG_TAB;

The source table named LONG_TAB has a LONG column; therefore, the corresponding column in the external table being created, LONG_TAB_XT, must be a CLOB, and the SELECT subquery that is used to populate the external table must use the TO_LOB operator to load the column.