Query External Data with Autonomous Database on Dedicated Exadata Infrastructure

Describes packages and tools to query and validate data with Autonomous Database on Dedicated Exadata Infrastructure.

External data is not managed by the database; however, you can use DBMS_CLOUD procedures to query your external data. Although queries on external data will not be as fast as queries on database tables, you can use this approach to quickly start running queries on your external source files and external data.

You can use DBMS_CLOUD procedures to validate the data in the external source files for an external table so that you can identify problems and either correct the data in the external table or exclude invalid data before you use the data.

Note:

If you are not using ADMIN user, ensure the user has the necessary privileges for the operations the user needs to perform. See Manage Database User Privileges for more information.

Query External Data

To query data in files in the Cloud, you need to first store your object storage credentials in your Autonomous Database, and then create an external table using the PL/SQL procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

The source file in this example, channels.txt, has the following data:

S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others
  1. Store your Cloud Object Storage credential using the DBMS_CREDENTIAL.CREATE_CREDENTIAL procedure. For example:
    BEGIN
      DBMS_CREDENTIAL.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'adb_user@oracle.com',
        password => 'password'
      );
    END;
    /

    The values you provide for username and password depend on the Cloud Object Storage service you are using:

    • Oracle Cloud Infrastructure Object Storage: username is your Oracle Cloud Infrastructure user name and password is your Oracle Cloud Infrastructure auth token. See Working with Auth Tokens.

    • Oracle Cloud Infrastructure Object Storage Classic: username is your Oracle Cloud Infrastructure Classic user name and password is your Oracle Cloud Infrastructure Classic password.

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for all data loads.

  2. Create an external table on top of your source files using the procedure  DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

    The procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE supports external files in the supported cloud object storage services. The credential is a table level property; therefore, the external files must be on the same object store.

    For example:

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'CHANNELS_EXT',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
        format => json_object('delimiter' value ','),
        column_list => 'CHANNEL_ID NUMBER, 
    	CHANNEL_DESC VARCHAR2(20), 
    	CHANNEL_CLASS VARCHAR2(20) );
    END;
    /
    

    The parameters are:

    • table_name: is the external table name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to query.

      In this example, file_uri_list is an Oracle Cloud Infrastructure Swift URI that specifies the channels.txt file in the mybucket bucket in the us-phoenix-1 region. (idthydc0kinr is the object storage namespace in which the bucket resides.) For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you can specify to describe the format of the source file. For information about the format options you can specify, see Format Parameter.

    • column_list: is a comma delimited list of the column definitions in the source files.

    You can now run queries on the external table you created in the previous step. For example:

    SELECT count(*) FROM channels_ext;

    By default the database expects all rows in the external data file to be valid and match both the target data type definitions as well as the format definition of the file(s). If there are any rows in the source files that do not match the format options you specified, the query reports an error. You can use format parameter options like rejectlimit to suppress these errors. As an alternative, you can also validate the external table you created to see the error messages and the rejected rows so that you can change your format options accordingly. See Validate External Data for more information.

    For detailed information about the parameters, see CREATE_EXTERNAL_TABLE Procedure.

Validate External Data

To validate an external table, you use the procedure DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE.

Before validating an external table you need to create the external table using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure. Then use the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure to validate it. For example:

BEGIN
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
    table_name => 'CHANNELS_EXT' );
END;
/

This procedure scans your source files and validates them using the format options specified when you created the external table.

The validate operation, by default, scans all the rows in your source files and stops when a row is rejected. If you want to validate only a subset of the rows, use the rowcount parameter. When the rowcount parameter is set, the validate operation scans rows and stops either when a row is rejected or when the specified number of rows are validated without errors.

For example, the following validate operation scans 100 rows and stops when a row is rejected or when 100 rows are validated without errors:

BEGIN 
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
     table_name => 'CHANNELS_EXT',
     rowcount => 100 ); 
END; 
/

If you do not want the validate to stop when a row is rejected and you want to see all rejected rows, set the stop_on_error parameter to FALSE. In this case VALIDATE_EXTERNAL_TABLE scans all rows and reports all rejected rows.

If you want to validate only a subset of rows use the rowcount parameter. When rowcount is set and stop_on_error is set to FALSE, the validate operation scans rows and stops either when the specified number of rows are rejected or when the specified number of rows are validated without errors. For example, the following example scans 100 rows and stops when 100 rows are rejected or when 100 rows are validated without errors:

BEGIN 
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
     table_name => 'CHANNELS_EXT',
     rowcount => 100 
     stop_on_error => FALSE );
END; 
/

See VALIDATE_EXTERNAL_TABLE Procedure for detailed information about DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE parameters.

See View Logs for Data Validation to see the results of validate operations in the tables dba_load_operations and user_load_operations.

View Logs for Data Validation

After validating an external table, you can see the result of the validate operation by querying a load operations table:

  • dba_load_operations: shows all validate operations.

  • user_load_operations: shows the validate operations in your schema.

You can use these tables to view load validation information. For example use this SELECT statement to query user_load_operations:

SELECT table_name, owner_name, type, status, start_time, update_time, logfile_table, badfile_table
FROM user_load_operations
WHERE type = 'VALIDATE';

TABLE_NAME    OWNER_NAME  TYPE       STATUS     START_TIME     UPDATE_TIME    LOGFILE_TABLE    BADFILE_TABLE
------------- ----------  ---------- ---------- -------------- -------------  ---------------  ---------------
CHANNELS_EXT  SH          VALIDATE   COMPLETED  13-NOV-17...   13-NOV-17...   VALIDATE$21_LOG  VALIDATE$21_BAD

Using this SQL statement with the WHERE clause on the TYPE column displays all of the load operations with type VALIDATE.

The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a validate operation. For example, the following query shows the log for this validate operation:

SELECT * FROM VALIDATE$21_LOG;

The column BADFILE_TABLE shows the name of the table you can query to look at the rows where there were errors during validation. For example, the following query shows the rejected records for the above validate operation:

SELECT * FROM VALIDATE$21_BAD;

Depending on the errors shown in the log and the rows shown in the BADFILE_TABLE, you can correct the error by dropping the external table using the DROP TABLE command and recreating it by specifying the correct format options in DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

Note:

The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each validate operation and then removed automatically.