4 Querying External Data with Autonomous Transaction Processing

This section describes the packages and tools you can use to query external data with Autonomous Transaction Processing. Although queries on external data are not as fast as queries on database tables, you can use this approach to validate your source files and quickly start running queries on them.

Note:

If you are not using the ADMIN user, ensure that the user has the necessary privileges for the operations the user needs to perform. See Manage User Privileges with Autonomous Transaction Processing for more information.
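
For example, the ADMIN user might grant a hypothetical user ATPC_USER the minimum privileges used in this section. This is a sketch; the exact grants depend on the operations the user performs:

-- Run as the ADMIN user. ATPC_USER is a hypothetical example user.
GRANT CREATE TABLE TO atpc_user;          -- required to create external tables
GRANT EXECUTE ON DBMS_CLOUD TO atpc_user; -- required to call DBMS_CLOUD procedures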

Query External Data

To query data in files in the Cloud, you need to first store your object storage credentials in your Autonomous Transaction Processing database, and then create an external table using the PL/SQL procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

The source file in this example, channels.txt, has the following data:

S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others
  1. Store your object store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

    For example:

    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'atpc_user@example.com',
        password => 'password'
      );
    END;
    /
    

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential. Note that this step is required only once unless your object store credentials change. After you store the credentials, you can use the same credential name for creating external tables.

    See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.

  2. Create an external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

    For example:

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'CHANNELS_EXT',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/atpc/b/atpc_user/o/channels.txt',
        format => json_object('delimiter' value ','),
        column_list => 'CHANNEL_ID VARCHAR2(2),
        CHANNEL_DESC VARCHAR2(20),
        CHANNEL_CLASS VARCHAR2(20)'
     );
    END;
    /
    

    The parameters are:

    • table_name: the name of the external table.

    • credential_name: the name of the credential created in the previous step.

    • file_uri_list: a comma-delimited list of the source files you want to query.

    • format: the options you can specify to describe the format of the source file.

    • column_list: a comma-delimited list of the column definitions in the source files.

    You can now run queries on the external table you created in the previous step. For example:

    SELECT count(*) FROM channels_ext;

    If there are any rows in the source files that do not match the format options you specified, the query reports an error. You can use DBMS_CLOUD format options, such as rejectlimit, to suppress these errors, as shown in the sketch below. As an alternative, you can validate the external table you created to see the error messages and the rejected rows, and then change your format options accordingly (see Validate External Data).
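
    For example, the following sketch recreates the external table with the rejectlimit format option so that up to 10 invalid rows are ignored before an error is reported. The limit of 10 is illustrative, and the table must be dropped first if it already exists:

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'CHANNELS_EXT',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/atpc/b/atpc_user/o/channels.txt',
        format => json_object('delimiter' value ',', 'rejectlimit' value '10'),
        column_list => 'CHANNEL_ID VARCHAR2(2),
        CHANNEL_DESC VARCHAR2(20),
        CHANNEL_CLASS VARCHAR2(20)'
     );
    END;
    /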

For detailed information about the parameters, see CREATE_EXTERNAL_TABLE Procedure.

Query External Data with Parquet or Avro Source Files

Autonomous Transaction Processing makes it easy to access Parquet or Avro data stored in object store using external tables. Parquet and Avro sources have metadata embedded in them and the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure can utilize this metadata to simplify the creation of external tables.

You do not need to know the structure of the data; DBMS_CLOUD can examine the file and convert either Parquet or Avro types to Oracle data types. You only need to know the location of the data in object store, specify its type (Parquet or Avro), and have credentials to access the source file on your object store.

Note:

The steps to use external tables are very similar for Parquet and Avro. The following steps show how to work with a Parquet format source file.

The source file in this example, sales_extended.parquet, contains Parquet format data. To query this file in Autonomous Transaction Processing, do the following:

  1. Store the credentials to access your object store using the procedure DBMS_CLOUD.CREATE_CREDENTIAL:
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'atpc_user@example.com',
        password => 'password'
      );
    END;
    /
    

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential. Note that this step is required only once unless your object store credentials change. After you store the credentials, you can use the same credential name for creating external tables.

    See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.

  2. Create an external table for Parquet or Avro on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE. By default, the columns created in the external table automatically map their data types to Oracle data types for the fields found in the source file(s), and the external table column names match the source field names:
    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'sales_extended_ext',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/atpc/b/atpc_user/o/sales_extended.parquet',
        format =>  '{"type":"parquet", "schema": "first"}'
     );
    END;
    /     
    

    The parameters are:

    • table_name: the name of the external table.

    • credential_name: the name of the credential created in the previous step.

    • file_uri_list: a comma-delimited list of the source files you want to query.

    • format: the options that describe the format of the source file. For a Parquet file, use the format parameter to specify the type parquet. For an Avro file, use the format parameter to specify the type avro.

    By default, the format schema parameter is set, the columns and their data types are derived automatically, and the fields in the source match the external table columns by name. Source data types are converted to the Oracle data types of the external table columns according to the DBMS_CLOUD mapping for Parquet or Avro data types. The valid schema parameter values are:

    • first: Analyze the schema of the first Parquet or Avro file that DBMS_CLOUD finds in the specified file_uri_list (first is the default value for schema).

    • all: Analyze all the schemas for all the Parquet or Avro files found in the file_uri_list. Because these are simply files captured in an object store, there is no guarantee that each file’s metadata is the same. For example, File1 may contain a field called “address”, while File2 may be missing that field. Examining each file to derive the columns is a bit more expensive but may be required if the first file does not contain all the required fields.

    Note:

    If the column_list parameter is specified, then you provide the column names and data types for the external table, and the schema value, if specified, is ignored. Using column_list you can limit the columns in the external table. If column_list is not specified, the schema default value is first.
  3. You can now run queries on the external table you created in the previous step:
    DESC sales_extended_ext;
    Name           Null? Type           
    -------------- ----- -------------- 
    PROD_ID              NUMBER(10)     
    CUST_ID              NUMBER(10)     
    TIME_ID              VARCHAR2(4000) 
    CHANNEL_ID           NUMBER(10)     
    PROMO_ID             NUMBER(10)     
    QUANTITY_SOLD        NUMBER(10)     
    AMOUNT_SOLD          NUMBER(10,2)   
    GENDER               VARCHAR2(4000) 
    CITY                 VARCHAR2(4000) 
    STATE_PROVINCE       VARCHAR2(4000) 
    INCOME_LEVEL         VARCHAR2(4000)
    
    SELECT prod_id, quantity_sold, gender, city, income_level 
         FROM sales_extended_ext 
         WHERE ROWNUM < 8;
    
         PROD_ID     QUANTITY_SOLD     GENDER    CITY              INCOME_LEVEL
    1    13          1                 M         Adelaide          K: 250,000 – 299,999
    2    13          1                 M         Dolores           L: 300,000 and above
    3    13          1                 M         Cayuga            F: 110,000 – 129,999
    4    13          1                 F         Bergen op Zoom    C: 50,000 – 69,999
    5    13          1                 F         Neuss             J: 190,000 – 249,999
    6    13          1                 F         Darwin            F: 110,000 – 129,999
    7    13          1                 M         Sabadell          K: 250,000 – 299,999
    

    This query shows values for rows in the external table. If you want to query this data frequently, after examining the data you can load it into a database table with DBMS_CLOUD.COPY_DATA, as shown in the sketch below.
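
    For example, a minimal sketch of loading the data with DBMS_CLOUD.COPY_DATA. SALES_EXTENDED is a hypothetical target table that you must create beforehand with columns matching the source fields:

    BEGIN
       DBMS_CLOUD.COPY_DATA(
        table_name =>'SALES_EXTENDED',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/atpc/b/atpc_user/o/sales_extended.parquet',
        format =>  '{"type":"parquet"}'
     );
    END;
    /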

Validate External Data

To validate an external table you can use the procedure DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE.

Before validating an external table you must first create it using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE (see Query External Data for more details). You can then validate it, for example:

BEGIN
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
    table_name => 'CHANNELS_EXT');
END;
/

This procedure scans your source files and validates them using the format options specified when you created the external table.

The validate operation, by default, scans all the rows in your source files and stops when a row is rejected. If you want to validate only a subset of the rows, use the rowcount parameter. When the rowcount parameter is set the validate operation scans rows and stops either when a row is rejected or when the specified number of rows are validated without errors.

For example, the following validate operation scans 100 rows and stops when a row is rejected or when 100 rows are validated without errors:

BEGIN 
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
     table_name => 'CHANNELS_EXT',
     rowcount=>100); 
END; 
/

If you do not want the validate operation to stop when a row is rejected and you want to see all rejected rows, set the stop_on_error parameter to FALSE. In this case VALIDATE_EXTERNAL_TABLE scans all rows and reports all rejected rows.
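
For example, the following validate operation scans all the rows in the source files and reports every rejected row:

BEGIN 
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
     table_name => 'CHANNELS_EXT',
     stop_on_error=>FALSE);
END; 
/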

You can also combine the rowcount parameter with stop_on_error set to FALSE. In this case the validate operation scans rows and stops either when the specified number of rows are rejected or when the specified number of rows are validated without errors. For example, the following example scans 100 rows and stops when 100 rows are rejected or when 100 rows are validated without errors:

BEGIN 
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
     table_name => 'CHANNELS_EXT',
     rowcount=>100,
     stop_on_error=>FALSE);
END; 
/

For detailed information about DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE parameters see Packages DBMS_CLOUD and DBMS_CLOUD_ADMIN.

To see the result of a validate operation you can query the tables dba_load_operations and user_load_operations:

  • dba_load_operations: shows all validate operations.

  • user_load_operations: shows the validate operations in your schema.

SELECT table_name, owner_name, type, status, start_time, update_time, logfile_table, badfile_table
FROM user_load_operations
WHERE type = 'VALIDATE';

TABLE_NAME    OWNER_NAME  TYPE       STATUS     START_TIME     UPDATE_TIME    LOGFILE_TABLE    BADFILE_TABLE
------------- ----------  ---------- ---------- -------------- -------------  ---------------  ---------------
CHANNELS_EXT  SH          VALIDATE   COMPLETED  13-NOV-17...   13-NOV-17...   VALIDATE$21_LOG  VALIDATE$21_BAD

Using this SQL statement with a WHERE clause on the TYPE column displays all of the load operations with type VALIDATE.

The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a validate operation. For example, the following query shows the log of the above validate operation:

SELECT * FROM VALIDATE$21_LOG;

The column BADFILE_TABLE shows the name of the table you can query to look at the rows that were rejected during validation. For example, the following query shows the rejected records for the above validate operation:

SELECT * FROM VALIDATE$21_BAD;

Depending on the errors shown in the log and the rows shown in the BADFILE_TABLE table, you can correct the problem by dropping the external table using the DROP TABLE command and recreating it with the correct format options in DBMS_CLOUD.CREATE_EXTERNAL_TABLE, as in the sketch below.
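
For example, a sketch of this fix for the CHANNELS_EXT table. The corrected column list and format options shown here are illustrative; use values that match your own source files:

DROP TABLE CHANNELS_EXT;

BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'CHANNELS_EXT',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/atpc/b/atpc_user/o/channels.txt',
    format => json_object('delimiter' value ','),
    column_list => 'CHANNEL_ID VARCHAR2(2), CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)');
END;
/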

Note:

The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each validate operation and then removed automatically.