Query External Data with Autonomous AI Database on Dedicated Exadata Infrastructure
Describes packages and tools to query and validate data with Autonomous AI Database on Dedicated Exadata Infrastructure.
External data is not managed by the database; however, you can use DBMS_CLOUD procedures to query your external data. Although queries on external data are not as fast as queries on database tables, you can use this approach to quickly start running queries on your external source files.
You can use DBMS_CLOUD procedures to validate the data in the external source files for an external table so that you can identify problems and either correct the data in the external table or exclude invalid data before you use the data.
Note: If you are not using the ADMIN user, ensure that the user has the necessary privileges for the operations the user needs to perform. See Manage Database User Privileges for more information.
Query External Data
To query data in files in the Cloud, you need to first store your object storage credentials in your Autonomous AI Database, and then create an external table using the PL/SQL procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
The source file in this example, channels.txt, has the following data:
S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others
1. Store your Cloud Object Storage credential using the DBMS_CREDENTIAL.CREATE_CREDENTIAL procedure. For example:

   BEGIN
     DBMS_CREDENTIAL.CREATE_CREDENTIAL(
       credential_name => 'DEF_CRED_NAME',
       username        => 'adb_user@oracle.com',
       password        => 'password' );
   END;
   /

   The values you provide for username and password depend on the Cloud Object Storage service you are using:

   - Oracle Cloud Infrastructure Object Storage: username is your Oracle Cloud Infrastructure user name and password is your Oracle Cloud Infrastructure auth token. See Working with Auth Tokens.
   - Oracle Cloud Infrastructure Object Storage Classic: username is your Oracle Cloud Infrastructure Classic user name and password is your Oracle Cloud Infrastructure Classic password.

   This operation stores the credentials in the database in an encrypted format. You can use any name for the credential. This step is required only once, unless your object store credentials change; after you store the credentials, you can use the same credential name for all data loads.
2. Create an external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.

   The procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE supports external files in the supported cloud object storage services. The credential is a table-level property; therefore, the external files must be on the same object store.

   For example:

   BEGIN
     DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
       table_name      => 'CHANNELS_EXT',
       credential_name => 'DEF_CRED_NAME',
       file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
       format          => json_object('delimiter' value ','),
       column_list     => 'CHANNEL_ID NUMBER, CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)' );
   END;
   /

   The parameters are:

   - table_name: the external table name.
   - credential_name: the name of the credential created in the previous step.
   - file_uri_list: a comma-delimited list of the source files you want to query. In this example, file_uri_list is an Oracle Cloud Infrastructure Swift URI that specifies the channels.txt file in the mybucket bucket in the us-phoenix-1 region. (idthydc0kinr is the object storage namespace in which the bucket resides.) For information about the supported URI formats, see Cloud Object Storage URI Formats.
   - format: defines the options you can specify to describe the format of the source file. For information about the format options you can specify, see Format Parameter.
   - column_list: a comma-delimited list of the column definitions in the source files.
3. You can now run queries on the external table you created in the previous step. For example:

   SELECT count(*) FROM channels_ext;

   By default, the database expects all rows in the external data file to be valid and to match both the target data type definitions and the format definition of the file(s). If any rows in the source files do not match the format options you specified, the query reports an error. You can use format parameter options such as rejectlimit to suppress these errors. Alternatively, you can validate the external table to see the error messages and the rejected rows so that you can change your format options accordingly. See Validate External Data for more information.

   For detailed information about the parameters, see CREATE_EXTERNAL_TABLE Procedure.
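The rejectlimit format option is supplied in the same format parameter used when the table is created. A minimal sketch, reusing the table, credential, and file URI from the earlier example; the limit of 10 is an arbitrary illustration:

```sql
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'CHANNELS_EXT',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
    -- rejectlimit lets queries tolerate up to 10 bad rows before reporting an error
    format          => json_object('delimiter' value ',', 'rejectlimit' value '10'),
    column_list     => 'CHANNEL_ID NUMBER, CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)' );
END;
/
```

With this setting, queries against CHANNELS_EXT skip rejected rows up to the limit instead of failing on the first bad row.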
External Table Metadata Columns
The external table metadata helps you determine where data is coming from when you perform a query.
The external tables you create with DBMS_CLOUD.CREATE_EXTERNAL_TABLE, DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE, or DBMS_CLOUD.CREATE_HYBRID_PART_TABLE include two invisible columns file$path and file$name. These columns help identify which file a record is coming from.
- file$path: Specifies the file path text up to the beginning of the object name.
- file$name: Specifies the object name, including all the text that follows the final "/".
For example:
SELECT genre_id, name, file$name, file$path FROM ext_genre
WHERE rownum <= 2;
genre_id name file$name file$path
-------- --------- ----------- ----------------------
1 Action genre.csv https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_gold/o/genre
2 Adventure genre.csv https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_gold/o/genre
See Invisible Columns for more information on invisible columns.
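Because file$name and file$path are invisible columns, a SELECT * on the external table does not return them; they are returned only when named explicitly in the select list. A short sketch against the CHANNELS_EXT table from the earlier example:

```sql
-- file$name and file$path are omitted from SELECT * output
SELECT * FROM channels_ext;

-- Invisible columns are returned only when listed explicitly
SELECT channel_id, channel_desc, file$name, file$path
FROM   channels_ext;
```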
Validate External Data
To validate an external table, you use the procedure DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE.
Before validating an external table you need to create the external table using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure. Then use the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure to validate it. For example:
BEGIN
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
table_name => 'CHANNELS_EXT' );
END;
/
This procedure scans your source files and validates them using the format options specified when you created the external table.
The validate operation, by default, scans all the rows in your source files and stops when a row is rejected. If you want to validate only a subset of the rows, use the rowcount parameter. When the rowcount parameter is set, the validate operation scans rows and stops either when a row is rejected or when the specified number of rows are validated without errors.
For example, the following validate operation scans 100 rows and stops when a row is rejected or when 100 rows are validated without errors:
BEGIN
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
table_name => 'CHANNELS_EXT',
rowcount => 100 );
END;
/
If you do not want the validate to stop when a row is rejected and you want to see all rejected rows, set the stop_on_error parameter to FALSE. In this case VALIDATE_EXTERNAL_TABLE scans all rows and reports all rejected rows.
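For example, the following validate operation scans all rows and records every rejected row instead of stopping at the first one:

```sql
BEGIN
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
    table_name    => 'CHANNELS_EXT',
    stop_on_error => FALSE );
END;
/
```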
If you want to validate only a subset of rows use the rowcount parameter. When rowcount is set and stop_on_error is set to FALSE, the validate operation scans rows and stops either when the specified number of rows are rejected or when the specified number of rows are validated without errors. For example, the following example scans 100 rows and stops when 100 rows are rejected or when 100 rows are validated without errors:
BEGIN
DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE (
table_name => 'CHANNELS_EXT',
rowcount => 100,
stop_on_error => FALSE );
END;
/
See VALIDATE_EXTERNAL_TABLE Procedure for detailed information about DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE parameters.
See View Logs for Data Validation to see the results of validate operations in the tables dba_load_operations and user_load_operations.
View Logs for Data Validation
After validating an external table, you can see the result of the validate operation by querying a load operations table:
- dba_load_operations: shows all validate operations.
- user_load_operations: shows the validate operations in your schema.
You can use these tables to view load validation information. For example, use this SELECT statement to query user_load_operations:
SELECT table_name,owner_name,type,status,start_time,update_time,logfile_table,badfile_table
FROM user_load_operations
WHERE type = 'VALIDATE';
TABLE_NAME OWNER_NAME TYPE STATUS START_TIME UPDATE_TIME LOGFILE_TABLE BADFILE_TABLE
------------- ---------- ---------- ---------- -------------- ------------- --------------- ---------------
CHANNELS_EXT SH VALIDATE COMPLETED 13-NOV-17... 13-NOV-17... VALIDATE$21_LOG VALIDATE$21_BAD
Using this SQL statement with the WHERE clause on the TYPE column displays all of the load operations with type VALIDATE.
The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a validate operation. For example, the following query shows the log for this validate operation:
SELECT * FROM VALIDATE$21_LOG;
The column BADFILE_TABLE shows the name of the table you can query to look at the rows where there were errors during validation. For example, the following query shows the rejected records for the above validate operation:
SELECT * FROM VALIDATE$21_BAD;
Depending on the errors shown in the log and the rows shown in the BADFILE_TABLE, you can correct the error by dropping the external table using the DROP TABLE command and recreating it by specifying the correct format options in DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
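As a sketch of that correction loop, you drop the table and recreate it with adjusted format options. The 'quote' option below is an illustrative example of a change you might make; choose the options that match the errors reported in the log:

```sql
DROP TABLE channels_ext;

BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'CHANNELS_EXT',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
    -- 'quote' is shown only as an example of a corrected format option
    format          => json_object('delimiter' value ',', 'quote' value '"'),
    column_list     => 'CHANNEL_ID NUMBER, CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)' );
END;
/
```

After recreating the table, run DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE again to confirm the new format options accept the source data.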
Note: The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each validate operation and then removed automatically.