Load Data from Files in the Cloud

The PL/SQL package DBMS_CLOUD provides support for loading data from files in the cloud to tables created in your Autonomous Database on Dedicated Exadata Infrastructure.

You can load data from different file formats using the following PL/SQL procedures provided by DBMS_CLOUD:
  • Text files in the cloud, using the DBMS_CLOUD.COPY_DATA procedure
  • JSON files in the cloud, using the DBMS_CLOUD.COPY_COLLECTION procedure
Before loading the data from the files, ensure that:
  • The source file has been uploaded to a cloud-based object store, such as Oracle Cloud Infrastructure Object Storage, and is accessible to the database user attempting to load the data.
  • Your Cloud Object Storage credentials are stored using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more information.

The package DBMS_CLOUD supports loading from files in the following cloud services: Oracle Cloud Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic, Azure Blob Storage, and Amazon S3.

Create Credentials

Learn how to store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure.

Execute the DBMS_CLOUD.CREATE_CREDENTIAL procedure using any database tool, such as SQL*Plus, SQL Developer, or Database Actions (a web-based SQL Developer tool). For example:
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'DEF_CRED_NAME',
    username => 'adb_user@oracle.com',
    password => 'password'
  );
END;
/

The values you provide for username and password depend on the Cloud Object Storage service you are using:

  • Oracle Cloud Infrastructure Object Storage: username is your Oracle Cloud Infrastructure user name and password is your Oracle Cloud Infrastructure auth token. See Working with Auth Tokens.

  • Oracle Cloud Infrastructure Object Storage Classic: username is your Oracle Cloud Infrastructure Classic user name and password is your Oracle Cloud Infrastructure Classic password.

This operation stores the credentials in the database in an encrypted format. You can use any name for the credential. This step is required only once, unless your object store credentials change; after you store the credentials, you can use the same credential name for all data loads.
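
To confirm that the credential was stored, you can query the USER_CREDENTIALS view, and if your auth token is rotated later, DBMS_CLOUD.UPDATE_CREDENTIAL lets you replace the stored password without recreating the credential. The following is a minimal sketch that reuses the credential name from the example above; the new token value is a placeholder:

-- List stored credentials (the password is never displayed)
SELECT credential_name, username, enabled
  FROM user_credentials;

-- Replace the stored auth token for an existing credential
BEGIN
  DBMS_CLOUD.UPDATE_CREDENTIAL(
    credential_name => 'DEF_CRED_NAME',
    attribute       => 'PASSWORD',
    value           => 'new_auth_token'
  );
END;
/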

Load Data from Text Files

Learn how to load data from text files in the cloud to your Autonomous Database using the DBMS_CLOUD.COPY_DATA procedure.

The source file in this example, channels.txt, has the following data:

S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Create the table that will contain the data. For example:
    CREATE TABLE CHANNELS
       (channel_id CHAR(1),
        channel_desc VARCHAR2(20),
        channel_class VARCHAR2(20)
       );
  3. Load data into the table using the procedure DBMS_CLOUD.COPY_DATA. For example:
    BEGIN
     DBMS_CLOUD.COPY_DATA(
        table_name =>'CHANNELS',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
        format => json_object('delimiter' value ',')
     );
    END;
    /
    

    The parameters are:

    • table_name: is the target table’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure Swift URI that specifies the channels.txt file in the mybucket bucket in the us-phoenix-1 region. (idthydc0kinr is the object storage namespace in which the bucket resides.) For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. For information about the format options you can specify, see Format Parameter.

    For more detailed information, see COPY_DATA Procedure.
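
    After the load completes, a quick query of the target table confirms that the rows arrived. This is a minimal check assuming the CHANNELS table and channels.txt data shown above:

    SELECT channel_id, channel_desc, channel_class
      FROM channels
     ORDER BY channel_id;

    The query should return five rows, one for each line of channels.txt.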

Load a JSON File of Delimited Documents into a Collection

Learn how to load a JSON file of delimited documents into a collection in your Autonomous Database using the DBMS_CLOUD.COPY_COLLECTION procedure.

This example loads JSON values from the line-delimited file myCollection.json. Each value, that is, each line, is loaded into a collection in your Autonomous Database as a single document.

Here is an example of such a file. It has three lines, with one object per line. Each of those objects gets loaded as a separate JSON document.

{ "name" : "apple", "count": 20 }
{ "name" : "orange", "count": 42 }
{ "name" : "pear", "count": 10 }
Procedure:
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Load data into a collection using the procedure DBMS_CLOUD.COPY_COLLECTION. For example:
    BEGIN 
      DBMS_CLOUD.COPY_COLLECTION(
        collection_name =>'fruit',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/fruit_bucket/o/myCollection.json',
        format => json_object('recorddelimiter' value '''\n''')
     );
    END;
    /
    

    The parameters are:

    • collection_name: is the target collection’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure native URI that specifies the myCollection.json file in the fruit_bucket bucket in the us-ashburn-1 region, where namespace-string is a placeholder for your object storage namespace. For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. The format options characterset, compression, ignoreblanklines, jsonpath, maxdocsize, recorddelimiter, rejectlimit, and unpackarrays are supported for loading JSON data. Specifying any other format option results in an error. For information about the format options you can specify, see Format Parameter.

    For more detailed information, see COPY_COLLECTION Procedure.
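
    To inspect what was loaded from SQL, you can query the table that backs the collection. This sketch assumes the default collection layout, in which the collection fruit is backed by a table of the same (case-sensitive) name with the documents stored in a JSON_DOCUMENT column:

    SELECT json_serialize(json_document) AS doc
      FROM "fruit";

    The query should return three documents, one for each line of myCollection.json.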

Load an Array of JSON Documents into a Collection

Learn how to load an array of JSON documents into a collection in your Autonomous Database using the DBMS_CLOUD.COPY_COLLECTION procedure.

This example uses the JSON file fruit_array.json. The following shows the contents of the file fruit_array.json:

[{"name" : "apple", "count": 20 },
 {"name" : "orange", "count": 42 },
 {"name" : "pear", "count": 10 }]
Procedure:
  1. Store your Cloud Object Storage credential using the DBMS_CLOUD.CREATE_CREDENTIAL procedure. See Create Credentials for more details.
  2. Load data into a collection using the procedure DBMS_CLOUD.COPY_COLLECTION. For example:
    BEGIN 
      DBMS_CLOUD.COPY_COLLECTION(    
        collection_name => 'fruits',    
        credential_name => 'DEF_CRED_NAME',    
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/json/o/fruit_array.json',
        format => '{"recorddelimiter" : "0x''01''", "unpackarrays" : "TRUE", "maxdocsize" : "10240000"}'
      );
    END;
    /

    In this example, you load a single JSON value that occupies the whole file, so no record delimiter occurs in the data. To indicate this, set the record delimiter to a character that does not occur in the input file; for example, the value "0x''01''", because this character does not occur directly in JSON text.

    When the format option unpackarrays is set to TRUE, the array of documents is loaded as individual documents rather than as a single array. The unpacking of array elements is, however, limited to a single level; nested arrays within the documents are not unpacked.

    The parameters are:

    • collection_name: is the target collection’s name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma delimited list of the source files you want to load.

      In this example, file_uri_list is an Oracle Cloud Infrastructure native URI that specifies the fruit_array.json file in the json bucket in the us-ashburn-1 region, where namespace-string is a placeholder for your object storage namespace. For information about the supported URI formats, see Cloud Object Storage URI Formats.

    • format: defines the options you specify to describe the format of the source file. The format options characterset, compression, ignoreblanklines, jsonpath, maxdocsize, recorddelimiter, rejectlimit, and unpackarrays are supported for loading JSON data. Specifying any other format option results in an error. For information about the format options you can specify, see Format Parameter.

    Loading fruit_array.json with DBMS_CLOUD.COPY_COLLECTION using the format option unpackarrays makes the procedure recognize array values in the source. Therefore, instead of loading the data as a single document, as it would by default, the procedure loads the data into the collection fruits with each value in the array stored as a separate document.

    For more detailed information, see COPY_COLLECTION Procedure.
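
    As a quick check that the array was unpacked, you can count the documents in the collection. This sketch assumes the default collection layout, in which the collection fruits is backed by a table of the same (case-sensitive) name:

    SELECT COUNT(*) AS doc_count
      FROM "fruits";

    The count should be 3, one document for each element of the source array.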

Monitor and Troubleshoot Data Loading

All data load operations done using the PL/SQL package DBMS_CLOUD are logged in the tables dba_load_operations and user_load_operations:

  • dba_load_operations: shows all load operations.

  • user_load_operations: shows the load operations in your schema.

Query these tables to see information about ongoing and completed data loads. For example, the following SELECT statement with a WHERE clause predicate on the TYPE column shows load operations of type COPY:


SELECT table_name, owner_name, type, status, start_time, update_time, logfile_table, badfile_table 
   FROM user_load_operations WHERE type = 'COPY';

TABLE_NAME OWNER_NAME TYPE STATUS    START_TIME                          UPDATE_TIME                         LOGFILE_TABLE BADFILE_TABLE
---------- ---------- ---- --------- ----------------------------------- ----------------------------------- ------------- -------------
CHANNELS   SH         COPY COMPLETED 04-MAR-21 07.38.30.522711000 AM GMT 04-MAR-21 07.38.30.522711000 AM GMT COPY$1_LOG    COPY$1_BAD

The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a load operation. For example, the following query shows the log of the load operation:

SELECT * FROM COPY$1_LOG;

The BADFILE_TABLE column shows the name of the table you can query to look at the rows that were rejected during loading. For example, the following query shows the rejected records for the load operation:

SELECT * FROM COPY$1_BAD;

Depending on the errors shown in the log and the rows shown in the specified BADFILE_TABLE table, you can correct the errors by specifying the appropriate format options in DBMS_CLOUD.COPY_DATA and running the load again.
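
For example, if the log shows rows rejected because the source file uses a different field separator, you could truncate the target table and rerun the load with adjusted format options. The call below is only a sketch; the delimiter and rejectlimit values are illustrative and not taken from the earlier example:

TRUNCATE TABLE channels;

BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'CHANNELS',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/idthydc0kinr/mybucket/channels.txt',
    format          => json_object('delimiter' value '|', 'rejectlimit' value '10')
  );
END;
/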

Note:

The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each load operation and then removed automatically.

See DELETE_ALL_OPERATIONS Procedure for information on clearing the user_load_operations table.
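
As a sketch, the following call clears the logged load operations of type COPY from user_load_operations; omit the type parameter to clear load operations of all types:

BEGIN
  DBMS_CLOUD.DELETE_ALL_OPERATIONS(type => 'COPY');
END;
/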