3 Loading Data with Autonomous Data Warehouse

Describes packages and tools to load data with Autonomous Data Warehouse.

About Data Loading

You load data into Autonomous Data Warehouse using Oracle Database tools, and Oracle or other third-party data integration tools.

In general, you load data from files local to your client computer or from files stored in a cloud-based object store. For data loading from files in the cloud, Autonomous Data Warehouse provides a new PL/SQL package, DBMS_CLOUD.

For the fastest data loading experience, Oracle recommends uploading the source files to a cloud-based object store, such as Oracle Cloud Infrastructure Object Storage, before loading the data into your Autonomous Data Warehouse. Oracle also supports loading files that are located locally in your data center, but with this method you should factor in transmission speeds across the Internet, which may be significantly slower.

For more information on Oracle Cloud Infrastructure Object Storage, see Putting Data into Object Storage and Overview of Object Storage.

For a tutorial on data loading using Oracle Cloud Infrastructure Object Storage, see Loading Your Data.

Load Data from Files in the Cloud

The PL/SQL package DBMS_CLOUD provides support for loading data from files in the Cloud to your tables in Autonomous Data Warehouse.

The package DBMS_CLOUD supports loading from files in the following cloud services: Oracle Cloud Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic, Azure Blob Storage, and Amazon S3.

Load Data from Files in the Cloud

For data loading from files in the Cloud, you need to first store your object storage credentials in your Autonomous Data Warehouse and then use the procedure DBMS_CLOUD.COPY_DATA to load data.

The source file in this example, channels.txt, has the following data:

S,Direct Sales,Direct
T,Tele Sales,Direct
C,Catalog,Indirect
I,Internet,Indirect
P,Partners,Others

  1. Store your object store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL. For example:
    SET DEFINE OFF
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'adwc_user@oracle.com',
        password => 'password'
      );
    END;
    /

    This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for all data loads.

    For detailed information about the parameters, see CREATE_CREDENTIAL Procedure.

    Note:

    Some tools like SQL*Plus and SQL Developer use the ampersand character (&) as a special character. If your password contains an ampersand, use the SET DEFINE OFF command in those tools, as shown in the example, to disable the special character so that the credential is created properly.

  2. Load data into an existing table using the procedure DBMS_CLOUD.COPY_DATA. For example:
    CREATE TABLE CHANNELS
       (channel_id char(1),
        channel_desc varchar2(20),
        channel_class varchar2(20)
       );

    BEGIN
     DBMS_CLOUD.COPY_DATA(
        table_name =>'CHANNELS',
        credential_name =>'DEF_CRED_NAME',
        file_uri_list =>'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/channels.txt',
        format => json_object('delimiter' value ',')
     );
    END;
    /
    

    The parameters are:

    • table_name: is the target table's name.

    • credential_name: is the name of the credential created in the previous step.

    • file_uri_list: is a comma-delimited list of the source files you want to load.

    • format: defines the options you can specify to describe the format of the source file.

    For detailed information about the parameters, see COPY_DATA Procedure.
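
As a further illustration of the file_uri_list and format parameters, the following sketch loads two files in a single call and sets two additional format options. The file names channels1.txt and channels2.txt are hypothetical, and the sketch assumes that each file begins with one header row. The skipheaders and rejectlimit options are used here as an illustration; check the COPY_DATA Procedure reference for the format options supported by your service version.

```sql
BEGIN
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'CHANNELS',
    credential_name => 'DEF_CRED_NAME',
    -- Comma-delimited list of source files (hypothetical file names)
    file_uri_list   => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/channels1.txt,' ||
                       'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/channels2.txt',
    format          => json_object(
                         'delimiter'   value ',',
                         'skipheaders' value '1',   -- assumes one header row per file
                         'rejectlimit' value '10')  -- allow a limited number of rejected rows before the load fails
  );
END;
/
```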

Load Data – Monitor and Troubleshoot Loads

All data load operations done using the PL/SQL package DBMS_CLOUD are logged in the tables dba_load_operations and user_load_operations:

  • dba_load_operations: shows all load operations.

  • user_load_operations: shows the load operations in your schema.

Query these tables to see information about ongoing and completed data loads. For example:

SELECT table_name, owner_name, type, status, start_time, update_time, logfile_table, badfile_table
   FROM user_load_operations WHERE type = 'COPY';

TABLE_NAME OWNER_NAME TYPE STATUS    START_TIME           UPDATE_TIME          LOGFILE_TABLE BADFILE_TABLE
---------- ---------- ---- --------- -------------------- -------------------- ------------- -------------
CHANNELS   SH         COPY COMPLETED 06-NOV-18 01.55.19.3 06-NOV-18 01.55.28.2 COPY$21_LOG   COPY$21_BAD

The WHERE clause predicate on the TYPE column restricts the output to load operations of type COPY.

The LOGFILE_TABLE column shows the name of the table you can query to look at the log of a load operation. For example, the following query shows the log of the load operation:

select * from COPY$21_LOG;

The BADFILE_TABLE column shows the name of the table you can query to look at the rows that were rejected with errors during loading. For example, the following query shows the rejected records for the load operation:

select * from COPY$21_BAD;

Depending on the errors shown in the log and the rows shown in the BADFILE_TABLE table, you can correct the error by specifying the correct format options in DBMS_CLOUD.COPY_DATA.

Note:

The LOGFILE_TABLE and BADFILE_TABLE tables are stored for two days for each load operation and then removed automatically.
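
To find loads that need attention, you can filter on the STATUS column. The following query is a sketch that assumes any status other than COMPLETED (the value shown in the sample output above) marks a load that is still running or did not succeed:

```sql
-- List COPY operations that have not completed, newest first
SELECT table_name, status, start_time, logfile_table, badfile_table
  FROM user_load_operations
 WHERE type = 'COPY'
   AND status <> 'COMPLETED'
 ORDER BY start_time DESC;
```

You can then query the tables named in the LOGFILE_TABLE and BADFILE_TABLE columns for the listed operations.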

Load Data – Delete Credentials

The PL/SQL package DBMS_CLOUD provides the ability to store your object storage credentials in the database using the procedure DBMS_CLOUD.CREATE_CREDENTIAL. You remove credentials with DBMS_CLOUD.DROP_CREDENTIAL.

For example, to remove the credential named DEF_CRED_NAME, run the following command:

BEGIN
   DBMS_CLOUD.DROP_CREDENTIAL('DEF_CRED_NAME');
END;
/

For more information about the DBMS_CLOUD procedures and parameters, see Summary of DBMS_CLOUD Subprograms.
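
Before dropping a credential, you may want to confirm which credentials exist in your schema. The following sketch assumes that credentials created with DBMS_CLOUD.CREATE_CREDENTIAL are listed in the USER_CREDENTIALS data dictionary view:

```sql
-- List the credentials owned by the current user
SELECT credential_name, username, enabled
  FROM user_credentials
 ORDER BY credential_name;
```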

Import Data Using Oracle Data Pump

Oracle Data Pump offers very fast bulk data and metadata movement between Oracle databases and Autonomous Data Warehouse.

Data Pump Import lets you import data from Data Pump files residing on Oracle Cloud Infrastructure Object Storage, Oracle Cloud Infrastructure Object Storage Classic, and Amazon S3. You can save your data to your Cloud Object Store and use Oracle Data Pump to load data to Autonomous Data Warehouse.

Export Your Existing Oracle Database to Import into Autonomous Data Warehouse

You need to use Oracle Data Pump Export to export your existing Oracle Database schemas to migrate them to Autonomous Data Warehouse using Oracle Data Pump Import.

Oracle recommends using the following Data Pump Export parameters for faster and easier migration to Autonomous Data Warehouse:

exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link
data_options=group_partition_table_data
parallel=n
schemas=schema_name
dumpfile=export%u.dmp

Oracle Data Pump Export provides several export modes; Oracle recommends using the schema mode for migrating to Autonomous Data Warehouse. You can list the schemas you want to export by using the schemas parameter.

For a faster migration, export your schemas into multiple Data Pump files and use parallelism. You can specify the dump file name format you want to use with the dumpfile parameter. Set the parallel parameter to at least the number of CPUs you have in your Autonomous Data Warehouse database.

The exclude and data_options parameters ensure that object types not required in Autonomous Data Warehouse are not exported, and that table partitions are grouped together so that they can be imported faster. If you want to migrate your existing indexes, materialized views, and materialized view logs to Autonomous Data Warehouse and manage them manually, remove those object types from the exclude list so that they are exported too. Similarly, if you want to migrate your existing partitioned tables as-is, without converting them into non-partitioned tables, and manage them manually, remove the data_options argument. For more information, see Managing Partitions, Indexes, and Materialized Views.

The following example exports the SH schema from a source Oracle Database for migration to an Autonomous Data Warehouse database with 16 CPUs:

expdp sh/sh@orcl \
exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link \
data_options=group_partition_table_data \
parallel=16 \
schemas=sh \
dumpfile=export%u.dmp

You can use other Data Pump Export parameters, such as compression, depending on your requirements. For more information on Oracle Data Pump Export, see Oracle Database Utilities.

Import Data Using Oracle Data Pump Version 18.3 or Later

Oracle recommends using the latest Oracle Data Pump version for importing data from Data Pump files into your Autonomous Data Warehouse as it contains enhancements and fixes for a better experience.

Download the latest version of Oracle Instant Client, which includes Oracle Data Pump, for your platform from Oracle Instant Client Downloads. See the installation instructions on the platform install download page for the installation steps required after you download Oracle Instant Client.

In Oracle Data Pump version 18.3 and later, the credential argument authenticates Data Pump to the Cloud Object Storage service you are using for your source files. The dumpfile argument is a comma-delimited list of URLs for your Data Pump files.

Importing with Oracle Data Pump and Setting credential Parameter

  1. Store your Cloud Object Storage credential using DBMS_CLOUD.CREATE_CREDENTIAL. For example:
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'adwc_user@oracle.com',
        password => 'password'
      );
    END;
    /

    For more information on the credentials for different Cloud Object Storage services, see CREATE_CREDENTIAL Procedure.

  2. Run Data Pump Import with the dumpfile parameter set to the list of file URLs on your Cloud Object Storage and the credential parameter set to the name of the credential you created in the previous step. For example:
    impdp admin/password@ADWC1_high \
         directory=data_pump_dir \
         credential=def_cred_name \
         dumpfile=https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/export%u.dmp \
         parallel=16 \
         partition_options=merge \
         transform=segment_attributes:n \
         transform=dwcs_cvt_iots:y \
         transform=constraint_use_default_index:y \
         exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link

    For the best import performance use the HIGH database service for your import connection and set the PARALLEL parameter to the number of CPUs in your Autonomous Data Warehouse as shown in the example.

    For information on which database service name to connect to run Data Pump Import, see Managing Concurrency and Priorities on Autonomous Data Warehouse.

    For the dump file URL format for different Cloud Object Storage services, see DBMS_CLOUD Package File URI Formats.

    This example shows the recommended parameters for importing into your Autonomous Data Warehouse.

    In this example:

    • Partitioned tables are converted into non-partitioned tables.

    • Storage attributes for tables are ignored.

    • Index-organized tables are converted into regular tables.

    • Indexes created for primary key and unique key constraints will be renamed to the constraint name.

    • Indexes, clusters, indextypes, materialized views, materialized view logs, zone maps, and database links are excluded during Data Pump Import.

    For information on disallowed objects in Autonomous Data Warehouse, see Restrictions for SQL Commands.

    For detailed information on Oracle Data Pump Import parameters see Oracle Database Utilities.

Import Data Using Oracle Data Pump (Versions 12.2.0.1 and Earlier)

You can import data from Data Pump files into your Autonomous Data Warehouse using Data Pump client versions 12.2.0.1 and earlier by setting the default_credential parameter.

Data Pump Import versions 12.2.0.1 and earlier do not have the credential parameter. If you are using an older version of Data Pump Import you need to define a default credential property for Autonomous Data Warehouse and use the default_credential keyword in the dumpfile parameter.

Importing with Older Oracle Data Pump Versions and Setting default_credential

  1. Store your Cloud Object Storage credential using DBMS_CLOUD.CREATE_CREDENTIAL. For example:
    BEGIN
      DBMS_CLOUD.CREATE_CREDENTIAL(
        credential_name => 'DEF_CRED_NAME',
        username => 'adwc_user@oracle.com',
        password => 'password'
      );
    END;
    /

    For more information on the credentials for different Cloud Object Storage services, see CREATE_CREDENTIAL Procedure.

  2. Set the credential as the default credential for your Autonomous Data Warehouse, as the ADMIN user. For example:
    alter database property set default_credential = 'ADMIN.DEF_CRED_NAME';

  3. Run Data Pump Import with the dumpfile parameter set to the list of file URLs on your Cloud Object Storage, and set the default_credential keyword. For example:
    impdp admin/password@ADWC1_high \
         directory=data_pump_dir \
         dumpfile=default_credential:https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/export%u.dmp \
         parallel=16 \
         partition_options=merge \
         transform=segment_attributes:n \
         exclude=index,cluster,indextype,materialized_view,materialized_view_log,materialized_zonemap,db_link

    For the best import performance use the HIGH database service for your import connection and set the PARALLEL parameter to the number of CPUs in your Autonomous Data Warehouse as shown in the example.

    For information on which database service name to connect to run Data Pump Import, see Managing Concurrency and Priorities on Autonomous Data Warehouse.

    For the dump file URL format for different Cloud Object Storage services, see DBMS_CLOUD Package File URI Formats.

    This example shows the recommended parameters for importing into your Autonomous Data Warehouse.

    In this example:

    • Partitioned tables are converted into non-partitioned tables.

    • Storage attributes for tables are ignored.

    • Indexes, clusters, indextypes, materialized views, materialized view logs, zone maps, and database links are excluded during Data Pump Import.

    For information on disallowed objects in Autonomous Data Warehouse, see Restrictions for SQL Commands.

    For detailed information on Oracle Data Pump Import parameters see Oracle Database Utilities.
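
To confirm the setting made in step 2 before you run the import, you can query the database properties. This is a sketch that assumes the property is exposed in the DATABASE_PROPERTIES view under the name DEFAULT_CREDENTIAL:

```sql
-- Show the default credential currently set for the database
SELECT property_name, property_value
  FROM database_properties
 WHERE property_name = 'DEFAULT_CREDENTIAL';
```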

Access Log Files for Data Pump Import

The log files for Data Pump Import operations are stored in the directory DATA_PUMP_DIR; this is the only directory you can specify for the Data Pump directory parameter.

To access the log file, you need to copy it to your Cloud Object Storage using the procedure DBMS_CLOUD.PUT_OBJECT. For example, the following PL/SQL block copies the file import.log to your Cloud Object Storage:

BEGIN
  DBMS_CLOUD.PUT_OBJECT(
    credential_name => 'DEF_CRED_NAME',
    object_uri => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/v1/adwc/adwc_user/import.log',
    directory_name  => 'DATA_PUMP_DIR',
    file_name => 'import.log');
END;
/

For more information, see Summary of DBMS_CLOUD Subprograms.
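
If you are unsure of the log file name, it can help to list the contents of DATA_PUMP_DIR first. The following sketch assumes that your service version provides the DBMS_CLOUD.LIST_FILES table function; see Summary of DBMS_CLOUD Subprograms to confirm:

```sql
-- List files in DATA_PUMP_DIR, including Data Pump log files
SELECT object_name, bytes
  FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');
```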

Use Oracle GoldenGate to Replicate Data to Autonomous Data Warehouse

You can replicate data to Autonomous Data Warehouse using Oracle GoldenGate On Premises and Oracle GoldenGate Cloud Service.

Load Data from Local Files Using SQL*Loader

You can use Oracle SQL*Loader to load data from local files on your client machine into Autonomous Data Warehouse.

Using SQL*Loader may be suitable for loading small amounts of data, as the load performance depends on the network bandwidth between your client and Autonomous Data Warehouse. For large amounts of data, Oracle recommends loading data from Cloud Object Storage. For information on loading from Cloud Object Storage, see Load Data from Files in the Cloud.

Oracle recommends using the following SQL*Loader parameters for the best load performance:

readsize=100M
bindsize=100M
direct=N

For detailed information on SQL*Loader parameters see Oracle Database Utilities.

Autonomous Data Warehouse gathers optimizer statistics for your tables during the load operation if you use the recommended parameters. If you do not use the recommended parameters, then you need to gather optimizer statistics manually as explained in Manage Optimizer Statistics on Autonomous Data Warehouse.
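
If you need to gather statistics manually after a load, one option is the standard DBMS_STATS package. The following minimal sketch assumes the CHANNELS table in the SH schema from the earlier example; see Manage Optimizer Statistics on Autonomous Data Warehouse for the recommended approach:

```sql
BEGIN
  -- Gather optimizer statistics for a single table after loading it
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SH',
    tabname => 'CHANNELS'
  );
END;
/
```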

For loading multiple files at the same time you can invoke a separate SQL*Loader session for each file.

For detailed information on SQL*Loader, see Oracle Database Utilities.

Managing DML Performance and Compression

Autonomous Data Warehouse uses Hybrid Columnar Compression for all tables by default. This gives the best compression ratio and optimal performance for direct-path load operations, such as loads done using the DBMS_CLOUD package. If you perform DML operations like UPDATE and MERGE on your tables, the compression ratio for the affected rows may decrease, leading to larger table sizes. These operations may also be slower than the same operations on an uncompressed table.

For the best compression ratio and optimal performance, Oracle recommends using bulk operations like direct-path loads and CREATE TABLE AS SELECT statements. However, if your workload requires frequent DML operations like UPDATE and MERGE on large parts of a table, you can create those tables as uncompressed tables to achieve better DML performance. For example, the following statement creates the table SALES as an uncompressed table:

create table sales (
    prod_id             NUMBER          NOT NULL,
    cust_id             NUMBER          NOT NULL,
    time_id             DATE            NOT NULL,
    channel_id          NUMBER          NOT NULL,
    promo_id            NUMBER          NOT NULL,
    quantity_sold       NUMBER(10,2)    NOT NULL,
    amount_sold         NUMBER(10,2)    NOT NULL)
nocompress;

At any point in time, you can use the ALTER TABLE MOVE statement to compress these tables without impacting queries accessing them. For example, the following statement compresses the table SALES using Hybrid Columnar Compression:

alter table sales move column store compress for query high;
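
To check the result, you can look at the compression attributes in the data dictionary. This sketch uses the standard USER_TABLES view and assumes that the SALES table is in your own schema:

```sql
-- Show whether SALES is compressed and with which compression type
SELECT table_name, compression, compress_for
  FROM user_tables
 WHERE table_name = 'SALES';
```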