DBMS_CLOUD_PIPELINE Package

The DBMS_CLOUD_PIPELINE package allows you to create data pipelines for loading and exporting data in the cloud. This package supports continuous incremental data load of files in object store into the database. DBMS_CLOUD_PIPELINE also supports continuous incremental export of table data or query results from the database to object store based on a timestamp column.

Prerequisites

As a developer, you can use DBMS_CLOUD_PIPELINE procedures with Autonomous Databases deployed on Oracle Public Cloud or Exadata Cloud@Customer.

Depending on the deployment choice, the following prerequisites must be met to use the DBMS_CLOUD_PIPELINE procedures with Amazon S3, Azure Blob Storage, and Google Cloud Storage service providers.

Outbound connectivity must be configured using a NAT gateway by your fleet administrator, as described below:
  • Create a NAT gateway in the Virtual Cloud Network (VCN) where your Autonomous Database resources reside by following the instructions in Create a NAT Gateway in Oracle Cloud Infrastructure Documentation.
  • After creating the NAT gateway, add a route rule and an egress security rule to each subnet (in the VCN) where Autonomous Database resources reside, so that these resources can use the gateway to reach the cloud service providers:
    1. Go to the Subnet Details page for the subnet.
    2. In the Subnet Information tab, click the name of the subnet's Route Table to display its Route Table Details page.
    3. In the table of existing Route Rules, check whether there is already a rule with the following characteristics:
      • Destination: 0.0.0.0/0
      • Target Type: NAT Gateway
      • Target: The name of the NAT gateway you just created in the VCN

      If such a rule does not exist, click Add Route Rules and add a route rule with these characteristics.

    4. Return to the Subnet Details page for the subnet.
    5. In the subnet's Security Lists table, click the name of the subnet's security list to display its Security List Details page.
    6. In the side menu, under Resources, click Egress Rules.
    7. In the table of existing Egress Rules, check whether there is already a rule with the following characteristics:
      • Destination Type: CIDR
      • Destination: 0.0.0.0/0
      • IP Protocol: TCP
      • Source Port Range: All
      • Destination Port Range: 443

      If such a rule does not exist, click Add Egress Rules and add an egress rule with these characteristics.

The HTTP Proxy settings in your environment must allow the database to access the cloud service provider.

These settings are defined by your fleet administrator while creating the Exadata Cloud@Customer infrastructure, as described in Using the Console to Provision Exadata Database Service on Cloud@Customer.

Note:

The network configuration, including the HTTP Proxy, can be edited only while the Exadata Infrastructure is in the Requires Activation state. Once it is activated, you cannot edit those settings.

Setting up an HTTP Proxy for an already provisioned Exadata Infrastructure requires a Service Request (SR) in My Oracle Support. See Create a Service Request in My Oracle Support for details.

Summary of DBMS_CLOUD_PIPELINE Subprograms

This table summarizes the subprograms included in the DBMS_CLOUD_PIPELINE package.

Note:

You can use the DBMS_CLOUD_PIPELINE package only with Autonomous Database versions 19.22 or later.

CREATE_PIPELINE Procedure

Creates a new data pipeline.

DROP_PIPELINE Procedure

Drops an existing data pipeline.

RESET_PIPELINE Procedure

Resets the tracking state of a data pipeline. Use RESET_PIPELINE to restart the pipeline from the initial state of data load or export. Optionally, RESET_PIPELINE can purge data in the database or in object store, depending on the type of pipeline.

RUN_PIPELINE_ONCE Procedure

Performs an on-demand run of the pipeline in the current foreground session, instead of a scheduled job.

SET_ATTRIBUTE Procedure

Sets pipeline attributes. There are two overloaded procedures: one to set a single attribute and another to set multiple attributes using a JSON document of attribute name/value pairs.

START_PIPELINE Procedure

Starts the data pipeline. When a pipeline is started, the pipeline operation runs continuously in a scheduled job according to the interval configured in the pipeline attributes.

STOP_PIPELINE Procedure

Stops the data pipeline. When a pipeline is stopped, no future jobs are scheduled for the pipeline.

CREATE_PIPELINE Procedure

The procedure creates a new data pipeline.

Syntax

DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      pipeline_type        IN   VARCHAR2,
      attributes           IN   CLOB        DEFAULT NULL,
      description          IN   VARCHAR2    DEFAULT NULL
);

Parameters


pipeline_name

Specifies a name for the pipeline. The pipeline name must follow the naming rules of Oracle SQL identifiers. See Identifiers for more information.

This parameter is mandatory.

pipeline_type

Specifies the pipeline type.

Valid values: LOAD, EXPORT

This parameter is mandatory.

attributes

Pipeline attributes in JSON format.

Default value: NULL

See DBMS_CLOUD_PIPELINE Attributes for more information.

description

Description for the pipeline.

Default value: NULL
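
Example

The following is a minimal sketch that creates a load pipeline; the pipeline name MY_LOAD_PIPE and the description are arbitrary example values:

BEGIN
     DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          pipeline_type => 'LOAD',
          description   => 'Continuous load of files from object store'
     );
END;
/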

DROP_PIPELINE Procedure

The procedure drops an existing data pipeline. If a pipeline has been started, then it must be stopped before it can be dropped.

Syntax

DBMS_CLOUD_PIPELINE.DROP_PIPELINE(
       pipeline_name        IN   VARCHAR2,
       force                IN   BOOLEAN DEFAULT FALSE
 );

Parameters


pipeline_name

Specifies a pipeline name.

This parameter is mandatory.

force

Forcibly drop a pipeline, even if it is in started state.

Valid values: TRUE, FALSE

Default value: FALSE

Usage Note

  • In order to drop a pipeline that is in started state, set the force parameter to TRUE.
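
Example

As a sketch, assuming a pipeline named MY_LOAD_PIPE exists and is in started state, the following call drops it forcibly:

BEGIN
     DBMS_CLOUD_PIPELINE.DROP_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          force         => TRUE
     );
END;
/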

RESET_PIPELINE Procedure

Resets the tracking state of a data pipeline. Use RESET_PIPELINE to restart the pipeline from the initial state of data load or export. Optionally, RESET_PIPELINE can purge data in the database or in object store, depending on the type of pipeline. A data pipeline must be in stopped state before you can reset it.

Syntax

DBMS_CLOUD_PIPELINE.RESET_PIPELINE(
       pipeline_name        IN   VARCHAR2,
       purge_data           IN   BOOLEAN DEFAULT FALSE
 );

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

purge_data

The purge_data parameter applies to both load pipelines and export pipelines:

  • For a load pipeline, when TRUE, the data in the database table is truncated.

  • For an export pipeline, when TRUE, the files in the object store location are deleted.

Valid values: TRUE, FALSE

Default value: FALSE

Usage Notes

  • A data pipeline must be in stopped state to reset it. See STOP_PIPELINE Procedure for more information.

  • For a load pipeline, resetting the pipeline clears the record of the files being loaded by the pipeline. When you call either START_PIPELINE or RUN_PIPELINE_ONCE after resetting a load pipeline, the pipeline repeats the data load and includes all the files present in the object store location.

    When purge_data is set to TRUE, DBMS_CLOUD_PIPELINE.RESET_PIPELINE does the following:

    • Truncates the data in the pipeline's database table that you specify with the table_name attribute.

    • Drops the pipeline's status table, and the pipeline's bad file table and error table (if they exist).

  • For an export pipeline, resetting the pipeline clears the last tracked data in the database table. When you call either START_PIPELINE or RUN_PIPELINE_ONCE after resetting an export pipeline, the pipeline repeats exporting data from the table or query.

    When purge_data is set to TRUE, DBMS_CLOUD_PIPELINE.RESET_PIPELINE deletes the existing files in the object store location specified with the location attribute.
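
Example

A minimal sketch, assuming a stopped load pipeline named MY_LOAD_PIPE exists: the following call resets its tracking state and truncates its target database table:

BEGIN
     DBMS_CLOUD_PIPELINE.RESET_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          purge_data    => TRUE
     );
END;
/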

RUN_PIPELINE_ONCE Procedure

This procedure performs an on-demand run of the pipeline in the current foreground session, instead of in a scheduled job. Use DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE to test a pipeline before you start the pipeline as a continuous job.

Syntax

DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE(
       pipeline_name  IN  VARCHAR2
);

Parameters


pipeline_name

Specifies a name for the pipeline to run.

This parameter is mandatory.

Usage Notes

  • After you perform a test run of a pipeline, you can reset the pipeline state using DBMS_CLOUD_PIPELINE.RESET_PIPELINE. This lets you clear the pipeline state before you start the pipeline in a scheduled job.

  • If a pipeline is in the started state, then it cannot be run in the foreground session.
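
Example

A sketch of a test run for a hypothetical pipeline named MY_LOAD_PIPE (the pipeline must not be in started state):

BEGIN
     DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE(
          pipeline_name => 'MY_LOAD_PIPE'
     );
END;
/

After the test run, you can call DBMS_CLOUD_PIPELINE.RESET_PIPELINE to clear the pipeline state before starting the pipeline as a scheduled job.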

SET_ATTRIBUTE Procedure

This procedure sets pipeline attributes. There are two overloaded procedures, one to set a single attribute and another to set multiple attributes using a JSON document of attribute name/value pairs.

Syntax

DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
      pipeline_name        IN   VARCHAR2,
      attribute_name       IN   VARCHAR2,
      attribute_value      IN   CLOB
);

DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
      pipeline_name        IN   VARCHAR2,
      attributes           IN   CLOB
);

Parameters


pipeline_name

Specifies the name of the pipeline for which to set attributes.

This parameter is mandatory.

attribute_name

Specifies the attribute name for the attribute to be set.

See DBMS_CLOUD_PIPELINE Attributes for more information.

attribute_value

Specifies the value for the pipeline attribute to set.

See DBMS_CLOUD_PIPELINE Attributes for more information.

attributes

Specifies a JSON document containing attribute names and values.

See DBMS_CLOUD_PIPELINE Attributes for more information.

Usage Note

  • When you use DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE to set multiple attributes with the attributes parameter, all the existing attributes are deleted and overwritten with the specified attributes from the JSON document.
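
Example

A sketch of both overloads. The pipeline name MY_LOAD_PIPE, the credential OBJ_STORE_CRED (created beforehand with DBMS_CLOUD.CREATE_CREDENTIAL), and the object store URI are assumed example values:

BEGIN
     -- Overload 1: set a single attribute.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name   => 'MY_LOAD_PIPE',
          attribute_name  => 'interval',
          attribute_value => '20'
     );

     -- Overload 2: set multiple attributes in one call. Note that this
     -- deletes and overwrites all existing attributes of the pipeline.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name => 'MY_LOAD_PIPE',
          attributes    => '{"credential_name" : "OBJ_STORE_CRED",
                             "location"        : "https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/",
                             "table_name"      : "EMPLOYEES",
                             "format"          : {"type" : "csv"},
                             "interval"        : "20"}'
     );
END;
/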

START_PIPELINE Procedure

The procedure starts the data pipeline. When a pipeline is started, the pipeline operation runs continuously in a scheduled job according to the interval configured with the pipeline attributes.

Syntax

DBMS_CLOUD_PIPELINE.START_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      start_date           IN   TIMESTAMP WITH TIME ZONE DEFAULT NULL
);

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

start_date

Specifies the starting date for the pipeline job.

Default value: NULL

Usage Notes

  • By default, a pipeline job begins immediately, as soon as the pipeline is started. To start a pipeline job at a later time, specify a valid date or timestamp using the start_date parameter.

  • See DBMS_CLOUD_PIPELINE Attributes for information on the pipeline interval and other pipeline attributes.
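
Example

A sketch that starts a hypothetical pipeline named MY_LOAD_PIPE one hour from now; omit start_date to start immediately:

BEGIN
     DBMS_CLOUD_PIPELINE.START_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          start_date    => SYSTIMESTAMP + INTERVAL '1' HOUR
     );
END;
/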

STOP_PIPELINE Procedure

The procedure stops the data pipeline. When a pipeline is stopped, no future jobs are scheduled for the pipeline.

Syntax

DBMS_CLOUD_PIPELINE.STOP_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      force                IN   BOOLEAN  DEFAULT FALSE
);

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

force

If the force parameter is set to TRUE, then any running jobs for the pipeline are terminated.

Valid values: TRUE, FALSE

Default value: FALSE
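
Example

A sketch that stops a hypothetical pipeline named MY_LOAD_PIPE and terminates any job currently running for it:

BEGIN
     DBMS_CLOUD_PIPELINE.STOP_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          force         => TRUE
     );
END;
/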

DBMS_CLOUD_PIPELINE Attributes

Attributes help to control and configure the behavior of a data pipeline.

Attributes

Note:

Depending on the pipeline type, LOAD or EXPORT, a pipeline supports a different set of attributes. Each attribute entry below lists the pipeline types it applies to and whether it can be modified after the pipeline starts.

credential_name

The name of the credential to access the source Cloud Object Storage.

Default value: NULL.

If you do not supply a credential_name, credential_name is set to NULL.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

field_list

Identifies the fields in the source files and their data types. This argument's syntax is the same as the field_list clause in regular Oracle external tables. For more information about field_list, see Oracle® Database Utilities.

Default value: NULL

With the default value, the fields and their data types are determined by the columns of the table specified in the table_name attribute.

Pipeline type: LOAD. Modifiable after pipeline starts: Yes.

format

The options that describe the format for the pipeline type.

The Data Pump format is not supported for an export pipeline.

This attribute is mandatory for both LOAD and EXPORT pipelines.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

interval

The time interval in minutes between consecutive executions of the scheduled pipeline job.

Default value: 15 minutes

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

key_column

A timestamp or date column in the specified table or query, used for continuously exporting newer data to object store. The export pipeline tracks the last execution timestamp or date and compares it with the value in the key_column to identify new data to export.

Default value: NULL

If key_column is not specified for an export pipeline, the entire contents of the table or query are uploaded to object store in each pipeline job execution.

Pipeline type: EXPORT. Modifiable after pipeline starts: No.

location

Specifies a URI that points to an Object Storage location.

The format of the URI depends on the Cloud Object Storage service. See Cloud Object Storage URI Formats for more information.

This attribute is mandatory for both LOAD and EXPORT pipelines.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.

priority

Specifies a string value that determines the number of parallel operations performed for the pipeline.

  • In a load pipeline, it determines the number of files loaded in parallel.

  • In an export pipeline, it determines the degree of parallelism for fetching data from the database.

An operation with a higher priority consumes more database resources and is completed sooner.

Valid values:

  • HIGH: Determines the number of parallel files handled using the database's ECPU count (OCPU count if your database uses OCPUs).
  • MEDIUM: Determines the number of simultaneous processes using the concurrency limit for Medium service. The default value is 4.
  • LOW: Runs the pipeline job in serial order.

Default value: MEDIUM

The maximum number of concurrent file operations is limited to 64.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

query

Specifies a SELECT statement so that only the required data is exported. The query determines the contents of the files you export as text files (CSV, JSON, Parquet, or XML) or dump files.

For example:

SELECT warehouse_id, quantity FROM inventories

Default value: NULL

For an export pipeline, either table_name or query is mandatory.

Pipeline type: EXPORT. Modifiable after pipeline starts: No.

table_name

Specifies the name of the target table for loading or exporting data.

For a load pipeline, table_name is mandatory.

For an export pipeline, either table_name or query is mandatory.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.

table_owner

The name of the schema where the target table resides for loading or exporting data.

Default value: NULL

With a NULL value, the target table is in the same schema as the user running the procedure.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.
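
Example

Putting the attributes together, the following is an end-to-end sketch of an export pipeline. All object names, the credential OBJ_STORE_CRED, and the object store URI are assumed example values; the key_column ORDER_DATE lets the pipeline export only rows newer than the last run:

BEGIN
     DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
          pipeline_name => 'MY_EXPORT_PIPE',
          pipeline_type => 'EXPORT'
     );

     -- key_column and location are not modifiable after the pipeline
     -- starts, so set all attributes before calling START_PIPELINE.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name => 'MY_EXPORT_PIPE',
          attributes    => '{"credential_name" : "OBJ_STORE_CRED",
                             "location"        : "https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/",
                             "table_name"      : "ORDERS",
                             "key_column"      : "ORDER_DATE",
                             "format"          : {"type" : "json"},
                             "interval"        : "30"}'
     );

     DBMS_CLOUD_PIPELINE.START_PIPELINE(
          pipeline_name => 'MY_EXPORT_PIPE'
     );
END;
/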