DBMS_CLOUD_PIPELINE Package

The DBMS_CLOUD_PIPELINE package allows you to create data pipelines for loading and exporting data in the cloud. This package supports continuous incremental data load of files in object store into the database. DBMS_CLOUD_PIPELINE also supports continuous incremental export of table data or query results from the database to object store based on a timestamp column.

Prerequisites

As a developer, you can use DBMS_CLOUD_PIPELINE procedures with Autonomous Databases deployed on Oracle Public Cloud or Exadata Cloud@Customer.

Depending on the deployment choice, the following prerequisites must be met to use the DBMS_CLOUD_PIPELINE procedures with Amazon S3, Azure Blob Storage, and Google Cloud Storage service providers.

Outbound connectivity must be configured using a NAT gateway by your fleet administrator, as described below:
  • Create a NAT gateway in the Virtual Cloud Network (VCN) where your Autonomous Database resources reside by following the instructions in Create a NAT Gateway in Oracle Cloud Infrastructure Documentation.
  • After creating the NAT gateway, add a route rule and an egress security rule to each subnet (in the VCN) where Autonomous Database resources reside, so that these resources can use the gateway to reach the cloud service providers:
    1. Go to the Subnet Details page for the subnet.
    2. In the Subnet Information tab, click the name of the subnet's Route Table to display its Route Table Details page.
    3. In the table of existing Route Rules, check whether there is already a rule with the following characteristics:
      • Destination: 0.0.0.0/0
      • Target Type: NAT Gateway
      • Target: The name of the NAT gateway you just created in the VCN

      If such a rule does not exist, click Add Route Rules and add a route rule with these characteristics.

    4. Return to the Subnet Details page for the subnet.
    5. In the subnet's Security Lists table, click the name of the subnet's security list to display its Security List Details page.
    6. In the side menu, under Resources, click Egress Rules.
    7. In the table of existing Egress Rules, check whether there is already a rule with the following characteristics:
      • Destination Type: CIDR
      • Destination: 0.0.0.0/0
      • IP Protocol: TCP
      • Source Port Range: All
      • Destination Port Range: 443

      If such a rule does not exist, click Add Egress Rules and add an egress rule with these characteristics.

The HTTP Proxy settings in your environment must allow the database to access the cloud service provider.

These settings are defined by your fleet administrator while creating the Exadata Cloud@Customer infrastructure, as described in Using the Console to Provision Exadata Database Service on Cloud@Customer.

Note:

The network configuration, including the HTTP Proxy, can be edited only while the Exadata Infrastructure is in the Requires Activation state. Once it is activated, you cannot edit those settings.

Setting up an HTTP Proxy for an already provisioned Exadata Infrastructure requires a Service Request (SR) in My Oracle Support. See Create a Service Request in My Oracle Support for details.

Summary of DBMS_CLOUD_PIPELINE Subprograms

This table summarizes the subprograms included in the DBMS_CLOUD_PIPELINE package.

Note:

You can use the DBMS_CLOUD_PIPELINE package only with Autonomous Database versions 19.22 or later.

CREATE_PIPELINE Procedure

Creates a new data pipeline.

DROP_PIPELINE Procedure

Drops an existing data pipeline.

RESET_PIPELINE Procedure

Resets the tracking state of a data pipeline. Use RESET_PIPELINE to restart the pipeline from the initial state of data load or export. Optionally, RESET_PIPELINE can purge data in the database or in object store, depending on the type of pipeline.

RUN_PIPELINE_ONCE Procedure

Performs an on-demand run of the pipeline in the current foreground session, instead of a scheduled job.

SET_ATTRIBUTE Procedure

Sets pipeline attributes. There are two overloaded procedures: one to set a single attribute and another to set multiple attributes using a JSON document of attribute name/value pairs.

START_PIPELINE Procedure

Starts the data pipeline. When a pipeline is started, the pipeline operation runs continuously in a scheduled job according to the interval configured in the pipeline attributes.

STOP_PIPELINE Procedure

Stops the data pipeline. When a pipeline is stopped, no future jobs are scheduled for the pipeline.

CREATE_PIPELINE Procedure

The procedure creates a new data pipeline.

Syntax

DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      pipeline_type        IN   VARCHAR2,
      attributes           IN   CLOB        DEFAULT NULL,
      description          IN   VARCHAR2    DEFAULT NULL
);

Parameters


pipeline_name

Specifies a name for the pipeline. The pipeline name must follow the naming rules of Oracle SQL identifiers. See Identifiers for more information.

This parameter is mandatory.

pipeline_type

Specifies the pipeline type.

Valid values: LOAD, EXPORT

This parameter is mandatory.

attributes

Pipeline attributes in JSON format.

Default value: NULL

See DBMS_CLOUD_PIPELINE Attributes for more information.

description

Description for the pipeline.

Default value: NULL
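
Example

The following is a minimal sketch that creates a load pipeline; the pipeline name MY_LOAD_PIPE and the description are arbitrary example values:

BEGIN
     DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          pipeline_type => 'LOAD',
          description   => 'Continuous load of files from object store'
     );
END;
/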

DROP_PIPELINE Procedure

The procedure drops an existing data pipeline. If a pipeline has been started, then it must be stopped before it can be dropped.

Syntax

DBMS_CLOUD_PIPELINE.DROP_PIPELINE(
       pipeline_name        IN   VARCHAR2,
       force                IN   BOOLEAN DEFAULT FALSE
 );

Parameters


pipeline_name

Specifies a pipeline name.

This parameter is mandatory.

force

Forcibly drop a pipeline, even if it is in started state.

Valid values: TRUE, FALSE

Default value: FALSE

Usage Note

  • In order to drop a pipeline that is in started state, set the force parameter to TRUE.
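
Example

As a sketch, assuming a pipeline named MY_LOAD_PIPE exists and is in started state, the following call drops it forcibly:

BEGIN
     DBMS_CLOUD_PIPELINE.DROP_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          force         => TRUE
     );
END;
/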

RESET_PIPELINE Procedure

Resets the tracking state of a data pipeline. Use RESET_PIPELINE to restart the pipeline from the initial state of data load or export. Optionally, RESET_PIPELINE can purge data in the database or in object store, depending on the type of pipeline. A data pipeline must be in stopped state before you can reset it.

Syntax

DBMS_CLOUD_PIPELINE.RESET_PIPELINE(
       pipeline_name        IN   VARCHAR2,
       purge_data           IN   BOOLEAN DEFAULT FALSE
 );

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

purge_data

The purge_data parameter applies to both load pipelines and export pipelines:

  • For a load pipeline, when TRUE, the data in the database table is truncated.

  • For an export pipeline, when TRUE, the files in the object store location are deleted.

Valid values: TRUE, FALSE

Default value: FALSE

Usage Notes

  • A data pipeline must be in stopped state to reset it. See STOP_PIPELINE Procedure for more information.

  • For a load pipeline, resetting the pipeline clears the record of the files being loaded by the pipeline. When you call either START_PIPELINE or RUN_PIPELINE_ONCE after resetting a load pipeline, the pipeline repeats the data load and includes all the files present in the object store location.

    When purge_data is set to TRUE, DBMS_CLOUD_PIPELINE.RESET_PIPELINE does the following:

    • Truncates the data in the pipeline's database table that you specify with the table_name attribute.

    • Drops the pipeline's status table, and the pipeline's bad file table and error table (if they exist).

  • For an export pipeline, resetting the pipeline clears the last tracked data in the database table. When you call either START_PIPELINE or RUN_PIPELINE_ONCE after resetting an export pipeline, the pipeline repeats exporting data from the table or query.

    When purge_data is set to TRUE, DBMS_CLOUD_PIPELINE.RESET_PIPELINE deletes the existing files in the object store location specified with the location attribute.
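
Example

A minimal sketch, assuming a stopped load pipeline named MY_LOAD_PIPE exists: the following call resets its tracking state and truncates its target database table:

BEGIN
     DBMS_CLOUD_PIPELINE.RESET_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          purge_data    => TRUE
     );
END;
/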

RUN_PIPELINE_ONCE Procedure

This procedure performs an on-demand run of the pipeline in the current foreground session, instead of in a scheduled job. Use DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE to test a pipeline before you start the pipeline as a continuous job.

Syntax

DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE(
       pipeline_name  IN  VARCHAR2
);

Parameters


pipeline_name

Specifies a name for the pipeline to run.

This parameter is mandatory.

Usage Notes

  • After you perform a test run of a pipeline, you can reset the pipeline state using DBMS_CLOUD_PIPELINE.RESET_PIPELINE. This lets you clear the pipeline state before you start the pipeline in a scheduled job.

  • If a pipeline is in the started state, then it cannot be run in the foreground session.
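
Example

A sketch of a test run for a hypothetical pipeline named MY_LOAD_PIPE (the pipeline must not be in started state):

BEGIN
     DBMS_CLOUD_PIPELINE.RUN_PIPELINE_ONCE(
          pipeline_name => 'MY_LOAD_PIPE'
     );
END;
/

After the test run, you can call DBMS_CLOUD_PIPELINE.RESET_PIPELINE to clear the pipeline state before starting the pipeline as a scheduled job.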

SET_ATTRIBUTE Procedure

This procedure sets pipeline attributes. There are two overloaded procedures, one to set a single attribute and another to set multiple attributes using a JSON document of attribute name/value pairs.

Syntax

DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
      pipeline_name        IN   VARCHAR2,
      attribute_name       IN   VARCHAR2,
      attribute_value      IN   CLOB
);

DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
      pipeline_name        IN   VARCHAR2,
      attributes           IN   CLOB
);

Parameters


pipeline_name

Specifies the name of the pipeline for which to set attributes.

This parameter is mandatory.

attribute_name

Specifies the attribute name for the attribute to be set.

See DBMS_CLOUD_PIPELINE Attributes for more information.

attribute_value

Specifies the value for the pipeline attribute to set.

See DBMS_CLOUD_PIPELINE Attributes for more information.

attributes

Specifies a JSON document containing attribute names and values.

See DBMS_CLOUD_PIPELINE Attributes for more information.

Usage Note

  • When you use DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE to set multiple attributes with the attributes parameter, all the existing attributes are deleted and overwritten with the specified attributes from the JSON document.
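
Example

A sketch of both overloads. The pipeline name MY_LOAD_PIPE, the credential OBJ_STORE_CRED (created beforehand with DBMS_CLOUD.CREATE_CREDENTIAL), and the object store URI are assumed example values:

BEGIN
     -- Overload 1: set a single attribute.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name   => 'MY_LOAD_PIPE',
          attribute_name  => 'interval',
          attribute_value => '20'
     );

     -- Overload 2: set multiple attributes in one call. Note that this
     -- deletes and overwrites all existing attributes of the pipeline.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name => 'MY_LOAD_PIPE',
          attributes    => '{"credential_name" : "OBJ_STORE_CRED",
                             "location"        : "https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/",
                             "table_name"      : "EMPLOYEES",
                             "format"          : {"type" : "csv"},
                             "interval"        : "20"}'
     );
END;
/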

START_PIPELINE Procedure

The procedure starts the data pipeline. When a pipeline is started, the pipeline operation runs continuously in a scheduled job according to the interval configured with the pipeline attributes.

Syntax

DBMS_CLOUD_PIPELINE.START_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      start_date           IN   TIMESTAMP WITH TIME ZONE DEFAULT NULL
);

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

start_date

Specifies the starting date for the pipeline job.

Default value: NULL

Usage Notes

  • By default, a pipeline job begins immediately, as soon as the pipeline is started. To start a pipeline job at a later time, specify a valid date or timestamp using the start_date parameter.

  • See DBMS_CLOUD_PIPELINE Attributes for information on the pipeline interval and other pipeline attributes.
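
Example

A sketch that starts a hypothetical pipeline named MY_LOAD_PIPE one hour from now; omit start_date to start immediately:

BEGIN
     DBMS_CLOUD_PIPELINE.START_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          start_date    => SYSTIMESTAMP + INTERVAL '1' HOUR
     );
END;
/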

STOP_PIPELINE Procedure

The procedure stops the data pipeline. When a pipeline is stopped, no future jobs are scheduled for the pipeline.

Syntax

DBMS_CLOUD_PIPELINE.STOP_PIPELINE(
      pipeline_name        IN   VARCHAR2,
      force                IN   BOOLEAN  DEFAULT FALSE
);

Parameters


pipeline_name

Specifies a name for the pipeline.

This parameter is mandatory.

force

If the force parameter is set to TRUE, then any running jobs for the pipeline are terminated.

Valid values: TRUE, FALSE

Default value: FALSE
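
Example

A sketch that stops a hypothetical pipeline named MY_LOAD_PIPE and terminates any job currently running for it:

BEGIN
     DBMS_CLOUD_PIPELINE.STOP_PIPELINE(
          pipeline_name => 'MY_LOAD_PIPE',
          force         => TRUE
     );
END;
/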

DBMS_CLOUD_PIPELINE Attributes

Attributes help to control and configure the behavior of a data pipeline.

Attributes

Note:

Depending on the pipeline type, LOAD or EXPORT, a pipeline supports a different set of attributes. Each attribute entry below lists the pipeline types it applies to and whether it can be modified after the pipeline starts.

credential_name

The name of the credential to access the source Cloud Object Storage.

Default value: NULL.

If you do not supply a credential_name, credential_name is set to NULL.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

field_list

Identifies the fields in the source files and their data types. This argument's syntax is the same as the field_list clause in regular Oracle external tables. For more information about field_list, see Oracle® Database Utilities.

Default value: NULL

With the default value, the fields and their data types are determined by the columns of the table specified in the table_name attribute.

Pipeline type: LOAD. Modifiable after pipeline starts: Yes.

format

The options that describe the format for the pipeline type.

The Data Pump format is not supported for an export pipeline.

This attribute is mandatory for both LOAD and EXPORT pipelines.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

interval

The time interval in minutes between consecutive executions of the scheduled pipeline job.

Default value: 15 minutes

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

key_column

A timestamp or date column in the specified table or query, used for continuously exporting newer data to object store. The export pipeline tracks the last execution timestamp or date and compares it with the value in the key_column to identify new data to export.

Default value: NULL

If key_column is not specified for an export pipeline, the entire contents of the table or query are uploaded to object store in each pipeline job execution.

Pipeline type: EXPORT. Modifiable after pipeline starts: No.

location

Specifies a URI that points to an Object Storage location.

The format of the URI depends on the Cloud Object Storage service. See Cloud Object Storage URI Formats for more information.

This attribute is mandatory for both LOAD and EXPORT pipelines.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.

priority

Specifies a string value that determines the number of parallel operations performed for the pipeline.

  • In a load pipeline, it determines the number of files loaded in parallel.

  • In an export pipeline, it determines the degree of parallelism for fetching data from the database.

An operation with a higher priority consumes more database resources and is completed sooner.

Valid values:

  • HIGH: Determines the number of parallel files handled using the database's ECPU count (OCPU count if your database uses OCPUs).
  • MEDIUM: Determines the number of simultaneous processes using the concurrency limit for Medium service. The default value is 4.
  • LOW: Runs the pipeline job in serial order.

Default value: MEDIUM

The maximum number of concurrent file operations is limited to 64.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: Yes.

query

Specifies a SELECT statement so that only the required data is exported. The query determines the contents of the files you export as text files (CSV, JSON, Parquet, or XML) or dump files.

For example:

SELECT warehouse_id, quantity FROM inventories

Default value: NULL

For an export pipeline, either table_name or query is mandatory.

Pipeline type: EXPORT. Modifiable after pipeline starts: No.

table_name

Specifies the name of the target table for loading or exporting data.

For a load pipeline, table_name is mandatory.

For an export pipeline, either table_name or query is mandatory.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.

table_owner

The name of the schema where the target table resides for loading or exporting data.

Default value: NULL

With a NULL value, the target table is in the same schema as the user running the procedure.

Pipeline type: LOAD, EXPORT. Modifiable after pipeline starts: No.
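
Example

Putting the attributes together, the following is an end-to-end sketch of an export pipeline. All object names, the credential OBJ_STORE_CRED, and the object store URI are assumed example values; the key_column ORDER_DATE lets the pipeline export only rows newer than the last run:

BEGIN
     DBMS_CLOUD_PIPELINE.CREATE_PIPELINE(
          pipeline_name => 'MY_EXPORT_PIPE',
          pipeline_type => 'EXPORT'
     );

     -- key_column and location are not modifiable after the pipeline
     -- starts, so set all attributes before calling START_PIPELINE.
     DBMS_CLOUD_PIPELINE.SET_ATTRIBUTE(
          pipeline_name => 'MY_EXPORT_PIPE',
          attributes    => '{"credential_name" : "OBJ_STORE_CRED",
                             "location"        : "https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/",
                             "table_name"      : "ORDERS",
                             "key_column"      : "ORDER_DATE",
                             "format"          : {"type" : "json"},
                             "interval"        : "30"}'
     );

     DBMS_CLOUD_PIPELINE.START_PIPELINE(
          pipeline_name => 'MY_EXPORT_PIPE'
     );
END;
/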