Summary of DBMS_DCAT Subprograms
This table lists the DBMS_DCAT package procedures used to create, query and drop Data Catalog connections.
| Subprogram | Description |
|---|---|
| SET_DATA_CATALOG_CREDENTIAL Procedure | Set the data catalog access credential used by a specific connection to the data catalog |
| SET_OBJECT_STORE_CREDENTIAL Procedure | Set the credential used by the given unique connection identifier for accessing the Object Store |
| SET_DATA_CATALOG_CONN Procedure | Create a connection to the given data catalog |
| UNSET_DATA_CATALOG_CONN Procedure | Remove an existing Data Catalog connection |
SET_DATA_CATALOG_CREDENTIAL Procedure
This procedure sets the Data Catalog access credential used by a specific connection to the Data Catalog.
Syntax
PROCEDURE DBMS_DCAT.SET_DATA_CATALOG_CREDENTIAL(
credential_name VARCHAR2(128) DEFAULT NULL,
dcat_con_id VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
credential_name |
The credential used for accessing the Data Catalog. |
dcat_con_id |
The unique Data Catalog connection identifier. This credential is used for the connection identified by dcat_con_id. The default is Null. |
This credential must have Manage Data Catalog permissions; see Data Catalog Policies.
SET_OBJECT_STORE_CREDENTIAL Procedure
This procedure sets the credential that is used by the given unique connection identifier for accessing the Object Store. Changing the Object Store access credential alters all existing synced tables to use the new credential.
Syntax
PROCEDURE DBMS_DCAT.SET_OBJECT_STORE_CREDENTIAL(
credential_name VARCHAR2(128),
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
credential_name |
The credential used by the external tables for accessing the Object Store. |
dcat_con_id |
The unique Data Catalog connection identifier. The default is NULL. |
SET_DATA_CATALOG_CONN Procedure
This procedure creates a connection to the given Data Catalog. The connection is required to synchronize metadata with Data Catalog. An Autonomous AI Database instance can connect to multiple Data Catalog instances and supports connecting to OCI Data Catalogs and AWS Glue Data Catalogs.
Syntax
PROCEDURE DBMS_DCAT.SET_DATA_CATALOG_CONN (
region VARCHAR2 DEFAULT NULL,
endpoint VARCHAR2 DEFAULT NULL,
catalog_id VARCHAR2 DEFAULT NULL,
dcat_con_id VARCHAR2 DEFAULT NULL,
catalog_type VARCHAR2 DEFAULT NULL,
token_endpoint VARCHAR2 DEFAULT Null
);
Parameters
| Parameter | Description |
|---|---|
region |
The Data Catalog region. If the endpoint is specified, region is optional. If both endpoint and region are given, then the endpoint takes precedence. Default is NULL. |
endpoint |
The Data Catalog endpoint. If the region is specified, endpoint is optional. If both endpoint and region are given, then the endpoint takes precedence. Default is NULL. |
catalog_id |
The unique Oracle Cloud Identifier (OCID) for the Data Catalog instance. When connecting to AWS Glue Data Catalogs, catalog_id is optional. |
dcat_con_id |
A unique Data Catalog connection identifier. This identifier is required when connecting to multiple Data Catalogs and is optional when connecting to only one. It is used to refer to the Data Catalog connection in subsequent calls or when querying views. If no identifier is specified this procedure generates a NULL connection identifier. The following restrictions apply for dcat_con_id:
|
catalog_type |
The type of data catalog to connect. Allowed values:
|
token_endpoint |
The access token endpoint for OAuth authentication. Default is NULL. |
Usage
You only need to call this procedure once to set the connection. As part of the connection process, Autonomous AI Database adds custom properties to Data Catalog. These custom properties are accessible to Data Catalog users and allow you to override default names (for schemas, tables and columns) and column data types.
Before creating a connection, credentials must be created and set. For a description of the connection process, see Typical Workflow with Data Catalog for OCI Data Catalogs and User Workflow for Querying with AWS Glue Data Catalog for AWS Glue Data Catalogs.
Examples
Example: Connecting with a known OCID
In this example, Autonomous AI Database is connecting to Data Catalog in the uk-london-1 region. The catalog_id parameter uses the Oracle Cloud Identifier (ocid) for the Data Catalog instance. The type of Data Catalog is automatically determined: AWS Glue Data Catalog or OCI Data Catalog.
BEGIN
DBMS_DCAT.SET_DATA_CATALOG_CONN(
region=>'uk-london-1',
catalog_id=>'ocid1.datacatalog.oc1.uk-london-1...');
END;
/
Example: Connecting to an AWS Glue Data Catalog
A connection is the association between an Autonomous AI Database instance and an AWS Glue Data Catalog. After a successful connection, the Autonomous AI Database instance is able to synchronize with AWS Glue. Each AWS account has one AWS Glue Data Catalog per region and each catalog can be accessed using the corresponding service endpoint for each region. An Autonomous AI Database instance can be associated with an AWS Glue Data Catalog by invoking the API DBMS_DCAT.SET_DATA_CATALOG_CONN and specify the endpoint for the region where the catalog resides.
See AWS Glue endpoints and quotas.
In this example, Autonomous AI Database is connecting to an AWS Glue Data Catalog in the uk-london-1 region. Because this is an AWS Glue Data Catalog connection, the catalog_id parameter is not needed.
BEGIN
DBMS_DCAT.SET_DATA_CATALOG_CONN(
region=>'uk-london-1',
catalog_type=>'AWS_GLUE'
END;
/
UNSET_DATA_CATALOG_CONN Procedure
This procedure removes an existing Data Catalog connection.
Syntax
Note: Invoking this procedure drops all of the protected schemas and external tables that were created as part of previous synchronizations. It does not impact the metadata in Data Catalog.
PROCEDURE DBMS_DCAT.UNSET_DATA_CATALOG_CONN (
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
dcat_con_id |
The unique Data Catalog connection identifier. Default is Null. |
Synchronization Subprograms
Running a synchronization, creating and dropping a synchronization job, and dropping synchronized schemas can be performed with the procedures listed in this table.
Note: On April 4, 2022, the sync_option and grant_read parameters were added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure.
| Subprogram | Description |
|---|---|
| CREATE_SYNC_JOB Procedure | Create a scheduler job to invoke RUN_SYNC periodically |
| DROP_SYNC_JOB Procedure | Drop an existing sync job for the given unique connection identifier |
| DROP_SYNCED_SCHEMAS Procedure | Drop all previously synchronized schemas for the given unique connection identifier |
| RUN_SYNC Procedure | Run a synchronization operation |
RUN_SYNC Procedure
This procedure runs a synchronization operation and is the entry point to the synchronization. As an input, it takes lists of selected data catalog assets, folders and entities and materializes them by creating, dropping, and altering external tables.
The sync_option parameter specifies which operation the RUN_SYNC procedure performs: SYNC, DELETE or REPLACE. The operation is performed over entities within the scope of the synced_objects parameter.
Every call to the RUN_SYNC procedure returns a unique operation_id that can be used to query the USER_LOAD_OPERATIONS view to obtain information about the status of the sync and the corresponding log_table. The DBMS_DCAT$SYNC_LOG view can be queried for easy access to the log_table for the last sync operation executed by the current user. For further details, see DBMS_DCAT$SYNC_LOG View.
Note: On April 4, 2022, the sync_option and grant_read parameters were added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure .
Synchronizing Partitioned Logical Entities or Glue Tables
The RUN_SYNC procedure creates a partitioned external table for each logical entity or Glue table when all three of the following apply:
-
The OCI data catalog logical entity or Glue table has one or more partitioned attributes.
-
For OCI data catalogs, the logical entity is derived from a prefix-based filename pattern. Partitioned logical entities derived from regex-based patterns are not supported.
-
For OCI data catalogs, the logical entity is based on partitioned data that follows the hive-style or non-hive folder format. Logical entities based on partitioned data that follow the non-hive style format using object names are not supported.
-
Example 1. Logical entities based on harvested objects that follow the Hive style partitioning format with prefix-based filename patterns.
Consider the following objects:
Bucket: MYBUCKET cluster1/db1.db/sales/country=USA/year=2020/month=01/sales1.csv cluster1/db1.db/sales/country=USA/year=2020/month=01/sales2.csv cluster1/db1.db/sales/country=USA/year=2020/month=02/sales1.csvHarvesting the bucket using a filename pattern with a starting folder prefix of
cluster1/db1.dbgenerates a logical entity namedSALESwith three partition attributes:country,year, andmonth. The type for partitioned attributes isPartitionwhile the type for non-partitioned attributes isPrimitive. -
Example 2. Logical entities based on harvested objects that follow the non-Hive style partitioning format with prefix-based filename patterns.
Consider the following objects:
Bucket: MYBUCKET cluster2/db2.db/sales/USA/2020/01/sales1.csv cluster2/db2.db/sales/USA/2020/01/sales2.csv cluster2/db2.db/sales/USA/2020/02/sales1.csvHarvesting the bucket using a filename pattern with a starting folder prefix of
cluster2/db2.dbgenerates a logical entity namedSALESwith three partition attributes:name0, name1, andname2. The only difference between the generated logical entity compared to Example 1, is that the names of partitioned attributes are auto generated, while in Example 1 they are extracted from the URL (country,year, andmonthrespectively).
-
For a complete end to end example of synchronizing partitioned logical entities, see Example: A Partitioned Data Scenario.
Incremental Synchronization of Partitioned Logical Entities/Glue Tables
Every call to the RUN_SYNC procedure specifies a set of OCI data catalog logical entities or AWS Glue tables to be synced with the database. When a logical entity or Glue table is present in two RUN_SYNC calls, the second call preserves and possibly alters existing external tables. The following table shows which logical entity or Glue table changes are supported when the logical entity or Glue table is partitioned:
| Logical Entity or Glue Table Change | Action |
|---|---|
| Addition, removal, or update of a partition | All partitions of the external partitioned table are updated, regardless of whether a change has been detected by the data catalog. |
| Addition of a partitioned attribute | Adding a partitioned column to an external partitioned table is not supported. An exception is raised. |
| Deletion of a partition attribute | Dropping a partitioned column from an external partitioned table is not supported. An exception is raised. |
| Renaming of a partitioned attribute | Renaming a partitioned column in an external partitioned table is not supported. An exception is raised. |
Syntax
PROCEDURE DBMS_DCAT.RUN_SYNC (
synced_objects IN CLOB,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
grant_read IN VARCHAR2 DEFAULT NULL,
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
PROCEDURE DBMS_DCAT.RUN_SYNC (
synced_objects IN CLOB,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
grant_read IN VARCHAR2 DEFAULT NULL,
operation_id OUT NOCOPY NUMBER,
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
synced_objects |
This parameter is a JSON document that specifies the data catalog objects to synchronize. For OCI Data Catalogs, the JSON document specifies a set of entities in multiple granularity: data assets, folders (Object Store buckets) or logical entities. It contains an For AWS Glue Data Catalogs, the JSON document specifies a list of tables in multiple granularity: databases, tables. The document specifies a list of databases. Users can restrict the set of tables to be synced by specifying individual tables within a database. |
sync_option |
(Optional) There are three options:
|
error_semantics |
(Optional) This parameter specifies the error behavior. If set to SKIP_ERRORS, the sync attempts to continue despite errors encountered for individual entities. If set to STOP_ON_ERROR, the procedure fails on the first encountered error. The default is SKIP_ERRORS. |
log_level |
(Optional) This parameter specifies the following values in increasing level of logging detail: (OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL). The default is INFO. |
grant_read |
(Optional) This parameter is a list of users/roles that are automatically granted READ privileges on all external tables processed by this invocation of RUN_SYNC. All users/roles in the grant_read list are given READ privileges on all new or already existing external tables that correspond to entities specified by the synced_objects parameter. The RUN_SYNC procedure preserves already granted privileges on synced external tables. |
operation_id |
(Optional) This parameter is used to find the corresponding entry in Note: A version of |
dcat_con_id |
This parameter is the unique data catalog connection identifier that was specified when the connection to the data catalog was created. See SET_DATA_CATALOG_CONN Procedure . This parameter identifies which connection is used for synchronization and becomes a part of the derived schema name. See Synchronization Mapping for a description of how the schema name is derived. The parameter default is NULL. |
Examples
Example: Synchronize all OCI Data Catalog entities.
In the following example, all Data Catalog entities are synchronized.
EXEC DBMS_DCAT.RUN_SYNC(synced_objects=>'{"asset_list":["*"]}');
Example: synced_objects parameter for synchronizing all OCI Data Catalog data assets.
The following is an example synced_objects parameter for synchronizing all (Object Storage) data assets in the Data Catalog.
{"asset_list" : ["*"]}
Example: synced_objects parameter for synchronizing specific OCI Data Catalog data assets.
The following is an example synced_objects parameter for synchronizing two data assets.
{"asset_list": [
{
"asset_id":"...-9e2e7f78fd5f"
},
{
"asset_id":"...-9e2e7f74523"
}
]}
Example: synced_objects parameter for synchronizing specific OCI Data Catalog entities within a data asset.
The following shows an example synced_objects parameter for synchronizing two entities within the data asset.
{"asset_list": [
{
"asset_id":"...-9e2e7f78fd5f",
"folder_list":[
"f1",
"f2"
]
}
]}
Example: synced_objects parameter for synchronizing specific OCI Data Catalog folders and entities within a data asset.
The following shows an example synced_objects parameter for synchronizing two folders and two entities within the data asset.
{"asset_list":[
{
"asset_id":"...-9e2e7f78fd5f",
"entity_list": [
"entity1",
"entity2"
],
"folder_list": [
"f1",
"f2"
]
}
]}
Example: synced_objects Parameter for Synchronizing All AWS Glue Data Catalog Databases.
The following shows an example synced_objects parameter for synchronizing all databases in the AWS Glue Data Catalog.
{"database_list":["*"]}
Example: synced_objects Parameter for Synchronizing Two AWS Glue Data Catalog Databases.
The following shows an example synced_objects parameter for synchronizing two AWS Glue Data Catalog databases.
{"database_list":[
{"database":"tpcdscsv"},
{"database":"tpcdsparquet"} ]}
Example: synced_objects Parameter for Synchronizing Three AWS Glue Data Catalog Databases.
The following shows an example synced_objects parameter for synchronizing three tables from an AWS Glue Data Catalog database.
{"database_list":[
{"database":"tpcdsparquet",
"table_list": [ "tpcdsparquet_customer",
"tpcdsparquet_item",
"tpcdsparquet_web_sales" ] } ]}
CREATE_SYNC_JOB Procedure
This procedure creates a scheduler job to invoke RUN_SYNC periodically
It takes as input the set of objects to be synced, the error semantics, the log level, and a repeat interval. See RUN_SYNC Procedure for further details on how synchronization works.
There can only be a single sync job. The CREATE_SYNC_JOB procedure fails if another job is already specified, unless the force parameter is set to TRUE. If force is set to TRUE the previous job is dropped.
If a scheduler job attempts to run while another sync is in progress, the scheduler job fails.
Note: On April 4, 2022, the sync_option and grant_read parameters were added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure .
Syntax
PROCEDURE DBMS_DCAT.CREATE_SYNC_JOB (
synced_objects IN CLOB,
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
repeat_interval IN VARCHAR2,
force IN VARCHAR2 DEFAULT 'FALSE',
grant_read IN VARCHAR2 DEFAULT NULL,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
synced_objects |
A JSON object specifying the objects to be synced, as described in the RUN_SYNC procedure. See RUN_SYNC Procedure. |
error_semantics |
(Optional) Error behavior, as specified for RUN_SYNC. Default is SKIP_ERRORS. |
log_level |
(Optional) Logging level, as specified for RUN_SYNC. Default is INFO. |
repeat_interval |
Repeat interval for the job, with the same semantics as the repeat interval parameter of the DBMS_SCHEDULER.CREATE_JOB procedure. For details on the repeat_interval, see Overview of Creating Jobs. |
force |
(Optional) If TRUE, existing sync jobs are deleted first. If FALSE, the CREATE_SYNC_JOB procedure fails if a sync job already exists. Default is FALSE. |
grant_read |
(Optional) List of users/roles to be granted READ on the synced external tables, as described for procedure RUN_SYNC. See RUN_SYNC Procedure. |
sync_option |
(Optional) Behavior with respect to entities that have already been synced through a previous RUN_SYNC operation, as described for procedure RUN_SYNC. See RUN_SYNC Procedure. |
dcat_con_id |
This parameter is the unique Data Catalog connection identifier that was specified when the connection to Data Catalog was created. See SET_DATA_CATALOG_CONN Procedure. This parameter identifies which connection is used for synchronization and becomes a part of the derived schema name. See Synchronization Mapping for a description of how the schema name is derived. The parameter default is NULL. |
DROP_SYNC_JOB Procedure
This procedure drops an existing sync job for the given unique connection identifier.
Syntax
PROCEDURE DBMS_DCAT.DROP_SYNC_JOB (
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
dcat_con_id |
The unique Data Catalog connection identifier. The default is NULL. |
DROP_SYNCED_SCHEMAS Procedure
This procedure drops all previously synchronized schemas for the given unique connection identifier.
Syntax
PROCEDURE DBMS_DCAT.DROP_SYNCED_SCHEMAS (
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
Parameters
| Parameter | Description |
|---|---|
dcat_con_id |
The unique Data Catalog connection identifier. The default is NULL. |