Summary of DBMS_DCAT Subprograms
This table lists the DBMS_DCAT package procedures used to
create, query and drop Data Catalog connections.
| Subprogram | Description |
|---|---|
| SET_DATA_CATALOG_CREDENTIAL Procedure | Set the data catalog access credential used by a specific connection to the data catalog |
| SET_OBJECT_STORE_CREDENTIAL Procedure | Set the credential used by the given unique connection identifier for accessing the Object Store |
| SET_DATA_CATALOG_CONN Procedure | Create a connection to the given data catalog |
| UNSET_DATA_CATALOG_CONN Procedure | Remove an existing Data Catalog connection |
SET_DATA_CATALOG_CREDENTIAL Procedure
This procedure sets the Data Catalog access credential used by a specific connection to the Data Catalog.
PROCEDURE DBMS_DCAT.SET_DATA_CATALOG_CREDENTIAL(
    credential_name VARCHAR2(128) DEFAULT NULL,
    dcat_con_id     VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| credential_name | The credential used for accessing the Data Catalog. |
| dcat_con_id | The unique Data Catalog connection identifier. This credential is used for the connection identified by dcat_con_id. The default is NULL. |

This credential must have Manage Data Catalog permissions; see Data Catalog Policies.
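For illustration, a call might look like the following sketch. The credential name DCAT_CRED and connection identifier DCAT1 are hypothetical; the credential must already exist (for example, created with DBMS_CLOUD.CREATE_CREDENTIAL) and must have Manage Data Catalog permissions.

```sql
BEGIN
  -- DCAT_CRED and DCAT1 are hypothetical names used for illustration.
  DBMS_DCAT.SET_DATA_CATALOG_CREDENTIAL(
    credential_name => 'DCAT_CRED',
    dcat_con_id     => 'DCAT1');
END;
/
```

When only a single Data Catalog connection exists, dcat_con_id can be omitted and defaults to NULL.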
SET_OBJECT_STORE_CREDENTIAL Procedure
This procedure sets the credential that is used by the given unique connection identifier for accessing the Object Store. Changing the Object Store access credential alters all existing synced tables to use the new credential.
PROCEDURE DBMS_DCAT.SET_OBJECT_STORE_CREDENTIAL(
    credential_name VARCHAR2(128),
    dcat_con_id     IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| credential_name | The credential used by the external tables for accessing the Object Store. |
| dcat_con_id | The unique Data Catalog connection identifier. The default is NULL. |
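As a sketch, the following hypothetical call switches the Object Store credential for one connection; OBJ_CRED and DCAT1 are illustrative names, and all existing synced tables on that connection are altered to use the new credential.

```sql
BEGIN
  -- OBJ_CRED and DCAT1 are hypothetical names used for illustration.
  DBMS_DCAT.SET_OBJECT_STORE_CREDENTIAL(
    credential_name => 'OBJ_CRED',
    dcat_con_id     => 'DCAT1');
END;
/
```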
SET_DATA_CATALOG_CONN Procedure
This procedure creates a connection to the given Data Catalog. The connection is required to synchronize metadata with Data Catalog. An Autonomous AI Database instance can connect to multiple Data Catalog instances and supports connecting to OCI Data Catalogs and AWS Glue Data Catalogs.
PROCEDURE DBMS_DCAT.SET_DATA_CATALOG_CONN (
    region         VARCHAR2 DEFAULT NULL,
    endpoint       VARCHAR2 DEFAULT NULL,
    catalog_id     VARCHAR2 DEFAULT NULL,
    dcat_con_id    VARCHAR2 DEFAULT NULL,
    catalog_type   VARCHAR2 DEFAULT NULL,
    token_endpoint VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| region | The Data Catalog region. Either the region or the endpoint must be specified. |
| endpoint | The Data Catalog endpoint. Either the region or the endpoint must be specified. |
| catalog_id | The unique Oracle Cloud Identifier (OCID) for the Data Catalog instance. When connecting to AWS Glue Data Catalogs, catalog_id is not used. |
| dcat_con_id | A unique Data Catalog connection identifier. This identifier is required when connecting to multiple Data Catalogs and is optional when connecting to only one. It is used to refer to the Data Catalog connection in subsequent calls or when querying views. If no identifier is specified, this procedure generates a NULL connection identifier. |
| catalog_type | The type of data catalog to connect to. Allowed values: OCI_DCAT (OCI Data Catalog), AWS_GLUE (AWS Glue Data Catalog), or NULL (the catalog type is inferred from the other parameters). |
| token_endpoint | The access token endpoint for OAuth authentication. The default is NULL. |
Usage:
You only need to call this procedure once to set the connection. As part of the connection process, Autonomous AI Database adds custom properties to Data Catalog. These custom properties are accessible to Data Catalog users and allow you to override default names (for schemas, tables and columns) and column data types.
Before creating a connection, credentials must be created and set. For a description of the connection process, see Typical Workflow with Data Catalog for OCI Data Catalogs and User Workflow for Querying with AWS Glue Data Catalog for AWS Glue Data Catalogs.
Example: Connecting with a known OCID
This example creates a connection to a Data Catalog instance in the
uk-london-1 region. The catalog_id parameter uses the Oracle Cloud Identifier
(OCID) for the Data Catalog instance. The type of Data Catalog
(AWS Glue Data Catalog or OCI Data Catalog) is automatically determined.
BEGIN
DBMS_DCAT.SET_DATA_CATALOG_CONN(
region=>'uk-london-1',
catalog_id=>'ocid1.datacatalog.oc1.uk-london-1...');
END;
/
Example: Connecting to an AWS Glue Data Catalog
A connection is the association between an Autonomous AI Database instance and an AWS Glue
Data Catalog. After a successful connection, the Autonomous AI Database instance is able to synchronize with AWS Glue. Each
AWS account has one AWS Glue Data Catalog per region and each catalog can be
accessed using the corresponding service endpoint for each region. An Autonomous AI Database instance can be associated
with an AWS Glue Data Catalog by invoking the API
DBMS_DCAT.SET_DATA_CATALOG_CONN and specifying the endpoint for
the region where the catalog resides.
See AWS Glue endpoints and quotas.
This example creates a connection to the AWS Glue Data Catalog in the
uk-london-1 region. Because this is an AWS Glue Data Catalog
connection, the catalog_id parameter is not needed.
BEGIN
   DBMS_DCAT.SET_DATA_CATALOG_CONN(
      region=>'uk-london-1',
      catalog_type=>'AWS_GLUE');
END;
/
UNSET_DATA_CATALOG_CONN Procedure
This procedure removes an existing Data Catalog connection.
Note:
Invoking this procedure drops all of the protected schemas and external tables that were created as part of previous synchronizations. It does not impact the metadata in Data Catalog.

PROCEDURE DBMS_DCAT.UNSET_DATA_CATALOG_CONN (
    dcat_con_id IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| dcat_con_id | The unique Data Catalog connection identifier. The default is NULL. |
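A minimal sketch of removing a connection follows; DCAT1 is a hypothetical connection identifier, and omitting dcat_con_id (default NULL) removes the connection created without an identifier.

```sql
BEGIN
  -- Removes the connection identified by the hypothetical DCAT1,
  -- along with the protected schemas and external tables it created.
  DBMS_DCAT.UNSET_DATA_CATALOG_CONN(dcat_con_id => 'DCAT1');
END;
/
```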
Synchronization Subprograms
Running a synchronization, creating and dropping a synchronization job, and dropping synchronized schemas can be performed with the procedures listed in this table.
Note:
On April 4, 2022, the sync_option and grant_read parameters were
added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct
performance of scheduled sync jobs created prior to that date, you need to drop and
recreate the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure.
| Subprogram | Description |
|---|---|
| CREATE_SYNC_JOB Procedure | Create a scheduler job to invoke RUN_SYNC periodically |
| DROP_SYNC_JOB Procedure | Drop an existing sync job for the given unique connection identifier |
| DROP_SYNCED_SCHEMAS Procedure | Drop all previously synchronized schemas for the given unique connection identifier |
| RUN_SYNC Procedure | Run a synchronization operation |
RUN_SYNC Procedure
This procedure runs a synchronization operation and is the entry point to the synchronization. As an input, it takes lists of selected data catalog assets, folders and entities and materializes them by creating, dropping, and altering external tables.
The sync_option parameter specifies which operation the
RUN_SYNC procedure performs: SYNC,
DELETE or REPLACE. The operation is performed
over entities within the scope of the synced_objects parameter.
Every call to the RUN_SYNC procedure returns a unique
operation_id that can be used to query the
USER_LOAD_OPERATIONS view to obtain information about the
status of the sync and the corresponding log_table. The
DBMS_DCAT$SYNC_LOG view can be queried for easy access to the
log_table for the last sync operation executed by the current
user. For further details, see DBMS_DCAT$SYNC_LOG View.
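The views mentioned above can be queried with ordinary SQL. As a minimal sketch, assuming the current user has already executed a sync:

```sql
-- List load operations started by the current user, including syncs.
SELECT type, status FROM user_load_operations;

-- DBMS_DCAT$SYNC_LOG exposes the log table for the current user's
-- most recent sync operation.
SELECT * FROM DBMS_DCAT$SYNC_LOG;
```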
Note:
On April 4, 2022, the sync_option and grant_read parameters were
added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct
performance of scheduled sync jobs created prior to that date, you need to drop and
recreate the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure.
Synchronizing Partitioned Logical Entities or Glue Tables
The RUN_SYNC procedure creates a partitioned external
table for each logical entity or Glue table when all three of the following
apply:
- The OCI data catalog logical entity or Glue table has one or more partitioned attributes.
- For OCI data catalogs, the logical entity is derived from a prefix-based filename pattern. Partitioned logical entities derived from regex-based patterns are not supported.
- For OCI data catalogs, the logical entity is based on
partitioned data that follows the hive-style or non-hive folder format.
Logical entities based on partitioned data that follow the non-hive style
format using object names are not supported.
Example 1. Logical entities based on harvested objects that follow the Hive style partitioning format with prefix-based filename patterns.

Consider the following objects:

Bucket: MYBUCKET
 cluster1/db1.db/sales/country=USA/year=2020/month=01/sales1.csv
 cluster1/db1.db/sales/country=USA/year=2020/month=01/sales2.csv
 cluster1/db1.db/sales/country=USA/year=2020/month=02/sales1.csv

Harvesting the bucket using a filename pattern with a starting folder prefix of cluster1/db1.db generates a logical entity named SALES with three partition attributes: country, year, and month. The type for partitioned attributes is Partition while the type for non-partitioned attributes is Primitive.

Example 2. Logical entities based on harvested objects that follow the non-Hive style partitioning format with prefix-based filename patterns.

Consider the following objects:

Bucket: MYBUCKET
 cluster2/db2.db/sales/USA/2020/01/sales1.csv
 cluster2/db2.db/sales/USA/2020/01/sales2.csv
 cluster2/db2.db/sales/USA/2020/02/sales1.csv

Harvesting the bucket using a filename pattern with a starting folder prefix of cluster2/db2.db generates a logical entity named SALES with three partition attributes: name0, name1, and name2. The only difference compared to Example 1 is that the names of the partitioned attributes are auto-generated, while in Example 1 they are extracted from the URL (country, year, and month respectively).

For a complete end-to-end example of synchronizing partitioned logical entities, see Example: A Partitioned Data Scenario.
Incremental Synchronization of Partitioned Logical Entities/Glue Tables
The RUN_SYNC procedure specifies a set of
OCI data catalog logical entities or AWS Glue tables to be synced with the database.
When a logical entity or Glue table is present in two RUN_SYNC
calls, the second call preserves and possibly alters existing external tables. The
following table shows which logical entity or Glue table changes are supported when
the logical entity or Glue table is partitioned:
| Logical Entity or Glue Table Change | Action |
|---|---|
| Addition, removal, or update of a partition | All partitions of the external partitioned table are updated, regardless of whether a change has been detected by the data catalog. |
| Addition of a partitioned attribute | Adding a partitioned column to an external partitioned table is not supported. An exception is raised. |
| Deletion of a partitioned attribute | Dropping a partitioned column from an external partitioned table is not supported. An exception is raised. |
| Renaming of a partitioned attribute | Renaming a partitioned column in an external partitioned table is not supported. An exception is raised. |
PROCEDURE DBMS_DCAT.RUN_SYNC (
synced_objects IN CLOB,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
grant_read IN VARCHAR2 DEFAULT NULL,
dcat_con_id IN VARCHAR2 DEFAULT NULL
);
PROCEDURE DBMS_DCAT.RUN_SYNC (
synced_objects IN CLOB,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
grant_read IN VARCHAR2 DEFAULT NULL,
operation_id OUT NOCOPY NUMBER,
dcat_con_id IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| synced_objects | A JSON document that specifies the data catalog objects to synchronize. For OCI Data Catalogs, the JSON document specifies a set of entities in multiple granularity: data assets, folders (Object Store buckets), or logical entities. It contains an asset_list array; each element identifies a data asset and can optionally restrict the sync to specific folders or entities within that asset. For AWS Glue Data Catalogs, the JSON document specifies a list of tables in multiple granularity: databases and tables. The document specifies a list of databases. Users can restrict the set of tables to be synced by specifying individual tables within a database. |
| sync_option | (Optional) The operation performed over the entities within the scope of the synced_objects parameter. There are three options: SYNC (the default), DELETE, and REPLACE. |
| error_semantics | (Optional) This parameter specifies the error behavior. If set to SKIP_ERRORS (the default), the sync continues past individual entity errors. |
| log_level | (Optional) This parameter specifies the following values in increasing level of logging detail: OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL. The default is INFO. |
| grant_read | (Optional) This parameter is a list of users/roles that are automatically granted READ privileges on all external tables processed by this invocation of RUN_SYNC. All users/roles in the grant_read list are given READ privileges on all new or already existing external tables that correspond to entities specified by the synced_objects parameter. The RUN_SYNC procedure preserves already granted privileges on synced external tables. |
| operation_id | (Output) This parameter is used to find the corresponding entry for the operation in USER_LOAD_OPERATIONS. Note: A version of RUN_SYNC without the operation_id parameter is also provided. |
| dcat_con_id | This parameter is the unique data catalog connection identifier that was specified when the connection to the data catalog was created. See SET_DATA_CATALOG_CONN Procedure. This parameter identifies which connection is used for synchronization and becomes part of the derived schema name. See Synchronization Mapping for a description of how the schema name is derived. The parameter default is NULL. |
Example: Synchronize all OCI Data Catalog entities

EXEC DBMS_DCAT.RUN_SYNC(synced_objects=>'{"asset_list":["*"]}');

Example: synced_objects parameter for synchronizing all OCI Data Catalog data assets

The following synced_objects parameter synchronizes all (Object Storage) data assets in the Data Catalog.

{"asset_list" : ["*"]}

Example: synced_objects parameter for synchronizing specific OCI Data Catalog data assets

The following synced_objects parameter synchronizes two data assets.

{"asset_list": [
    {
     "asset_id":"...-9e2e7f78fd5f"
    },
    {
     "asset_id":"...-9e2e7f74523"
    }
]}

Example: synced_objects parameter for synchronizing specific OCI Data Catalog entities within a data asset

The following synced_objects parameter synchronizes two entities within the data asset.

{"asset_list": [
    {
     "asset_id":"...-9e2e7f78fd5f",
     "folder_list":[
       "f1",
       "f2"
     ]
    }
]}

Example: synced_objects parameter for synchronizing specific OCI Data Catalog folders and entities within a data asset

The following synced_objects parameter synchronizes two folders and two entities within the data asset.

{"asset_list":[
    {
     "asset_id":"...-9e2e7f78fd5f",
     "entity_list": [
       "entity1",
       "entity2"
     ],
     "folder_list": [
       "f1",
       "f2"
     ]
    }
]}

Example: synced_objects Parameter for Synchronizing All AWS Glue Data Catalog Databases

The following synced_objects parameter synchronizes all databases in the AWS Glue Data Catalog.

{"database_list":["*"]}

Example: synced_objects Parameter for Synchronizing Two AWS Glue Data Catalog Databases

The following synced_objects parameter synchronizes two AWS Glue Data Catalog databases.

{"database_list":[
   {"database":"tpcdscsv"},
   {"database":"tpcdsparquet"} ]}

Example: synced_objects Parameter for Synchronizing Three Tables from an AWS Glue Data Catalog Database

The following synced_objects parameter synchronizes three tables from an AWS Glue Data Catalog database.

{"database_list":[
   {"database":"tpcdsparquet",
    "table_list": [ "tpcdsparquet_customer",
                    "tpcdsparquet_item",
                    "tpcdsparquet_web_sales" ] } ]}

CREATE_SYNC_JOB Procedure
This procedure creates a scheduler job to invoke RUN_SYNC periodically.
It takes as input the set of objects to be synced, the error semantics, the log level, and a repeat interval. See RUN_SYNC Procedure for further details on how synchronization works.
There can only be a single sync job. The
CREATE_SYNC_JOB procedure fails if another job is already
specified, unless the force parameter is set to TRUE. If force is
set to TRUE the previous job is dropped.
Note:
On April 4, 2022, the sync_option and
grant_read parameters were added to the
DBMS_DCAT.RUN_SYNC procedure. To ensure correct performance
of scheduled sync jobs created prior to that date, you need to drop and recreate
the scheduled sync jobs. See DROP_SYNC_JOB Procedure and CREATE_SYNC_JOB Procedure.
PROCEDURE DBMS_DCAT.CREATE_SYNC_JOB (
synced_objects IN CLOB,
error_semantics IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
log_level IN VARCHAR2 DEFAULT 'INFO',
repeat_interval IN VARCHAR2,
force IN VARCHAR2 DEFAULT 'FALSE',
grant_read IN VARCHAR2 DEFAULT NULL,
sync_option IN VARCHAR2 DEFAULT 'SYNC',
dcat_con_id IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| synced_objects | A JSON object specifying the objects to be synced, as described in the synced_objects parameter of RUN_SYNC. See RUN_SYNC Procedure. |
| error_semantics | (Optional) Error behavior, as specified for RUN_SYNC. The default is SKIP_ERRORS. See RUN_SYNC Procedure. |
| log_level | (Optional) Logging level, as specified for RUN_SYNC. The default is INFO. See RUN_SYNC Procedure. |
| repeat_interval | Repeat interval for the job, with the same semantics as the repeat_interval parameter of the DBMS_SCHEDULER.CREATE_JOB procedure. |
| force | (Optional) If force is 'TRUE', any existing sync job is dropped before the new job is created. The default is 'FALSE'. |
| grant_read | (Optional) List of users/roles to be granted READ on the synced external tables, as described for procedure RUN_SYNC. See RUN_SYNC Procedure. |
| sync_option | (Optional) Behavior with respect to entities that have already been synced through a previous RUN_SYNC operation, as described for procedure RUN_SYNC. See RUN_SYNC Procedure. |
| dcat_con_id | This parameter is the unique Data Catalog connection identifier that was specified when the connection to Data Catalog was created. See SET_DATA_CATALOG_CONN Procedure. This parameter identifies which connection is used for synchronization and becomes part of the derived schema name. See Synchronization Mapping for a description of how the schema name is derived. The parameter default is NULL. |
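For illustration, the following sketch schedules a nightly sync of all OCI Data Catalog data assets; the repeat_interval value uses DBMS_SCHEDULER calendaring syntax, and the chosen schedule is an arbitrary example.

```sql
BEGIN
  DBMS_DCAT.CREATE_SYNC_JOB(
    synced_objects  => '{"asset_list":["*"]}',
    repeat_interval => 'FREQ=DAILY;BYHOUR=1',  -- DBMS_SCHEDULER calendaring syntax
    force           => 'TRUE');                -- replace any existing sync job
END;
/
```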
DROP_SYNC_JOB Procedure
This procedure drops an existing sync job for the given unique connection identifier.
PROCEDURE DBMS_DCAT.DROP_SYNC_JOB (
    dcat_con_id IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| dcat_con_id | The unique Data Catalog connection identifier. The default is NULL. |
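A minimal sketch of dropping the sync job for the default (NULL) connection identifier:

```sql
BEGIN
  -- Drops the scheduled sync job; pass dcat_con_id for a specific connection.
  DBMS_DCAT.DROP_SYNC_JOB;
END;
/
```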
DROP_SYNCED_SCHEMAS Procedure
This procedure drops all previously synchronized schemas for the given unique connection identifier.
PROCEDURE DBMS_DCAT.DROP_SYNCED_SCHEMAS (
    dcat_con_id IN VARCHAR2 DEFAULT NULL
);

| Parameter | Description |
|---|---|
| dcat_con_id | The unique Data Catalog connection identifier. The default is NULL. |
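As a sketch, the following hypothetical call drops all schemas previously synced over the connection identified by DCAT1 (an illustrative identifier):

```sql
BEGIN
  -- DCAT1 is a hypothetical connection identifier; omit dcat_con_id
  -- (default NULL) for a connection created without an identifier.
  DBMS_DCAT.DROP_SYNCED_SCHEMAS(dcat_con_id => 'DCAT1');
END;
/
```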