Running Synchronizations

Running a synchronization, creating and dropping a synchronization job, and dropping synchronized schemas can be performed with these procedures.

Topics

Note:

On April 4, 2022, the sync_option and grant_read parameters were added to the DBMS_DCAT.RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DBMS_DCAT.DROP_SYNC_JOB Procedure and DBMS_DCAT.CREATE_SYNC_JOB Procedure.

RUN_SYNC Procedure

This procedure runs a synchronization operation and is the entry point to the synchronization. As an input, it takes lists of selected Data Catalog assets, folders and entities and materializes them by creating, dropping, and altering external tables.

The sync_option parameter specifies which operation the RUN_SYNC procedure performs: SYNC, DELETE or REPLACE. The operation is performed over entities within the scope of the synced_objects parameter. Every call to RUN_SYNC returns a unique operation_id that can be used to query the USER_LOAD_OPERATIONS view to obtain information about the status of the sync and the corresponding log_table. For further details, see Monitoring and Troubleshooting Loads.

Note:

On April 4, 2022, the sync_option and grant_read parameters were added to the RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DBMS_DCAT.DROP_SYNC_JOB Procedure and DBMS_DCAT.CREATE_SYNC_JOB Procedure.

Synchronizing Partitioned Logical Entities

The RUN_SYNC procedure creates a partitioned external table for each logical entity when all of the following apply:
  • The logical entity is based on partitioned data that follows the hive-style or non-hive folder format. Logical entities based on partitioned data that follow the non-hive style format using object names are not supported.
  • The logical entity has one or more partitioned attributes.
  • The logical entity is derived from a prefix-based filename pattern. Partitioned logical entities derived from regex-based patterns are not supported.

For an example of synchronizing partitioned logical entities, see Example: A Partitioned Data Scenario.

Incremental Synchronization of Partitioned Logical Entities

Every call to the RUN_SYNC procedure specifies a set of logical entities to be synced with the database. When a logical entity is present in two RUN_SYNC calls, the second call preserves and possibly alters existing external tables. The following table shows which logical entity changes are supported when the logical entity is partitioned:

Logical Entity Change Action

Addition, removal, or update of a partition

All partitions of the external partitioned table are updated, regardless of whether a change has been detected by the data catalog.

Addition of a partitioned attribute

Adding a partitioned column to an external partitioned table is not supported. An exception is raised.

Deletion of a partition attribute

Dropping a partitioned column from an external partitioned table is not supported. An exception is raised.

Renaming of a partitioned attribute

Renaming a partitioned column in an external partitioned table is not supported. An exception is raised.

Syntax

PROCEDURE DBMS_DCAT.RUN_SYNC (
        synced_objects   IN  CLOB,
        sync_option      IN VARCHAR2 DEFAULT 'SYNC',
        error_semantics  IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
        log_level        IN VARCHAR2 DEFAULT 'INFO',
        grant_read       IN VARCHAR2 DEFAULT NULL,
     );
PROCEDURE DBMS_DCAT.RUN_SYNC (
        synced_objects   IN  CLOB,
        sync_option      IN VARCHAR2 DEFAULT 'SYNC',
        error_semantics  IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
        log_level        IN VARCHAR2 DEFAULT 'INFO',
        grant_read       IN VARCHAR2 DEFAULT NULL,
        operation_id     OUT NOCOPY NUMBER
    );

Parameters

Parameter Description

synced_objects

This parameter specifies a set of entities in multiple granularity: data assets, folders (Object Store buckets) or logical entities. It contains an asset_list that is either an array of asset objects or an array containing a single "*" string that stands for 'sync all (object store) data assets in the catalog'.

sync_option (Optional) There are three options:
  • SYNC (Default) - This option ensures that what is in the data catalog, over the synced_objects scope, is represented in the Autonomous Database. If a logical entity was deleted from the data catalog, since the last sync operation, then it is deleted in the Autonomous Database. The following operations are performed over the synced_objects scope:
    • Adds tables for new data catalog entities
    • Removes tables for deleted data catalog entities
    • Updates properties (such as name, columns and data types) for existing tables
  • DELETE - Deletes tables within the synced_objects scope.
  • REPLACE - Replaces all currently synced objects with the objects within the synced_objects scope.

error_semantics

(Optional) This parameter specifies the error behavior. If set to SKIP_ERRORS, the sync attempts to continue despite errors encountered for individual entities. If set to STOP_ON_ERROR, the procedure fails on the first encountered error. The default is SKIP_ERRORS.

log_level

(Optional) This parameter specifies the following values in increasing level of logging detail: (OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL). The default is INFO.
grant_read (Optional) This parameter is a list of users/roles that are automatically granted READ privileges on all external tables processed by this invocation of RUN_SYNC. All users/roles in the grant_read list are given READ privileges on all new or already existing external tables that correspond to entities specified by the synced_objects parameter. The RUN_SYNC procedure preserves already granted privileges on synced external tables.

operation_id

(Optional) This parameter is used to find the corresponding entry in USER_LOAD_OPERATIONS for the sync and determine the name of the log table.

Note: A version of RUN_SYNC that does not return an operation_id is available so users can query USER_LOAD_OPERATIONS for the latest sync.

Example: Synchronize All Data Catalog Entities

In the following example, all Data Catalog entities are synchronized.

EXEC DBMS_DCAT.RUN_SYNC(synced_objects=>'{"asset_list":["*"]}');

Example: synced_objects Parameter for Synchronizing All Data Assets

The following is an example synced_objects parameter for synchronizing all (Object Storage) data assets in the Data Catalog.

{"asset_list" : ["*"]}

Example: synced_objects Parameter for Synchronizing Specific Data Assets

The following is an example synced_objects parameter for synchronizing two data assets.

{"asset_list": [
        {
            "asset_id":"0b320de9-8411-4448-91fb-9e2e7f78fd5f"
        },
        {
            "asset_id":"0b320de9-8411-4448-91fb-9e2e7f74523"
        }
    ]}

Example: synced_objects Parameter for Synchronizing Specific Entities within a Data Asset

The following shows an example synced_objects parameter for synchronizing two entities within the data asset.

{"asset_list": [
        {
            "asset_id":"0b320de9-8411-4448-91fb-9e2e7f78fd5f",
            "folder_list":[
                "f1",         
                "f2" 
           ]
        }   
    ]}

Example: synced_objects Parameter for Synchronizing Specific Folders and Entities within a Data Asset

The following shows an example synced_objects parameter for synchronizing two folders and two entities within the data asset.

{"asset_list":[
        {
            "asset_id":"0b320de9-8411-4448-91fb-9e2e7f78fd5f",
            "entity_list": [
                "entity1",     
                "entity2"
            ],
             "folder_list": [
                "f1",         
                "f2"
            ]
        }   
    ]}

CREATE_SYNC_JOB Procedure

This procedure creates a scheduler job to invoke RUN_SYNC periodically.

It takes as input the set of objects to be synced, the error semantics and the log level, as described in the RUN_SYNC API and a repeat interval.

There can only be a single sync job. The CREATE_SYNC_JOB procedure fails if another job is already specified, unless the force parameter is set to TRUE. If force is set to TRUE the previous job is dropped.

If a scheduler job attempts to run while another sync is in progress, the scheduler job fails.

Note:

On April 4, 2022, the sync_option and grant_read parameters were added to the RUN_SYNC procedure. To ensure correct performance of scheduled sync jobs created prior to that date, you need to drop and recreate the scheduled sync jobs. See DBMS_DCAT.DROP_SYNC_JOB Procedure and DBMS_DCAT.CREATE_SYNC_JOB Procedure.

Syntax

PROCEDURE DBMS_DCAT.CREATE_SYNC_JOB (
  synced_objects   IN CLOB,
  error_semantics  IN VARCHAR2 DEFAULT 'SKIP_ERRORS',
  log_level        IN VARCHAR2 DEFAULT 'INFO',
  repeat_interval  IN VARCHAR2,
  force            IN VARCHAR2 DEFAULT 'FALSE',
  grant_read       IN VARCHAR2 DEFAULT NULL,
  sync_option      IN VARCHAR2 DEFAULT 'SYNC'
);

Parameters

Parameter Description

synced_objects

A JSON object specifying the objects to be synced, as described in the RUN_SYNC procedure.

error_semantics

(Optional) Error behavior, as specified for RUN_SYNC. Default is SKIP_ERRORS.

log_level

(Optional) Logging level, as specified for RUN_SYNC. Default is INFO.

repeat_interval

Repeat interval for the job, with the same semantics as the repeat interval parameter of the DBMS_SCHEDULER.CREATE_JOB procedure. For details on the repeat_interval, see Overview of Creating Jobs.

force

(Optional) If TRUE, existing sync jobs are deleted first. If FALSE, the CREATE_SYNC_JOB procedure fails if a sync job already exists. Default is FALSE.

grant_read (Optional) List of users/roles to be granted READ on the synced external tables, as described for procedure RUN_SYNC. See DBMS_DCAT.RUN_SYNC Procedure.
sync_option (Optional) Behavior with respect to entities that have already been synced through a previous RUN_SYNC operation, as described for procedure RUN_SYNC. See DBMS_DCAT.RUN_SYNC Procedure.

DROP_SYNC_JOB Procedure

This procedure drops an existing sync job.

Syntax

PROCEDURE DBMS_DCAT.DROP_SYNC_JOB;

Parameters

None.

DROP_SYNCED_SCHEMAS Procedure

This procedure drops all previously synchronized schemas.

Syntax

PROCEDURE DBMS_DCAT.DROP_SYNCED_SCHEMAS;

Parameters

None.