9.2.29 Oracle AI Data Platform

Overview

Oracle AI Data Platform is a new product offering that brings together data lake, analytics, and orchestration services into one unified platform. With centralized governance, built-in collaboration, and seamless integration across internal and external data sources, it helps enterprises accelerate data-driven decision-making and perform data analytics at scale with greater speed and efficiency.

9.2.29.1 Detailed Functionality

Overview

Replication to Oracle AI Data Platform uses the stage and merge data flow. The change data from the Oracle GoldenGate trails is staged in micro-batches at a temporary staging location (external stage) in OCI Object Storage. The staged records are then merged into the Oracle AI Data Platform target tables using a merge SQL statement. By default, staged records are merged every 3 minutes.

9.2.29.1.1 Staging Location

The change data records from the Oracle GoldenGate trail files are formatted into the Avro Object Container File (OCF) format and are then uploaded to the staging location.

Change data can be staged on the following object store:
  • OCI Object Storage

9.2.29.1.2 Database User Privileges

For replicating into Oracle AI Data Platform, the database user requires the following privileges:
  • SELECT, CREATE, INSERT, UPDATE, and DELETE permissions on the target tables.
  • SELECT, CREATE, ALTER, and DROP permissions on the external tables.

9.2.29.1.3 Prerequisites

  • You must have an Oracle Cloud Infrastructure account set up for Oracle AI Data Platform.
  • The Simba JDBC driver for Apache Spark is required. You can download the Simba JDBC driver from the cluster detail page of the Oracle AI Data Platform console. For more information, see Simba JDBC Driver for Apache Spark.

9.2.29.2 Configuration

The Oracle AI Data Platform replication properties configuration is stored in the Replicat properties file.

9.2.29.2.1 Automatic Configuration

Oracle AI Data Platform replication involves configuring multiple components, such as the File Writer Handler, the OCI Event Handler, and the target Oracle AI Data Platform Event Handler.

Automatic configuration sets up these components for you, minimizing the manual configuration effort.

The properties modified by automatic configuration are also logged in the handler log file.

To enable auto-configuration to replicate to the Oracle AI Data Platform target, set the parameter gg.target=aidp.

The parameter gg.stage determines the staging location. If gg.stage is not set, OCI Object Storage is used as the staging location.

The JDBC Metadata Provider is also automatically enabled to retrieve target table metadata from Oracle AI Data Platform.

Target tables are automatically created if missing.
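
For example, a minimal set of auto-configuration entries in the Replicat properties file could look like the following sketch; the handler-specific OCI and Oracle AI Data Platform settings described in the later sections still need to be added for a working setup.

# Enable auto-configuration for the Oracle AI Data Platform target
gg.target=aidp
# gg.stage is not set, so OCI Object Storage is used as the staging location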

9.2.29.2.1.1 File Writer Handler Configuration

The File Writer Handler name is pre-set to the value aidp and its properties are automatically set to the required values for Oracle AI Data Platform.

9.2.29.2.1.2 OCI Event Handler Configuration

The OCI Event Handler name is pre-set to the value oci and must be configured to match your OCI configuration.

The following is an example of editing an OCI Event Handler property:
gg.eventhandler.oci.bucketMappingTemplate=container1
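
Beyond the bucket name, the OCI Event Handler properties that typically need to match your OCI configuration include the region, compartment OCID, configuration profile, and configuration file path, shown here with placeholder values (see also the end-to-end configuration in End-to-End Configuration):

gg.eventhandler.oci.region=<oci-region>
gg.eventhandler.oci.compartmentID=<oci-compartment-ocid>
gg.eventhandler.oci.bucketMappingTemplate=<oci-bucket-name>
gg.eventhandler.oci.profile=<oci-profile-name>
gg.eventhandler.oci.configFilePath=./oci/config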
9.2.29.2.1.3 Oracle AI Data Platform Event Handler Configuration

The Oracle AI Data Platform Event Handler name is pre-set to the value aidp.

The following are the configuration properties available for the Oracle AI Data Platform Event Handler. The required properties must be set to match your Oracle AI Data Platform configuration.

Table 9-36 Oracle AI Data Platform Event Handler Configuration properties

gg.eventhandler.aidp.connectionURL
  Required/Optional: Required
  Legal Values: jdbc:spark://<server-hostname>;SparkServerType=AIDP;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]
  Default: None
  Explanation: Specifies the JDBC URL used to connect to Oracle AI Data Platform.

gg.eventhandler.aidp.maxConnections
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 10
  Explanation: Controls the number of concurrent JDBC database connections to the target database.

gg.eventhandler.aidp.connectionRetries
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 3
  Explanation: Specifies the number of times connections to the target data warehouse are retried.

gg.eventhandler.aidp.connectionRetryIntervalSeconds
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 30
  Explanation: Specifies the delay in seconds between connection retry attempts.

gg.eventhandler.aidp.createTable
  Required/Optional: Optional
  Legal Values: True or False
  Default: True
  Explanation: If set to true, missing target tables are automatically created.

gg.eventhandler.aidp.detectMissingBaseRow
  Required/Optional: Optional
  Legal Values: True or False
  Default: False
  Explanation: Diagnostic parameter to find UPDATE operations without a base row. If set to true, Replicat abends if there are UPDATE operations without a base row. These rows are collected into another table that can be investigated.

gg.eventhandler.aidp.dropStagingTablesOnShutdown
  Required/Optional: Optional
  Legal Values: True or False
  Default: False
  Explanation: If set to true, the temporary staging tables created by Oracle GoldenGate are dropped when Replicat stops gracefully.

gg.handler.aidp.fileRollInterval
  Required/Optional: Optional
  Legal Values: The default unit of measure is milliseconds. You can append ms, s, m, or h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values are 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or lower disable the file roll operation.
  Default: 3m
  Explanation: Determines how often the data is merged into Oracle AI Data Platform.
  Caution: Increasing the default value from 3m to a higher value increases the amount of data stored in the Replicat internal memory. This can cause out-of-memory errors and stop the Replicat process.
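
For example, a minimal Oracle AI Data Platform Event Handler configuration in the Replicat properties file might look like the following sketch; the connection URL is a placeholder that must be replaced with the JDBC URL of your compute cluster, and the optional properties are shown with their default values:

gg.eventhandler.aidp.connectionURL=jdbc:spark://<server-hostname>;SparkServerType=AIDP;httpPath=<http-path>
# Optional tuning properties, shown with their defaults
gg.eventhandler.aidp.maxConnections=10
gg.eventhandler.aidp.connectionRetries=3
gg.eventhandler.aidp.connectionRetryIntervalSeconds=30
gg.eventhandler.aidp.createTable=true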

9.2.29.2.2 Authentication to Oracle AI Data Platform

The Oracle AI Data Platform JDBC connection URLs use the following format for authentication:
jdbc:spark://<Host>;SparkServerType=AIDP;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]
9.2.29.2.2.1 Oracle AI Data Platform JDBC Driver Compute Settings

The Oracle AI Data Platform JDBC driver requires the following compute resource configuration settings.

Table 9-37 Oracle AI Data Platform JDBC Driver compute resource configuration settings

Setting: Host
Description: Specifies the Oracle AI Data Platform compute resource's Server hostname value.

Setting: SparkServerType
Description: Specifies the Oracle AI Data Platform compute resource's Spark server type, that is, AIDP.

Setting: httpPath
Description: Specifies the Oracle AI Data Platform compute resource's HTTP path value.
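
For example, substituting a hypothetical hostname and HTTP path into the format above yields a connection URL such as:
jdbc:spark://sample-cluster.example.oraclecloud.com;SparkServerType=AIDP;httpPath=cliservice/mockPath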

9.2.29.2.2.2 Connection Details for Oracle AI Data Platform Compute Cluster

To get the connection details for the Oracle AI Data Platform compute cluster:
  1. Log in to your Oracle AI Data Platform instance.
  2. In the sidebar, select the required workspace and then click Compute.
  3. From the list of available clusters, click the target cluster’s name.
  4. Click the Connection Details tab, which lists the JDBC URL details and the driver details associated with that cluster.
  5. In the JDBC tab, the JDBC URL is available for use.
9.2.29.2.2.3 Simba JDBC Driver for Apache Spark

Use the following steps to get the Simba JDBC driver for the Oracle AI Data Platform compute cluster:
  1. Log in to your Oracle AI Data Platform instance.
  2. Select the required workspace.
  3. Click Compute.
  4. Select the target cluster’s name from the list of available clusters.
  5. Click the Connection Details tab, which lists the JDBC URL details and the driver details associated with that cluster.
  6. From the JDBC tab, select the Download JDBC Driver option to obtain the JDBC driver JAR.

Note:

Oracle recommends using the lean version of the Simba JDBC JAR included in the ZIP package downloaded from the Oracle AI Data Platform console. The lean version of the Simba JDBC JAR includes shaded dependencies. These shaded dependencies help prevent dependency resolution conflicts between Oracle GoldenGate and the Simba JDBC driver, thereby ensuring smoother integration and compatibility. The lean Simba JDBC driver JAR is available in the libs directory of the downloaded ZIP archive.
9.2.29.2.2.4 Oracle AI Data Platform Authentication Methods

The Oracle AI Data Platform JDBC driver supports the following authentication methods:

OCI API Key

Use the following steps to generate the OCI API key and the corresponding profile:
  1. In your OCI tenancy where the AI Data Platform instance is present, click the User icon at the top-right corner.
  2. Click User Settings, Tokens & Keys, Add API Key.
  3. Click Generate API key pair.
  4. Download the Public and Private keys.
  5. Click Add to add the Public and Private keys.
  6. From the displayed window, copy the configuration file preview and add the contents in the ~/.oci/config file.

    Sample OCI Configuration File

    [DEFAULT]
    user=ocid1.user.oc1..mockValue
    fingerprint=mockFingerPrintValue
    tenancy=ocid1.compartment.oc1..mockValue
    region=us-phoenix-1
    key_file=<path to your private keyfile>

    Note:

    • By default, the Simba JDBC driver reads the OCI configuration from the ~/.oci/config file using the DEFAULT profile.
    • If you change either of these, provide the corresponding details in the JDBC URL using the OCIConfigFile and OCIProfile keys to indicate the new configuration file path and the new OCI profile name. For example:
      jdbc:spark://sample.url.com/default;SparkServerType=AIDP;httpPath=cliservice/mockPath;OCIConfigFile=/tmp/config;OCIProfile=MY_PROFILE

9.2.29.2.3 Classpath Configuration

Oracle AI Data Platform Event Handler uses the Simba JDBC driver for Apache Spark.

Ensure the following for Simba JDBC driver classpath configuration:

  • The classpath includes the path to the Simba JDBC driver.
  • The classpath includes the OCI object store event handler's dependencies.
  • Edit the gg.classpath configuration parameter to include the path to the object store event handler dependencies and the Simba JDBC driver for Apache Spark, as shown in the example after this list.
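
For example, assuming the OCI object store dependencies and the Simba JDBC driver JAR have been downloaded to local directories (the paths below are placeholders):

gg.classpath=/path/to/oci-deps/*:/path/to/simba-jdbc-driver/*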

9.2.29.2.4 INSERTALLRECORDS Support

To enable bulk insert operations for Replicat, set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm). This optimizes data loading into the target table.

You can adjust the bulk insert batch size and frequency using the following File Writer properties (see the sketch after this list):
  • gg.handler.aidp.maxFileSize, with a default value of 1GB, controls the batch size.
  • gg.handler.aidp.fileRollInterval, with a default value of 3 minutes, controls the bulk insert frequency.
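
The following is a minimal sketch of this setup. The Replicat name, properties file path, and MAP statement are illustrative placeholders, and the property values shown are the defaults described above.

Replicat parameter file (.prm):
REPLICAT aidpr
TARGETDB LIBFILE libggjava.so SET property=dirprm/aidpr.props
-- Apply every change operation as an INSERT into the target
INSERTALLRECORDS
MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1";

Replicat properties file:
# Batch size and bulk insert frequency (default values shown)
gg.handler.aidp.maxFileSize=1g
gg.handler.aidp.fileRollInterval=3m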

9.2.29.2.5 Operation Aggregation

Operation aggregation combines multiple operations on the same row into a single output operation based on a threshold.

9.2.29.2.5.1 In-Memory Operation Aggregation

By default, operation records are aggregated in-memory. You can adjust the merge interval using the gg.handler.aidp.fileRollInterval property. The default value for gg.handler.aidp.fileRollInterval property is 3 minutes.

Note:

The in-memory operation aggregation requires additional JVM memory configuration.
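
As a rough sketch, the Replicat JVM heap is typically increased through the jvm.bootoptions property in the Replicat properties file; the 8 GB value below is purely illustrative and should be sized for your trail volume and merge interval.

# Increase the Java heap available to the Replicat process (illustrative sizing)
jvm.bootoptions=-Xmx8g -Xms8g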
9.2.29.2.5.2 SQL-Based Operation Aggregation

Currently, Oracle AI Data Platform does not support operation aggregation using SQL.

9.2.29.2.6 End-to-End Configuration

The following is an end-to-end configuration example which uses auto-configuration.

Oracle AI Data Platform with OCI Object Storage

# Properties file for Replicat
# Configuration to load GoldenGate trail operation records into AI Data Platform using OCI object store staging location.
# Note: It is recommended to edit only the configuration marked as TODO.
gg.target=aidp

# OCI Event handler.
#TODO: Edit the OCI region
gg.eventhandler.oci.region=<oci-region>
#TODO: Edit the OCI compartment OCID
gg.eventhandler.oci.compartmentID=<oci-compartment-ocid>
#TODO: Edit the OCI bucket name
gg.eventhandler.oci.bucketMappingTemplate=<oci-bucket-name>
#TODO: Edit the OCI Config Profile name
gg.eventhandler.oci.profile=<oci-profile-name>
#TODO: Edit the OCI Config file path
gg.eventhandler.oci.configFilePath=./oci/config

# AI Data Platform Event Handler.
#TODO: Edit JDBC ConnectionUrl
gg.eventhandler.aidp.connectionURL=<connection-url>

#TODO: Edit the classpath to include OCI Event handler dependencies and Simba JDBC driver.
gg.classpath=/path/to/oci-dep/*:/path/to/simba-jdbc-driver/*

9.2.29.2.7 Table Mapping

If the MAP statement does not specify a target catalog, then the Oracle AI Data Platform workspace uses hive as the default catalog. The event handler uses this as the default catalog during initialization and logs a message such as: Connection catalog is set to [hive].

Table 9-38 Mapping Table

MAP statement in the Replicat parameter file: MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1";
  AIDP Catalog: Default catalog, that is, hive
  AIDP Schema: schema_1
  AIDP Table: table_1

MAP statement in the Replicat parameter file: MAP DB_1.SCHEMA_1.TABLE_1, TARGET "db_1"."schema_1"."table_1";
  AIDP Catalog: db_1
  AIDP Schema: schema_1
  AIDP Table: table_1

9.2.29.3 Troubleshooting and Diagnostics

  • Unsupported Oracle AI Data Platform datatypes:
    1. ARRAY
    2. MAP
    3. STRUCT
  • Avoid Creating Bucket Mapping Templates with Dynamic Values:

    Avoid creating bucket mapping templates for the OCI Event Handler that use dynamic values, as this may lead to the generation of multiple buckets for a single table. This can produce inconsistent results because the application uses the REFRESH SQL statement to update the external table with the newly available staged files, so the location of all staging files for a particular table must be consistent.
    1. Examples of invalid bucket mapping templates:
      • Using test_run1_${tableName}_${currentTimestamp} where currentTimestamp is replaced with a dynamic timestamp value.

        For example, 2025-09-02_05-54-26.166, resulting in multiple buckets, such as test_run1_HISTORY_2025-09-02_05-54-26.166, test_run1_HISTORY_2025-09-02_06-21-46.006, test_run1_NEW_ORDER_2025-09-02_05-54-28.005.

    2. Examples of valid bucket mapping templates:
      • Using a constant bucket mapping template, such as testrun1, which ensures that there is a single, unique path for each table within the same bucket for the entire execution of the Replicat.
      • Using a template such as test_run1_${tableName}, which results in a single, unique bucket for each source table, such as test_run1_HISTORY and test_run1_NEW_ORDER.
  • Co-existence of the components:

    When using an external stage location, the region of the machine where the Replicat process is running and the object store's region have an impact on the overall throughput of the apply process. For the best possible throughput, the components should ideally be located in the same region, or as close together as possible.

  • DDL not applied on the target table:

    Oracle GoldenGate for Distributed Applications and Analytics does not support DDL replication.

  • Handling Timestamp Values Before 1900-01-01:

    Oracle AI Data Platform currently does not support timestamp datatype values before 1900-01-01 by default due to its internal date representation.

    To enable writing timestamp values before 1900-01-01, set the following property in your cluster configuration:
    spark.sql.parquet.int96RebaseModeInWrite=CORRECTED
    Restart the cluster to apply the change.