9.2.29 Oracle AI Data Platform

Overview

Oracle AI Data Platform is a new product offering that brings together data lake, analytics, and orchestration services into one unified platform. With centralized governance, built-in collaboration, and seamless integration across internal and external data sources, it helps enterprises accelerate data-driven decision-making and perform data analytics at scale with greater speed and efficiency.

9.2.29.1 Detailed Functionality

Overview

Replication to Oracle AI Data Platform uses the stage and merge data flow. The change data from the Oracle GoldenGate trails is staged in micro-batches at a temporary staging location (external stage) in OCI Object Storage. The staged records are then merged into the Oracle AI Data Platform target tables using a merge SQL statement. By default, staged records are merged every 3 minutes.

9.2.29.1.1 Staging Location

The change data records from the Oracle GoldenGate trail files are formatted into the Avro Object Container File (OCF) format and are then uploaded to the staging location.

Change data can be staged on the following object store:
  • OCI Object Storage

9.2.29.1.2 Database User Privileges

For replicating into Oracle AI Data Platform, the database user requires the following privileges:
  • SELECT, CREATE, INSERT, UPDATE, and DELETE permissions on the target tables.
  • SELECT, CREATE, ALTER, and DROP permissions on the external tables.

9.2.29.1.3 Prerequisites

  • You must have an Oracle Cloud Infrastructure account set up for Oracle AI Data Platform.
  • The Simba JDBC driver for Apache Spark is required. You can download the Simba JDBC driver from the cluster detail page of the Oracle AI Data Platform console. For more information, see Simba JDBC Driver for Apache Spark.

9.2.29.2 Configuration

The Oracle AI Data Platform replication properties configuration is stored in the Replicat properties file.

9.2.29.2.1 Automatic Configuration

Oracle AI Data Platform replication involves configuring multiple components, such as the File Writer Handler, the OCI Event Handler, and the target Oracle AI Data Platform Event Handler.

Automatic configuration sets up these components for you, minimizing the manual configuration effort.

The properties modified by automatic configuration are also logged in the handler log file.

To enable auto-configuration to replicate to the Oracle AI Data Platform target, set the parameter gg.target=aidp.

The parameter gg.stage determines the staging location. If gg.stage is not set, OCI Object Storage is used as the staging location.

The JDBC Metadata Provider is also automatically enabled to retrieve target table metadata from Oracle AI Data Platform.

Target tables are automatically created if missing.
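
For example, a minimal set of auto-configuration entries in the Replicat properties file could look like the following sketch; the handler-specific OCI and Oracle AI Data Platform settings described in the later sections still need to be added for a working setup.

# Enable auto-configuration for the Oracle AI Data Platform target
gg.target=aidp
# gg.stage is not set, so OCI Object Storage is used as the staging location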

9.2.29.2.1.1 File Writer Handler Configuration

The File Writer Handler name is pre-set to the value aidp and its properties are automatically set to the required values for Oracle AI Data Platform.

9.2.29.2.1.2 OCI Event Handler Configuration

The OCI Event Handler name is pre-set to the value oci and must be configured to match your OCI configuration.

The following is an example of editing an OCI Event Handler property:
gg.eventhandler.oci.bucketMappingTemplate=container1
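
Beyond the bucket name, the OCI Event Handler properties that typically need to match your OCI configuration include the region, compartment OCID, configuration profile, and configuration file path, shown here with placeholder values (see also the end-to-end configuration in End-to-End Configuration):

gg.eventhandler.oci.region=<oci-region>
gg.eventhandler.oci.compartmentID=<oci-compartment-ocid>
gg.eventhandler.oci.bucketMappingTemplate=<oci-bucket-name>
gg.eventhandler.oci.profile=<oci-profile-name>
gg.eventhandler.oci.configFilePath=./oci/config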
9.2.29.2.1.3 Oracle AI Data Platform Event Handler Configuration

The Oracle AI Data Platform Event Handler name is pre-set to the value aidp.

The following are the configuration properties available for the Oracle AI Data Platform Event Handler. The required properties must be set to match your Oracle AI Data Platform configuration.

Table 9-36 Oracle AI Data Platform Event Handler Configuration properties

gg.eventhandler.aidp.connectionURL
  Required/Optional: Required
  Legal Values: jdbc:spark://<server-hostname>;SparkServerType=AIDP;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]
  Default: None
  Explanation: Specifies the JDBC URL used to connect to Oracle AI Data Platform.

gg.eventhandler.aidp.maxConnections
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 10
  Explanation: Controls the number of concurrent JDBC database connections to the target database.

gg.eventhandler.aidp.connectionRetries
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 3
  Explanation: Specifies the number of times connections to the target data warehouse are retried.

gg.eventhandler.aidp.connectionRetryIntervalSeconds
  Required/Optional: Optional
  Legal Values: Integer value
  Default: 30
  Explanation: Specifies the delay in seconds between connection retry attempts.

gg.eventhandler.aidp.createTable
  Required/Optional: Optional
  Legal Values: True or False
  Default: True
  Explanation: If set to true, missing target tables are automatically created.

gg.eventhandler.aidp.detectMissingBaseRow
  Required/Optional: Optional
  Legal Values: True or False
  Default: False
  Explanation: Diagnostic parameter to find UPDATE operations without a base row. If set to true, Replicat abends if there are UPDATE operations without a base row. These rows are collected into another table that can be investigated.

gg.eventhandler.aidp.dropStagingTablesOnShutdown
  Required/Optional: Optional
  Legal Values: True or False
  Default: False
  Explanation: If set to true, the temporary staging tables created by Oracle GoldenGate are dropped when Replicat stops gracefully.

gg.handler.aidp.fileRollInterval
  Required/Optional: Optional
  Legal Values: The default unit of measure is milliseconds. You can append ms, s, m, or h to signify milliseconds, seconds, minutes, or hours respectively. Examples of legal values are 10000, 10000ms, 10s, 10m, or 1.5h. Values of 0 or lower disable the file roll operation.
  Default: 3m
  Explanation: Determines how often the data is merged into Oracle AI Data Platform.
  Caution: Increasing the default value from 3m to a higher value increases the amount of data stored in the Replicat internal memory. This can cause out-of-memory errors and stop the Replicat process.
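
For example, a minimal Oracle AI Data Platform Event Handler configuration in the Replicat properties file might look like the following sketch; the connection URL is a placeholder that must be replaced with the JDBC URL of your compute cluster, and the optional properties are shown with their default values:

gg.eventhandler.aidp.connectionURL=jdbc:spark://<server-hostname>;SparkServerType=AIDP;httpPath=<http-path>
# Optional tuning properties, shown with their defaults
gg.eventhandler.aidp.maxConnections=10
gg.eventhandler.aidp.connectionRetries=3
gg.eventhandler.aidp.connectionRetryIntervalSeconds=30
gg.eventhandler.aidp.createTable=true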

9.2.29.2.2 Authentication to Oracle AI Data Platform

The Oracle AI Data Platform JDBC connection URLs use the following format for authentication:
jdbc:spark://<Host>;SparkServerType=AIDP;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]
9.2.29.2.2.1 Oracle AI Data Platform JDBC Driver Compute Settings

The Oracle AI Data Platform JDBC driver requires the following compute resource configuration settings.

Table 9-37 Oracle AI Data Platform JDBC Driver compute resource configuration settings

Setting: Host
Description: Specifies the Oracle AI Data Platform compute resource's Server hostname value.

Setting: SparkServerType
Description: Specifies the Oracle AI Data Platform compute resource's Spark server type, that is, AIDP.

Setting: httpPath
Description: Specifies the Oracle AI Data Platform compute resource's HTTP path value.
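
For example, substituting a hypothetical hostname and HTTP path into the format above yields a connection URL such as:
jdbc:spark://sample-cluster.example.oraclecloud.com;SparkServerType=AIDP;httpPath=cliservice/mockPath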

9.2.29.2.2.2 Connection Details for Oracle AI Data Platform Compute Cluster

To get the connection details for the Oracle AI Data Platform compute cluster:
  1. Log in to your Oracle AI Data Platform instance.
  2. In the sidebar, select the required workspace and then click Compute.
  3. From the list of available clusters, click the target cluster’s name.
  4. Click the Connection Details tab, which lists the JDBC URL details and the driver details associated with that cluster.
  5. In the JDBC tab, the JDBC URL is available for use.
9.2.29.2.2.3 Simba JDBC Driver for Apache Spark

Use the following steps to get the Simba JDBC driver for the Oracle AI Data Platform compute cluster:
  1. Log in to your Oracle AI Data Platform instance.
  2. Select the required workspace.
  3. Click Compute.
  4. Select the target cluster’s name from the list of available clusters.
  5. Click the Connection Details tab, which lists the JDBC URL details and the driver details associated with that cluster.
  6. From the JDBC tab, select the Download JDBC Driver option to obtain the JDBC driver JAR.

Note:

Oracle recommends using the lean version of the Simba JDBC JAR included in the ZIP package downloaded from the Oracle AI Data Platform console. The lean version of the Simba JDBC JAR includes shaded dependencies. These shaded dependencies help prevent dependency resolution conflicts between Oracle GoldenGate and the Simba JDBC driver, thereby ensuring smoother integration and compatibility. The lean Simba JDBC driver JAR is available in the libs directory of the downloaded ZIP archive.
9.2.29.2.2.4 Oracle AI Data Platform Authentication Methods

The Oracle AI Data Platform JDBC driver supports the following authentication methods:

OCI API Key

Use the following steps to generate the OCI API key and the corresponding profile:
  1. In your OCI tenancy where the AI Data Platform instance is present, click the User icon at the top-right corner.
  2. Click User Settings, Tokens & Keys, Add API Key.
  3. Click Generate API key pair.
  4. Download the Public and Private keys.
  5. Click Add to add the Public and Private keys.
  6. From the displayed window, copy the configuration file preview and add the contents in the ~/.oci/config file.

    Sample OCI Configuration File

    [DEFAULT]
    user=ocid1.user.oc1..mockValue
    fingerprint=mockFingerPrintValue
    tenancy=ocid1.compartment.oc1..mockValue
    region=us-phoenix-1
    key_file=<path to your private keyfile>

    Note:

    • By default, the Simba JDBC driver reads the OCI configuration from the ~/.oci/config file using the DEFAULT profile.
    • If you change either of these, provide the corresponding details in the JDBC URL using the OCIConfigFile and OCIProfile keys to indicate the new configuration file path and the new OCI profile name. For example:
      jdbc:spark://sample.url.com/default;SparkServerType=AIDP;httpPath=cliservice/mockPath;OCIConfigFile=/tmp/config;OCIProfile=MY_PROFILE

9.2.29.2.3 Classpath Configuration

Oracle AI Data Platform Event Handler uses the Simba JDBC driver for Apache Spark.

Ensure the following for Simba JDBC driver classpath configuration:

  • The classpath includes the path to the Simba JDBC driver.
  • The classpath includes the OCI object store event handler's dependencies.
  • Edit the gg.classpath configuration parameter to include the path to the object store event handler dependencies and the Simba JDBC driver for Apache Spark, as shown in the example after this list.
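
For example, assuming the OCI object store dependencies and the Simba JDBC driver JAR have been downloaded to local directories (the paths below are placeholders):

gg.classpath=/path/to/oci-deps/*:/path/to/simba-jdbc-driver/*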

9.2.29.2.4 INSERTALLRECORDS Support

To enable bulk insert operations for Replicat, set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm). This optimizes data loading into the target table.

You can adjust the bulk insert batch size and frequency using the following File Writer properties (see the sketch after this list):
  • gg.handler.aidp.maxFileSize, with a default value of 1GB, controls the batch size.
  • gg.handler.aidp.fileRollInterval, with a default value of 3 minutes, controls the bulk insert frequency.
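
The following is a minimal sketch of this setup. The Replicat name, properties file path, and MAP statement are illustrative placeholders, and the property values shown are the defaults described above.

Replicat parameter file (.prm):
REPLICAT aidpr
TARGETDB LIBFILE libggjava.so SET property=dirprm/aidpr.props
-- Apply every change operation as an INSERT into the target
INSERTALLRECORDS
MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1";

Replicat properties file:
# Batch size and bulk insert frequency (default values shown)
gg.handler.aidp.maxFileSize=1g
gg.handler.aidp.fileRollInterval=3m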

9.2.29.2.5 Operation Aggregation

Operation aggregation combines multiple operations on the same row into a single output operation based on a threshold.

9.2.29.2.5.1 In-Memory Operation Aggregation

By default, operation records are aggregated in-memory. You can adjust the merge interval using the gg.handler.aidp.fileRollInterval property. The default value for gg.handler.aidp.fileRollInterval property is 3 minutes.

Note:

The in-memory operation aggregation requires additional JVM memory configuration.
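
As a rough sketch, the Replicat JVM heap is typically increased through the jvm.bootoptions property in the Replicat properties file; the 8 GB value below is purely illustrative and should be sized for your trail volume and merge interval.

# Increase the Java heap available to the Replicat process (illustrative sizing)
jvm.bootoptions=-Xmx8g -Xms8g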
9.2.29.2.5.2 SQL-Based Operation Aggregation

Currently, Oracle AI Data Platform does not support operation aggregation using SQL.

9.2.29.2.6 End-to-End Configuration

The following is an end-to-end configuration example which uses auto-configuration.

Oracle AI Data Platform with OCI Object Storage

# Properties file for Replicat
# Configuration to load GoldenGate trail operation records into AI Data Platform using OCI object store staging location.
# Note: It is recommended to edit only the configuration marked as TODO.
gg.target=aidp

# OCI Event handler.
#TODO: Edit the OCI region
gg.eventhandler.oci.region=<oci-region>
#TODO: Edit the OCI compartment OCID
gg.eventhandler.oci.compartmentID=<oci-compartment-ocid>
#TODO: Edit the OCI bucket name
gg.eventhandler.oci.bucketMappingTemplate=<oci-bucket-name>
#TODO: Edit the OCI Config Profile name
gg.eventhandler.oci.profile=<oci-profile-name>
#TODO: Edit the OCI Config file path
gg.eventhandler.oci.configFilePath=./oci/config

# AI Data Platform Event Handler.
#TODO: Edit JDBC ConnectionUrl
gg.eventhandler.aidp.connectionURL=<connection-url>

#TODO: Edit the classpath to include OCI Event handler dependencies and Simba JDBC driver.
gg.classpath=/path/to/oci-dep/*:/path/to/simba-jdbc-driver/*

9.2.29.2.7 Table Mapping

If the MAP statement does not specify a target catalog, then the Oracle AI Data Platform workspace uses hive as the default catalog. The event handler uses this as the default catalog during initialization and logs a message such as: Connection catalog is set to [hive].

Table 9-38 Mapping Table

MAP statement in the Replicat parameter file: MAP SCHEMA_1.TABLE_1, TARGET "schema_1"."table_1";
  AIDP Catalog: Default catalog, that is, hive
  AIDP Schema: schema_1
  AIDP Table: table_1

MAP statement in the Replicat parameter file: MAP DB_1.SCHEMA_1.TABLE_1, TARGET "db_1"."schema_1"."table_1";
  AIDP Catalog: db_1
  AIDP Schema: schema_1
  AIDP Table: table_1

9.2.29.3 Troubleshooting and Diagnostics

  • Unsupported Oracle AI Data Platform datatypes:
    1. ARRAY
    2. MAP
    3. STRUCT
  • Avoid Creating Bucket Mapping Templates with Dynamic Values:

    Avoid creating bucket mapping templates for the OCI Event Handler that use dynamic values, as this may lead to the generation of multiple buckets for a single table. This can produce inconsistent results because the application uses the REFRESH SQL statement to update the external table with the newly available staged files, so the location of all staging files for a particular table must be consistent.
    1. Examples of invalid bucket mapping templates:
      • Using test_run1_${tableName}_${currentTimestamp} where currentTimestamp is replaced with a dynamic timestamp value.

        For example, 2025-09-02_05-54-26.166, resulting in multiple buckets, such as test_run1_HISTORY_2025-09-02_05-54-26.166, test_run1_HISTORY_2025-09-02_06-21-46.006, test_run1_NEW_ORDER_2025-09-02_05-54-28.005.

    2. Examples of valid bucket mapping templates:
      • Using a constant bucket mapping template, such as testrun1, which ensures that there is a single, unique path for each table within the same bucket for the entire execution of the Replicat.
      • Using a template such as test_run1_${tableName}, which results in a single, unique bucket for each source table, such as test_run1_HISTORY and test_run1_NEW_ORDER.
  • Co-existence of the components:

    When using an external stage location, the region of the machine where the Replicat process is running and the object store's region have an impact on the overall throughput of the apply process. For the best possible throughput, the components should ideally be located in the same region, or as close together as possible.

  • DDL not applied on the target table:

    Oracle GoldenGate for Distributed Applications and Analytics does not support DDL replication.

  • Handling Timestamp Values Before 1900-01-01:

    Oracle AI Data Platform currently does not support timestamp datatype values before 1900-01-01 by default due to its internal date representation.

    To enable writing timestamp values before 1900-01-01, set the following property in your cluster configuration:
    spark.sql.parquet.int96RebaseModeInWrite=CORRECTED
    Restart the cluster to apply the change.