9.2.29 Oracle AI Data Platform
Overview
Oracle AI Data Platform is a new product offering that brings together data lake, analytics, and orchestration services into one unified platform. With centralized governance, built-in collaboration, and seamless integration across internal and external data sources, it helps enterprises to accelerate their data-driven decision-making and enables them to perform data analytics at scale with greater speed and efficiency.
9.2.29.1 Detailed Functionality
Overview
Replication to Oracle AI Data Platform uses the stage and merge data flow. The change data from the Oracle GoldenGate trails is staged in micro-batches at a temporary staging location (external stage) in OCI Object Storage. The staged records are then merged into the Oracle AI Data Platform target tables using a merge SQL statement. By default, staged records are merged in every 3 minutes.
9.2.29.1.1 Staging Location
The change data records from the Oracle GoldenGate trail files are formatted into Avro OCF (Object Container Format) and are then uploaded to the staging location.
- OCI Object storage
Parent topic: Detailed Functionality
9.2.29.1.2 Database User Privileges
- SELECT, CREATE, INSERT, UPDATE, and DELETE permissions on the target tables.
- SELECT, CREATE, ALTER, and DROP permissions on the external tables.
Parent topic: Detailed Functionality
9.2.29.1.3 Prerequisites
- You must have Oracle Cloud Infrastructure account set up for Oracle AI Data Platform.
- Simba JDBC driver for Apache Spark. You can download the Simba JDBC driver from the cluster detail page of the Oracle AI Data Platform console. For more information, see Simba JDBC Driver for Apache Spark.
Parent topic: Detailed Functionality
9.2.29.2 Configuration
The Oracle AI Data Platform replication properties configuration is stored in the Replicat properties file.
- Automatic Configuration
- Authentication to Oracle AI Data Platform
- Classpath Configuration
- INSERTALLRECORDS Support
- Operation Aggregation
- End-to-End Configuration
- Table Mapping
Parent topic: Oracle AI Data Platform
9.2.29.2.1 Automatic Configuration
Oracle AI Data Platform replication involves configuring multiple components, such as the File Writer Handler, the OCI Event Handler, and the target Oracle AI Data Platform Event Handler.
The Automatic Configuration functionality helps you to configure these components automatically to minimize the manual configuration.
The properties modified by automatic configuration is also logged in the handler log file.
The parameter gg.stage
determines the staging location.
To enable auto-configuration to replicate to the Oracle AI Data Platform
target, set the parameter gg.target=aidp
.
If gg.stage
is not set, the OCI object storage will be used
as the staging location.
The JDBC Metadata provider is also automatically enabled to retrieve target table metadata from the Oracle AI Data Platform.
Target tables are automatically created if missing.
- File Writer Handler Configuration
- OCI Event Handler Configuration
- Oracle AI Data Platform Event Handler Configuration
Parent topic: Configuration
9.2.29.2.1.1 File Writer Handler Configuration
The File Writer Handler name is pre-set to the value
aidp
and its properties are
automatically set to the required values for Oracle
AI Data Platform.
Parent topic: Automatic Configuration
9.2.29.2.1.2 OCI Event Handler Configuration
The OCI Event Handler name is pre-set to the value oci
and
must be configured to match your OCI configuration.
gg.eventhandler.oci.bucketMappingTemplate=container1
Related Topics
Parent topic: Automatic Configuration
9.2.29.2.1.3 Oracle AI Data Platform Event Handler Configuration
The Oracle AI Data Platform Event Handler name is pre-set to the value
aidp
.
The following are the configuration properties available for the Oracle AI Data Platform Event handler, the required ones must be changed to match your Oracle AI Data Platform configuration.
Table 9-36 Oracle AI Data Platform Event Handler Configuration properties
Properties | Required/Optional | Legal Values | Default | Explanation |
---|---|---|---|---|
gg.eventhandler.aidp.connectionURL |
Required |
|
None | Specifies the JDBC URL to connect to Oracle AI Data Platform. |
gg.eventhandler.aidp.maxConnections |
Optional |
Integer Value |
10 |
You can use this parameter to control the number of concurrent JDBC database connections to the target database. |
gg.eventhandler.aidp.connectionRetries |
Optional |
Integer Value |
3 |
Specifies the number of times connections to the target data warehouse will be retried. |
gg.eventhandler.aidp.connectionRetryIntervalSeconds |
Optional |
Integer Value |
30 |
Specifies the delay in seconds between connection retry attempts. |
gg.eventhandler.aidp.createTable |
Optional |
|
|
If the parameter value is set to true. The missing target tables are automatically created. |
gg.eventhandler.aidp.detectMissingBaseRow |
Optional |
|
|
Diagnostic parameter to find the |
gg.eventhandler.aidp.dropStagingTablesOnShutdown |
Optional |
|
False |
If the parameter value is set to true, the temporary staging tables created by Oracle GoldenGate will be dropped on replicat graceful stop. |
gg.handler.aidp.fileRollInterval |
Optional |
The default unit of measure is milliseconds. You can
stipulate |
3m |
The parameter determines how often the data will be merged into AIDP. Caution: Increasing the default value from 3m to higher value will increase the amount of data stored in the Replicat internal memory. This can cause out of memory errors and stop the Replicat process. |
Parent topic: Automatic Configuration
9.2.29.2.2 Authentication to Oracle AI Data Platform
jdbc:spark://<Host>;SparkServerType=AIDP;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]
- Oracle AI Data Platform JDBC Driver Compute Settings
- Connection Details for Oracle AI Data Platform Compute Cluster
- Simba JDBC Driver for Apache Spark
- Oracle AI Data Platform Authentication Methods
Parent topic: Configuration
9.2.29.2.2.1 Oracle AI Data Platform JDBC Driver Compute Settings
Table 9-37 Oracle AI Data Platform JDBC Driver compute resource configuration settings
Setting | Description |
---|---|
Host |
Specifies the Oracle AI Data Platform compute resource’s Server hostname value. |
SparkServerType |
Specifies the Oracle AI Data Platform compute resource’s spark server type, that is, AIDP. |
httpPath |
Specifies the Oracle AI Data Platform compute resource’s HTTP path value. |
Parent topic: Authentication to Oracle AI Data Platform
9.2.29.2.2.2 Connection Details for Oracle AI Data Platform Compute Cluster
- Log in to your Oracle AI Data Platform instance.
- In the sidebar, select the required workspace and then click Compute.
- From the list of available clusters, click the target cluster’s name.
- Click on the Connection Details tab that mentions the JDBC URL details and the driver details associated with that cluster.
- In the JDBC tab, the JDBC URL is available for the use.
Parent topic: Authentication to Oracle AI Data Platform
9.2.29.2.2.3 Simba JDBC Driver for Apache Spark
- Log in to your Oracle AI Data Platform instance.
- Select the required workspace.
- Click Compute.
- Select the target cluster’s name from the list of available clusters.
- Click the Connection Details tab that mentions the JDBC URL details and the driver details associated with that cluster.
- From the JDBC tab, select Download JDBC Driver option to obtain the JDBC driver jar.
Note:
Oracle recommends to use the Simba JDBC jar lean version included in the ZIP package downloaded from the Oracle AI Data Platform console. The Simba JDBC jar lean version includes shaded dependencies. These shaded dependencies help prevent dependency resolution conflicts between Oracle GoldenGate and the Simba JDBC driver, thereby ensuring smoother integration and compatibility. The lean Simba JDBC driver jar is available in the downloaded ZIP archive'slibs
directory.
Parent topic: Authentication to Oracle AI Data Platform
9.2.29.2.2.4 Oracle AI Data Platform Authentication Methods
OCI API Key
The Oracle AI Data Platform JDBC driver supports the following authentication methods:
- In your OCI tenancy where AI Data Platform instance is present, navigate to the User icon available at the top-right corner.
- Click User Settings, Tokens & Keys, Add API Key.
- Click Generate API key pair.
- Download the Public and Private keys.
- Click Add to add the Public and Private keys.
- From the displayed window, copy the configuration file preview and
add the contents in the
~/.oci/config
file.Sample OCI Configuration File
[DEFAULT] user=ocid1.user.oc1..mockValue fingerprint=mockFingerPrintValue tenancy=ocid1.compartment.oc1..mockValue region=us-phoenix-1 key_file=<path to your private keyfile>
Note:
- The Simba JDBC driver has default configuration of the
OCI configurations in the
~/.oci/config
file with the profile name as DEFAULT. - If you change any of the property, provide the similar
details in the JDBC URL using the keys,
OCIConfigFile and
OCIProfile
to indicate the new configuration file path and new name of the OCI profile. For Example:jdbc:spark://sample.url.com/default;SparkServerType=AIDP;httpPath=cliservice/mockPath;OCIConfigFile=/tmp/config;OCIProfile=MY_PROFILE
- The Simba JDBC driver has default configuration of the
OCI configurations in the
Parent topic: Authentication to Oracle AI Data Platform
9.2.29.2.3 Classpath Configuration
Oracle AI Data Platform Event Handler uses the Simba JDBC driver for Apache Spark.
Ensure the following for Simba JDBC driver classpath configuration:
- The classpath includes the path to the JDBC driver.
-
Include the OCI object store event handler's dependencies in the classpath.
-
Edit the
gg.classpath configuration
parameter to include the path to the object store event handler dependencies and the Simba JDBC driver for Apache Spark.
Parent topic: Configuration
9.2.29.2.4 INSERTALLRECORDS Support
To enable bulk insert operations for Replicat, set the
INSERTALLRECORDS
parameter in the Replicat parameter file
(.prm
). This optimizes data loading into the target table.
gg.handler.aidp.maxFileSize
with 1GB default value for batch size.gg.handler.aidp.fileRollInterval
with 3 minutes default value for bulk insert frequency
Parent topic: Configuration
9.2.29.2.5 Operation Aggregation
Operation aggregation combines multiple operations on the same row into a single output operation based on a threshold.
9.2.29.2.5.1 In-Memory Operation Aggregation
By default, operation records are aggregated in-memory. You can adjust the merge interval using the gg.handler.aidp.fileRollInterval property. The default value for gg.handler.aidp.fileRollInterval property is 3 minutes.
Note:
The in-memory operation aggregation requires additional JVM memory configuration.Parent topic: Operation Aggregation
9.2.29.2.5.2 SQL-Based Operation Aggregation
Currently, Oracle AI Data Platform does not support the operation aggregation using SQL.
Parent topic: Operation Aggregation
9.2.29.2.6 End-to-End Configuration
The following is an end-to-end configuration example which uses auto-configuration.
Oracle AI Data Platform with OCI Object Storage
# Properties file for Replicat
# Configuration to load GoldenGate trail operation records into AI Data Platform using OCI object store staging location.
# Note: Recommended to only edit the configuration marked as TODO
gg.target=aidp
# OCI Event handler.
#TODO: Edit the OCI region
gg.eventhandler.oci.region=<oci-region>
#TODO: Edit the OCI compartment OCID
gg.eventhandler.oci.compartmentID=<oci-compartment-ocid>
#TODO: Edit the OCI bucket name
gg.eventhandler.oci.bucketMappingTemplate=<oci-bucket-name>
#TODO: Edit the OCI Config Profile name
gg.eventhandler.oci.profile=<oci-profile-name>
#TODO: Edit the OCI Config file path
gg.eventhandler.oci.configFilePath=./oci/config
# AI Data Platform Event Handler.
#TODO: Edit JDBC ConnectionUrl
gg.eventhandler.aidp.connectionURL=<connection-url>
#TODO: Edit the classpath to include OCI Event handler dependencies and Simba JDBC driver.
gg.classpath=/path/to/oci-dep/*:/path/to/simba-jdbc-driver/*
Parent topic: Configuration
9.2.29.2.7 Table Mapping
If the MAP
statement does not specify a target catalog, then the
Oracle AI Data Platform workspace use hive
as the default catalog. The
event handler will use this as the default catalog during initialization and display the
log message as, for example Connection catalog is set to [hive]
.
Table 9-38 Mapping Table
MAP statement in the Replicat parameter file | AIDP Catalog | AIDP Schema | AIDP Table |
---|---|---|---|
MAP SCHEMA_1.TABLE_1, TARGET
"schema_1"."table_1"; |
Default catalog, that is, hive |
schema_1 |
table_1 |
MAP DB_1.SCHEMA_1.TABLE_1, TARGET
"db_1"."schema_1"."table_1" |
db_1 |
schema_1 |
table_1 |
Parent topic: Configuration
9.2.29.3 Troubleshooting and Diagnostics
- Unsupported Oracle AI Data Platform datatypes:
ARRAY
MAP
STRUCT
-
Avoid Creating Bucket Mapping Templates with Dynamic Values:
Avoid creating bucket mapping templates for OCI event handler that use dynamic values, as this may lead to the generation of multiple buckets for a single table. This can result in inconsistent results as the application is using theREFRESH SQL
statement to update the external table with the new available staged file, so the location of all the staging files of a particular table should be consistent.- Examples of invalid bucket mapping templates:
- Using
test_run1_${tableName}_${currentTimestamp}
wherecurrentTimestamp
is replaced with a dynamic timestamp value.For example,
2025-09-02_05-54-26.166
, resulting in multiple buckets, such astest_run1_HISTORY_2025-09-02_05-54-26.166, test_run1_HISTORY_2025-09-02_06-21-46.006, test_run1_NEW_ORDER_2025-09-02_05-54-28.005
.
- Using
- Examples of valid bucket mapping templates:
- Using a constant bucket mapping template, such as
testrun1
, which would ensure that there is a unique single path for each table within the same single bucket for the entire execution of the replicat. - Using a template, such as
test_run1_${tableName}
, which would result in unique single bucket for each source table, such astest_run1_HISTORY, test_run1_NEW_ORDER
.
- Using a constant bucket mapping template, such as
- Examples of invalid bucket mapping templates:
- Co-existence of the components:
When using an external stage location, the location/region of the machine where Replicat process is running and the object store's region will have an impact on the overall throughput of the apply process. For the best possible throughput, the components need to be located ideally in the same region or as close as possible.
- DDL not applied on the target table :
Oracle GoldenGate for Distributed Applications and Analytics does not support DDL replication
- Handling Timestamp Values Before 1900-01-01:
Oracle AI Data Platform currently do not support timestamp datatype values before 1900-01-01 by default due to its internal date representation.
To enable writing timestamp values before 1900-01-01, set the following property in your cluster configuration:
Restart the cluster to apply the change.spark.sql.parquet.int96RebaseModeInWrite=CORRECTED
Parent topic: Oracle AI Data Platform