6 Replicate Data
The Replicate Data Task of Oracle Data Integration Platform Cloud captures new transactions in the source and streams them to the target.
What is Replicate Data?
The Replicate Data Task of Oracle Data Integration Platform Cloud captures changes in a data source and applies them to the target in real time.
In the Replicate Data task, you select a source and a target connection for your task. Then, from the moment that you run the task, any new transaction in the source data is captured and delivered to the target. This task doesn't perform an initial copy of the source, so you get all the changes from the point in time that you start your job.
Here are some examples of when you would use this task:
- Cloud on-boarding: Replicate on-premises data to new cloud applications in real time, using Oracle Database Cloud Classic or Autonomous Data Warehouse Cloud.
- Real-time reporting: Capture real-time data from multiple on-premises data sources and deliver to Oracle Database Cloud Classic or Autonomous Data Warehouse Cloud to create reports.
- Query off-loading: Off-load production transactions to Oracle Database Cloud Classic.
- Real-time data warehousing: With Oracle Database (on-premises or cloud) as your source and Oracle Autonomous Data Warehouse Cloud as your target, replicate data to a warehouse to set up a staging environment for downstream ETL or real-time data warehousing.
- High availability: Create a multi-site or multi-cloud high availability strategy with synchronized data sources.
- Streaming to Kafka topics: With Kafka Connect as a target, get streams of data into your Kafka topics for real-time data analysis. Along with the data that's streamed to your topics, customize delivery of additional information about the changed data, such as what kind of transaction was performed on that data and when it was committed.
- Complex event processing and data analytics: By customizing what data is sent to Kafka topics, you can prepare it for further analysis with Oracle Stream Analytics (OSA). For example, you can send real-time information about clients from your databases to Kafka topics, and then with OSA, send real-time offers to clients when they enter a shopping center.
What’s Certified for Replicate Data?
Review the supported agents, data sources, and limitations before choosing your source and target for Replicate Data in Oracle Data Integration Platform Cloud.
- All data sources must have x86_64, the 64-bit version of x86 operating systems, with the latest update.
- The Kafka Connect target can only be on-premises, because Kafka Connect is not yet certified on Oracle Big Data Cloud.
- For Kafka Connect data sources, a remote agent must be set up on the same machine as Kafka Connect. Because Kafka Connect is on-premises only, the agent must also be configured on-premises.
Connection type | Version | OEL | RHEL | SLES | Windows | Source | Target |
---|---|---|---|---|---|---|---|
Autonomous Data Warehouse Cloud | 12.2 | 6.x | 6.x, 7.x | 11, 12 | no | no | yes |
Oracle Database Cloud Classic | 12.2 | 6.x | no | no | no | yes | yes |
Oracle Database Cloud Classic | 12.1 | 6.x | no | no | no | yes | yes |
Oracle Database Cloud Classic | 11.2 | 6.x | no | no | no | yes | yes |
Oracle Database | 12.2 | 6.x | 6.x, 7.x | 11, 12 | no | yes | yes |
Oracle Database | 12.1 | 6.x | 6.x, 7.x | 11, 12 | no | yes | yes |
Oracle Database | 11.2.0.4 | 6.x | 6.x, 7.x | 11, 12 | no | yes | yes |
Kafka Connect by Confluent | 4.1.x | 6.x, 7.x | 6.x, 7.x | 11, 12 | no | no | yes |
Kafka Connect by Confluent | 4.0.0 | 6.x, 7.x | 6.x, 7.x | 11, 12 | no | no | yes |
Kafka Connect by Confluent | 3.2.x | 6.x, 7.x | 6.x, 7.x | 11, 12 | no | no | yes |
Kafka Connect by Confluent | 3.1.x | 6.x, 7.x | 6.x, 7.x | 11, 12 | no | no | yes |
Kafka Connect by Confluent | 3.0.x | 6.x, 7.x | 6.x, 7.x | 11, 12 | no | no | yes |
After you verify your data source operating systems and versions, set up agents only for the data sources that are certified for your tasks. Check the operating systems that the agents are certified to run on, to ensure that the agent install location is appropriate and can connect to your data source. See Agent Certifications.
Replicate Data Task Limitations
Review the Replicate Data Task limitations before you perform the task.
- The Replicate Data task doesn't include an initial load; it captures changes in data from the moment that the job starts. If you want to do an initial load from one data source to another before you synchronize them, run a Synchronize Data Task and select the Initial Load option. See Synchronize Data.
- You can't always use the Synchronize Data task for an initial load of a data source that you want to use for Replicate Data, because the certified data sources for Synchronize Data and Replicate Data are not the same. For example, you can stream transactions to Autonomous Data Warehouse Cloud, but you can't deliver data to it with an initial load through Synchronize Data.
- To replicate data, you can use either one or two agents. If you use two agents, you set up one for the source and one for the target. If you use one agent, then you must provide both source and target information in the `agent.properties` file for that agent. A classpath points to a local directory, and a Kafka Connect connection depends on Kafka libraries. Therefore, in a one-agent setup, you must install the DIPC agent on the Kafka machine.
- Changes in the source database aren't available for processing until the Data Manipulation Language (DML) commands, such as insert, update, and delete, have been committed. In that respect, only transactions are replicated in Replicate Data. Data Definition Language (DDL) operations, such as drop or create, are not part of the replication process.
- Replicate Data is one-to-one and unidirectional, which means that it accepts one source and one target for each task. If you want several sources to deliver data to one target, you can create a Replicate Data Task for each source and have them all deliver to the same target.
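Because a Kafka Connect connection loads Kafka libraries from a local classpath, a quick way to confirm the one-agent prerequisite is to check that the Kafka client libraries exist on the machine where you plan to install the agent. This is a minimal sketch; the `/opt/kafka/libs` path is an assumption and varies by installation:

```shell
#!/bin/sh
# Check for local Kafka client libraries before installing the DIPC agent here.
# KAFKA_LIBS is a hypothetical default path: adjust to your Kafka Connect install.
KAFKA_LIBS="${KAFKA_LIBS:-/opt/kafka/libs}"
if ls "$KAFKA_LIBS"/kafka-clients-*.jar >/dev/null 2>&1; then
  echo "Kafka client libraries found in $KAFKA_LIBS"
else
  echo "No Kafka client libraries in $KAFKA_LIBS - install the agent on the Kafka Connect machine"
fi
```

If the libraries are missing, the agent has no local classpath to the Kafka Connect runtime, which is why the one-agent option requires installing the agent on the Kafka machine itself.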
Before You Replicate Data
To create a Replicate Data Task, you must first set up your agent(s) with their appropriate components and then create Connections to your source and target data sources.
Download, Register and Set up Agents
Oracle Data Integration Platform Cloud communicates with your data sources through agents, installed on the data sources themselves or on machines with access to them. These agents orchestrate tasks and relay information between data sources and the Data Integration Platform Cloud server. You must download the following replication components with your agent download:
- Oracle 12c (OGG) for Oracle Database 12.1, 12.2, and Autonomous Data Warehouse
- Big Data (OGG) for Kafka Connect
Create Connections
When you create a connection, you enter the connectivity details for your data sources, so you can use them for different tasks when needed. Here's a list of connections that you can create for the Replicate Data Task. Select the one that works for you.
Generate Wallet and Master Key
For your data to be encrypted while it travels across the network during the Replicate Data Task, you must generate a wallet with a master key.
By default, the data that the Replicate Data Task sends across the network is not encrypted. Data Integration Platform Cloud's agents can encrypt data captured from the source before it's sent across the network, and then decrypt it in the target. You can select an Advanced Encryption Standard (AES) option when you set up your Replicate Data Task. However, you must also generate a wallet with a master key through your agent processes. The agents use the master key in the source to encrypt the data. You must copy the same master key to the target wallet directory, so that the target agent can decrypt the data with the same key.
To copy the same master key to the target wallet directory:
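The exact wallet location depends on your agent installation. The sketch below assumes a GoldenGate-style layout in which the wallet file lives in a `dirwlt` directory under each agent's `gghome`; all paths and the wallet file name are hypothetical stand-ins, so verify them against your own install:

```shell
#!/bin/sh
# Hypothetical wallet directories for the source and target agents;
# substitute the real paths from your agent installations.
SRC_WALLET_DIR=/tmp/src_gghome/dirwlt
TGT_WALLET_DIR=/tmp/tgt_gghome/dirwlt

mkdir -p "$SRC_WALLET_DIR" "$TGT_WALLET_DIR"
# Stand-in for the wallet file that holds the generated master key:
touch "$SRC_WALLET_DIR/cwallet.sso"

# Copy the wallet so the target agent can decrypt with the same master key.
# If the target is on another host, use scp instead of cp.
cp "$SRC_WALLET_DIR/cwallet.sso" "$TGT_WALLET_DIR/"
```

The key point is that both agents must end up reading the identical master key: encryption on the source and decryption on the target use the same symmetric key material.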
Troubleshoot
If the `ggsci` command doesn't work in the source agent environment, add the path to the `oci` directory to your library path.
The library paths and the `ggsci` command are applied automatically after you register and start your agent. If you haven't started your agent, you can add the path to the `oci` directory manually.
Run the following command from the `gghome` or `gghome11g` directory first, and then run your `ggsci` command:

```shell
[xxx:gghome] export LD_LIBRARY_PATH=<agent_unzip_loc>/dicloud/oci
[xxx:gghome] $ggsci
```
What's Auto-Match?
Auto-match is a mapping pattern in Replicate Data Task of Data Integration Platform Cloud.
Auto-match looks at the mapping rule in the source section of the Replicate Data Task and only keeps the data entities from the filtered source schema that match the target schema. This mapping pattern is applied to Oracle Database, Database Cloud Classic and Autonomous Data Warehouse Cloud targets.
Because Data Definition Language (DDL) operations such as drop or create are not part of the replication process, the Replicate Data Task doesn't create missing tables. Therefore, you must define the tables that are part of the replication in the target schema before you run a Replicate Data Task.
Let's suppose the mapping rule is Include `SRC.*`. If `SRC` has tables `A1`, `A2`, and `A3`, then auto-match attempts to match each of these data entities with the target. If your target schema `TGT` has tables `A1`, `A2`, `B1`, and `B2`, then only the data changes in tables `A1` and `A2` are updated in the target.
If there is a transaction with an insert in `SRC.A1`, then that change is updated in `TGT.A1`. If there are changes in `SRC.A3`, then those changes are ignored, because the target doesn't have table `A3`. Tables `B1` and `B2` in the target are not affected, because there are no `B1` and `B2` tables in the source whose changes could be replicated to the target.
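Conceptually, auto-match is a set intersection between the filtered source entities and the target schema's tables. The sketch below reproduces the example above with plain shell tools; it is an illustration of the matching idea using table names only, not how the product computes the match:

```shell
#!/bin/sh
# Source schema SRC (after the Include SRC.* rule) and target schema TGT:
printf '%s\n' A1 A2 A3 | sort > /tmp/src_tables.txt
printf '%s\n' A1 A2 B1 B2 | sort > /tmp/tgt_tables.txt

# Auto-match keeps only the entities present in both schemas.
# comm -12 prints the lines common to both sorted inputs: A1 and A2.
comm -12 /tmp/src_tables.txt /tmp/tgt_tables.txt
```

The intersection (`A1`, `A2`) is the set of tables whose changes are replicated; `A3` is dropped because it has no target counterpart, and `B1`, `B2` are untouched because they have no source counterpart.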
Monitor a Replicate Data Task
When you run a Replicate Data Task, you create a Replicate Data job. Replicate Data jobs are listed in the Replicate Data tab of the Monitor page. Click a task name to see its job details.
A Replicate Data job is composed of the following actions:
- Initialize Capture: The Replicate Task plugin starts applications for the capture process on the source. After the initialization is complete, the status should be Successful.
- Initialize OGG Data Pump: The Replicate Task plugin starts a process on the source to send data to the target. This process applies only to Kafka targets. After the initialization is complete, the status should be Successful.
- Initialize Delivery: The Replicate Task plugin starts applications for the delivery process on the target. After the initialization is complete, the status should be Successful.
- Start Capture: The capture process starts when it detects a change in the source data entity that's to be mapped. The status should be Running.
- Start OGG Data Pump: The Pump process sends the captured change data to the target. This action applies only to Kafka targets. The status should be Running.
- Start Delivery: The delivery process starts after the capture process has captured the change in the source data entity that's to be mapped. The status should be Running.
These actions appear on the Summary page of every job.
See Monitor Jobs for more information.