6 Replicate Data

The Replicate Data Task of Oracle Data Integration Platform Cloud captures new transactions in the source and streams them to the target.

What is Replicate Data?

The Replicate Data Task of Oracle Data Integration Platform Cloud captures changes in a data source and updates the target in real time with that change.

In the Replicate Data task you select a source and a target connection for your task. Then, from the moment that you run this task, any new transaction in the source data is captured and delivered to the target. This task doesn't perform an initial copy of the source, so you'll get all the changes from the point in time that you start your job.

Here are some examples of when you would use this task:

  • Cloud on-boarding: Replicate on-premises data to new cloud applications in real-time using Oracle Database Cloud Classic or Autonomous Data Warehouse Cloud.
  • Real-time reporting: Capture real-time data from multiple on-premises data sources and deliver to Oracle Database Cloud Classic or Autonomous Data Warehouse Cloud to create reports.
  • Query off-loading: Off-load production transactions to Oracle Database Cloud Classic.
  • Real-time data warehousing: With Oracle database (on-premises or cloud) as your source and Oracle Autonomous Data Warehouse Cloud as your target, replicate data to a warehouse to set up a staging environment for downstream ETL or real-time data warehousing.
  • High availability: Create a multi-site or multi-cloud high availability strategy with synchronized data sources.
  • Streaming to Kafka topics: With Kafka Connect as a target, stream data into your Kafka topics for real-time data analysis. Along with the data that's streamed to your topics, customize delivery of additional information about the changed data, such as what kind of transaction was performed on that data and when it was committed.
  • Complex event processing and data analytics: By customizing what data should be sent to Kafka topics, you can prepare it to be further analyzed with Oracle Stream Analytics (OSA). For example, you can send real-time information about clients from your databases to Kafka topics, and then with OSA, send real-time offer messages to clients when they enter a shopping center.

What’s Certified for Replicate Data?

Review the supported agents, data sources, and limitations before choosing your source and target for Replicate Data in Oracle Data Integration Platform Cloud.

  • All data sources must run on x86_64, the 64-bit version of x86 operating systems, with the latest updates.

  • A Kafka Connect target can only be on-premises, because Kafka Connect is not yet certified on Oracle Big Data Cloud.

  • For Kafka Connect data sources, a remote agent must be set up on the same machine as Kafka Connect. Because Kafka Connect is on-premises only, the agent must also be configured on-premises.

Connection type                 | Version  | OEL      | RHEL     | SLES   | Windows | Source | Target
Autonomous Data Warehouse Cloud | 12.2     | 6.x      | 6.x, 7.x | 11, 12 | no      | no     | yes
Oracle Database Cloud Classic   | 12.2     | 6.x      | no       | no     | no      | yes    | yes
Oracle Database Cloud Classic   | 12.1     | 6.x      | no       | no     | no      | yes    | yes
Oracle Database Cloud Classic   | 11.2     | 6.x      | no       | no     | no      | yes    | yes
Oracle Database                 | 12.2     | 6.x      | 6.x, 7.x | 11, 12 | no      | yes    | yes
Oracle Database                 | 12.1     | 6.x      | 6.x, 7.x | 11, 12 | no      | yes    | yes
Oracle Database                 | 11.2.0.4 | 6.x      | 6.x, 7.x | 11, 12 | no      | yes    | yes
Kafka Connect by Confluent      | 4.1.x    | 6.x, 7.x | 6.x, 7.x | 11, 12 | no      | no     | yes
Kafka Connect by Confluent      | 4.0.0    | 6.x, 7.x | 6.x, 7.x | 11, 12 | no      | no     | yes
Kafka Connect by Confluent      | 3.2.x    | 6.x, 7.x | 6.x, 7.x | 11, 12 | no      | no     | yes
Kafka Connect by Confluent      | 3.1.x    | 6.x, 7.x | 6.x, 7.x | 11, 12 | no      | no     | yes
Kafka Connect by Confluent      | 3.0.x    | 6.x, 7.x | 6.x, 7.x | 11, 12 | no      | no     | yes

After you verify your data source operating systems and versions, set up agents only for the data sources that are certified for your tasks. Check the operating systems that the agents are certified to run on, to ensure that the agent install location is appropriate and can connect to your data source. See Agent Certifications.

Replicate Data Task Limitations

Review the Replicate Data Task limitations before you perform the task.

  • The Replicate Data task doesn't include an initial load; it captures changes in data from the moment that the job is started. If you want to do an initial load from one data source to another before you synchronize them, then run a Synchronize Data Task and select the Initial Load option. See Synchronize Data.
  • You can't always use the Synchronize Data task for an initial load of a data source that you want to use for Replicate Data, because certified data sources for Synchronize and Replicate Data are not the same. For example, you can stream transactions to Autonomous Data Warehouse Cloud, but you can't do an initial load to deliver data to it by using Synchronize Data.
  • To replicate data, you can use either one or two agents. If you use two agents, you set up one for the source and one for the target. If you use one agent, then you must provide both source and target information in the agent.properties file for that agent. Because the classpath points to a local directory and a Kafka Connect connection depends on the Kafka libraries, with the one-agent option you must install the DIPC agent on the Kafka machine.
  • The change in the source database isn't available for processing until the Data Manipulation Language (DML) commands such as insert, update and delete have been committed. So in that respect only transactions are replicated in Replicate Data. Data Definition Language (DDL) operations such as drop or create are not part of the replication process.
  • Replicate Data is one to one and uni-directional, which means that it accepts one source and one target for each task. If you want to include several sources to deliver data to one target, you can create a Replicate Data Task for each source and have them all deliver to the same target.

Before You Replicate Data

To create a Replicate Data Task, you must first set up your agent(s) with their appropriate components and then create Connections to your source and target data sources.

Download, Register and Set up Agents

Oracle Data Integration Platform Cloud communicates with your data sources by offering agents to be installed on data sources or on machines with access to those data sources. These agents orchestrate tasks and communicate information among data sources and the Data Integration Platform Cloud server. You must download the following replication components with your agent download.

  • Oracle 12c (OGG) for Oracle Database 12.1, 12.2 and Autonomous Data Warehouse

  • Big Data (OGG) for Kafka Connect

Create Connections

When you create a connection, you enter the connectivity details for your data sources, so you can use them for different tasks when needed. Here's a list of connections that you can create for the Replicate Data Task. Select the one that works for you.

Generate Wallet and Master Key

For your data to be encrypted while it travels across the network during the Replicate Data Task, you must generate a wallet with a master key.

By default, the data that the Replicate Data Task sends across the network is not encrypted. Data Integration Platform Cloud's agents can encrypt captured data from the source, before it's sent across the network, and then decrypt it in the target. You can select an Advanced Encryption Standard (AES) option when you set up your Replicate Data Task. However, you must also generate a wallet with a master key through your agent processes. The agents use the master key in the source to encrypt the data. You must copy the same master key to the target wallet directory, so that the target agent can decrypt the data with the same key.

To copy the same master key to the target wallet directory:

  1. On the environment that is hosting your running DIPC agent for the source:
    • For Oracle 12c: Navigate to the gghome folder of your agent, located in <agent_unzip_loc>/dicloud/gghome
    • For Oracle 11g: Navigate to the gghome11g folder of your agent, located in <agent_unzip_loc>/dicloud/gghome11g.
  2. Enter the following commands:
    [xxx:gghome] $ggsci
    GGSCI (xxx) 1 > create wallet
    Created Wallet
    Opened Wallet
    GGSCI (xxx) 2 > add masterkey
    <current time> INFO xxx created version 1 of master key 'OGG_DEFAULT_MASTERKEY' in Oracle Wallet.
    
    GGSCI (xxx) 3 > info masterkey
    Masterkey Name: OGG_DEFAULT_MASTERKEY
    
    Version   Creation Date    Status
    1         <creation time>  Current
    
    GGSCI (xxx) 4 > exit
  3. Ensure that the wallet, cwallet.sso, is created in the dirwlt directory:
    [xxx:gghome] cd dirwlt
    [xxx:dirwlt] ls
    cwallet.sso
  4. Copy cwallet.sso to the environment that is hosting your running DIPC agent for the target, in the following location (see the copy sketch after these steps):
    • For Oracle 12c and Autonomous Data Warehouse Cloud: <agent_unzip_loc>/dicloud/gghome/dirwlt
    • For Oracle 11g: <agent_unzip_loc>/dicloud/gghome11g/dirwlt
    • For Kafka Connect: <agent_unzip_loc>/dicloud/gghomebigdata/dirwlt
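
For example, here's a minimal sketch of copying the wallet from the source agent host to a target agent host over scp, assuming an Oracle 12c target; the user name and host name are placeholders, not values from your environment:

# Illustrative user and host names; adjust the destination path for 11g or Kafka Connect targets as listed above.
[xxx:gghome] cd dirwlt
[xxx:dirwlt] scp cwallet.sso oracle@target-host:<agent_unzip_loc>/dicloud/gghome/dirwlt/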

Troubleshoot

If the ggsci command doesn't work in the source agent environment, then add the path to the oci directory to your library path.

The library paths and the ggsci command are automatically applied after you register and start your agent, but if you haven't started your agent, then you can manually add the path to the oci directory.

Run the following command from the gghome or gghome11g directory first and then run your ggsci command.

[xxx:gghome] export LD_LIBRARY_PATH=<agent_unzip_loc>/dicloud/oci
[xxx:gghome] $ggsci

What's Auto-Match?

Auto-match is a mapping pattern in Replicate Data Task of Data Integration Platform Cloud.

Auto-match looks at the mapping rule in the source section of the Replicate Data Task and only keeps the data entities from the filtered source schema that match the target schema. This mapping pattern is applied to Oracle Database, Database Cloud Classic and Autonomous Data Warehouse Cloud targets.

Because Data Definition Language (DDL) operations such as drop or create are not part of the replication process, the Replicate Data Task doesn't create missing tables. Therefore, you must define the tables that are part of the replication in the target schema before you run a Replicate Data Task.

Let's suppose the mapping rule is Include SRC.*. If SRC has tables A1, A2 and A3, then auto-match will attempt to match each of these data entities with the target. If your target schema TGT has tables A1, A2, B1 and B2, then only the data changes in tables A1 and A2 will be updated in the target.

If there is a transaction with an insert in SRC.A1, then that change will be updated in TGT.A1. If there are changes in SRC.A3, then those changes will be ignored because the target doesn't have table A3. Tables B1 and B2 in the target are not affected, because there are no B1 and B2 in the source to replicate their change in the target.
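
Because missing tables aren't created for you, a table such as A3 must already exist in the target schema before its changes can be replicated. Here's a minimal sketch of creating a matching table in the target with SQL*Plus; the connect string and the column definitions are placeholders for illustration and must match your actual source table:

# Illustrative connect string and columns only.
[xxx:~] sqlplus TGT/<password>@<target_connect_string>
SQL> CREATE TABLE A3 (ID NUMBER PRIMARY KEY, NAME VARCHAR2(100));
SQL> exit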

Create a Replicate Data Task

Here’s how you can set up a Replicate Data Task:

  1. From the Home page or the Create menu in the Catalog, select Create Replicate Data.
  2. Name and describe your task in the General Information section. The Identifier is a string with no spaces that identifies this task. If you want to edit this auto-generated field, then you must only include capital letters, numbers, and underscores (_), with no spaces.
  3. Select an encryption level for the captured data that's sent across the network. The higher the number, the harder it is for hackers to decrypt the data. Choosing None sends your original data to the target without encryption.
  4. If you choose an encryption level, then ensure that you have generated a wallet according to the Generate Wallet and Master Key instructions, or that you generate one before you run the task.
  5. Click Design to select your connections and mapping pattern.
  6. In the View section, click Source to display properties for Source.
  7. In the Properties for Source, in the Details tab, select a Connection that serves as the source for this task.
  8. In the Properties for Source, in the Schemas tab, select a schema from the list and then click the plus icon to add it as a mapping rule.

    For example, if you select schema A and then click the plus icon, the rule Include A.* appears in the Rule Applied section.

    Only one rule is allowed, so only one schema can be included in a mapping rule. You can further narrow down a mapping rule for a schema by including a pattern. For example, instead of A.*, you can edit the field to A.EMP* to only include tables in schema A that start with EMP.

    To reset a mapping rule, click Reset. The default rule uses the schema assigned to the Source Connection. You can edit the default schema for a Connection in its detail page or override the rule by replacing it with another schema by using the plus icon.

  9. In the View section, click Target to display properties for Target.
  10. In the Properties for Target, in the Details tab, select a connection that serves as the target for this task.
  11. In the Properties for Target, in the Schemas/Topics tab:
    • For an Oracle target:

      Select a schema in the target for an auto-match mapping pattern. See What's Auto-Match?

    • For a Kafka Connect target:

      For Mapping Pattern and Key Mapping Pattern, either select an option from the drop down menu or enter your own patterns.

      You can override both of these fields. The default values for these fields come from the Topic Mapping Template and Key Mapping Template fields in the Kafka Connect Connection.

      Any mapping pattern that you enter can be the name of a topic or a key, so consider the power and the responsibility of defining your own mapping patterns. For example, you can name the mapping pattern ABC_CHANGES, and all the delivered data will go to a topic called ABC_CHANGES. You can partition this topic based on the key, for example ${table_name}, so that all changes for the same table go to the same partition. Because any value is possible, these fields are not validated. If you want your topics to be resolved based on predefined variables, then you must enter them correctly; there is no pre-check. For example, if instead of ${schemaName}_${tableName} you enter schemaName_tableName for the Mapping Pattern, then the topic name will not resolve to the schema and table names. Instead, you will have one topic called schemaName_tableName, and all the changed data is saved in this topic. To check which topics your records actually land in, see the consumer sketch after these steps.

  12. Do one of the following:
    • Click Save to save your task and run it later.

    • Click Save & Run to save your task and create a job from this Replicate Data Task.
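
For a Kafka Connect target, one way to confirm how your mapping pattern resolved, as mentioned in step 11, is to read the delivered records with the console consumer that ships with Kafka. This is a minimal sketch: the broker address and the topic name SRC_EMPLOYEES are placeholders for your own broker and for whatever your Mapping Pattern actually resolves to, and the exact flags can differ between Kafka and Confluent versions.

# Illustrative broker address and topic name only.
[xxx:confluent] bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic SRC_EMPLOYEES --from-beginning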

Monitor a Replicate Data Task

When you run a Replicate Data Task, you create a Replicate Data job. Replicate Data jobs are listed in the Replicate Data tab of the Monitor page. Click a task name to see its job details.

A Replicate Data job is composed of the following actions:

  • Initialize Capture: The Replicate Task plugin starts some applications for the capture process on the source. After the initialization is complete, the status should be Successful.

  • Initialize OGG Data Pump: The Replicate Task plugin starts a process on the source to send data to the target. This process only applies to Kafka targets. After the initialization is complete, the status should be Successful.

  • Initialize Delivery: The Replicate Task plugin starts some applications for the delivery process on the target. After the initialization is complete, the status should be Successful.

  • Start Capture: The capture process starts when it detects a change in the source data entity that's to be mapped. The status should be Running.

  • Start OGG Data Pump: The Pump process sends the captured change data to the target. This action applies only to Kafka targets. The status should be Running.

  • Start Delivery: The delivery process starts after the capture process has captured the change in the source data entity that's to be mapped. The status should be Running.

The actions are in the Summary page of every job.
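
If you want to look at the underlying Oracle GoldenGate processes behind these actions directly on an agent host, one option is the GGSCI info all command, run from the gghome (or gghome11g, gghomebigdata) directory of the agent. This is only a sketch, and the processes listed in your environment depend on your task.

[xxx:gghome] $ggsci
GGSCI (xxx) 1 > info all
GGSCI (xxx) 2 > exit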

See Monitor Jobs for more information.