18 Configure GoldenGate
Depending on your source or target environment setup, you may have to manually configure GoldenGate in order to perform certain tasks in Data Integration Platform Cloud.
Topics:
Set up Your GoldenGate Environment
Data Integration Platform provides four variants of GoldenGate binaries: Oracle 12c, Oracle 11g, Big Data, and MySQL. Each variant has a setup file for configuring the environment.
Note:
Make sure the GoldenGate home is set in the path so that you can access the ggsci prompt:
PATH=%12CGGHOME%;%12CGGHOME%\lib12;%12CGGHOME%\crypto;%PATH%
Configure GoldenGate 11g for Database Cloud Service
Once basic connectivity between the agent and GoldenGate is established, use the following procedure to configure the agent to work with an 11g GoldenGate home:
- Stop the agent if it's already running.
- Stop any manager running in the system. This takes care of port conflicts, if there are any. To check for a running manager:
ps -ef | grep mgr
- To stop the manager, launch the GGSCI console and stop the manager:
# {GGHOME}/ggsci
- Set agentGoldenGateHome=${gghome11g} in the agent.properties file located under /u01/data/domains/jlsData/${agent_instance_home}/conf.
- Start the agent.
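The agent.properties change in the step above can be scripted. Here is a minimal sketch; the conf directory, the seed file contents, and the 11g home value are illustrative assumptions, not the paths on your system:

```shell
# Sketch: point the agent at the 11g GoldenGate home.
# CONF_DIR stands in for /u01/data/domains/jlsData/<agent_instance_home>/conf;
# GGHOME_11G is an assumed example path.
CONF_DIR=./conf
GGHOME_11G=/u01/app/oracle/gghome11g

mkdir -p "$CONF_DIR"
# Seed a properties file as it might look before the change (illustrative).
printf 'agentType=OGG\nagentGoldenGateHome=/u01/app/oracle/gghome12c\n' \
  > "$CONF_DIR/agent.properties"

# Replace the agentGoldenGateHome entry, or append it if absent.
if grep -q '^agentGoldenGateHome=' "$CONF_DIR/agent.properties"; then
  sed -i "s|^agentGoldenGateHome=.*|agentGoldenGateHome=${GGHOME_11G}|" \
    "$CONF_DIR/agent.properties"
else
  echo "agentGoldenGateHome=${GGHOME_11G}" >> "$CONF_DIR/agent.properties"
fi

grep '^agentGoldenGateHome=' "$CONF_DIR/agent.properties"
# prints: agentGoldenGateHome=/u01/app/oracle/gghome11g
```

Remember to stop the agent before editing the file and start it again afterward, as described in the steps above.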
Configure GoldenGate 12c for Database Cloud Service
Before you can use an Oracle Database Cloud Service database deployment as a replication target in Oracle GoldenGate Cloud Service, you must configure its database as a valid replication database.
You can configure the database during database deployment creation by selecting Enable Oracle GoldenGate on the Service Details page of the provisioning wizard. If you don't, you can configure it manually after the database deployment is created by using the dbaascli utility.
To configure the database manually after the database deployment is created:
Configure GoldenGate Manually to Work between On-Premises and Cloud
This chapter includes the following sections:
Configure Extracts and Data Pumps
Oracle Data Integration Platform Cloud replication requires you to configure an Extract and a data pump process on the source.
To connect to the replication node, use one of the following options:
- Local trail (ExtTrail) on the local system
- Remote trail (RmtTrail) on the remote system
Note:
Oracle Data Integration Platform Cloud trails support the continuous extraction and replication of database (on-premises or cloud) changes, storing these changes temporarily on the cloud. A trail can reside on any platform that Oracle Data Integration Platform Cloud supports. (Oracle, MySQL, and Big Data databases are supported.)
You can configure one Replication node to process a trail for target databases. After all the data has been consumed, Replicat can then purge the data using the MinKeepDays parameter. As long as Replicat remains current, your temporary storage requirements for trails can be low.
Format the trail:
By default, trails are formatted in canonical format, allowing them to be exchanged rapidly and accurately among databases.
Each trail file contains the following:
- Record header area: Stored at the beginning of the file and contains information about the trail file itself:
  - Trail file information: compatibility level; character set (globalization function with version 11.2.1 and later); creation time; file sequence number; file size
  - First and last record information: timestamp; commit sequence number (CSN)
  - Extract information: Oracle Data Integration Platform Cloud version; group name; host name and hardware type; operating system type and version; DB type, version, and character set
- Record data area: Contains a header area as well as a data area.
- Checkpoints: Both Extract and Replicat maintain checkpoints into the trails. Checkpoints provide persistent processing whenever a failure occurs. Each process resumes where the last checkpoint was saved, guaranteeing that no data is lost. One Extract can write to one or many trails. One or many Replicat processes are involved in processing each trail.
Note:
Instead of the default canonical format, you can use alternative formats to output data. This feature is beneficial if database load utilities or other programs are used that require a different input format. These alternative formats include:
- Logical Change Records (LCRs)
- FormatASCII
- FormatSQL
- FormatXML
Set Up a View

Objective | Command |
---|---|
To view the trail file header | Logdump 1> fileheader on |
To view the record header with the data | Logdump 2> ghdr on |
To add column information | Logdump 3> detail on |
To add hexadecimal and ASCII data values to the column list | Logdump 4> detail data |
To control how much record data is displayed | Logdump 5> reclen 280 |
Keep a Log of Your Session

Objective | Command |
---|---|
To start the logging of a Logdump session, use the Log option | Logdump> Log to MySession.txt |
To disable logging | Logdump> Log Stop |

When enabled, logging remains in effect for all sessions of Logdump until it's disabled with the Log Stop command.
Supported Scenarios
This table describes the different scenarios, considering that Integrated Extract and Integrated Delivery are not supported on any of the non-Oracle databases.

Source | Target | Extract | Replicat |
---|---|---|---|
Oracle 11.x | Oracle Database Cloud Service | Integrated Extract is supported | Integrated and Coordinated Delivery supported |
Oracle 12.1 | Oracle Database Cloud Service | Integrated Extract is supported | Integrated and Coordinated Delivery supported |
MySQL | Oracle Database Cloud Service | Only Classic Extract is supported | Integrated and Coordinated Delivery supported |
Note:
With Oracle 12.1, when not using multitenancy, you can still use Classic Extract; however, it can't be used when container/pluggable databases are used.
You can review detailed steps in the tutorial, Replicate On-Premises Data to Cloud with Oracle GoldenGate Cloud Service.
Configure Replication
Oracle Data Integration Platform Cloud replication requires connecting to and configuring database support for the replication node.
Configure a General Replication Process
- Connect to the node defined in the Manager parameter file mgr.prm, located at /u01/app/oracle/gghome/dirprm.
- Avoid using the root user to run Data Integration Platform Cloud processes; otherwise, operations performed later as the 'oracle' user can fail to read those files.
In the following table, you can review the parameters and descriptions necessary to configure a replication process.
Parameter | Description |
---|---|
Port | Establishes the TCP/IP port number on which Manager listens for requests. |
DynamicPortList | Specifies the ports that Manager can dynamically allocate. |
Autostart | Specifies the processes to be restarted after abnormal termination. |
LagReportHours | Sets the interval, in hours, at which Manager checks the lag for Extract and Replicat processing. Alternatively, this interval can be set in minutes. |
LagInfoMinutes | Specifies the interval at which Extract and Replicat send an informational message to the event log. Alternatively, this interval can be set in seconds or hours. |
LagCriticalMinutes | Specifies the interval at which Extract and Replicat send a critical message to the event log. Alternatively, this interval can be set in seconds or hours. |
PurgeOldExtracts | Purges the Data Integration Platform Cloud trails that are no longer needed, based on option settings. |
Note:
If you copy and paste text into parameter files, beware of editors that convert a double-minus into an em-dash.

Managing Trail Files

Use the PurgeOldExtracts parameter in the Manager parameter file to purge trail files when Data Integration Platform Cloud has finished processing them.
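Putting the parameters above together, a Manager parameter file might look like the following sketch. The port numbers, trail path, and intervals are illustrative values, not recommendations:

```
-- Sketch of dirprm/mgr.prm (all values are examples)
PORT 7809
DYNAMICPORTLIST 7810-7820
AUTOSTART ER *
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 60
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
```

Here PURGEOLDEXTRACTS with USECHECKPOINTS removes trail files only after checkpoints show they have been processed, keeping them for at least the MINKEEPDAYS window.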
Note:
Trail files, if not managed properly, can consume a significant amount of disk space!

Configure a Replication Process Using a DMZ (Demilitarized Zone) Server
A DMZ server is a public-facing computer host placed on a separate or isolated network segment. Its purpose is to provide an additional layer of network security between servers in the trusted network and servers in the public network.
Follow these four high-level steps to configure replication from an on-premises database to the cloud:
- Start the SSH proxy server on the DMZ server.
- Configure and start the online change capture process (Extract) on the on-premises server.
- Configure and start the data pump Extract on the on-premises server (SOCKS proxy pointing to the DMZ server).
- Configure and start the online change delivery process (Replicat) on the GGCS server.
1. Start the SSH Proxy Tunnel Setup on the DMZ Server
- Start the SSH SOCKS proxy server on the DMZ server.
Command syntax:
ssh -i <private_key file> -v -N -f -D <listening IP Address>:<listening IP port> <GGCS Oracle User>@<GGCS IP Address> > <socksproxy output file> 2>&1
Parameter | Description |
---|---|
-i | Private key file. |
-v | Verbose mode. |
-N | Don't execute a remote command (used for port forwarding). |
-f | Run the SSH process in the background. |
-D | Run as local dynamic application-level forwarding; acts as a SOCKS proxy server on the specified interface and port. |
> <socksproxy output file> 2>&1 | Redirect stdout and stderr to the output file. |
<listening IP Address> | DMZ server IP address. |
<listening IP port> | TCP/IP port number. |
- Verify that the SSH SOCKS proxy server has started successfully.
Check the socks proxy output file using the cat Unix command. Look for the lines:
Local connections to <dmz-server:port>
and
Local forwarding listening on <ip_address> port <port #>
This information helps you make sure you're pointing to the right DMZ server address and port.
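As an illustration, with a made-up DMZ address of 192.168.10.5 and port 1080, the verification described above can be simulated against a captured output file (the ssh invocation in the comment is an example only; the key path, user, and addresses are assumptions):

```shell
# Example invocation (not run here; host, key, and addresses are made up):
#   ssh -i ~/.ssh/ggcs_key -v -N -f -D 192.168.10.5:1080 \
#     oracle@203.0.113.25 > socksproxy.out 2>&1

# Simulated contents of the socks proxy output file (illustrative):
cat > socksproxy.out <<'EOF'
debug1: Local connections to 192.168.10.5:1080 forwarded to remote address socks:0
debug1: Local forwarding listening on 192.168.10.5 port 1080.
EOF

# Confirm the proxy is listening on the expected DMZ address and port.
grep 'Local forwarding listening' socksproxy.out
```

If the expected address or port doesn't appear, recheck the -D argument you passed to ssh before configuring the data pump.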
2. Configure and Start the Online Change Capture Process (Extract) on the On-premises Server
On the source server, create the online change capture (Extract) process, using the following commands:
GGSCI> add extract etpcadb, tranlog, begin now
GGSCI> add exttrail ./dirdat/ea, extract etpcadb, megabytes 100
GGSCI> start extract etpcadb
GGSCI> info extract etpcadb detail
3. Configure and Start the Data Pump Extract on the On-Premises Server
On the source server, create the datapump (Extract) process, using the following commands:
GGSCI> add extract ptpcadb, exttrailsource ./dirdat/ea
GGSCI> add rmttrail ./dirdat/pa, extract ptpcadb, megabytes 100
GGSCI> start extract ptpcadb
GGSCI> info extract ptpcadb detail
4. Configure and Start the Online Change Delivery Process (Replicat) on the Cloud Server
On the Data Integration Platform Cloud server, create the Change Delivery process (Replicat) using the following commands:
GGSCI> dblogin useridalias dipcuser_alias
GGSCI> add replicat rtpcadb integrated, exttrail ./dirdat/pa
GGSCI> start replicat rtpcadb
GGSCI> info replicat rtpcadb detail
You can review these detailed steps by following the tutorial Replicate On-Premises Data to Cloud with Oracle GoldenGate Cloud Service.
Configure GoldenGate for MySQL Server Targets
This chapter provides instructions for preparing your system for running Oracle GoldenGate. It contains the following sections:
Ensure Data Availability
Retain enough binary log data so that if you stop Extract or there is an unplanned outage, Extract can start again from its checkpoints. Extract must have access to the binary log that contains the start of the oldest uncommitted unit of work, and all binary logs thereafter. The recommended retention period is at least 24 hours worth of transaction data, including both active and archived information. You might need to do some testing to determine the best retention time given your data volume and business requirements.
If data that Extract needs during processing was not retained, either in active or backup logs, one of the following corrective actions might be required:
-
Alter Extract to capture from a later point in time for which binary log data is available (and accept possible data loss on the target).
-
Resynchronize the source and target tables, and then start the Oracle GoldenGate environment over again.
To determine where the Extract checkpoints are, use the INFO EXTRACT command. For more information, see GGSCI Command Interface in Reference for Oracle GoldenGate.
Set Logging Parameters
To capture from the MySQL transaction logs, the Oracle GoldenGate Extract process must be able to find the index file, which in turn contains the paths of all binary log files.
Note:
Extract expects that all of the table columns are in the binary log. As a result, only binlog_row_image set to full is supported, and this is the default. Other values of binlog_row_image are not supported.
Extract checks the following parameter settings to get this index file path:
Add Host Names
Oracle GoldenGate gets the name of the database it is supposed to connect to from the SOURCEDB parameter. A successful connection depends on the localhost entry being properly configured in the system host file. To avoid issues that arise from improper local host configuration, you can use SOURCEDB in the following format:
SOURCEDB database_name@host_name
Where database_name is the name of the MySQL instance, and host_name is the name or IP address of the local host. If using an unqualified host name, that name must be properly configured in the DNS database. Otherwise, use the fully qualified host name, for example myhost.company.com.
Set the Session Character Set
The GGSCI, Extract, and Replicat processes use a session character set when connecting to the database. For MySQL, the session character set is taken from the SESSIONCHARSET option of SOURCEDB and TARGETDB. Make certain you specify a session character set in one of these ways when you configure Oracle GoldenGate.
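A connection entry combining SOURCEDB with SESSIONCHARSET might look like the following sketch. The database name, host, credentials, and character set are all made-up examples, and the exact option placement should be confirmed against Reference for Oracle GoldenGate:

```
-- Sketch of an Extract connection entry for MySQL (all names are examples):
SOURCEDB mydb@myhost.company.com, SESSIONCHARSET utf8, USERID ggsuser, PASSWORD ggspwd
```

The same SESSIONCHARSET option applies to TARGETDB on the Replicat side.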
Configure Bi-directional Replication
In a bi-directional configuration, there are Extract and Replicat processes on both the source and target systems to support the replication of transactional changes on each system to the other system. To support this configuration, each Extract must be able to filter the transactions applied by the local Replicat, so that they are not recaptured and sent back to their source in a continuous loop. Additionally, AUTO_INCREMENT columns must be set so that there is no conflict between the values on each system.
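One common way to keep AUTO_INCREMENT values from colliding is to interleave the ranges with MySQL's auto_increment_increment and auto_increment_offset system variables. The sketch below assumes a two-system setup; the values are illustrative:

```
-- On system A (in my.cnf or via SET GLOBAL); values are examples:
SET GLOBAL auto_increment_increment = 2;  -- both systems step by 2
SET GLOBAL auto_increment_offset    = 1;  -- A generates odd values

-- On system B:
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;  -- B generates even values
```

With these settings, new rows inserted on each system draw from disjoint key ranges, so replicated inserts cannot conflict on the AUTO_INCREMENT column.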
Other Oracle GoldenGate Parameters for MySQL
The following parameters may be of use in MySQL installations, and might be required if non-default settings are used for the MySQL database. Other Oracle GoldenGate parameters will be required in addition to these, depending on your intended business use and configuration.
Table 18-1 Other Parameters for Oracle GoldenGate for MySQL
Parameter | Description |
---|---|
DBOPTIONS CONNECTIONPORT | Required to specify to the VAM the TCP/IP connection port number of the MySQL instance to which an Oracle GoldenGate process must connect if MySQL is not running on the default of 3306. For example: DBOPTIONS CONNECTIONPORT 3307 |
DBOPTIONS HOST | Specifies the DNS name or IP address of the system hosting MySQL to which Replicat must connect. |
DBOPTIONS ALLOWLOBDATATRUNCATE | Prevents Replicat from abending when replicated LOB data is too large for a target MySQL column. |
SOURCEDB | Specifies database connection information consisting of the database, user name, and password to use by an Oracle GoldenGate process that connects to a MySQL database. If MySQL is not running on the default port of 3306, you must specify a complete connection string that includes the port number: DBLOGIN SOURCEDB dbname@hostname:port, USERID user, PASSWORD password. For example: GGSCI> DBLOGIN SOURCEDB mydb@mymachine:3307, USERID myuser, PASSWORD mypassword |
SQLEXEC | To enable Replicat to bypass the MySQL connection timeout, configure a SQLEXEC statement in the Replicat parameter file. |
See Oracle GoldenGate Parameters in Reference for Oracle GoldenGate.
See Introduction to Oracle GoldenGate in Administering Oracle GoldenGate.
Prepare Tables for Processing
This section describes how to prepare the tables for processing. Table preparation requires these tasks:
Position Extract to a Specific Start Point
You can position the ADD EXTRACT
and ALTER EXTRACT
commands to a specific start point in the transaction logs with the following command.
{ADD | ALTER EXTRACT}
group
, VAM, LOGNUM
log_num
, LOGPOS
log_pos
-
group
is the name of the Oracle GoldenGate Extract group for which the start position is required. -
log_num
is the log file number. For example, if the required log file name istest.000034
, this value is 34. Extract will search for this log file. -
log_pos
is an event offset value within the log file that identifies a specific transaction record. Event offset values are stored in the header section of a log record. To position at the beginning of abinlog
file, set thelog_pos
as 4. Thelog_pos
0 or 1 are not valid offsets to start reading and processing.
In MySQL logs, an event offset value can be unique only within a given binary file. The combination of the position value and a log number will uniquely identify a transaction record and cannot exceed a length of 37. Transactional records available after this position within the specified log will be captured by Extract. In addition, you can position an Extract using a timestamp.
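For example, to start capture at the beginning of binary log test.000034, the positioning described above would look like the following GGSCI sketch (the group name myext is made up):

```
GGSCI> ADD EXTRACT myext, VAM, LOGNUM 34, LOGPOS 4
```

Here LOGNUM 34 names the binary log file and LOGPOS 4 positions at the start of that file, per the offset rules above.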
Change the Log-Bin Location
Modifying the binary log location by using the log-bin variable in the MySQL configuration file might result in two different path entries inside the index file, which could result in errors. To avoid any potential errors, change the log-bin location by doing the following:
- Stop any new DML operations.
- Let Extract finish processing all of the existing binary logs. You can verify this by noting when the checkpoint position reaches the offset of the last log.
- After Extract finishes processing the data, stop the Extract group and, if necessary, back up the binary logs.
- Stop the MySQL database.
- Modify the log-bin path for the new location.
- Start the MySQL database.
- To clean the old log name entries from the index file, use flush master or reset master (based on your MySQL version).
- Start Extract.
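The log-bin modification step can be sketched against a MySQL option file. The file location, the seed contents, and both binlog paths below are made-up examples:

```shell
# Seed an option file as it might look before the move (illustrative).
cat > ./my.cnf <<'EOF'
[mysqld]
log-bin=/var/lib/mysql/binlog/test
EOF

# Point log-bin at the new location (example path); run while MySQL is stopped.
sed -i 's|^log-bin=.*|log-bin=/u01/mysql/binlog/test|' ./my.cnf

grep '^log-bin=' ./my.cnf
# prints: log-bin=/u01/mysql/binlog/test
```

After restarting MySQL, remember to clean the old log name entries from the index file as described in the steps above.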
Configure GoldenGate for Big Data Targets
This procedure helps you configure and run Oracle GoldenGate for Big Data to extend the capabilities of Oracle GoldenGate instances. Oracle GoldenGate for Big Data supports specific Big Data handler configurations. It details how each of the Oracle GoldenGate for Big Data Handlers is compatible with the various data collections, including distributions, database releases, and drivers.
See the following table for more information.
Topic | Description |
---|---|
This chapter provides an introduction to Oracle GoldenGate for Big Data concepts and features. It includes how to verify and set up the environment, use it with Replicat, logging data, and other configuration details. |
|
This chapter explains the Cassandra Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Elasticsearch Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Flume Handler and includes examples so that you can understand this functionality. |
|
The HBase Handler allows you to populate HBase tables from existing Oracle GoldenGate supported sources. |
|
This chapter explains the HDFS Handler, which is designed to stream change capture data into the Hadoop Distributed File System (HDFS). |
|
The Generic Java Database Connectivity (JDBC) is a handler that lets you replicate source transactional data to a target system or database. This chapter explains the Java Database Connectivity (JDBC) Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Kafka Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Kafka Connect Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Kinesis Streams Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the MongoDB Handler and includes examples so that you can understand this functionality. |
|
This chapter explains the Metadata Provider functionality, different types of Metadata Providers, and examples that can be used to understand the functionality. |
|
Formatters provide the functionality to convert operations from the Oracle GoldenGate trail file into formatted messages that can then be sent to Big Data targets by one of the Oracle GoldenGate for Big Data Handlers. |
|
This appendix lists the Cassandra client dependencies for Apache Cassandra. |
|
This appendix lists the Elasticsearch transport client dependencies. |
|
This appendix lists the Flume client dependencies for Apache Flume. |
|
This appendix lists the HBase client dependencies for Apache HBase. The hbase-client-x.x.x.jar is not distributed with Apache HBase nor is it mandatory to be in the classpath. The hbase-client-x.x.x.jar is an empty maven project with the purpose of aggregating all of the HBase client dependencies. |
|
This appendix lists the HDFS client dependencies for Apache Hadoop. The hadoop-client-x.x.x.jar is not distributed with Apache Hadoop nor is it mandatory to be in the classpath. The hadoop-client-x.x.x.jar is an empty maven project with the purpose of aggregating all of the Hadoop client dependencies. |
|
This appendix lists the Kafka client dependencies for Apache Kafka. |
|
This appendix lists the Kafka Connect client dependencies for Apache Kafka. |
|
Oracle GoldenGate recommends that you use the 3.2.2 MongoDB Java Driver integration with MongoDB using mongo-java-driver-3.2.2.jar. You can download this driver from: |
|
Understanding the Java Adapter and Oracle GoldenGate for Big Data | The Oracle GoldenGate Java Adapter is the overall framework. It allows you to implement custom code to handle Oracle GoldenGate trail records, according to their specific requirements. It comes with a built-in Oracle GoldenGate File Writer module that can be used for flat file integration purposes. |