1 Introduction to GoldenGate for Big Data

This chapter provides an introduction to Oracle GoldenGate for Big Data concepts and features. It describes how to verify and set up the environment, how to run with Replicat, how to configure logging, and other configuration details. It contains the following sections:

Introduction
Understanding What is Supported
Setting Up Oracle GoldenGate for Big Data
Configuring GoldenGate for Big Data

1.1 Introduction

The Oracle GoldenGate for Big Data integrations run as pluggable functionality in the Oracle GoldenGate Java Delivery framework, also referred to as the Java Adapters framework, and extend the Java Delivery functionality. Oracle recommends that you review the Java Delivery description in Administering Oracle GoldenGate Application Adapters.

1.2 Understanding What is Supported

Oracle GoldenGate for Big Data supports specific configurations, and its handlers are compatible with clearly defined software versions. This section provides the relevant support information.


1.2.1 Verifying Certification and System Requirements

Make sure that you are installing your product on a supported hardware or software configuration. For more information, see the certification document for your release on the Oracle Fusion Middleware Supported System Configurations page.

Oracle has tested and verified the performance of your product on all certified systems and environments; whenever new certifications occur, they are added to the proper certification document right away. New certifications can occur at any time, and for this reason the certification documents are kept outside of the documentation libraries and are available on Oracle Technology Network.

1.2.2 Understanding Handler Compatibility

This section describes how each of the Oracle GoldenGate for Big Data Handlers is compatible with the various distributions, database releases, and drivers.


1.2.2.1 HDFS Handler

The HDFS Handler is designed to work with the following versions:

Distribution	Version
Apache Hadoop	2.7.x, 2.6.0, 2.5.x, 2.4.x, 2.3.0, 2.2.0, 3.0.0-alpha 1
Hortonworks Data Platform (HDP)	HDP 2.5 (HDFS 2.7.3), HDP 2.4 (HDFS 2.7.1), HDP 2.3 (HDFS 2.7.1), HDP 2.2 (HDFS 2.6.0), HDP 2.1 (HDFS 2.4.0)
Cloudera Distribution Including Apache Hadoop (CDH)	CDH 5.8.x (HDFS 2.6.0), CDH 5.7.x (HDFS 2.6.0), CDH 5.6.x (HDFS 2.6.0), CDH 5.5.x (HDFS 2.6.0), CDH 5.4.x (HDFS 2.6.0), CDH 5.3.x (HDFS 2.5.0), CDH 5.2.x (HDFS 2.5.0), CDH 5.1.x (HDFS 2.3.0)

1.2.2.2 HBase Handler

Cloudera HBase 5.4.x and later did not fully adopt the Apache HBase 1.0.0 client interface, so it is not fully in sync with the Apache HBase code line and does not provide reverse compatibility in that client interface. As a result, Cloudera HBase breaks binary compatibility with the HBase 1.0.0 interface, which causes a NoSuchMethodError when integrating with the Oracle GoldenGate for Big Data HBase Handler. You can solve this in either of the following ways:

Configure the HBase Handler to use the 0.98.x HBase interface by setting the HBase Handler configuration property, hBase98Compatible, to true.
Alternatively, you can use the Apache HBase client libraries when connecting to CDH 5.4.x and later HBase.

The HBase Handler is designed to work with the following:

Distribution	Version
Apache HBase	1.2.x, 1.1.x, 1.0.x; 0.98.x and 0.96.x when you set the `hBase98Compatible` property to `true`
Hortonworks Data Platform (HDP)	HDP 2.5 (HBase 1.1.2), HDP 2.4 (HBase 1.1.2), HDP 2.3 (HBase 1.1.1); HDP 2.2 (HBase 0.98.4) when you set the `hBase98Compatible` property to `true`
Cloudera Distribution Including Apache Hadoop (CDH)	CDH 5.8.x (HBase 1.2.0), CDH 5.7.x (HBase 1.2.0), CDH 5.6.x (HBase 1.0.0), CDH 5.5.x (HBase 1.0.0), CDH 5.4.x (HBase 1.0.0), CDH 5.3.x (HBase 0.98.6), CDH 5.2.x (HBase 0.98.6), CDH 5.1.x (HBase 0.98.1); all when you set the `hBase98Compatible` property to `true`

1.2.2.3 Flume Handler

The Oracle GoldenGate for Big Data Flume Handler works with Apache Flume versions 1.7.x, 1.6.x, 1.5.x, and 1.4.x. Compatibility with versions of Flume earlier than 1.4.0 is not guaranteed.

The Flume Handler is compatible with the following versions:

Distribution	Version
Apache Flume	1.7.x, 1.6.x, 1.5.x, 1.4.x
Hortonworks Data Platform (HDP)	HDP 2.5 (Flume 1.5.2), HDP 2.4 (Flume 1.5.2), HDP 2.3 (Flume 1.5.2), HDP 2.2 (Flume 1.5.2), HDP 2.1 (Flume 1.4.0)
Cloudera Distribution Including Apache Hadoop (CDH)	CDH 5.8.x (Flume 1.6.0), CDH 5.7.x (Flume 1.6.0), CDH 5.6.x (Flume 1.6.0), CDH 5.5.x (Flume 1.6.0), CDH 5.4.x (Flume 1.5.0), CDH 5.3.x (Flume 1.5.0), CDH 5.2.x (Flume 1.5.0), CDH 5.1.x (Flume 1.5.0)

1.2.2.4 Kafka Handler

The Kafka Handler is not compatible with Kafka 0.8.2.2 and earlier versions.

The Kafka Handler is designed to work with the following:

Distribution	Version
Apache Kafka	0.9.0.x, 0.10.0.0, 0.10.0.1
Hortonworks Data Platform (HDP)	HDP 2.5 (Kafka 0.10.0), HDP 2.4 (Kafka 0.9.0)
Cloudera Distribution of Apache Kafka	2.0.x (Kafka 0.9.0.0)
Confluent Platform	3.0.1 (Kafka 0.10.0.0), 2.0.0 (Kafka 0.9.0.0)

Cloudera Distribution Including Apache Hadoop (CDH) does not currently include Kafka; Cloudera distributes Kafka separately as Cloudera Distribution of Apache Kafka.

1.2.2.5 Cassandra Handler

The Cassandra Handler uses the Datastax 3.1.0 Java Driver for Apache Cassandra to stream change data captured in a source trail file into the corresponding tables in the Cassandra database.

The Cassandra Handler is designed to work with the following versions:

Distribution	Version
Apache Cassandra	1.2, 2.0, 2.1, 2.2, 3.0
Datastax Enterprise Cassandra	3.2, 4.0, 4.5, 4.6, 4.7, 4.8

1.2.2.6 MongoDB Handler

The MongoDB handler uses the native Java driver version 3.2.2. It is compatible with the following MongoDB versions:

MongoDB 2.4
MongoDB 2.6
MongoDB 3.0
MongoDB 3.2
MongoDB 3.4

1.2.2.7 JDBC Handler

The JDBC handler internally uses the generic JDBC API. Although it should work with any JDBC-compliant database driver, Oracle has certified the JDBC handler against the following targets:

Oracle Database target using Oracle JDBC driver.
MySQL Database target using MySQL JDBC driver.
IBM Netezza target using Netezza JDBC driver.
Amazon Redshift target using Redshift JDBC driver.

1.2.3 What are the Additional Support Considerations?

This section describes additional support considerations for the Oracle GoldenGate for Big Data Handlers.

Pluggable Formatters—Support

The handlers support the Pluggable Formatters as described in Using the Pluggable Formatters as follows:

The HDFS Handler supports all of the pluggable formatters.
Pluggable formatters are not applicable to the HBase Handler. Data is streamed to HBase using the proprietary HBase client interface.
The Flume Handler supports all of the pluggable formatters described in Using the Pluggable Formatters.
The Kafka Handler supports all of the pluggable formatters described in Using the Pluggable Formatters.
The Cassandra, MongoDB, and JDBC Handlers do not use a pluggable formatter.

Avro Formatter—Improved Support for Binary Source Data

In previous releases, the Avro Formatter did not support the Avro bytes data type. Binary data was instead converted to Base64 and persisted in Avro messages as a field with a string data type. This required an additional conversion step to convert the data from Base64 back to binary.

The Avro Formatter now identifies binary source fields, maps them to Avro bytes fields, and propagates the original byte stream from the source trail file to the corresponding Avro messages without converting it to Base64.
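The following sketch uses only java.util.Base64, not any Avro API, to illustrate the extra decoding step that consumers of the earlier Base64 string fields had to perform to recover the original binary value. The class name and sample bytes are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

final class BinaryFieldSketch {
    // Earlier releases: binary source data was persisted as a Base64 string field.
    static String toBase64Field(byte[] sourceBytes) {
        return Base64.getEncoder().encodeToString(sourceBytes);
    }

    // Consumers of those messages needed this extra step to get the original bytes back.
    static byte[] fromBase64Field(String field) {
        return Base64.getDecoder().decode(field);
    }

    public static void main(String[] args) {
        // Hypothetical binary column value from a source trail file.
        byte[] source = {(byte) 0xCA, (byte) 0xFE, 0x00, 0x01, (byte) 0xFF};

        String legacyField = toBase64Field(source);
        byte[] recovered = fromBase64Field(legacyField);

        System.out.println("Base64 string field: " + legacyField
                + " (" + legacyField.length() + " characters for " + source.length + " bytes)");
        System.out.println("Decoded bytes match source: " + Arrays.equals(source, recovered));
        // With Avro bytes support, the original byte array is written as-is; no decode step is needed.
    }
}
```

Besides the decode step, the Base64 form is larger: every 3 source bytes become 4 characters.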

Avro Formatter—Generic Wrapper

The schema_hash field was changed to the schema_fingerprint field. The schema_fingerprint is a long and is generated using the parsingFingerprint64(Schema s) method on the org.apache.avro.SchemaNormalization class. This identifier provides better traceability from the Generic Wrapper Message back to the Avro schema that is used to generate the Avro payload message contained in the Generic Wrapper Message.
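According to the Avro specification, the value returned by parsingFingerprint64 is the 64-bit Rabin fingerprint (CRC-64-AVRO) of the schema's Parsing Canonical Form. The following stdlib-only sketch reimplements that fingerprint from the algorithm published in the Avro specification to show how a schema_fingerprint value relates to a schema. It assumes the input string is already in Parsing Canonical Form (producing that form is not shown); in real code, call the Avro library method instead.

```java
import java.nio.charset.StandardCharsets;

final class AvroFingerprintSketch {
    // Seed value of the 64-bit Rabin fingerprint (CRC-64-AVRO) from the Avro specification.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];

    static {
        // Precompute the per-byte lookup table.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    // Folds each byte into the running fingerprint.
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Assumes canonicalSchema is already in Parsing Canonical Form.
    static long canonicalFingerprint(String canonicalSchema) {
        return fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String v1 = "{\"name\":\"ORDERS\",\"type\":\"record\",\"fields\":[{\"name\":\"ID\",\"type\":\"long\"}]}";
        String v2 = "{\"name\":\"ORDERS\",\"type\":\"record\",\"fields\":[{\"name\":\"ID\",\"type\":\"long\"},"
                + "{\"name\":\"NOTE\",\"type\":\"string\"}]}";
        // Adding a field changes the fingerprint, so a wrapper message can be traced to its schema version.
        System.out.println("v1 fingerprint: " + canonicalFingerprint(v1));
        System.out.println("v2 fingerprint: " + canonicalFingerprint(v2));
    }
}
```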

JSON Formatter—Row Modeled Data

The JSON formatter supports row modeled data in addition to operation modeled data. Row modeled data includes the after image data for insert operations, the after image data for update operations, the before image data for delete operations, and special handling for primary key updates.
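The following sketch models which image row modeled output carries for each operation type. The OpType enum and rowImage method are hypothetical names for illustration, and the special handling of primary key updates is deliberately not modeled.

```java
import java.util.HashMap;
import java.util.Map;

final class RowModelSketch {
    enum OpType { INSERT, UPDATE, DELETE }

    // Chooses the image that row modeled output carries for an operation.
    static Map<String, Object> rowImage(OpType op, Map<String, Object> before, Map<String, Object> after) {
        switch (op) {
            case INSERT:
            case UPDATE:
                return after;  // after image for inserts and updates
            case DELETE:
                return before; // before image for deletes
            default:
                throw new IllegalArgumentException("Unsupported operation type: " + op);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> before = new HashMap<>();
        before.put("ID", 1);
        before.put("NAME", "old name");
        Map<String, Object> after = new HashMap<>();
        after.put("ID", 1);
        after.put("NAME", "new name");

        System.out.println("UPDATE row: " + rowImage(OpType.UPDATE, before, after).get("NAME")); // new name
        System.out.println("DELETE row: " + rowImage(OpType.DELETE, before, null).get("NAME"));  // old name
    }
}
```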

Java Delivery Using Extract

Java Delivery using Extract is deprecated in this release and is no longer supported. Java Delivery is supported only with the Replicat process, which provides better performance, better support for checkpointing, and better control of transaction grouping.

Kafka Handler—Versions

Support for Kafka versions 0.8.2.2, 0.8.2.1, and 0.8.2.0 was discontinued. This allowed the implementation of the flush call on the Kafka producer, which provides better support for flow control and checkpointing.

HDFS Handler—File Creation

A new feature was added to the HDFS Handler so that you can use Extract, Load, Transform (ELT). The new gg.handler.name.openNextFileAtRoll=true property was added to create new files immediately when the previous file is closed. The new file appears in the HDFS directory immediately after the previous file stream is closed.

This feature does not work when writing HDFS files in Avro Object Container File (OCF) format or sequence file format.

MongoDB Handler—Support

The handler can only replicate unique rows from source table. If a source table has no primary key defined and has duplicate rows, replicating the duplicate rows to the MongoDB target results in a duplicate key error and the Replicat process abends.
Missed updates and deletes are not detected and are therefore ignored.
The handler has not been tested with sharded collections.
Date and time data types are supported only with millisecond precision. Values in the trail with microsecond or nanosecond precision are truncated to millisecond precision.
The datetime data type with time zone in the trail is not supported.
The maximum BSON document size is 16 MB. If the trail record size exceeds this limit, the handler cannot replicate the record.
No DDL propagation.
No truncate operation.
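The following sketch illustrates two of these limits with java.time: truncating a nanosecond-precision timestamp to millisecond precision, and checking a serialized size against the 16 MB (16 x 1024 x 1024 bytes) BSON limit. The class and method names are hypothetical, and the size check assumes the serialized document size is already known.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class MongoLimitsSketch {
    // Maximum BSON document size: 16 MB.
    static final int MAX_BSON_BYTES = 16 * 1024 * 1024;

    // Digits finer than milliseconds are dropped, not rounded.
    static Instant toMillisPrecision(Instant trailValue) {
        return trailValue.truncatedTo(ChronoUnit.MILLIS);
    }

    // A record whose serialized document exceeds the limit cannot be replicated.
    static boolean fitsInBsonDocument(int serializedSizeBytes) {
        return serializedSizeBytes <= MAX_BSON_BYTES;
    }

    public static void main(String[] args) {
        Instant trailValue = Instant.parse("2017-01-15T10:30:45.123456789Z");
        System.out.println(toMillisPrecision(trailValue));            // 2017-01-15T10:30:45.123Z
        System.out.println(fitsInBsonDocument(MAX_BSON_BYTES + 1));   // false
    }
}
```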

JDBC Handler—Support

The JDBC handler uses the generic JDBC API, which means any target database with a JDBC driver implementation should be able to use this handler. Many databases support the JDBC API, and Oracle cannot certify the JDBC Handler for all of them. Oracle has certified the JDBC Handler for the following RDBMS targets:
- Oracle
- MySQL
- Netezza
- Redshift
The handler supports Replicat using the REPERROR and HANDLECOLLISIONS parameters; see Reference for Oracle GoldenGate for Windows and UNIX.
The database metadata retrieved through the Redshift JDBC driver has known constraints; see Release Notes for Oracle GoldenGate for Big Data.

Redshift target table names in the Replicat parameter file must be in lower case and double quoted. For example:
```
MAP SourceSchema.SourceTable, TARGET "public"."targetable";
```
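A hypothetical helper that builds such a MAP statement, lower-casing and double-quoting the target schema and table names, might look like the following sketch. It is for illustration only and does not escape embedded double quotes.

```java
import java.util.Locale;

final class RedshiftNameSketch {
    // Lower-cases an identifier and wraps it in double quotes.
    static String quoteLower(String identifier) {
        return "\"" + identifier.toLowerCase(Locale.ROOT) + "\"";
    }

    // Builds a MAP statement whose Redshift target schema and table are lower case and double quoted.
    static String mapStatement(String sourceTable, String targetSchema, String targetTable) {
        return "MAP " + sourceTable + ", TARGET " + quoteLower(targetSchema) + "." + quoteLower(targetTable) + ";";
    }

    public static void main(String[] args) {
        System.out.println(mapStatement("SourceSchema.SourceTable", "PUBLIC", "TargetTable"));
        // MAP SourceSchema.SourceTable, TARGET "public"."targettable";
    }
}
```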
DDL operations are ignored by default and are logged with a WARN level.
Coordinated Replicat is a multithreaded process that applies transactions in parallel instead of serially. Each thread handles all of the filtering, mapping, conversion, SQL construction, and error handling for its assigned workload. A coordinator thread coordinates transactions across threads to account for dependencies. It ensures that DML is applied in a synchronized manner, preventing certain DML operations from occurring on the same object at the same time because of row locking, block locking, or table locking issues based on database-specific rules. If there are database locking issues, Coordinated Replicat can become extremely slow or pause; see Administering Oracle GoldenGate for Windows and UNIX.

1.3 Setting Up Oracle GoldenGate for Big Data

This section describes the tasks that you need to perform to set up Oracle GoldenGate for Big Data integrations with Big Data targets.


1.3.1 Java Environment Setup

The Oracle GoldenGate for Big Data integrations create an instance of the Java virtual machine at runtime. Oracle GoldenGate for Big Data requires that you install Oracle Java 8 JRE at a minimum.

Oracle recommends that you set the JAVA_HOME environment variable to point to the Java 8 installation directory. Additionally, the Java Delivery process needs to load the libjvm.so (libjvm.dll on Windows) and libjsig.so (libjsig.dll on Windows) Java shared libraries, which are installed as part of the JRE. Set the appropriate environment variable (LD_LIBRARY_PATH, PATH, or LIBPATH) so that the location of these shared libraries can be resolved and the libraries can be loaded at runtime.
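The following sketch checks the running JVM against the Java 8 minimum and reports which library path variable to inspect. The mapping from operating system name to variable (PATH on Windows, LIBPATH on AIX, LD_LIBRARY_PATH elsewhere) is an assumption for illustration and does not cover every platform.

```java
import java.util.Locale;

final class JavaEnvCheckSketch {
    // java.specification.version is "1.8" on Java 8 and "9", "11", "17", ... on later releases.
    static int javaMajorVersion(String specVersion) {
        String major = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        return Integer.parseInt(major);
    }

    // Assumed mapping from os.name to the variable that resolves libjvm and libjsig.
    static String libraryPathVariable(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "PATH";
        }
        if (os.startsWith("aix")) {
            return "LIBPATH";
        }
        return "LD_LIBRARY_PATH";
    }

    public static void main(String[] args) {
        int major = javaMajorVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major + (major >= 8 ? " meets" : " does not meet") + " the Java 8 minimum");
        System.out.println("JAVA_HOME=" + System.getenv("JAVA_HOME"));
        String variable = libraryPathVariable(System.getProperty("os.name"));
        System.out.println(variable + "=" + System.getenv(variable));
    }
}
```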

1.3.2 Properties Files

Two Oracle GoldenGate properties files are required to run the Oracle GoldenGate Java Delivery user exit (also called the Oracle GoldenGate Java Adapter), which hosts Java integrations, including the Big Data integrations. The first is the Replicat properties file, which must follow the naming convention process_name.prm. The exit syntax in the Replicat properties file provides the name and location of the second file, the Java Adapter properties file, which contains the configuration properties for the Java Adapter, including the GoldenGate for Big Data integrations.

Alternatively, the Java Adapter properties file can be resolved using the default naming convention, process_name.properties. If you use the default name, you can omit the name of the Java Adapter properties file from the Replicat properties file.
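The naming conventions can be summarized in a small hypothetical helper. The resolution rule in adapterPropertiesFile (use the name given in the Replicat properties file, otherwise fall back to process_name.properties) is an assumption based on the description above, not the Java Adapter's actual code.

```java
final class PropertiesFileNamesSketch {
    // Replicat properties file: process_name.prm
    static String replicatParameterFile(String processName) {
        return processName + ".prm";
    }

    // Java Adapter properties file: the name given in the Replicat file, or process_name.properties by default.
    static String adapterPropertiesFile(String processName, String nameFromReplicatFile) {
        if (nameFromReplicatFile == null || nameFromReplicatFile.isEmpty()) {
            return processName + ".properties";
        }
        return nameFromReplicatFile;
    }

    public static void main(String[] args) {
        System.out.println(replicatParameterFile("hdfs"));                            // hdfs.prm
        System.out.println(adapterPropertiesFile("hdfs", "dirprm/hdfs.properties"));  // dirprm/hdfs.properties
        System.out.println(adapterPropertiesFile("hdfs", null));                      // hdfs.properties
    }
}
```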

Samples of the properties files for Oracle GoldenGate for Big Data integrations can be found in the subdirectories of the following directory:

GoldenGate_install_dir/AdapterExamples/big-data

1.3.3 Transaction Grouping

The principal way to improve performance in Oracle GoldenGate for Big Data integrations is to use transaction grouping. In transaction grouping, the operations of multiple transactions are grouped together into a single larger transaction. Applying one larger grouped transaction is typically much more efficient than applying many smaller individual transactions. Transaction grouping is possible with the Replicat process, discussed in Running with Replicat.
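The following sketch models grouping under stated assumptions: whole source transactions are accumulated into the open group until it holds at least a threshold number of operations, a source transaction is never split, and any remainder is committed at the end of input. The class and method names are hypothetical; this is a model, not the Replicat implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class TransactionGroupingSketch {
    // Returns the number of operations in each target transaction.
    static List<Integer> groupedCommitSizes(int[] opsPerSourceTransaction, int minOpsPerGroup) {
        List<Integer> commits = new ArrayList<>();
        int pending = 0;
        for (int ops : opsPerSourceTransaction) {
            pending += ops;                  // the whole source transaction joins the open group
            if (pending >= minOpsPerGroup) { // threshold reached at a transaction boundary
                commits.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            commits.add(pending);            // end of input: commit whatever remains
        }
        return commits;
    }

    public static void main(String[] args) {
        int[] source = {400, 300, 500, 200, 900, 100};
        System.out.println(groupedCommitSizes(source, 1000)); // [1200, 1100, 100]
        System.out.println(groupedCommitSizes(source, 1));    // [400, 300, 500, 200, 900, 100]
    }
}
```

In this model, six source commits become three target commits at a threshold of 1000, while a threshold of 1 leaves every source transaction as its own target transaction.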

1.4 Configuring GoldenGate for Big Data

This section describes how to configure GoldenGate for Big Data Handlers.


1.4.1 Running with Replicat

This section explains how to run the Java Adapter with the Oracle GoldenGate Replicat process.

1.4.1.1 Configuring Replicat

The following is an example of how you can configure a Replicat process properties file for use with the Java Adapter:

```
REPLICAT hdfs
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties
--SOURCEDEFS ./dirdef/dbo.def
DDL INCLUDE ALL
GROUPTRANSOPS 1000
MAPEXCLUDE dbo.excludetable
MAP dbo.*, TARGET dbo.*;
```

The following is an explanation of these Replicat configuration entries:

REPLICAT hdfs - The name of the Replicat process.

TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties - Sets the target database to the user exit library libggjava.so and sets the Java Adapter properties file to dirprm/hdfs.properties.

--SOURCEDEFS ./dirdef/dbo.def - Sets a source database definitions file. It is commented out because Oracle GoldenGate trail files provide metadata in trail.

GROUPTRANSOPS 1000 - Groups source transactions from the trail files into a single target transaction until the group contains at least 1000 operations. This is the default setting and improves the performance of Big Data integrations.

MAPEXCLUDE dbo.excludetable - Sets the tables to exclude.

MAP dbo.*, TARGET dbo.*; - Sets the mapping of input to output tables.

1.4.1.2 Adding the Replicat Process

The commands to add and start the Replicat process in GGSCI are the following:

```
ADD REPLICAT hdfs, EXTTRAIL ./dirdat/gg
START hdfs
```

1.4.1.3 Replicat Grouping

The Replicat process provides the Replicat configuration property GROUPTRANSOPS to control transaction grouping. By default, the Replicat process groups source transactions into a single target transaction containing at least 1000 operations. To turn off transaction grouping, set GROUPTRANSOPS to 1.

1.4.1.4 Replicat Checkpointing

In addition to the Replicat checkpoint file (.cpr), Replicat creates an additional checkpoint file, dirchk/group.cpj, which contains information similar to that in the CHECKPOINTTABLE used by Replicat for a database target.

1.4.1.5 Unsupported Replicat Features

The following Replicat features are not supported in this release:

BATCHSQL
SQLEXEC
Stored procedure
Conflict resolution and detection (CDR)
REPERROR

1.4.1.6 Mapping Functionality

The Oracle GoldenGate Replicat process supports mapping functionality to custom target schemas. You must use the Metadata Provider functionality to define a target schema or schemas, and then use the standard Replicat mapping syntax in the Replicat configuration file to define the mapping. For more information about the Replicat mapping syntax in the Replicat configuration file, see Administering Oracle GoldenGate for Windows and UNIX.

1.4.2 Logging

Logging is essential to troubleshooting Oracle GoldenGate for Big Data integrations with Big Data targets. This section describes how Oracle GoldenGate for Big Data integrations log and the best practices for logging. It includes the following sections:

Replicat Process Logging
Java Layer Logging

1.4.2.1 Replicat Process Logging

Oracle GoldenGate for Big Data integrations leverage the Java Delivery functionality described in Administering Oracle GoldenGate Application Adapters. In this setup, an Oracle GoldenGate Replicat process loads a user exit shared library. This shared library then loads a Java virtual machine, which interfaces with targets that provide a Java interface. The flow of data is as follows:

Replicat Process -> User Exit -> Java Layer

It is important that all layers log correctly so that users can review the logs to troubleshoot new installations and integrations. Additionally, if you have a problem that requires contacting Oracle Support, the log files are a key piece of information to be provided to Oracle Support so that the problem can be efficiently resolved.

A running Replicat process creates or appends to a log file in the GoldenGate_Home/dirrpt directory that follows the naming convention process_name.rpt. If a problem is encountered when deploying a new Oracle GoldenGate process, this is likely the first log file to examine. Because the Java layer is critical for integrations with Big Data applications, you should also review the Java layer logging described in the following section.

1.4.2.2 Java Layer Logging

The Oracle GoldenGate for Big Data product provides flexibility for logging from the Java layer. The recommended best practice is to use Log4j logging to log from the Java layer. Enabling simple Log4j logging requires setting two configuration values in the Java Adapter configuration file:

```
gg.log=log4j
gg.log.level=INFO
```

These gg.log settings create a Log4j log file in the GoldenGate_Home/dirrpt directory that follows the naming convention process_name_log_level_log4j.log. The supported Log4j log levels are listed below in order of increasing logging granularity:

OFF
FATAL
ERROR
WARN
INFO
DEBUG
TRACE

Selection of a logging level will include all of the coarser logging levels as well (that is, selection of WARN means that log messages of FATAL, ERROR and WARN will be written to the log file). The Log4j logging can additionally be controlled by separate Log4j properties files. These separate Log4j properties files can be enabled by editing the bootoptions property in the Java Adapter Properties file. These three example Log4j properties files are included with the installation and are included in the classpath:

log4j-default.properties
log4j-debug.properties
log4j-trace.properties
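The level inclusion rule described above can be modeled with an ordered enum. This is a hypothetical model for illustration only, not the Log4j API.

```java
import java.util.ArrayList;
import java.util.List;

final class LogLevelSketch {
    // Levels in order of increasing logging granularity.
    enum Level { OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE }

    // A message level is written if it is at least as coarse as the configured level.
    static boolean isWritten(Level configured, Level message) {
        return message != Level.OFF && message.ordinal() <= configured.ordinal();
    }

    // Lists every message level written at the configured level.
    static List<Level> writtenLevels(Level configured) {
        List<Level> written = new ArrayList<>();
        for (Level level : Level.values()) {
            if (isWritten(configured, level)) {
                written.add(level);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println("WARN writes: " + writtenLevels(Level.WARN)); // [FATAL, ERROR, WARN]
        System.out.println("OFF writes: " + writtenLevels(Level.OFF));   // []
    }
}
```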

You can modify the bootoptions property to select a Log4j properties file, as follows:

```
javawriter.bootoptions=-Xmx512m -Xms64m -Djava.class.path=.:ggjava/ggjava.jar -Dlog4j.configuration=samplelog4j.properties
```

You can use your own customized Log4j properties file to control logging. The customized Log4j properties file must be available in the Java classpath so that the JVM can locate and load it. The following is a sample custom Log4j properties file:

```
# Root logger option
log4j.rootLogger=INFO, file

# Direct log messages to a log file
log4j.appender.file=org.apache.log4j.RollingFileAppender

log4j.appender.file.File=sample.log
log4j.appender.file.MaxFileSize=1GB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```

There are two important requirements when you use a custom Log4j properties file. First, the path to the custom Log4j properties file must be included in the javawriter.bootoptions property, because logging is initialized as soon as the JVM starts, whereas the contents of the gg.classpath property are appended to the class loader only after logging has been initialized. Second, the classpath entry used to load a properties file must be the directory that contains the file, without wildcards appended.

1.4.3 Metadata Change Events

The Oracle GoldenGate for Big Data Handlers and Formatters provide functionality to take action when a metadata change event is encountered. The ability to take action in the case of metadata change events depends on the metadata change events being available in the source trail file. Oracle GoldenGate supports metadata in trail and the propagation of DDL data from a source Oracle Database. If the source trail file does not have metadata in trail and DDL data (metadata change events), then it is not possible for Oracle GoldenGate for Big Data to provide any metadata change event handling.

1.4.4 Configuration Property `CDATA[]` Wrapping

The GoldenGate for Big Data Handlers and Formatters support the configuration of many parameters in the Java properties file whose values may be interpreted as white space. The configuration handling of the Java Adapter trims white space from configuration values read from the Java configuration file. Trimming may be desirable for some configuration values and undesirable for others. To preserve white space for selected configuration values, GoldenGate for Big Data borrows the XML CDATA[] syntax: values that would be considered white space can be wrapped inside CDATA[].

The following is an example attempting to set a new-line delimiter for the Delimited Text Formatter:

```
gg.handler.{name}.format.lineDelimiter=\n
```

This configuration will not be successful. The new-line character is interpreted as white space and will be trimmed from the configuration value. Therefore the gg.handler setting effectively results in the line delimiter being set to an empty string.

In order to preserve the configuration of the new-line character simply wrap the character in the CDATA[] wrapper as follows:

```
gg.handler.{name}.format.lineDelimiter=CDATA[\n]
```

Configuring the property with the CDATA[] wrapping preserves the white space and the line delimiter will then be a new-line character.
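The following sketch models the behavior described above. It loads the two settings with java.util.Properties, whose load method turns the \n escape into a real new-line character, and then applies a hypothetical resolve method that trims white space and unwraps CDATA[]. The actual Java Adapter code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class CdataValueSketch {
    private static final String PREFIX = "CDATA[";

    // Hypothetical model: trim white space first, then unwrap a CDATA[...] wrapper if present.
    static String resolve(String rawValue) {
        String trimmed = rawValue.trim();
        if (trimmed.startsWith(PREFIX) && trimmed.endsWith("]")) {
            return trimmed.substring(PREFIX.length(), trimmed.length() - 1);
        }
        return trimmed;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Properties.load converts each \n escape sequence into a real new-line character.
        props.load(new StringReader(
                "plain=\\n\n"
                + "wrapped=CDATA[\\n]\n"));

        String plain = resolve(props.getProperty("plain"));
        String wrapped = resolve(props.getProperty("wrapped"));
        System.out.println("plain delimiter length: " + plain.length());              // 0
        System.out.println("wrapped delimiter is new-line: " + wrapped.equals("\n")); // true
    }
}
```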

1.4.5 Using Regular Expression Search and Replace

You can perform more powerful search and replace operations on both schema data (catalog names, schema names, table names, and column names) and column value data, which are configured separately. Regular expressions (regex) are character sequences that define a search pattern. You can match a string against a pattern or extract parts of the match. Oracle GoldenGate for Big Data uses the standard Oracle Java regular expressions package, java.util.regex. For more information, see "Regular Expressions" in the Base Definitions volume of The Single UNIX Specification, Version 4.

This section includes the following:

Using Schema Data Replace
Using Content Data Replace

1.4.5.1 Using Schema Data Replace

You can replace schema data using the gg.schemareplaceregex and gg.schemareplacestring properties. Use gg.schemareplaceregex to set a regular expression, and then use it to search catalog names, schema names, table names, and column names for corresponding matches. Matches are then replaced with the content of the gg.schemareplacestring value. The default value of gg.schemareplacestring is an empty string or "".

For example, some system table names start with a dollar sign like $mytable. You may want to replicate these tables even though most Big Data targets do not allow dollar signs in table names. To remove the dollar sign, you could configure the following replace strings:

```
gg.schemareplaceregex=[$]
gg.schemareplacestring=
```

The resulting table name is mytable. These properties also support CDATA[] wrapping to preserve white space in configuration values, so the equivalent of the preceding example using CDATA[] wrapping is:

```
gg.schemareplaceregex=CDATA[[$]]
gg.schemareplacestring=CDATA[]
```
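Assuming the configured expression is applied with java.util.regex.Matcher.replaceAll semantics, the dollar sign example behaves as in the following sketch. The class and method names are hypothetical. Note that with replaceAll, the characters $ and \ in the replacement string have special meaning.

```java
import java.util.regex.Pattern;

final class SchemaReplaceSketch {
    // Applies one search-and-replace pair to a catalog, schema, table, or column name.
    static String replaceName(String name, String regex, String replacement) {
        return Pattern.compile(regex).matcher(name).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Mirrors gg.schemareplaceregex=[$] with an empty gg.schemareplacestring.
        System.out.println(replaceName("$mytable", "[$]", "")); // mytable
    }
}
```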

The schema search and replace functionality supports multiple search regular expressions and replacement strings using the following configuration syntax:

```
gg.schemareplaceregex=some_regex
gg.schemareplacestring=some_value
gg.schemareplaceregex1=some_regex
gg.schemareplacestring1=some_value
gg.schemareplaceregex2=some_regex
gg.schemareplacestring2=some_value
```

1.4.5.2 Using Content Data Replace

You can replace content data using the gg.contentreplaceregex and gg.contentreplacestring properties to search the column values using the configured regular expression and replace matches with the replacement string. For example, this is useful to replace line feed characters in column values. If the delimited text formatter is used then line feeds occurring in the data will be incorrectly interpreted as line delimiters by analytic tools.

You can configure any number of content replacement regex search values. The regex searches and replacements are performed in the order of configuration, and the configured values must use consecutive subscripts as follows:

```
gg.contentreplaceregex=some_regex
gg.contentreplacestring=some_value
gg.contentreplaceregex1=some_regex
gg.contentreplacestring1=some_value
gg.contentreplaceregex2=some_regex
gg.contentreplacestring2=some_value
```

Configuring a subscript of 3 without a subscript of 2 would cause the subscript 3 configuration to be ignored.
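A hypothetical reader for these numbered properties might collect pairs in configuration order and stop at the first missing subscript, which reproduces the rule that a gap hides every later pair. The class and method names are illustrative, and treating a missing replacement string as an empty string is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ContentReplaceConfigSketch {
    static final String REGEX_KEY = "gg.contentreplaceregex";
    static final String STRING_KEY = "gg.contentreplacestring";

    // Collects (regex, replacement) pairs: the unsubscripted pair first, then subscripts 1, 2, 3, ...
    // Stops at the first missing regex, so a gap hides every later subscript.
    static List<String[]> readPairs(Properties props) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; ; i++) {
            String suffix = (i == 0) ? "" : Integer.toString(i);
            String regex = props.getProperty(REGEX_KEY + suffix);
            if (regex == null) {
                break;
            }
            // Assumption: a missing replacement string means "replace with nothing".
            pairs.add(new String[] {regex, props.getProperty(STRING_KEY + suffix, "")});
        }
        return pairs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gg.contentreplaceregex", "[\n]");
        props.setProperty("gg.contentreplacestring", " ");
        props.setProperty("gg.contentreplaceregex1", "^\\s+|\\s+$");
        props.setProperty("gg.contentreplacestring1", "");
        props.setProperty("gg.contentreplaceregex3", "x"); // ignored: subscript 2 is missing
        System.out.println("pairs read: " + readPairs(props).size()); // 2
    }
}
```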

Attention

Regular expression searches and replacements require additional processing and can reduce the performance of the Oracle GoldenGate for Big Data process.

To replace line feeds with a blank character you could use the following property configurations:

```
gg.contentreplaceregex=[\n]
gg.contentreplacestring=CDATA[ ]
```

This changes the column value from:

this is 
me

to:

this is me

Both values support CDATA wrapping. The second value must be wrapped in CDATA[] because a single blank space would be interpreted as white space and trimmed by the Oracle GoldenGate for Big Data configuration layer. You can also configure multiple search and replace strings. For example, to trim leading and trailing white space from column values in addition to replacing line feeds, add a second pair that uses the regular expression ^\\s+|\\s+$:

```
gg.contentreplaceregex1=^\\s+|\\s+$
gg.contentreplacestring1=CDATA[]
```
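The following sketch applies the two configured pairs in order to a sample column value, assuming Matcher.replaceAll semantics; the class and method names are hypothetical. The Java string literals correspond to what the properties file yields after escape processing: [\n] contains a real new-line character, and ^\\s+|\\s+$ becomes the regular expression ^\s+|\s+$.

```java
import java.util.regex.Pattern;

final class ContentReplaceSketch {
    // Applies each (regex, replacement) pair to the column value, in configuration order.
    static String apply(String columnValue, String[][] pairs) {
        String result = columnValue;
        for (String[] pair : pairs) {
            result = Pattern.compile(pair[0]).matcher(result).replaceAll(pair[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"[\n]", " "},        // gg.contentreplaceregex / gg.contentreplacestring=CDATA[ ]
            {"^\\s+|\\s+$", ""}   // gg.contentreplaceregex1 / gg.contentreplacestring1=CDATA[]
        };
        String value = "  this is\nme  ";
        System.out.println("[" + apply(value, pairs) + "]"); // [this is me]
    }
}
```

Order matters here: the line feed becomes a blank first, and the trimming pair then removes only the white space at the start and end of the value.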

1.4.6 Scaling Oracle GoldenGate for Big Data Delivery

Oracle GoldenGate for Big Data supports breaking down the source trail files into either multiple Replicat processes or by using Coordinated Delivery to instantiate multiple Java Adapter instances inside a single Replicat process to improve throughput. This allows you to scale Oracle GoldenGate for Big Data delivery.

There are some cases where the throughput to Oracle GoldenGate for Big Data integration targets is not sufficient to meet your service level agreements even after you have tuned your Handler for maximum performance. When this occurs, you can configure parallel processing and delivery to your targets using one of the following methods:

Multiple Replicat processes can be configured to read data from the same source trail files. Each of these Replicat processes are configured to process a subset of the data in the source trail files so that all of the processes collectively process the source trail files in their entirety. There is no coordination between the separate Replicat processes using this solution.
Oracle GoldenGate Coordinated Delivery can be used to parallelize processing the data from the source trail files within a single Replicat process. This solution involves breaking the trail files down into logical subsets for which each configured subset is processed by a different delivery thread. For more information about Coordinated Delivery, see https://blogs.oracle.com/dataintegration/entry/goldengate_12c_coordinated_replicat.

With either method, you can split the data into parallel processing for improved throughput. Oracle recommends breaking the data down in one of the following two ways:

Splitting Source Data By Source Table – Data is divided into subsections by source table. For example, Replicat process 1 might handle source tables table1 and table2, while Replicat process 2 might handle data for source tables table3 and table4. Data is split by source table, and the data for an individual table is not subdivided.
Splitting Source Table Data into Sub Streams – Data from source tables is split. For example, Replicat process 1 might handle half of the range of data from source table1, while Replicat process 2 might handle the other half of the data from source table1.
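The two splitting strategies can be sketched in Java as routing functions that assign each change record to one of several delivery paths (separate Replicat processes or Coordinated Delivery threads). The Change record, the table-to-path map, and the hash-based key routing are illustrative assumptions, not GoldenGate's own range or hashing logic; the sketch requires Java 16 or later for records.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: routes change records to numbered delivery paths.
// The routing rules are assumptions, not GoldenGate's internal logic.
public class SplitSketch {

    // Hypothetical change record: source table name plus a primary key value.
    record Change(String table, long key) {}

    // Splitting by source table: every row of a table goes to the same path,
    // so the data for an individual table is never subdivided.
    static int routeByTable(Change c, Map<String, Integer> tableToPath) {
        Integer path = tableToPath.get(c.table());
        if (path == null) {
            throw new IllegalArgumentException("Unmapped table: " + c.table());
        }
        return path;
    }

    // Splitting into sub streams: rows of one table are spread across paths by key.
    // floorMod keeps the result non-negative; +1 makes paths 1-based like the map above.
    static int routeByKey(Change c, int pathCount) {
        return 1 + Math.floorMod(Long.hashCode(c.key()), pathCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> tableToPath = Map.of(
            "table1", 1, "table2", 1,   // Replicat process 1
            "table3", 2, "table4", 2);  // Replicat process 2

        List<Change> changes = List.of(
            new Change("table1", 10), new Change("table3", 11), new Change("table1", 12));

        for (Change c : changes) {
            System.out.printf("%s key=%d byTable=%d byKey=%d%n",
                c.table(), c.key(), routeByTable(c, tableToPath), routeByKey(c, 2));
        }
    }
}
```

Routing by table keeps all data for a table on one path. Routing by key is deterministic, so every change for a given key lands on the same path, but a single table may then be written by several paths at once; this is one reason handlers that create or alter target tables may impose extra conditions on the sub-stream approach.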

Additional limitations:

Parallel apply is not supported.
The BATCHSQL parameter is not supported.

Example 1-1 Scaling Support for the Oracle GoldenGate for Big Data Handlers

Handler Name	Splitting Source Data By Source Table	Splitting Source Table Data into Sub Streams
HDFS	Supported	Not supported
HBase	Supported when all required HBase namespaces are pre-created in HBase.	Supported when all required HBase namespaces are pre-created in HBase, all required HBase target tables are pre-created in HBase, and the source data does not contain any truncate operations. Schema evolution is not an issue because HBase tables have no schema definitions, so a source metadata change does not require any schema change in HBase.
Kafka	Supported	Supported for formats that support schema propagation, such as Avro; however, this configuration is less desirable.
Flume	Supported	Supported for formats that support schema propagation, such as Avro; however, this configuration is less desirable.
Cassandra	Supported	Supported when required target tables in Cassandra are pre-created and metadata change events do not occur.
MongoDB	Supported	Supported
JDBC	Supported	Supported

1.4.7 Using Identities in Oracle GoldenGate Credential Store

The Oracle GoldenGate credential store manages user IDs and their encrypted passwords (together known as credentials) that are used by Oracle GoldenGate processes to interact with the local database. The credential store eliminates the need to specify user names and clear-text passwords in the Oracle GoldenGate parameter files. An optional alias can be used in the parameter file instead of the user ID to map to a userid and password pair in the credential store. The credential store is implemented as an auto login wallet within the Oracle Credential Store Framework (CSF). The use of an LDAP directory is not supported for the Oracle GoldenGate credential store. The auto login wallet supports automated restarts of Oracle GoldenGate processes without requiring human intervention to supply the necessary passwords.

In Oracle GoldenGate for Big Data, you specify the alias and domain in the properties file, not the actual user ID or password. User credentials are maintained in secure wallet storage.

This section includes the following:

1.4.7.1 Creating a Credential Store

You can create a credential store for your Big Data environment.

Run the GGSCI ADD CREDENTIALSTORE command to create a file called cwallet.sso in the dircrd/ subdirectory of your Oracle GoldenGate installation directory (the default).

You can change the location of the credential store (the cwallet.sso file) by specifying the desired location with the CREDENTIALSTORELOCATION parameter in the GLOBALS file.

For more information about credential store commands, see Reference for Oracle GoldenGate for Windows and UNIX.

Note:

Only one credential store can be used for each Oracle GoldenGate instance.

1.4.7.2 Adding Users to a Credential Store

After you create a credential store for your Big Data environment, you can add users to the store.

Run the GGSCI ALTER CREDENTIALSTORE ADD USER userid PASSWORD password [ALIAS alias] [DOMAIN domain] command to create each user, where:

userid is the user name. Only one instance of a user name can exist in the credential store unless the ALIAS or DOMAIN option is used.
password is the user's password. The password is echoed (not obfuscated) when this option is used. If this option is omitted, the command prompts for the password, which is obfuscated as it is typed (recommended because it is more secure).
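alias is an alias for the user name. The alias substitutes for the credential in parameters and commands where a login credential is required. If the ALIAS option is omitted, the alias defaults to the user name.
domain is the domain that contains the alias. Together, the alias and domain identify the credential entry that a properties file references.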

For example:

ALTER CREDENTIALSTORE ADD USER scott PASSWORD tiger ALIAS scsm2 DOMAIN ggadapters
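To illustrate the rule that a user name can appear only once unless ALIAS or DOMAIN is used, and that the alias defaults to the user name, the following hypothetical Java model keys entries by domain and alias. It is not the GGSCI implementation: the real store is the cwallet.sso auto login wallet, the class and method names are invented for illustration, and the placeholder default domain DEFAULT_DOMAIN is an assumption. The sketch requires Java 16 or later for records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of credential store entries; the real store is
// an auto login wallet (cwallet.sso) managed through GGSCI, not this class.
public class CredentialStoreModel {

    record Key(String domain, String alias) {}
    record Credential(String userId, String password) {}

    private final Map<Key, Credential> entries = new HashMap<>();

    // Loosely mirrors ALTER CREDENTIALSTORE ADD USER userid PASSWORD password
    // [ALIAS alias] [DOMAIN domain]. A missing alias defaults to the user name;
    // DEFAULT_DOMAIN is a placeholder assumed for this sketch.
    void addUser(String userId, String password, String alias, String domain) {
        String effectiveAlias = (alias != null) ? alias : userId;
        String effectiveDomain = (domain != null) ? domain : "DEFAULT_DOMAIN";
        Key key = new Key(effectiveDomain, effectiveAlias);
        // This model rejects duplicates to reflect the uniqueness rule above.
        if (entries.containsKey(key)) {
            throw new IllegalStateException("Alias already exists in domain: " + key);
        }
        entries.put(key, new Credential(userId, password));
    }

    // Look up a credential by alias and domain, as a properties file reference would.
    Credential lookup(String alias, String domain) {
        return Objects.requireNonNull(entries.get(new Key(domain, alias)),
            "No credential for alias " + alias + " in domain " + domain);
    }

    public static void main(String[] args) {
        CredentialStoreModel store = new CredentialStoreModel();
        store.addUser("scott", "tiger", "scsm2", "ggadapters");
        System.out.println(store.lookup("scsm2", "ggadapters").userId()); // scott
    }
}
```

Under this model, the same user name can be stored more than once only when each entry has a distinct alias or domain.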

For more information about credential store commands, see Reference for Oracle GoldenGate for Windows and UNIX.

1.4.7.3 Configuring Properties to Access the Credential Store

The Oracle GoldenGate Java Adapter properties file requires specific syntax to resolve user name and password entries in the Credential Store at runtime. To resolve a user name, use the following syntax:

ORACLEWALLETUSERNAME[alias domain_name]

To resolve a password, use the following syntax:

ORACLEWALLETPASSWORD[alias domain_name]

The following example illustrates how to configure a Credential Store entry with an alias of myalias and a domain of mydomain.

Note:

With HDFS Hive JDBC, the user name and password are encrypted.

gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[myalias mydomain] 
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[myalias mydomain]

Although the Credential Store is intended to store user name and password pair type credentials, you can apply this functionality more generically. Consider the user name and password entries as accessible values in the Credential Store. Any configuration property resolved in the Java Adapter layer (not accessed in the C user exit layer) can be resolved from the Credential Store. This gives you more flexibility in how you protect sensitive configuration entries.
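A minimal sketch of how property values written in the ORACLEWALLETUSERNAME[alias domain] form used in the example above could be recognized and resolved is shown below. The class WalletPropertyResolver, its regular expression, and the in-memory map standing in for the wallet (with placeholder values hiveuser and s3cret) are hypothetical; the Java Adapter resolves these references internally and its implementation may differ. Because resolve accepts any property value, the sketch also reflects the point that any Java Adapter layer property can be resolved from the Credential Store.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving credential store references in property values.
// The class, regex, and in-memory "wallet" are illustrative assumptions.
public class WalletPropertyResolver {

    // Matches ORACLEWALLETUSERNAME[alias domain] or ORACLEWALLETPASSWORD[alias domain].
    private static final Pattern REF =
        Pattern.compile("^ORACLEWALLET(USERNAME|PASSWORD)\\[(\\S+)\\s+(\\S+)\\]$");

    // Stand-in for the wallet: "domain/alias" -> {userId, password}.
    private final Map<String, String[]> wallet;

    WalletPropertyResolver(Map<String, String[]> wallet) {
        this.wallet = wallet;
    }

    // Returns the resolved value, or the input unchanged if it is not a wallet reference.
    String resolve(String value) {
        Matcher m = REF.matcher(value.trim());
        if (!m.matches()) {
            return value;
        }
        String alias = m.group(2);
        String domain = m.group(3);
        String[] entry = wallet.get(domain + "/" + alias);
        if (entry == null) {
            throw new IllegalArgumentException("No wallet entry for " + alias + " " + domain);
        }
        // Group 1 selects which half of the user name/password pair to return.
        return m.group(1).equals("USERNAME") ? entry[0] : entry[1];
    }

    public static void main(String[] args) {
        WalletPropertyResolver r = new WalletPropertyResolver(
            Map.of("mydomain/myalias", new String[] {"hiveuser", "s3cret"}));
        System.out.println(r.resolve("ORACLEWALLETUSERNAME[myalias mydomain]")); // hiveuser
        System.out.println(r.resolve("plain-value"));                          // plain-value
    }
}
```

Values that are not wallet references are returned unchanged, so a resolver of this shape can be applied to every property without special-casing.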
