1 Overview

This chapter describes Oracle GoldenGate for Big Data and explains how to set up its environment, run it with Replicat and Extract, configure logging, and set other configuration details. It contains the following sections:

1.1 Overview

The Oracle GoldenGate for Big Data integrations run as pluggable functionality in the Oracle GoldenGate Java Delivery framework, also referred to as the Java Adapters framework. Because much of the Big Data functionality employs and extends the Java Delivery functionality, Oracle recommends that you review the Java Delivery documentation in the Oracle GoldenGate Application Adapters Guide.

1.2 Java Environment Setup

The Oracle GoldenGate for Big Data integrations create an instance of the Java virtual machine at runtime. Oracle GoldenGate for Big Data requires Java 7. Oracle recommends that you set the JAVA_HOME environment variable to point to the Java 7 installation directory. Additionally, the Java Delivery process needs to load the libjvm.so (libjvm.dll on Windows) and libjsig.so (libjsig.dll on Windows) Java shared libraries, which are installed as part of the JRE. You must determine the location of these shared libraries and set the environment variable that resolves dynamic libraries (LD_LIBRARY_PATH, PATH, or LIBPATH, depending on the platform) so that the libraries can be loaded at runtime.
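The lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, only the Linux/UNIX library file names are handled, and the directory layout below JAVA_HOME varies by platform and JRE version, so the script searches for the files rather than assuming a fixed path.

```python
import os
from pathlib import Path

# Shared libraries named in the text above (Linux/UNIX file names).
REQUIRED_LIBS = ("libjvm.so", "libjsig.so")


def find_jvm_library_dirs(java_home):
    """Return the directories under java_home that contain the required libraries."""
    dirs = []
    for lib in REQUIRED_LIBS:
        # Sort so the result is deterministic if a library appears more than once.
        matches = sorted(Path(java_home).rglob(lib))
        if not matches:
            raise FileNotFoundError(f"{lib} not found under {java_home}")
        parent = str(matches[0].parent)
        if parent not in dirs:
            dirs.append(parent)
    return dirs


def extend_library_path(java_home, current=""):
    """Build a library path value with the JVM library directories placed first."""
    parts = find_jvm_library_dirs(java_home)
    if current:
        parts.append(current)
    return os.pathsep.join(parts)
```

On Linux, the returned string could be exported as LD_LIBRARY_PATH before starting the Replicat or Extract process, for example by passing os.environ["JAVA_HOME"] and the existing LD_LIBRARY_PATH value to extend_library_path.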

1.3 Properties Files

Two Oracle GoldenGate properties files are required to run the Oracle GoldenGate Java Delivery user exit (alternatively called the Oracle GoldenGate Java Adapter). The Oracle GoldenGate Java Delivery hosts Java integrations, including the Big Data integrations. It can run with either the Oracle GoldenGate Replicat or Extract process, although running with the Replicat process is considered the better practice. A Replicat or Extract properties file is required in order to run the Replicat or Extract process. The required naming convention for this file is process_name.prm. The user exit syntax in the Replicat or Extract properties file provides the name and location of the Java Adapter Properties file. The Java Adapter Properties file contains the configuration properties for the Java Adapter, including the Big Data integrations. Both properties files are required to run Oracle GoldenGate for Big Data integrations. Alternatively, the Java Adapter Properties file can be resolved using the default naming convention, process_name.properties. If you use the default naming for the Java Adapter Properties file, then its name can be omitted from the Replicat or Extract properties file.
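The naming rule can be illustrated with a small Python sketch. It is a simplified model, not how the user exit itself resolves the file: the function name is hypothetical, and the parsing only recognizes the TARGETDB ... SET property= form used in the Replicat example in section 1.5.1.

```python
import re


def adapter_properties_file(process_name, prm_text):
    """Return the Java Adapter properties file named in prm_text, or the default name."""
    for line in prm_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("--"):
            continue  # skip commented-out parameter lines
        match = re.search(r"\bSET\s+property=(\S+)", stripped, re.IGNORECASE)
        if match:
            return match.group(1)  # explicit name and location
    # No explicit name: fall back to the default convention, process_name.properties.
    return f"{process_name}.properties"
```

For a process named hdfs whose parameter file does not name a properties file, the sketch falls back to hdfs.properties.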

Samples of the properties files for Oracle GoldenGate for Big Data integrations can be found in the subdirectories of the following directory:

GoldenGate_install_dir/AdapterExamples/big-data

1.4 Transaction Grouping

The principal way to improve performance in Oracle GoldenGate for Big Data integrations is by the use of transaction grouping. In transaction grouping, the operations of multiple transactions are grouped together in a single larger transaction. The application of a larger grouped transaction is typically much more efficient than the application of individual smaller transactions. Transaction grouping is possible with both the Replicat and Extract processes and will be discussed in the following sections detailing running with Replicat or Extract.

1.5 Running with Replicat

This section explains how to run the Java Adapter with the Oracle GoldenGate Replicat process.

1.5.1 Replicat Configuration

The following is an example of a Replicat process properties file for the Java Adapter.

REPLICAT hdfs
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties 
--SOURCEDEFS ./dirdef/dbo.def 
DDL INCLUDE ALL
GROUPTRANSOPS 1000
MAPEXCLUDE dbo.excludetable
MAP dbo.*, TARGET dbo.*;

The following is an explanation of the Replicat configuration entries:

REPLICAT hdfs - The name of the Replicat process.

TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.properties - Names the target database as the user exit libggjava.so and sets the Java Adapter Properties file to dirprm/hdfs.properties.

--SOURCEDEFS ./dirdef/dbo.def - Sets a source database definitions file. Commented out because Oracle GoldenGate 12.2.0.1 trail files provide metadata in the trail.

GROUPTRANSOPS 1000 - Groups 1000 transactions from the source trail files into a single target transaction. This is the default and improves the performance of Big Data integrations.

MAPEXCLUDE dbo.excludetable - Identifies tables to exclude.

MAP dbo.*, TARGET dbo.*; - Shows the mapping of input to output tables.

1.5.2 Adding the Replicat Process

The commands to add and start the Replicat process in GGSCI are the following:

ADD REPLICAT hdfs, EXTTRAIL ./dirdat/gg
START hdfs

1.5.3 Replicat Grouping

The Replicat process provides the Replicat configuration property GROUPTRANSOPS to control transaction grouping. By default, the Replicat process implements transaction grouping of 1000 source transactions into a single target transaction. If you want to turn off transaction grouping then the GROUPTRANSOPS Replicat property should be set to 1.

1.5.4 Replicat Checkpointing

CHECKPOINTTABLE and NODBCHECKPOINT are not applicable for Java Delivery with Replicat. Besides the Replicat checkpoint file (.cpr), an additional checkpoint file (dirchk/<group>.cpj) is created that contains information similar to CHECKPOINTTABLE in Replicat for RDBMS.

1.5.5 Unsupported Replicat Features

The following Replicat features are not supported in this release:

  • BATCHSQL

  • SQLEXEC

  • Stored procedure

  • Conflict resolution and detection (CDR)

  • REPERROR

1.5.6 Mapping Functionality

The Oracle GoldenGate Replicat process supports mapping functionality to custom target schemas. This functionality is not available using the Oracle GoldenGate Extract process. You must use the Metadata Provider functionality to define a target schema or schemas and then use the standard Replicat mapping syntax in the Replicat configuration file to define the mapping. Refer to the Oracle GoldenGate Replicat documentation to understand the Replicat mapping syntax in the Replicat configuration file. For instructions on setting up the Metadata Provider, see Using the Metadata Provider.

1.6 Running with Extract

This section explains how to run the Java Adapter with the Oracle GoldenGate Extract process.

1.6.1 Extract Configuration

The following is an example of an Extract process properties file for the Java Adapter.

EXTRACT hdfs
discardfile ./dirrpt/avro1.dsc, purge 
--SOURCEDEFS ./dirdef/dbo.def
CUSEREXIT libjavaue.so CUSEREXIT PASSTHRU, INCLUDEUPDATEBEFORES, PARAMS "dirprm/hdfs.props" 
GETUPDATEBEFORES 
TABLE dbo.*;

The following is an explanation of the Extract configuration entries:

EXTRACT hdfs - The Extract process name.

discardfile ./dirrpt/avro1.dsc, purge - Sets the discard file.

--SOURCEDEFS ./dirdef/dbo.def - Source definitions are not required for 12.2 trail files.

CUSEREXIT libjavaue.so CUSEREXIT PASSTHRU, INCLUDEUPDATEBEFORES, PARAMS "dirprm/hdfs.props" - Sets the user exit shared library and points to the Java Adapter Properties file.

GETUPDATEBEFORES - Gets update before images.

TABLE dbo.*; - Selects which tables to replicate, or excludes tables to filter.

1.6.2 Adding the Extract Process

ADD EXTRACT hdfs, EXTTRAILSOURCE ./dirdat/gg
START hdfs

1.6.3 Extract Grouping

The Extract process provides no functionality for transaction grouping. However, transaction grouping is still possible when integrating Java Delivery with the Extract process. The Java Delivery layer enables transaction grouping with configuration in the Java Adapter properties file.

  1. gg.handler.name.mode

    To enable grouping, the value of this property must be set to tx.

  2. gg.handler.name.maxGroupSize

    Controls the maximum number of operations that can be held by an operation group - irrespective of whether the operation group holds operations from a single transaction or multiple transactions.

    The operation group will send a transaction commit and end the group as soon as this number of operations is reached. This property leads to splitting of transactions across multiple operation groups.

  3. gg.handler.name.minGroupSize

    This is the minimum number of operations that must exist in a group before the group can end.

    This property helps to avoid groups that are too small by grouping multiple small transactions into one operation group so that it can be more efficiently processed.

    Note:

    maxGroupSize should always be greater than or equal to minGroupSize; that is, maxGroupSize >= minGroupSize.

Note:

It is not recommended to use the Java layer transaction grouping when running Java Delivery with the Replicat process. If running with the Replicat process, you should use Replicat transaction grouping controlled by the GROUPTRANSOPS Replicat property.
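The interaction between minGroupSize and maxGroupSize can be illustrated with a small simulation. This is a simplified model based only on the descriptions above, not the Java Delivery implementation; the function name and the list-of-lists input format are assumptions made for the example.

```python
def group_operations(transactions, min_group_size, max_group_size):
    """Split a stream of transactions into operation groups.

    transactions is a list of lists; each inner list holds the operations
    of one source transaction.
    """
    if max_group_size < min_group_size:
        raise ValueError("maxGroupSize must be >= minGroupSize")
    groups, current = [], []
    for tx in transactions:
        for op in tx:
            current.append(op)
            # Reaching the maximum ends the group at once, even mid-transaction.
            if len(current) >= max_group_size:
                groups.append(current)
                current = []
        # At a transaction boundary, end the group only if it is large enough.
        if current and len(current) >= min_group_size:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # flush whatever remains at the end of the input
    return groups
```

With minGroupSize=3 and maxGroupSize=4, two small transactions are merged into one group, while a five-operation transaction is split across two groups.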

1.7 Logging

Logging is essential to troubleshooting Oracle GoldenGate for Big Data integrations with Big Data targets. This section covers how Oracle GoldenGate for Big Data integrations log and the best practices for logging.

1.7.1 Extract or Replicat Process Logging

Oracle GoldenGate for Big Data integrations leverage the Java Delivery functionality described in the Oracle GoldenGate Application Adapters Guide. In this setup, either an Oracle GoldenGate Replicat or Extract process loads a user exit shared library. This shared library then loads a Java virtual machine to interface with targets that provide a Java interface. The flow of data is as follows:

Extract Process > User Exit > Java Layer

or

Replicat Process > User Exit > Java Layer

It is important that all layers log correctly so that users can review the logs to troubleshoot new installations and integrations. Additionally, if a customer has a problem that requires contacting Oracle Support, the log files are a key piece of information to be provided to Oracle Support so that the problem can be efficiently resolved.

A running Replicat or Extract process creates or appends to log files in the <GG Home>/dirrpt directory that adhere to the following naming convention: <Replicat or Extract process name>.rpt. If a problem is encountered when deploying a new Oracle GoldenGate process, this is likely the first log file to examine. The Java layer does much of the heavy lifting for integrations with Big Data applications. There are therefore many things that can go wrong in the Java layer when performing the initial setup of an Oracle GoldenGate Big Data integration, and you need to understand how to control logging in the Java layer.

1.7.2 Java Layer Logging

The Oracle GoldenGate for Big Data product provides flexibility for logging from the Java layer. The recommended best practice is to use Log4j logging to log from the Java layer. Enabling simple Log4j logging requires the setting of two configuration values in the Java Adapters configuration file.

gg.log=log4j
gg.log.level=INFO

These gg.log settings result in a Log4j file being created in the GoldenGate_Home/dirrpt directory that adheres to this naming convention: Replicat or Extract process name_log level_log4j.log. The supported Log4j log levels are listed below in order of increasing logging granularity.

  • OFF

  • FATAL

  • ERROR

  • WARN

  • INFO

  • DEBUG

  • TRACE

Selection of a logging level includes all of the coarser logging levels as well (that is, selection of WARN means that log messages of FATAL, ERROR, and WARN will be written to the log file). The Log4j logging can additionally be controlled by separate Log4j properties files. These separate Log4j properties files can be enabled by editing the bootoptions property in the Java Adapter Properties file. Three example Log4j properties files are included with the installation and are included in the classpath:

log4j-default.properties
log4j-debug.properties
log4j-trace.properties

Any one of these files can be selected by modifying the bootoptions as follows:

javawriter.bootoptions=-Xmx512m -Xms64m -Djava.class.path=.:ggjava/ggjava.jar -Dlog4j.configuration=samplelog4j.properties
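The rule that a configured level also includes every coarser level, described earlier in this section, can be expressed as a short Python sketch. It is a model of the threshold rule only, not Log4j code, and the function name is hypothetical.

```python
# Levels in the order listed in this section, from least to most detailed.
LEVELS = ["OFF", "FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"]


def is_written(configured_level, message_level):
    """Return True if a message at message_level is written at configured_level."""
    configured = LEVELS.index(configured_level.upper())
    message = LEVELS.index(message_level.upper())
    # OFF is a threshold only: no message is ever issued at level OFF.
    return 0 < message <= configured
```

For example, with gg.log.level=WARN the sketch reports that FATAL, ERROR, and WARN messages are written, while INFO, DEBUG, and TRACE messages are not.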

You can use your own customized Log4j properties file to control logging. The customized Log4j properties file must be available in the Java classpath so that it can be located and loaded by the JVM. The contents of a sample custom Log4j properties file are the following:

# Root logger option 
log4j.rootLogger=INFO, file 
 
# Direct log messages to a log file 
log4j.appender.file=org.apache.log4j.RollingFileAppender 
 
log4j.appender.file.File=sample.log 
log4j.appender.file.MaxFileSize=1GB 
log4j.appender.file.MaxBackupIndex=10 
log4j.appender.file.layout=org.apache.log4j.PatternLayout 
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

There are two important requirements when you use a custom Log4j properties file. First, the path to the custom Log4j properties file must be included in the javawriter.bootoptions property. This is because logging initializes immediately when the JVM is initialized, whereas the contents of the gg.classpath property are appended to the classloader only after logging is initialized. Second, the classpath entry used to load a properties file must be the directory containing the properties file, without wildcards appended.
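The two requirements can be combined in a small helper that composes the bootoptions line. This is a sketch under stated assumptions: the function name and the example path are hypothetical, the JVM flags and base classpath are copied from the bootoptions example earlier in this section, and the colon classpath separator applies to UNIX-like systems only.

```python
import posixpath


def bootoptions_for(log4j_properties_path,
                    base_classpath=".:ggjava/ggjava.jar",
                    jvm_flags="-Xmx512m -Xms64m"):
    """Compose a javawriter.bootoptions line for a custom Log4j properties file."""
    directory, file_name = posixpath.split(log4j_properties_path)
    # Add the containing directory itself to the classpath, with no wildcard.
    classpath = f"{base_classpath}:{directory}" if directory else base_classpath
    return (f"javawriter.bootoptions={jvm_flags} "
            f"-Djava.class.path={classpath} "
            f"-Dlog4j.configuration={file_name}")
```

Calling bootoptions_for("/u01/ogg/conf/custom-log4j.properties") produces a line whose classpath ends in /u01/ogg/conf and whose log4j.configuration value is the bare file name.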

1.8 Metadata Change Events

A new feature of Oracle GoldenGate 12.2 is to propagate metadata change events from the source database to the trail file. This functionality is limited to Oracle Database replication sources for the 12.2 release. Refer to the Oracle GoldenGate for Oracle Database documentation for information on how to enable this functionality.

The Oracle GoldenGate for Big Data Handlers and Formatters provide functionality to take action when a metadata change event is encountered. The ability to take action in the case of metadata change events depends on the metadata change events being available in the source trail file. Oracle GoldenGate 12.2 supports metadata in trail and the propagation of DDL data from a source Oracle Database. If the source trail file does not have metadata in trail and DDL data (metadata change events), then it is not possible for Oracle GoldenGate for Big Data to provide any metadata change event handling.

1.9 Configuration Property CDATA[] Wrapping

The Big Data Handlers and Formatters support the configuration of many parameters in the Java properties file whose values may be interpreted as white space. The configuration handling of the Java Adapter trims white space from configuration values read from the Java configuration file. This trimming may be desirable for some configuration values and undesirable for others. The default behavior of trimming white space was left in place, and new functionality was added that lets you wrap white space values inside special syntax in order to preserve the white space for selected configuration variables. Oracle GoldenGate for Big Data borrows the XML syntax of CDATA[] to preserve white space. Values that would be considered white space can be wrapped inside CDATA[].

The following is an example attempting to set a new-line delimiter for the Delimited Text Formatter:

gg.handler.{name}.format.lineDelimiter=\n

This configuration will not be successful. The new-line character is interpreted as white space and will be trimmed from the configuration value. Therefore the gg.handler setting effectively results in the line delimiter being set to an empty string.

In order to preserve the configuration of the new-line character simply wrap the character in the CDATA[] wrapper as follows:

gg.handler.{name}.format.lineDelimiter=CDATA[\n]

Configuring the parameter with the CDATA[] wrapping will preserve the white space and the line delimiter will now be a new-line character. Parameters that support CDATA[] wrapping are explicitly listed in this documentation.
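The trimming behavior and the effect of the wrapper can be modeled in a few lines of Python. This is an illustrative model only, not the Java Adapter's parsing code; the function name is hypothetical, and the input is assumed to be the value after the properties file escape sequences (such as \n) have already been converted to characters.

```python
def resolve_value(raw):
    """Trim a configuration value unless it is wrapped in CDATA[...]."""
    trimmed = raw.strip()
    if trimmed.startswith("CDATA[") and trimmed.endswith("]"):
        # Keep everything between the wrapper delimiters, white space included.
        return trimmed[len("CDATA["):-1]
    return trimmed
```

In this model an unwrapped new-line value collapses to an empty string, while the wrapped form keeps the new-line character.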

1.10 Using Regular Expression Search and Replace

You can perform more powerful search and replace operations on both schema data (catalog names, schema names, table names, and column names) and column value data, which are separately configured. Regular expressions (regex) are characters that customize a search string through pattern matching. You can match a string against a pattern or extract parts of the match. Oracle GoldenGate for Big Data uses the standard Oracle Java regular expressions package, java.util.regex. For more information, see "Regular Expressions" in the Base Definitions volume at The Single UNIX Specification, Version 4.

1.10.1 Using Schema Data Replace

You can replace schema data using the gg.schemareplaceregex and gg.schemareplacestring parameters. Use gg.schemareplaceregex to set a regular expression, and then use it to search catalog names, schema names, table names, and column names for corresponding matches. Matches are then replaced with the content of the gg.schemareplacestring value. The default value of gg.schemareplacestring is an empty string or "".

For example, some system table names start with a dollar sign like $mytable. You may want to replicate these tables even though most Big Data targets do not allow dollar signs in table names. To remove the dollar sign, you could configure the following replace strings:

gg.schemareplaceregex=[$] 
gg.schemareplacestring= 

The resulting searched and replaced table name is mytable. These parameters also support CDATA[] wrapping to preserve white space in configuration values. The equivalent of the preceding example using CDATA[] wrapping is:

gg.schemareplaceregex=CDATA[[$]]
gg.schemareplacestring=CDATA[]

The schema search and replace functionality only supports a single search regular expression and a single replacement string.
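The dollar sign example can be reproduced with a short Python sketch. Python's re module stands in here for java.util.regex, which is what the product uses; for this simple pattern the two behave the same way. The function name is hypothetical.

```python
import re


def replace_schema_name(name, regex="[$]", replacement=""):
    """Replace every match of regex in a catalog, schema, table, or column name."""
    return re.sub(regex, replacement, name)
```

Passing "$mytable" returns "mytable", matching the result described above; names without a dollar sign are returned unchanged.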

1.10.2 Using Content Data Replace

You can replace content data using the gg.contentreplaceregex and gg.contentreplacestring parameters to search the column values using the configured regular expression and replace matches with the replacement string. For example, this is useful to replace line feed characters in column values. If the delimited text formatter is used then line feeds occurring in the data will be incorrectly interpreted as line delimiters by analytic tools.

You can configure any number of content replacement regex search values. The regex searches and replacements are performed in the order of configuration. Configured values must follow a given order as follows:

gg.contentreplaceregex=some_regex
gg.contentreplacestring=some_value
gg.contentreplaceregex1=some_regex
gg.contentreplacestring1=some_value
gg.contentreplaceregex2=some_regex
gg.contentreplacestring2=some_value

Configuring a subscript of 3 without a subscript of 2 would cause the subscript 3 configuration to be ignored.
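The ordering rule can be sketched as a loop that stops at the first missing subscript. This is a model of the rule as described, not the product's code; the function name is hypothetical, and treating a missing replacement string as an empty string is an assumption.

```python
def collect_content_replacements(props):
    """Return (regex, replacement) pairs in configured order, stopping at the first gap."""
    pairs = []
    index = 0
    while True:
        # The first pair has no subscript; later pairs use 1, 2, 3, and so on.
        suffix = "" if index == 0 else str(index)
        regex = props.get(f"gg.contentreplaceregex{suffix}")
        if regex is None:
            break  # a gap in the numbering ends the sequence
        # Assumption: a missing replacement string defaults to the empty string.
        pairs.append((regex, props.get(f"gg.contentreplacestring{suffix}", "")))
        index += 1
    return pairs
```

Given properties with no subscript, subscript 1, and subscript 3, the sketch returns only the first two pairs.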

Attention

 Regular expression searches and replacements require computer processing and can reduce the performance of the Oracle GoldenGate for Big Data process.

To replace line feeds with a blank character you could use the following parameter configurations:

gg.contentreplaceregex=[\n] 
gg.contentreplacestring=CDATA[ ]

This changes the column value from:

this is 
me

to:

this is me

Both values support CDATA[] wrapping. The second value must be wrapped in a CDATA[] wrapper because a single blank space will be interpreted as white space and trimmed by the Oracle GoldenGate for Big Data configuration layer. In addition, you can configure multiple search and replace strings. For example, you may also want to trim leading and trailing white space out of column values in addition to replacing line feeds. The regular expression ^\\s+|\\s+$ matches leading and trailing white space, so you could add the following configuration:

gg.contentreplaceregex1=^\\s+|\\s+$ 
gg.contentreplacestring1=CDATA[]
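The combined effect of the line feed replacement followed by the white space trim can be checked with a short Python sketch. Python's re module is used as a stand-in for java.util.regex, the function name is hypothetical, and the patterns are written as they would look after the properties file escaping (\\s becomes \s).

```python
import re


def apply_content_replacements(value, pairs):
    """Apply each (regex, replacement) pair to a column value, in order."""
    for regex, replacement in pairs:
        value = re.sub(regex, replacement, value)
    return value


# Line feeds become a blank first, then leading and trailing white space is removed.
PAIRS = [(r"[\n]", " "), (r"^\s+|\s+$", "")]
```

Applied to a value such as "  this is\nme \n", the two steps in order yield "this is me".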

1.11 Using Identities in Oracle GoldenGate Credential Store

The Oracle GoldenGate credential store manages user IDs and their encrypted passwords (together known as credentials) that are used by Oracle GoldenGate processes to interact with the local database. The credential store eliminates the need to specify user names and clear-text passwords in the Oracle GoldenGate parameter files. An optional alias can be used in the parameter file instead of the user ID to map to a userid-password pair in the credential store. The credential store is implemented as an autologin wallet within the Oracle Credential Store Framework (CSF). The use of an LDAP directory is not supported for the Oracle GoldenGate credential store. The autologin wallet supports automated restarts of Oracle GoldenGate processes without requiring human intervention to supply the necessary passwords.

In Oracle GoldenGate for Big Data, you specify the alias and domain in the property file, not the actual user ID or password.

User credentials are maintained in secure wallet storage.

1.11.1 Creating a Credential Store

You can create a credential store for your Big Data environment.

Run the GGSCI ADD CREDENTIALSTORE command to create a file called cwallet.sso in the dircrd/ subdirectory of your Oracle GoldenGate installation directory (the default).

You can change the location of the credential store (cwallet.sso file) by specifying the desired location with the CREDENTIALSTORELOCATION parameter in the GLOBALS file.

For more information about credential store commands, see Reference for Oracle GoldenGate for Windows and UNIX.

Note:

Only one credential store can be used for each Oracle GoldenGate instance.

1.11.2 Adding Users to a Credential Store

After you create a credential store for your Big Data environment, you can add users to the store.

Run the GGSCI ALTER CREDENTIALSTORE ADD USER userid PASSWORD password [ALIAS alias] [DOMAIN domain] command to create each user, where:

  • userid is the user name. Only one instance of a user name can exist in the credential store unless the ALIAS or DOMAIN option is used.

  • password is the user's password. The password is echoed (not obfuscated) when this option is used. If this option is omitted, the command prompts for the password, which is obfuscated as it is typed (recommended because it is more secure).

  • alias is an alias for the user name. The alias substitutes for the credential in parameters and commands where a login credential is required. If the ALIAS option is omitted, the alias defaults to the user name.

For example:

ALTER CREDENTIALSTORE ADD USER scott PASSWORD tiger ALIAS scsm2 DOMAIN ggadapters

For more information about credential store commands, see Reference for Oracle GoldenGate for Windows and UNIX.

1.11.3 Configuring Properties to Access the Credential Store

The Oracle GoldenGate Java Adapter properties file requires specific syntax to resolve user name and password entries in the Credential Store at runtime. For resolving a user name, the syntax is the following:

ORACLEWALLETUSERNAME[alias domain_name]

For resolving a password, the syntax is the following:

ORACLEWALLETPASSWORD[alias domain_name]

The following example illustrates how to configure a Credential Store entry with an alias of myalias and a domain of mydomain.

Note:

With HDFS Hive JDBC, the user name and password are encrypted.

gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[myalias mydomain] 
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[myalias mydomain]

Although the Credential Store is intended to store user name and password pair type credentials, you can apply this functionality more generically. Consider the user name and password entries as accessible values in the Credential Store. Any configuration parameter resolved in the Java Adapter layer (not accessed in the C user exit layer) can be resolved from the Credential Store. This feature is developed to allow you more flexibility to be creative in how you protect sensitive configuration entries.
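To make the reference syntax concrete, the following Python sketch splits a property value of the form shown in the example above into its parts. It only parses the text and does not open or read a wallet; the function name and the returned dictionary layout are hypothetical.

```python
import re

# Matches values such as ORACLEWALLETUSERNAME[myalias mydomain].
_WALLET_REF = re.compile(
    r"^(ORACLEWALLETUSERNAME|ORACLEWALLETPASSWORD)\[(\S+)\s+(\S+)\]$")


def parse_wallet_reference(value):
    """Return the kind, alias, and domain of a wallet reference, or None."""
    match = _WALLET_REF.match(value.strip())
    if match is None:
        return None  # an ordinary value, not a Credential Store reference
    keyword, alias, domain = match.groups()
    kind = "username" if keyword.endswith("USERNAME") else "password"
    return {"kind": kind, "alias": alias, "domain": domain}
```

A value that does not use either keyword, such as a literal user name, yields None in this sketch, which mirrors the idea that only wrapped values are looked up in the Credential Store.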