6.1 Configuring Oracle GoldenGate for Distributed Applications and Analytics
This topic describes how to configure GG for DAA handlers.
- Running with Replicat
- About Schema Evolution and Metadata Change Events
- About Configuration Property CDATA[] Wrapping
- Using Regular Expression Search and Replace
- Scaling Oracle GoldenGate for Distributed Applications and Analytics Delivery
- Configuring Cluster High Availability
- Using Identities in Oracle GoldenGate Credential Store
6.1.1 Running with Replicat
Review the following topics before configuring a Replicat process in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).
This topic explains how to run the Java Adapter with the Oracle GoldenGate Replicat process.
6.1.1.1 Replicat Grouping
The Replicat process provides the Replicat configuration property, GROUPTRANSOPS, to control transaction grouping. By default, the Replicat process groups 1000 source transactions into a single target transaction. To turn off transaction grouping, set the GROUPTRANSOPS Replicat property to 1.
6.1.1.2 About Replicat Checkpointing
In addition to the Replicat checkpoint file (.cpr), an additional checkpoint file, dirchk/group.cpj, is created. It contains information similar to that held in the CHECKPOINTTABLE in Replicat for the database.
6.1.1.3 About Initial Load Support
Replicat can read trail files written by both the online capture and initial load processes. In addition, Replicat can be configured to support delivery of the special run initial load process by using the RMTTASK specification in the Extract parameter file. For more details about configuring the direct load, see Loading Data with an Oracle GoldenGate Direct Load.
Note:
The SOURCEDB or DBLOGIN parameter specifications vary depending on your source database.
6.1.1.4 About the Unsupported Replicat Features
The following Replicat features are not supported in this release:
- BATCHSQL
- SQLEXEC
- Stored procedures
- Conflict detection and resolution (CDR)
6.1.1.5 How the Mapping Functionality Works
The Oracle GoldenGate Replicat process supports mapping functionality to custom target schemas. You must use the Metadata Provider functionality to define a target schema or schemas, and then use the standard Replicat mapping syntax in the Replicat configuration file to define the mapping. For more information about the Replicat mapping syntax in the Replicat configuration file, see Mapping and Manipulating Data.
6.1.2 About Schema Evolution and Metadata Change Events
Metadata in trail is a feature that allows seamless runtime handling of metadata change events by Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA), including schema evolution and schema propagation to GG for DAA target applications. NO_OBJECTDEFS is a sub-parameter of the Extract and Replicat EXTTRAIL and RMTTRAIL parameters that lets you suppress the metadata in trail feature and revert to using a static metadata definition.
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Handlers and Formatters provide functionality to take action when a metadata change event is encountered. The ability to take action in the case of metadata change events depends on the metadata change events being available in the source trail file. Oracle GoldenGate supports metadata in trail and the propagation of DDL data from a source Oracle Database. If the source trail file does not have metadata in trail and DDL data (metadata change events), then GG for DAA cannot provide any metadata change event handling.
6.1.3 About Configuration Property CDATA[] Wrapping
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Handlers and Formatters support the configuration of many parameters in the Java properties file, the values of which may be interpreted as white space. The configuration handling of the Java Adapter trims white space from configuration values read from the Java configuration file. This trimming may be desirable for some configuration values and undesirable for others. To preserve the white space for selected configuration values, you can wrap them in special syntax. GG for DAA borrows the XML syntax of CDATA[] to preserve white space. Values that would be considered to be white space can be wrapped inside of CDATA[].
The following is an example attempting to set a new-line delimiter for the Delimited Text Formatter:
gg.handler.{name}.format.lineDelimiter=\n
This configuration will not be successful. The new-line character is interpreted as white space and is trimmed from the configuration value. Therefore, the gg.handler setting effectively results in the line delimiter being set to an empty string.
To preserve the new-line character, wrap it in the CDATA[] wrapper as follows:
gg.handler.{name}.format.lineDelimiter=CDATA[\n]
Configuring the property with the CDATA[] wrapping preserves the white space, and the line delimiter will then be a new-line character.
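The trimming behavior described above can be pictured with a small Java sketch. It is only an illustration under stated assumptions: the class `CdataTrimSketch` and the method `resolve` are hypothetical names, and the trim-then-unwrap logic is a guess at the general shape of the behavior, not the Java Adapter's actual code. It also assumes the properties file is read with `java.util.Properties`, which converts the two characters `\n` into a line feed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class CdataTrimSketch {
    private static final String PREFIX = "CDATA[";

    // Hypothetical stand-in for the configuration handling:
    // trim the raw value, then unwrap CDATA[...] if the wrapper is present.
    static String resolve(String raw) {
        String value = raw.trim();
        if (value.startsWith(PREFIX) && value.endsWith("]")) {
            return value.substring(PREFIX.length(), value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Two property lines: one plain, one wrapped. The doubled backslash
        // is a Java string escape for the single backslash in the file.
        String file = "plain=\\n\n" + "wrapped=CDATA[\\n]\n";
        Properties props = new Properties();
        props.load(new StringReader(file));

        String plain = resolve(props.getProperty("plain"));
        String wrapped = resolve(props.getProperty("wrapped"));

        System.out.println("plain is empty: " + plain.isEmpty());
        System.out.println("wrapped is a line feed: " + wrapped.equals("\n"));
    }
}
```

Under these assumptions the unwrapped value is lost to trimming, while the wrapped value survives because the trim only sees the `CDATA[` and `]` characters at the ends.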
6.1.4 Using Regular Expression Search and Replace
You can perform more powerful search and replace operations on both schema data (catalog names, schema names, table names, and column names) and column value data. The two are configured separately. Regular expressions (regex) are characters that customize a search string through pattern matching.
You can match a string against a pattern or extract parts of the match. Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) uses the standard Oracle Java regular expressions package, java.util.regex. See Regular Expressions in The Single UNIX Specification, Version 4.
6.1.4.1 Using Schema Data Replace
You can replace schema data using the gg.schemareplaceregex and gg.schemareplacestring properties. Use gg.schemareplaceregex to set a regular expression, which is then used to search catalog names, schema names, table names, and column names for corresponding matches. Matches are then replaced with the content of the gg.schemareplacestring value. The default value of gg.schemareplacestring is an empty string ("").
For example, some system table names start with a dollar sign, like $mytable. You may want to replicate these tables even though most technologies do not allow dollar signs in table names. To remove the dollar sign, you could configure the following replace strings:
gg.schemareplaceregex=[$]
gg.schemareplacestring=
The resulting table name after search and replace is mytable. These properties also support CDATA[] wrapping to preserve white space in configuration values. The equivalent of the preceding example using CDATA[] wrapping is:
gg.schemareplaceregex=CDATA[[$]]
gg.schemareplacestring=CDATA[]
The schema search and replace functionality supports multiple search regular expressions and replacement strings using the following configuration syntax:
gg.schemareplaceregex=some_regex
gg.schemareplacestring=some_value
gg.schemareplaceregex1=some_regex
gg.schemareplacestring1=some_value
gg.schemareplaceregex2=some_regex
gg.schemareplacestring2=some_value
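The dollar-sign example above can be checked directly against `java.util.regex`. The following is a minimal sketch, assuming the configured pairs are applied one after another to each name. `SchemaReplaceSketch` and `replaceName` are hypothetical names for illustration and are not part of GG for DAA.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class SchemaReplaceSketch {
    // Applies each regex/replacement pair to the name, in insertion order.
    static String replaceName(String name, Map<String, String> rules) {
        String result = name;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = Pattern.compile(rule.getKey())
                            .matcher(result)
                            .replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        // Mirrors gg.schemareplaceregex=[$] with an empty replacement string.
        rules.put("[$]", "");
        System.out.println(replaceName("$mytable", rules)); // prints mytable
    }
}
```

Placing the dollar sign inside a character class, `[$]`, makes it a literal character; outside a class, `$` is the end-of-input anchor in `java.util.regex`.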
6.1.4.2 Using Content Data Replace
You can replace content data using the gg.contentreplaceregex and gg.contentreplacestring properties to search the column values using the configured regular expression and replace matches with the replacement string. For example, this is useful to replace line feed characters in column values. If the delimited text formatter is used, then line feeds occurring in the data will be incorrectly interpreted as line delimiters by analytic tools.
You can configure any number of content replacement regex search values. The regex searches and replacements are performed in the order of configuration. Configured values must follow the subscript order shown here:
gg.contentreplaceregex=some_regex
gg.contentreplacestring=some_value
gg.contentreplaceregex1=some_regex
gg.contentreplacestring1=some_value
gg.contentreplaceregex2=some_regex
gg.contentreplacestring2=some_value
Configuring a subscript of 3 without a subscript of 2 would cause the subscript 3 configuration to be ignored.
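The effect of a gap in the subscripts can be sketched as a loop that stops at the first missing entry. This is an illustration of the rule stated above, not the adapter's implementation. `IndexedPropertySketch` and `collect` are hypothetical names, and the handling of a missing unsubscripted entry is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class IndexedPropertySketch {
    // Collects base, base1, base2, ... and stops at the first missing subscript.
    static List<String> collect(Properties props, String base) {
        List<String> values = new ArrayList<>();
        String first = props.getProperty(base);
        if (first == null) {
            return values; // assumption: nothing is read without the base entry
        }
        values.add(first);
        for (int i = 1; ; i++) {
            String next = props.getProperty(base + i);
            if (next == null) {
                break; // a gap ends the scan, so later subscripts are ignored
            }
            values.add(next);
        }
        return values;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gg.contentreplaceregex", "a");
        props.setProperty("gg.contentreplaceregex1", "b");
        props.setProperty("gg.contentreplaceregex3", "d"); // subscript 2 is missing
        System.out.println(collect(props, "gg.contentreplaceregex")); // prints [a, b]
    }
}
```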
Attention:
Regular expression searches and replacements require computer processing and can reduce the performance of the Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) process.
To replace line feeds with a blank character you could use the following property configurations:
gg.contentreplaceregex=[\n]
gg.contentreplacestring=CDATA[ ]
This changes the column value from:
this is
me
to:
this is me
Both values support CDATA[] wrapping. The second value must be wrapped in a CDATA[] wrapper because a single blank space would otherwise be interpreted as white space and trimmed by the GG for DAA configuration layer. In addition, you can configure multiple search and replace strings. For example, you may also want to trim leading and trailing white space from column values in addition to replacing line feeds. The regular expression ^\\s+|\\s+$ matches that white space:
gg.contentreplaceregex1=^\\s+|\\s+$
gg.contentreplacestring1=CDATA[]
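The two content replacements above can be tried out with `java.util.regex`. The following is a sketch under assumptions: it loads the regex values with `java.util.Properties` (which turns `\n` into a line feed and `\\` into a single backslash), and it passes the replacement strings in directly rather than modeling the CDATA[] unwrapping. `ContentReplaceSketch` and `clean` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ContentReplaceSketch {
    // Replaces line feeds with a blank first, then trims leading and trailing
    // white space, mirroring the configured order (no subscript, then 1).
    static String clean(String columnValue, String lineFeedRegex, String trimRegex) {
        String withoutLineFeeds = columnValue.replaceAll(lineFeedRegex, " ");
        return withoutLineFeeds.replaceAll(trimRegex, "");
    }

    public static void main(String[] args) throws IOException {
        // Same text as the property lines above. The extra backslashes are
        // Java string escapes for the backslashes in the properties file.
        String file = "gg.contentreplaceregex=[\\n]\n"
                    + "gg.contentreplaceregex1=^\\\\s+|\\\\s+$\n";
        Properties props = new Properties();
        props.load(new StringReader(file));

        String lineFeedRegex = props.getProperty("gg.contentreplaceregex");
        String trimRegex = props.getProperty("gg.contentreplaceregex1");

        String cleaned = clean("  this is\nme  ", lineFeedRegex, trimRegex);
        System.out.println("[" + cleaned + "]"); // prints [this is me]
    }
}
```

The order matters here: replacing line feeds first means a trailing line feed becomes a trailing blank, which the second expression then removes.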
6.1.5 Scaling Oracle GoldenGate for Distributed Applications and Analytics Delivery
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) supports dividing the processing of the source trail files either across multiple Replicat processes or, by using Coordinated Delivery, across multiple Java Adapter instances inside a single Replicat process to improve throughput. This allows you to scale GG for DAA delivery.
There are some cases where the throughput to GG for DAA integration targets is not sufficient to meet your service level agreements even after you have tuned your handler for maximum performance. When this occurs, you can configure parallel processing and delivery to your targets using one of the following methods:
- Multiple Replicat processes can be configured to read data from the same source trail files. Each of these Replicat processes is configured to process a subset of the data in the source trail files so that all of the processes collectively process the source trail files in their entirety. There is no coordination between the separate Replicat processes using this solution.
- Oracle GoldenGate Coordinated Delivery can be used to parallelize processing the data from the source trail files within a single Replicat process. This solution involves breaking the trail files down into logical subsets, each of which is processed by a different delivery thread. For more information about Coordinated Replicat, see About Coordinated Replicat in the Oracle GoldenGate Microservices Architecture Documentation.
With either method, you can split the data into parallel processing for improved throughput. Oracle recommends breaking the data down in one of the following two ways:
- Splitting Source Data By Source Table: Data is divided into subsections by source table. For example, Replicat process 1 might handle source tables table1 and table2, while Replicat process 2 might handle data for source tables table3 and table4. Data is split by source table, and the individual table data is not subdivided.
- Splitting Source Table Data into Sub Streams: Data from source tables is split. For example, Replicat process 1 might handle half of the range of data from source table1, while Replicat process 2 might handle the other half of the data from source table1.
- If you are using Coordinated Replicat, make sure that you add TARGETDB LIBFILE libggjava.so SET property=path_to_deployment_home/etc/conf/ogg/your_replicat_name.properties.
Additional limitations:
- Parallel apply is not supported.
- The BATCHSQL parameter is not supported.
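The idea behind splitting one table's data into sub streams, described earlier in this section, can be pictured as assigning every row to exactly one stream based on its key. The Java sketch below is a conceptual illustration only: `SubStreamSketch` and `streamFor` are hypothetical names, and the hash used here is Java's `String.hashCode`, not whatever Replicat uses to divide data.

```java
class SubStreamSketch {
    // Maps a row key to a 0-based sub stream. floorMod keeps the result
    // non-negative even when hashCode() is negative.
    static int streamFor(String rowKey, int streamCount) {
        return Math.floorMod(rowKey.hashCode(), streamCount);
    }

    public static void main(String[] args) {
        int streamCount = 2; // for example, two Replicat processes or two threads
        for (String key : new String[] {"1001", "1002", "1003", "1004"}) {
            System.out.println("key " + key + " -> sub stream "
                    + streamFor(key, streamCount));
        }
    }
}
```

Because the assignment depends only on the key, all operations for a given row go to the same sub stream, which is what allows the streams to be processed without coordinating with each other.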
Example 6-1 Scaling Support for the Oracle GoldenGate for Distributed Applications and Analytics Handlers
Handler Name | Splitting Source Data By Source Table | Splitting Source Table Data into Sub Streams |
---|---|---|
Cassandra | Supported | Supported when: |
Elastic Search | Supported | Supported |
HBase | Supported when all required HBase namespaces are pre-created in HBase. | Supported when: |
HDFS | Supported | Supported with some restrictions. |
JDBC | Supported | Supported |
Kafka | Supported | Supported for formats that support schema propagation, such as Avro. This is less desirable due to multiple instances feeding the same schema information to the target. |
Kafka Connect | Supported | Supported |
Kinesis Streams | Supported | Supported |
MongoDB | Supported | Supported |
Java File Writer | Supported | Supported with the following restrictions: You must select a naming convention for generated files where the file names do not collide. Colliding file names may result in a Replicat abend and/or polluted data. When using coordinated apply it is suggested that you configure |
6.1.6 Configuring Cluster High Availability
Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) doesn't have built-in high availability functionality. You need to use a standard cluster software's high availability capability to provide the high availability functionality.
You can configure a high availability scenario on a cluster so that if the leader instance of GG for DAA on one machine fails, another GG for DAA instance can be started on another machine to resume where the failed instance left off.
If you manually configure your instances to share common GG for DAA and Oracle GoldenGate files using a shared disk architecture, you can create a failover scenario. For a cluster installation, these files need to be accessible from all machines, in the same location.
The configuration files that must be shared are:
- replicat.prm
- Handler properties file.
- Additional properties files required by the specific adapter. This depends on the target handler in use. For example, for Kafka this would be a producer properties file.
- Additional schema files you've generated. For example, Avro schema files generated in the dirdef directory.
- File Writer Handler generated files on your local file system at a configured path. Also, the File Writer Handler state file in the dirsta directory.
- Any log4j.properties or logback.properties files in use.
Checkpoint files must be shared for the ability to resume processing:
- Your Replicat checkpoint file (*.cpr).
- Your adapter checkpoint file (*.cpj).
6.1.7 Using Identities in Oracle GoldenGate Credential Store
The Oracle GoldenGate credential store manages user IDs and their encrypted passwords (together known as credentials) that are used by Oracle GoldenGate processes to interact with the local database. The credential store eliminates the need to specify user names and clear-text passwords in the Oracle GoldenGate parameter files.
An optional alias can be used in the parameter file instead of the user ID to map to a userid and password pair in the credential store. The credential store is implemented as an auto login wallet within the Oracle Credential Store Framework (CSF). The use of an LDAP directory is not supported for the Oracle GoldenGate credential store. The auto login wallet supports automated restarts of Oracle GoldenGate processes without requiring human intervention to supply the necessary passwords.
In Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA), you specify the alias and domain in the property file, not the actual user ID or password. User credentials are maintained in secure wallet storage.
6.1.7.1 Creating a Credential Store
You can create a credential store for your Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) environment.
Run the GGSCI ADD CREDENTIALSTORE command to create a file called cwallet.sso in the dircrd/ subdirectory of your Oracle GoldenGate installation directory (the default).
You can change the location of the credential store (cwallet.sso file) by specifying the desired location with the CREDENTIALSTORELOCATION parameter in the GLOBALS file.
For more information about credential store commands, see Reference for Oracle GoldenGate.
Note:
Only one credential store can be used for each Oracle GoldenGate instance.
6.1.7.2 Adding Users to a Credential Store
After you create a credential store for your Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) environment, you can add users to the store.
Run the GGSCI ALTER CREDENTIALSTORE ADD USER userid PASSWORD password [ALIAS alias] [DOMAIN domain] command to create each user, where:
- userid is the user name. Only one instance of a user name can exist in the credential store unless the ALIAS or DOMAIN option is used.
- password is the user's password. The password is echoed (not obfuscated) when this option is used. If this option is omitted, the command prompts for the password, which is obfuscated as it is typed (recommended because it is more secure).
- alias is an alias for the user name. The alias substitutes for the credential in parameters and commands where a login credential is required. If the ALIAS option is omitted, the alias defaults to the user name.
For example:
ALTER CREDENTIALSTORE ADD USER scott PASSWORD tiger ALIAS scsm2 DOMAIN ggadapters
For more information about credential store commands, see Reference for Oracle GoldenGate.
6.1.7.3 Configuring Properties to Access the Credential Store
The Oracle GoldenGate Java Adapter properties file requires specific syntax to resolve user name and password entries in the Credential Store at runtime. For resolving a user name the syntax is the following:
ORACLEWALLETUSERNAME[alias domain_name]
For resolving a password the syntax required is the following:
ORACLEWALLETPASSWORD[alias domain_name]
The following example illustrates how to configure a Credential Store entry with an alias of myalias and a domain of mydomain.
Note:
With HDFS Hive JDBC, the user name and password are encrypted. Oracle Wallet integration only works for configuration properties which contain the string username or password. For example:
gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[myalias mydomain]
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[myalias mydomain]
ORACLEWALLETUSERNAME and ORACLEWALLETPASSWORD can also be used in Extract (as in Replicat) with the JMS handler. For example:
gg.handler.<name>.user=ORACLEWALLETUSERNAME[JMS_USR JMS_PWD]
gg.handler.<name>.password=ORACLEWALLETPASSWORD[JMS_USR JMS_PWD]
Consider the user name and password entries as accessible values in the Credential Store. Any configuration property resolved in the Java Adapter layer (not accessed in the C user exit layer) can be resolved from the Credential Store. This allows you more flexibility to be creative in how you protect sensitive configuration entries.
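The shape of the ORACLEWALLETUSERNAME[alias domain_name] and ORACLEWALLETPASSWORD[alias domain_name] values can be made concrete with a small parser. This is a sketch for illustration only: `WalletTokenSketch` and `parse` are hypothetical names, the pattern is inferred from the syntax shown in this section, and no lookup against a real credential store is performed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WalletTokenSketch {
    // Matches ORACLEWALLETUSERNAME[alias domain] or ORACLEWALLETPASSWORD[alias domain].
    private static final Pattern TOKEN = Pattern.compile(
            "ORACLEWALLET(USERNAME|PASSWORD)\\[([^\\s\\]]+)\\s+([^\\s\\]]+)\\]");

    // Returns {kind, alias, domain}, or null if the value is not a wallet token.
    static String[] parse(String propertyValue) {
        Matcher m = TOKEN.matcher(propertyValue.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] token = parse("ORACLEWALLETUSERNAME[myalias mydomain]");
        // prints USERNAME / myalias / mydomain
        System.out.println(String.join(" / ", token));
    }
}
```

A value that does not match the pattern, such as a literal user name, is returned as null here so that a caller could fall back to using the configured value unchanged.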