10.1 Source
The Extract process is configured to run against the source technology, capturing data generated in the true source technology, which may be located elsewhere. This process is the extraction, or data capture, mechanism of GG for DAA.
- Initial Load Extract: When you set up GG for DAA for initial loads, the Extract process captures the current, static set of data directly from the source objects. In this configuration, the Extract process reads directly from the source to capture data.
- Change Data Capture Extract: When you set up GG for DAA to keep the source data synchronized with another set of data, the Extract process captures the DML and (if supported) DDL operations performed on the configured objects after the initial synchronization has taken place. It stores these operations until it receives commit records or rollbacks for the transactions that contain them. If it receives a rollback, it discards the operations for that transaction. If it receives a commit, it persists the transaction to disk in a series of files called a trail, where it is queued for propagation to the target system. All the operations in each transaction are written to the trail in the order in which they were committed to the source technology. This design ensures both speed and data integrity. The format of the data written to trail files depends on the source technology.
- Add Extract
- Amazon MSK
- Amazon DocumentDB
To capture messages from Amazon DocumentDB and convert them into logical change records using Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA), you can use MongoDB Extract.
- Apache Cassandra
The Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) capture (Extract) for Cassandra is used to get changes from Apache Cassandra databases.
- Apache Kafka
The Oracle GoldenGate capture (Extract) for Kafka is used to read messages from a Kafka topic or topics and convert data into logical change records written to GoldenGate trail files. This section explains how to use Oracle GoldenGate capture for Kafka.
- Azure Event Hubs
- Confluent Kafka
- DataStax
- Java Message Service (JMS)
- MongoDB
The Oracle GoldenGate capture (Extract) for MongoDB is used to get changes from MongoDB databases.
- Microsoft Fabric Eventstreams
This chapter describes how to use the Microsoft Fabric eventstreams.
- OCI Streaming
- Setting Up and Running the Kinesis Streams Handler
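The commit and rollback handling described for the Change Data Capture Extract can be pictured with a small model. The following is a minimal sketch only, not GoldenGate code: the class `TrailBuffer` and its method names are hypothetical, and the "trail" is an in-memory list rather than files on disk.

```python
from collections import defaultdict


class TrailBuffer:
    """Toy model of commit-ordered capture (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = defaultdict(list)  # transaction id -> buffered operations
        self.trail = []                   # committed operations, in commit order

    def on_operation(self, txid, op):
        # Operations are held until the outcome of their transaction is known.
        self.pending[txid].append(op)

    def on_rollback(self, txid):
        # A rollback discards every operation buffered for that transaction.
        self.pending.pop(txid, None)

    def on_commit(self, txid):
        # A commit appends the whole transaction to the trail.
        self.trail.extend(self.pending.pop(txid, []))


buf = TrailBuffer()
buf.on_operation("t1", "INSERT a")
buf.on_operation("t2", "INSERT b")
buf.on_operation("t1", "UPDATE a")
buf.on_rollback("t1")
buf.on_commit("t2")
print(buf.trail)
```

Because operations reach the trail only at commit time, the trail order follows commit order rather than the order in which individual operations were observed.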
10.1.1 Setting Up and Running the Kinesis Streams Handler
Instructions for configuring the Kinesis Streams Handler components and running the handler are described in the following sections.
Use the following steps to set up the Kinesis Streams Handler:
- Create an Amazon AWS account at https://aws.amazon.com/.
- Log in to Amazon AWS.
- From the main page, select Kinesis (under the Analytics subsection).
- Select Amazon Kinesis Streams, and then go to Streams to create Amazon Kinesis streams and shards within streams.
- Create a client ID and secret to access Kinesis.
The Kinesis Streams Handler requires these credentials at runtime to successfully connect to Kinesis.
- Create the client ID and secret:
- Select your name in AWS (upper right), and then in the list select My Security Credentials.
- Select Access Keys to create and manage access keys.
Note your client ID and secret upon creation.
The client ID and secret can only be accessed upon creation. If lost, you have to delete the access key, and then recreate it.
- Set the Classpath in Kinesis Streams Handler
- Kinesis Streams Handler Configuration
- Using Templates to Resolve the Stream Name and Partition Name
- Resolving AWS Credentials
- Configuring the Proxy Server for Kinesis Streams Handler
- Configuring Security in Kinesis Streams Handler
10.1.1.1 Set the Classpath in Kinesis Streams Handler
You must configure the gg.classpath property in the Java Adapter properties file to specify the JARs for the AWS Kinesis Java SDK as follows:
gg.classpath={download_dir}/aws-java-sdk-2.28.11/lib/*:{download_dir}/aws-java-sdk-2.28.11/third-party/lib/*
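The gg.classpath value is a single line with the two SDK directories joined by a colon. As a rough illustration, the sketch below assembles that value; the helper name `kinesis_classpath` and the download directory `/opt/aws` are assumptions made for the example, not part of the product.

```python
def kinesis_classpath(download_dir, sdk_version="2.28.11"):
    """Build the gg.classpath value for the AWS SDK JARs (hypothetical helper)."""
    base = f"{download_dir}/aws-java-sdk-{sdk_version}"
    # Both directories use a trailing /* wildcard and are joined with ':'.
    return ":".join([f"{base}/lib/*", f"{base}/third-party/lib/*"])


# /opt/aws is an example location only.
print("gg.classpath=" + kinesis_classpath("/opt/aws"))
```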
10.1.1.2 Kinesis Streams Handler Configuration
You configure the Kinesis Streams Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).
To enable the selection of the Kinesis Streams Handler, you must first configure the handler type by specifying gg.handler.name.type=kinesis_streams and the other Kinesis Streams properties as follows:
Table 10-1 Kinesis Streams Handler Configuration Properties
| Properties | Required/ Optional | Legal Values | Default | Explanation |
|---|---|---|---|---|
| gg.handler.name.type | Required | kinesis_streams | None | Selects the Kinesis Streams Handler for streaming change data capture into Kinesis. |
| gg.handler.name.mode | Optional | op or tx | op | Choose the operating mode. |
| gg.handler.name.region | Required | The Amazon region name which is hosting your Kinesis instance. | None | Setting of the Amazon AWS region name is required. |
| gg.handler.name.proxyServer | Optional | The host name of the proxy server. | None | Set the host name of the proxy server if connectivity to AWS is required to go through a proxy server. |
| gg.handler.name.proxyPort | Optional | The port number of the proxy server. | None | Set the port number of the proxy server if connectivity to AWS is required to go through a proxy server. |
| gg.handler.name.proxyUsername | Optional | The username of the proxy server (if credentials are required). | None | Set the username of the proxy server if connectivity to AWS is required to go through a proxy server and the proxy server requires credentials. |
| gg.handler.name.proxyPassword | Optional | The password of the proxy server (if credentials are required). | None | Set the password of the proxy server if connectivity to AWS is required to go through a proxy server and the proxy server requires credentials. |
| gg.handler.name.deferFlushAtTxCommit | Optional | | | When set to false, the Kinesis Streams Handler will flush data to Kinesis at transaction commit for write durability. However, it may be preferable to defer the flush beyond the transaction commit for performance purposes; see Kinesis Handler Performance Considerations. |
| gg.handler.name.deferFlushOpCount | Optional | Integer | None | Only applicable if |
| gg.handler.name.formatPerOp | Optional | | | When set to |
| gg.handler.name.customMessageGrouper | Optional | oracle.goldengate.handler.kinesis.KinesisJsonTxMessageGrouper | None | This configuration parameter provides the ability to group Kinesis messages using custom logic. Only one implementation is included in the distribution at this time. The |
| gg.handler.name.streamMappingTemplate | Required | A template string value to resolve the Kinesis stream name at runtime. | None | See Using Templates to Resolve the Stream Name and Partition Name for more information. |
| gg.handler.name.partitionMappingTemplate | Required | A template string value to resolve the Kinesis message partition key (message key) at runtime. | None | See Using Templates to Resolve the Stream Name and Partition Name for more information. |
| gg.handler.name.format | Required | Any supported pluggable formatter. | | Selects the operations message formatter. JSON is likely the best fit for Kinesis. |
| | Optional | | | By default, the Kinesis Handler automatically creates Kinesis streams if they do not already exist. Set to |
| | Optional | Positive integer. | | A Kinesis stream contains one or more shards. Controls the number of shards on Kinesis streams that the Kinesis Handler creates. Multiple shards can help improve the ingest performance to a Kinesis stream. Use only when |
| | Optional | | | Sets the proxy protocol connection to the proxy server for an additional level of security. The client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added into the Amazon SDK in version 1.11.396, so you must use at least that version to use this property. |
| gg.handler.name.enableSTS | Optional | true or false | false | Set to true to enable the Kinesis Handler to access Kinesis credentials from the AWS Security Token Service. Ensure that the AWS Security Token Service is enabled if you set this property to true. |
| gg.handler.name.STSRegion | Optional | Any legal AWS region specifier. | The region is obtained from the gg.handler.name.region property. | Use to resolve the region for the STS call. It is only valid if the gg.handler.name.enableSTS property is set to true. You can set a different AWS region for resolving credentials from STS than the configured Kinesis region. |
| gg.handler.name.kinesis.accessKeyId | Optional | A valid AWS access key. | None | Set this parameter to explicitly set the access key for AWS. This parameter has no effect if gg.handler.name.enableSTS is set to true. If unset, credentials resolution falls back to the AWS default credentials provider chain. Optionally, you can configure the session token (gg.handler.kinesis.sessionToken), which indicates temporary credentials. The access key and secret key MUST be set for the session token configuration to be valid. |
| gg.handler.name.kinesis.secretKey | Optional | A valid AWS secret key. | None | Set this parameter to explicitly set the secret key for AWS. This parameter has no effect if gg.handler.name.enableSTS is set to true. If unset, credentials resolution falls back to the AWS default credentials provider chain. Optionally, you can configure the session token (gg.handler.kinesis.sessionToken), which indicates temporary credentials. The access key and secret key MUST be set for the session token configuration to be valid. |
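Table 10-1 marks five properties as Required: type, region, streamMappingTemplate, partitionMappingTemplate, and format. A quick pre-flight check along the following lines can catch an incomplete properties file before the handler starts. This is a minimal sketch under stated assumptions: `parse_properties` and `missing_required` are hypothetical helpers, the parser handles only simple `key=value` lines and `#` comments, and the handler is assumed to be named `kinesis`.

```python
# Required property suffixes, taken from Table 10-1.
REQUIRED = ("type", "region", "streamMappingTemplate",
            "partitionMappingTemplate", "format")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_required(props, handler):
    """Return the required property suffixes that are absent or empty."""
    prefix = f"gg.handler.{handler}."
    return [name for name in REQUIRED if not props.get(prefix + name)]


sample = """
gg.handlerlist=kinesis
gg.handler.kinesis.type=kinesis_streams
gg.handler.kinesis.region=us-west-2
gg.handler.kinesis.format=json
gg.handler.kinesis.streamMappingTemplate=TestStream
"""
print(missing_required(parse_properties(sample), "kinesis"))
```

In this sample the partition mapping template is left out on purpose, so the check reports it as missing.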
10.1.1.3 Using Templates to Resolve the Stream Name and Partition Name
The Kinesis Streams Handler can resolve the stream name and the partition key at runtime using a template configuration value. Templates allow you to configure static values and keywords. Keywords are replaced dynamically with values from the context of the current processing. Templates are applicable to the following configuration parameters:
gg.handler.name.streamMappingTemplate
gg.handler.name.partitionMappingTemplate
Source database transactions are made up of one or more individual operations, which are the individual inserts, updates, and deletes. The Kinesis Handler can be configured to send one message per operation (insert, update, delete). Alternatively, it can be configured to group operations into messages at the transaction level. Many of the template keywords resolve data based on the context of an individual source database operation. Therefore, many of the keywords do not work when sending messages at the transaction level. For example, ${fullyQualifiedTableName} does not work when sending messages at the transaction level. The ${fullyQualifiedTableName} keyword resolves to the qualified source table name for an operation. Transactions can contain multiple operations for many source tables. Resolving the fully qualified table name for messages at the transaction level is non-deterministic, so the handler abends at runtime.
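The interaction between templates and the operating mode can be sketched as follows. This is an illustrative model, not the handler's implementation: the function `resolve`, the `OP_SCOPED` set, and the sample table name `DBO.TABLE1` are assumptions, and only the `${fullyQualifiedTableName}` keyword named above is modeled.

```python
import re

# Keywords that need the context of a single operation (assumed set).
OP_SCOPED = {"fullyQualifiedTableName"}
KEYWORD = re.compile(r"\$\{(\w+)\}")


def resolve(template, mode, op_context=None):
    """Replace ${keyword} placeholders; static text passes through unchanged."""
    context = op_context or {}

    def substitute(match):
        name = match.group(1)
        if mode == "tx" and name in OP_SCOPED:
            # Mirrors the abend described above for transaction-level messages.
            raise ValueError(f"${{{name}}} cannot be resolved in tx mode")
        return context[name]

    return KEYWORD.sub(substitute, template)


op = {"fullyQualifiedTableName": "DBO.TABLE1"}
print(resolve("TestStream", "tx"))
print(resolve("prefix_${fullyQualifiedTableName}", "op", op))
```

A purely static template such as `TestStream` is safe in either mode, which is why it is a reasonable choice when messages are grouped at the transaction level.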
Example Templates
The following describes example template configuration values and the resolved values.
| Example Template | Resolved Value |
|---|---|
|
|
|
|
|
|
|
|
|
10.1.1.4 Resolving AWS Credentials
- AWS Kinesis Client Authentication
The Kinesis Handler is a client connection to the AWS Kinesis cloud service. The AWS cloud must be able to successfully authenticate the AWS client in order to successfully interface with Kinesis.
10.1.1.4.1 AWS Kinesis Client Authentication
The Kinesis Handler is a client connection to the AWS Kinesis cloud service. The AWS cloud must be able to successfully authenticate the AWS client in order to successfully interface with Kinesis.
AWS client authentication has become increasingly complicated as more authentication options have been added to the Kinesis Streams Handler. This topic explores the different use cases for AWS client authentication.
- Explicit Configuration of the Client ID and Secret
A client ID and secret are generally the required credentials for the Kinesis Handler to interact with Amazon Kinesis. A client ID and secret are generated using the Amazon AWS website.
- Use of the AWS Default Credentials Provider Chain
If the gg.handler.name.accessKeyId and gg.handler.name.secretKey are unset, then credentials resolution reverts to the AWS default credentials provider chain. The AWS default credentials provider chain provides various ways by which the AWS credentials can be resolved.
- AWS Federated Login
The use case is when you have your on-premise system login integrated with AWS. This means that when you log into an on-premise machine, you are also logged into AWS.
10.1.1.4.1.1 Explicit Configuration of the Client ID and Secret
A client ID and secret are generally the required credentials for the Kinesis Handler to interact with Amazon Kinesis. A client ID and secret are generated using the Amazon AWS website.
gg.handler.name.accessKeyId=
gg.handler.name.secretKey=
Furthermore, the Oracle Wallet functionality can be used to encrypt these credentials.
10.1.1.4.1.2 Use of the AWS Default Credentials Provider Chain
If the gg.handler.name.accessKeyId and gg.handler.name.secretKey are unset, then credentials resolution reverts to the AWS default credentials provider chain. The AWS default credentials provider chain provides various ways by which the AWS credentials can be resolved.
When Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) runs on an AWS Elastic Compute Cloud (EC2) instance, the general use case is to resolve the credentials from the EC2 metadata service. The AWS default credentials provider chain provides resolution of credentials from the EC2 metadata service as one of the options.
10.1.1.4.1.3 AWS Federated Login
The use case is when you have your on-premise system login integrated with AWS. This means that when you log into an on-premise machine, you are also logged into AWS.
- You may not want to generate client IDs and secrets. (Some users disable this feature in the AWS portal).
- The client AWS applications need to interact with the AWS Security Token Service (STS) to obtain an authentication token for programmatic calls made to Kinesis.
To have the Kinesis Handler obtain credentials from STS, set the following property:
gg.handler.name.enableSTS=true
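Taken together, the three use cases in this topic amount to a precedence order: STS when enabled, then an explicit client ID and secret, then the AWS default credentials provider chain. The sketch below models that decision only; it does not call AWS. The function `credential_source` is hypothetical, the property names follow the `gg.handler.name.accessKeyId` spelling used in the explicit-configuration example, and treating a lone access key or lone secret key as "unset" is an assumption.

```python
def credential_source(props, handler="kinesis"):
    """Return which credential path the settings select (illustrative only)."""
    prefix = f"gg.handler.{handler}."

    def get(name):
        return props.get(prefix + name, "").strip()

    if get("enableSTS").lower() == "true":
        # Explicit keys have no effect when STS is enabled.
        return "sts"
    has_keys = bool(get("accessKeyId") and get("secretKey"))
    if get("sessionToken") and not has_keys:
        # The session token is only valid together with both keys.
        raise ValueError("sessionToken requires both accessKeyId and secretKey")
    return "explicit" if has_keys else "default-chain"


print(credential_source({"gg.handler.kinesis.enableSTS": "true"}))
print(credential_source({
    "gg.handler.kinesis.accessKeyId": "EXAMPLE_ID",
    "gg.handler.kinesis.secretKey": "EXAMPLE_SECRET",
}))
print(credential_source({}))
```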
10.1.1.5 Configuring the Proxy Server for Kinesis Streams Handler
Oracle GoldenGate can be used with a proxy server using the following parameters to enable the proxy server:
gg.handler.name.proxyServer=
gg.handler.name.proxyPort=80
gg.handler.name.proxyUsername=username
gg.handler.name.proxyPassword=password
Sample configurations:
gg.handlerlist=kinesis
gg.handler.kinesis.type=kinesis_streams
gg.handler.kinesis.mode=op
gg.handler.kinesis.format=json
gg.handler.kinesis.region=us-west-2
gg.handler.kinesis.partitionMappingTemplate=TestPartitionName
gg.handler.kinesis.streamMappingTemplate=TestStream
gg.handler.kinesis.deferFlushAtTxCommit=true
gg.handler.kinesis.deferFlushOpCount=1000
gg.handler.kinesis.formatPerOp=true
#gg.handler.kinesis.customMessageGrouper=oracle.goldengate.handler.kinesis.KinesisJsonTxMessageGrouper
gg.handler.kinesis.proxyServer=www-proxy.myhost.com
gg.handler.kinesis.proxyPort=80
10.1.1.6 Configuring Security in Kinesis Streams Handler
The Amazon Web Services (AWS) Kinesis Java SDK uses HTTPS to communicate with Kinesis. Mutual authentication is enabled. The AWS server passes a Certificate Authority (CA) signed certificate to the AWS client, which allows the client to authenticate the server. The AWS client passes credentials (client ID and secret) to the AWS server, which allows the server to authenticate the client.