11 Using the S3 Event Handler

Learn how to use the S3 Event Handler, which provides the interface to Amazon S3 web services.

11.1 Overview

Amazon S3 is object storage hosted in the Amazon cloud. The purpose of the S3 Event Handler is to load data files generated by the File Writer Handler into Amazon S3, see https://aws.amazon.com/s3/.

You can use any format that the File Writer Handler supports, see Using the File Writer Handler.

11.2 Detailing Functionality

The S3 Event Handler requires the Amazon Web Services (AWS) Java SDK to transfer files to S3 object storage. Oracle GoldenGate for Big Data does not include the AWS Java SDK. You must download and install the AWS Java SDK from:

https://aws.amazon.com/sdk-for-java/

Then configure the gg.classpath variable to include the JAR files of the AWS Java SDK. The JAR files are divided into two directories, and both directories must be in gg.classpath, for example:

gg.classpath=/usr/var/aws-java-sdk-1.11.240/lib/*:/usr/var/aws-java-sdk-1.11.240/third-party/lib/*

11.2.1 Configuring the Client ID and Secret

A client ID and secret are the required credentials for the S3 Event Handler to interact with Amazon S3. A client ID and secret are generated using the Amazon AWS website. The retrieval of these credentials and presentation to the S3 server are performed on the client side by the AWS Java SDK. The AWS Java SDK provides multiple ways that the client ID and secret can be resolved at runtime.

The client ID and secret can be set as Java properties, on one line, in the Java Adapter properties file as follows:

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar -Daws.accessKeyId=your_access_key -Daws.secretKey=your_secret_key

These Java system properties are the equivalent of setting the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables on the local machine, which the AWS Java SDK can also use to resolve the credentials.
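
The configuration table in Configuring the S3 Event Handler also lists handler-level properties for supplying the credentials explicitly. The following is a minimal sketch, assuming an event handler named s3; the region and key values are placeholders, not recommended settings.

```properties
# Sketch only: "s3" is an assumed handler name; replace the placeholder values.
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
# Explicit credentials. These have no effect when enableSTS is set to true.
gg.eventhandler.s3.accessKeyId=your_access_key
gg.eventhandler.s3.secretKey=your_secret_key
```

If these two properties are not set, credentials resolution falls back to the AWS default credentials provider chain.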

11.2.2 About the AWS S3 Buckets

AWS divides S3 storage into separate file systems called buckets. The S3 Event Handler can write to pre-created buckets. Alternatively, if the S3 bucket does not exist, the S3 Event Handler attempts to create the specified S3 bucket. AWS requires that S3 bucket names are lowercase. Amazon S3 bucket names must be globally unique. If you attempt to create an S3 bucket that already exists in any Amazon account, it causes the S3 Event Handler to abend.
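
Because bucket names must be lowercase, one way to guard against uppercase characters coming from a resolved keyword is the ${toLowerCase[]} keyword described in Using Templated Strings. The following is a minimal sketch, assuming an event handler named s3 and a hypothetical bucket prefix my-company-ogg; whether the resolved schema name is otherwise a legal bucket name depends on your source database.

```properties
# Sketch only: "s3" and "my-company-ogg" are assumed names.
# For a source schema MYSCHEMA, this would resolve to a name such as my-company-ogg-myschema.
gg.eventhandler.s3.bucketMappingTemplate=my-company-ogg-${toLowerCase[${schemaName}]}
# Optional: skip the existence check and automatic bucket creation
# when the buckets are pre-created and S3 admin privileges are not granted.
gg.eventhandler.s3.enableBucketAdmin=false
```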

11.2.3 Using Templated Strings

Templated strings can contain a combination of string constants and keywords that are dynamically resolved at runtime. The S3 Event Handler makes extensive use of templated strings to generate the S3 directory names, data file names, and S3 bucket names. This gives you the flexibility to select where to write data files and the names of those data files.

Supported Templated Strings

Keyword Description
${fullyQualifiedTableName}

The fully qualified source table name delimited by a period (.). For example, MYCATALOG.MYSCHEMA.MYTABLE.

${catalogName}

The individual source catalog name. For example, MYCATALOG.

${schemaName}

The individual source schema name. For example, MYSCHEMA.

${tableName}

The individual source table name. For example, MYTABLE.

${groupName}

The name of the Replicat process (with the thread number appended if you’re using coordinated apply).

${emptyString}

Evaluates to an empty string. For example, “”.

${operationCount}

The total count of operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, “1024”.

${insertCount}

The total count of insert operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, “125”.

${updateCount}

The total count of update operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, “265”.

${deleteCount}

The total count of delete operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, “11”.

${truncateCount}

The total count of truncate operations in the data file. It must be used either on rename or by the event handlers or it will be zero (0) because nothing is written yet. For example, “5”.

${currentTimestamp}

The current timestamp. The default output format for the date time is yyyy-MM-dd_HH-mm-ss.SSS. For example, 2017-07-05_04-31-23.123. Alternatively, you can customize the format of the current timestamp by inserting the format inside square brackets like:

${currentTimestamp[MM-dd_HH]}

This format uses the syntax defined in the Java SimpleDateFormat class, see https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html.

${toUpperCase[]}

Converts the contents inside the square brackets to uppercase. For example, ${toUpperCase[${fullyQualifiedTableName}]}.

${toLowerCase[]}

Converts the contents inside the square brackets to lowercase. For example, ${toLowerCase[${fullyQualifiedTableName}]}.

Configuration of template strings can use a mix of keywords and static strings to assemble path and data file names at runtime.

Path Configuration Example
/usr/local/${fullyQualifiedTableName}
Data File Configuration Example
${fullyQualifiedTableName}_${currentTimestamp}_${groupName}.txt
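
Applied to the S3 Event Handler, the same templates are set through the mapping properties listed in Configuring the S3 Event Handler. The following is a minimal sketch, assuming an event handler named s3; the path prefix and timestamp format are illustrative choices only.

```properties
# Sketch only: "s3" is an assumed handler name.
# S3 path built from a constant prefix, the Replicat group name, and the table name.
gg.eventhandler.s3.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
# S3 file name with a custom timestamp format (Java SimpleDateFormat syntax).
gg.eventhandler.s3.fileNameMappingTemplate=${tableName}_${currentTimestamp[yyyy-MM-dd_HH-mm]}.txt
```

The path does not begin with a forward slash, which follows the S3 convention noted for the pathMappingTemplate property.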

11.2.4 Troubleshooting

Connectivity Issues

If the S3 Event Handler is unable to connect to the S3 object storage when running on premises, it is likely that your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URL of your proxy server, and then configure the S3 Event Handler to use it.

Oracle GoldenGate can be used with a proxy server by setting the following parameters:

  • gg.eventhandler.name.proxyServer=
  • gg.eventhandler.name.proxyPort=80

Access to the proxy server can be secured using credentials and the following configuration parameters:

  • gg.eventhandler.name.proxyUsername=username
  • gg.eventhandler.name.proxyPassword=password

Sample configuration:

gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.proxyProtocol=HTTP
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none
goldengate.userexit.writers=javawriter

11.3 Configuring the S3 Event Handler

You can configure the S3 Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).

To enable the selection of the S3 Event Handler, you must first configure the handler type by specifying gg.eventhandler.name.type=s3 and the other S3 Event properties as follows:

Table 11-1 S3 Event Handler Configuration Properties

Properties Required/ Optional Legal Values Default Explanation

gg.eventhandler.name.type

Required

s3

None

Selects the S3 Event Handler for use with Replicat.

gg.eventhandler.name.region

Required

The AWS region name that is hosting your S3 instance.

None

Setting the legal AWS region name is required.

gg.eventhandler.name.cannedACL

Optional

Accepts one of the following values:
  • private
  • public-read
  • public-read-write
  • aws-exec-read
  • authenticated-read
  • bucket-owner-read
  • bucket-owner-full-control
  • log-delivery-write

None

Amazon S3 supports a set of predefined grants, known as canned Access Control Lists. Each canned ACL has a predefined set of grantees and permissions. For more information, see Managing access with ACLs.

gg.eventhandler.name.proxyServer

Optional

The host name of your proxy server.

None

Sets the host name of your proxy server if connectivity to AWS requires a proxy server.

gg.eventhandler.name.proxyPort

Optional

The port number of the proxy server.

None

Sets the port number of the proxy server if connectivity to AWS requires a proxy server.

gg.eventhandler.name.proxyUsername

Optional

The username of the proxy server.

None

Sets the user name for the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials.

gg.eventhandler.name.proxyPassword

Optional

The password of the proxy server.

None

Sets the password for the user name of the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials.

gg.eventhandler.name.bucketMappingTemplate

Required

A string with resolvable keywords and constants used to dynamically generate the name of the S3 bucket to write the file to.

None

Use resolvable keywords and constants to dynamically generate the S3 bucket name at runtime. The handler attempts to create the S3 bucket if it does not exist. AWS requires bucket names to be all lowercase. A bucket name with uppercase characters results in a runtime exception.

gg.eventhandler.name.pathMappingTemplate

Required

A string with resolvable keywords and constants used to dynamically generate the path in the S3 bucket to write the file.

None

Use keywords interlaced with constants to dynamically generate unique S3 path names at runtime. Typically, path names follow the format ogg/data/${groupName}/${fullyQualifiedTableName}. In S3, the convention is not to begin the path with a forward slash (/) because it results in a root directory of “”.

gg.eventhandler.name.fileNameMappingTemplate

Optional

A string with resolvable keywords and constants used to dynamically generate the S3 file name at runtime.

None

Use resolvable keywords and constants to dynamically generate the S3 data file name at runtime. If not set, the upstream file name is used.

gg.eventhandler.name.finalizeAction

Optional

none | delete

None

Set to none to leave the S3 data file in place on the finalize action. Set to delete if you want to delete the S3 data file with the finalize action.

gg.eventhandler.name.eventHandler

Optional

A unique string identifier cross referencing a child event handler.

No event handler configured.

Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.

gg.eventhandler.name.url

Optional (unless Dell ECS, then required)

A legal URL to connect to cloud storage.

None

Not required for Amazon AWS S3. Required for Dell ECS. Sets the URL to connect to cloud storage.

gg.eventhandler.name.proxyProtocol

Optional

HTTP | HTTPS

HTTP

Sets the protocol used for the connection to the proxy server. Setting HTTPS provides an additional level of security: the client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added to the AWS Java SDK in version 1.11.396, so you must use at least that version to use this property.

gg.eventhandler.name.SSEAlgorithm

Optional

AES256 | aws:kms

Empty

Set only if you are enabling S3 server side encryption. Use this parameter to set the algorithm for server side encryption in S3.

gg.eventhandler.name.AWSKmsKeyId

Optional

A legal AWS key management system server side management key or the alias that represents that key.

Empty

Set only if you are enabling S3 server side encryption and the S3 algorithm is aws:kms. This is either the encryption key or the encryption alias that you set in the AWS Identity and Access Management web page. Aliases are prepended with alias/.

gg.eventhandler.name.enableSTS

Optional

true | false

false

Set to true to enable the S3 Event Handler to access S3 credentials from the AWS Security Token Service. The AWS Security Token Service must be enabled if you set this property to true.

gg.eventhandler.name.STSAssumeRole

Optional

AWS user and role in the following format: {user arn}:role/{role name}

None

Set this property if you want to assume a different user/role. Only valid with STS enabled.

gg.eventhandler.name.STSAssumeRoleSessionName

Optional

Any string.

AssumeRoleSession1

The assumed role requires a session name for session logging. However, this can be any value. Only valid if both gg.eventhandler.name.enableSTS=true and gg.eventhandler.name.STSAssumeRole are configured.

gg.eventhandler.name.STSRegion

Optional

Any legal AWS region specifier.

The region is obtained from the gg.eventhandler.name.region property.

Use to resolve the region for the STS call. It's only valid if the gg.eventhandler.name.enableSTS property is set to true. You can set a different AWS region for resolving credentials from STS than the configured S3 region.

gg.eventhandler.name.enableBucketAdmin

Optional

true | false

true

Set to false to disable both the check for whether S3 buckets exist and the automatic creation of buckets that do not exist. This feature requires S3 admin privileges on S3 buckets, which some customers do not wish to grant.

gg.eventhandler.name.accessKeyId

Optional

A valid AWS access key.

None

Set this parameter to explicitly set the access key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then credentials resolution falls back to the AWS default credentials provider chain.

gg.eventhandler.name.secretKey

Optional

A valid AWS secret key.

None

Set this parameter to explicitly set the secret key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then credentials resolution falls back to the AWS default credentials provider chain.
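
To show how several of the optional properties in the table combine, the following is a minimal sketch of a configuration that enables server side encryption with an AWS KMS key and resolves credentials through the AWS Security Token Service. It assumes an event handler named s3; the bucket name, key alias, account number, role name, session name, and regions are placeholders rather than working values.

```properties
# Sketch only: "s3" is an assumed handler name; all values below are placeholders.
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
# Server side encryption with a KMS key referenced by its alias (aliases start with alias/).
gg.eventhandler.s3.SSEAlgorithm=aws:kms
gg.eventhandler.s3.AWSKmsKeyId=alias/your_key_alias
# Resolve credentials through the AWS Security Token Service.
gg.eventhandler.s3.enableSTS=true
# Role to assume, in the {user arn}:role/{role name} format.
gg.eventhandler.s3.STSAssumeRole=arn:aws:iam::123456789012:role/your_role_name
gg.eventhandler.s3.STSAssumeRoleSessionName=oggSession
# Optional: use a different region for the STS call than the S3 region.
gg.eventhandler.s3.STSRegion=us-east-1
```

With enableSTS set to true, the accessKeyId and secretKey properties have no effect, so they are omitted here.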