8.2.5 Amazon S3

Learn how to use the S3 Event Handler, which provides the interface to Amazon S3 web services.

8.2.5.1 Overview

Amazon S3 is object storage hosted in the Amazon cloud; see https://aws.amazon.com/s3/. The S3 Event Handler loads data files generated by the File Writer Handler into Amazon S3.

You can use any format that the File Writer Handler supports; see Flat Files.

8.2.5.2 Detailing Functionality

The S3 Event Handler requires the Amazon Web Services (AWS) Java SDK to transfer files to S3 object storage. Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) does not include the AWS Java SDK. You have to download and install the AWS Java SDK from:

https://aws.amazon.com/sdk-for-java/

Then configure the gg.classpath variable to include the JAR files of the AWS Java SDK, which are divided into two directories. Both directories must be in gg.classpath, for example:

gg.classpath=/usr/var/aws-java-sdk-1.11.240/lib/*:/usr/var/aws-java-sdk-1.11.240/third-party/lib/*

8.2.5.2.1 Resolving AWS Credentials

8.2.5.2.1.1 Amazon Web Services Simple Storage Service Client Authentication

The S3 Event Handler is a client connection to the Amazon Web Services (AWS) Simple Storage Service (S3) cloud service. The AWS cloud must be able to authenticate the AWS client for the client to interface with S3.

The AWS client authentication has become increasingly complicated as more authentication options have been added to the S3 Event Handler. This topic explores the different use cases for AWS client authentication.

8.2.5.2.1.1.1 Explicit Configuration of the Client ID and Secret

A client ID and secret are generally the required credentials for the S3 Event Handler to interact with Amazon S3. A client ID and secret are generated using the Amazon AWS website.

These credentials can be explicitly configured in the Java Adapter properties file as follows:

gg.eventhandler.name.accessKeyId=
gg.eventhandler.name.secretKey=

Furthermore, the Oracle Wallet functionality can be used to encrypt these credentials.
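
As a minimal sketch, assuming the event handler is named s3 (as in the sample configuration in this topic), the explicit credentials might look like the following. The key values are placeholders, not real credentials.

```properties
# Hypothetical handler name "s3"; replace the placeholder values with the
# access key ID and secret key generated for your AWS account.
gg.eventhandler.s3.accessKeyId=your-access-key-id
gg.eventhandler.s3.secretKey=your-secret-access-key
```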

8.2.5.2.1.1.2 Use of the AWS Default Credentials Provider Chain

If gg.eventhandler.name.accessKeyId and gg.eventhandler.name.secretKey are unset, credentials resolution falls back to the AWS default credentials provider chain. The AWS default credentials provider chain provides various ways by which the AWS credentials can be resolved.

For more information about the default credential provider chain and order of operations for AWS credentials resolution, see Working with AWS Credentials.

When Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) runs on an AWS Elastic Compute Cloud (EC2) instance, the general use case is to resolve the credentials from the EC2 metadata service. The AWS default credentials provider chain provides resolution of credentials from the EC2 metadata service as one of the options.
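
For illustration, a configuration that relies on the default credentials provider chain simply omits the two credential properties. The sketch below assumes a handler named s3 and reuses the region from the sample configuration in this topic.

```properties
# Assumed handler name "s3". No accessKeyId or secretKey is set, so
# credentials resolution falls back to the AWS default credentials provider
# chain (for example, the EC2 metadata service when running on EC2).
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
```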

8.2.5.2.1.1.3 AWS Federated Login

This use case applies when your on-premises system login is integrated with AWS, so that logging in to an on-premises machine also logs you in to AWS.

In this use case:
  • You may not want to generate client IDs and secrets. (Some users disable this feature in the AWS portal.)
  • The client AWS applications need to interact with the AWS Security Token Service (STS) to obtain an authentication token for programmatic calls made to S3.

To enable this feature, set gg.eventhandler.name.enableSTS=true.
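
A sketch of an STS-enabled configuration, assuming a handler named s3. The account ID, role name, and session name are hypothetical placeholders. The STSAssumeRole, STSAssumeRoleSessionName, and STSRegion properties are optional and are described in the configuration table in this topic.

```properties
# Assumed handler name "s3".
gg.eventhandler.s3.enableSTS=true
# Optional: assume a different role. The account ID and role name are made up.
gg.eventhandler.s3.STSAssumeRole=arn:aws:iam::123456789012:role/example-s3-writer
# Optional: session name used for session logging; any string is accepted.
gg.eventhandler.s3.STSAssumeRoleSessionName=ExampleSession
# Optional: region for the STS call; defaults to the configured S3 region.
gg.eventhandler.s3.STSRegion=us-west-2
```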

8.2.5.2.2 About the AWS S3 Buckets

AWS divides S3 storage into separate file systems called buckets. The S3 Event Handler can write to pre-created buckets. Alternatively, if the S3 bucket does not exist, the S3 Event Handler attempts to create the specified S3 bucket. AWS requires that S3 bucket names are lowercase, and Amazon S3 bucket names must be globally unique. If you attempt to create an S3 bucket whose name already exists in any Amazon account, the S3 Event Handler abends.
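
For example, to write to a pre-created bucket without granting the handler bucket administration privileges, a configuration along these lines could be used. The handler name s3 and the bucket name are hypothetical; enableBucketAdmin is described in the configuration table in this topic.

```properties
# Assumed handler name "s3"; the bucket name is a placeholder and must be
# all lowercase and globally unique.
gg.eventhandler.s3.bucketMappingTemplate=example-company-ogg-data
# Skip the bucket existence check and automatic bucket creation.
gg.eventhandler.s3.enableBucketAdmin=false
```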

8.2.5.2.3 Troubleshooting

Connectivity Issues

If the S3 Event Handler is unable to connect to the S3 object storage when running on premises, it is likely that your connectivity to the public internet is protected by a proxy server. Proxy servers act as a gateway between the private network of a company and the public internet. Contact your network administrator to get the URLs of your proxy server.

To use Oracle GoldenGate with a proxy server, set the following parameters:

gg.eventhandler.name.proxyServer=
gg.eventhandler.name.proxyPort=80
gg.eventhandler.name.proxyUsername=username
gg.eventhandler.name.proxyPassword=password

Sample configuration:

gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.proxyServer=www-proxy.us.oracle.com
gg.eventhandler.s3.proxyPort=80
gg.eventhandler.s3.proxyProtocol=HTTP
gg.eventhandler.s3.bucketMappingTemplate=yourbucketname
gg.eventhandler.s3.pathMappingTemplate=thepath
gg.eventhandler.s3.finalizeAction=none

8.2.5.3 Configuring the S3 Event Handler

You can configure the S3 Event Handler operation using the properties file. These properties are located in the Java Adapter properties file (not in the Replicat properties file).

To select the S3 Event Handler, first configure the handler type by specifying gg.eventhandler.name.type=s3, and then set the other S3 Event Handler properties as follows:

Table 8-6 S3 Event Handler Configuration Properties

Each entry lists, in order: Property, Required/Optional, Legal Values, Default, Explanation.

gg.eventhandler.name.type

Required

s3

None

Selects the S3 Event Handler for use with Replicat.

gg.eventhandler.name.region

Required

The AWS region name that is hosting your S3 instance.

None

Sets the AWS region. A legal AWS region name is required.

gg.eventhandler.name.cannedACL

Optional

Accepts one of the following values:
  • private
  • public-read
  • public-read-write
  • aws-exec-read
  • authenticated-read
  • bucket-owner-read
  • bucket-owner-full-control
  • log-delivery-write

None

Amazon S3 supports a set of predefined grants, known as canned Access Control Lists. Each canned ACL has a predefined set of grantees and permissions. For more information, see Managing access with ACLs.

gg.eventhandler.name.proxyServer

Optional

The host name of your proxy server.

None

Sets the host name of your proxy server if connectivity to AWS requires a proxy server.

gg.eventhandler.name.proxyPort

Optional

The port number of the proxy server.

None

Sets the port number of the proxy server if connectivity to AWS requires a proxy server.

gg.eventhandler.name.proxyUsername

Optional

The username of the proxy server.

None

Sets the user name for the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials.

gg.eventhandler.name.proxyPassword

Optional

The password of the proxy server.

None

Sets the password for the user name of the proxy server if connectivity to AWS requires a proxy server and the proxy server requires credentials.

gg.eventhandler.name.bucketMappingTemplate

Required

A string with resolvable keywords and constants used to dynamically generate the S3 bucket name.

None

Use resolvable keywords and constants to dynamically generate the S3 bucket name at runtime. The handler attempts to create the S3 bucket if it does not exist. AWS requires bucket names to be all lowercase. A bucket name with uppercase characters results in a runtime exception. See Template Keywords.

gg.eventhandler.name.pathMappingTemplate

Required

A string with resolvable keywords and constants used to dynamically generate the path in the S3 bucket to write the file.

None

Use keywords interlaced with constants to dynamically generate unique S3 path names at runtime. Typically, path names follow the format ogg/data/${groupName}/${fullyQualifiedTableName}. In S3, the convention is not to begin the path with a forward slash (/) because it results in a root directory of “”. See Template Keywords.

gg.eventhandler.name.fileNameMappingTemplate

Optional

A string with resolvable keywords and constants used to dynamically generate the S3 file name at runtime.

None

Use resolvable keywords and constants to dynamically generate the S3 data file name at runtime. If not set, the upstream file name is used. See Template Keywords.

gg.eventhandler.name.finalizeAction

Optional

none | delete

None

Set to none to leave the S3 data file in place on the finalize action. Set to delete if you want to delete the S3 data file with the finalize action.

gg.eventhandler.name.eventHandler

Optional

A unique string identifier cross referencing a child event handler.

No event handler configured.

Sets the event handler that is invoked on the file roll event. Event handlers can do file roll event actions like loading files to S3, converting to Parquet or ORC format, or loading files to HDFS.

gg.eventhandler.name.url

Optional (unless Dell ECS, then required)

A legal URL to connect to cloud storage.

None

Not required for Amazon AWS S3. Required for Dell ECS. Sets the URL to connect to cloud storage.

gg.eventhandler.name.proxyProtocol

Optional

HTTP | HTTPS

HTTP

Sets the protocol of the connection to the proxy server for an additional level of security. The client first performs an SSL handshake with the proxy server, and then an SSL handshake with Amazon AWS. This feature was added to the Amazon SDK in version 1.11.396, so you must use at least that version to use this property.

gg.eventhandler.name.SSEAlgorithm

Optional

AES256 | aws:kms

Empty

Set only if you are enabling S3 server side encryption. Use this parameter to set the algorithm for server side encryption in S3.

gg.eventhandler.name.AWSKmsKeyId

Optional

A legal AWS key management system server side management key or the alias that represents that key.

Empty

Set only if you are enabling S3 server side encryption and the S3 algorithm is aws:kms. This is either the encryption key or the encryption alias that you set in the AWS Identity and Access Management web page. Aliases are prepended with alias/.

gg.eventhandler.name.enableSTS

Optional

true | false

false

Set to true to enable the S3 Event Handler to access S3 credentials from the AWS Security Token Service. The AWS Security Token Service must be enabled if you set this property to true.

gg.eventhandler.name.STSAssumeRole

Optional

AWS user and role in the following format: {user arn}:role/{role name}

None

Set this property if you want to assume a different user/role. Only valid with STS enabled.

gg.eventhandler.name.STSAssumeRoleSessionName

Optional

Any string.

AssumeRoleSession1

The assumed role requires a session name for session logging. However, this can be any value. Only valid if both gg.eventhandler.name.enableSTS=true and gg.eventhandler.name.STSAssumeRole are configured.

gg.eventhandler.name.STSRegion

Optional

Any legal AWS region specifier.

The region is obtained from the gg.eventhandler.name.region property.

Use to resolve the region for the STS call. It's only valid if the gg.eventhandler.name.enableSTS property is set to true. You can set a different AWS region for resolving credentials from STS than the configured S3 region.

gg.eventhandler.name.enableBucketAdmin

Optional

true | false

true

Set to false to disable both the check for whether S3 buckets exist and the automatic creation of buckets that do not exist. This feature requires S3 admin privileges on S3 buckets, which some customers do not wish to grant.

gg.eventhandler.name.accessKeyId

Optional

A valid AWS access key.

None

Set this parameter to explicitly set the access key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then credentials resolution falls back to the AWS default credentials provider chain.

gg.eventhandler.name.secretKey

Optional

A valid AWS secret key.

None

Set this parameter to explicitly set the secret key for AWS. This parameter has no effect if gg.eventhandler.name.enableSTS is set to true. If this property is not set, then credentials resolution falls back to the AWS default credentials provider chain.

gg.eventhandler.name.enableAccelerateMode

Optional

true | false

false

Enables or disables Amazon S3 Transfer Acceleration, which transfers files quickly and securely over long distances between your client and an S3 bucket.
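
As an illustrative sketch only, the following combines several of the optional properties from the table: server side encryption with a KMS key alias, a canned ACL, and a templated path. The handler name s3, the bucket name, and the key alias are assumptions; the path template is the typical format given in the table.

```properties
# Assumed handler name "s3"; bucket name and key alias are placeholders.
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=us-west-2
gg.eventhandler.s3.bucketMappingTemplate=example-company-ogg-data
gg.eventhandler.s3.pathMappingTemplate=ogg/data/${groupName}/${fullyQualifiedTableName}
# Grant the bucket owner full control over uploaded objects.
gg.eventhandler.s3.cannedACL=bucket-owner-full-control
# Server side encryption with a KMS key; aliases are prepended with alias/.
gg.eventhandler.s3.SSEAlgorithm=aws:kms
gg.eventhandler.s3.AWSKmsKeyId=alias/example-ogg-key
# Leave the S3 data file in place on the finalize action.
gg.eventhandler.s3.finalizeAction=none
```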