8.2.19 Google Cloud Storage

8.2.19.1 Overview

Google Cloud Storage (GCS) is a service for storing objects in Google Cloud Platform.

You can use the GCS Event handler to load files generated by the File Writer handler into GCS.

8.2.19.2 Prerequisites

Ensure that you have the following set up:
  • A Google Cloud Platform (GCP) account.
  • A Google service account key with the relevant permissions.
  • The GCS Java Software Development Kit (SDK).

8.2.19.3 Buckets and Objects

Buckets are the basic containers in GCS that store data (objects).
Objects are the individual pieces of data that you store in a Cloud Storage bucket.

8.2.19.4 Authentication and Authorization

A Google Cloud Platform (GCP) service account is a special kind of account used by an application, not by a person. Oracle GoldenGate for Big Data uses a service account key to access the GCS service.

You need to create a service account key with the relevant Identity and Access Management (IAM) permissions.

Use the JSON key type to generate the service account key file.

You can set the path to the service account key file either in the environment variable GOOGLE_APPLICATION_CREDENTIALS or in the GCS Event handler property gg.eventhandler.name.credentialsFile. Instead of specifying the credentials file path, you can also specify the individual keys of the credentials file, such as clientId, clientEmail, privateKeyId, and privateKey, in the corresponding handler properties. This enables the credential keys to be encrypted using Oracle Wallet.
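The three credential sources described above can be told apart with a small check. The following is a minimal sketch, not the handler's implementation: the class name is hypothetical, and the precedence it applies (credentialsFile first, then a complete set of individual keys, then the GOOGLE_APPLICATION_CREDENTIALS environment variable) is an assumption, because this section does not state which source wins when several are configured.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Oracle GoldenGate for Big Data: decides which
// credential source a GCS Event handler named handlerName would use.
final class CredentialSourceResolver {

    enum Source { CREDENTIALS_FILE, INDIVIDUAL_KEYS, ENVIRONMENT_VARIABLE, NONE }

    // Individual key properties; this sketch counts them only when all four are set.
    private static final List<String> KEY_NAMES =
            List.of("clientId", "clientEmail", "privateKeyId", "privateKey");

    static Source resolve(Properties props, String handlerName, Map<String, String> env) {
        String prefix = "gg.eventhandler." + handlerName + ".";

        // Assumed precedence 1: explicit path to the JSON service account key file.
        if (isSet(props.getProperty(prefix + "credentialsFile"))) {
            return Source.CREDENTIALS_FILE;
        }
        // Assumed precedence 2: all individual keys (these can be encrypted with Oracle Wallet).
        if (KEY_NAMES.stream().allMatch(k -> isSet(props.getProperty(prefix + k)))) {
            return Source.INDIVIDUAL_KEYS;
        }
        // Assumed precedence 3: the standard Google environment variable.
        if (isSet(env.get("GOOGLE_APPLICATION_CREDENTIALS"))) {
            return Source.ENVIRONMENT_VARIABLE;
        }
        return Source.NONE;
    }

    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }
}
```

In a running process you would pass System.getenv() as env; taking the environment as a map parameter keeps the sketch easy to test.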

For more information about creating a service account key, see the GCP documentation.

The following IAM permissions must be added to the service account used to run the GCS Event handler.

8.2.19.4.1 Bucket Permissions

Table 8-26 Bucket Permissions

Bucket Permission Name     Description
storage.buckets.create     Create new buckets in a project.
storage.buckets.delete     Delete buckets.
storage.buckets.get        Read bucket metadata, excluding IAM policies.
storage.buckets.list       List buckets in a project. Also read bucket metadata, excluding IAM policies, when listing.
storage.buckets.update     Update bucket metadata, excluding IAM policies.

8.2.19.4.2 Object Permissions

Table 8-27 Object Permissions

Object Permission Name     Description
storage.objects.create     Add new objects to a bucket.
storage.objects.delete     Delete objects.
storage.objects.get        Read object data and metadata, excluding ACLs.
storage.objects.list       List objects in a bucket. Also read object metadata, excluding ACLs, when listing.
storage.objects.update     Update object metadata, excluding ACLs.
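When a service account is missing one of these permissions, the handler can fail partway through a run. The following hypothetical check (not part of the product) compares a set of granted permission names against the ten permissions listed in Tables 8-26 and 8-27 and reports what is missing. How you obtain the granted set, for example from the IAM role you assign to the service account, is outside this sketch.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports which GCS Event handler permissions a service account lacks.
final class GcsPermissionCheck {

    // Permissions from Table 8-26 (bucket) and Table 8-27 (object).
    static final List<String> REQUIRED = List.of(
            "storage.buckets.create", "storage.buckets.delete", "storage.buckets.get",
            "storage.buckets.list", "storage.buckets.update",
            "storage.objects.create", "storage.objects.delete", "storage.objects.get",
            "storage.objects.list", "storage.objects.update");

    /** Returns the required permissions that are not in granted, sorted by name. */
    static Set<String> missing(Set<String> granted) {
        Set<String> result = new TreeSet<>(REQUIRED);
        result.removeAll(granted);
        return result;
    }
}
```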

8.2.19.5 Configuration

Table 8-28 GCS Event Handler Configuration Properties

gg.eventhandler.name.type
    Required/Optional: Required
    Legal values: gcs
    Default: None
    Explanation: Selects the GCS Event handler for use with the File Writer handler.

gg.eventhandler.name.location
    Required/Optional: Optional
    Legal values: A valid GCS location.
    Default: None
    Explanation: If the GCS bucket does not exist, a new bucket is created in this GCS location. If the location is not specified, new bucket creation fails. For the list of valid locations, see GCS locations.

gg.eventhandler.name.bucketMappingTemplate
    Required/Optional: Required
    Legal values: A string with resolvable keywords and constants used to dynamically generate a GCS bucket name.
    Default: None
    Explanation: If a bucket with the resolved name does not exist, the GCS Event handler creates it. See Bucket Naming Guidelines. For more information about supported keywords, see Template Keywords.

gg.eventhandler.name.pathMappingTemplate
    Required/Optional: Required
    Legal values: A string with resolvable keywords and constants used to dynamically generate the path in the GCS bucket to write the file.
    Default: None
    Explanation: Use keywords interlaced with constants to dynamically generate unique GCS path names at runtime. Example path name: ogg/data/${groupName}/${fullyQualifiedTableName}. For more information about supported keywords, see Template Keywords.

gg.eventhandler.name.fileNameMappingTemplate
    Required/Optional: Optional
    Legal values: A string with resolvable keywords and constants used to dynamically generate a file name for the GCS object.
    Default: None
    Explanation: Use resolvable keywords and constants to dynamically generate the GCS object file name. If not set, the upstream file name is used. For more information about supported keywords, see Template Keywords.

gg.eventhandler.name.finalizeAction
    Required/Optional: Optional
    Legal values: A unique string identifier cross-referencing a child event handler.
    Default: No event handler configured.
    Explanation: Sets the downstream event handler that is invoked on the file roll event. A typical example is a downstream BigQuery Event handler that loads the GCS data into Google BigQuery.

gg.eventhandler.name.credentialsFile
    Required/Optional: Optional
    Legal values: Relative or absolute path to the service account key file.
    Default: None
    Explanation: Sets the path to the service account key file. Alternatively, if the environment variable GOOGLE_APPLICATION_CREDENTIALS is set to the path of the service account key file, you need not set this property.

gg.eventhandler.name.storageClass
    Required/Optional: Optional
    Legal values: STANDARD | NEARLINE | COLDLINE | ARCHIVE | REGIONAL | MULTI_REGIONAL | DURABLE_REDUCED_AVAILABILITY
    Default: None
    Explanation: The storage class you set for an object affects the object's availability and pricing model. If this property is not set, the object is assigned the default storage class of its bucket. If the bucket does not exist and a storage class is specified, a new bucket is created with this storage class as its default.

gg.eventhandler.name.kmsKey
    Required/Optional: Optional
    Legal values: A key name in the format projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>, where <PROJECT> is the Google project ID, <LOCATION> is the location of the GCS bucket, <RING_NAME> is the Google Cloud KMS key ring name, and <KEY_NAME> is the Google Cloud KMS key name.
    Default: None
    Explanation: Google Cloud Storage always encrypts your data on the server side, using Google-managed encryption keys, before it is written to disk. As an additional layer of security, you can choose to use keys generated by Google Cloud Key Management Service (KMS). Use this property to set a customer-managed Cloud KMS key to encrypt GCS objects. When you use customer-managed keys, you cannot set gg.eventhandler.name.concurrency to a value greater than one, because GCP does not allow multi-part uploads using object composition with customer-managed keys.

gg.eventhandler.name.concurrency
    Required/Optional: Optional
    Legal values: Any number in the range 1 to 32.
    Default: 10
    Explanation: If concurrency is set to a value greater than one, the GCS Event handler performs multi-part uploads using composition, spawning concurrent threads to upload the individual parts. This provides better throughput when uploading large files. The parts are uploaded to the directory <bucketMappingTemplate>/oggtmp, which is reserved for use by Oracle GoldenGate for Big Data. Multi-part uploads are used only for files larger than 10 megabytes.

gg.eventhandler.name.clientId
    Required/Optional: Optional
    Legal values: A valid client ID from the service account credentials file.
    Default: NA
    Explanation: Provides the client ID key from the credentials file for connecting to the Google Cloud service account.

gg.eventhandler.name.clientEmail
    Required/Optional: Optional
    Legal values: A valid client email from the service account credentials file.
    Default: NA
    Explanation: Provides the client email key from the credentials file for connecting to the Google Cloud service account.

gg.eventhandler.name.privateKeyId
    Required/Optional: Optional
    Legal values: A valid private key ID from the service account credentials file.
    Default: NA
    Explanation: Provides the private key ID from the credentials file for connecting to the Google Cloud service account.

gg.eventhandler.name.privateKey
    Required/Optional: Optional
    Legal values: A valid private key from the service account credentials file.
    Default: NA
    Explanation: Provides the private key from the credentials file for connecting to the Google Cloud service account.

gg.eventhandler.name.projectId
    Required/Optional: Optional
    Legal values: The Google project ID associated with the service account.
    Default: NA
    Explanation: Sets the project ID of the Google Cloud project that houses the storage bucket. By default, this property is configured automatically from the service account key file; set it only to override that value explicitly.

gg.eventhandler.name.url
    Required/Optional: Optional
    Legal values: A legal URL to connect to Google Cloud Storage, including the scheme, server name, and port (if not the default port).
    Default: https://storage.googleapis.com
    Explanation: Sets a URL for a private endpoint to connect to GCS.
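The bucketMappingTemplate, pathMappingTemplate, and fileNameMappingTemplate properties all rely on ${keyword} substitution. The following is a minimal sketch of that idea, not the handler's resolver: it handles only simple ${name} keywords supplied in a map, rejects unknown ones, and ignores any keyword forms with arguments that the full Template Keywords reference may describe. The class name and the example values RGCS and QASOURCE.TCUSTMER are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${keyword} substitution for the *MappingTemplate properties.
final class MappingTemplateResolver {

    // Matches ${keyword}; the keyword name is captured in group 1.
    private static final Pattern KEYWORD = Pattern.compile("\\$\\{([A-Za-z]+)\\}");

    static String resolve(String template, Map<String, String> values) {
        Matcher m = KEYWORD.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolvable keyword: " + m.group());
            }
            // quoteReplacement stops '$' or '\' in a value from being read as a group reference.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With groupName=RGCS and fullyQualifiedTableName=QASOURCE.TCUSTMER, the example path ogg/data/${groupName}/${fullyQualifiedTableName} resolves to ogg/data/RGCS/QASOURCE.TCUSTMER.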

Note:

To connect to GCS with a Google Cloud service account, ensure that one of the following is configured: the credentialsFile property, set to the relative or absolute path of the credentials JSON file, or the properties for the individual credential keys. Setting the Google service account credential keys individually enables you to encrypt them using Oracle Wallet.
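The concurrency and kmsKey descriptions in Table 8-28 imply a simple decision rule: a multi-part upload happens only when concurrency is greater than one and the file is larger than 10 megabytes, and concurrency must not exceed one when a customer-managed KMS key is set. The following hypothetical sketch encodes that rule and splits a file into byte ranges. The class name is invented, 10 megabytes is assumed to mean 10 MiB, and the equal-size split is an assumption, because this section does not say how the handler sizes parts. Uploading the parts to the oggtmp directory and composing them is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the multi-part upload decision described in Table 8-28.
final class MultipartUploadPlanner {

    // Assumption: "10 megabytes" is interpreted as 10 MiB.
    static final long MULTIPART_THRESHOLD_BYTES = 10L * 1024 * 1024;

    /** A byte range [offset, offset + length) uploaded as one part. */
    static final class Part {
        final long offset;
        final long length;

        Part(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    static void validate(int concurrency, boolean kmsKeySet) {
        if (concurrency < 1 || concurrency > 32) {
            throw new IllegalArgumentException("concurrency must be in the range 1 to 32");
        }
        if (kmsKeySet && concurrency > 1) {
            throw new IllegalArgumentException("concurrency must be 1 when kmsKey is set");
        }
    }

    /** Returns one part per concurrent upload, or a single part if multi-part upload does not apply. */
    static List<Part> plan(long fileSize, int concurrency, boolean kmsKeySet) {
        validate(concurrency, kmsKeySet);
        List<Part> parts = new ArrayList<>();
        if (concurrency == 1 || fileSize <= MULTIPART_THRESHOLD_BYTES) {
            parts.add(new Part(0, fileSize)); // single-stream upload
            return parts;
        }
        // Assumption: equal-sized parts, with the remainder spread over the first parts.
        long base = fileSize / concurrency;
        long remainder = fileSize % concurrency;
        long offset = 0;
        for (int i = 0; i < concurrency; i++) {
            long length = base + (i < remainder ? 1 : 0);
            parts.add(new Part(offset, length));
            offset += length;
        }
        return parts;
    }
}
```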

8.2.19.5.1 Classpath Configuration

The GCS Event handler uses the Java SDK for Google Cloud Storage. The classpath must include the path to the GCS SDK.

8.2.19.5.1.1 Dependencies
You can download the SDK using the following Maven coordinates:
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.113.9</version>
</dependency>

Alternatively, you can download the GCS dependencies by running the script: <OGGDIR>/DependencyDownloader/gcs.sh.

Edit the gg.classpath configuration parameter to include the path to the GCS SDK.
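One way to confirm that the entries in gg.classpath actually contain the GCS SDK is to try loading one of its classes. The following hypothetical check (not part of the product) looks for com.google.cloud.storage.StorageOptions, a class shipped in the google-cloud-storage artifact, without running its static initializers.

```java
// Hypothetical startup check that the GCS Java SDK is visible on the classpath.
final class GcsClasspathCheck {

    // A class shipped in the google-cloud-storage artifact.
    static final String GCS_SDK_CLASS = "com.google.cloud.storage.StorageOptions";

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class only; do not run static initializers.
            Class.forName(className, false, GcsClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnClasspath(GCS_SDK_CLASS)
                ? "GCS SDK found"
                : "GCS SDK missing: check gg.classpath");
    }
}
```

Run it with the same directories you put in gg.classpath, for example on Linux: java -cp "/path/to/gcs-deps/*:." GcsClasspathCheck.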

8.2.19.5.2 Proxy Configuration

When the Replicat process runs behind a proxy server, you can use the jvm.bootoptions property to set the proxy server configuration. For example:
jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80
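Each -D option in jvm.bootoptions becomes a JVM system property. The hypothetical helper below reads https.proxyHost and https.proxyPort back to show what the example line sets; it is for illustration only, and nothing in this section suggests the handler needs such code. The fallback port of 443 follows the JDK networking properties documentation for https.proxyPort.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical illustration of how -Dhttps.proxyHost/-Dhttps.proxyPort surface inside the JVM.
final class ProxySettings {

    /** Builds an HTTP Proxy from the https.* system properties, or NO_PROXY if no host is set. */
    static Proxy fromSystemProperties() {
        String host = System.getProperty("https.proxyHost");
        if (host == null || host.isBlank()) {
            return Proxy.NO_PROXY;
        }
        int port = Integer.parseInt(System.getProperty("https.proxyPort", "443"));
        // createUnresolved avoids a DNS lookup; the proxy host is resolved when a connection is made.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }
}
```

With the options shown above, the helper returns an HTTP-type proxy for some-proxy-address.com on port 80.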

8.2.19.5.3 Sample Configuration

#The GCS Event handler
gg.eventhandler.gcs.type=gcs
gg.eventhandler.gcs.pathMappingTemplate=${fullyQualifiedTableName}
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/path/to/gcs/credentials-file
gg.eventhandler.gcs.finalizeAction=none
gg.classpath=/path/to/gcs-deps/*
jvm.bootoptions=-Xmx8g -Xms8g
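Before starting Replicat, it can help to catch an unedited TODO in a configuration like the sample above. The following hypothetical pre-flight check (not part of the product) loads properties text with java.util.Properties and reports a wrong type, a missing or placeholder bucketMappingTemplate or pathMappingTemplate, and a concurrency value outside the documented range of 1 to 32. Treating any value that contains "<" as an unedited placeholder is a heuristic assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for a GCS Event handler configuration.
final class GcsHandlerConfigCheck {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    /** Returns the problems found for the handler named handlerName; an empty list means none. */
    static List<String> check(Properties props, String handlerName) {
        String prefix = "gg.eventhandler." + handlerName + ".";
        List<String> problems = new ArrayList<>();

        if (!"gcs".equals(props.getProperty(prefix + "type"))) {
            problems.add(prefix + "type must be gcs");
        }
        // Required templates from Table 8-28; "<" flags a placeholder such as <gcs-bucket-name>.
        for (String key : List.of("bucketMappingTemplate", "pathMappingTemplate")) {
            String value = props.getProperty(prefix + key);
            if (value == null || value.isBlank() || value.contains("<")) {
                problems.add(prefix + key + " is missing or still a placeholder");
            }
        }
        // concurrency defaults to 10 and must be in the range 1 to 32.
        String concurrency = props.getProperty(prefix + "concurrency", "10").trim();
        try {
            int c = Integer.parseInt(concurrency);
            if (c < 1 || c > 32) {
                problems.add(prefix + "concurrency must be in the range 1 to 32");
            }
        } catch (NumberFormatException e) {
            problems.add(prefix + "concurrency is not a number: " + concurrency);
        }
        return problems;
    }
}
```

For a real properties file, pass the result of Files.newBufferedReader(path) to Properties.load instead of using parse. Run against the sample as shown, the check reports only that bucketMappingTemplate is still a placeholder.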