7.3 Google Cloud Storage Replication

Google Cloud Storage (GCS) is a service for storing objects in Google Cloud Platform.

You can use Oracle GoldenGate for Big Data to ingest different file formats into GCS. It supports the following file formats:

  • delimitedtext
  • json
  • json_row
  • json_op
  • avro_row
  • avro_op
  • avro_row_ocf
  • avro_op_ocf
  • parquet

Oracle GoldenGate for Big Data uses a two-step process in GCS replication. First, it uses the File Writer Handler to create the files in a local directory on the server. Then, it loads these files into GCS.

Files must be in a closed state before they can be loaded into GCS. For more information about how to control the File Writer behaviour, see the File Writer Behaviour blog.

This quick start loads data using the default settings.

7.3.1 Install Dependency Files

Oracle GoldenGate for Big Data uses client libraries in the replication process. Before setting up the replication process, download these libraries by using the Dependency Downloader utility available in Oracle GoldenGate for Big Data. Dependency Downloader is a set of shell scripts that download dependency jar files from Maven and other repositories.

To install the required dependency files:

  1. Go to the installation location of Dependency Downloader: GG_HOME/opt/DependencyDownloader/.
  2. Execute gcs.sh and bigquery.sh with the required versions.

    Figure 7-10 Execute gcs.sh and bigquery.sh with the required versions

    A directory is created in GG_HOME/opt/DependencyDownloader/dependencies. For example, /u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9
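
The downloaded directories are referenced later by the gg.classpath property (see 7.3.2). As a minimal sketch, the following Python helper checks that each dependency directory exists and contains jar files, and then builds a classpath value from them. The function name build_classpath is hypothetical and is not part of Oracle GoldenGate for Big Data; the ":" separator assumes a Linux or UNIX host.

```python
from pathlib import Path


def build_classpath(dependency_dirs):
    """Return a gg.classpath value covering every jar in the given directories.

    Raises FileNotFoundError if a directory is missing or holds no jar files,
    which usually means a downloader script did not complete.
    """
    entries = []
    for directory in dependency_dirs:
        path = Path(directory)
        if not path.is_dir():
            raise FileNotFoundError(f"dependency directory not found: {path}")
        if not any(path.glob("*.jar")):
            raise FileNotFoundError(f"no jar files in: {path}")
        # The trailing /* wildcard puts every jar in the directory on the classpath.
        entries.append(f"{path}/*")
    # Classpath entries are separated by ':' (assumption: Linux or UNIX host).
    return ":".join(entries)
```

For example, build_classpath(["/u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9"]) returns /u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9/* if that directory contains jar files. Add the directory created by bigquery.sh to the list in the same way.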

7.3.2 Create a Replicat in Oracle GoldenGate for Big Data

To create a replicat in Oracle GoldenGate for Big Data:

  1. In the Oracle GoldenGate for Big Data UI, in the Administration Service tab, click the + sign to add a replicat.

    Figure 7-11 Click + in the Administration Service tab.

    Figure 7-12 Click + sign to add a replicat

  2. Select the Replicat Type and click Next.

    There are two different Replicat types here: Classic and Coordinated. Classic Replicat is a single-threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel.

    Figure 7-13 Select the Replicat Type and click Next.

  3. Enter the basic information, and click Next:
    1. Process Name: Name of the Replicat
    2. Trail Name: Name of the required trail file. You can use the sample trail file tr, which ships with Oracle GoldenGate for Big Data.
    3. Trail Subdirectory: Path to the trail file. The sample trail file tr is located at OGG_HOME/opt/AdapterExamples/trail.
    4. Target: Google Cloud Storage

      Figure 7-14 Process Name, Trail Name, and Target Names

  4. Enter the Parameter File details and click Next. In the Parameter File, you can either specify source-to-target mappings or leave the wildcard selection as is. If you selected Coordinated Replicat as the Replicat Type, you must provide an additional parameter: TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties

    Figure 7-15 Provide Parameter File details and click Next.

  5. In the next screen, update only the properties tagged as TODO. They are as follows:

    Provide your GCS bucket name:
    #TODO: Edit the GCS bucket name
    gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>

    Provide the path to your GCP service account key:
    #TODO: Edit the GCS credentialsFile
    gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile

    Provide the path to the dependency jar files that you downloaded in Install Dependency Files:
    #TODO: Edit to include the GCS Java SDK and BQ Java SDK.
    gg.classpath=/path/to/gcs-deps/*:/path/to/bq-deps/*

    Without these properties, your replicat will fail. There are also some optional properties that you can modify:

    gg.handler.filewriter.format controls the format of the output files. By default, it is set to avro_row_ocf. You can change it to json, delimitedtext, or one of the other formats listed in Configuring the File Writer Handler.

    gg.handler.filewriter.fileRollInterval and gg.handler.filewriter.inactivityRollInterval control when files are closed. A file must be in a closed state before it can be loaded into a GCS bucket.

    fileRollInterval starts a timer when a file is created. When the timer expires, the file is closed and moved to the GCS bucket. In the replicat properties, it is set to 0, which means that it is off. You can set it to 5s (5 seconds) for this quick start.

    inactivityRollInterval tracks the inactivity period. Here, inactivity means that no operations are coming from the source system. You can set it to 5s (5 seconds) for this quick start.

    Figure 7-16 Add Replicat

  6. If the replicat starts successfully, it is in the running state. Go to action/details/statistics to see the replication statistics:

    Figure 7-17 Replication Statistics

  7. Go to your GCS bucket in GCP and check that the files for the table have been loaded.

    Figure 7-18 Bucket Details

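
The bucket check in the last step can also be done from a script. The following is a minimal sketch, assuming that the google-cloud-storage Python package is installed and that the service account key used by the replicat is allowed to list objects in the bucket. The helper names list_loaded_objects and make_client are hypothetical. The object names that you see depend on your File Writer Handler settings.

```python
def list_loaded_objects(client, bucket_name, prefix=None):
    """Return (object name, size in bytes) pairs for a bucket, sorted by name.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    return sorted((blob.name, blob.size) for blob in blobs)


def make_client(credentials_file):
    """Create a GCS client from a service account key file."""
    # Imported here so that list_loaded_objects can be used (and tested)
    # without the google-cloud-storage package being installed.
    from google.cloud import storage

    return storage.Client.from_service_account_json(credentials_file)
```

To use it, call list_loaded_objects(make_client("/path/to/gcp/credentialsFile"), "<gcs-bucket-name>") with the same credentials file and bucket name that you set in the replicat properties, and print the returned pairs. An empty result usually means that no file has been closed yet, so check the fileRollInterval and inactivityRollInterval settings.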