7.2 Google Cloud Platform Big Query Stage and Merge Replication

BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.

The BigQuery Event handler uses the stage and merge data flow.

The change data is staged in a temporary location in microbatches and eventually merged into to the target table. Google Cloud Storage (GCS) is used as the staging area for change data. GoldenGate for Big Data loads files generated by the File Writer Handler into Google Cloud Storage and runs BigQuery Query jobs to execute MERGE SQL. The SQL operations are performed in batches providing better throughput.

These processes are automatically handled by the Oracle GoldenGate for Big Data Big Query replicat process. See Google BigQuery Stage and Merge

This topic covers a step-by-step process on how to configure and run a replicat targeting GCP Big Query.

7.2.1 Prerequisites: Google Cloud Platform BigQuery Stage and Merge Replicat And Google Cloud Storage Replication

Apart from the prerequisites listed in the above section, the following are the prerequisites specific to Google Cloud Platform Big Query Stage and Merge Replication and Google Cloud Storage Replication Quickstarts.

  • Google Cloud Platform (GCP) account set up.
  • A Google Cloud Platform (GCP) service account key with relevant permissions. Copy your GCP service account key to a directory on your GoldenGate for Big Data Server.
  • A Google Cloud Storage bucket with relevant BigQuery Permissions (Google Cloud Platform BigQuery Stage and Merge Replicat) In case of Google Cloud Storage Replication, a Google Cloud Storage bucket with relevant Bucket Permissions must be set. Ensure that the GCS bucket and the BigQuery dataset exist in the same location/region.
  • Target BigQuery tables can be created before configuring the replicat. If necessary permissions are provided, then Oracle GoldenGate for Big Data can auto create the target BigQuery tables.

7.2.2 Install Dependency Files

Oracle GoldenGate for Big Data uses client libraries in the replication process. You need to download these libraries by using the Dependency Downloader utility available in Oracle GoldenGate for Big Data before setting up the replication process. Dependency downloader is a set of shell scripts that downloads dependency jar files from Maven and other repositories.

To install the required dependency files:

  1. Go to installation location of Dependency Downloader: GG_HOME/opt/DependencyDownloader/.
  2. Execute gcs.sh and bigquery.sh with the required versions
  3. Execute gcs.sh and bigquery.sh with the required version.

    Figure 7-1 Execute gcs.sh and bigquery.sh with the required versions

    Execute gcs.sh and bigquery.sh with the required versions
    2 directories are created in GG_HOME/opt/DependencyDownloader/dependencies. For example, /u01/app/ogg/opt/DependencyDownloader/dependencies/bigquery_1.111.10 and /u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9

7.2.3 Create a Replicat in Oracle GoldenGate for Big Data

To create a replicat in Oracle GoldenGate for Big Data:

  1. In the Oracle GoldenGate for Big Data UI, in the Administration Service tab, click the + sign to add a replicat.

    Figure 7-2 Click + sign to add a replicat

    Click + sign in the Administration tab to add a replicat
  2. Select the Replicat Type and click Next.

    There are two different Replicat types here: Classic and Coordinated. Classic Replicat is a single threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel.

    Figure 7-3 Select the Replicat Type and click Next.

    Select the Replicat Type and click Next.
  3. Enter the basic information, and click Next:
    1. Process Name: Name of the Replicat
    2. Trail Name: Name of the required trail file
    3. Target: Google BigQuery Stage and Merge

      Figure 7-4 Process Name, Trail Name, and Target Names

      Enter the basic information and click Next.
  4. Enter Parameter File details and click Next. In the Parameter File, you can either specify source to target mapping or leave it as-is with a wildcard selection. If Co-ordinated Replicat is selected as the Replicat Type, then you need to provide an additional parameter: TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties

    Oracle GoldenGate for Big Data can be used to replicate into multiple GCP projects with the same replicat. For more information, see BigQuery Dataset and GCP ProjectId Mapping.

    Oracle GoldenGate for Big Data maps the table schema name to the BigQuery dataset. The table catalog name is mapped to the GCP projectId.

    Figure 7-5 Provide Parameter File details and click Next.

    Provide Parameter File details and click Next.
  5. In the next screen, update the properties only tagged as TODO. They are as follows:

    Provide your GCS bucket name:
    #TODO: Edit the GCS bucket name
    gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
    Provide path to your GCP service account key:
    #TODO: Edit the GCS credentialsFile
    gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile
    Provide path to dependency jar files that you downloaded in prerequisites:
    #TODO: Edit to include the GCS Java SDK and BQ Java SDK.
    gg.classpath=/path/to/gcs-deps/*:/path/to/bq-deps/*
    

    For more information, see Google BigQuery Stage and Merge

    Figure 7-6 Update the properties tagged as “TODO”.

    Update the properties tagged as TODO.
  6. If replicat starts successfully, then it will be in running state. Go to action/details/statistics to see the replication statistics:

    Figure 7-7 Replication Statistics

    Replication Statistics

    Figure 7-8 Replication Statistics Table

    Replication Statistics
  7. Go to GCP Big Query console and check the tables. It may take a short while for tables to be created and loaded.

    Figure 7-9 Query Results

    Query Results

Note:

  • You can run an initial load with Big Query replicat. See INSERTALLRECORDS Support.
  • DDL replication is not supported, GGBD can only auto-create target GCP Big Query tables. In case of DDL, replicat will abend.