Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics

8.13 Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics

Overview

This Quickstart covers a step-by-step process showing how to ingest real-time data into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).

GCP BigQuery is a cloud-based data warehousing platform that provides a fully managed service for storing, processing, and analysing data.

GG for DAA supports stage and merge and streaming api designs. In stage and merge, the change data is staged in a temporary location in microbatches and eventually merged into to the target table. GCS is used as the staging location.

All replication process is automatically handled by GoldenGate for Distributed Applications and Analytics (GG for DAA) GCP BigQuery Stage and Merge Handler.

Parent topic: Quickstarts

8.13.1 Prerequisites for Google Cloud Platform BigQuery Stage and Merge

Google Cloud Platform (GCP) account set up.
A Google Cloud Platform (GCP) service account key with relevant BigQuery Permissions. Copy your GCP service account key to a directory on your Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) Server.
A Google Cloud Storage bucket with relevant permissions. Ensure that the GCS bucket and the BigQuery dataset exist in the same location or region.
Target BigQuery tables can be created before configuring the replicat. If necessary permissions are provided, then GG for DAA can auto create the target BigQuery tables.

In this quick start, we will use a sample trail file (named tr) which is shipped with GG for DAA. If you want to continue with sample trail file, it is located at GG_HOME/opt/AdapterExamples/trail/ in your GG for DAA instance.

GG for DAA will create the tables automatically.

Parent topic: Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics

8.13.2 Install Dependency Files

Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA) uses Java SDK provided by Google. You can download the SDKs using Dependency Downloader utility shipped with GG for DAA. Dependency downloader is a set of shell scripts that downloads dependency jar files from Maven and other repositories.

In your GG for DAA VM, go to dependency downloader utility. It is located at GG_HOME/opt/DependencyDownloader/.
Run gcs.sh and bigquery.sh with the required versions.

Figure 8-91 Run gcs.sh and bigquery.sh with the required versions
2 directories are created in GG_HOME/opt/DependencyDownloader/dependencies. Make a note of the directories:
- /u01/app/ogg/opt/DependencyDownloader/dependencies/bigquery_1.111.1
- /u01/app/ogg/opt/DependencyDownloader/dependencies/gcs_1.113.9

Parent topic: Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics

8.13.3 Create a Replicat in Oracle GoldenGate for Distributed Applications and Analytics

To create a replicat in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA):

In the Administration Service tab, click the + sign to add a replicat.

Figure 8-92 Click + in the Administration Service tab.
Select the Classic Replicat Replicat Type and click Next. There are two different Replicat types available: Classic and Coordinated. Classic Replicat is a single threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel.

Figure 8-93 Add Replicat
Enter the Replicat options, and click Next:
1. Trail Name: Name of the required trail file (if using sample trail, provide as tr)
2. Subdirectory: Provide as GG_HOME/opt/AdapterExamples/trail/ if using the sample trail.
3. Target: Google BigQuery
Enable Stage and Merge and select Google Cloud Storage as Available staging location.

Figure 8-94 Replicat Options
Leave Managed Options as is and click Next.

Figure 8-95 Managed Options
Enter Parameter File details and click Next. In the Parameter File, you can specify source to target mapping. If you are using the sample trail file (tr), then enter as follows: MAP QASOURCE.*, TARGET <your_schema_name>.*;
If Coordinated Replicat is selected as the Replicat Type, an additional parameter needs to be provided: TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties

Figure 8-96 Parameter File
In the next screen, you need to update the properties only tagged as TODO and click Create and Run.
Provide your GCS bucket name:
```
#TODO: Edit the GCS bucket name
gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
```
Provide path to your GCP service account key:
```
#TODO: Edit the GCS credentialsFile
gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile
```
Provide path to dependency jar files that you downloaded in prerequisites:
```
#TODO: Edit to include the GCS Java SDK and BQ Java SDK.
gg.classpath=/path/to/gcs_dependencies/*:/path/to/bq_dependencies/*		
```
Figure 8-97 Properties File
For more information about replication configuration, see Google BigQuery Stage and Merge.
If replicat starts successfully, it will be in running state. You can go to action/details/statistics to see the replication statistics.

Figure 8-98 Replication Statistics

Figure 8-99 GCPBQ Statistics
You can go to GCP Big Query console and check the tables. It may take a short while for tables to be created and loaded.

Figure 8-100 GCPBQ Console

Note:

You can run an initial load with BigQuery replicat. For more information see BigQuery Handler.

Parent topic: Realtime Data Ingestion into GCP BigQuery with Oracle GoldenGate for Distributed Applications and Analytics