8.2.18 Google BigQuery

Topics:

8.2.18.1 Using Streaming API

Learn how to use the Google BigQuery Handler, which streams change data capture data from source trail files into Google BigQuery.

BigQuery is a RESTful web service that enables interactive analysis of massive datasets, working in conjunction with Google Storage. See https://cloud.google.com/bigquery/.

8.2.18.1.1 Detailing the Functionality

8.2.18.1.1.1 Data Types

BigQuery supports the standard SQL data types, and most of these data types are supported by the BigQuery Handler. The handler converts the column value in the trail file to the corresponding Java type representing the BigQuery column type.

The following data types are supported:

BIGNUMERIC
BOOLEAN
BYTES
DATE
DATETIME
FLOAT
INTEGER
JSON
NUMERIC
STRING
TIME
TIMESTAMP

The BigQuery Handler does not support complex data types, such as ARRAY and STRUCT.

8.2.18.1.1.2 Metadata Support

The BigQuery Handler creates tables in BigQuery if the tables do not exist.

The BigQuery Handler alters tables to add columns that exist in the source metadata, or configured metacolumns, but do not exist in the target metadata. The BigQuery Handler also adds columns dynamically at runtime if it detects a metadata change.

The BigQuery Handler does not drop columns in the BigQuery table that do not exist in the source table definition. BigQuery neither supports dropping existing columns nor changing the data type of existing columns. Once a column is created in BigQuery, it is immutable.
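Whether the handler may add columns to an existing target table is controlled by the enableAlter property described in the configuration reference later in this section. A minimal sketch, assuming a handler named bigquery:

# Allow the handler to add source columns or metacolumns missing from the target table.
gg.handler.bigquery.enableAlter=true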

Truncate operations are not supported.

8.2.18.1.1.3 Operation Modes

You can configure the BigQuery Handler in one of these two modes:

Audit Log Mode = true
gg.handler.name.auditLogMode=true

When the handler is configured to run with audit log mode true, the data is pushed into Google BigQuery without a unique row identification key. As a result, Google BigQuery is not able to merge different operations on the same row. For example, a source row with an insert operation, two update operations, and then a delete operation would show up in BigQuery as four rows, one for each operation.

Also, the order in which the audit log is displayed in the BigQuery data set is not deterministic.

To overcome these limitations, specify optype and position in the metacolumns template for the handler. This adds two columns of the same names to the schema for the table in Google BigQuery. For example: gg.handler.bigquery.metaColumnsTemplate = ${optype}, ${position}

The optype is important to determine the operation type for the row in the audit log.

To view the audit log in the order in which operations were processed in the trail file, specify position, which can be used in the ORDER BY clause when querying the table in Google BigQuery. For example:

SELECT * FROM [projectId:datasetId.tableId] ORDER BY position

Audit Log Mode = false
gg.handler.name.auditLogMode=false

When the handler is configured to run with audit log mode false, the data is pushed into Google BigQuery using a unique row identification key. Google BigQuery can then merge different operations for the same row. However, the behavior is complex. Google BigQuery maintains a finite deduplication period during which it merges operations for a given row. Therefore, the results can be somewhat non-deterministic.

The trail source needs to have a full image of the records in order to merge correctly.

Example 1

An insert operation is sent to BigQuery and, before the deduplication period expires, an update operation for the same row is sent to BigQuery. The result is a single row in BigQuery for the update operation.

Example 2

An insert operation is sent to BigQuery and, after the deduplication period expires, an update operation for the same row is sent to BigQuery. The result is that both the insert and the update operations show up in BigQuery.

This behavior has confounded many users; however, it is the documented behavior when using the BigQuery SDK, and is a feature as opposed to a defect. The documented length of the deduplication period is at least one minute. However, Oracle testing has shown that the period is significantly longer. Therefore, unless you can guarantee that all operations for a given row occur within a very short period, it is likely that there will be multiple entries for a given row in BigQuery. It is therefore just as important to configure metacolumns with the optype and position so that you can determine the latest state for a given row. To read more about audit log mode, see the Google BigQuery documentation: Streaming data into BigQuery.
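For example, the following is a sketch of how the optype and position metacolumns can be used to recover the latest state of each row. It uses BigQuery standard SQL; the key column id and the table path are hypothetical placeholders:

SELECT * EXCEPT(rn)
FROM (
  -- Rank the audit-log entries for each key, newest trail position first.
  SELECT t.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY position DESC) AS rn
  FROM `projectId.datasetId.tableId` t
)
WHERE rn = 1

Rows where the surviving entry has an optype of delete can then be filtered out to reflect deleted source rows.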

8.2.18.1.1.4 Operation Processing Support

The BigQuery Handler pushes operations to Google BigQuery using the synchronous API. Insert, update, and delete operations are processed differently in BigQuery than in a traditional RDBMS.

The following explains how insert, update, and delete operations are interpreted by the handler depending on the mode of operation:

auditLogMode = true
  • insert – Inserts the record with optype as an insert operation in the BigQuery table.

  • update – Inserts the record with optype as an update operation in the BigQuery table.

  • delete – Inserts the record with optype as a delete operation in the BigQuery table.

  • pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sends out a delete operation followed by an insert operation. Both these rows have the same position in the BigQuery table, which helps to identify this as a primary key operation and not as separate delete and insert operations.

auditLogMode = false
  • insert – If the row does not already exist in Google BigQuery, then an insert operation is processed as an insert. If the row already exists in Google BigQuery, then an insert operation is processed as an update. The handler sets the deleted column to false.

  • update – If the row does not exist in Google BigQuery, then an update operation is processed as an insert. If the row already exists in Google BigQuery, then an update operation is processed as an update. The handler sets the deleted column to false.

  • delete – If the row does not exist in Google BigQuery, then a delete operation is added. If the row exists in Google BigQuery, then the delete operation is processed as a delete. The handler sets the deleted column to true.

  • pkUpdate – When the pkUpdateHandling property is configured as delete-insert, the handler sets the deleted column to true for the row whose primary key is updated. This is followed by a separate insert operation with the new primary key and the deleted column set to false for this row.

Do not toggle the audit log mode, because doing so forces the BigQuery Handler to abend: Google BigQuery cannot alter the schema of an existing table. The existing table must be deleted before switching audit log modes.

Note:

The BigQuery Handler does not support the truncate operation. It abends when it encounters a truncate operation.

8.2.18.1.1.5 Proxy Settings

To connect to BigQuery using a proxy server, you must configure the proxy host and the proxy port in the properties file as follows:

jvm.bootoptions= -Dhttps.proxyHost=proxy_host_name -Dhttps.proxyPort=proxy_port_number

8.2.18.1.1.6 Mapping to Google Datasets

A dataset is contained within a specific Google Cloud project. Datasets are top-level containers that are used to organize and control access to your tables and views.

A table or view must belong to a dataset, so you need to create at least one dataset before loading data into BigQuery.

The BigQuery Handler can use existing datasets or create datasets if they are not found.

The BigQuery Handler maps the table's schema name to the dataset name. For three-part table names, the dataset is constructed by concatenating the catalog and schema.

8.2.18.1.2 Setting Up and Running the BigQuery Handler

The Google BigQuery Handler uses the Java BigQuery client libraries to connect to BigQuery.

These client libraries are located using the following Maven coordinates:
  • Group ID: com.google.cloud
  • Artifact ID: google-cloud-bigquery
  • Version: 2.7.1

The BigQuery Client libraries do not ship with Oracle GoldenGate for Big Data. Additionally, Google appears to have removed the link to download the BigQuery Client libraries. You can download the BigQuery Client libraries using Maven and the Maven coordinates listed above. However, this requires proficiency with Maven. The Google BigQuery client libraries can be downloaded using the Dependency downloading scripts. For more information, see Google BigQuery Dependencies.

For more information about Dependency Downloader, see Dependency Downloader.
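As a sketch, the following shows one way to run the downloader from the Oracle GoldenGate for Big Data installation directory. The script name comes from the Google BigQuery Dependencies section of this chapter; passing the library version as an argument is an assumption, so check the script usage in your installation:

# Hypothetical invocation of the dependency downloader script.
# The version argument (2.7.1) matches the documented client library version.
cd DependencyDownloader
./bigquery.sh 2.7.1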

8.2.18.1.2.1 Schema Mapping for BigQuery

The table schema name specified in the replicat map statement is mapped to the BigQuery dataset name. For example: map QASOURCE.*, target "dataset_US".*;

This map statement replicates tables to the BigQuery dataset "dataset_US". Oracle GoldenGate for Big Data normalizes schema and table names to uppercase. Lowercase and mixed case dataset and table names are supported, but need to be quoted in the Replicat mapping statement.

8.2.18.1.2.2 Understanding the BigQuery Handler Configuration

The following are the configurable values for the BigQuery Handler. These properties are located in the Java Adapter properties file (not in the Replicat properties file).

To enable the selection of the BigQuery Handler, you must first configure the handler type by specifying gg.handler.name.type=bigquery and the other BigQuery properties as follows:

Properties Required/ Optional Legal Values Default Explanation

gg.handlerlist

Required

Any string

None

Provides a name for the BigQuery Handler. The BigQuery Handler name then becomes part of the property names listed in this table.

gg.handler.name.type=bigquery

Required

bigquery

None

Selects the BigQuery Handler for streaming change data capture into Google BigQuery.

gg.handler.name.credentialsFile

Optional

Relative or absolute path to the credentials file

None

The credentials file downloaded from Google BigQuery for authentication. If you do not specify the path to the credentials file, you need to set it as an environment variable, see Configuring Handler Authentication.

gg.handler.name.projectId

Required

Any string

None

The name of the project in Google BigQuery. The handler needs project ID to connect to Google BigQuery store.

gg.handler.name.batchSize

Optional

Any number

500

The maximum number of operations to be batched together. This is applicable for all target table batches.

gg.handler.name.batchFlushFrequency

Optional

Any number

1000

The maximum amount of time in milliseconds to wait before executing the next batch of operations. This is applicable for all target table batches.

gg.handler.name.skipInvalidRows

Optional

true | false

false

Sets whether to insert all valid rows of a request, even if invalid rows exist. If not set, the entire insert request fails if it contains an invalid row.

gg.handler.name.ignoreUnknownValues

Optional

true | false

false

Sets whether to accept rows that contain values that do not match the schema. If not set, rows with unknown values are considered to be invalid.

gg.handler.name.connectionTimeout

Optional

Positive integer

20000

The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery.

gg.handler.name.readTimeout

Optional

Positive integer

30000

The maximum amount of time in milliseconds to wait for the handler to read data from an established connection.

gg.handler.name.metaColumnsTemplate

Optional

A legal string

None

A legal string specifying the metaColumns to be included. If you set auditLogMode to true, it is important that you set the metaColumnsTemplate property to view the operation type for the row inserted in the audit log, see Metacolumn Keywords.

gg.handler.name.auditLogMode

Optional

true | false

false

Set to true, the handler writes each record to the target without any primary key. Everything is processed as an insert.

Set to false, the handler tries to merge incoming records into the target table if they have the same primary key. Primary keys are required for this property. The trail source records need to contain full image updates to merge correctly.

gg.handler.name.pkUpdateHandling

Optional

abend | delete-insert

abend

Sets how the handler handles update operations that change a primary key. Primary key operations can be problematic for the BigQuery Handler and require special consideration:

  • abend - indicates the process abends.

  • delete-insert - indicates the process treats the operation as a delete and an insert. The full before image is required for this property to work correctly. Without full before and after row images, the insert data is incomplete. Oracle recommends this option.

gg.handler.name.adjustScale

Optional

true | false

false

The BigQuery numeric data type supports a maximum scale of 9 digits. If a field is mapped into a BigQuery numeric data type, then the mapping fails if the scale is larger than 9 digits. Set this property to true to round fields mapped to BigQuery numeric data types to a scale of 9 digits. Enabling this property results in a loss of precision for source data values with a scale larger than 9.

gg.handler.name.includeDeletedColumn

Optional

true | false

false

Set to true to include a boolean column in the output called deleted. The value of this column is set to false for insert and update operations, and is set to true for delete operations.

gg.handler.name.enableAlter

Optional

true | false

false

Set to true to enable altering the target BigQuery table. This allows the BigQuery Handler to add columns or metacolumns that are configured on the source but are not currently in the target BigQuery table.

gg.handler.name.clientId

Optional

String

None

Use to set the client ID if the configuration property gg.handler.name.credentialsFile, which resolves the Google BigQuery credentials, is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials.

gg.handler.name.clientEmail

Optional

String

None

Use to set the client email if the configuration property gg.handler.name.credentialsFile, which resolves the Google BigQuery credentials, is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials.

gg.handler.name.privateKey

Optional

String

None

Use to set the private key if the configuration property gg.handler.name.credentialsFile, which resolves the Google BigQuery credentials, is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials.

gg.handler.name.privateKeyId

Optional

String

None

Use to set the private key ID if the configuration property gg.handler.name.credentialsFile, which resolves the Google BigQuery credentials, is not set. You may wish to use this property instead of the credentials file in order to use Oracle Wallet to secure credentials.

gg.handler.name.url

Optional

A legal URL to connect to BigQuery including scheme, server name, and port (if not the default port).

https://www.googleapis.com

Allows the user to set a URL for a private endpoint to connect to BigQuery.

To connect to the Google Cloud service account, ensure that one of the following is configured: the credentials file property with the relative or absolute path to the credentials JSON file, or the properties for the individual credential keys. The configuration properties used to individually add the Google service account credential keys enable them to be encrypted using the Oracle wallet.
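For example, a minimal sketch of the credential-key alternative. The handler name bigquery is assumed, and the placeholder values come from the fields of the same names in the downloaded service account JSON key:

# Individual credential keys, used instead of gg.handler.bigquery.credentialsFile.
gg.handler.bigquery.clientId=<client_id value from the JSON key>
gg.handler.bigquery.clientEmail=<client_email value from the JSON key>
gg.handler.bigquery.privateKeyId=<private_key_id value from the JSON key>
gg.handler.bigquery.privateKey=<private_key value from the JSON key>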

8.2.18.1.2.3 Review a Sample Configuration

The following is a sample configuration for the BigQuery Handler:

gg.handlerlist = bigquery

#The handler properties
gg.handler.bigquery.type = bigquery
gg.handler.bigquery.projectId = festive-athlete-201315
gg.handler.bigquery.credentialsFile = credentials.json
gg.handler.bigquery.auditLogMode = true
gg.handler.bigquery.pkUpdateHandling = delete-insert

gg.handler.bigquery.metaColumnsTemplate =${optype}, ${position}

8.2.18.1.2.4 Configuring Handler Authentication

You must configure the BigQuery Handler authentication using the credentials in the JSON file downloaded from Google BigQuery.

Download the credentials file:

  1. Log in to your Google account at cloud.google.com.

  2. Click Console, and then go to the Dashboard where you can select your project.

  3. From the navigation menu, click APIs & Services then select Credentials.

  4. From the Create Credentials menu, choose Service account key.

  5. Choose the JSON key type to download the JSON credentials file for your system.

After you have the credentials file, you can authenticate the handler using one of the following methods:

  • Specify the path to the credentials file in the properties file with the gg.handler.name.credentialsFile configuration property.

    The path to the credentials file must not include a wildcard. If you include the * wildcard in the path to the credentials file, the file is not recognized.

    Or

  • Set the credentials file keys (clientId, clientEmail, privateKeyId, and privateKey) in the corresponding handler properties.

    Or

  • Set the GOOGLE_APPLICATION_CREDENTIALS environment variable on your system. For example:

    export GOOGLE_APPLICATION_CREDENTIALS=credentials.json

    Then restart the Oracle GoldenGate manager process.

8.2.18.1.3 Google BigQuery Dependencies

The Google BigQuery client libraries are required for integration with BigQuery.

The Maven coordinates are as follows:

Maven groupId: com.google.cloud

Maven artifactId: google-cloud-bigquery

Version: 2.7.1
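In the Maven dependency format used later in this chapter, these coordinates correspond to:

<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>2.7.1</version>
</dependency>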

8.2.18.1.3.1 BigQuery 2.7.1

The required BigQuery Client libraries for the 2.7.1 version are as follows:

api-common-2.1.3.jar
checker-compat-qual-2.5.5.jar
checker-qual-3.21.1.jar
commons-codec-1.15.jar
commons-logging-1.2.jar
error_prone_annotations-2.11.0.jar
failureaccess-1.0.1.jar
gax-2.11.0.jar
gax-httpjson-0.96.0.jar
google-api-client-1.33.1.jar
google-api-services-bigquery-v2-rev20211129-1.32.1.jar
google-auth-library-credentials-1.4.0.jar
google-auth-library-oauth2-http-1.4.0.jar
google-cloud-bigquery-2.7.1.jar
google-cloud-core-2.4.0.jar
google-cloud-core-http-2.4.0.jar
google-http-client-1.41.2.jar
google-http-client-apache-v2-1.41.2.jar
google-http-client-appengine-1.41.2.jar
google-http-client-gson-1.41.2.jar
google-http-client-jackson2-1.41.2.jar
google-oauth-client-1.33.0.jar
grpc-context-1.44.0.jar
gson-2.8.9.jar
guava-31.0.1-jre.jar
httpclient-4.5.13.jar
httpcore-4.4.15.jar
j2objc-annotations-1.3.jar
jackson-core-2.13.1.jar
javax.annotation-api-1.3.2.jar
jsr305-3.0.2.jar
listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
opencensus-api-0.31.0.jar
opencensus-contrib-http-util-0.31.0.jar
protobuf-java-3.19.3.jar
protobuf-java-util-3.19.3.jar
proto-google-common-protos-2.7.2.jar
proto-google-iam-v1-1.2.1.jar

8.2.18.2 Google BigQuery Stage and Merge

Topics:

8.2.18.2.1 Overview

BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time.

8.2.18.2.2 Detailed Functionality

The BigQuery Event handler uses the stage and merge data flow.

The change data is staged in a temporary location in microbatches and eventually merged into the target table. Google Cloud Storage (GCS) is used as the staging area for the change data.

This Event handler is used as a downstream Event handler connected to the output of the GCS Event handler.

The GCS Event handler loads files generated by the File Writer Handler into Google Cloud Storage.

The Event handler runs BigQuery Query jobs to execute MERGE SQL. The SQL operations are performed in batches, providing better throughput.
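As an illustration only, the stage and merge flow boils down to a statement of the following shape. The handler generates its own SQL; the project, dataset, table, key, and column names below are hypothetical, as is the use of an optype column to mark deletes:

MERGE `my-project.my_dataset.target_table` t
USING `my-project.my_dataset.stage_table` s
ON t.id = s.id
-- A staged delete removes the matching target row.
WHEN MATCHED AND s.optype = 'D' THEN
  DELETE
-- A staged insert or update overwrites the matching target row.
WHEN MATCHED THEN
  UPDATE SET col1 = s.col1
-- Rows not yet in the target are inserted, unless they are deletes.
WHEN NOT MATCHED AND s.optype != 'D' THEN
  INSERT (id, col1) VALUES (s.id, s.col1)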

Note:

The BigQuery Event handler doesn't use the Google BigQuery streaming API.

8.2.18.2.3 Prerequisites

  • Target table existence: Ensure that the target tables exist in the BigQuery dataset.
  • Google Cloud Storage (GCS) bucket and dataset location: Ensure that the GCS bucket and the BigQuery dataset exist in the same location/region.

8.2.18.2.4 Differences between BigQuery Handler and Stage and Merge BigQuery Event Handler

Table 8-23 BigQuery Handler vs. Stage and Merge BigQuery Event Handler

Feature/Limitation | BigQuery Handler | Stage and Merge BigQuery Event Handler
Compressed update support | Partially supported, with limitations. | Yes
Audit log mode | Processes all operations as INSERT. | No need to enable audit log mode.
GCP quotas/limits | Maximum rows per second per table: 100,000. See Google BigQuery Documentation. | Daily destination table update limit: 1,500 updates per table per day. See Google BigQuery Documentation.
Approximate pricing with 1 TB storage (for exact pricing, refer to the GCP pricing calculator) | Streaming inserts for 1 TB cost ~72.71 USD per month. | A query job for 1 TB costs ~20.28 USD per month.
Duplicate rows replicated to BigQuery | Yes | No
Replication of TRUNCATE operations | Not supported | Supported
API used | BigQuery Streaming API | BigQuery Query job

8.2.18.2.5 Authentication or Authorization

For more information about using the Google service account key, see Authentication and Authorization in the Google Cloud Storage (GCS) Event Handler topic. In addition to the permissions needed to access GCS, the service account also needs permissions to access BigQuery. You may choose to use a pre-defined IAM role, such as roles/bigquery.dataEditor or roles/bigquery.dataOwner. When creating a custom role, the following are the IAM permissions needed to run the BigQuery Event handler. For more information, see Configuring Handler Authentication.
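For example, a minimal sketch of granting one of the pre-defined roles to the service account using the gcloud CLI. The project ID and service account address are placeholders:

# Grant the BigQuery Data Editor role to the replication service account.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataEditor"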

8.2.18.2.5.1 BigQuery Permissions

Table 8-24 BigQuery Permissions

Permission Description
bigquery.connections.create Create new connections in a project.
bigquery.connections.delete Delete a connection.
bigquery.connections.get Gets connection metadata. Credentials are excluded.
bigquery.connections.list List connections in a project.
bigquery.connections.update Update a connection and its credentials.
bigquery.connections.use Use a connection configuration to connect to a remote data source.
bigquery.datasets.create Create new datasets.
bigquery.datasets.get Get metadata about a dataset.
bigquery.datasets.getIamPolicy Reserved for future use.
bigquery.datasets.update Update metadata for a dataset.
bigquery.datasets.updateTag Update tags for a dataset.
bigquery.jobs.create Run jobs (including queries) within the project.
bigquery.jobs.get Get data and metadata on any job.
bigquery.jobs.list List all jobs and retrieve metadata on any job submitted by any user. For jobs submitted by other users, details and metadata are redacted.
bigquery.jobs.listAll List all jobs and retrieve metadata on any job submitted by any user.
bigquery.jobs.update Cancel any job.
bigquery.readsessions.create Create a new read session via the BigQuery Storage API.
bigquery.readsessions.getData Read data from a read session via the BigQuery Storage API.
bigquery.readsessions.update Update a read session via the BigQuery Storage API.
bigquery.reservations.create Create a reservation in a project.
bigquery.reservations.delete Delete a reservation.
bigquery.reservations.get Retrieve details about a reservation.
bigquery.reservations.list List all reservations in a project.
bigquery.reservations.update Update a reservation’s properties.
bigquery.reservationAssignments.create Create a reservation assignment. This permission is required on the owner project and assignee resource. To move a reservation assignment, you need bigquery.reservationAssignments.create on the new owner project and assignee resource.
bigquery.reservationAssignments.delete Delete a reservation assignment. This permission is required on the owner project and assignee resource. To move a reservation assignment, you need bigquery.reservationAssignments.delete on the old owner project and assignee resource.
bigquery.reservationAssignments.list List all reservation assignments in a project.
bigquery.reservationAssignments.search Search for a reservation assignment for a given project, folder, or organization.
bigquery.routines.create Create new routines (functions and stored procedures).
bigquery.routines.delete Delete routines.
bigquery.routines.list List routines and metadata on routines.
bigquery.routines.update Update routine definitions and metadata.
bigquery.savedqueries.create Create saved queries.
bigquery.savedqueries.delete Delete saved queries.
bigquery.savedqueries.get Get metadata on saved queries.
bigquery.savedqueries.list List saved queries.
bigquery.savedqueries.update Update saved queries.
bigquery.tables.create Create new tables.
bigquery.tables.delete Delete tables.
bigquery.tables.export Export table data out of BigQuery.
bigquery.tables.get Get table metadata. To get table data, you need bigquery.tables.getData.
bigquery.tables.getData Get table data. This permission is required for querying table data. To get table metadata, you need bigquery.tables.get.
bigquery.tables.getIamPolicy Read a table’s IAM policy.
bigquery.tables.list List tables and metadata on tables.
bigquery.tables.setCategory Set policy tags in table schema.
bigquery.tables.setIamPolicy Changes a table’s IAM policy.
bigquery.tables.update Update table metadata. To update table data, you need bigquery.tables.updateData.
bigquery.tables.updateData Update table data. To update table metadata, you need bigquery.tables.update.
bigquery.tables.updateTag Update tags for a table.

In addition to these permissions, ensure that resourcemanager.projects.get and resourcemanager.projects.list are always granted as a pair.

8.2.18.2.6 Configuration

8.2.18.2.6.1 Automatic Configuration

Replication to BigQuery involves configuring multiple components, such as the File Writer handler, the Google Cloud Storage (GCS) Event handler, and the BigQuery Event handler.

The Automatic Configuration functionality helps to auto configure these components so that the user configuration is minimal.

The properties modified by auto configuration are also logged in the handler log file. To enable auto configuration to replicate to a BigQuery target, set the parameter gg.target=bq.

When replicating to a BigQuery target, you cannot customize the GCS Event handler name and the BigQuery Event handler name.

8.2.18.2.6.1.1 File Writer Handler Configuration

The File Writer handler name is preset to the value bq. The following is an example of editing a File Writer handler property: gg.handler.bq.pathMappingTemplate=./dirout.

8.2.18.2.6.1.2 GCS Event Handler Configuration

The GCS Event handler name is preset to the value gcs. The following is an example of editing a GCS Event handler property: gg.eventhandler.gcs.concurrency=5.

8.2.18.2.6.1.3 BigQuery Event Handler Configuration

The BigQuery Event handler name is preset to the value bq. There are no mandatory parameters required for the BigQuery Event handler; auto configuration derives most of the required parameters.

The following are the BigQuery Event handler configurations:

Properties Required/ Optional Legal Values Default Explanation
gg.eventhandler.bq.credentialsFile Optional Relative or absolute path to the service account key file. Value from property gg.eventhandler.gcs.credentialsFile Sets the path to the service account key file. Autoconfigure will automatically configure this property based on the configuration gg.eventhandler.gcs.credentialsFile, unless the user wants to use a different service account key file for BigQuery access. Alternatively, if the environment variable GOOGLE_APPLICATION_CREDENTIALS is set to the path to the service account key file, this parameter need not be set.
gg.eventhandler.bq.projectId Optional The Google project-id project-id associated with the service account. Sets the project-id of the Google Cloud project that houses BigQuery. Autoconfigure will automatically configure this property by accessing the service account key file unless user wants to override this explicitly.
gg.eventhandler.bq.kmsKey Optional Key names in the format: projects/<PROJECT>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>
  • <PROJECT>: Google project-id
  • <LOCATION>: Location of the BigQuery dataset.
  • <RING_NAME>: Google Cloud KMS key ring name.
  • <KEY_NAME>: Google Cloud KMS key name.
Value from property gg.eventhandler.gcs.kmsKey Set a customer managed Cloud KMS key to encrypt data in BigQuery. Autoconfigure will automatically configure this property based on the configuration gg.eventhandler.gcs.kmsKey.
gg.eventhandler.bq.connectionTimeout Optional Positive integer. 20000 The maximum amount of time, in milliseconds, to wait for the handler to establish a connection with Google BigQuery.
gg.eventhandler.bq.readTimeout Optional Positive integer. 30000 The maximum amount of time in milliseconds to wait for the handler to read data from an established connection.
gg.eventhandler.bq.totalTimeout Optional Positive integer. 120000 The total timeout parameter in seconds. The TotalTimeout parameter has the ultimate control over how long the logic should keep trying the remote call until it gives up completely.
gg.eventhandler.bq.retries Optional Positive integer. 3 The maximum number of retry attempts to perform.
gg.eventhandler.bq.createDataset Optional true | false true Set to true to automatically create the BigQuery dataset if it does not exist.
gg.eventhandler.bq.createTable Optional true | false true Set to true to automatically create the BigQuery target table if it does not exist.
gg.aggregate.operations.flush.interval Optional Integer 30000 The flush interval parameter determines how often the data is merged into BigQuery. The value is set in milliseconds.

Caution:

The higher this value, the more data is stored in the memory of the Replicat process.

Note:

Use the flush interval parameter with caution. Increasing its default value will increase the amount of data stored in the internal memory of the Replicat. This can cause out of memory errors and stop the Replicat if it runs out of memory.
gg.compressed.update Optional true or false true If set to true, then the source trail files are expected to contain compressed update operations. If set to false, then the source trail files are expected to contain uncompressed update operations.
gg.eventhandler.bq.connectionRetryIntervalSeconds Optional Integer Value 30 Specifies the delay in seconds between connection retry attempts.
gg.eventhandler.bq.connectionRetries Optional Integer Value 3 Specifies the number of times connections to the target data warehouse will be retried.
gg.eventhandler.bq.url Optional An absolute URL to connect to Google BigQuery. https://googleapis.com A legal URL to connect to Google BigQuery including scheme, server name, and port (if not the default port). The default is https://googleapis.com.

8.2.18.2.6.2 Classpath Configuration

The GCS Event handler and the BigQuery Event handler use the Java SDK provided by Google. Google does not provide a direct link to download the SDK.

You can download the SDKs using the following Maven coordinates:

Google Cloud Storage
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-storage</artifactId>
    <version>1.113.9</version>
</dependency>

To download the GCS dependencies, execute the following script <OGGDIR>/DependencyDownloader/gcs.sh.

BigQuery
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-bigquery</artifactId>
    <version>1.111.1</version>
</dependency>

To download the BigQuery dependencies, execute the following script: <OGGDIR>/DependencyDownloader/bigquery.sh. For more information, see bigquery.sh in Dependency Downloader Scripts.

Set the path to the GCS and BigQuery SDK in the gg.classpath configuration parameter. For example: gg.classpath=./gcs-deps/*:./bq-deps/*.

For more information, see Dependency Downloader Scripts.

8.2.18.2.6.3 Proxy Configuration

When the replicat process is run behind a proxy server, you can use the jvm.bootoptions property to set the proxy server configuration. For example: jvm.bootoptions=-Dhttps.proxyHost=some-proxy-address.com -Dhttps.proxyPort=80.

8.2.18.2.6.4 INSERTALLRECORDS Support

Stage and merge targets support the INSERTALLRECORDS parameter.

See INSERTALLRECORDS in Reference for Oracle GoldenGate. Set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm).

Setting this parameter directs the Replicat process to use bulk insert operations to load operation data into the target table.

To process initial load trail files, set the INSERTALLRECORDS parameter in the Replicat parameter file (.prm). You can tune the batch size of bulk inserts using the gg.handler.bq.maxFileSize File Writer property. The default value is set to 1GB.

The frequency of bulk inserts can be tuned using the File Writer gg.handler.bq.fileRollInterval property; the default value is set to 3m (three minutes).
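Putting the two tuning knobs together, the following is a minimal sketch of an initial load setup. The size and interval value syntax is an assumption based on common File Writer handler settings, and the values shown are the documented defaults, spelled out for illustration:

-- In the Replicat parameter file (.prm):
INSERTALLRECORDS

# In the Java Adapter properties file:
# 1g batch size and 3m roll interval are the documented defaults.
gg.handler.bq.maxFileSize=1g
gg.handler.bq.fileRollInterval=3m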

8.2.18.2.6.5 BigQuery Dataset and GCP ProjectId Mapping

The BigQuery Event handler maps the table schema name to the BigQuery dataset.

The table catalog name is mapped to the GCP projectId.

8.2.18.2.6.5.1 Three-Part Table Names

If the tables use distinct catalog names, then the BigQuery datasets would reside in multiple GCP projects. The GCP service account key should have the required privileges in the respective GCP projects. See BigQuery Permissions.

8.2.18.2.6.5.2 Mapping Table

Table 8-25 Mapping Table

MAP statement in the Replicat parameter file | BigQuery Dataset | GCP ProjectId
MAP SCHEMA1.*, TARGET "bq-project-1".*.*; | SCHEMA1 | bq-project-1
MAP "bq-project-2".SCHEMA2.*, TARGET *.*.*; | SCHEMA2 | bq-project-2
MAP SCHEMA3.*, TARGET *.*; | SCHEMA3 | The default projectId from the GCP service account key file or the configuration gg.eventhandler.bq.projectId.

8.2.18.2.6.6 End-to-End Configuration

The following is an end-to-end configuration example which uses auto configuration for the File Writer (FW) handler, GCS, and BigQuery Event handlers.

This sample properties file is located at: AdapterExamples/big-data/bigquery-via-gcs/bq.props.
 # Configuration to load GoldenGate trail operation records
 # into Google Big Query by chaining
 # File writer handler -> GCS Event handler -> BQ Event handler.
 # Note: Recommended to only edit the configuration marked as TODO
 # The property gg.eventhandler.gcs.credentialsFile need not be set if
 # the GOOGLE_APPLICATION_CREDENTIALS environment variable is set.

 gg.target=bq

 ## The GCS Event handler
 #TODO: Edit the GCS bucket name
 gg.eventhandler.gcs.bucketMappingTemplate=<gcs-bucket-name>
 #TODO: Edit the GCS credentialsFile
 gg.eventhandler.gcs.credentialsFile=/path/to/gcp/credentialsFile

## The BQ Event handler
## No mandatory configuration required.

#TODO: Edit to include the GCS Java SDK and BQ Java SDK.
gg.classpath=/path/to/gcs-deps/*:/path/to/bq-deps/*
#TODO: Edit to provide sufficient memory (at least 8GB).
jvm.bootoptions=-Xmx8g -Xms8g
#TODO: If running OGGBD behind a proxy server.
#jvm.bootoptions=-Xmx8g -Xms512m -Dhttps.proxyHost=<ip-address> -Dhttps.proxyPort=<port>

8.2.18.2.6.7 Compressed Update Handling

A compressed update record contains values for the key columns and the modified columns.

An uncompressed update record contains values for all the columns.

Oracle GoldenGate trails may contain compressed or uncompressed update records. The default extract configuration writes compressed updates to the trails.

The parameter gg.compressed.update can be set to true or false to indicate whether the source trail contains compressed or uncompressed update records.

8.2.18.2.6.7.1 MERGE Statement with Uncompressed Updates

In some use cases, if the trail contains uncompressed update records, then the MERGE SQL statement can be optimized for better performance by setting gg.compressed.update=false.
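For example, if the Extract is configured to write full images (uncompressed updates) to the trail, the optimization is enabled with a single property:

# Tell the handler the trail contains uncompressed (full image) updates.
gg.compressed.update=false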

8.2.18.2.7 Troubleshooting and Diagnostics

  • DDL not applied on the target table: Oracle GoldenGate for Big Data does not support DDL replication.
  • SQL Errors: If there are any errors while executing any SQL, then the entire SQL statement, along with the bind parameter values, is logged into the Oracle GoldenGate for Big Data handler log file.
  • Co-existence of the components: The location/region of the machine where the Replicat process is running and of the BigQuery dataset/GCS bucket impacts the overall throughput of the apply process.

    Data flow is as follows: GoldenGate -> GCS bucket -> BigQuery. For best throughput, ensure that the components are located as close as possible.

  • com.google.cloud.bigquery.BigQueryException: Access Denied: Project <any-gcp-project>: User does not have bigquery.datasets.create permission in project <any-gcp-project>: The service account key used by Oracle GoldenGate for Big Data does not have permission to create datasets in this project. Grant the permission bigquery.datasets.create and restart the Replicat process. The privileges are listed in BigQuery Permissions.