7.2 Realtime Data Ingestion into Snowflake with Oracle GoldenGate for Distributed Applications and Analytics

Overview

This Quickstart covers a step-by-step process showing how to ingest real-time data into Snowflake with Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA).

Snowflake is a cloud-based data warehousing platform that provides a fully managed service for storing, processing, and analysing data.

GG for DAA Snowflake handler uses the stage and merge data flow. It supports both external and internal staging options.

In stage and merge, the change data is staged in a temporary location in microbatches and eventually merged into to the target table. If external staging is selected, storage location is cloud storage service of the cloud provider (Azure Storage for Azure, S3 for AWS and Cloud Storage for GCP). If internal staging is selected, then external storage location will not be needed.

All replication process is automatically handled by Snowflake Stage and Merge Handler handler.

This quickstart covers the steps for internal staging. External staging requires a storage service and a Snowflake storage integration.

7.2.1 Prerequisites for Internal Staging

  • A Snowflake account.
  • Target schema and database in Snowflake
  • JDBC URL for Snowflake access.
  • Snowflake username/ password

In this Quickstart, a sample trail file (named tr), which is shipped with GG for DAA is used. The sample trail file is located at GG_HOME/opt/AdapterExamples/trail/ in your GG for DAA instance.

GG for DAA creates the target Snowflake tables automatically.

7.2.2 Install Dependency Files

GG for DAA uses Java SDK provided by Snowflake. You can download the SDKs using Dependency Downloader utility shipped with GG for DAA. Dependency downloader is a set of shell scripts that downloads dependency jar files from Maven and other repositories.

To install the required dependency files:

  1. In your GoldenGate for VM, go to dependency downloader utility. It is located at GG_HOME/opt/DependencyDownloader/ and locate snowflake.sh.
  2. Execute snowflake.sh with the required version. You can check the version and reported vulnerabilities in https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc. This document uses 3.15.0 which is the latest version when this quickstart is published.

    Figure 7-1 Snowflake Ingestion - Install dependency

    Snowflake Ingestion - Install dependency
  3. A new directory gets created in GG_HOME/opt/DependencyDownloader/dependencies named as <snowflake_version>. Make a note of this directory as it will be used in the replicat properties. For example: /u01/app/ogg/opt/DependencyDownloader/dependencies/snowflake_3.15.0

    Figure 7-2 Snowflake jar new directory

    Snowflake jar new directory

7.2.3 Create a Credential Store Entry

To create a credential store entry for securing Snowflake username and password:

  1. In Administration Service, click DB Connections and then click Add DB Connection.
  2. Under Database, click Add Credential.
  3. Provide the following details:
    • Credential Domain: Domain name of your choice
    • Credential Alias: Alias name of your choice
    • User ID: Snowflake Username
    • Password: Snowflake Password
    • Verify Password: Snowflake Password

    Figure 7-3 Create a credential entry

    Create a credential entry

7.2.4 Create a Replicat in Oracle GoldenGate for Distributed Applications and Analytics

To create a replicat in Oracle GoldenGate for Distributed Applications and Analytics (GG for DAA):

  1. In the Oracle GoldenGate for Distributed Applications and Analytics UI, in the Administration Service tab, click the + sign to add a replicat.

    Figure 7-4 Click + in the Administration Service tab.

    Click + in the Administration Service tab.
  2. Select the Classic Replicat Replicat Type and click Next There are two different Replicat types available: Classic and Coordinated. Classic Replicat is a single threaded process whereas Coordinated Replicat is a multithreaded one that applies transactions in parallel.

    Figure 7-5 Add a Replicat

    Add a Replicat
  3. Enter the basic information, and click Next:
    1. Trail Name: Name of the required trail file (if using sample trail, provide as tr)
    2. Subdirectory: Provide as GG_HOME/opt/AdapterExamples/trail/ if using the sample trail.
    3. Target: Snowflake Data Warehouse

    Figure 7-6 Add Replicat - Replicat Options

    Add Replicat - Replicat Options - Stage and Merge using external object storage is not selected as this quickstart uses internal staging.
  4. Leave Managed Options as is and click Next.
  5. Enter Parameter File details and click Next. In the Parameter File, you can specify source to target mapping. If you are using the sample trail file (tr), then enter as follows: MAP QASOURCE.*, TARGET <your_schema_name>.*;

    If Coordinated Replicat is selected as the Replicat Type, then an additional parameter needs to be provided:

    TARGETDB LIBFILE libggjava.so SET property=<ggbd-deployment_home>/etc/conf/ogg/your_replicat_name.properties

    Figure 7-7 Enter parameter details

    Enter parameter details.
  6. In the next screen, update the properties only tagged as TO DO and click Create.

    Edit your Snowflake JDBC URL:
    #TODO: Edit JDBC ConnectionUrl
    gg.eventhandler.snowflake.connectionURL=jdbc:snowflake://<account_name>.snowflakecomputing.com/?<connection_params>
    
    Edit Snowflake username and password:
    #TODO: Edit JDBC user name with alias and domain from step 2.
    gg.eventhandler.snowflake.UserName=ORACLEWALLETUSERNAME [alias domain] 
    #TODO: Edit JDBC password with alias and domain from step 2.
    gg.eventhandler.snowflake.Password=ORACLEWALLETPASSWORD [alias domain]
    
    Provide path to dependency jar files that you downloaded in prerequisites:
    #TODO: Edit to include the GCS Java SDK and BQ Java SDK.
    gg.classpath=/path/to/Snowflake-dependencies/*	
    

    Figure 7-8 Properties File

    Properties File
    See Understanding the BigQuery Handler Configuration for more replicat configuration details.
  7. If replicat starts successfully, it will be in running state. You can go to action/details/statistics to see the replication statistics.

    Figure 7-9 Replicat Statistics

    Replicat Statistics

    Figure 7-10 Replicat Table Statistics

    Replicat Table Statistics
  8. You can go to Snowflake console and check the tables. It may take a short while for tables to be created and loaded.

Note: