Replicate data from PostgreSQL to Snowflake

Learn to replicate data from PostgreSQL to Snowflake using OCI GoldenGate.

Before you begin

To successfully complete this quicktart, you must have the following:

Set up the environment

To set up the environment for this Quickstart:
  1. Install PostgreSQL.
    1. Install PostgreSQL server:
      sudo yum install postgresql-server
    2. Install postgresql-contrib module to avoid this SQL exception:
      sudo yum install postgresql-contrib
    3. Create a new PostgreSQL database cluster:
      sudo postgresql-setup --initdb
    4. Enable the postgresql.service:
      sudo systemctl enable postgresql.service
    5. Start the postgresql.service:
      sudo systemctl start postgresql.service
  2. By default, PostgreSQL only allows local connections. Allow remote connectivity to PostgreSQL.
    1. In /var/lib/pgsql/data/postgresql.conf, prepare the database for replication:
      1. Locate and uncomment listen_addresses = 'localhost' and change localhost to an asterisk (*):
        listen_addresses = '*'
      2. Set the following parameters as follows:
        • wal_level = logical
        • max_replication_slots = 1
        • max_wal_senders = 1
        • track_commit_timestamp = on

      Note:

      Configure /var/lib/pgsql/data/pg_hba.conf to ensure that client authentication is set to allow connections from an Oracle GoldenGate host. For example, add the following:
      #Allow connections from remote hosts
      host    all    all    0.0.0.0/0    md5
      See The pg_hba.conf File for more information.
    2. Restart PostgreSQL server:
      sudo systemctl restart postgresql.service
  3. If using Oracle Cloud Compute to host PostgreSQL, open port 5432:
    sudo firewall-cmd --permanent --add-port=5432/tcp
    sudo firewall-cmd --reload
    sudo firewall-cmd --list-all
  4. Open port 5432 in your VCN's security list.
  5. Connect to PostgreSQL.
    > sudo su - postgres
    > psql
  6. Set up PostgreSQL.
    1. Download and run seedSRCOCIGGLL_PostgreSQL.sql to set up the database and load the sample data.
    2. Run the following commands to set up the user:
      create user ggadmin with password 'W3lcome@1234';
      alter user ggadmin with SUPERUSER;
      GRANT ALL PRIVILEGES ON DATABASE ociggll TO ggadmin;
  7. Set up Snowflake:
    1. Create a GoldenGate user in Snowflake with appropriate privileges.
    2. Create target tables using sample schema.
    3. Ensure the tables and user have been succesfully created.

Task 1: Create the OCI GoldenGate Resources

This quickstart example requires deployments and connections for both the source and target.
  1. Create a deployment for the source PostgreSQL database.
  2. Create a Big Data deployment for the target Snowflake database.
  3. Create a PostgreSQL connection with the following values:
    1. For Type, select PostgreSQL Server from the dropdown.
    2. For Database name, enter ociggll.
    3. For Host, enter the public IP of the Compute instance that PostgreSQL runs on.
    4. For Port, enter 5432.
    5. For Username, enter ggadmin.
    6. For Password, enter W3lcome@1234.
    7. For Security Protocol, select Plain from the dropdown.
  4. Create a Snowflake connection with the following values:
    1. For Connection URL, enter jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=<warehouse name>&db=OCIGGLL.

      Note:

      Ensure you replace <account_identifier> and <warehouse name> with the appropriate values.
    2. For Authentication Type, select Basic authentication from the dropdown.
    3. For Username, enter a name.
    4. For Password, enter a password.
  5. (Optional) If your Big Data deployment doesn't have a public endpoint, then create a connection to GoldenGate, and then assign this connection to the source PostgreSQL deployment.
  6. Assign the source PostgreSQL connection to the PostgreSQL deployment.
  7. Assign the Snowflake connection to the target Big Data deployment.

Task 2: Create the Extracts

  1. Enable supplemental logging:
    1. Launch the PostgreSQL GoldenGate deployment console:
      1. From the Deployments page, select the PostgreSQL deployment to view its details.
      2. On the PostgreSQL deployment details page, click Launch console.
      3. On the deployment console sign in page, enter the GoldenGate admin credentials provided in Task 1, step 1.
    2. After signing in, open the navigation menu, and then click Configuration.
    3. Click Connect. Checkpoint table and TRANDATA fields appear if the connection is successful.
    4. Next to TRANDATA Information, click Add TRANDATA (plus icon).
    5. For Table Name, enter src_ociggll.*;, and then click Submit.

      Note:

      You only need to click Submit once. Use the search field to search for src_ociggll and verify the tables were added.
  2. Add the Change Data Capture Extract:
    1. From the navigation menu, click Overview.
    2. On the Administration Service page, click Add Extract (plus icon).
    3. On the Extract Type page, select Change Data Capture Extract, and then click Next.
    4. Complete the Extract Options as follows, and then click Next:
      1. For Process Name, enter a name for the Extract, such as ECDCPSQL.
      2. For Credential Domain, select Oracle GoldenGate.
      3. For Credential Alias, select the alias.
      4. For Begin, select Now.
      5. For Trail Name, enter a two-character trail name, such as P1.
    5. On the Extract Parameters page, add the following:
      TABLE SRC_OCIGGLL.*;
    6. Click Create and Run.
  3. Add the Initial Load Extract:
    1. On the Administration Service Overview page, click Add Extract (plus icon).
    2. On the Extract Type page, select Initial Load Extract, and then click Next.
    3. Complete the Extract Options as follows, and then click Next:
      1. For Process Name, enter a name for the Extract, such as EINIPSQL.
      2. For Credential Domain, select Oracle GoldenGate.
      3. For Credential Alias, select the alias.
      4. For Trail Name, enter a two-character trail name, such as I1.
    4. In the Extract Parameters text area, add the following:
      EXTRACT EINIPSQL
      USERIDALIAS PostgreSQL_Compute, DOMAIN OracleGoldenGate
      EXTFILE I1, PURGE
      TABLE src_ociggll.*;

      Note:

      Ensure that you remove the SOURCEDB parameter in front of USERIDALIAS before you move on.
    5. Click Create and Run.

      You return to the Administration Service Overview page, where you can observe the Extract starting.

Task 3: Create the Distribution Path for Initial Load Extract

To create a Distribution Path for Initial Load Extract, complete the following:
  1. In the Oracle Cloud console, on the Deployments page, select the target Big Data deployment.
  2. On the deployment details page, click Launch Console. Log in with the admin user details created in task 1, step 2.
  3. Create a user for the Distribution Path.
    1. Open the navigation menu, and then click Administrator.
    2. Click Add New User (plus icon), complete the fields as follows, and then click Submit:
      • For Username, enter ggsnet.
      • For Role, select Operator.
      • Enter the password twice for verification.
  4. In the source PostgreSQL deployment console, create a credential for the user created in the previous step.
    1. Open the navigaton menu, and then select Configuration.
    2. Click Add Credential (plus icon), complete the fields as follows, and then click Submit:
      • For Credential Domain, enter GGSNetwork.
      • For Credential Alias, enter dpuser.
      • For Database Name, you can enter any name and leave the Database Server and Port fields blank, or use the default values.
      • For User ID, enter ggsnet
      • For Password, enter the same password used in the previous step.
  5. In the source PostgreSQL deployment console, create a Distribution Path.
    1. Click Distribution Service, and then click Add Path (plus icon).
    2. Complete the following fields, and click Create and Run:
      1. For Path Name, enter a name for this path.
      2. For Source Trail, leave blank.
      3. For Trail Name, enter the Initial Load Extract trail name (I1).
      4. For Target Authentication Method, select UserID Alias.
      5. For Target, select wss.
      6. For Target Host, enter the target OCI GoldenGate deployment console URL, without the https:// or any trailing slashes.
      7. For Port Number, enter 443.
      8. For Trail Name, enter I1.
      9. For Domain, enter the domain name created in the previous step.
      10. For Alias, enter the alias created in the previous step.

      You return to the Distribution Service Overview page where you can review the path created.

  6. In the target Big Data deployment console, review the Receiver path created as a result of the Distribution path:
    1. Click Receiver Service.
    2. Review the Receiver path details.

Task 4: Add the Replicat for Initial Load

  1. In the target Big Data deployment console, add a checkpoint table.
    1. Open the navigation menu, and then select Configuration.
    2. Click the connect icon for the target Snowflake database.
    3. Click Add Checkpoint (plus icon).
    4. For Checkpoint Table, enter SRCMIRROR_OCIGGLL.CHECKTABLE.
    5. Click Submit.
  2. Add the Initial Load Replicat.
    1. In the navigation menu, click Overview, and then click Add Replicat (plus icon).
    2. On the Add Replicat page, under Replicat type, select either Classic or Coordinated, and then click Next.
    3. On the Replicat Options page, complete the following form fields, and then click Next:
      1. For Process Name, enter a name, such as RIL.
      2. For Credential Domain, select the domain for the Snowflake connection.
      3. For Credential Alias, select the alias of the Snowflake connection.
      4. For Trail Name, enter the name of the Trail from Task 2 (I1).
      5. For Target, select the target Snowflake connection from the dropdown.
      6. For Available aliases, select an alias from the dropdown, such as Snowflake.
      7. (Optional) Enable external storage to select an available staging location from the dropdown.
    4. On the Replicat Parameters page, add the following mapping, and then click Next:
      INSERTALLRECORDS
      MAP src_ociggll.src_city, TARGET SRCMIRROR_OCIGGLL.SRC_CITY;
      MAP src_ociggll.src_region, TARGET SRCMIRROR_OCIGGLL.SRC_REGION;
      MAP src_ociggll.src_customer, TARGET SRCMIRROR_OCIGGLL.SRC_CUSTOMER;
      MAP src_ociggll.src_orders, TARGET SRCMIRROR_OCIGGLL.SRC_ORDERS;
      MAP src_ociggll.src_order_lines, TARGET SRCMIRROR_OCIGGLL.SRC_ORDER_LINES;
      MAP src_ociggll.src_product, TARGET SRCMIRROR_OCIGGLL.SRC_PRODUCT;
    5. On the Properties page, review the properties, and then click Create and Run.

      You return to the Overview page, where you can review the Replicat details.

  3. To verify the Initial Load, connect to Snowflake database and run following queries:
    select * from SRCMIRROR_OCIGGLL.SRC_CITY;
    select * from SRCMIRROR_OCIGGLL.SRC_CUSTOMER;

    The output should return the data that was loaded into the target database tables as a result of the Initial Load.

Task 5: Create the Distribution Path for Change Data Capture

To create a Distribution Path for Change Data Capture, complete the following:
  1. In the source PostgreSQL deployment console, click Distribution Service.
  2. On the Paths page, click Add Path.
  3. On the Add Path page, complete the following fields, and then click Create and Run:
    1. For Path Name, enter a name.
    2. For Source Extract, select the Change Data Capture Extract (ECDCPSQL).
    3. For Trail Name, select the Change Data Capture Extract trail file (P1).
    4. For Target Authentication Method, select UserID Alias.
    5. For Target, select wss.
    6. For Target Host, enter the target OCI GoldenGate deployment console URL, without the https:// or any trailing slashes.
    7. For Port Number, enter 443.
    8. For Trail Name, enter P1.
    9. For Domain, enter the domain name created in task 3.
    10. For Alias, enter the alias created in task 3.
  4. In the target Big Data deployment console, click Receiver Service, and then review the Receiver path created.

Task 6: Add a Replicat for Change Data Capture

Perform updates to the source PostgreSQL database to verify replication to Snowflake.
  1. Add the Change Data Replicat.
    1. In the target Big Data deployment console, click Administrator Service, and then click Add Replicat (plus icon).
    2. On the Add Replicat page, under Replicat type, select either Classic or Coordinated, and then click Next.
    3. On the Replicat Options page, complete the following form fields, and then click Next:
      1. For Process Name, enter a name, such as RCDC.
      2. For Credential Domain, select the domain for the Snowflake connection.
      3. For Credential Alias, select the alias of the Snowflake connection.
      4. For Trail Name, enter the name of the Trail from Task 2 (P1).
      5. For Target, select the target Snowflake connection from the dropdown.
      6. For Available aliases, select an alias from the dropdown, such as Snowflake.
      7. (Optional) Enable external storage to select an available staging location from the dropdown.
    4. On the Replicat Parameters page, add the following mapping, and then click Next:
      MAP src_ociggll.src_city, TARGET SRCMIRROR_OCIGGLL.SRC_CITY;
      MAP src_ociggll.src_region, TARGET SRCMIRROR_OCIGGLL.SRC_REGION;
      MAP src_ociggll.src_customer, TARGET SRCMIRROR_OCIGGLL.SRC_CUSTOMER;
      MAP src_ociggll.src_orders, TARGET SRCMIRROR_OCIGGLL.SRC_ORDERS;
      MAP src_ociggll.src_order_lines, TARGET SRCMIRROR_OCIGGLL.SRC_ORDER_LINES;
      MAP src_ociggll.src_product, TARGET SRCMIRROR_OCIGGLL.SRC_PRODUCT;
    5. On the Properties page, review the properties, and then click Create and Run.

      You return to the Overview page, where you can review the Replicat details.

  2. Verify the Change Data Capture:
    1. Perform updates to the source PostgreSQL database to verify replication to Snowflake. Run the following script to perform inserts into the PostgreSQL database:
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1000,'Houston',20,743113);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1001,'Dallas',20,822416);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1002,'San Francisco',21,157574);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1003,'Los Angeles',21,743878);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1004,'San Diego',21,840689);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1005,'Chicago',23,616472);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1006,'Memphis',23,580075);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1007,'New York City',22,124434);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1008,'Boston',22,275581);
      Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1009,'Washington D.C.',22,688002);
    2. In the source PostgreSQL deployment console, select the Change Data Capture Extract name (RCDC), and then click Statistics. Verify that src_ociggll.src_city has 10 inserts.

      Note:

      If the Extract captured no inserts, then restart the ECDCPSQL Extract.
    3. In the target Big Data deployment console, select the Change Data Capture Replicat name (RCDC), review its Details and Statistics to verify the number of Inserts.

Task 7: Monitor and maintain processes

  1. Monitor performance.
  2. Manage Trail files.