Replicate data from PostgreSQL to Google BigQuery

Learn how to use OCI GoldenGate to replicate data from PostgreSQL to Google BigQuery.

Before you begin

To successfully complete this quickstart, you must have the following:

To successfully complete this quickstart, you must have the following:

Environment set up: PostgreSQL

To set up the environment for this Quickstart:

  1. Run the following commands to install PostgreSQL.

    1. Install PostgreSQL server:

      sudo yum install postgresql-server
    2. Install postgresql-contrib module to avoid this SQL exception:

      sudo yum install postgresql-contrib
    3. Create a new PostgreSQL database cluster:

      sudo postgresql-setup --initdb
    4. Enable the postgresql.service:

      sudo systemctl enable postgresql.service
    5. Start the postgresql.service:

      sudo systemctl start postgresql.service
  2. By default, PostgreSQL only allows local connections. Allow remote connectivity to PostgreSQL.

    1. In /var/lib/pgsql/data/postgresql.conf, prepare the database for replication.

    2. Locate and uncomment listen_addresses = 'localhost' and change localhost to an asterisk (`):

      listen_addresses = '*'
    3. Set the following parameters as follows:

      • wal_level = logical

      • max_replication_slots = 1

      • max_wal_senders = 1

      • track_commit_timestamp = on

      Note:

      Configure /var/lib/pgsql/data/pg_hba.conf to ensure that client authentication is set to allow connections from an Oracle GoldenGate host. For example, add the following:

      #Allow connections from remote hosts
      
      host all all 0.0.0.0/0 md5

      See The pg_hba.conf File for more information.

      1. Restart PostgreSQL server:
      sudo systemctl restart postgresql.service
      1. If using Oracle Cloud Compute to host PostgreSQL, open port 5432:
      sudo firewall-cmd --permanent --add-port=5432/tcp
      sudo firewall-cmd --reload
      sudo firewall-cmd --list-all
    4. Open port 5432 in your VCN's security list.

    5. Connect to PostgreSQL.

      > sudo su - postgres
      > psql

      Note: Alternatively, you can enter sudo su - postgres psql if the above example doesn’t work.

    6. Set up PostgreSQL.

      1. Download and run seedSRCOCIGGLL_PostgreSQL.sql to set up the database and load the sample data.

      2. Run the following commands to set up the user (ensure you replace <password> with an actual password):

      create user ggadmin with password '<password>';
      alter user ggadmin with SUPERUSER;
      GRANT ALL PRIVILEGES ON DATABASE ociggll TO ggadmin;

Task 1: Create the OCI GoldenGate resources

  1. Create a deployment for the source PostgreSQL database.

  2. Create a Big Data deployment for the target Google BigQuery.

  3. Create a connection to the to the target Google BigQuery.

  4. Create a connection to the source PostgreSQL database.

    1. For Type, ensure that you select PostgreSQL Server.

    2. For Database name, enter ociggll.

    3. For Host, enter the public IP of the Compute instance that PostgreSQL runs on.

    4. For Port, enter 5432.

    5. For Username, enter ggadmin.

    6. For Password, enter a password.

    7. For Security Protocol, select Plain.

  5. Create a connection to GoldenGate, and then assign this connection to the source PostgreSQL deployment.

  6. Assign the source connection to the source PostgreSQL deployment..

  7. Assign the target connection to the target Big Data deployment.

Task 2: Enable supplemental logging

To enable supplemental logging:

  1. Launch the PostgreSQL GoldenGate deployment console:

    1. From the Deployments page, select the PostgreSQL deployment to view its details.

    2. On the PostgreSQL deployment details page, select Launch console.

    3. On the deployment console sign in page, enter the GoldenGate admin credentials provided in Task 1, step 1.

      Note: Sign in is required if IAM wasn’t selected as the credential store when creating a deployment.

  2. In the deployment console, select DB Connections in the left navigation, then the source PostgreSQL database, and then Trandata.

  3. On the Trandata page, next to TRANDATA Information, select Add Trandata (plus icon).

  4. In the Trandata panel, for Schema Name, enter src_ociggll, and then select Submit.

    Note: Use the search field to search for src_ociggll and verify the tables were added.

Task 3: Create the Extract

To add the Change Data Capture Extract:

  1. In the left navigation, select Extracts.

  2. On the Extracts page, select Add Extract (plus icon), and then complete the fields as follows:

    • On the Extract Information page:

      1. For Extract type, select Change Data Capture Extract.

      2. For Process Name, enter a name for the Extract, such as ECDC.

      3. Select Next.

    • On the Extract Options page:

      1. For Source credentials, select Oracle GoldenGate from the Domain dropdown

      2. Select the source PostgreSQL database from the Alias dropdown.

      3. For Extract Trail Name, enter a two-character trail name, such as C1.

      4. Select Next.

    • On the Extract Parameters page, replace MAP *.*, TARGET *.*; with the following:

      TABLE SRC_OCIGGLL.*;
  3. Select Create and Run.

You're returned to the Extracts page, where you can observe the Extracts starting.

Task 4: Create the Distribution Path for Change Data Capture

To create a Distribution Path for Change Data Capture, complete the following:

  1. In the Oracle Cloud console, on the Deployments page, select the target Big Data deployment.

  2. Select Add Distribution Path.

  3. Complete the Add Path form as follows:

    1. On the Path Information page:

      1. For Path Name, enter a name.

      2. Select Next.

    2. On the Source Options page:

      1. For Source Extract, select the Change Data Capture Extract (ECDC).

      2. For Trail Name, select the Change Data Capture trail file (C1).

      3. Select Next.

    3. On the Target Options page:

      1. For Target, select wss.

      2. For Target Host, enter the target deployment console URL (you can find this on the deployment details page, without the https:// or any trailing slashes).

      3. For Port Number, enter 443.

      4. For Trail Name, enter C1.

      5. For Target Authentication Method, select UserID Alias.

      6. For Domain, enter the domain name.

      7. For Alias, enter the alias.

    4. Select Create Path and Run.

Task 5: Add a Replicat

  1. In the target Big Data deployment console navigation menu, select Replicats, and then select Add Replicat (plus icon).

  2. On the Replicats page, select Add Replicat (plus icon), and then complete the Add Replicat form as follows:

    1. On the Replicat Information page:

      1. For Replicat type, select Parallel or Coordinated Replicat.

      2. For Process Name, enter a name, such as RCDC.

      3. Select Next.

    2. On the Replicat Options page:

      1. For Replicat Trail Name, enter the name of the Trail from Task 3 (C1).

      2. For Target Credentials, select the Domain and Alias for the Google Big Query connection.

      3. For Available staging locations, select Google Cloud Storage from the dropdown.

      4. For via staging alias, select Google Cloud Storage connection from the dropdown.

    3. On the Parameter File page, add the following mapping, and then select Next:

      MAP *.*, TARGET *.*;
    4. On the Properties File page, configure the required properties as needed. Look for the ones marked as #TODO, and then select Next.

      Some properties to consider modifying include:

      • gg.eventhandler.gcs.bucketMappingTemplate: provide the name of the bucket that will be used as staging storage
  3. Select Create and Run.

You return to the Replicats page, where you can review the Replicat details.

Task 6: Verify Change Data Capture

Perform updates to the source PostgreSQL database to verify replication to Google BigQuery.

  1. Run the following script to perform inserts into the PostgreSQL database:

    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1000,'Houston',20,743113);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1001,'Dallas',20,822416);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1002,'San Francisco',21,157574);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1003,'Los Angeles',21,743878);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1004,'San Diego',21,840689);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1005,'Chicago',23,616472);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1006,'Memphis',23,580075);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1007,'New York City',22,124434);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1008,'Boston',22,275581);
    Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1009,'Washington D.C.',22,688002);
  2. In the source PostgreSQL deployment console, select the Extract name (ECDC), and then select Statistics. Verify that src_ociggll.src_city has 10 inserts.

    Note: If the Extract captured no inserts, then restart the ECDC Extract.

  3. In the target Big Data deployment console, select the Replicat name, view its Details, and check Statistics to verify the number of inserts.