Replicate data from PostgreSQL to Snowflake
Learn to replicate data from PostgreSQL to Snowflake using OCI GoldenGate.
Before you begin
To successfully complete this quickstart, you must have the following:
-
A PostgreSQL installation to serve as the source database (See Environment setup)
-
Open port 5432 in your VCN's security list
-
A Snowflake database to serve as the target database.
Environment set up: PostgreSQL
To set up the environment for this Quickstart:
-
Run the following commands to install PostgreSQL.
-
Install PostgreSQL server:
sudo yum install postgresql-server -
Install postgresql-contrib module to avoid this SQL exception:
sudo yum install postgresql-contrib -
Create a new PostgreSQL database cluster:
sudo postgresql-setup --initdb -
Enable the postgresql.service:
sudo systemctl enable postgresql.service -
Start the postgresql.service:
sudo systemctl start postgresql.service
-
-
By default, PostgreSQL only allows local connections. Allow remote connectivity to PostgreSQL.
-
In
/var/lib/pgsql/data/postgresql.conf, prepare the database for replication. -
Locate and uncomment
listen_addresses = 'localhost'and change localhost to an asterisk (`):listen_addresses = '*' -
Set the following parameters as follows:
-
wal_level = logical -
max_replication_slots = 1 -
max_wal_senders = 1 -
track_commit_timestamp = on
Note:
Configure
/var/lib/pgsql/data/pg_hba.confto ensure that client authentication is set to allow connections from an Oracle GoldenGate host. For example, add the following:#Allow connections from remote hosts host all all 0.0.0.0/0 md5See The pg_hba.conf File for more information.
- Restart PostgreSQL server:
sudo systemctl restart postgresql.service- If using Oracle Cloud Compute to host PostgreSQL, open port 5432:
sudo firewall-cmd --permanent --add-port=5432/tcp sudo firewall-cmd --reload sudo firewall-cmd --list-all -
-
Open port 5432 in your VCN's security list.
-
> sudo su - postgres > psqlNote: Alternatively, you can enter
sudo su - postgres psqlif the above example doesn’t work. -
Set up PostgreSQL.
-
Download and run seedSRCOCIGGLL_PostgreSQL.sql to set up the database and load the sample data.
-
Run the following commands to set up the user (ensure you replace
<password>with an actual password):
create user ggadmin with password '<password>'; alter user ggadmin with SUPERUSER; GRANT ALL PRIVILEGES ON DATABASE ociggll TO ggadmin; -
-
Environment set up: Snowflake
-
Create a GoldenGate user in Snowflake with appropriate privileges.
-
Create target tables using sample schema.
Ensure the tables and user have been successfully created.
Task 1: Create the OCI GoldenGate resources
This quickstart example requires deployments and connections for both the source and target.
-
Create a deployment for the source PostgreSQL database.
-
Create a Big Data deployment for the target Snowflake database.
-
Create a PostgreSQL connection with the following values:
-
For Type, select PostgreSQL Server from the dropdown.
-
For Database name, enter
ociggll. -
For Host, enter the public IP of the Compute instance that PostgreSQL runs on.
-
For Port, enter
5432. -
For Username, enter
ggadmin. -
For Password, enter your password.
-
For Security Protocol, select Plain from the dropdown.
-
-
Create a Snowflake connection with the following values:
-
For Connection URL, enter
jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=<warehouse name>&db=OCIGGLL.Note: Ensure you replace
<account_identifier>and<warehouse name>with the appropriate values. -
For Authentication Type, select Basic authentication from the dropdown.
-
For Username, enter a name.
-
For Password, enter a password.
-
-
(Optional) If your Big Data deployment doesn't have a public endpoint, then create a connection to GoldenGate, and then assign this connection to the source PostgreSQL deployment.
-
Assign the source PostgreSQL connection to the PostgreSQL deployment.
-
Assign the Snowflake connection to the target Big Data deployment.
Task 2: Enable supplemental logging
To enable supplemental logging:
-
Launch the PostgreSQL GoldenGate deployment console:
-
From the Deployments page, select the PostgreSQL deployment to view its details.
-
On the PostgreSQL deployment details page, select Launch console.
-
On the deployment console sign in page, enter the GoldenGate admin credentials provided in Task 1, step 1.
Note: Sign in is required if IAM wasn’t selected as the credential store when creating a deployment.
-
-
In the deployment console, select DB Connections in the left navigation, then the source PostgreSQL database, and then Trandata.
-
On the Trandata page, next to TRANDATA Information, select Add Trandata (plus icon).
-
In the Trandata panel, for Schema Name, enter
src_ociggll, and then select Submit.Note: Use the search field to search for
src_ocigglland verify the tables were added.
Task 3: Create the Extracts
-
Add the Change Data Capture Extract:
-
In the left navigation, select Extracts,
-
On the Extracts page, select Add Extract (plus icon), and then complete the fields as follows:
-
On the Extract Information page:
-
For Extract type, select Change Data Capture Extract.
-
For Process Name, enter a name for the Extract, such as
ECDC. -
Select Next.
-
-
On the Extract Options page:
-
For Source credentials, select Oracle GoldenGate from the Domain dropdown
-
Select the source PostgreSQL database from the Alias dropdown.
-
For Extract Trail Name, enter a two-character trail name, such as
C1. -
Select Next.
-
-
On the Extract Parameters page, replace
MAP *.*, TARGET *.*;with the following:TABLE SRC_OCIGGLL.*;
-
-
Select Create and Run.
-
-
Add the Initial Load Extract:
-
On the Extracts page, select Add Extract, and then complete the Add Extract form as follows:
-
On the Extract Information page:
-
For Extract type, select Initial Load Extract.
-
For Process Name, enter a name, such as
EIL. -
Select Next.
-
-
On the Extract Options page:
-
For Source credentials, select Oracle GoldenGate from the Domain dropdown.
-
Select the PostgreSQL database from the Alias, dropdown.
-
For Extract Trail Name, enter a two-character trail name, such as
I1. -
Select Next.
-
-
On the Extract Parameters page, replace the contents of the textarea with the following:
EXTRACT EIL USERIDALIAS PostgreSQL_Compute, DOMAIN OracleGoldenGate EXTFILE I1, PURGE TABLE src_ociggll.*;Note: Ensure that you remove the
SOURCEDBparameter in front ofUSERIDALIASbefore you move on.
-
-
Select Create and Run.
-
You return to the Extracts page, where you can observe the Extract starting.
Task 4: Create the Distribution Path for Initial Load Extract
To create a Distribution Path for Initial Load Extract:
-
In the Oracle Cloud console, on the Deployments page, select the target Big Data deployment.
-
On the deployment details page, select Launch Console, and then log in as the admin user.
-
If using IAM credential store, proceed to the Create a Distribution Path step. If using GoldenGate credential store, create a user with which the source GoldenGate uses to connect to the target GoldenGate.
-
In the navigation menu, select User Administration.
-
Select Add New User (plus icon), complete the fields as follows, and then select Submit:
-
For Username, enter
ggsnet. -
For Role, select Operator.
-
Enter the password twice for verification.
-
-
-
In the source PostgreSQL deployment console, create a Path Connection for the user created in the previous step.
-
In the navigation menu, select Path Connections.
-
Select Add Path Connection (plus icon), complete the fields as follows, and then select Submit:
-
For Credential Alias, enter
dpuser. -
For User ID, enter
ggsnet -
For Password, enter the same password used in the previous step.
-
-
-
Create a Distribution Path.
-
In the service menu bar, select Paths, and then select Add Distribution Path (plus icon).
-
Complete the Add Path form as follows:
-
On the Path Information page:
-
For Path Name, enter a name for this path.
-
Select Next.
-
-
On the Source Options page:
-
For Source Extract, leave blank.
-
For Trail Name, enter the Initial Load Extract trail name (
I1). -
Select Next.
-
-
On the Target Options page:
-
For Target, select wss.
-
For Target Host, enter the target deployment URL, without the https:// or any trailing slashes.
-
For Port Number, enter
443. -
For Trail Name, enter
I1. -
For Target Authentication Method, select UserID Alias.
-
For Domain, enter the domain name created in the previous step.
-
For Alias, enter the alias created in the previous step (
dpuser). -
Select Next.
-
-
-
Select Create and Run.
You return to the Distribution Service page where you can review the path created.
-
-
In the target deployment console, review the Receiver path created as a result of the Distribution path:
-
Select Paths.
-
Review the Path details.
-
Task 5: Add the Replicat for Initial Load
-
In the target Big Data deployment console, add the Initial Load Replicat.
-
In the navigation menu, select Replicats, and then select Add Replicat (plus icon).
-
On the Replicats page, then complete the Add Replicat fields as follows:
-
On the Replication Information page:
-
For Replicat type, select Classic or Coordinated.
-
For Process Name, enter a name, such as
RIL. -
Select Next.
-
-
On the Replicat Options page:
-
For Replicat Trail Name, enter the name of the Trail from Task 2 (
I1). -
For Target Credentials, select the Domain and Alias for the Snowflake connection.
-
For Available aliases, select an alias from the dropdown, such as Snowflake.
-
(Optional) Enable external storage to select an available staging location from the dropdown.
-
Select Next.
-
-
On the Parameter File page, add the following mapping:
INSERTALLRECORDS MAP src_ociggll.src_city, TARGET SRCMIRROR_OCIGGLL.SRC_CITY; MAP src_ociggll.src_region, TARGET SRCMIRROR_OCIGGLL.SRC_REGION; MAP src_ociggll.src_customer, TARGET SRCMIRROR_OCIGGLL.SRC_CUSTOMER; MAP src_ociggll.src_orders, TARGET SRCMIRROR_OCIGGLL.SRC_ORDERS; MAP src_ociggll.src_order_lines, TARGET SRCMIRROR_OCIGGLL.SRC_ORDER_LINES; MAP src_ociggll.src_product, TARGET SRCMIRROR_OCIGGLL.SRC_PRODUCT; -
On the Properties page, review the properties, and then select Create and Run.
You return to the Replicats page, where you can review the Replicat details.
-
-
-
To verify the Initial Load, connect to Snowflake database and run following queries:
select * from SRCMIRROR_OCIGGLL.SRC_CITY; select * from SRCMIRROR_OCIGGLL.SRC_CUSTOMER;
The output should return the data that was loaded into the target database tables as a result of the Initial Load.
Task 6: Create the Distribution Path for Change Data Capture
-
In the source PostgreSQL deployment console, select Distribution Service.
-
Select Add Distribution Path.
-
Complete the Add Path form as follows:
-
On the Path Information page:
-
For Path Name, enter a name.
-
Select Next.
-
-
On the Source Options page:
-
For Source Extract, select the Change Data Capture Extract (
ECDC). -
For Trail Name, select the Change Data Capture trail file (
C1). -
Select Next.
-
-
On the Target Options page:
-
For Target, select wss.
-
For Target Host, enter the target deployment console URL (you can find this on the deployment details page, without the https:// or any trailing slashes).
-
For Port Number, enter
443. -
For Trail Name, enter
C1. -
For Target Authentication Method, select UserID Alias.
-
For Domain, enter the domain name.
-
For Alias, enter the alias.
-
-
Select Create Path and Run.
-
-
In the target Big Data deployment console, select Receiver Service, and then review the Receiver path created.
Task 7: Add a Replicat for Change Data Capture
Perform updates to the source PostgreSQL database to verify replication to Snowflake.
-
Add the Replicat.
-
In the target Big Data deployment console, select Administration Service, and then in the navigation menu, select Replicats.
-
On the Replicats page, select Add Replicat (plus icon), and then complete the Add Replicat form as follows:
-
On the Replicat Information page:
-
For Replicat type, select Classic or Coordinated.
-
For Process Name, enter a name, such as
RCDC. -
Select Next.
-
-
On the Replicat Options page:
-
For Replicat Trail Name, enter the name of the Trail from Task 3 (
C1). -
For Target Credentials, select the Domain and Alias for the Snowflake connection.
-
(Optional) Enable external storage to select an available staging location from the dropdown.
-
-
On the Parameter Files page, add the following mapping, and then select Next:
MAP src_ociggll.src_city, TARGET SRCMIRROR_OCIGGLL.SRC_CITY; MAP src_ociggll.src_region, TARGET SRCMIRROR_OCIGGLL.SRC_REGION; MAP src_ociggll.src_customer, TARGET SRCMIRROR_OCIGGLL.SRC_CUSTOMER; MAP src_ociggll.src_orders, TARGET SRCMIRROR_OCIGGLL.SRC_ORDERS; MAP src_ociggll.src_order_lines, TARGET SRCMIRROR_OCIGGLL.SRC_ORDER_LINES; MAP src_ociggll.src_product, TARGET SRCMIRROR_OCIGGLL.SRC_PRODUCT; -
On the Properties page, review the properties, and then select Create and Run.
You return to the Replicats page, where you can review the Replicat details.
-
-
-
Verify Change Data Capture:
-
Perform updates to the source PostgreSQL database to verify replication to Snowflake. Run the following script to perform inserts into the PostgreSQL database:
Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1000,'Houston',20,743113); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1001,'Dallas',20,822416); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1002,'San Francisco',21,157574); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1003,'Los Angeles',21,743878); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1004,'San Diego',21,840689); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1005,'Chicago',23,616472); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1006,'Memphis',23,580075); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1007,'New York City',22,124434); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1008,'Boston',22,275581); Insert into src_ociggll.src_city (CITY_ID,CITY,REGION_ID,POPULATION) values (1009,'Washington D.C.',22,688002);
-
-
In the source PostgreSQL deployment console, select the
RCDC, and then select Statistics. Verify thatsrc_ociggll.src_cityhas 10 inserts.Note: If the Extract captured no inserts, then restart the
ECDCExtract. -
In the target Big Data deployment console, select the
RCDC, review its Details and Statistics to verify the number of Inserts.