Replicate data to Stream analytics

Learn to replicate data from OCI GoldenGate into Stream Analytics.

GoldenGate Stream Analytics began as a Complex Event Processing engine that evolved to run atop of runtime frameworks with Apache Spark and Apache Kafka. Stream Analytics can ingest a stream of data from any source such as a database, GoldenGate, Kafka, JMS, REST, or even a file system file. After the data is ingested, you can run analytics on live data.

Before you begin

To successfully complete this quickstart you need:

A source Autonomous AI Database with sample data loaded, and supplemental logging enabled.
Tip:

You can download the OCI GoldenGate sample data, if you need sample data to work with.
- First, edit SETUP_USERS_ATP.sql and modify the SRC_OCIGGLL user's password to remove the special characters.
- Use the Autonomous AI Database's Database actions SQL tool to run the two scripts to create the user schema and tables.
- Use the SQL tool to enable supplemental logging.
Follow the steps in Lab 1, Task 3: Load the ATP schema for more details.
Unlock the GGADMIN user on the source Autonomous AI Database instance
1. On the Autonomous AI Database Details page, select Database Users from the Database actions menu.
  
  Tip: Use the Autonomous AI Database administrator credentials provided when you created the instance to log in, if prompted.
2. Locate the GGADMIN user, and then select Edit from its ellipsis (three dots) menu.
3. In the Edit User panel, enter a password, confirm that password, and then deselect Account is Locked.
4. Select Apply Changes.

Task 1: Create the OCI GoldenGate resources

Create the OCI GoldenGate deployment for Data replication.
Create a connection for the source database.
Assign the connection to the deployment.
Create and run an Extract.

Task 2: Create the Stream Analytics resources

Create the Stream Analytics deployment.
Create a Kafka connection using the Kafka instance's public IP, and select Plaintext for Security protocol.
Create a GoldenGate connection.
Assign the connections to the Stream Analytics deployment.

Task 3: Create and run the pipelines

Launch the Stream Analytics deployment console.
Review the connections in the Stream Analytics deployment console.
1. In the Stream Analytics deployment console, select Catalog.
2. On the Catalog page, review the list of connections. You should see the GoldenGate connection, the Autonomous AI Database connection, and the Kafka connection.
Start the GoldenGate Big Data cluster:
1. In the OCI GoldenGate Stream Analytics deployment console, select System settings from the ossaadmin user menu.
2. In the System Setting dialog, select Manage Clusters, and then expand GGDB Cluster.
3. Select Start Cluster. Wait until the cluster status is Running, and then close the dialog window.
Update the GoldenGate connection credentials:

Although the GoldenGate connection is available in the Stream Analytics deployment console, the GoldenGate credentials don't carry over. Update the password and test the connection.
1. Select Catalog, and then select the GoldenGate connection.
2. In the Edit Connection dialog, select Next.
3. For GG Username, enter oggadmin.
4. For GG Password, select Change password, and then enter the password provided when you created the OCI GoldenGate deployment for Data Replication in Task 1.
5. Select Test connection. If successful, select Save.
Use the GoldenGate Extract to create and start GoldenGate Change Data.

Ensure that you use the Extract details provided in Task 1 on the GG Change Data Details page.
Update the Autonomous AI Database user name.

Database connections are created with the default user, ggadmin. Update the username to SRC_OCIGGLL (if you used the sample data provided) to access its schema and tables.
1. Select Catalog, and then select the Autonomous AI Database connection.
2. In the Edit Connection dialog, select Next.
3. For Username, enter SRC_OCIGGLL.
4. For Password, enter the SRC_OCIGGLLpassword you modified in the Before you begin steps at the start of this quickstart.
5. Select Test connection. If successful, select Save.
Use the Autonomous AI Database lookup tables to create References for Customers and Orders.
Use the Kafka connection to create Kafka Streams for Customers and Orders.

Use the Autonomous AI Database SQL tool to perform inserts on the source database.

For example, you can run the following inserts:

Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (11,'COM',101,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (12,'COM',102,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (13,'COM',103,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (14,'COM',104,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (15,'COM',105,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (16,'COM',106,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (17,'COM',107,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (18,'COM',201,to_date('16-AUG-2023','DD-MON-YYYY'),null);
Insert into SRC_OCIGGLL.SRC_ORDERS (ORDER_ID,STATUS,CUST_ID,ORDER_DATE,CUSTOMER) values (19,'COM',202,to_date('16-AUG-2023','DD-MON-YYYY'),null);

Create a pipeline that uses the Kafka stream created in Step 8.
Add a Query stage, and then add a Filter, to return only orders where the CUST_ID of the Orders stream match the CUSTID of the Customers stream.
Add target stage.
Publish the pipeline.