Replicate data to Stream Analytics

Learn to replicate data from OCI GoldenGate into Stream Analytics.

GoldenGate Stream Analytics began as a Complex Event Processing engine and evolved to run atop runtime frameworks such as Apache Spark and Apache Kafka. Stream Analytics can ingest a stream of data from any source, such as a database, GoldenGate, Kafka, JMS, REST, or even a file system file. After the data is ingested, you can run analytics on the live data.

Before you begin

To successfully complete this quickstart you need:

  • A source Autonomous Database with sample data loaded, and supplemental logging enabled.

    Tip:

    If you need sample data to work with, you can download the OCI GoldenGate sample data.
    • First, edit SETUP_USERS_ATP.sql and modify the SRC_OCIGGLL user's password to remove the special characters.
    • Use the Autonomous Database's Database actions SQL tool to run the two scripts that create the user schema and tables.
    • Use the SQL tool to enable supplemental logging (a sample statement appears after this list).
    For more details, follow the steps in Lab 1, Task 3: Load the ATP schema.
  • Unlock the GGADMIN user on the source Autonomous Database instance
    1. On the Autonomous Database Details page, select Database Users from the Database actions menu.

      Tip:

      Use the Autonomous Database administrator credentials provided when you created the instance to log in, if prompted.
    2. Locate the GGADMIN user, and then select Edit from its ellipsis (three dots) menu.
    3. In the Edit User panel, enter a password, confirm that password, and then deselect Account is Locked.
    4. Click Apply Changes.
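
If you prefer to work in SQL, the following is a minimal sketch of the statements behind the supplemental logging and GGADMIN steps above. It assumes the SRC_OCIGGLL sample schema and the default GGADMIN user; adjust names and passwords to your environment.

  -- Enable supplemental logging on the source Autonomous Database
  ALTER PLUGGABLE DATABASE ADD SUPPLEMENTAL LOG DATA;

  -- Verify that supplemental logging is enabled
  SELECT SUPPLEMENTAL_LOG_DATA_MIN FROM V$DATABASE;

  -- Alternative to the Database Users page: unlock GGADMIN and set its password
  ALTER USER GGADMIN IDENTIFIED BY "<your-password>" ACCOUNT UNLOCK;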

Task 1: Create the OCI GoldenGate resources

  1. Create the OCI GoldenGate deployment for Data Replication.
  2. Create a connection for the source database.
  3. Assign the connection to the deployment.
  4. Create and run an Extract (a sample parameter file sketch follows this list).
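
For reference, a minimal sketch of an Extract parameter file for this scenario might look like the following; the Extract name, trail name, and credential alias shown here are placeholders, so substitute the values from your own deployment and connection.

  EXTRACT UAEXT
  USERIDALIAS SourceADB DOMAIN OracleGoldenGate
  EXTTRAIL UA
  TABLE SRC_OCIGGLL.*;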

Task 2: Create the Stream Analytics resources

  1. Create the Stream Analytics deployment.
  2. Create a Kafka connection using the Kafka instance's public IP, and select Plaintext for Security protocol.
  3. Create a GoldenGate connection.
  4. Assign the connections to the Stream Analytics deployment.

Task 3: Create and run the pipelines

  1. Launch the Stream Analytics deployment console.
  2. Review the connections in the Stream Analytics deployment console.
    1. In the Stream Analytics deployment console, click Catalog.
    2. On the Catalog page, review the list of connections. You should see the GoldenGate connection, the Autonomous Database connection, and the Kafka connection.
  3. Start the GoldenGate Big Data cluster:
    1. In the OCI GoldenGate Stream Analytics deployment console, select System settings from the ossaadmin user menu.
    2. In the System Settings dialog, click Manage Clusters, and then expand GGDB Cluster.
    3. Click Start Cluster. Wait until the cluster status is Running, and then close the dialog window.
  4. Update the GoldenGate connection credentials:

    Although the GoldenGate connection is available in the Stream Analytics deployment console, the GoldenGate credentials don't carry over. Update the password and test the connection.

    1. Click Catalog, and then click the GoldenGate connection.
    2. In the Edit Connection dialog, click Next.
    3. For GG Username, enter oggadmin.
    4. For GG Password, click Change password, and then enter the password provided when you created the OCI GoldenGate deployment for Data Replication in Task 1.
    5. Click Test connection. If successful, click Save.
  5. Use the GoldenGate Extract to create and start GoldenGate Change Data.

    On the GG Change Data Details page, ensure that you use the Extract details provided in Task 1.

  6. Update the Autonomous Database user name.

    Database connections are created with the default user, ggadmin. Update the username to SRC_OCIGGLL (if you used the sample data provided) to access its schema and tables.

    1. Click Catalog, and then click the Autonomous Database connection.
    2. In the Edit Connection dialog, click Next.
    3. For Username, enter SRC_OCIGGLL.
    4. For Password, enter the SRC_OCIGGLL password that you modified in the Before you begin steps at the start of this quickstart.
    5. Click Test connection. If successful, click Save.
  7. Use the Autonomous Database lookup tables to create References for Customers and Orders.
  8. Use the Kafka connection to create Kafka Streams for Customers and Orders.
  9. Use the Autonomous Database SQL tool to perform inserts on the source database.
    For example, you can run the following inserts:
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (11, 'COM', 101, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (12, 'COM', 102, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (13, 'COM', 103, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (14, 'COM', 104, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (15, 'COM', 105, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (16, 'COM', 106, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (17, 'COM', 107, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (18, 'COM', 201, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
    INSERT INTO SRC_OCIGGLL.SRC_ORDERS (ORDER_ID, STATUS, CUST_ID, ORDER_DATE, CUSTOMER) VALUES (19, 'COM', 202, TO_DATE('16-AUG-2023','DD-MON-YYYY'), NULL);
  10. Create a pipeline that uses the Kafka stream created in Step 8.
  11. Add a Query stage, and then add a Filter to return only orders where the CUST_ID of the Orders stream matches the CUSTID of the Customers stream (a SQL approximation of this match appears after this list).
  12. Add a target stage.
  13. Publish the pipeline.
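
The filter added in step 11 is the streaming equivalent of joining the Orders data to the Customers data on the customer key. As a rough SQL approximation (assuming the sample schema's SRC_CUSTOMERS lookup table and its CUSTID column), you could preview the matching rows in the Autonomous Database SQL tool with:

  -- Orders paired with the customers they belong to,
  -- mirroring the pipeline's CUST_ID = CUSTID filter
  SELECT o.ORDER_ID, o.STATUS, o.ORDER_DATE, c.CUSTID
  FROM SRC_OCIGGLL.SRC_ORDERS o
  JOIN SRC_OCIGGLL.SRC_CUSTOMERS c
    ON o.CUST_ID = c.CUSTID;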