Before You Begin

This 10-minute tutorial shows you how to implement incremental processing in a data flow with a dataset created from a connection.

Background

You can use incremental processing in your data flow to add the latest data available from the connected data source to your dataset. When your data flow runs on a schedule, each run uses incremental processing to pick up the data added to the source since the previous run. In this tutorial, you learn how to specify a new data indicator column in the dataset to enable incremental processing and how to set parameters in the data flow to update the dataset.

Incremental processing is only available with datasets created from a connection.
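
The idea behind a new data indicator can be sketched in a few lines of Python. This is a conceptual illustration only, not how Oracle Analytics implements incremental processing internally. The select_new_rows helper and the sample rows are hypothetical; TIME_ID stands in for the indicator column used later in this tutorial.

```python
from datetime import date

def select_new_rows(rows, indicator, last_seen):
    """Return rows whose indicator value is later than last_seen,
    together with the updated high-water mark."""
    new_rows = [r for r in rows if last_seen is None or r[indicator] > last_seen]
    # If nothing is new, the mark stays where it was.
    high_water = max((r[indicator] for r in new_rows), default=last_seen)
    return new_rows, high_water

# Hypothetical sample rows standing in for the source table.
source = [
    {"TIME_ID": date(2024, 1, 1), "AMOUNT_SOLD": 100.0},
    {"TIME_ID": date(2024, 1, 2), "AMOUNT_SOLD": 250.0},
]

# First run: nothing has been processed yet, so every row counts as new.
first_batch, mark = select_new_rows(source, "TIME_ID", None)

# A new sale arrives in the source between runs.
source.append({"TIME_ID": date(2024, 1, 3), "AMOUNT_SOLD": 75.0})

# Second run: only the row later than the stored mark is picked up.
second_batch, mark = select_new_rows(source, "TIME_ID", mark)
print(len(first_batch), len(second_batch), mark)
```

The column chosen as the indicator must increase as new rows arrive, which is why a date or sequence column is a natural fit.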

What Do You Need?

  • Access to Oracle Analytics
  • Ability to connect to a relational data source such as Oracle Autonomous Data Warehouse or Oracle Database
  • Access to the Oracle sample SH schema to perform the steps in this tutorial. See Installing Sample Schemas.

Create a Connection

This tutorial uses an Oracle Database connection to an instance with the SH schema. In this section, you create a connection to the data source.

This tutorial uses the Basic connection type for a non-clustered, single-node database. Use the Advanced connection type for multi-node database clusters that have multiple host names and ports.

If you already have a connection, you can skip to the next section.

  1. Sign in to Oracle Analytics.
  2. On the Home page, click Create, and then click Connection.
  3. In Create Connection - Select Connection Type, click your database connection type.

    This example uses an Oracle Database connection type. Your connection variables depend on the selected database connection type.

  4. In Create Connection, for an Oracle Database connection, enter a Connection Name and select Basic as the connection type.
  5. Enter the values for these fields:
    • Host
    • Port
    • Service Name
    • User Name
    • Password
  6. Click Save.
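
The Host, Port, and Service Name fields together identify a single database service. As a rough illustration, the same three values form an Oracle Easy Connect string of the form host:port/service_name. The easy_connect helper below is hypothetical, the sample values are placeholders, and you don't need to write this string yourself to complete the dialog.

```python
def easy_connect(host: str, port: int, service_name: str) -> str:
    """Compose an Oracle Easy Connect string: host:port/service_name."""
    if not host or not service_name:
        raise ValueError("host and service_name are required")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"{host}:{port}/{service_name}"

# Placeholder values for illustration only; substitute your own.
dsn = easy_connect("dbhost.example.com", 1521, "orclpdb1")
print(dsn)
```

Checking these three values against a working client connection before you fill in the dialog can save a failed Save attempt.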

Create a Dataset

In this section, you create a dataset from the connection. In the next section, you use the dataset in a data flow.

  1. On the Home page, click Create, select Dataset, and then click the database connection containing the SH schema.
  2. In Connections, expand Schemas, and then expand the SH schema.
  3. Hold down the Ctrl key and click the CUSTOMERS, PRODUCTS, SALES, and TIMES tables. Drag them to the Join Diagram.


    Oracle Analytics automatically creates the joins using the relationships defined in the schema.

    Illustration: data_set_tables.png
  4. Click Save. In Save Dataset As, enter Customer Sales in Name, and then click OK.

Edit Table Definitions

In this section, you remove columns that aren't needed from the tables in the dataset. The CUSTOMERS table contains 23 data elements. The PRODUCTS table contains 22 data elements. You don't need all these columns in your dataset.

  1. Click the CUSTOMERS table tab. Use the horizontal scroll bar to view the columns.
  2. Click Edit Definition.
  3. In Edit Definition, click Remove All. Hold down the Ctrl key and select the following:
    • CUST_ID
    • CUST_CITY
    • CUST_FIRST_NAME
    • CUST_LAST_NAME
    • CUST_GENDER
    • CUST_POSTAL_CODE
    • CUST_STATE_PROVINCE
    • CUST_STREET_ADDRESS
  4. Click Add Selected, and then click OK.


    Illustration: customers_edit_def.png
  5. Click Save.
  6. Click the PRODUCTS table tab. Use the horizontal scroll bar to view the columns.
  7. Click Edit Definition.
  8. In Edit Definition, click Remove All. Hold down the Ctrl key and select the following:
    • PROD_ID
    • PROD_CATEGORY
    • PROD_NAME
    • PROD_SUBCATEGORY
  9. Click Add Selected, and then click OK.


    Illustration: prod_edit_def.png
  10. Click Save.

Specify New Data Indicator

In this section, you set the new data indicator property that the data flow uses to update the dataset. In this example, each sale is recorded with a time ID in the TIME_ID column, which makes TIME_ID a good new data indicator.

  1. Click the SALES table tab.
  2. In the SALES table, click Edit Definition, and then click Expand.
  3. In Data Access, keep Live as the value.
  4. Expand Advanced. From the Flow New Data Indicator list, select TIME_ID, and then click OK.


    Illustration: new_data_indicator.png
  5. Click Save, and then click Go back.

Create a Data Flow

In this section, you create a data flow with the Customer Sales dataset.

  1. On the Home page, click Create, and then select Data Flow.
  2. In Add Dataset, click Customer Sales, and then click Add.
  3. In Add Data - Customer Sales, click Folder. In the Select All message, click Yes.


    Illustration: cust_sales_node.png
  4. On the Customer Sales node, click Add a step, and then click Filter.
  5. In Filter, click Add Filter. From the Available data list, click PROD_CATEGORY. From the PROD_CATEGORY list, click Electronics.


    Illustration: filter_node.png
  6. In the data flow, click Add a step on the Filter node, and then select Save Data.
  7. In Save Dataset, enter Electronics Sales.
  8. From the Save data to list, select Database Connection. Next to Connection, click Select connection, and then choose the database connection that contains the table to update.
  9. In Table, enter SALES. In the When run list, select Add new data to existing data.


    Illustration: electronics_sales.png
  10. Click Save. In Save Data Flow As, enter Sales Revenues, and then click OK.
  11. Click Run Data Flow.
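
The effect of the Filter step combined with Add new data to existing data can be approximated with an in-memory SQLite database. This is a sketch under stated assumptions, not the SQL that Oracle Analytics generates: the run_flow function, the simplified two-table schema, the electronics_sales target table, and the sample rows are all hypothetical, and the last processed TIME_ID is read back from the target table for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_category TEXT);
    CREATE TABLE sales (prod_id INTEGER, time_id TEXT, amount_sold REAL);
    CREATE TABLE electronics_sales (prod_id INTEGER, time_id TEXT, amount_sold REAL);
    INSERT INTO products VALUES (1, 'Electronics'), (2, 'Hardware');
    INSERT INTO sales VALUES (1, '2024-01-01', 100.0), (2, '2024-01-01', 40.0);
""")

def count_target(conn):
    """Number of rows currently in the target table."""
    return conn.execute("SELECT COUNT(*) FROM electronics_sales").fetchone()[0]

def run_flow(conn):
    """Append Electronics sales newer than the latest TIME_ID in the target.

    Returns the number of rows appended by this run.
    """
    before = count_target(conn)
    (last_seen,) = conn.execute(
        "SELECT MAX(time_id) FROM electronics_sales"
    ).fetchone()
    conn.execute(
        """
        INSERT INTO electronics_sales
        SELECT s.prod_id, s.time_id, s.amount_sold
        FROM sales s JOIN products p ON p.prod_id = s.prod_id
        WHERE p.prod_category = 'Electronics'
          AND (? IS NULL OR s.time_id > ?)
        """,
        (last_seen, last_seen),
    )
    conn.commit()
    return count_target(conn) - before

first = run_flow(conn)  # initial load: one Electronics row
# New sales arrive between runs; only the Electronics row should be appended.
conn.execute("INSERT INTO sales VALUES (1, '2024-01-02', 60.0), (2, '2024-01-02', 15.0)")
second = run_flow(conn)
print(first, second)
```

Because this sketch uses a strict greater-than comparison, a late-arriving row that shares the last processed TIME_ID would be skipped. That is a limitation of the sketch, and one reason the indicator column should increase steadily as rows are added.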

Schedule the Data Flow

Incremental processing runs when changes occur in the data source between data flow runs. This section shows you how to schedule a data flow.

  1. On the Home page, click Data, enter Sales Revenues in the Search bar, and then press Enter.
  2. Select your data flow, click Actions, and then select New Schedule.
  3. In Sales Revenues, click New.
  4. In Schedule, enter a Name or keep the default name.
  5. Click the calendar in Start, and then select a start date. Click the calendar in End to specify an ending date or leave End empty.
  6. In Time, enter the hour and minutes of the start time. From the Repeat list, select a frequency for running the data flow, and then click OK.
    Illustration: schedule_df.png
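
To see how Start, End, Time, and Repeat combine into a series of runs, consider this small Python sketch. It is an illustration only and doesn't reflect how the Oracle Analytics scheduler is implemented. The run_times helper is hypothetical, and treating End as inclusive is an assumption.

```python
from datetime import datetime, timedelta

def run_times(start, repeat, end=None, limit=5):
    """List up to `limit` run times beginning at start, spaced by repeat,
    and not later than end (when end is given)."""
    times = []
    current = start
    while len(times) < limit and (end is None or current <= end):
        times.append(current)
        current += repeat
    return times

# Daily runs at 02:30 from January 1 through January 3, 2024 (sample values).
daily = run_times(
    datetime(2024, 1, 1, 2, 30),
    timedelta(days=1),
    end=datetime(2024, 1, 3, 23, 59),
)
for t in daily:
    print(t.isoformat())
```

With incremental processing enabled, each of these runs only needs to handle the rows added to the source since the run before it.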

Learn More