Before you Begin

This 10-minute tutorial shows you how to modify columns and data in a data flow to create curated datasets in Oracle Analytics. This tutorial uses a spreadsheet as the data source, however, you can use any supported data source.

Background

You might need to implement changes to you data before using that data in analyses. In a data flow, you can add, remove, change or merge columns, add calculations, modify the data contained in the columns, and create multiple datasets from one data flow. If you schedule the data flow to run periodically, you can capture updates in the data source, and enable persisting the transformations in your curated datasets.

After running the data flow, you can use the dataset to analyze the data by creating visualizations.

What Do You Need?

Create a Dataset and Data Flow

  1. Sign in to Oracle Analytics Cloud.
  2. On the Home page, click Create, and then click Dataset. In Create Dataset, click Drop data file here or click to browse, select the samp_revenue_denorm2024.xlsx file, and then click Open.
  3. In Create Dataset Table from samp_revenue_denorm2024, click OK.
  4. In the Join Diagram, click the OFFICE_NUMBER column, click Measure Measure icon, and then click Attribute.
  5. Click the PROD_NUMBER column, click Measure Measure icon, and then click Attribute.
  6. Click the ORDER_NUMBER column, click Measure Measure icon , and then click Attribute.
  7. Click Save Save icon. In Save Dataset As, enter Sample Revenue 2024, and then click OK.
  8. Click the samp_revenue_denorm2024 tab. In the dataset, click the No data column, select Options Options menu icon, and then select Delete.


    Description of no_data_column.png follows
    Description of the illustration no_data_column.png
  9. Click Save Save icon. Click Go back Back icon.

Create a Dataflow

In this section, you use the Sample Revenue 2024 dataset to create a dataflow that branches to create separate datasets.

  1. On the Home page, click Create, and then click Data Flow. In Add Data, select Sample Revenue 2024, and then click Add.
  2. From the Data Flow Steps panel, drag Branch to the Add a step Add Step node node.


    Branch uses 2 as the default number of datasets created from the source dataset. You can increase the number of datasets created when the data flow is run.

    Description of branch_step.png follows
    Description of branch_step.png
  3. Click Add a step Add Step node node on the top branch, and then select Merge Columns.


    Description of data_flow_w_branch.png follows
    Description of data_flow_w_branch.png
  4. In Merge Columns, enter Prod_Attribute in New column name. Next to Merge column, click the hyperlink and then select PROD_ATTRIBUTE1. Next to With, click the hyperlink, and then select PROD_ATTRIBUTE2. From the Delimiter list, select Space ().


    Description of merge_columns.png follows
    Description of merge_columns.png

Select Columns to Create a Dataset

In this section, you select the columns used to create a PRODUCTS dataset.

  1. From the Data Flow Steps panel, drag Select Columns to Add a step Add Step node between Merge Columns and the Save Data node.
  2. In Select Columns, click Remove all. Hold down the Ctrl key and select the following columns:
    • PROD_NAME
    • PROD_TYPE
    • PROD_LOB
    • PROD_BRAND
    • PROD_NUMBER
    • Prod_Attribute
  3. Click Add selected.
  4. Click the top Save Data node. In Save Dataset, enter Products in the Dataset field. In the PROD_NUMBER row, click Measure in the Treat As column, and then select Attribute.


    Description of products_branch.png follows
    Description of products_branch.png

Create a Second Dataset

  1. From the Data Flow Steps panel, drag Select Columns to Add a step Add Step node between the Branch and the second Save Data nodes.


    Description of second_dataset.png follows
    Description of second_dataset.png
  2. In Select Columns, click Remove all. Hold down the Ctrl key and select the following columns:
    • PROD_NAME
    • ORDER_NUMBER
    • REVENUE
    • UNITS
    • DISCNT_VALUE
    • BILL_DAY_DT
    • ORDER_DAY_DT
    • ORDER_STATUS
    • ORDER_TYPE
  3. Click Add selected.


    Description of elect_columns_orders.png follows
    Description of the illustration select_columns_orders.png
  4. Drag Add Columns to the Add a Step Add Step node node between Select Columns and Save Data. In Add Columns, enter ACTUAL_REVENUE in Name.
  5. In the Expression field, start entering Revenue, and then select REVENUE from Available Data. Expand Operators, and double-click the minus sign (-). After the minus sign, start entering DIS, and then select DISCNT_VALUE from Available Data.
  6. Click Validate, and then click Apply.


    Description of actual_revenue_calc.png follows
    Description of the illustration actual_revenue_calc.png
  7. Click the Save Data node on the branch with Add Columns. In Save Dataset, enter Orders in Dataset. In the ORDER_NUMBER row, click Measure in the Treat As column, and then select Attribute.


    Description of save_as_orders.png follows
    Description of the illustration save_as_orders.png
  8. Click Save Save icon. In Save Data Flow As, enter Sample Revenue DF in Name, and then click OK.
  9. Click Run Data Flow Run Data Flow icon.
  10. After the data flow run completes, click Go back Back icon.
  11. On the Home page, select Products, click the Actions Actions menu icon, and then select Inspect. In the PRODUCTS dataset, click Data Elements to review the dataset. Click Close.
  12. (Optional) On the Home page, select Orders, click the Actions Actions menu icon, and then select Inspect. In the ORDERS dataset, click Data Elements to review the dataset. Click Close.

Schedule the Data Flow Run

In this section, you schedule the data flow to run by defining the repetition, the duration, and interval. Your data might not change frequently, so you could define a schedule that meets your needs.

  1. On the Home page, click the Data search tag, enter Sample Revenue 2024 to located the data flow, and then click Search.
  2. Select the Sample Revenue 2024 data flow, click Actions Actions menu icon, and select New schedule.
  3. In Schedules, click New.
  4. In Schedule, enter a Name for the schedule. In Start, click the calendar Calendar icon, and then select a month and day. In Time, click the clock Clock icon to select the hour and minutes for the run's start time.
  5. From Repeat, select No Repeat as the frequency to use for running the data flow, and then click OK.


    In a production environment, you can select an actual frequency interval for running the dataflow.

    Description of data_flow_schedule.png follows
    Description of data_flow_schedule.png

Learn More