Before You Begin

This 15-minute tutorial shows you how to create a data flow with two datasets curated for reporting and analysis in Oracle Analytics.

Background

In this tutorial, you create a data flow using datasets that were create from two spreadsheet files. You learn how to combine data from separate sources in a data flow uses graphical representation of functions that transform and modify the source datasets to create a curated dataset.

In the data flow, you modify the donation dataset using transform columns, filter, add columns, and select column steps. You add the zip stats dataset to the data flow and transform a column to join the data from each dataset. After joining the datasets, you remove columns that you don't need using the select column step. You add the Aggregate step to define the functions used with the measure columns and add columns with expressions. Every data flow ends with the Save Dataset step. You run the data flow to create a dataset from the source data.

This is the first tutorial in the Prepare and Analyze Data in Oracle Analytics Cloud series. If you are just interested in learning how to create a dataset with a dataflow, you can end the series after the first tutorial. Read the tutorials in the order listed.

What Do You Need?

Create a Dataset

In this section, you create a dataset using the donation2024.xlsx file.

  1. Sign in to Oracle Analytics.
  2. On the Home page, click Create, and then click Dataset.
  3. In Create Dataset, click Drop data file here or click to browse. In File Upload, select the donation2024.xlsx file, and then click Open.
  4. In Create Dataset Table from donation2024.xlsx, click OK.
  5. Click Save Save icon. In Save Dataset As, enter donations2024 in Name, and then click OK.


    Description of donation_dataset.png follows
    Description of the illustration donation_dataset.png
  6. In the donations2024 dataset page, click Go back Back icon.

Add a Second Dataset

In this section, you create a dataset that provides demographics for the donations2024 dataset.

  1. On the Home page, click Create, and then click Dataset.
  2. In Create Dataset, click Drop data file here or click to browse. In File Upload, select the zip_stats.xlsx file, and then click Open.
  3. In Create Dataset Table from zip_stats.xlsx, click OK.


    Description of zip_stats_join_diagram.png follows
    Description of the illustration zip_stats_join_diagram.png
  4. Click Save Save icon. In Save Dataset As, enter zip stats in Name, and then click OK.
  5. Click Go back Back icon.

Create a Data Flow

In this section, you create a data flow and add steps to modify the data. You also create a filter to eliminate NULL values from the dataset.

  1. On the Home page, click Create, and then click Data Flow.
  2. In Add Dataset, select the donations2024 dataset, and then click Add.


    The data flow opens with donations2024 as the first node.

    Description of df_first_node.png follows
    Description of the illustration df_first_node.png
  3. In the SCH_LATITUDE column, click Options Options icon, select Convert to Text, click Validate, and then click Apply.
  4. In the SCH_LONGITUDE column, click Options Options icon, select Convert to Text, click Validate, and then click Apply.


    Description of transform_lat_long.png follows
    Description of the illustration transform_lat_long.png
  5. Click Add a Step Add step icon on the last transform column node. Select Filter Filter step icon.
  6. Click Data. In the Data panel, select DATE_COMPLETED and drag it to the filter area in the data flow.
  7. In DATE_COMPLETED, under Date Range keep Range as the value. In the first field, enter 1/1/2019 as the start date. In the second field enter 10/31/2023 as the end date, and then click outside the dialog.


    The filter eliminates records that don't have a date completed value.

    Description of date_completed_filter.png follows
    Description of the illustration date_completed_filter.png
  8. Click Save. In Save Data Flow As, enter School Donations, and then click OK.

Add Columns with Expressions

In this section, you create columns by defining expressions. Manually enter the expressions instead of copying from this tutorial.

  1. From Data Flow Steps, drag the Add Columns to Add a step Add a step icon on the Filter node. In Name, enter SCH_STATE. In the expression field, enter the following:

    SUBSTRING(SCH_STATEZIP FROM 1 FOR 2)

  2. Click Validate, and then click Apply.
  3. In Add Columns, click Column Add Column icon. In Name, enter SCH_ZIP. In the expression field, enter the following:

    CAST(SUBSTRING(SCH_STATEZIP FROM 4) AS int)

  4. Click Validate, and then click Apply.
  5. In Add Columns, click Column Add step icon. In Name, enter YR_COMPLETED. In the expression field, enter the following:

    YEAR(DATE_COMPLETED)

  6. Click Validate, and then click Apply.
  7. Click Toggle auto-refresh Data Preview Toggle auto-refresh Data Preview, and then scroll to view the new columns.
  8. Select the SCH_ZIP column, click Options Options icon, and then select Convert to Text. Select the YR_COMPLETED column, click Options Options icon, and then select Convert to Text.


    Description of add_columns.png follows
    Description of the illustration add_columns.png

Add a Dataset to the Data Flow

In this section, you add the zip stats dataset to the data flow.

  1. In Data Flow Steps, double-click Add Data. In Add Dataset, click zip_stats, and then click Add.
  2. Click zip stats. In the Zip column, click Options, and then select Convert to Text.
  3. Click Validate. Click Apply.
  4. Click the Join node, from Input 1 list, select All rows. Under Match columns, click PROJECTID and select SCH_ZIP for the Input 1 value.


    The join uses Zip column from the zip_stats dataset and the SCH_ZIP columns from the donations2024 dataset.

    Description of dataset_join.png follows
    Description of the illustration dataset_join.png

Select Columns

In this section, you select columns to use from the joined datasets.

  1. From Data Flow Steps, drag Select Columns to Add a step Add a step icon on the Join node.


    Description of select_columns.png follows
    Description of the illustration select_columns.png
  2. Under Select Columns, click Remove all. Hold down the Ctrl key, select the following columns, and then click Add selected:
    • PROJECTID
    • SCHOOL_ID
    • PRIMARY_FOCUS_SUBJECT
    • RESOURCE_TYPE
    • POVERTY_LEVEL
    • GRADE_LEVEL
    • STUDENTS_REACHED
    • TOTAL_DONATIONS
    • NUM_DONORS
    • SCH_STATE
    • SCH_ZIP
    • YR COMPLETED
    • Median
    • Population
  3. From Data Flow Steps, drag Aggregate to Add a step Add a step icon on the Select Columns node.
  4. Under Group by, click Remove Delete icon next to the PROJECTID row.
  5. Under Aggregate, from the Function list, select Average in the following rows:
    • STUDENTS_REACHED
    • TOTAL_DONATIONS
    • NUM_DONORS
    • Median
    • Population
  6. Under Aggregate, click Add Aggregate, select PROJECTID. From the Function list, select Count, and then enter Number of Projects in New column name. In the Median row, enter Income in New column name.


    Description of aggregate_columns.png follows
    Description of the illustration aggregate_columns.png

Define Aggregation

In this section, you use the Aggregate step to set the functions to use for measure columns in the dataset.

  1. From Data Flow Steps, drag Add Columns to Add a step Add a step icon on the Aggregate node.
  2. In Name, enter Donation by Population.
  3. To represent the average donation amount by the average population in the zip code, in the expression field, enter the following:

    TOTAL_DONATIONS Average/Population Average

  4. Click Validate. Click Apply.
  5. In Add Columns, click Column Add Column icon. In Name, enter Average School Donation by Income.
  6. In the expression field, enter the following:

    avg(TOTAL_DONATIONS Average by SCHOOL_ID)/Income

  7. Click Validate. Click Apply.


    Description of new_columns_after_join.png follows
    Description of the illustration new_columns_after_join.png

Save and Run the Data Flow

In this section, you name the dataset that is output as a result of running the data flow, and examine the columns in the new dataset. You use the Donations by School dataset in the next tutorial.

  1. From Data Flow Steps, double-click Save Dataset.
  2. In Save Dataset, enter Donations by School in Name. From Save data to, select Dataset Storage to save the data in Oracle Analytics.
  3. Click Save.


    Description of add_columns_after_join.png follows
    Description of the illustration add_columns_after_join.png
  4. Click Run Data Flow Run Data Flow icon.
  5. After the data flow run completes successfully, click Go back Back icon.
  6. On the Home page, click Data, and then select the Donations by School dataset. Click the Actions actions icon, and then select Inspect.
  7. In Donations by School dataset, click Data Elements to view the columns.


    Description of donationsbysch_dataset.png.png follows
    Description of the illustration donationsbysch_datase.png

Next Step

Analyze Data

Learn More