Explore Data Flows in Oracle Analytics

Before You Begin

This 15-minute tutorial shows you how to create a data flow with two datasets curated for reporting and analysis in Oracle Analytics.

Background

In this tutorial, you create a data flow using datasets that were create from two spreadsheet files. You learn how to combine data from separate sources in a data flow uses graphical representation of functions that transform and modify the source datasets to create a curated dataset.

In the data flow, you modify the donation dataset using transform columns, filter, add columns, and select column steps. You add the zip stats dataset to the data flow and transform a column to join the data from each dataset. After joining the datasets, you remove columns that you don't need using the select column step. You add the Aggregate step to define the functions used with the measure columns and add columns with expressions. Every data flow ends with the Save Dataset step. You run the data flow to create a dataset from the source data.

This is the first tutorial in the Prepare and Analyze Data in Oracle Analytics Cloud series. If you are just interested in learning how to create a dataset with a dataflow, you can end the series after the first tutorial. Read the tutorials in the order listed.

What Do You Need?

Access to Oracle Analytics Cloud or Oracle Analytics Desktop
Download the following source files:
- donation2024.xlsx
- zip_stats.xlsx

Create a Dataset

In this section, you create a dataset using the donation2024.xlsx file.

Sign in to Oracle Analytics.
On the Home page, click Create, and then click Dataset.
In Create Dataset, click Drop data file here or click to browse. In File Upload, select the donation2024.xlsx file, and then click Open.
In Create Dataset Table from donation2024.xlsx, click OK.
Click Save . In Save Dataset As, enter donations2024 in Name, and then click OK.

Description of the illustration donation_dataset.png
In the donations2024 dataset page, click Go back .

Add a Second Dataset

In this section, you create a dataset that provides demographics for the donations2024 dataset.

On the Home page, click Create, and then click Dataset.
In Create Dataset, click Drop data file here or click to browse. In File Upload, select the zip_stats.xlsx file, and then click Open.
In Create Dataset Table from zip_stats.xlsx, click OK.

Description of the illustration zip_stats_join_diagram.png
Click Save . In Save Dataset As, enter zip stats in Name, and then click OK.
Click Go back .

Create a Data Flow

In this section, you create a data flow and add steps to modify the data. You also create a filter to eliminate NULL values from the dataset.

On the Home page, click Create, and then click Data Flow.
In Add Dataset, select the donations2024 dataset, and then click Add.

The data flow opens with donations2024 as the first node.
Description of the illustration df_first_node.png
Click Add a Step on the donations2024 node. Select Filter .
In Filter, select Click here for drag data to add a filter. From Available Data, select DATE_COMPLETED.
In DATE_COMPLETED, under Date Range keep Range as the value. In the first field, enter 1/1/2019 as the start date. In the second field enter 10/31/2023 as the end date, and then click outside the dialog.

The filter eliminates records that don't have a date completed value.
Description of the illustration date_completed_filter.png
Click Save. In Save Data Flow As, enter School Donations, and then click OK.

Add Columns with Expressions

In this section, you create columns by defining expressions. Manually enter the expressions instead of copying from this tutorial.

From Data Flow Steps, drag the Add Columns to Add a step on the Filter node. In Name, enter SCH_STATE. In the expression field, enter the following:
SUBSTRING(SCH_STATEZIP FROM 1 FOR 2)
Click Validate, and then click Apply.

Description of the illustration add_column_sch_state.png
In Add Columns, click Column . In Name, enter SCH_ZIP. In the expression field, enter the following:
CAST(SUBSTRING(SCH_STATEZIP FROM 4) AS int)
Click Validate, and then click Apply.
Click Add Column . In Name, enter YR_COMPLETED. In the expression field, enter the following:
YEAR(DATE_COMPLETED)
Click Validate, and then click Apply.
Select the SCH_ZIP column, click Options , and then select Convert to Text.
Click Validate and click Apply.
Select the YR_COMPLETED column, click Options , and then select Convert to Text. Click Validate and click Apply.

Description of the illustration add_columns.png

Add a Dataset to the Data Flow

In this section, you add the zip stats dataset to the data flow.

In Data Flow Steps, double-click Add Data. In Add Dataset, click zip stats, and then click Add.
Click the Join node, from Input 1 list, select All rows. Under Match columns, click PROJECTID and select SCH_ZIP for the Input 1 value.
Click the zip stats node. Select the Zip column, click Options , and then select Convert to Text.
Click Validate and click Apply.

Converting the Zip column to text validates the join between the datasets.
Description of the illustration join_transformation.png

Select Columns

In this section, you select columns to use from the joined datasets.

From Data Flow Steps, drag Select Columns to Add a step on the Join node.

Description of the illustration select_columns.png
Under Select Columns, click Remove all. Hold down the Ctrl key, select the following columns, and then click Add selected:
- PROJECTID
- SCHOOL_ID
- PRIMARY_FOCUS_SUBJECT
- RESOURCE_TYPE
- POVERTY_LEVEL
- GRADE_LEVEL
- STUDENTS_REACHED
- TOTAL_DONATIONS
- NUM_DONORS
- SCH_STATE
- SCH_ZIP
- YR COMPLETED
- Median
- Population
In Data Flow Steps, double-click Aggregate to add this step to the Select Columns node.
Under Group by, click Remove next to the PROJECTID row.
Under Aggregate, from the Function list, select Average in the following rows:
- STUDENTS_REACHED
- TOTAL_DONATIONS
- NUM_DONORS
- Median
- Population
Under Aggregate, click Add Aggregate, select PROJECTID. From the Function list, select Count, and then enter Number of Projects in New column name. In the Median row, enter Income in New column name.

Description of the illustration aggregate_columns.png

Define Aggregation

In this section, you use the Aggregate step to set the functions to use for measure columns in the dataset.

From Data Flow Steps, drag Add Columns to Add a step on the Aggregate node.
In Name, enter Donation by Population.
In the expression field, enter the following:
TOTAL_DONATIONS Average/Population Average
Click Validate. Click Apply.
In Add Columns, click Column . In Name, enter Average School Donation by Income.
In the expression field, enter the following:
avg(TOTAL_DONATIONS Average by SCHOOL_ID)/Income
Click Validate and click Apply.

Description of the illustration new_columns_after_join.png

Save and Run the Data Flow

In this section, you name the dataset that results from running the data flow and examine the columns in the new dataset. You use the Donations by School dataset in the next tutorial.

From Data Flow Steps, double-click Save Dataset.
In Save Dataset, enter Donations by School in Name. From Save data to, select Dataset Storage to save the data in Oracle Analytics.
Under Save Dateset in the Number of Projects row, click Sum in the Default Aggregation column and select Count.
In the Donations by Population and Average School Donation rows, click Sum in the Default Aggregation column and select Average.
Click Save.

Description of the illustration add_columns_after_join.png
Click Run Data Flow .
After the data flow run completes successfully, click Go back .
On the Home page, click Data, and then select the Donations by School dataset. Click the Actions , and then select Inspect.
In Donations by School dataset, click Data Elements to view the columns.

Description of the illustration donationsbysch_datase.png

Next Step

Analyze Your Data

Learn More

Title and Copyright Information

Explore Data Flows in Oracle Analytics

F22282-10

July 2025

Learn how to construct and run a data flow to create a dataset in Oracle Analytics.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government's use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.