Before You Begin
This 15-minute tutorial shows you how to create a data flow with two datasets curated for reporting and analysis in Oracle Analytics.
Background
In this tutorial, you create a data flow using datasets that were create from two spreadsheet files. You learn how to combine data from separate sources in a data flow uses graphical representation of functions that transform and modify the source datasets to create a curated dataset.
In the data flow, you modify the donation dataset using transform columns, filter, add columns, and select column steps. You add the zip stats dataset to the data flow and transform a column to join the data from each dataset. After joining the datasets, you remove columns that you don't need using the select column step. You add the Aggregate step to define the functions used with the measure columns and add columns with expressions. Every data flow ends with the Save Dataset step. You run the data flow to create a dataset from the source data.
This is the first tutorial in the Prepare and Analyze Data in Oracle Analytics Cloud series. If you are just interested in learning how to create a dataset with a dataflow, you can end the series after the first tutorial. Read the tutorials in the order listed.
- Explore Data Flows
- Analyze Data
- Create Aggregated Datasets
What Do You Need?
- Access to Oracle Analytics Cloud or Oracle Analytics Desktop
- Download the following source files:
Create a Dataset
In this section, you create a dataset using the donation2024.xlsx
file.
- Sign in to Oracle Analytics.
- On the Home page, click Create, and then click Dataset.
- In Create Dataset, click Drop data file here or click to browse. In File Upload, select the
donation2024.xlsx
file, and then click Open. - In Create Dataset Table from donation2024.xlsx, click OK.
- Click Save . In Save Dataset As, enter
donations2024
in Name, and then click OK. - In the donations2024 dataset page, click Go back .
Add a Second Dataset
In this section, you create a dataset that provides demographics for the donations2024 dataset.
- On the Home page, click Create, and then click Dataset.
- In Create Dataset, click Drop data file here or click to browse. In File Upload, select the
zip_stats.xlsx
file, and then click Open. - In Create Dataset Table from zip_stats.xlsx, click OK.
- Click Save . In Save Dataset As, enter
zip stats
in Name, and then click OK. - Click Go back .
Create a Data Flow
In this section, you create a data flow and add steps to modify the data. You also create a filter to eliminate NULL values from the dataset.
- On the Home page, click Create, and then click Data Flow.
- In Add Dataset, select the donations2024 dataset, and then click Add.
The data flow opens with donations2024 as the first node.
- In the SCH_LATITUDE column, click Options , select Convert to Text, click Validate, and then click Apply.
- In the SCH_LONGITUDE column, click Options , select Convert to Text, click Validate, and then click Apply.
- Click Add a Step on the last transform column node. Select Filter .
- Click Data. In the Data panel, select DATE_COMPLETED and drag it to the filter area in the data flow.
- In DATE_COMPLETED, under Date Range keep Range as the value. In the first field, enter
1/1/2019
as the start date. In the second field enter10/31/2023
as the end date, and then click outside the dialog.The filter eliminates records that don't have a date completed value.
- Click Save. In Save Data Flow As, enter
School Donations
, and then click OK.
Add Columns with Expressions
In this section, you create columns by defining expressions. Manually enter the expressions instead of copying from this tutorial.
- From Data Flow Steps, drag the Add Columns to Add a step on the Filter node. In Name, enter
SCH_STATE
. In the expression field, enter the following:SUBSTRING(SCH_STATEZIP FROM 1 FOR 2)
- Click Validate, and then click Apply.
- In Add Columns, click Column . In Name, enter
SCH_ZIP
. In the expression field, enter the following:CAST(SUBSTRING(SCH_STATEZIP FROM 4) AS int)
- Click Validate, and then click Apply.
- In Add Columns, click Column . In Name, enter
YR_COMPLETED
. In the expression field, enter the following:YEAR(DATE_COMPLETED)
- Click Validate, and then click Apply.
- Click Toggle auto-refresh Data Preview , and then scroll to view the new columns.
- Select the SCH_ZIP column, click Options , and then select Convert to Text. Select the YR_COMPLETED column, click Options , and then select Convert to Text.
Add a Dataset to the Data Flow
In this section, you add the zip stats dataset to the data flow.
- In Data Flow Steps, double-click Add Data. In Add Dataset, click zip_stats, and then click Add.
- Click zip stats. In the Zip column, click Options, and then select Convert to Text.
- Click Validate. Click Apply.
- Click the Join node, from Input 1 list, select All rows. Under Match columns, click PROJECTID and select SCH_ZIP for the Input 1 value.
The join uses Zip column from the zip_stats dataset and the SCH_ZIP columns from the donations2024 dataset.
Select Columns
In this section, you select columns to use from the joined datasets.
- From Data Flow Steps, drag Select Columns to Add a step on the Join node.
- Under Select Columns, click Remove all. Hold down the Ctrl key, select the following columns, and then click Add selected:
- PROJECTID
- SCHOOL_ID
- PRIMARY_FOCUS_SUBJECT
- RESOURCE_TYPE
- POVERTY_LEVEL
- GRADE_LEVEL
- STUDENTS_REACHED
- TOTAL_DONATIONS
- NUM_DONORS
- SCH_STATE
- SCH_ZIP
- YR COMPLETED
- Median
- Population
- From Data Flow Steps, drag Aggregate to Add a step on the Select Columns node.
- Under Group by, click Remove next to the PROJECTID row.
- Under Aggregate, from the Function list, select Average in the following rows:
- STUDENTS_REACHED
- TOTAL_DONATIONS
- NUM_DONORS
- Median
- Population
- Under Aggregate, click Add Aggregate, select PROJECTID. From the Function list, select Count, and then enter
Number of Projects
in New column name. In the Median row, enterIncome
in New column name.
Define Aggregation
In this section, you use the Aggregate step to set the functions to use for measure columns in the dataset.
- From Data Flow Steps, drag Add Columns to Add a step on the Aggregate node.
- In Name, enter
Donation by Population
. - To represent the average donation amount by the average population in the zip code, in the expression field, enter the following:
TOTAL_DONATIONS Average/Population Average
- Click Validate. Click Apply.
- In Add Columns, click Column . In Name, enter
Average School Donation by Income
. - In the expression field, enter the following:
avg(TOTAL_DONATIONS Average by SCHOOL_ID)/Income
- Click Validate. Click Apply.
Save and Run the Data Flow
In this section, you name the dataset that is output as a result of running the data flow, and examine the columns in the new dataset. You use the Donations by School dataset in the next tutorial.
- From Data Flow Steps, double-click Save Dataset.
- In Save Dataset, enter
Donations by School
in Name. From Save data to, select Dataset Storage to save the data in Oracle Analytics. - Click Save.
- Click Run Data Flow .
- After the data flow run completes successfully, click Go back .
- On the Home page, click Data, and then select the Donations by School dataset. Click the Actions , and then select Inspect.
- In Donations by School dataset, click Data Elements to view the columns.
Next Step
Learn More
Explore Data Flows in Oracle Analytics
F22282-06
May 2024
Learn how to construct and run a data flow to create a dataset in Oracle Analytics.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms governing the U.S. Government's use of Oracle cloud services are defined by the applicable contract for such services. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.