You can use data flows to produce curated (combined, organized, and integrated) data sets.
Here are the common tasks for creating curated data sets with data flows.
Task | Description | More Information |
---|---|---|
Create a data flow | Create data flows from one or more data sets. | Creating a Data Flow |
Add filters | Use filters to limit the data in a data flow output. | Adding Filters to a Data Flow |
Add aggregates | Apply aggregate functions to group data in a data flow. | Adding Aggregates to a Data Flow |
Merge columns and rows of data sets | Combine two or more columns, or the rows of two data sets, in a data flow. | |
Create a binning column | Assign a value to add a binning column to the data set. | Creating a Binning Column in a Data Flow |
Create a sequence of data flows | Create and save a sequential list of data flows. | Creating a Sequence of Data Flows |
Create a group | Create a group column of attribute values in a data set. | Creating a Group in a Data Flow |
Add cumulative values | Group data by applying cumulative aggregate functions in a data flow. | Adding Cumulative Values to a Data Flow |
Save output data to a database | Connect to a database and save the output data from a data flow to a table in a database. | Saving Output Data from a Data Flow to a Database |
Execute a data flow | Execute data flows to create data sets. | Executing a Data Flow |
Run a data flow | Run a saved data flow to create data sets or to refresh the data in a data set. | Running a Data Flow |
Data flows let you take one or more data sets and organize and integrate them to produce a curated set of data that you can use to easily create effective visualizations.
You use the Data Visualization data flow editor to select specific data from existing data sets, apply transformations, add joins and filters, remove unwanted columns, add derived measures and columns, and perform other operations. You then run the data flow to produce a data set that you can use to create complex visualizations.
See Creating a Data Flow and Running a Data Flow.
Build your data flow by adding steps to select, limit, and customize your data.
The following image shows the Data Flow editor.
What Can You Do in the Data Flow Editor?
select, add, and rename columns
add or adjust aggregates
add filters
create a merge column
merge rows
create a binning column
add a sequence
create a group
add cumulative values
customize step names
schedule a data flow
add another data set
These tips help you use the Step editor pane more effectively:
You can hide or display the Step editor pane by clicking Step editor at the bottom of the Data Flow editor.
You can hide or display the Preview data columns pane by clicking Preview data at the bottom of the Data Flow editor.
The Preview data columns pane updates automatically as you make changes to the data flow.
For example, you could add a Select Columns step, remove some columns, and then add an Aggregate step. While working on the Aggregate step, the Preview data columns pane already shows the columns and data that you just specified in the Select Columns step.
You can specify whether to automatically refresh step changes in the Preview data columns pane by clicking Auto apply.
You can add another data set and join it to the existing data sets in your data flow by selecting Add Data in the Data Flow Steps panel.
Joins are created automatically when you add a data set; however, you can edit the join details in the Join dialog.
Oracle Data Visualization validates data flow steps as you add them to or delete them from the data flow.
If you’re adding an expression (in an Add Column step or a Filter step), then you must click Apply to finalize the step.
If you add a new step to the workflow diagram without clicking Apply, then your expression won’t be applied, and the next step that you add won’t use the correct data.
You can create a data flow from one or more data sets. With a data flow, you produce a curated data set that you can use to easily and efficiently create meaningful visualizations.
You can use filters to limit the amount of data included in the data flow output. For example, you can limit a column of sales revenue data to the years 2010 through 2017.
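The following is a minimal sketch, in plain Python, of what a year-range filter does to the rows of a data set. It is not the product's API. The `rows` list, the `year` and `revenue` column names, and the `filter_by_year` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"year": 2008, "revenue": 120.0},
    {"year": 2010, "revenue": 150.0},
    {"year": 2014, "revenue": 175.5},
    {"year": 2017, "revenue": 210.0},
    {"year": 2019, "revenue": 240.0},
]


def filter_by_year(rows, first_year, last_year):
    """Keep only the rows whose year falls within the inclusive range."""
    return [row for row in rows if first_year <= row["year"] <= last_year]


filtered = filter_by_year(rows, 2010, 2017)
print([row["year"] for row in filtered])
```

Rows outside the range are dropped from the output only; the input rows are left unchanged.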
You can group data by applying aggregate functions such as count, sum, and average.
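As a rough illustration of grouping with aggregate functions, the sketch below computes count, sum, and average per group in plain Python. The `sales` rows, the `region` and `amount` column names, and the `aggregate_by` helper are hypothetical; the data flow editor performs the equivalent work through its Aggregate step rather than through code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the column names are assumptions for illustration.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 300.0},
]


def aggregate_by(rows, group_column, value_column):
    """Group rows by one column and compute count, sum, and average of another."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row[value_column])
    return {
        key: {"count": len(values), "sum": sum(values), "average": mean(values)}
        for key, values in groups.items()
    }


summary = aggregate_by(sales, "region", "amount")
print(summary["East"])
```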
You can combine two or more columns to display as one. For example, you can merge the street address, street name, state, and ZIP code columns so that they display as one item in the visualizations using the data flow’s output.
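Conceptually, merging columns concatenates their values into one new column. The sketch below shows this in plain Python; the `merge_columns` helper, the delimiter, and the sample address values are assumptions made for illustration and do not represent the product's implementation.

```python
def merge_columns(row, columns, new_name, delimiter=", "):
    """Return a copy of the row with the given columns joined into one new column."""
    merged = dict(row)
    merged[new_name] = delimiter.join(str(row[column]) for column in columns)
    return merged


# Hypothetical sample row; the column names are assumptions for illustration.
address = {
    "street_address": "12",
    "street_name": "Main St",
    "state": "CA",
    "zip_code": "90001",
}

merged = merge_columns(
    address,
    ["street_address", "street_name", "state", "zip_code"],
    "full_address",
)
print(merged["full_address"])
```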
You can merge the rows of two data sets. The result can include all the rows from both data sets, the unique rows from each data set, the overlapping rows from both data sets, or the rows unique to one data set.
Before you merge the rows, do the following:
Confirm that each data set has the same number of columns.
Check that the data types of the corresponding columns of the data sets match. For example, column 1 of data set 1 must have the same data type as column 1 of data set 2.
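The row-merge options and the two checks above can be sketched in plain Python as follows. This sketch assumes that the four results correspond to the familiar set operations (union all, union, intersect, and except); that mapping, the helper names, and the sample rows are assumptions for illustration only.

```python
def columns_match(rows_1, rows_2):
    """Check the column count and the data type of each corresponding column."""
    if not rows_1 or not rows_2:
        return True
    first_1, first_2 = rows_1[0], rows_2[0]
    return len(first_1) == len(first_2) and all(
        type(a) is type(b) for a, b in zip(first_1, first_2)
    )


def union_all(rows_1, rows_2):
    """All the rows from both data sets, duplicates included."""
    return rows_1 + rows_2


def union(rows_1, rows_2):
    """Each distinct row from either data set, in first-seen order."""
    return list(dict.fromkeys(rows_1 + rows_2))


def intersect(rows_1, rows_2):
    """The distinct rows that appear in both data sets."""
    in_2 = set(rows_2)
    return [row for row in dict.fromkeys(rows_1) if row in in_2]


def except_rows(rows_1, rows_2):
    """The distinct rows that appear only in the first data set."""
    in_2 = set(rows_2)
    return [row for row in dict.fromkeys(rows_1) if row not in in_2]


# Hypothetical data sets; each row is a tuple of (id, code).
set_1 = [(1, "a"), (2, "b"), (2, "b")]
set_2 = [(2, "b"), (3, "c")]

if columns_match(set_1, set_2):
    print(union_all(set_1, set_2))
    print(union(set_1, set_2))
    print(intersect(set_1, set_2))
    print(except_rows(set_1, set_2))
```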
Binning a measure creates a new column based on the value of the measure. You can assign values to bins dynamically by specifying a number of equal-size bins (for example, bins that each contain the same number of values), or by explicitly specifying the range of values for each bin.
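The two binning approaches can be sketched in plain Python as shown below. The helper names, the bin labels, and the choice to treat each upper bound as exclusive are assumptions for illustration; the product may resolve boundary values and ties differently.

```python
from bisect import bisect_right


def equal_count_bins(values, bin_count):
    """Assign bin numbers 1..bin_count so that each bin holds about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * bin_count // len(values) + 1
    return bins


def range_bins(values, upper_bounds, labels):
    """Assign a label by explicit ranges.

    `upper_bounds` must be sorted, and `labels` needs one more entry than
    `upper_bounds`. Each bound is exclusive, so a value equal to a bound
    falls into the next bin.
    """
    return [labels[bisect_right(upper_bounds, value)] for value in values]


# Hypothetical measure values.
measure = [5, 40, 15, 90, 60, 25]

by_count = equal_count_bins(measure, 3)
by_range = range_bins(measure, [20, 50], ["low", "mid", "high"])
print(by_count)
print(by_range)
```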
A sequence is a saved sequential list of specified data flows and is useful when you want to run multiple data flows as a single transaction. If any flow within a sequence fails, then all the changes done in the sequence are rolled back.
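The all-or-nothing behavior of a sequence can be illustrated with a small sketch. Here each data flow is modeled as a hypothetical Python function that modifies a dictionary of data sets, and the rollback is simulated by working on a copy. This shows the idea only and says nothing about how the product implements its transactions.

```python
import copy


def run_sequence(flows, data_sets):
    """Run each flow in order on a working copy; keep the changes only if all succeed."""
    working = copy.deepcopy(data_sets)
    try:
        for flow in flows:
            flow(working)
    except Exception:
        # Rolled back: the original data sets are returned untouched.
        return data_sets, False
    return working, True


# Hypothetical data flows for illustration.
def load_orders(data_sets):
    data_sets["orders"] = [1, 2, 3]


def failing_flow(data_sets):
    raise RuntimeError("flow failed")


committed, committed_ok = run_sequence([load_orders], {})
rolled_back, rolled_back_ok = run_sequence([load_orders, failing_flow], {})
print(committed, committed_ok)
print(rolled_back, rolled_back_ok)
```

In the second call, the changes made by `load_orders` are discarded because a later flow in the same sequence fails.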
You can use binning attributes to define groups of attribute values in a data set.
You can group data by applying cumulative aggregate functions such as moving and running aggregates. A moving aggregate aggregates values over a row and a specific number of preceding rows. A running aggregate aggregates values over all the preceding rows. Because both the moving and running aggregates are based on the preceding rows, the sort order of rows is important. You can specify the order as part of the aggregate.
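The sketch below illustrates a running sum and a moving sum in plain Python, sorting the rows first because the result depends on row order. It assumes that both aggregates include the current row; the helper names and sample rows are hypothetical.

```python
def running_sum(values):
    """Sum of the current row and all the preceding rows."""
    total = 0
    result = []
    for value in values:
        total += value
        result.append(total)
    return result


def moving_sum(values, preceding):
    """Sum of the current row and up to `preceding` rows before it."""
    return [
        sum(values[max(0, i - preceding): i + 1]) for i in range(len(values))
    ]


# Hypothetical (month, amount) rows, deliberately out of order.
rows = [("2017-03", 30), ("2017-01", 10), ("2017-02", 20), ("2017-04", 40)]

# Sort by month first; a different order gives different cumulative values.
amounts = [amount for _, amount in sorted(rows)]

print(running_sum(amounts))
print(moving_sum(amounts, 1))
```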
You can rename a data flow step and add or edit the description.
Executing a data flow produces a data set that you can use to create visualizations.
You can connect to a database and save output data from a data flow to a table in that database.
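To show the general idea of writing output rows to a database table, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for your database connection. The `save_output` helper, the table name, and the column names are hypothetical, and the sketch assumes that an existing table is replaced on each save.

```python
import sqlite3


def save_output(connection, table, columns, rows):
    """Replace the table's contents with the output rows.

    The table and column names are interpolated directly, so this sketch
    assumes they come from a trusted source.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    with connection:  # commits the transaction on success
        connection.execute(f"DROP TABLE IF EXISTS {table}")
        connection.execute(f"CREATE TABLE {table} ({column_list})")
        connection.executemany(
            f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
            rows,
        )


connection = sqlite3.connect(":memory:")
save_output(
    connection,
    "curated_sales",
    ["region", "total"],
    [("East", 400.0), ("West", 250.0)],
)
saved = connection.execute(
    "SELECT region, total FROM curated_sales ORDER BY region"
).fetchall()
print(saved)
```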
You can run a saved data flow to create a corresponding data set or to refresh the data in the data set created from the data flow.
Notes about running data flows:
To run a saved data flow, you must specify a Save Data step as its final step. To add this step to the data flow, click the data flow’s Actions menu and select Open. After you’ve added the step, save the data flow and try to run it again.
When you create a new database data source, set the database’s query mode to Live. In Live mode, the data flow accesses data from the database (rather than from the data cache) and pushes expensive operations such as joins to the database. See Managing Data Sets.
When you update a data flow that uses data from a database source, the data is either cached or live depending on the query mode of the source database.
Complex data flows take longer to run. While the data flow is running, you can use other parts of the application and then return to the Data Flows pane to check the status of the data flow.
You can cancel a long-running data flow. To do so, go to the Data Flows section, click the data flow’s Actions menu, and select Cancel.
If it’s the first time you’ve run the data flow, then a new data set is created, and you can find it in the Data Sets section of the Data page. The data set has the name that you specify in the data flow’s Save Data step. If you’ve run the data flow before, then the resulting data set already exists and its data is refreshed.
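The create-or-refresh behavior described in the last note can be summarized with a small sketch. The `catalog` dictionary stands in for the Data Sets section, and the `run_data_flow` helper and data set name are hypothetical.

```python
def run_data_flow(catalog, data_set_name, output_rows):
    """Create the data set on the first run; refresh its data on later runs."""
    status = "refreshed" if data_set_name in catalog else "created"
    catalog[data_set_name] = list(output_rows)
    return status


catalog = {}
first_run = run_data_flow(catalog, "Curated Sales", [1, 2])
second_run = run_data_flow(catalog, "Curated Sales", [1, 2, 3])
print(first_run, second_run)
```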