About Data Flows
Data flows enable you to organize and integrate your data to produce curated datasets that your users can visualize.
Use data flows to manipulate your data visually without requiring manual coding skills.
For example, you might use a data flow to:
- Create a dataset.
- Combine data from different sources.
- Aggregate data.
- Train machine learning models or apply a predictive machine learning model to your data.
- Perform object detection, image classification, or text detection using artificial intelligence via the OCI Vision service.
You create data flows in the data flow design pane.
Description of the illustration data-flow-designer-new.png
To build a data flow, you add steps. Each step performs a specific function, for example, add data, join tables, merge columns, or transform data. Use the data flow editor to add and configure your steps. Each step is validated when you add or change it. When you've configured your data flow, you execute it to create or update a dataset.
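As a purely conceptual illustration of how steps chain together, the sketch below models a small flow in Python. Data flows are built visually in the product, so every function name here (`add_data`, `join`, `merge_columns`, `transform`) is hypothetical and only mirrors the step types named above; the sample rows are invented.

```python
# Conceptual model only: these helpers are NOT part of any Oracle API.
# Each function stands in for one visual step in a data flow.

def add_data(rows):
    """Add Data step: bring a source into the flow (copies the rows)."""
    return [dict(r) for r in rows]

def join(left, right, key):
    """Join step: inner join of two inputs on a shared key column."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def merge_columns(rows, new_name, cols, sep=" "):
    """Merge Columns step: concatenate several columns into a new one."""
    return [{**r, new_name: sep.join(str(r[c]) for c in cols)} for r in rows]

def transform(rows, col, fn):
    """Transform Column step: apply a function to one column."""
    return [{**r, col: fn(r[col])} for r in rows]

def run_flow():
    """Execute the steps in order and return the output dataset."""
    customers = add_data([
        {"id": 1, "first": "Ada", "last": "Lovelace"},
        {"id": 2, "first": "Alan", "last": "Turing"},
    ])
    orders = add_data([
        {"id": 1, "amount": "120.50"},
        {"id": 2, "amount": "80"},
    ])
    joined = join(customers, orders, "id")
    merged = merge_columns(joined, "name", ["first", "last"])
    return transform(merged, "amount", float)

dataset = run_flow()
```

Each step takes the output of the previous one, which is why changing or removing a step affects everything downstream of it.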
When you add your own columns or transform data, you can use a wide range of SQL operators (for example, BETWEEN, LIKE, IN), conditional expressions (for example, CASE), and functions (for example, Avg, Median, Percentile).
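To make the operators above concrete, here is a minimal sketch that runs comparable expressions against an in-memory SQLite database from Python. SQLite is only a stand-in: the exact expression syntax and function names in the data flow editor may differ, and the `sales` table and its values are invented for illustration. Median and Percentile are omitted because they are not assumed to be available in SQLite.

```python
import sqlite3

# Stand-in database; the table and rows are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget A", 50.0),
        ("East", "Widget B", 150.0),
        ("West", "Gadget", 300.0),
        ("North", "Widget C", 75.0),
    ],
)

# A derived column built with CASE and BETWEEN, filtered with LIKE and IN.
rows = conn.execute(
    """
    SELECT product,
           CASE WHEN amount BETWEEN 0 AND 100 THEN 'Low' ELSE 'High' END AS band
    FROM sales
    WHERE product LIKE 'Widget%' AND region IN ('East', 'West')
    ORDER BY product
    """
).fetchall()

# An aggregate function over the whole table.
avg_amount = conn.execute("SELECT AVG(amount) FROM sales").fetchone()[0]
```

Here `rows` holds the two East-region widgets with their `Low` or `High` band, and `avg_amount` is the mean of all four amounts.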
Updating Datasets Generated by Data Flows
You can rerun data flows to keep your datasets up-to-date.
Note:
When you rerun a data flow, any transformations applied directly to the output dataset outside of the data flow are lost. The dataset is recreated from scratch.
Data Flow Limits
If you're processing large amounts of data, be aware that data flow limits apply. See Data Flow Limits.
Database Support for Data Flows
With data flows you can curate data from datasets and subject areas.
You can execute data flows individually or in a sequence. You can include multiple data sources in a data flow and specify how to join them.
Use the Add Data step to add data to a data flow, and use the Save Data step to save output data from a data flow.
You can save the output data from a data flow either in a dataset or in one of the supported database types. If you save data to a database, you can transform the data source by overwriting it with data from the data flow. To do this, the data source and data flow tables must be in the same database and have the same name. Before you start, create a connection to one of the supported database types.
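The overwrite behavior can be pictured with a short sketch. This is not how the Save Data step is implemented; it only illustrates "replace the table with the flow output" semantics, using SQLite as a stand-in database. The helper `save_output`, the table name `SALES_SUMMARY`, and its columns are all hypothetical.

```python
import sqlite3

def save_output(conn, table, rows):
    """Hypothetical helper: replace the contents of `table` with flow output."""
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" (region TEXT, total REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()

conn = sqlite3.connect(":memory:")

# First run writes two rows.
save_output(conn, "SALES_SUMMARY", [("East", 200.0), ("West", 300.0)])

# A second run overwrites the table rather than appending to it.
save_output(conn, "SALES_SUMMARY", [("East", 250.0)])

result = conn.execute('SELECT region, total FROM "SALES_SUMMARY"').fetchall()
```

After the second run only the new row remains, which is why changes made to the target outside the flow do not survive a rerun.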
Data Output
- Oracle Autonomous AI Lakehouse
- Oracle Autonomous AI Transaction Processing
- Oracle Database/Oracle AI Database
- Apache Hive
- Hortonworks Hive
- MapR Hive
- Spark
Data Input
In data flows, you can process data from datasets and subject areas. You can't pull data directly from databases; you must first create a dataset from the database table or tables.
Working in the Data Flow Designer
You can prepare your data for analysis by building data flows in the data flow designer. For example, you might transform columns, merge columns, or categorize data into bins. You can also train machine learning models and analyze data by applying trained models to your data. Find out how to use the data flow designer to quickly get started.
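For readers unfamiliar with binning, the following minimal sketch shows the idea in plain Python. It is not the designer's implementation; the helper `bin_value`, the bin edges, and the labels are assumptions chosen for illustration.

```python
from bisect import bisect_right

def bin_value(value, edges, labels):
    """Return the label of the bin that `value` falls into.

    `edges` are ascending lower bounds of every bin after the first,
    so len(labels) must equal len(edges) + 1.
    """
    return labels[bisect_right(edges, value)]

# Hypothetical age bands.
edges = [18, 40, 65]
labels = ["Under 18", "18-39", "40-64", "65+"]

ages = [12, 18, 39, 40, 70]
binned = [bin_value(age, edges, labels) for age in ages]
```

Each numeric value is replaced by a category label, which makes the column easier to group and visualize.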
The data flow design pane
Areas in the data flow designer
Description of the illustration data-flow-designer-features.png
| Area | How to use |
|---|---|
| 1 | Steps selector. Drag and drop steps from here onto the step editor pane to build your data flow. |
| 2 | Data flow design pane. View and edit your data flow design and rearrange steps. |
| 3 | Step configure pane. Configure the settings for the step you have selected in the data flow design pane. |
| 4 | Data preview pane. View a sample of your data, and review the effect of your steps on your data. |
Working in the data flow designer:
| Feature | What it does |
|---|---|
| Compact layout | Group steps into a smaller view area to reduce scrolling. |
| Data | Display the data pane, where you can drag and drop data columns onto the designer pane. |
| Data Flow Steps | Display the steps pane, from where you can drag and drop steps onto the designer pane. |
| Data Preview | Hide or display the Preview data columns pane. The preview updates automatically when you make changes to the data flow (configure this using Auto apply). |
| Expanded layout | Align input data source steps at the left to improve readability. |
| Incomplete join or union | Indicates a data source that isn't joined or unioned, and displays a suggested join target when you hover over it. |
| Run data flow | Execute (known as 'run') the data flow. |
| Show available functions/hide functions | Display or hide the expression pick list in the step configure pane for applicable steps (for example, "Add Columns" or "Transform Column"). |
| Step Editor | Hide or display the Step editor pane. |
| Toggle auto-refresh | Turn automatic refresh of preview data on or off in the data preview pane. |
| Zoom enhancements | Zoom in and out in the data flow design pane. |