About Data Flows

Data flows enable you to organize and integrate your data to produce curated datasets that your users can visualize.

Use data flows to manipulate your data visually, without writing code.

For example, you might use a data flow to:

  • Create a dataset.
  • Combine data from different sources.
  • Aggregate data.
  • Train machine learning models or apply a predictive machine learning model to your data.
  • Perform object detection, image classification, or text detection using artificial intelligence via the OCI Vision service.

You create data flows in the data flow editor.

To build a data flow, you add steps. Each step performs a specific function, such as adding data, joining tables, merging columns, transforming data, or saving your data. Use the data flow editor to add and configure your steps. Each step is validated when you add or change it. When you've configured your data flow, you execute it to produce or update a dataset.
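One way to picture the step model is as an ordered pipeline. The following Python sketch is a toy model, not the Oracle Analytics implementation: the `Step` and `DataFlow` classes, the `requires`/`adds` column sets, and the sample sales rows are all hypothetical. It illustrates two ideas from this section: each step is validated when you add it (here, by checking that every column it reads exists upstream), and running the flow applies the steps in order to produce an output dataset.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """Hypothetical step: columns it reads, columns it adds, and its row transform."""
    name: str
    requires: frozenset
    adds: frozenset
    transform: Callable[[list], list]


class DataFlow:
    """Toy model of a data flow: ordered steps, each validated when it is added."""

    def __init__(self, source_columns, source_rows):
        self.columns = set(source_columns)  # columns available to the next step
        self.source_rows = source_rows
        self.steps = []

    def add_step(self, step):
        # Validate on add: every column the step reads must already exist upstream.
        missing = step.requires - self.columns
        if missing:
            raise ValueError(f"step {step.name!r} references unknown columns {sorted(missing)}")
        self.steps.append(step)
        self.columns |= step.adds

    def run(self):
        # Execute the steps in order to produce the output dataset.
        rows = [dict(r) for r in self.source_rows]
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Hypothetical source data and steps, named after the Add Columns and Transform Column steps.
sales = [{"region": "east", "amount": 120}, {"region": "west", "amount": 80}]
flow = DataFlow({"region", "amount"}, sales)
flow.add_step(Step("Add Columns", frozenset({"amount"}), frozenset({"amount_k"}),
                   lambda rows: [{**r, "amount_k": r["amount"] / 1000} for r in rows]))
flow.add_step(Step("Transform Column", frozenset({"region"}), frozenset(),
                   lambda rows: [{**r, "region": r["region"].upper()} for r in rows]))
output = flow.run()
print(output)
```

Adding a step that reads a column no earlier step provides (for example, a hypothetical `customer_id`) raises `ValueError` at add time, before the flow ever runs, which mirrors validating each step as you configure it.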

When you add your own columns or transform data, you can use a wide range of SQL operators (for example, BETWEEN, LIKE, IN), conditional expressions (for example, CASE), and functions (for example, Avg, Median, Percentile).
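To show what these operators do, here is a minimal sketch that evaluates them in SQLite through Python's standard `sqlite3` module. The `orders` table and its values are made up, and expressions you build in the data flow editor may use different syntax, so treat this as a reference for the operator semantics only. Median and Percentile are typically not available in SQLite, so the sketch uses Avg as its aggregate example.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite database stands in for the data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, product TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Laptop Pro", "east", 1200),
        (2, "Laptop Air", "west", 900),
        (3, "Mouse", "east", 25),
        (4, "Monitor", "north", 300),
    ],
)

# LIKE matches a pattern, IN tests set membership,
# BETWEEN is an inclusive range test, and CASE derives a new column.
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN amount >= 1000 THEN 'high'
                WHEN amount BETWEEN 100 AND 999 THEN 'medium'
                ELSE 'low'
           END AS amount_band
    FROM orders
    WHERE product LIKE 'Laptop%' OR region IN ('north', 'south')
    ORDER BY id
    """
).fetchall()

# AVG aggregates over the rows that pass the WHERE filter.
avg_east = conn.execute("SELECT AVG(amount) FROM orders WHERE region = 'east'").fetchone()[0]

print(rows)      # [(1, 'high'), (2, 'medium'), (4, 'medium')]
print(avg_east)  # 612.5
```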

Database Support for Data Flows

With data flows you can curate data from datasets, subject areas, or database connections.

You can execute data flows individually or in a sequence. You can include multiple data sources in a data flow and specify how to join them.
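When a data flow combines two sources, the join type determines which rows survive. The following minimal sketch shows inner and left join semantics over in-memory rows. The `join_rows` function, the column names, and the sample data are hypothetical; they illustrate what a join does, not how data flows implement joins internally.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of row dicts on a shared key column.

    how="inner" keeps only left rows that have a match on the right;
    how="left" also keeps unmatched left rows, filling right-side columns with None.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how!r}")
    # Index the right-hand rows by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_columns = {col for row in right for col in row if col != key}
    joined = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        for rrow in matches:
            joined.append({**lrow, **rrow})  # on a column-name clash, the right value wins
        if not matches and how == "left":
            joined.append({**lrow, **{col: None for col in right_columns}})
    return joined


# Hypothetical sources: orders from a database connection, customers from a dataset.
orders = [
    {"customer_id": 1, "amount": 50},
    {"customer_id": 2, "amount": 75},
    {"customer_id": 3, "amount": 20},
]
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]

inner = join_rows(orders, customers, "customer_id")
left = join_rows(orders, customers, "customer_id", how="left")
print(inner)
print(left)
```

The inner join drops the order for customer 3 because no customer row matches it; the left join keeps that order and sets `name` to `None`.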

Use the Add Data step to add data to a data flow, and use the Save Data step to save output data from a data flow.

You can save the output data from a data flow either in a dataset or in one of the supported database types. If you save data to a database, you can transform the data source by overwriting it with data from the data flow. To do this, the data source table and the data flow output table must be in the same database and have the same name. Before you start, create a connection to one of the supported database types.
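Conceptually, overwriting a data source means reading a table, transforming its rows, and writing the result back under the same table name in the same database. The following sketch shows that read-transform-replace pattern with Python's standard `sqlite3` module. The `customers` table and the upper-casing transform are hypothetical, and the sketch does not reflect how Oracle Analytics performs the write.

```python
import sqlite3

# Hypothetical source table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ana"), (2, "ben")])

# Read the source table and transform it (here: upper-case the names).
source = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
transformed = [(cid, name.upper()) for cid, name in source]

# Write the output under the same table name in the same database.
# The connection context manager commits on success and rolls back on error,
# so readers never see the table half-replaced.
with conn:
    conn.execute("DELETE FROM customers")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", transformed)

result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(result)  # [(1, 'ANA'), (2, 'BEN')]
```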

Note:

With data flows you can source data from remote databases (using a remote connection with Data Gateway). However, you can't save data to datasets that use remote connections.

Data Output

You can save output data from data flows to these database types:
  • Oracle Autonomous Data Warehouse
  • Oracle Autonomous Transaction Processing
  • Oracle Database
  • Apache Hive
  • Hortonworks Hive
  • MapR Hive
  • Spark

For database version information, see Supported Data Sources.

Data Input

You can use data from most database types as input to data flows. The exceptions are Oracle Essbase and EPM Cloud.

Working in the Data Flow Editor

You prepare your data for analysis by building data flows in the data flow editor. For example, you might transform columns, merge columns, or categorize data into bins. Find out how to use the data flow editor to quickly get started preparing your data.

Working in the data flow editor:

  • Data: Display the Data pane, where you can drag and drop data columns onto the data flow editor.
  • Data Flow Steps: Display the Steps pane, where you can drag and drop steps onto the data flow editor.
  • Data Preview: Hide or display the Preview data columns pane by clicking Toggle Data Preview at the bottom right-hand corner of the data flow editor. This pane updates automatically when you make changes to the data flow. You can specify whether to automatically refresh step changes in the Preview data pane by clicking Auto apply.
  • Run data flow: Run (execute) the data flow.
  • Show available functions/Hide functions: Display or hide the expression pick list. This icon is displayed only for steps that let you build your own expressions, for example, the Add Columns step or the Transform Column step.
  • Step Editor: Hide or display the Step editor pane by clicking the Toggle Step Editor icon at the bottom right-hand corner of the data flow editor.
  • Toggle auto-refresh: Turn on to refresh the data preview as soon as you make changes in your data flow. For example, if you have a Transform Column step that changes text from lowercase to uppercase, you see the uppercase text in the data preview. If you turn it off, the data preview is refreshed only when you click Refresh Data Preview.