About Data Flows

Data flows enable you to organize and integrate your data to produce curated datasets that your users can visualize.

For example, you might use a data flow to:

  • Create a dataset.
  • Combine data from different sources.
  • Aggregate data.
  • Train machine learning models or apply a predictive machine learning model to your data.
  • Perform object detection, image classification, or text detection using artificial intelligence via the OCI Vision service.

You create data flows in the data flow editor.

To build a data flow, you add steps. Each step performs a specific function, such as adding data, joining tables, merging columns, transforming data, or saving your data. Use the data flow editor to add and configure your steps. Each step is validated when you add or change it. After you've configured your data flow, you execute it to produce or update a dataset.
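As a rough mental model of how steps chain together, the following Python sketch treats each step as a small function whose output feeds the next one. It is only an illustration: the function names (add_data, join, merge_columns, aggregate_sum, save_data), the sample rows, and the in-memory dictionary used as a dataset store are all hypothetical and are not part of the data flow editor or any product API.

```python
def add_data(rows):
    """'Add data' step: bring a source into the flow as a fresh copy."""
    return [dict(row) for row in rows]


def join(left, right, key):
    """'Join' step: inner join of two row lists on a shared key column."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


def merge_columns(rows, new_name, first, second, sep=" "):
    """'Merge columns' step: concatenate two columns into a new one."""
    return [{**row, new_name: f"{row[first]}{sep}{row[second]}"} for row in rows]


def aggregate_sum(rows, group_by, measure):
    """'Aggregate' step: sum a measure for each distinct group value."""
    totals = {}
    for row in rows:
        totals[row[group_by]] = totals.get(row[group_by], 0) + row[measure]
    return [{group_by: k, measure: v} for k, v in sorted(totals.items())]


def save_data(store, name, rows):
    """'Save data' step: write the result under a dataset name."""
    store[name] = rows
    return store


def run_flow():
    """Execute the steps in order, as a data flow would when you run it."""
    # Hypothetical sample sources.
    customers = [
        {"cust_id": 1, "first": "Ada", "last": "Lovelace"},
        {"cust_id": 2, "first": "Alan", "last": "Turing"},
    ]
    orders = [
        {"cust_id": 1, "amount": 120.0},
        {"cust_id": 1, "amount": 80.0},
        {"cust_id": 2, "amount": 50.0},
    ]
    joined = join(add_data(customers), add_data(orders), "cust_id")
    named = merge_columns(joined, "full_name", "first", "last")
    totals = aggregate_sum(named, "full_name", "amount")
    return save_data({}, "customer_totals", totals)
```

Calling run_flow() returns a dictionary with one entry, customer_totals, that holds one aggregated row per customer. In the data flow editor you configure equivalent steps visually instead of writing code.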

When you add your own columns or transform data, you can use a wide range of SQL operators (for example, BETWEEN, LIKE, IN), conditional expressions (for example, CASE), and functions (for example, Avg, Median, Percentile).
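To give a feel for what these operators and functions do, here is a minimal sketch that runs generic SQL against an in-memory SQLite database from Python. It assumes a hypothetical orders table and uses SQLite's dialect, so the exact syntax you enter in a data flow step may differ; treat it as an illustration of BETWEEN, LIKE, IN, CASE, and an average, not as the product's expression syntax.

```python
import sqlite3


def run_expression_demo():
    """Show BETWEEN, LIKE, IN, CASE, and AVG on a tiny hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("East", "Laptop Pro", 1200.0),
            ("East", "Mouse", 25.0),
            ("West", "Laptop Air", 900.0),
            ("North", "Monitor", 300.0),
        ],
    )
    # CASE derives a new column; BETWEEN, LIKE, and IN filter the rows.
    rows = conn.execute(
        """
        SELECT product,
               CASE WHEN amount >= 1000 THEN 'High' ELSE 'Standard' END AS tier
        FROM orders
        WHERE amount BETWEEN 100 AND 2000
          AND product LIKE 'Laptop%'
          AND region IN ('East', 'West')
        ORDER BY product
        """
    ).fetchall()
    # AVG aggregates over every row in the table.
    avg_amount = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]
    conn.close()
    return rows, avg_amount
```

Here the CASE expression plays the role of a new calculated column (tier), while the WHERE clause combines the three filter operators.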

Database Support for Data Flows

With data flows you can curate data from datasets, subject areas, or database connections.

You can execute data flows individually or in a sequence. You can include multiple data sources in a data flow and specify how to join them.

You can save the output data from a data flow either in a dataset or in one of the supported database types. If you save data to a database, you can transform the data source by overwriting it with data from the data flow. To do this, the data source table and the data flow's output table must be in the same database and have the same name. Before you start, create a connection to one of the supported database types.
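Conceptually, overwriting a data source means reading a table, transforming its rows, and writing the result back to a table of the same name in the same database. The sketch below imitates that idea with SQLite from Python. The sales table, its columns, and the overwrite_source_table helper are hypothetical, and the sketch makes no claim about how the product performs the overwrite internally.

```python
import sqlite3


def overwrite_source_table(conn, table, transform):
    """Read every row from `table`, transform it, and replace the table contents."""
    cur = conn.execute(f'SELECT name, amount FROM "{table}"')
    transformed = [transform(row) for row in cur.fetchall()]
    with conn:  # commit both statements together, or roll back on error
        conn.execute(f'DELETE FROM "{table}"')
        conn.executemany(
            f'INSERT INTO "{table}" (name, amount) VALUES (?, ?)', transformed
        )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(" widget ", 10.0), ("gadget", 20.5)],
    )
    conn.commit()
    # Hypothetical transformation: trim and uppercase the name column.
    overwrite_source_table(conn, "sales", lambda r: (r[0].strip().upper(), r[1]))
    rows = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
    conn.close()
    return rows
```

After demo() runs, the sales table holds only the transformed rows, which is the effect the same-database, same-name requirement makes possible.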

Note:

With data flows you can source data from remote databases (using a remote connection with Data Gateway). However, you can't save data to datasets that use remote connections.

To find out which databases you can write to from a data flow, refer to the More Information column in Supported Data Sources.

Note:

Oracle Essbase and EPM Cloud datasets can't be used in the Add Data step as inputs to data flows.