Create a Data Flow

A data flow defines how data is moved and transformed between different systems.

A data flow in Data Transforms connects sources to targets through a flow of components such as Join, Filter, Aggregate, Set, Split, and so on. See Supported Database Functions for more information.

When you run a data flow, Data Transforms uses the joins, filters, mappings, and constraints to transform source data and load it into the target tables. Note that a data flow can contain only one execution flow: you cannot place multiple independent flows on a single data flow, and a flow cannot diverge into multiple flows.
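
For example, a flow that joins two sources, filters rows, and aggregates the results before loading a target conceptually performs work similar to the following SQL sketch; all table and column names here are hypothetical, and the actual code that Data Transforms generates depends on the components you use:

    -- Hypothetical sketch of a Join + Filter + Aggregate flow loading a target.
    INSERT INTO sales_summary (region, total_amount)
    SELECT r.region_name,                               -- column mapping
           SUM(o.amount)                                -- Aggregate component
    FROM   orders o
    JOIN   regions r ON o.region_id = r.region_id       -- Join component
    WHERE  o.status = 'SHIPPED'                         -- Filter component
    GROUP BY r.region_name;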

You can create data flows in any of the following ways:

  • From the Projects page
  • From the Data Flows page within a project
  • From the Home page

The newly created data flow is displayed in the Data Flows page of the associated project. Click the Actions icon next to the data flow to edit, rename, copy, change its folder, start, export, or delete it.

From the Projects page

To create a data flow from the Projects page:

  1. On the Projects page, click Create Data Flow.

    The Create Data Flow page appears.

  2. In the Name field, enter a name for the new data flow.
  3. Select Create New Project if you want to create a new project folder for the data flow.
  4. Otherwise, click Add to Existing Projects to add the data flow to an existing project folder.
  5. If you selected Create New Project, enter the name of the new project in the Project Name field.
  6. If you selected Add to Existing Projects, select the required project from the Project Name drop-down list.
  7. In the Description field, enter a description for the data flow.
  8. Click Create.

From the Data Flows page within a project

To create a data flow from the Data Flows page within a project:

  1. On the Projects page, click the tile of the project in which you want to create the data flow. The Project Details page appears.
  2. In the Data Flows page, click Create Data Flow.
  3. Provide the Name and Description of the new data flow.
  4. Click Next.
  5. To define your source connection, from the Connection drop-down, select the connection from which you want to add the data entities.
  6. In the Schema drop-down, all schemas corresponding to the selected connection are listed in two groups:
    • Existing Schema (ones that you've imported into Oracle Data Transforms) and
    • New Database Schema (ones that you've not yet imported).

    Select the schema that you want to use from the drop-down. For Oracle Object Storage connections, the Schema drop-down lists the name of the bucket that you specified in the URL when you created the connection.

  7. Click Save.

    The Data Flow Editor appears, where you can create the data flow.

From the Home page

To create a data flow from the Home page:

  1. On the Home page, click Transform Data. The Create Data Flow page appears.
  2. Provide the Name and Description of the new data flow.
  3. Select a project name from the drop-down. Alternatively, click the + icon to create a project.
  4. Click Next.
  5. From the Connection drop-down, select the connection from which you want to add the data entities. Alternatively, click the + icon to create a new connection.
  6. In the Schema drop-down, all schemas corresponding to the selected connection are listed in two groups:
    • Existing Schema (ones that you've imported into Oracle Data Transforms) and
    • New Database Schema (ones that you've not yet imported).

    Select the schema that you want to use from the drop-down.

  7. Click Save.

About Data Flow Editor

The Data Flow editor is divided into five parts: the Data Entities Panel, the Database Functions Toolbar, the Design Canvas, the Properties Panel, and the Status Panel.



  • Data Entities Panel: The Data Entities panel displays the data entities that are available to use in your data flows. You can filter the displayed list using the Name and Tags fields. The panel includes options that let you add schemas, import data entities, remove schemas associated with the data flow, and refresh data entities. See Add Components for information about how to use these options.
  • Database Functions Toolbar: The Database Functions toolbar displays the database functions that can be used in your data flows. Just like data entities, you can drag and drop the database functions you want to use onto the design canvas. See Supported Database Functions for more information.
  • Design Canvas: The design canvas is where you build your transformation logic. After adding the Data Entities and Database Functions to the design canvas, you can connect them in a logical order to complete your data flows.
  • Properties Panel: The properties panel displays the properties of the object selected on the design canvas. The Properties Panel is grouped into five tabs: General, Attributes, Preview Data, Column Mapping, and Options. Not all tabs are available; they vary based on the selected object. See Component Properties to know more about these options.
  • Status Panel: When you run a data flow, the Status Panel shows the status of the job that is running in the background to complete the request. You can see the status of the job that is currently running or the status of the last job. For more information about the Status panel, see Monitor Status of Data Loads, Data Flows, and Workflows.

After designing the required data flow:

  • Click the Save icon to save the data flow.
  • Click the Auto Layout icon to align the nodes of the data flow.
  • Click the Execute icon to execute the data flow.
  • Click the Validate icon to validate the data flow.
  • Click the Zoom In and Zoom Out icons to maximize or minimize the data flow diagram in the design canvas.

Supported Database Functions

Oracle Data Transforms supports various database functions that you can drag and drop on the Design Canvas to connect components within a data flow.

The Database Functions toolbar in the Data Flow editor includes the following database functions that can be used in your data flows. A brief SQL sketch illustrating a few of these functions follows the list:

  1. Data Transformation

    It contains the following components:

    • Aggregate
    • Expression
    • Filter
    • Join
    • Distinct
    • Lookup
    • Set
    • Sort
    • Subquery Filter
    • Table Function
  2. Data Preparation

    It contains the following components:

    • Data Cleanse
    • Substitution
    • Equi-Width Binning
    • Quantile Binning
    • Lead
    • Lag
    • Replace
  3. Machine Learning

    It contains the following components:

    • Prediction
    • Outlier Detection
  4. Text

    It contains the following components:

    • REGEXP COUNT
    • REGEXP INSTR
    • REGEXP SUBSTR
    • REGEXP REPLACE
    • Edit Distance Similarity
    • Contains
  5. Oracle Spatial and Graph

    It contains the following components:

    • Buffer Dim
    • Buffer Tol
    • Distance Dim
    • Distance Tol
    • Nearest
    • Simplify
    • Point
    • Geocode Tools:
      Note

      The following Geocode Tools work only in a non-Autonomous Database environment.
      • Geocode As Geometry
      • Geocode
      • Geocode Addr
      • Geocode All
      • Geocode Addr All
      • Reverse Geocode
      Note

      The following Geocode Tool works only in an Autonomous Database environment.
      • Geocode Cloud
    • Spatial Join
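
Many of these components correspond directly to Oracle SQL functions. As a rough illustration (not the exact code a data flow generates), here is what a few of the Text components resemble in plain SQL; the customers table and its columns are hypothetical:

    -- Hypothetical example: the Text components correspond to Oracle
    -- regular expression functions.
    SELECT cust_name,
           REGEXP_COUNT(phone, '[0-9]')        AS digit_count,   -- count the digits
           REGEXP_REPLACE(phone, '[^0-9]', '') AS phone_digits   -- strip non-digits
    FROM   customers
    WHERE  REGEXP_INSTR(email, '@example\.com$') > 0;            -- match an email suffix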

Add Components

Add the data entities and database functions to the Design Canvas, and connect them in a logical order to complete your data flows.

To add components to your data flow:
  1. In the Data Entities panel, click Add a Schema to add schemas that contain the data entities that you want to use in the data flow.
  2. In the Add a Schema page, select the connection and schema name.
  3. Click Import.
  4. In the Import Data Entities page, select the Type of Objects you want to import. Optionally, specify a Mask/filter if you don't want to import every object in the schema, and then click Start.
  5. The Data Entities panel lists the imported data entities. The panel includes various options that let you do the following:
    • Refresh Data Entities - Click the Refresh icon to refresh the displayed list.
    • Name - Search for data entities by name.
    • Tags - Filter the data entities by the name of the tag used.
    • Import Data Entities - Right-click the schema to see this option. Use this option to import the data entities.
    • Remove Schema - Right-click the schema to see this option. Use this option to remove the schema from the list. Note that this option does not delete the schema; it only removes the association of the schema with this data flow.
  6. Similarly, add more schemas to the data flow, if required.
  7. Drag the required Data Entities that you want to use in the data flow and drop them on the design canvas.
  8. From the Database Functions toolbar, drag the transformation components that you want to use in the data flow and drop them on the design canvas. You can use variables in the data flow. See Use Variables in a Data Flow for more information.
  9. Select an object on the design canvas, and drag the Connector icon (Connector icon) next to it to connect the components.
  10. After saving the data flow, a Transfer icon may be overlaid on one or more of the component connections. This indicates that ODI has detected an additional step that is required to move the data between data servers. You can click this icon to view the properties associated with this step.
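
Conceptually, such a transfer step stages the remote data near the target before the transformation runs. The following is a minimal sketch under assumed names; the work table c$_orders_stage and the database link remote_source are hypothetical, and the actual work tables are created and dropped for you automatically:

    -- Hypothetical sketch of a transfer (staging) step: rows from the
    -- remote source are first copied into a work table on the target side.
    CREATE TABLE c$_orders_stage AS
      SELECT order_id, region_id, amount
      FROM   orders@remote_source;   -- e.g., read over a database link
    -- The transformation then reads from c$_orders_stage instead of the
    -- remote table, and the work table is dropped after the load.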

Component Properties

The Properties Panel displays various settings for components selected in the Design Canvas.

Depending on the component selected, you may see any of the following icons:

  • General - Displays the name of the component along with its connection and schema details. You can edit some of these properties.
  • Attributes - Displays the details of all the attributes associated with the component.
  • Column Mapping - Allows you to map all the columns automatically. See Map Data Columns for more information.
  • Preview - Displays a preview of the component. For Oracle tables, you can also view the statistics of the selected data entity. See View Statistics of Data Entities for details about the statistical information available.
  • Options - Displays options such as the following (a rough SQL illustration of the load options follows this list):
    • Truncate Table - Replaces any existing target table content with new data.
    • Append - Inserts records from the flow into the target. Existing records are not updated.
    • Incremental - Integrates data in the target table by comparing the records of the flow with existing records, updating records whose associated data differs, and inserting records that don't yet exist in the target.
      This option includes an Auto compression feature that is set to True by default. For data flow jobs that use the Incremental Update mode to load data onto a compressed Oracle target partition, the Auto compression feature recompresses the modified target partitions after the load completes successfully. For table partitions that were not originally compressed, compression is skipped regardless of the Auto compression setting.
      Note

      The Auto compression option is available to the ADMIN user or to a user with the DWROLE role. For data flows that have schema users other than ADMIN, you need to either grant DWROLE to the user or disable Auto compression to avoid execution errors.
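
As a rough illustration of these load options (not the exact code that Data Transforms generates; table and column names are hypothetical), Truncate Table corresponds to truncating and reloading the target, Append to a plain insert, and Incremental to a MERGE:

    -- Truncate Table: TRUNCATE TABLE sales_summary; then insert the new rows.
    -- Append:         insert the flow's rows without touching existing ones.
    -- Incremental:    compare and update/insert, similar to this MERGE.
    MERGE INTO sales_summary t
    USING flow_result s
      ON (t.region = s.region)
    WHEN MATCHED THEN
      UPDATE SET t.total_amount = s.total_amount
      WHERE  t.total_amount <> s.total_amount        -- update only changed rows
    WHEN NOT MATCHED THEN
      INSERT (region, total_amount)
      VALUES (s.region, s.total_amount);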

Map Data Columns

When you connect the source data entity with the target data entity, the columns are mapped automatically by column name. You can instead map the columns by Position or by Name, or map the columns manually using the Expression Editor.

To map columns by Position or by Name:

  1. Select the target Data Entity.
  2. Click the arrow icon in the top right corner to expand the Properties Panel. This gives you more space to work with.
  3. In the Properties Panel, click the Column Mapping icon (Column Mapping icon).
  4. To map the columns by Position or by Name, from the Auto Map drop-down menu, select By Position or By Name.

To map the columns manually:

  1. From the Auto Map drop-down menu, select Clear to clear the existing mappings.
  2. Drag and drop the attributes from the tree on the left to map with the Expression column.
  3. To edit an expression, click the Edit icon of the respective column. The Expression Editor appears, allowing you to make the required changes (for example, you can add a simple function such as UPPER, or use the editor for more complex expressions). A few sample expressions follow these steps.
    Note

    Use the expression editor only if you have complex expressions for a particular column.
  4. Click OK.
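
For instance, a mapping expression is simply a SQL expression over the source attributes. The following samples are hypothetical; the CUSTOMERS source entity and its columns are invented for illustration:

    -- Hypothetical column-mapping expressions:
    UPPER(CUSTOMERS.LAST_NAME)                                       -- normalize case
    TRIM(CUSTOMERS.FIRST_NAME) || ' ' || TRIM(CUSTOMERS.LAST_NAME)   -- concatenate names
    NVL(CUSTOMERS.COUNTRY, 'UNKNOWN')                                -- default a null value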

Validate and Execute a Data Flow

After your mappings are ready, you can proceed to validate and execute the data flow.

Do the following:
  1. Click Save.

    After saving, if data needs to be staged before transforming, a Transfer button is added to one or more links. You can click these buttons to set more options, if available.

  2. Click the Simulate Code icon (Simulate Code icon) if you want to check the code that will run to complete the tasks that are performed when you execute the data flow job. The source and target details are displayed in different colors for ease of reference. This is handy if you want to check whether the mapping is correct before you run the job, or when the job fails. Note that the code cannot be used for debugging. For detailed information about the job, see the Job Details page.
  3. Click the Validate icon (Validate icon) in the toolbar above the design canvas to validate the data flow.
  4. After a successful validation, click the Execute icon (Execute icon) next to the Validate icon to execute the data flow.

    A message appears that displays the execution Job ID and name. To check the status of the data flow, see the Status panel on the right below the Properties Panel. For details about the Status panel, see Monitor Status of Data Loads, Data Flows, and Workflows. This panel also shows the link to the Job ID that you can click to monitor the progress on the Jobs page. For more information, see Create and Manage Jobs.

    For data flows created using Oracle Object Storage connections, the data from the source CSV file is loaded into the target Oracle Autonomous Database. You can also export data from an Oracle Autonomous Database table to a CSV file in Oracle Object Storage.
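
Data Transforms performs this load for you. On Autonomous Database, loading a CSV file from Object Storage is conceptually similar to a DBMS_CLOUD.COPY_DATA call; the following sketch is only illustrative, and the table name, credential name, and file URI are hypothetical:

    -- Hypothetical sketch: copy a CSV file from Object Storage into an
    -- Autonomous Database table using the DBMS_CLOUD package.
    BEGIN
      DBMS_CLOUD.COPY_DATA(
        table_name      => 'SALES_SUMMARY',
        credential_name => 'OBJ_STORE_CRED',       -- hypothetical credential
        file_uri_list   => 'https://objectstorage.region.oraclecloud.com/bucket/sales.csv',
        format          => JSON_OBJECT('type' VALUE 'csv', 'skipheaders' VALUE '1')
      );
    END;
    /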