Creating Data Flows

Perform the following steps to access the Data flow editor where you can start creating your data flows.

To start creating data flows:

  • On the Home page, click Transform Data.

    Or

  • On the Projects page, click Create Data Flow.

    Or

  • Select an existing project and on the Project Details page, click Create Data Flow.

Note:

When you see a blank page while clicking Create Data Flow, clear your browser cache and refresh the page to navigate to the Data Flow Editor.

The Data Flow Editor appears allowing you to create a new data flow.

About Data Flow Editor

The Data flow editor is divided into five parts, the Data Entity Panel, the Transformations Toolbar, the Design Canvas, the Properties Panel, and the Status Panel.



  • Data Entities Panel: The data entity panel displays the Data Entities that are available to use in your Data flows. The displayed list can be filtered using the Name and Tags fields. The panel includes options that let you add schemas, import data entities, remove any of the schemas that are associated with the data flow, and refresh data entities. See Add Components for information about how to use these options.
  • Database Functions Toolbar: The Database Functions toolbar display the database functions that can be used in your data flows. Just like Data Entities, you can drag and drop the Database tools you want to use on the design canvas.
  • Design Canvas: The design canvas is where you build your transformation logic. After adding the Data Entities and Database Functions to the design canvas, you can connect them in a logical order to complete your data flows.
  • Properties Panel: The properties panel displays the properties of the selected object on the design canvas. The Properties Panel is grouped into four Tabs. General, Attributes, Preview Data, Column Mapping, and Options. Not all tabs are available as they vary based on the selected object. See Component Properties to know more about these options.
  • Status Panel: When you run a data flow, the Status Panel shows the status of the job that is running in the background to complete the request. You can see the status of the job that is currently running or the status of the last job. For more information about the Status panel, see Viewing Status of Data Loads, Data Flows, and Workflows.

After designing the required data flow,

  • Click save icon, to save the created/designed data flow.
  • Click auto layout icon, to align the nodes of the designed data flow.
  • Click execute icon, to execute the created data flow.
  • Click validate icon, to validate the created data flow.
  • Click Zoom-in and zoom-out icons, to maximize or minimize the created data flow diagram in the design canvas.

Oracle Database Functions

Oracle Data Transforms supports the following components to perform various database functions:

  1. Data Transformation

    It contains the following components:

    • Aggregate
    • Expression
    • Filter
    • Join
    • Distinct
    • Lookup
    • Set
    • Sort
    • Split
    • Subquery Filter
    • Table Function
  2. Data Preparation

    It contains the following components:

    • Data Cleanse
    • Equi_Width Binning
    • Quantile Binning
    • Lead
    • Lag
    • Replace
  3. Machine Learning

    It contains the following components:

    • Prediction
    • Outlier Detection
  4. JSON/XML/Text

    It contains the following components:

    • REGEXP COUNT
    • REGEXP INSTR
    • REGEXP SUBSTR
    • REGEXP REPLACE
    • Edit Distance Similarity
    • Contains
  5. Oracle Spatial and Graph

    It contains the following components:

    • Buffer Dim
    • Buffer Tol
    • Distance Dim
    • Distance Tol
    • Nearest
    • Simplify
    • Point
    • Spatial Join
    • Geocode Tools:

      Note:

      The following Geocode Tools work only in non-Autonomous Database environment.
      • Geocode As Geometry
      • Geocode
      • Geocode Address
      • Geocode All
      • Geocode Address All
      • Reverse Geocode

Create a New Data Flow

A data flow defines how the data is moved and transformed between different systems.

You can create data flows in any of the following ways:

From the Projects page

To create a data flow from the Projects page,

  1. On the Projects page, click Create Data Flow.

    Create Data Flow page appears:

  2. In the Name field, enter a name for the new data flow.
  3. Select Create New Project, if you wish to create new project folder for the newly created data flow.
  4. Else, click Add to Existing Projects, if you wish to add the newly created data flow to an existing project folder.
  5. If you have selected Create New Project for the previous option, in the Project Name field, enter the name of the newly created project.
  6. Else, if you have selected Add to Existing Projects for the previous option, select the required project from the Project Name drop-down arrow.
  7. In the Description field, enter a description for the newly created data flow.
  8. Click Create.

From the Data Flows page within a project

To create a data flow from the Data Flows page within a project,

  1. On the Projects page click the project tile you wish to create a new data flow for. The Project Details page appears.
  2. In the Data Flows page, click Create Data Flow.
  3. Provide the Name and Description of the new data flow.
  4. Click Next.
  5. To define your source connection, from the Connection drop-down, select the required connection from which you wish to add the data entities.
  6. In the Schema drop-down, all schema corresponding to the selected connection are listed in two groups:
    • Existing Schema (ones that you've imported into Oracle Data Transforms) and
    • New Database Schema (ones that you've not yet imported).

    Select the schema that you want to use from the drop down. For Oracle Object Storage connections, the Schema drop-down lists the name of the bucket that you specified in the URL when you created the connection.

  7. Click Save.

    The Data Flow Editor appears that allows you to create a new data flow.

From the Home page

To create a data flow from the Home page,

  1. On the Home page, click Transform Data. The Create Data Flow page appears.
  2. Provide the Name and Description of the new data flow.
  3. Select a project name from the drop-down. Alternatively, click the + icon to create a project.
  4. Click Next.
  5. From the Connection drop-down, select the required connection from which you wish to add the data entities. Alternatively, click the + icon to create a new connection.
  6. In the Schema drop-down, all schema corresponding to the selected connection are listed in two groups:
    • Existing Schema (ones that you've imported into Oracle Data Transforms) and
    • New Database Schema (ones that you've not yet imported).

    Select the schema that you want to use from the drop down.

  7. Click Save.

Add Components

To add components to your data flow:
  1. In the Data Entities panel, click Add a Schema to add schemas that contain the data entities that you want to use in the data flow.
  2. In the Add a Schema page, select the connection and schema name.
  3. Click Import.
  4. In the Import Data Entities page, select the Type of Objects you want to import. Choose a Mask/filter if you don't want to import every object in the schema and click Start.
  5. The Data Entities panel lists the imported data entities. The panel includes various options that let you do the following:
    • Refresh Data Entities – Click the Refresh iconRefresh icon to refresh the displayed list.
    • Name - Search for data entities by name.
    • Tags - Filter the data entities by the name of the tag used.
    • Import Data Entities - Right-click the schema to see this option. Use this option to import the data entities.
    • Remove Schema - Right-click the data entity to see this option. Use this option to remove the schema from the list. Note that this option does not delete the schema, it only removes the association of the schema with this data flow.
  6. Similarly add more schemas to the Data Flow, if required.
  7. Drag the required Data Entities that you want to use in the data flow and drop them on the design canvas.
  8. From the Transformations toolbar, drag the transformation component that you want to use in the data flow and drop them on the design canvas.
  9. Select an object on the design canvas, and drag the Connector icon (Connector icon) next to it to connect the components.
  10. After saving the data flow, there may be a Transfer icon overlaid on one or more of the component connections. This indicates that ODI has detected an additional step and it is required to move the data between data servers. You can click on this Icon to view properties associated with this step.
Component Properties

The Properties Panel displays various settings for components selected in the Design Canvas. Depending on the component selected, you may see any of the following icons:

  • General (General) - Displays the name of the component along with its connection and schema details. You can edit some of these properties.
  • Attributes (Attributes) - Displays the details of all the attributes associated with the component.
  • Column Mapping (Column Mapping) - Allows you to map all the columns automatically. See Map Data Columns for more information.
  • Preview (Preview) - Displays a preview of the component. For Oracle tables, you can also view the statistics of the selected data entity. See Viewing Statistics of Data Entities for details about the statistical information available.
  • Options (Options) - Displays options such as
    • Truncate Table - Replaces any existing target table content with new data.
    • Append - Inserts records from the flow into the target. Existing records are not updated.
    • Incremental - Integrates data in the target table by comparing the records of the flow with existing records and updating the records when their associated data is not the same. Those that don't yet exist in the target are inserted.
      The option includes an Auto compression feature that is set to True by default. For data flow jobs that use the Incremental Update mode to load data onto a compressed Oracle target partition, the Auto compression feature recompresses the modified target partitions after the load completes successfully. For table partitions that are not originally compressed, the compression is skipped irrespective of whether Auto compression is set to true.

      Note:

      The Auto compression option is available to the ADMIN user or to a user with the DWROLE role. For data flows that have schema users other than ADMIN you need to either assign the DWROLE to the user or disable Auto compression to avoid execution errors.
Map Data Columns
When you connect the source data entity with the target data entity, the column names are automatically mapped by the column names. You have a choice to map the columns by Position or by Name or map the columns manually using the Expression Editor.

To map columns by Position or by Name:

  1. Select the target Data Entity.
  2. Click the arrow icon present on the top right corner to expand the Properties Panel. This will give you more space to work with.
  3. In the Properties Panel, click the Column Mapping icon (Column Mapping icon).
  4. To map the columns by Position or by Name, from the Auto Map drop-down menu, select By Position or By Name.

To map the columns manually:

  1. From the Auto Map drop-down menu, select Clear to clear the existing mappings.
  2. Drag and drop the attributes from the tree on the left to map with the Expression column.
  3. To edit an expression, click the Edit icon of the respective column. The Expression Editor appears allowing you to perform the required changes (for example, you can just add an expression-"UPPER" or open the Expression Editor to edit the expression).

    Note:

    Use the expression editor only if you have complex expressions for a particular column.
  4. Click OK.

Create Data Entities within the Data Flow editor

If you have already created or imported your target data entity, then you would drag the data entity onto the design canvas and complete the column mappings and options.

To create the definition of a new target table while in the Data Flow editor,

  1. Select the component at the end of your data flow.
  2. Click the Add Data Entity icon Add Data Entity icon present on the top right corner of the target component.
  3. Add Data Entity page appears allowing you to configure the following details of the target component:

    General tab

    • In the Name text box, enter the name of the newly created Data Entity.
    • The Alias text box is auto-populated with the name of the newly created Data Entity.
    • From the Connection Type drop-down, select the required connection from which you wish to add the newly created Data Entity.
    • It loads the server name coined at the time of connection creation. From the Server drop-down, select the required server name from which you wish to add the newly created Data Entity.
    • From the Schema drop-down, select the required schema.
    • Click Next.

    Columns tab

    It allows you to create, remove or edit the column definitions.

    • Click the Add Columns icon Add Columns icon, to add new columns to the newly created Data Entity.

      A new column is added to the displayed table.

    • The table displays the following columns:
      • Name
      • Data Type - Click the cell to configure the required Data Type.
      • Scale
      • Length
      • Actions - Click the cross icon to delete the created column.
    • To delete the columns in bulk, select the columns and click the Delete icon Delete icon.
    • To search for the required column details, in the Search text box enter the required column name and click enter. The details of the required column are displayed.
    • Click Next.

    Preview Data Entity tab

    It displays a preview of all the created columns and their configured details. If the data entity belongs to an Oracle database, you can also view statistics of the table. See Viewing Statistics of Data Entities for more information.

  4. Click Save.

    The new target Data Entity is created.

  5. Expand the Properties Panel in the right pane to view the following settings of the created components:
    • General - Displays the Name of the component along with its Connection and Schema details.
    • Attributes - Displays the details of all the attributes associated to the component.
    • Column Mapping - Click Auto Map to map all the columns automatically.
    • Preview - Click to have a preview of the component.
    • Options - Change the options as appropriate.

Validate and Execute a Data Flow

After your data flow is ready, do the following:
  1. Click Save.

    After saving, if data needs to be staged before transforming, Transfer button is added to one or more links. You can click these buttons to set more options, if available.

  2. Click the Simulate Code icon (Simulate Code icon) if you want to check the code that will run to complete the tasks that are performed when you execute the data flow job. The source and target details are displayed in different colors for ease of reference. This is handy if you want to check if the mapping is correct before you run the job or if the job fails. Note that the code cannot be used for debugging. For detailed information about the job, see the Job Details page.
  3. Click the Validate icon (Validate icon) in the toolbar above the design canvas to validate the data flow.
  4. After a successful validation, click the Execute icon (Execute icon) next to the Validate icon to execute the data flow.

    A message appears that displays the execution Job ID and name. To check the status of the data flow, see the Status panel on the right below the Properties Panel. For details about the Status panel, see Viewing Status of Data Loads, Data Flows, and Workflows. This panel also shows the link to the Job ID that you can click to monitor the progress on the Jobs page. For more information, see Managing Jobs.

    For data flows created using Oracle Object Storage connections, the data from the source CSV file is loaded into the target Oracle Autonomous Database. You can also export data from an Oracle Autonomous Database table to a CSV file in Oracle Object Storage.