2.7 Workflows

Workflows enable you to design machine learning processes using a drag-and-drop canvas composed of nodes and connectors. Each node represents a task or step in the machine learning workflow. The connectors define the flow between these steps.

To access Workflows, click Workflows on your OML UI home page. Alternatively, you can select Workflows under Project on the left navigation menu to open the Workflows page.

Figure 6-19 Home page with Workflows icon highlighted



The Workflows page lists all the workflows that are created.

Figure 6-20 Workflows Listing page



You can perform the following tasks:
  • Create: Click Create to create a new workflow.
  • Edit: Select a workflow from the list and click Edit to edit it.
  • Duplicate: Select any workflow from the list and click Duplicate to create a copy. The duplicated workflow will immediately appear with a Ready status.
  • Delete: Select any workflow from the list and click Delete to remove it. You cannot delete a workflow that is currently running. You must stop it before attempting to delete.
  • Start: If you have created a workflow but have not run it, click Start to run the workflow.
  • Stop: If a workflow is running, select it and click Stop to halt the running of the workflow.

2.7 High-Level Steps to Create a Workflow

This topic lists the high-level steps to create a workflow.

To create a workflow:
  1. Open an existing workflow, or click Create on the Workflows listing page.
  2. Add workflow nodes to the canvas by dragging them from the palette or by clicking their Add buttons.

    Note:

    This is a typical flow of nodes. The nodes can be used in multiple ways and you need not use all the nodes in a workflow.
    1. Data Source Node
    2. Feature Selection Node
    3. Model Build Node
    4. Model Apply Node
    5. Model Evaluation Node
    6. Deploy a model
  3. Configure the settings for each node.
  4. Run the nodes individually, or click Run All to run the entire workflow.
Here is a screenshot depicting a workflow with all node types.

Figure 6-21 A workflow with all node types



2.7 Create a Workflow

A workflow is a collection of interconnected machine learning tasks or operations, represented by workflow nodes. Each node represents a distinct computational task—such as data import, data transformation, feature selection, model training, model evaluation, or model scoring—within the workflow. Connectors define the sequence and dependencies between nodes.

To create a workflow:
  1. On the Oracle Machine Learning UI home page, click Workflows. Alternatively, open the left navigation menu, expand Project and then click Workflows. This opens the Workflows listing page.

    Figure 6-22 Home page with Workflows options highlighted



  2. On the Workflows listing page, click Create.

    Figure 6-23 Workflows listing page



    This opens the Create Workflow dialog.
  3. In the Create Workflow dialog, enter these details:

    Figure 6-24 Create Workflow dialog



    1. Name: Enter a name for the workflow. In this example, the name of the workflow is Predict.
    2. Comment: Enter comments, if any.
    3. Click OK. A blank workflow is created and opens in the workflow editor. In the workflow editor, you can drag nodes from the Workflow Nodes palette onto the canvas, or click the add node icon to add them to the canvas, and then run the workflow.

    Figure 6-25 A complete workflow



    On the Workflows editor:
    • To the left is the Workflow Nodes palette. This is where all the workflow nodes are available.
    • At the center is the canvas. This is where you create a workflow.
    • To the right is the Settings pane. This pane opens specific settings for the nodes that you select. By default, this pane is collapsed. Click the expand pane icon to expand it. In the screenshot above, the Model Apply node is selected, and the Settings pane of this node is visible to the right.
    • At the top is the canvas toolbar:
      • Click Zoom out for a broader view of the canvas.
      • Click Zoom in for a closer view of the canvas.
      • Click Run all to run all the nodes in the workflow.

        Note:

        To use this option, ensure that all the nodes in the workflow are defined correctly.
      • Click Run to run a selected node.
      • Click Dependencies to run the selected nodes, if any, together with all of their parent nodes (the upstream nodes in the workflow).
      • Click Dependents to run the selected nodes, if any, together with all of their child nodes (the downstream nodes in the workflow).
      • Click Stop to stop a workflow or a node that is running.
      • Click Close to exit the canvas. This takes you back to the Workflows listing page.
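The difference between the Dependencies and Dependents options can be pictured as a graph traversal. The following is a minimal sketch only, not the product's implementation: the edge list, the `nodes_to_run` function, and the mode names are hypothetical and simply model a workflow as upstream-to-downstream pairs.

```python
from collections import deque

# Hypothetical workflow: each pair is (upstream node, downstream node).
EDGES = [
    ("Data Source", "Feature Selection"),
    ("Feature Selection", "Model Build"),
    ("Model Build", "Model Evaluation"),
    ("Model Build", "Model Apply"),
]


def _reachable(start, neighbors):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def nodes_to_run(selected, edges, mode):
    """Return the nodes that would run for the selected node.

    mode="dependencies": the selected node plus all upstream (parent) nodes.
    mode="dependents":   the selected node plus all downstream (child) nodes.
    """
    downstream, upstream = {}, {}
    for parent, child in edges:
        downstream.setdefault(parent, []).append(child)
        upstream.setdefault(child, []).append(parent)
    if mode == "dependencies":
        return {selected} | _reachable(selected, upstream)
    if mode == "dependents":
        return {selected} | _reachable(selected, downstream)
    raise ValueError(f"unknown mode: {mode}")


print(sorted(nodes_to_run("Model Build", EDGES, "dependencies")))
print(sorted(nodes_to_run("Model Build", EDGES, "dependents")))
```

In this sketch, selecting Model Build with Dependencies also picks up Data Source and Feature Selection, while Dependents picks up Model Evaluation and Model Apply.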

2.7 Workflow Nodes

A workflow node is an individual computational component in a workflow. Each node represents a specific task or operation, such as importing and preparing data, training a model, or evaluating results.

Add nodes to the canvas by dragging them from the palette and dropping them onto the canvas, or by selecting a node from the palette to place it.
The available workflow nodes include:
  • Data Source Node: Specifies the workflow’s data source (schema and table). This node supports data-related tasks such as data import, data splitting, and computing basic statistics. It typically serves as the starting point of the workflow.
  • Feature Selection Node: Identifies and selects a subset of the most important features (columns) to help improve model performance. This node evaluates attribute importance during model training.
  • Model Build Node: Trains machine learning or statistical models using supported algorithms. This node manages model training and model creation for downstream evaluation and scoring.
  • Model Evaluation Node: Measures model performance by running evaluation tasks.
  • Model Apply Node: Applies a trained model to input data to generate predictions or scores.
Here are the high-level steps to create a workflow:
  1. Open an existing workflow, or click Create on the Workflows listing page.
  2. Drag and drop the required workflow nodes from the palette onto the canvas.
  3. Configure settings for each node as needed.
  4. Run individual nodes or click Run all to run the entire workflow.
Below is a screenshot of a workflow containing all the available workflow nodes.

Figure 6-26 A complete workflow



2.7 Data Source Node

The Data source node defines the workflow’s data source by specifying the schema and table. It supports data-related tasks such as data import, data splitting, and basic statistical computations. This node typically serves as the starting point of a workflow.
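As a rough illustration of what data splitting means, the sketch below shuffles a list of rows and holds back a fraction for testing. It is not how the Data Source node is implemented: the `split_rows` function, the fixed seed, and the sample rows are hypothetical, and the 20% test fraction is taken from the Split Data default described in the node settings.

```python
import random


def split_rows(rows, test_fraction=0.20, seed=42):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 table rows
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, which is what allows the train split to feed model building and the test split to feed model evaluation.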

Allowed Connections:
  • Upstream node: None
  • Downstream node: Feature Selection node, Model Build node, Model Evaluation node
To create and run a Data Source node:
  1. On the Workflows editor, drag and drop the Data Source node from the palette to the canvas. Alternatively, click the add icon next to the Data Source node to add it to the canvas.
  2. Adding a node automatically opens the settings pane on the right.

    Figure 6-27 Settings - Data Source node



    Here, define the data source:
    1. Name: Enter a name. In this example, it is Data source 1.
    2. Source Schema: Click on the drop-down menu to select the schema. Here, SH schema is selected.
    3. Source table: Click on the drop-down menu to select the table. Here, SUPPLEMENTARY_DEMOGRAPHICS table is selected. Once you select the table, the Features field below displays the columns of the table.
    4. Target ID column name: From the drop-down menu, select a target. This is the target for your prediction.
    5. Features: This field displays all the columns of the selected table, and has all of them selected. You may deselect any column.
    6. Split Data: By default, this option is selected and 20% is allotted for test data. You can edit this field.
    7. Compute Descriptive Stats: Select this option to allot a dataset percentage for descriptive statistics computation. The default is 100%.
    8. Compute Data Correlation: Select this option if you want to compute correlation between two variables in the dataset. Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. The correlation metrics are:
      • Pearson correlation: Measures the degree of the relationship (strength and direction) between two linearly related variables.
        • Strength: A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 suggests a weak or nonexistent one.
        • Direction: Can be positive or negative. In a positive correlation, when one variable increases, the other also tends to increase. In a negative correlation, when one variable increases, the other tends to decrease.
      • Kendall correlation: A non-parametric test that measures the strength of dependence between two variables.
      • Spearman correlation: A non-parametric test that measures the degree of association between two variables.
  3. After defining the node, click on the ellipses on the node or right-click to open the menu. Click Run.
    Once the node runs successfully, it is indicated by the run success icon.

    Note:

    You also have the option to add all the nodes to the workflow first, define the settings and connections for each node, and then run all the nodes by clicking Run all.
  4. Click on the ellipses on the node or right-click on the node, and then click Results to view the results.
    • The Data Statistics tab displays the basic statistics computed for the selected features (columns). The computed statistics are Distinct Count, Number of Unique values, Top N, Frequent values, Mean, Std deviation, Min, 25, 50, 75 quantiles, and Max.

      Figure 6-29 Data Statistics tab



    • The Data Correlation tab displays the correlation level between the columns. Positive values indicate positive correlation, and the different shades of red indicate the different levels of positive correlation. Negative values indicate negative correlation, and the different shades of blue indicate the levels of negative correlation. A value of 1 indicates perfect positive correlation and a value of -1 indicates perfect negative correlation. See screenshot below:

      Figure 6-30 Data Correlation tab



    • The Data Preview tab shows the preview of the data set used for the workflow.
  5. You may now proceed to add the next node in the flow. If you need suggestions, click the ellipses on the Data Source node or right-click to open the available options, and click Recommend next node. The applicable nodes are displayed. Click on the one that applies to add it to the canvas. Here you have two options:

    Figure 6-32 Nodes - right-click options



    • Add node manually: Drag and drop the node from the palette to the canvas. In this example, the Feature Selection node is added. You must now connect the Data Source node to the Feature Selection node. For this, click on the connection port indicated by the dot on the Upstream node. A connector appears. Drag the connector to the downstream node to establish the connection.
    • Click Recommend next node: Click this option to view the list of recommended nodes.
  6. Clicking Recommend next node gives you the following options under two categories.
    1. Train Split
      • Feature Selection
      • Model Build
    2. Test Split
      • Model Evaluation
    Click on the node that you want to add to the workflow. The connector is automatically added from the Data Source node to the selected node.
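To make the three correlation metrics offered by the Compute Data Correlation setting more concrete, here is a minimal pure-Python sketch of each. It is illustrative only and makes no claim about how the product computes them: the function names and sample columns are hypothetical, and the Kendall variant shown is the simple tau-a form, which counts tied pairs as neither concordant nor discordant.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical sample columns; the second rises whenever the first does.
age = [25, 32, 47, 51, 62]
yrs_residence = [1, 3, 4, 6, 9]
print(pearson(age, yrs_residence))
print(spearman(age, yrs_residence))
print(kendall_tau_a(age, yrs_residence))
```

Because the two sample columns always move in the same direction, the rank-based measures (Spearman and Kendall) reach their maximum, while Pearson is high but depends on how close the relationship is to a straight line.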

2.7 Feature Selection Node

The Feature Selection node selects a subset of relevant features (columns) to help improve model performance. It uses attribute-importance results generated during model training to guide feature selection.

Allowed Connections:
  • Upstream node: Data Source node
  • Downstream nodes: Model Build node, Model Evaluation node, Model Apply node
To add and run a Feature Selection node:
  1. You can add the Feature Selection node to the workflow in two ways:
    • From the workflow palette, drag and drop the Feature Selection node to the canvas. Alternatively, click the add icon on the node.
    • You may also click Recommend next node on the upstream node (Data Source node) and then click Feature Selection.

      Note:

      Using the Recommend next node option automatically connects the nodes.
  2. If you manually add the node, you must establish a connection from the Data Source node (upstream node) to the Feature Selection node (downstream node). For this, click on the connection port indicated by the dot on the Upstream node. A connector appears. Drag the connector to the downstream node to establish the connection.
    The Data Source node has two connection ports. The one on the left is for Train dataset, and the one on the right is for Test dataset.

    Figure 6-33 Connection ports on a Data Source node



  3. Once you connect the nodes, click on the Feature Selection node to open the settings. If the Settings pane is not visible, click the pane open icon at the bottom to expand the pane.

    Figure 6-34 Data Source and Feature Selection node connection



  4. On the Settings pane, define the settings for the node.

    Figure 6-35 Settings - Feature Selection node



    In this example, AFFINITY_CARD is selected as the Target ID column name and CUST_ID as the Case ID column. The Mining Function selected is Classification. The mining function is selected automatically based on the target you chose. However, you can override it.
  5. Right-click on the node or click on the ellipses, and then click Run. Once the node runs successfully, it is indicated by the Success icon, as shown in the screenshot below.

    Figure 6-36 Feature Selection node



  6. Once the node is run successfully, right-click on the node and then click Results to view the results.

    Figure 6-37 Feature Selection Results



  7. You may now add the next node in the workflow. If you need suggestions, right-click on the node, and click Recommend next node to proceed with the workflow.
    The recommended next node is the Model Build node.

    Figure 6-38 Right-click options



2.7 Model Build Node

The Model Build node builds and trains machine learning or statistical models using supported algorithms.

To add the Model Build node to your workflow, you must first have the Data Source node and Feature Selection node defined.
Allowed Connections:
  • Upstream nodes: Data Source node, Feature Selection node
  • Downstream nodes: Model Evaluation node, Model Apply node
To add and run a Model Build node:
  1. On the Workflow editor, drag and drop a Model Build node from the palette to the canvas.
  2. From the Upstream node (Feature Selection node), establish a connection to the Model Build node. For this, click on the connection port indicated by the dot on the Upstream node. A connector appears. Drag the connector to the downstream node to establish the connection.

    Figure 6-39 Model Build node in the workflow



  3. Once you connect the nodes, define the settings. Click on the Model Build node to open the Settings pane. If the Settings pane is not visible, click the pane open icon on the bottom right to expand the pane.

    Figure 6-40 Settings - Model Build node



    For the Model Build settings, the value for Target ID column name is set to AFFINITY_CARD, Mining Function is set to Classification, Case ID is CUST_ID, Algorithms selected are Decision Tree, Naive Bayes, Neural Network, and Random Forest, and YRS_RESIDENCE is selected as the Partitioning column. The selection of mining function is automatic. It depends on the target you selected. However, you can override it.
  4. Right-click on the node or click on the ellipses, and then click Run. Once the node runs successfully, it is indicated by the Success icon, as shown in the screenshot below.

    Figure 6-41 Model Build node in the workflow



  5. Once the node is run successfully, right-click on the node and then click Results to view the results. The results computed by the selected algorithms are displayed in respective tabs, as shown in the screenshot here.

    Figure 6-42 Results - Model Build node



    The results for each algorithm are displayed in separate tabs. In this example, the tabs are named Random Forest, Neural Network, Naive Bayes, and Decision Tree. These are the algorithms selected in the settings for this node. Each tab displays the Model Details (model name, algorithm, mining function), Model Settings for the specific algorithm, and Global Statistics containing the attributes and attribute values, as shown in the screenshot above.

    Note:

    You cannot rename the underlying model. However, you can assign a name to a model deployed to OML Services.
  6. You may now add the next node in the workflow. If you need suggestions, right-click on the node and click Recommend next node to proceed with the workflow.
    The recommended next nodes are Model Evaluation node and Model Apply node.

    Figure 6-43 Model Build node right-click options



    On the Model Build node, you also have the option to Deploy the model.
    See Deploy a model for more information.

2.7 Model Evaluation Node

The Model Evaluation node evaluates model performance using available evaluation metrics and reports.

Allowed Connections:
  • Upstream nodes: Data Source node, Model Build Node
  • Downstream nodes: None
To add and run a Model Evaluation node:
  1. In the Workflow editor, drag and drop a Model Evaluation node from the palette to the canvas. Alternatively, click the add icon on the node.
  2. From the Model Build node, establish a connection to the Model Evaluation node. If you are adding it manually, establish a connection from the Upstream node (Model Build node) to the Downstream node (Model Evaluation node). For this, click on the connection port indicated by the dot on the Upstream node. A connector appears. Drag the connector to the downstream node to establish the connection.
  3. Drag and drop another Data Source node and establish a connection to the Model Evaluation node. Define the second Data Source node with the same settings.

    Note:

    Alternatively, the second connection port on the Data source 1 node (the Test dataset port) can supply the test data set for model evaluation.

    Figure 6-44 Model Evaluation node



  4. Once you connect the nodes, define the settings. Click on the node to open the Settings pane. If the Settings pane is not visible, click the pane open icon at the bottom to expand the pane.
  5. Right-click on the node or click on the ellipses, and then click Run. Once the node runs successfully, it is indicated by the Success icon, as shown in the screenshot below.

    Figure 6-45 Model Evaluation node in the workflow



  6. Once the node is run successfully, right-click on the node and then click Results to view the results.

    Figure 6-46 Results - Model Evaluation node



    Note:

    The Model Evaluation node is a terminal node. You cannot add any downstream nodes to it.

2.7 Model Apply Node

The Model Apply node applies a trained model to input data to generate predictions or scores.

Allowed Connections:
  • Upstream nodes: Data Source node, Model Build node
  • Downstream nodes: None
To add and run a Model Apply node:
  1. From the workflow palette, drag and drop the Model Apply node to the canvas. Alternatively, click the add icon on the node in the palette.
  2. From the Model Build node, establish a connection to the Model Apply node. If you are adding it manually, establish a connection from the Upstream node (Model Build node) to the Downstream node (Model Apply node). For this, click on the connection port indicated by the dot on the Upstream node. A connector appears. Drag the connector to the downstream node to establish the connection.
  3. Drag and drop another Data Source node to the canvas. Establish a connection from the Upstream node (Data Source node) to the Downstream node (Model Apply node).
    You may add additional Data Source nodes for test and train datasets, as shown in the screenshot here.
  4. Once you connect the nodes, define the settings. Click on the node to open the settings pane. If the Settings pane is not visible, click pane open on the bottom right to expand the pane.

    Figure 6-48 Settings - Model Apply node



  5. Right-click on the node or click on the ellipses, and then click Run. Once the node runs successfully, it is indicated by the Success icon, as shown in the screenshot below.

    Figure 6-49 Model Apply node in the workflow



  6. Once the node is run successfully, right-click on the node and then click Results to view the results.

    Figure 6-50 Results - Model Apply node



    Note:

    The Model Apply node is a terminal node. You cannot add any downstream nodes to it.
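The Allowed Connections rules for the five node types can be summarized as a small lookup table. The sketch below is a hypothetical illustration, not part of the product: the `ALLOWED_UPSTREAM` table is transcribed from the Upstream entries in each node section, and `can_connect` and `invalid_connections` are names invented here.

```python
# Allowed upstream node types for each node type, transcribed from the
# "Upstream" entries of the Allowed Connections lists in this document.
ALLOWED_UPSTREAM = {
    "Data Source": set(),
    "Feature Selection": {"Data Source"},
    "Model Build": {"Data Source", "Feature Selection"},
    "Model Evaluation": {"Data Source", "Model Build"},
    "Model Apply": {"Data Source", "Model Build"},
}


def can_connect(upstream_type, downstream_type):
    """Return True if a connector from upstream_type to downstream_type is allowed."""
    return upstream_type in ALLOWED_UPSTREAM.get(downstream_type, set())


def invalid_connections(edges):
    """Return the (upstream, downstream) pairs that break the rules."""
    return [edge for edge in edges if not can_connect(*edge)]


# Hypothetical workflow with one deliberate mistake: Model Evaluation is a
# terminal node, so it cannot feed Model Apply.
workflow = [
    ("Data Source", "Feature Selection"),
    ("Feature Selection", "Model Build"),
    ("Model Build", "Model Evaluation"),
    ("Model Evaluation", "Model Apply"),
]
print(invalid_connections(workflow))
```

Keying the table by downstream node type makes the terminal nodes fall out naturally: neither Model Evaluation nor Model Apply appears in any set of allowed upstream types.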

2.7 Deploy a model

Deploying a model creates an Oracle Machine Learning Services endpoint that can be used for scoring.

You can deploy a model directly from the Model Build node in the Oracle Machine Learning workflow.
To deploy a model:
  1. Open the workflow and right-click on the Model Build node. You may also click on the ellipses on the node. This opens the right-click menu.
  2. Click Deploy. The models are listed.

    Figure 6-51 Deploy option in Model Build node



  3. Click on the model you want to deploy. The Deploy Model dialog opens.

    Note:

    The model name begins with a prefix made up of the first two letters of the algorithm name. For example, the model name RF_951498DC16 indicates that the model was built using the Random Forest algorithm. Similarly, NN indicates Neural Network, NB indicates Naive Bayes, DT indicates Decision Tree and so on.

    Once you've built an in-database model using Oracle Machine Learning Workflow, it is ready to be used in SQL queries. As such it is already "deployed." You can grant access to this model to other schemas, export this model and import it into another database for deployment, and deploy this model to OML Services, which is enabled directly through the Oracle Machine Learning Workflow UI. OML Services deployment supports lightweight, real-time scoring using REST, among other features.
  4. In the Deploy Model dialog, define the following:

    Figure 6-52 Deploy Model dialog



    1. In the Name field, the system-generated model name is displayed by default. This is the name of the model for use in OML Services. The in-database model name remains unchanged. You can edit this name, which must be a unique alphanumeric name of at most 50 characters.
    2. In the URI field, enter a name for the model URI. The URI must be alphanumeric, with a maximum length of 200 characters. In the example here, neural_network is the name provided for the URI.
    3. In the Version field, enter a version of the model. The version must be in the format x.x where x is a number.
    4. In the Namespace field, enter a name for the model namespace. In the example here, supp_demographics is the name provided for Namespace. This is an optional field.
    5. In the Comments field, you may enter any comments.
    6. Click Shared to allow users with access to the database schema to view and deploy the model.
    7. Click OK.
  5. After a model is successfully deployed, it is listed in the Deployments tab on the Models page.

    Figure 6-53 Deployed models in the Deployments tab



    In the screenshot here, the model NN_3D2A959C1B deployed in step 4 is listed on the Deployments tab.
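The constraints on the Deploy Model dialog fields can be checked before you fill in the dialog. The following sketch is a hypothetical helper, not an Oracle API: `validate_deployment` and its messages are invented here. It assumes that underscores count as valid characters, because the examples in this document (NN_3D2A959C1B, neural_network) contain them, and that each x in the x.x version format may have more than one digit.

```python
import re

# Assumption: letters, digits, and underscores are accepted, since the
# document's own example names contain underscores.
_NAME = re.compile(r"[A-Za-z0-9_]{1,50}")
_URI = re.compile(r"[A-Za-z0-9_]{1,200}")
# Assumption: "x.x where x is a number" allows multi-digit numbers.
_VERSION = re.compile(r"[0-9]+\.[0-9]+")


def validate_deployment(name, uri, version, existing_names=()):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not _NAME.fullmatch(name):
        problems.append("name must be alphanumeric, 1 to 50 characters")
    elif name in existing_names:
        problems.append("name must be unique")
    if not _URI.fullmatch(uri):
        problems.append("URI must be alphanumeric, 1 to 200 characters")
    if not _VERSION.fullmatch(version):
        problems.append("version must use the x.x format")
    return problems


print(validate_deployment("NN_3D2A959C1B", "neural_network", "1.0"))
print(validate_deployment("NN_3D2A959C1B", "neural network", "1"))
```

The first call uses the values from the example in this topic with an assumed version of 1.0; the second shows a URI containing a space and a version without a decimal part, both of which the sketch reports as problems.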
