4 Workflow

A Data Miner workflow is a mechanism to define data mining operations such as model build, test, and apply.

These operations are defined in nodes, which comprise the workflow. You can create multiple workflows in a single project.

About Workflows

A workflow must contain one or more sources of data, such as a table or a model.

For example, to build a Naive Bayes model, you must first identify the input with a Data Source node. Then you create a Classification node to build and test the model.

By using a workflow, you can:

  • Build, analyze, and test data mining processes

  • Identify and examine data sources

  • Build and apply models

  • Create predictive queries

Workflow Sequence

A workflow is built and run in a certain sequence.

The sequence is:

  1. Create a blank workflow.

  2. Add nodes to the workflow.

  3. Connect the nodes in a workflow.

  4. Run the nodes.

  5. Examine the results.

  6. Repeat the steps as required.

Workflow Terminology

A workflow is a directed graph consisting of nodes that are connected to one another.

There are specific terms used to indicate the nodes in relation to the workflow. For example, consider a workflow composed of two nodes, N1 and N2. Assume that N1 is connected to N2:

  • Parent node and child node: The node N1 is the parent node of N2, and the node N2 is a child of N1. A parent node provides information that the child node needs when it runs. For example, you must build a model before you apply it to new data.

  • Descendants and ancestors: The node N2 is called the descendant of N1 if there is a workflow connection starting from N1 that eventually connects to N2. N1 is called the ancestor of N2. N1 is always closer to the root node than N2.

  • Root nodes: The nodes that have no parent nodes are called root nodes. All workflows have at least one root node. A workflow can have multiple root nodes.

    Note:

    A parent node is closer to a root node than its child node.

  • Siblings: If a node has several child nodes, then the child nodes are referred to as siblings.

  • Upstream: Parent nodes are said to be upstream of their child nodes.

Workflow Thumbnail

The thumbnail view of the workflow provides an overall view of the workflow. Thumbnails are most useful for large workflows.

To open the Thumbnail viewer, go to View and click Thumbnail. Alternatively, press Ctrl+Shift+T.

In a Thumbnail view, you can:

  • Navigate to a specific node in the workflow

  • Change the zoom level of the workflow

  • Change the node that is currently viewed in the workflow


The Thumbnail view includes a rectangle. Click inside the viewer to move the rectangle around, thereby changing the focus of the display in the workflow editor. You can also resize the rectangle for a finer level of control.

When viewing models in the thumbnail view, if the model generates trees, then the thumbnail view automatically opens to provide an overall view to display the shape of the tree. The thumbnail view also shows your location in the overall model view.

Components

The Components pane lists the components that you can add to workflows.

To add a component to a workflow, drag and drop the component from the Components pane to the workflow.

The Components pane opens automatically when you load a workflow. If the Components pane is not visible, then go to View and click Components.

The Components pane includes the following:
  • My Components: Contains the following two tabs:
    • Favorites

    • Recently Used

  • Workflow Editor: Contains the nodes, organized into categories, that you use to create a workflow.

  • All Pages: Lists all the available components in one list.

Workflow Editor

The Workflow Editor contains the nodes, organized into sections, that you can create and connect in a workflow.

Workflow Properties

The Workflow Properties tab enables you to add or change comments associated with the selected workflow.


Properties

Properties enables you to view and change information about the entire workflow or a node in a workflow.

To view the properties of a node, right-click the node and select Go to Properties.

If you select either the workflow or one of its nodes, then the respective properties are displayed in the Properties pane. For example, if you select a Data Source node, then the Data Source node properties are displayed in the Properties pane.

Note:

In the earlier releases of Oracle Data Miner, the Properties tab was known as the Property Inspector.

You can perform multiple tasks for a node and a workflow from:

  • Properties context menu

  • Properties pane

Working with Workflows

A workflow allows you to create a series of nodes, and link them to each other to perform the required processing on your data. You cannot use Oracle Data Miner until you have created a workflow for your work.

Ensure that you meet the workflow requirements. You can perform the following tasks with a workflow:

Creating a Workflow

You must create a workflow to define data mining operations such as model build, test, and apply.

Before you create a workflow, ensure that you meet the workflow prerequisite conditions.

To create a blank workflow:

  1. In the Data Miner tab, expand Connections in the left pane.

    If the Data Miner tab is not open, then click Tools and select Data Miner.

  2. Select the project under which you want to create the workflow.
  3. Right-click the project and select New Workflow from the context menu. The Create Workflow dialog box opens.
  4. In the Name field, enter a unique name for the workflow.
  5. Click OK. This creates a blank workflow.

Workflow Name Restrictions

A workflow name must meet the following conditions:

  • The workflow name must be between 1 and 128 characters long.

  • The workflow name must not contain a slash (/).

  • The workflow name must be unique within the project.

Deploying Workflows

Oracle Data Miner enables you to generate a script from a workflow that recreates all the objects generated by that workflow.

In earlier releases of Data Miner, you could only generate a script for Transformation nodes.

A script that recreates all objects generated by the workflow also enables you to replicate the behavior of the entire workflow. Such scripts provide a basis for application integration or for a lightweight deployment that does not require installing the Data Miner repository and workflows on the target and production systems.

Oracle Data Miner provides the following two types of deployment:

Deploy Workflows using Data Query Scripts

Any node that generates data has the Save SQL option in its context menu.

The Save SQL option generates a SQL script. The generated SQL can be created as a SQL*Plus script or as a standard SQL script. The advantage of the script format is the ability to override table and model references with parameters.

The Save SQL option enables you to select the kind of script to generate, and the location to save the script.
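
For example, a script in SQL*Plus format might expose the input table as a substitution variable, so that the same query can be pointed at a different table at run time. The following is a minimal sketch, not actual generated output; the script name, variable name, and column are illustrative only:

    -- query_script.sql (hypothetical): expects INPUT_TABLE to be defined.
    -- SQL> DEFINE INPUT_TABLE = 'DMUSER.MINING_DATA_BUILD_V'
    -- SQL> @query_script.sql
    SELECT cust_gender, COUNT(*) AS cnt
      FROM &INPUT_TABLE
     GROUP BY cust_gender;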


Deploy Workflows using Object Generation Scripts

The Object Generation Script option generates scripts that create objects. The following topics describe the generated scripts:

  • Scripts Generated

  • Script Variable Definitions

Running Workflow Scripts

You can run workflow scripts using Oracle Enterprise Manager Jobs or Oracle Scheduler Jobs. You can run the generated scripts in the following ways:

  • As Oracle Enterprise Manager jobs, either as a SQL script or as an operating system command.

  • As Oracle Scheduler jobs (see the sketch after this list):

    • External jobs that call the SQL script

    • Environment: Using Oracle Enterprise Manager for runtime management or using PL/SQL

  • Run the scripts using SQL*Plus or SQL Worksheet.

    Note:

    To run a script, ensure that the Oracle Data Miner repository is installed.
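
For example, you can create an Oracle Scheduler job that runs a generated master script through an external script. The following is a hedged sketch; the job name, schedule, and file path are hypothetical, and the wrapper script is assumed to invoke SQL*Plus with the fully qualified path of the master script:

    BEGIN
      DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'RUN_WORKFLOW_SCRIPT',          -- hypothetical name
        job_type        => 'EXECUTABLE',
        job_action      => '/u01/scripts/run_workflow.sh', -- wrapper that calls SQL*Plus
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY;BYHOUR=2',
        enabled         => TRUE);
    END;
    /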

Deleting a Workflow

You can delete a workflow from the workflow context menu.

To delete a workflow:

  1. In the Data Miner tab, expand Projects and select the workflow that you want to delete.
  2. Right-click and click Delete.

Loading a Workflow

When you open a workflow, it loads in the workflow pane.

After creating a workflow, you can perform the following tasks:

  • Load a workflow: In the Data Miner tab, expand the project and double-click the workflow name. The workflow opens in a new tab.

  • Close a workflow: Close the tab in which the workflow is loaded and displayed.

  • Save a workflow: Go to File and click Save. If a workflow has any changes, then they are saved when you exit.

Managing a Workflow

After you create a workflow, the workflow is listed under Projects in the Data Miner tab.

To perform any one of the following tasks, right-click the workflow under Projects and select an option from the context menu:

  • New Workflow: To create a new workflow in the current project.

  • Delete: To delete the workflow from the project.

  • Rename: To rename the workflow.

  • Export: To export a workflow.

  • Import: To import a workflow.

Exporting a Workflow Using the GUI

You can export a workflow, save it as an XML file, and then import it into another project.

Ensure that the version of Oracle Data Miner from which you export the workflow is compatible with the version into which you import it.

To export a workflow:

  1. In the Data Miner tab, expand Connections and click the project under which the workflow is created.
  2. Right-click the workflow that you want to export and click Export. The Save dialog box opens.
  3. In the Save dialog box, navigate to the location where you want to export the workflow and click Save. The workflow is saved as an XML file.

    Note:

    The default directory to save the workflow is the system default directory for saving documents. You can change this directory.


Import Requirements of a Workflow

To import a workflow, you must meet the requirements related to workflow compatibility, permissions, and user account rights.

The import requirements of a workflow are:

  • All the tables and views used as data sources in the exported workflow must be in the new account (a quick dictionary check is sketched after this list).

  • The tables or views must have the same name in the new account as they did in the old account.

  • It may be necessary to redefine the Data Source node.

  • If the workflow includes Model nodes, then the account where the workflow is imported must contain all models that are used. The Model node may have to be redefined in the same way that Data Source nodes are.

  • You must have permission to run the workflow.

  • The workflow must satisfy the compatibility requirements.
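
As referenced in the list above, you can check whether the expected tables, views, and models exist in the importing account by querying the data dictionary. A minimal sketch, with MINING_DATA_BUILD_V as an example object name:

    -- Check that an expected table or view is present in the new account.
    SELECT object_name, object_type
      FROM user_objects
     WHERE object_type IN ('TABLE', 'VIEW')
       AND object_name = 'MINING_DATA_BUILD_V';

    -- List the mining models available in the new account.
    SELECT model_name, mining_function, algorithm
      FROM user_mining_models;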

Data Table Names

The account from which the workflow is exported is encoded in the exported workflow.

Assume that a workflow is exported from the account DMUSER and contains a Data Source node that uses the table MINING_DATA_BUILD_V.

If you import the schema into a different account, that is, an account that is not DMUSER, and try to run the workflow, then the Data Source node fails because the workflow looks for DMUSER.MINING_DATA_BUILD_V.

To resolve this issue:

  1. Right-click the Data Source node MINING_DATA_BUILD_V and select Define Data Wizard.

    A message appears indicating that DMUSER.MINING_DATA_BUILD_V does not exist in the available tables or views.

  2. Click OK and select MINING_DATA_BUILD_V in the current account. This resolves the issue.
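
Alternatively, if the data exists in the new account under a different name and you prefer not to redefine the node, then you can create a view with the expected name. This is a hypothetical workaround, not part of the documented procedure; the source table name is illustrative, and you must have SELECT privileges on it:

    -- Expose existing data under the name the imported workflow expects.
    CREATE VIEW MINING_DATA_BUILD_V AS
      SELECT * FROM customer_build_data;  -- hypothetical source table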

Workflow Compatibility

Before you import or export a workflow, ensure that Oracle Data Miner versions are compatible.

The compatibility requirements for importing and exporting workflows are:

  • Workflows exported using a version of Oracle Data Miner earlier than Oracle Database 11g Release 2 (11.2.0.1.2) cannot always be imported into a later version of Oracle Data Miner.

  • Workflows exported from an earlier version of Oracle Data Miner may contain invalid XML. If a workflow XML file contains invalid XML, then the import is terminated.

To check the version of Oracle Data Miner:

  1. Go to Help and click Version.

  2. Click the Extensions tab.

  3. Select Data Miner.

Building and Modifying Workflows

You can build and modify workflows using the Workflow Editor.

The Workflow Editor is supported by the features described in the following topics:

Missing Tables or Views

If Oracle Data Miner detects that some tables or views are missing from the imported schema, then it gives you the option to select another table.

The schema that is imported into a workflow may not have all the tables or views used by the workflow. When Data Miner detects this problem, it generates a Table Selection Failure message indicating that the table is missing. You have the choice to select another table.

To select another table, click Yes. The Define Data Source Wizard opens. Use the wizard to select a table and attributes.

Managing Workflows using Workflow Controls

Several controls are available as icons in the top border of a workflow, just below the name of the workflow.

You can perform the following tasks:

  • Zoom in and zoom out workflow nodes: Click the zoom in (magnifying glass with a plus sign) or zoom out (magnifying glass with a minus sign) icon, respectively.

  • Control node size: Click the percent drop-down list and select the size. Default size is 100%.

  • View event log: Click the log icon.

  • Run selected nodes: Click the Run icon (green triangle). When you select a node to be run, the triangle turns green.

  • Schedule workflow: Click the Schedule Workflow icon to create or edit a workflow schedule.

  • Refresh Workflow Data Definition: Click Refresh Workflow Data Definition to ensure that the workflow is updated with new columns that are either added or removed.

    Note:

    The Refresh Workflow Data Definition option is applicable only to Data Source nodes and SQL Query nodes.
  • Set performance settings: Click the Performance Options icon to set the In-Memory and Parallel settings for the nodes in the workflow.

Managing Workflows and Nodes in the Properties Pane

In the Properties pane, you can view and change information about the entire workflow or a node in a workflow.

To view the properties of a node:

  1. Right-click the node and select Go to Properties. The corresponding Properties pane opens. For example, if you select a Data Source node, then the details of the Data Source node are displayed in the Properties pane.
  2. Use the Search field to find items in Properties.
  3. Click the Edit icon (pencil) to open the editor for the item that you are viewing.
  4. Use the options in the context menu to perform additional tasks.

Performing Tasks from Workflow Context Menu

The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.

To view a workflow, double-click the name of the workflow in the Data Miner tab. The workflow opens in a tab located between the Data Miner tab and the Components pane. If you open several workflows, then each workflow opens in a different tab. Only one workflow is active at a time.

The following options are available in the context menu:

  • Close: Closes the selected tab.

  • Close All: Closes all workflow tabs.

  • Close Other: Closes all tabs except the current one.

  • Maximize: Maximizes the workflow. The Data Miner tab, Components pane, and other items are not visible. To return to the previous size, click the selection again.

  • Minimize: Minimizes the Properties tab to a menu. To return to the previous size, right-click the Properties tab, and click Dock.

  • Split Vertically: Splits the active editor or viewer into two documents. The two documents are split vertically. Click the selection again to undo the split.

  • Split Horizontally: Splits the active editor or viewer into two documents. The two documents are split horizontally. Click the selection again to undo the split.

  • New Document Tab Group: Adds the currently active editor or viewer into its own tab group. Use Collapse Document Tab Groups to undo this operation.

  • Collapse Document Tab Groups: Collapses all the editors or viewers into one tab group. This option is displayed only after New Document Tab Group is used.

  • Float: Converts the Properties tab into a movable pane.

  • Dock: Sets the location for the floating window. Alternatively, press Alt+Shift+D.

  • Clone: Creates a new instance of Properties.

  • Freeze Content: Freezes the content. To unfreeze Properties, click again on this selection.

Oracle Enterprise Manager Jobs

Oracle Enterprise Manager (OEM) allows database administrators to define jobs through the OEM application.

The job is run through OEM instead of Oracle Scheduler. The job can be run manually from OEM. The running of the job can be monitored. The result is either a success or a reported failure.

  • The job definitions can directly open the generated script files.

  • The job definition should define the master script invocation as a script file, using a full file path (see the sketch after this list).

  • The job can be run on a schedule or on demand.

  • All scripts that run within the master script must have fully qualified path names.
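
For example, a job of the operating system command type might invoke SQL*Plus with fully qualified paths throughout. A hypothetical sketch; the binary path, connection string, and script path are illustrative:

    /u01/app/oracle/bin/sqlplus dmuser@orcl @/u01/scripts/workflow_master.sql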

For information about Oracle Enterprise Manager jobs, see Oracle Database 2 Day DBA for the database to which you are connected.

Renaming a Workflow

You can rename a workflow using the workflow context menu.

To change the name of a workflow or project:

  1. Right-click the name of the workflow or project in the Data Miner tab and select Rename.
  2. In the Rename Workflow dialog box or the Rename Project dialog box, enter the new name in the Rename To field.

    For a project, the new name must satisfy the project name restrictions. For a workflow the new name must satisfy the workflow name restrictions.

  3. Click OK.

Runtime Requirements

You must have the Data Miner repository installed on the system where the scripts are run.

The generated scripts require access to Data Miner repository objects. The script checks the repository version and ensures that the repository is the same version as, or a later version than, the repository of the source system.

Running a Workflow

The View Event Log option enables you to view the progress of a workflow that is running.

You can also view the time taken for workflow creation.

Model builds can be very resource-intensive. The MAX_NUM_THREADS parameter in the Oracle Data Miner server controls the number of parallel builds. MAX_NUM_THREADS specifies the maximum number of model builds across all workflows running in the server.

The default value is 10. Therefore, by default, 10 model builds can run concurrently across all workflows. You can specify the value of MAX_NUM_THREADS in the repository table ODMRSYS.ODMR$REPOSITORY_PROPERTIES.

If you increase the value of MAX_NUM_THREADS, then do it gradually. Workflows can appear to be running even though the network connection is lost.

To control the parallel model build behavior, the following parameters are used (a query sketch follows this list):

  • THREAD_WAIT_TIME: The default is 5. When MAX_NUM_THREADS is reached, further build processes are queued until the parallel model build count falls below MAX_NUM_THREADS. This setting (in seconds) determines how often the parallel model build count is checked.

  • MAX_THREAD_WAIT: The default is NULL. The timeout (in seconds) for a build process that has been queued. If NULL, then no timeout occurs.
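
You can inspect or change these properties in the repository table ODMRSYS.ODMR$REPOSITORY_PROPERTIES. The following sketch assumes name and value columns called PROPERTY_NAME and PROPERTY_VALUE, which may differ in your repository version; describe the table first to confirm:

    -- Confirm the actual column names before running the statements below.
    DESCRIBE ODMRSYS.ODMR$REPOSITORY_PROPERTIES

    -- Hypothetical query: view the current parallel build settings.
    SELECT property_name, property_value
      FROM ODMRSYS.ODMR$REPOSITORY_PROPERTIES
     WHERE property_name IN ('MAX_NUM_THREADS', 'THREAD_WAIT_TIME', 'MAX_THREAD_WAIT');

    -- Hypothetical update: increase the limit gradually, as advised above.
    UPDATE ODMRSYS.ODMR$REPOSITORY_PROPERTIES
       SET property_value = '12'
     WHERE property_name = 'MAX_NUM_THREADS';
    COMMIT;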


Network Connection Interruption

You may encounter interruption in network connection while running a workflow.

If a network connection to the Data Miner server is interrupted while running a workflow, then the Workflow Editor may indicate that the workflow is still running indefinitely. On the other hand, the Workflow Jobs window may indicate that the connection is down.

Oracle Data Miner issues a message indicating that the connection was lost. If the connection is not recovered in a reasonable period, then close and open the Workflow Editor again.

Note:

Use the Workflow Jobs window to monitor connections.

Locking and Unlocking Workflows

A workflow is locked when it is opened, or when one or more of its nodes are running.

A workflow is locked under the following conditions:

  • When a node is running, the workflow is locked, and none of its nodes can be edited. Existing results may be viewed while the workflow is running. You can also go to a different workflow and edit or run it.

  • When a user opens a workflow, the workflow is locked so that other users cannot modify it.

  • When a workflow is running, an animation in the tool bar (circular arrow loop) shows that the workflow is running.

  • When you open a locked workflow, Locked is displayed in the toolbar, along with a running indicator if the workflow is running.

  • If no one else has the workflow locked, and you are given the lock, then the lock icon is removed from the toolbar.

  • Unlocking a locked workflow: If a workflow seems to complete, but is still locked, then click Locked to unlock the workflow.

  • Refreshing a workflow: After a locked workflow has stopped running, you can refresh the workflow by clicking the refresh icon. You can also try to obtain the lock yourself by clicking the lock.

Scheduling a Workflow

Using the Workflow Schedule, you can define a schedule to run a workflow at a definite time and date.

You can also edit an existing Workflow Schedule and cancel any scheduled workflows. To create a workflow schedule:

  1. Click the arrow next to the Schedule Workflow icon.
  2. Select an option from the drop-down list:
    • Create Schedule: To create a schedule. The Create Schedule dialog box opens, where you can create a schedule for:

      • All Nodes

      • Selected Nodes: Select the nodes for which you want to create the schedule.

      • Selected Nodes and Parents: Select the nodes for which you want to create the schedule.

      • Selected Nodes and Children: Select the nodes for which you want to create the schedule.

      • Children of Selected Nodes Only: Select the nodes for which you want to create the schedule.

    • Edit Schedule

    • Cancel Schedule

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.

In the Create Schedule dialog box, you can create schedules for your workflows. To create a workflow schedule:
  1. Start Date: Select a date to set as the start date of the schedule. Click the calendar icon to select a date.
  2. Repeat: Select any one of the following options:
    • None: To schedule the workflow to run only once at the defined time.

    • Every Day: To schedule the workflow to run daily at the specified time.

    • Every Week: To schedule the workflow to run weekly at the specified time.

    • Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.

  3. End Repeat: You can select any one of the following options:
    • None: To continue running the workflow indefinitely at the scheduled frequency.

    • After: Select a number by clicking the arrows. The schedule stops after that many runs. For example, for an hourly schedule, if you select 8, then the workflow runs every hour and stops after 8 runs.

    • On Date: Select a particular date by clicking the calendar icon.

  4. Select Use Existing Schedule, and select a schedule from the drop-down list if you want to schedule the workflow as per the selected schedule.
    • Click the edit icon to edit the selected schedule in the Schedule dialog box.

    • Click the add icon to add a new schedule. You can also edit the selected schedule, and add it here.

    • Click the delete icon to delete the selected schedule.

  5. Click OK.

To save the workflow schedule settings, click the calendar icon. You can provide a name for the schedule in the Save a Schedule dialog box.


Repeat

In the Repeat dialog box, you can set how frequently the workflow should run. To set the repeat frequency:
  1. Frequency: Select an option to set how frequently your workflow should run.
  2. Every: The values in this field depend on the option that you select in the Frequency field. For example, if you select 2 in the Every field and Hourly in the Frequency field, then the workflow runs every 2 hours. If you select 2 in the Every field and Daily in the Frequency field, then the workflow runs every 2 days. Select an option as applicable.
  3. Click OK.

Repeat Hourly

If you select Hourly in the Frequency field, then select after how many hours the workflow should run.

  1. The Frequency field displays Hourly.
  2. In the Every field, select a number by clicking the arrow. This number determines after how many hours the workflow runs. For example, if you select 5, then the workflow runs every 5 hours.
  3. Click OK. This takes you to the Create Schedule dialog box.

Repeat Daily

If you select Daily in the Frequency field, then select after how many days the workflow should run.

  1. The Frequency field displays Daily.
  2. In the Every field, select a number by clicking the arrow. This number determines after how many days the workflow runs. For example, if you select 5, then the workflow runs every 5 days.
  3. Click OK. This takes you to the Create Schedule dialog box.

Repeat Weekly

If you select Weekly in the Frequency field, then select after how many weeks, and on which days of the week, the workflow should run.

  1. The Frequency field displays Weekly.
  2. In the Every field, select a number by clicking the arrow. This number determines after how many weeks the workflow runs. For example, if you select 2, then the workflow runs every 2 weeks.
  3. Select the days of the week on which you want to run the workflow.
  4. Click OK. This takes you to the Create Schedule dialog box.

Repeat Monthly

If you select Monthly in the Frequency field, then select after how many months, and on which dates of the month, the workflow should run.

  1. The Frequency field displays Monthly.
  2. In the Every field, select a number by clicking the arrow. This number determines after how many months the workflow runs. For example, if you select 2, then the workflow runs every 2 months.
  3. In the Days of Month section, select either of the following options:
    • Each: To select a date on which the workflow runs. For example, if you select 26, then the workflow runs on the 26th of every month.

    • On the: To select the days of the month on which the workflow runs. For example, if you select First and Monday from the two drop-down lists, then the workflow runs on the first Monday of every month.

  4. Click OK. This takes you to the Create Schedule dialog box.

Repeat Yearly

If you select Yearly in the Frequency field, then select after how many years the workflow should run, and select the months in the appropriate field.

  1. The Frequency field displays Yearly.
  2. In the Every field, select a number by clicking the arrow. This number determines after how many years the workflow runs.
  3. Select the months of the year in which the workflow should run.
  4. Click OK. This takes you to the Create Schedule dialog box.

Schedule

In the Schedule dialog box, you can edit a workflow schedule. To edit the selected workflow schedule:
  1. In the Name field, the name of the selected workflow schedule is displayed. This is a non-editable field.
  2. In the Start Date field, click the calendar icon to select a different date.
  3. In the Repeat field, select a different repeat option. To customize, select Custom and make necessary edits in the Repeat dialog box.
  4. In the End Repeat field, you may select any one of the following options:
    • Never: To continue running the workflow schedule indefinitely.

    • On Date: Click the calendar icon to select a particular date on which to end the running of the workflow.

  5. Click OK.

Save a Schedule

The Save a Schedule dialog box allows you to save the workflow schedule. To save a workflow schedule:
  1. In the Name field, provide a name for the workflow schedule.
  2. Click OK.

Advanced Settings

In the Advanced Settings dialog box, you can set up email notifications and settings related to workflow jobs and nodes. To set up email notifications and other settings:

  • In the Notification tab:

    1. Select Enable Email Notification to receive notifications.

    2. In the Recipients field, enter the email addresses to receive notifications.

    3. In the Subject field, enter an appropriate subject.

    4. In the Comments field, enter comments, if any.

    5. Select one or more events for which you want to receive the notifications:

      • Started: To receive notifications for all jobs that started.

      • Succeeded: To receive notifications for all jobs that succeeded.

      • Failed: To receive notifications for all jobs that failed.

      • Stopped: To receive notifications for all jobs that stopped.

    6. Click OK.

  • In the Settings tab:

    1. In the Time Zone field, select a time zone of your preference.

    2. In the Job Priority field, set the priority of the workflow job by placing the pointer between High and Low.

    3. Select Max Failure and set the maximum number of failed workflow executions.

    4. Select Max Run Duration and set the days, hours, and minutes for the maximum run time of the workflow job.

    5. Select Schedule Limit and set the days, hours, and minutes.

    6. Click OK.

  • In the Nodes tab, all workflow nodes that are scheduled to run are displayed. This is a read-only display.

Workflow Prerequisites

Before you perform any task with a workflow, the workflow prerequisites must be met.

The workflow prerequisites are:

  • Create and establish a connection to the database.

  • Create a project under which the workflow is created.

Workflow Script Requirements

Workflow script requirements include the following:

Script File Character Set Requirements

Ensure that all script files are generated using the UTF8 character set.

Scripts can contain characters from character sets that are not handled correctly unless the script file is generated using the UTF8 character set.
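
One way to meet this requirement when running scripts from a command line is to set the client character set to UTF8 before invoking SQL*Plus, so that the script files are interpreted correctly. A sketch for Linux; the connection string and script name are illustrative:

    export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
    sqlplus dmuser@orcl @workflow_master.sql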

Script Variable Definitions

Scripts have variable definitions that provide object names for the public objects created by the scripts.

The Master Script is responsible for calling all the underlying scripts in order, so the variable definitions must be defined in the Master Script.

The variable definitions enable you to change the names of the objects that are input to the scripts, such as tables or views, and models. By default, these names are the original table or view names and model names.

All generated scripts should be put under the same directory.
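
For illustration, the top of a master script might define substitution variables for the public object names and then call the underlying scripts from the same directory. This is a hypothetical sketch, not actual generated output; the variable and script names are illustrative:

    -- Variable definitions: override these to rename the created objects.
    DEFINE MODEL_NAME   = 'CLAS_NB_MODEL'
    DEFINE OUTPUT_TABLE = 'APPLY_OUTPUT_TAB'

    -- Call the underlying scripts in order; @@ resolves paths relative to
    -- this script's directory, which is why all scripts share one directory.
    @@workflow_cleanup.sql
    @@workflow_build.sql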

Scripts Generated

Scripts Generated is a type of deployment in which several general-purpose scripts, such as the master script and the cleanup script, are generated.

The generated scripts are:

  • Master Script: Starts all the required scripts in the appropriate order. The script performs the following:

    • Validates if the version of the script is compatible with the version of the Data Miner repository that is installed.

    • Creates a workflow master table that contains entries for all the underlying objects created by the workflow script.

    • Contains generated documentation covering key usage information necessary to understand the operation of the scripts.

  • Cleanup Script: Drops all objects created by the workflow script. The Cleanup Script drops the following objects:

    • Hidden objects, such as the table generated for Explore Data.

    • Public objects, such as Model Names created by Build Nodes.

    • Tables created by a Create Table Node.

  • Workflow Diagram Image: Creates an image (.png file) of the workflow at the time of script generation. The entire workflow is displayed.

The other scripts that are generated depend on the nodes in the chain. Table 4-1 lists the nodes and their corresponding script functionality.

Table 4-1 Nodes and Script Functionality

  • Data Source, Transform, Aggregate, Join, Filter Rows, Sample, Model Details, Apply, Apply Text, and Filter Columns Details nodes: Creates a view reflecting the output of the node.

  • Filter Columns node: Creates a view reflecting the output of the Filter Columns node, as other Transform type nodes do. If the Attribute Importance setting is specified, then a table is generated containing the Attribute Importance result (private).

  • Build Text node: For each text transformation, a feature table and an Oracle Text policy object are created. A view reflecting the output of the Build Text node is also created. This is essentially the same output as an Apply Text node.

  • Classification Build node: A model is created for each model build specification. A master test result table is generated to store the list of test result tables generated per model. A GLM Model Row Diagnostics table is created if row diagnostics are turned on. Each model test has one table generated for each of the following test results: Performance, Performance Matrix, and ROC (for binary classification only). Each model test also has one table for each of the Lift and Profit results per target value (up to a maximum of 100 target values).

  • Regression Build node: A model is created for each model build specification. A GLM Model Row Diagnostics table is created if row diagnostics are turned on. A master test result table is generated to store the list of test result tables generated per model. Each model test has one table generated for each of the Performance and Residual test results.

  • Clustering Build, Anomaly Detection Build, Feature Extraction Build, and Association Build nodes: A model is created for each model build specification.

  • Test node (Classification): A master test result table is generated to store the list of test result tables generated per model. A GLM Model Row Diagnostics table is created if row diagnostics are turned on. Each model test has one table generated for each of the following test results: Performance, Performance Matrix, and ROC (for binary classification only). Each model test also has one table for each of the Lift and Profit results per target value (up to a maximum of 100 target values).

  • Test node (Regression): A GLM Model Row Diagnostics table is created if row diagnostics are turned on. A master test result table is generated to store the list of test result tables generated per model. Each model test has one table generated for each of the Performance and Residual test results.

  • Model node and Text Reference node: No scripts are generated. These nodes are only reference nodes to metadata.

Running Scripts using SQL*Plus or SQL Worksheet

If you run generated scripts using either SQL*Plus or SQL Worksheet, and the scripts require input from users, then you must run a command before running the generated SQL.

Run the following command before the generated SQL:

set define off

You can run this command separately before running the generated SQL, or run it together with the generated SQL.
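
For example, a minimal SQL*Plus session might look like the following; the script name and path are hypothetical:

    set define off
    @/u01/scripts/workflow_master.sql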

About Nodes

Nodes define the data mining operations in a workflow.

A workflow consists of one or more nodes that are connected by a link. The node categories, listed in Table 4-2, are available in the Components pane.

Node Name and Node Comments

Every node must have a name and may have a comment. Name assignment is fully validated to assure uniqueness. Oracle Data Miner generates a default node name.

When a new node of a particular type is created, its default name is based on the node type, such as Class Build for a Classification node. If a node with that name already exists, then 1 is appended to the name to make it unique. So if Class Build exists, then the new node is named Class Build 1. If Class Build 1 exists, then 2 is appended, so that the third Classification node is named Class Build 2, and so on. Each node type is handled independently. For example, the Filter Columns node has its own sequence, separate from the sequence of Classification nodes.

You can change the default name to any name that satisfies the following requirements:

  • Node names must be unique within the workflow.

  • Node names must not contain any / (slash) characters.

  • Node names must be at least one character long. Maximum length of node name is 128 characters.

To change a node name, either change it in the Details tab of Properties or select the name in the workflow and type the new name.

Comments for the node are optional. Any comment must not be more than 4000 characters long.

Node Types

Oracle Data Miner provides different types of nodes for specific purposes such as build, model, linking, data mining operations, data transformation and so on.

Table 4-2 lists the different categories of nodes:

Table 4-2 Types of Nodes

Type Description

Model Nodes

They specify models to build or models to add to a workflow.

Model Operations

They evaluate and apply models.

Data Nodes

They specify data for mining operation, data transformation or to save data to a table.

Transforms Nodes

They perform one or more transformations on the table or tables identified in a Data node.

Predictive Query Nodes

They create predictive results without the need to build models. The predictive queries automatically generate refined predictions based on data partitions.

Text Nodes

They prepare data sources that contain one or more text columns so that the data can be used in Build and Apply models.

Link Nodes

They provide a way to link or connect nodes.

Node States

A node is always associated with a state which indicates its status.

Table 4-3 lists the states of a node.

Table 4-3 Node States

  • Invalid: Indicates that the node has not been completely defined and is not ready for execution. Most nodes must be connected to be valid; that is, they must have their input defined. A Data node does not need to be connected to be valid. Graphical indicator: an error icon.

  • Error: Indicates that the node attempted to run but encountered an error. For nodes that perform several tasks, such as building several models, any single failure sets the status of the node to Error. You must correct all problems to clear the Error state. If the problem is a server runtime failure not attributable to standard specification validations, then any change clears the Error state to Ready; to find out whether the problem is really fixed, run the node. Graphical indicator: an error icon.

  • Ready: Indicates that the node is properly defined and is ready for execution. Nodes in the Ready state are also known as Valid. Graphical indicator: none.

  • Complete: Indicates that the node execution completed successfully. Graphical indicator: a check mark.

  • Warning: Indicates that the node execution is complete, but not with the expected result. Graphical indicator: a warning icon.

Working with Nodes

You can perform the following tasks with any node.

Add Nodes or Create Nodes

You create or add nodes to the workflow.

For the node to be ready to run, you may have to specify information such as a table or view for a Data Source node or the target for a Classification node. To specify information, you edit nodes. You must connect nodes for the workflow to run. For example, you specify input for a Build node by connecting a Data node to a Build node.

To add nodes to a workflow:

  1. Load the workflow.
  2. In the Components pane, go to the Workflow Editor and expand the section for the node type. Drag and drop the node from the Components pane to the workflow.

Copy Nodes

You can copy one or more nodes, and paste them into the same workflow or into a different workflow.

Copy and paste does not carry with it any mining models or results that belong to the original nodes. Since model names must be unique, unique names are assigned to the new models, using the original names as a starting point.

For example, when a copied Build node is pasted into a workflow, the paste operation creates a new Build node with settings identical to those of the original node. However, models or test results do not exist until the node runs.


Edit Nodes

You can edit nodes in any one of the following ways:

Editing Nodes through Edit Dialog Box

The Edit dialog box for each node gives you the option to provide and edit settings related to the node.

To display the Edit dialog box of any node:

  1. Double-click the node or right-click the node and select Edit.
  2. The Edit dialog box opens. Make the edits to the node as applicable, and click OK.

For some nodes, such as the Data Source node, the Edit Data Source Node dialog box automatically opens either when the node is dropped into the workflow or when an input node is connected to the node.

Editing Nodes through Properties

The Properties pane of a node is the pane that opens when a node in a workflow is selected. You can dock this pane.

To edit a node through Properties pane:

  1. Click the node that you want to edit. The Properties pane for the node is displayed in the lower right pane.
  2. Click the Edit icon (pencil) to edit the node. For all other edits, click the respective sections in the Properties pane.

    All nodes have one common section named Details in the Properties pane. This section contains the node name and comment. The other sections in the Properties pane depend on the type of node.

  3. The corresponding Edit dialog box opens. Make the edits to the node as applicable, and click OK.

Link Nodes

Each link connects a source node to a target node.

A workflow is a directed graph. That is, it consists of nodes that are connected in order. To create a workflow, you connect or link nodes. When you connect two nodes, you indicate a relationship between the nodes. For example, to specify the data for a model build, link a Data Source node to a Model Build node.

You can link nodes, delete links, and cancel links using the following options:

Linking Nodes in Components Pane

You can connect two or more nodes by using the Linking nodes option.

To connect two nodes using the Linking Nodes option:

  1. In the Components pane, expand the Linking Nodes section.
  2. Click Link. Move the cursor to the source node and click. From the source node, drag the link that appears to the target node, and click again.
    • If the link is valid, then clicking the target node creates the link.

    • If the link is invalid, then the line does not terminate. To end the link process without completing a link, either press ESC or click in the Components pane.

Node Connection Dialog Box

In the Node Connection dialog box, you can link two or more nodes.

To link two nodes:

  1. In the Components pane, expand the Linking Nodes section.
  2. Click the Link icon and press ENTER. This opens the Select Source And Destination For A New Link dialog box.
  3. Select the source node in the Source List and the target node in the Destination List.

    The Source List lists all the nodes in the workflow. The Destination List lists the allowed target nodes for the selected node.

  4. Click OK.

Change Node Position

You can change the position of a node in a workflow.

You can change the position of workflow nodes in these ways:

  • Drag: To drag a node, click the node. Without releasing the mouse button, drag the node to the desired location. Links to the node are automatically repositioned.

  • Adjust with arrow keys: To adjust the position by small increments, select the node. Then press and hold the Shift and Ctrl keys, and use the arrow keys to move the node.

Connect Option in Diagram Menu

Use the Connect option in the diagram menu to link two or more nodes.

To use the Connect option:

  1. Select the source node in the workflow.
  2. In Data Miner menu bar, click Diagram and select Connect.
  3. Move the cursor from the source node to the target node and click again.
    • If the link is valid, then clicking the target node creates the link.

    • If the link is invalid, then the line does not terminate. To end the link process without completing a link, press ESC.

Deleting a Link

You can delete an existing link between two nodes by selecting the link and pressing the DELETE key.

Cancelling a Link

To cancel a link while you are creating it, press the ESC key or select another item in the Components pane.

Refresh Nodes

Nodes such as Data Source node, Update Table node, and Model node rely on database resources for their definition. It may be necessary to refresh a node definition if the database resources change.

To refresh a node:

  1. Click the node that you want to refresh.
  2. In the right pane, go to Properties, and click the applicable section:
    • Data: For Data Source node

    • Model: For a Model node

    • Columns: For an Update Table node

  3. Click the refresh icon. Data Miner connects to the database and validates the attributes or models. The status of the attribute or model is set to either:
    • Valid: Indicated by the node without any mark on its top right corner.

    • Invalid: Indicated by the error mark or warning mark on the top right corner of the node.

Other possible validation error scenarios include:

  • Source table not present: Error message states that the source table or attribute is missing. You have the option to:

    • Select another table or replace a missing attribute.

    • Leave the node in its prior state by clicking Cancel.

  • Model is invalid: For Model nodes, the node becomes Invalid if a model is found to be invalid. Click the Model refresh button on the Model toolbar in Properties to make the node Valid. The Model node cannot be run until the node is made Valid.

  • Model is missing: If the Model node is run and the server determines that the model is missing, then the node is set to Error. You can rerun the model after the missing model is replaced.

Run Nodes

You perform the tasks specified in the workflow by running one or more nodes.

If a node depends on the outputs of one or more parent nodes, then the parent nodes run automatically only if the outputs required by the running node are missing. You can also run one or more nodes by selecting the nodes and then clicking the Run icon in the workflow toolbar.

Note:

Nodes cannot always be run. If any ancestor node is in the Invalid state and its outputs are missing, then the child nodes that depend on it cannot run.

Each node in a workflow has a state that indicates its status.

You can run a node by clicking the following options in the context menu:

  • Run

  • Force Run


Performing Tasks from the Node Context Menu

The context menu options depend on the type of the node. The menu provides shortcuts to perform various tasks and view information related to the node.

Right-click a node to display the context list for the node. The context menu includes the following:

Connect

Use the Connect option to link nodes in a workflow.

To connect nodes:

  1. Right-click a node and click Connect. Alternatively, go to Diagram and click Connect.
  2. Use the cursor to draw a line from this node to the target node.
  3. Click to establish the connection. Note the following:
    • You can create only valid connections.

    • You can create only one connection between any two nodes.

    • You can remove a connection by pressing the ESC key.


Run

Use the Run option to execute the tasks specified in the nodes that comprise the workflow.

The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:

  • To run one node: Right-click the node and select Run.

  • To run multiple nodes simultaneously: Select the nodes by holding down the Ctrl key and click each individual node. Then right-click any selected node and select Run.

If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.

Force Run

Use the Force Run option to rerun one or more nodes that are complete.

Force Run deletes any existing models before building them once again.

To select more than one node, click the nodes while holding down the Ctrl key.

You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:

  • Selected Node

  • Selected Node and Children (available if the node has child nodes)

  • Child Node Only (available if the node has one or more child nodes)

  • Selected Node and Parents (available if the node has parent nodes)

Create Schedule

Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.

In the Create Schedule dialog box, you can create schedules for your workflows. To create a workflow schedule:
  1. Start Date: Select a date to set as the start date of the schedule. Click the calendar icon to select a date.
  2. Repeat: Select any one of the following options:
    • None: To schedule the workflow to run only once at the defined time.

    • Every Day: To schedule the workflow to run daily at the specified time.

    • Every Week: To schedule the workflow to run weekly at the specified time.

    • Custom: To customize your workflow schedule, click Custom. This opens the Repeat dialog box, where you can set how frequently the workflow should run.

  3. End Repeat: You can select any one of the following options:
    • None: To continue running the workflow indefinitely at the scheduled frequency.

    • After: Select a number by clicking the arrows. The schedule stops after that many runs. For example, for an hourly schedule, if you select 8, then the workflow runs every hour and stops after 8 runs.

    • On Date: Select a particular date by clicking the calendar icon.

  4. Select Use Existing Schedule, and select a schedule from the drop-down list if you want to schedule the workflow as per the selected schedule.
    • Click the edit icon to edit the selected schedule in the Schedule dialog box.

    • Click the add icon to add a new schedule. You can also edit the selected schedule, and add it here.

    • Click the delete icon to delete the selected schedule.

  5. Click OK.

To save the workflow schedule settings, click the calendar icon. You can provide a name for the schedule in the Save a Schedule dialog box.


Edit

Use the Edit option to edit the default settings of a node.

Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:

  • Edit nodes using the Edit dialog box

  • Edit nodes through Properties UI

View Data

Use the View Data option to view the data contained in a Data node.

The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.


View Models

Use the View Models option to view the details of the models that are built after running the workflow.

To view a model, select the model from the list; the model viewer opens. A model must be built successfully before it can be viewed.

Generate Apply Chain

Use the Generate Apply Chain option to create a new node that contains the specification of a node that performs a transformation.

If you have several transformations performed in sequence, for example, a Sample transformation followed by a Custom transformation, then you must select Generate Apply Chain for each transformation in the sequence. You must connect the individual nodes and connect them to an appropriate data source.

Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.

The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.

Refresh Input Data Definition

Use the Refresh Input Data Definition option if you want to update the workflow with new columns, that are either added or removed.

The Refresh Input Data Definition option is equivalent to a SELECT * capability in the input source. The option allows you to quickly refresh your workflow definitions to include or exclude columns, as applicable.

Note:

The Refresh Input Data Definition option is available as a context menu option in Data Source nodes and SQL Query nodes.

Show Event Log

Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages.

Clicking the Show Event Log option opens the Event Log dialog box.


Deploy

Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.

The scripts generated by Deploy are saved to a directory.

Note:

You must run a node before deploying it.

You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration or as a lightweight deployment alternative to installing the Data Miner repository and workflows on the target and production systems.

To deploy a workflow or part of a workflow:

  1. Right-click a node and select Deploy.
  2. Select any one of the deployment options:
    • Selected node and dependent nodes

    • Selected node, dependent nodes, and child nodes

    • Selected node and connected nodes

  3. After you select the deployment option, the Generate SQL Script wizard opens. In the wizard, enter the required details to complete the deployment.

Show Graph

The Show Graph option opens the Graph Node Editor.

All graphs are displayed in the Graph Node Editor.


Cut

Use the Cut option to remove the selected object, which could be a node or connection.

You can also delete objects by selecting them and pressing DELETE on your keyboard.

Copy

Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.

To copy and paste nodes:

  1. Select the nodes to copy. To select several nodes, hold down the Ctrl key when you click the nodes.

    The selected nodes are highlighted.
  2. Right-click and select Copy from the context menu. Alternatively, press Ctrl+C to copy the selected nodes.

Note:

Copying and pasting nodes does not carry any mining models or results from the original nodes.

Paste

Use the Paste option to paste the copied object in the workflow.

To paste an object, right-click the workflow and click Paste. Alternatively, press Ctrl+V.

Note:

Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.


Extended Paste

Use the Extended Paste option to preserve node and model names while pasting them.

The default behavior of Paste is to change node names and model names to avoid naming collisions.

To use the Extended Paste option, right-click the workflow and click Extended Paste. Alternatively, press Ctrl+Shift+V.

Note:

If model names are not unique, then the models may be overwritten when they are rebuilt.


Select All

Use the Select All option to select all the nodes in a workflow.

The selected nodes and links are highlighted in a dark blue border.

Performance Settings

Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.

If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:

  • Click Parallel Settings and select:

    • Enable: To enable parallel settings in the selected nodes in the workflow.

    • Disable: To disable parallel settings in the selected nodes in the workflow.

    • All: To turn on parallel processing for all nodes in the workflow.

    • None: To turn off parallel processing for all nodes in the workflow.

  • Click In-Memory Settings and select:

    • Enable: To enable In-Memory settings for the selected nodes in the workflow.

    • Disable: To disable In-Memory settings for the selected nodes in the workflow.

    • All: To turn on In-Memory settings for the selected nodes in the workflow.

    • None: To turn off In-Memory settings for all nodes in the workflow.

  • Click the pencil icon to open the Edit Node Performance Settings dialog box, where you can set the Degree of Parallel, and In-Memory settings such as Compression Method and Priority Level.

    If you specify parallel settings for at least one node, then this indication appears in the workflow title bar:

    Performance Settings is either On (for Selected nodes), On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.

  • Click the edit icon to edit the default preferences for parallel processing:

    • Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from Preferences in the SQL Developer Tools menu.

    • Change Settings to Default: Changes the settings of the selected nodes back to the default settings.

Toolbar Actions

Use the Toolbar Actions option to select toolbar actions from the context menu.

Current actions are Zoom In and Zoom Out.

Show Runtime Errors

Use the Show Runtime Errors option to view errors related to node failure at runtime. This option is displayed only when a node fails at runtime.

The Event Log opens with a list of errors. Select the error to see the exact message and details.


Show Validation Errors

Use the Show Validation Errors option to view validation errors, if any.

This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.

You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.

Save SQL

Use the Save SQL option to generate SQL script for the selected node.

To generate SQL script for the selected node:

  1. Right-click the node and click Save SQL.
  2. Select any one of the options to save the generated SQL script:
    • SQL to Clipboard

    • SQL to File

    • SQL Script to Clipboard

    • SQL Script to File

    When you save to a file, the system provides a default location. You can browse to change this location. You can also create a folder for scripts.

    The saved SQL includes SQL generated by the current node and all of its parent nodes that are data providers. The SQL lineage ends when it encounters a node that represents persisted objects, such as tables or models.

    The generated SQL does not replicate all behavior of the node, and it does not create any objects. For example, if you select Save SQL for a Create Table node, then the script does not create the table; instead, it queries the created table.
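As an illustration, here is a minimal sketch of what saved SQL for such a node might look like, assuming a hypothetical Create Table node that created a table named MINING_DATA (the table name is illustrative, not the product's documented output):

    -- Hypothetical: the node persisted the table MINING_DATA, so the
    -- saved SQL queries the table rather than re-creating it.
    SELECT *
    FROM mining_data;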

Validate Parents

Use the Validate Parents option to validate all parent nodes of the current node.

To validate parent nodes of a node, right-click the node and select Validate Parents.

You can validate parent nodes when the node is in the Ready, Complete, or Error state. All parent nodes must be in the Complete state.

Compare Test Results

Use the Compare Test Results option to view and compare test results of models that are built successfully.

For Classification and Regression models, this option displays the test results for all successfully built models to allow you to pick the model that best solves the problem.

View Test Results

Use the View Test Results option to view the test results of the selected model. This option is applicable only for Classification and Regression models.

The test results are displayed in the test viewer of the respective model.

Go to Properties

Use the Go to Properties option to open the Properties pane of the selected node.

Navigate

Use the Navigate option to view the links available from the selected node.

Note:

The Navigate option is enabled only if there are links to other nodes.

Navigate displays the collection of links available from the selected node. When you select a link, it is highlighted in the workflow. The link itself also has context menu options, so you can right-click the link and continue with the Navigate option. You can also use the arrow keys to move to the next node.

About Parallel Processing

In Parallel Query or Parallel Processing, multiple processes work simultaneously to run a single SQL statement.

Oracle Data Miner uses the specifications in a workflow to create SQL queries. These queries are passed to, and run in, Oracle Database.

By dividing the work among multiple processes, the Oracle Database can run the statement more quickly. For example, suppose four processes handle four different quarters in a year instead of one process handling all four quarters by itself.
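As an illustration, the following is a minimal sketch of a parallel query of this kind; the SALES table, its columns, and the degree of 4 are assumptions for the example, not objects created by Oracle Data Miner:

    -- The PARALLEL hint asks the optimizer to scan SALES with up to
    -- four parallel processes, for example one per quarter of data.
    SELECT /*+ PARALLEL(s, 4) */
           TO_CHAR(sale_date, 'Q') AS quarter,
           SUM(amount)             AS total
    FROM sales s
    GROUP BY TO_CHAR(sale_date, 'Q');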

The benefits of Parallel Processing:

  • Reduces response time for data-intensive operations on large databases such as data warehouses.

  • Enhances performance on symmetric multiprocessing (SMP) systems, because statement processing is split among multiple CPUs. Certain types of OLTP and hybrid systems also benefit from parallel processing.

In Oracle RAC systems, the service placement of a specific service controls parallel processing. Specifically, parallel processes run on the nodes on which the service is configured. By default, Oracle Database runs parallel processes only on an instance that offers the service used to connect to the database. This does not affect other parallel operations such as parallel recovery or the processing of GV$ queries.

Parallel processing must be configured by a Database Administrator (DBA). For more information on parallel processing in Oracle Database, see:

  • Oracle Database Data Warehousing Guide or Oracle Database VLDB and Partitioning Guide for more information about parallel processing

  • Oracle Real Application Clusters Administration and Deployment Guide for considerations about parallel processing in Oracle RAC environments

Parallel Processing Use Cases

This section lists some common use cases where you can use parallel processing.

Premise of the Parallel Processing Use Case

The use cases for parallel processing list the conditions under which models are built in parallel.

Premise of the use case:

  • If the input source for a model is a table defined as parallel, and no intervening workflow node generates a table that changes this state, then models are built in parallel without any changes to the workflow settings (see the sketch after this list).

  • If the input source is not already defined in parallel, you can still build the model in parallel:

    • For Classification and Regression models: Turn on Parallel Processing for the workflow. The split setting for Classification and Regression must be Split input into Tables with the Parallel option.

    • For all models: Turn on Parallel Processing for the workflow. Insert a Create Table node before the Build node. Use the created table as input to the models.
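The following is a minimal sketch of the first premise, using hypothetical table names; a table created with the PARALLEL clause lets downstream model builds run in parallel without workflow changes:

    -- Hypothetical input table defined as parallel; models built from
    -- it can be built in parallel with no workflow setting changes.
    CREATE TABLE mining_input PARALLEL 4 AS
    SELECT * FROM customer_data;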

Making Transformations Run Faster Using Parallel Processing and Other Methods

You can make transformations run faster by using Parallel Processing and other techniques.

The techniques are:

  • Turning on Parallel Processing for the workflow: All nodes that take some form of sample input can benefit. If the sample size is small, then Oracle Database may not generate parallel queries, but a complex query can still trigger parallel processing.

  • Adding a Create Table node: You can add a Create Table node after expensive transformations to reduce repetitive querying costs.

  • Adding an index to a Create Table node: You can add an index to a Create Table node to improve the performance of downstream joins (see the sketch after this list).
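The following is a minimal sketch that combines the last two techniques, using hypothetical table and column names; an expensive transformation is materialized once, and the join key is indexed for downstream nodes:

    -- Materialize an expensive transformation once, in parallel...
    CREATE TABLE transformed_sales PARALLEL 4 AS
    SELECT cust_id, LOG(10, 1 + amount) AS log_amount
    FROM raw_sales;

    -- ...then index the join key that downstream nodes use.
    CREATE INDEX transformed_sales_ix ON transformed_sales (cust_id);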

Running Graph Node in Parallel

You can run Graph nodes in parallel. Even if no other nodes are parallel, performance may improve.

To run a Graph node in parallel:

  1. Set parallel processing for the entire workflow.
  2. Turn off parallel processing for all nodes except for the Graph nodes.
  3. Run the Graph nodes. Graph node sample data is now generated in parallel. If the Graph node sample is small or the query is simple, then the query may not run in parallel.

Running a Node in Parallel to Test Performance

You can run parallel processing on a node just once to see if parallel processing results in improved performance.

To run parallel processing:

  1. Set parallel processing for the entire workflow.
  2. Run the workflow. Note the performance.
  3. Now, turn off parallel processing for the workflow.

Oracle Data Mining Support for Parallel Processing

Model scoring is done in parallel for all algorithms and data in Oracle Database 12.1 and later.

Not all algorithms support parallel build. For Oracle Database 12.1 and later, the following algorithms support parallel build:

  • Decision Trees

  • Naive Bayes

  • Minimum Description Length

  • Expectation Maximization

All other algorithms support serial build only.

Setting Parallel Processing for a Node or Workflow

By default, parallel processing is set to OFF for any node type.

Even if parallel processing is set to ON as a preference, you can override the setting for a specific workflow or node.

To set parallel processing for a node or workflow:

  1. Right-click the node and select Performance Settings from the context menu. The Edit Selected Node Settings dialog box opens.
  2. Edit the Parallel Settings and In-Memory Settings as required.
  3. Click OK.


Edit Node Performance Settings

The Edit Node Performance Settings dialog box opens when you click the edit (pencil) icon in the Edit Selected Node Settings dialog box. You can set Parallel Processing settings and In-Memory settings for one or all nodes in the workflow.

To set Parallel Query settings and In-Memory settings:
  • Select Parallel Query On to set parallel processing for the node. Note that even if you specify parallel processing for a node type, the query generated by the node may not run in parallel.

    • System Determined: This is the default degree of parallelism.

    • Degree Value: To specify a value for degree of parallelism, select this option and choose a value by clicking the arrows. The default value is 1. The specified value is displayed in the Degree of Parallel column for the node type in the Performance Option and Edit Selected Node Settings dialog boxes.

  • Select the In-Memory Columnar option to set the compression method and priority level for the selected node. The selected settings are displayed in the In-Memory Settings option in the Performance Options dialog box.

    Note:

    The In Memory option is available in Oracle Database 12.1.0.2 and later.
    • Compression Method: Allows you to set a compression method for the data.

      • None: In this method, the data is not compressed.

      • Low: This method optimizes the data for DML operations and compresses the IM column store the least.

      • Medium: This method results in the best query performance.

      • High: This method results in excellent query performance.

      • Higher: This method results in good query performance.

      • Highest: This method results in fair query performance.

    • Priority Level: The priority level that you set here determines when the data of the database object is populated in the IM column store (see the sketch after this list).

      • None: This is the default setting when priority is not included in the INMEMORY clause.

      • Low: The data is populated before database objects with the priority level NONE, and after database objects with the priority levels MEDIUM, HIGH, or HIGHEST.

      • Medium: The data is populated before database objects with the priority levels NONE or LOW, and after database objects with the priority levels HIGH or HIGHEST.

      • High: The data is populated before database objects with the priority levels NONE, LOW, or MEDIUM, and after database objects with the priority level HIGHEST.

      • Highest: The data is populated before database objects with the priority levels NONE, LOW, MEDIUM, or HIGH.
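For reference, the following is a minimal sketch of the INMEMORY DDL that these dialog settings correspond to conceptually; the table name is hypothetical, and the exact MEMCOMPRESS level behind each dialog choice is an assumption:

    -- MEMCOMPRESS and PRIORITY are the INMEMORY subclauses behind the
    -- dialog's Compression Method and Priority Level settings.
    ALTER TABLE mining_input INMEMORY
      MEMCOMPRESS FOR QUERY LOW   -- favors query performance
      PRIORITY HIGH;              -- populated before NONE, LOW, MEDIUM objects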

Edit Node Parallel Settings

In the Edit Node Parallel Settings dialog box, you can provide parallel query settings for the selected node.

To set parallel query settings for the selected node:

  1. Click Parallel Query On.
  2. For Degree of Parallel, select:
    • System Determined: This is the default degree of parallelism.

    • Degree Value: To specify a value for degree of parallelism, select this option and choose a value by clicking the arrows. The default value is 1. The specified value is displayed in the Degree of Parallel column for the node type in the Performance Options and Edit Selected Node Settings dialog boxes.

  3. Click OK.

About Oracle Database In-Memory

The In-Memory Column store (IM column store) is an optional, static System Global Area (SGA) pool that stores copies of tables and partitions in a special columnar format in Oracle Database 12c Release 1 (12.1.0.2) and later.

The IM column store does not replace the buffer cache. It acts as a supplement so that both memory areas can store the same data in different formats.

Benefits of Oracle Database In-Memory Column Store

The IM column store enables the database to perform scans, joins, and aggregates much faster than when it uses the on-disk format exclusively.

The IM column store is particularly useful for:

  • Scanning many rows and applying filters that use operators such as =, <, >, and IN (see the sketch after this list)

  • Querying a subset of columns in a table. For example, selecting 5 of 100 columns

  • Accelerating joins by converting predicates on small dimension tables into filters on a large fact table.
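The following is a minimal sketch of the kind of query the IM column store accelerates, using hypothetical table and column names; it scans many rows, applies = and IN filters, and reads only a few of the table's columns:

    -- Columnar scan with filters; only CHANNEL, CUST_STATE, and AMOUNT
    -- are read, regardless of how many columns SALES has.
    SELECT cust_state, SUM(amount) AS total
    FROM sales
    WHERE channel = 'WEB'
      AND cust_state IN ('CA', 'NY')
    GROUP BY cust_state;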

Business applications, ad-hoc analytic queries, and data warehouse workloads benefit most. Pure OLTP databases that perform short transactions using index lookups benefit less.

The IM column store also provides the following advantages:

  • All existing database features are supported, including High Availability.

  • No application changes are required. The optimizer automatically takes advantage of the columnar format.

  • Configuration is simple.

    The INMEMORY_SIZE initialization parameter specifies the amount of memory reserved for use by the IM column store. DDL statements specify the tablespaces, tables, partitions, or columns to be read into the IM column store (see the sketch after this list).

  • Compression is optimized for query performance.

    These compression techniques increase the effective memory bandwidth by enabling sessions to read more data into memory.

  • Fewer indexes, materialized views, and OLAP cubes are required.

    The reduction in the number of pre-built objects results in reduced storage space and significantly less processing overhead.
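The following is a minimal configuration sketch; the size value and table name are illustrative assumptions:

    -- Reserve memory for the IM column store (takes effect on restart)...
    ALTER SYSTEM SET INMEMORY_SIZE = 2G SCOPE=SPFILE;

    -- ...then mark a table for population in the IM column store.
    ALTER TABLE sales INMEMORY PRIORITY MEDIUM;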

Use Cases of Oracle Database In-Memory

You can use the Oracle Database In-Memory feature in areas where you must handle vast amounts of data, and where you need real-time information based on complex analysis of that data.

Some of the most common scenarios where Oracle Database In-Memory can be used are:

  • Stock trading analysis: To analyze and execute trading data and generate information for brokers.

  • Telecom routing: To route telecom connections based on real-time data about connection status, load, errors, and response time of the nodes within the network. In this scenario, the goal is to provide an optimal telecom route within a second or less after a number is dialed.

  • Fraud detection: To detect deviant patterns or anomalous transactions by running the detection rules on In-Memory database tables. The complex fraud detection analysis can be done quickly, and relevant notifications can be sent immediately after a deviation is detected.