7 Transforms Nodes
A Transforms node performs one or more transformations on the table or tables identified in a Data node.
Transformations are available in the Transforms section in the Components pane. The Evaluate and Apply Data nodes must be prepared in the same way that build data was prepared.
- Aggregation
  Aggregation is the process of consolidating multiple values into a single value.
- Data Viewer
  You can view data when the Transform node is in a valid state.
- Expression Builder
  Expression Builder helps you to enter and validate SQL expressions, such as constraints for filters.
- Filter Columns Node
  Filter Columns filters out columns so that the columns are not used in subsequent workflow calculations.
- Filter Columns Details
  The Filter Columns Details node creates a data flow that consists of the result of Attribute Importance.
- Filter Rows
  A Filter Rows node enables you to select rows by specifying a SQL statement that describes the rows.
- Join
  A Join node combines data from two or more Data Source nodes into a new data source.
- JSON Query
  The support for the JSON data format in Oracle Data Miner (SQL Developer 4.1 and later) is facilitated by the JSON Query node.
- Sample
  You can sample your data in the Sample tab.
- Transform
  A Transform node can use either sampled data or all data to calculate statistics.
Aggregation
Aggregation is the process of consolidating multiple values into a single value.
For example, you could aggregate sales in several states into sales for a region that consists of several states. To perform aggregation, use an Aggregate node.
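As a sketch of what an Aggregate node produces, the state-to-region consolidation above corresponds to a SQL GROUP BY. The table and column names below are illustrative only, and Python's sqlite3 stands in for the database:

```python
import sqlite3

# Hypothetical sales table; names are illustrative, not from the product.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("CA", "West", 100.0), ("OR", "West", 50.0), ("NY", "East", 80.0)],
)

# The kind of SQL an Aggregate node generates: consolidate state-level
# rows into one row per region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total_sales "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 80.0), ('West', 150.0)]
```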
These topics describe Aggregate nodes:
- Creating Aggregate Nodes
  You must identify a Data Source node and columns to aggregate to create an Aggregation node.
- Editing Aggregate Nodes
  You can define and edit aggregation elements of an Aggregate node in the Edit Aggregate Node dialog box.
- Aggregate Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Aggregate Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating Aggregate Nodes
You must identify a Data Source node and columns to aggregate to create an Aggregation node.
Related Topics
Parent topic: Aggregation
Editing Aggregate Nodes
You can define and edit aggregation elements of an Aggregate node in the Edit Aggregate Node dialog box.
To edit an Aggregate node:
- Edit Group By
  In the Edit Group By dialog box, you can edit the Group By settings for aggregation.
- Define Aggregation
  You can define aggregations using the Define Aggregations wizard.
- Edit Aggregate Element
  You can define or modify the individual elements of an aggregation.
- Add Column Aggregation
  In the Add Column Aggregation dialog box, you can define how a column is aggregated.
- Add Custom Aggregation
  In the Add Custom Aggregation dialog box, you can define expressions for custom aggregation.
Parent topic: Aggregation
Edit Group By
In the Edit Group By dialog box, you can edit the Group By settings for aggregation.
The default Type is Column. You can change the Type to Expression.
- If Type is Column, then select one or more columns in the Available Attributes list. You can search the list by name or by data type. Move the selected columns to the Selected Attributes list using the arrows.
- If Type is Expression, then type an appropriate expression in the Expression box. Click Validate to validate the expression.
After you are done, click OK.
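A Group By of type Expression groups on a computed value rather than a plain column. A minimal sketch, with a hypothetical table, run against SQLite through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO cust VALUES (?, ?)",
                 [("a", 23), ("b", 27), ("c", 41)])

# Group By with Type = Expression: the grouping key is a SQL expression
# rather than a column -- here, a decade-of-age bucket.
rows = conn.execute(
    "SELECT (age / 10) * 10 AS decade, COUNT(*) FROM cust "
    "GROUP BY (age / 10) * 10 ORDER BY decade"
).fetchall()
print(rows)  # [(20, 2), (40, 1)]
```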
Parent topic: Editing Aggregate Nodes
Define Aggregation
You can define aggregations using the Define Aggregations wizard.
To define aggregations:
Parent topic: Editing Aggregate Nodes
Edit Aggregate Element
You can define or modify the individual elements of an aggregation.
To define or modify the individual element:
- In Output, you can provide a name. To provide a name, deselect Auto Name and enter a name. By default, Auto Name is selected. Output is the name of the column that holds the results of the aggregation.
- Select or change the Column that is being aggregated.
- Select the function to apply to the column. The functions available depend on the data type of the column.
- Click Edit to define a new Sub-Group By column. The Edit Group By dialog box opens.
- Once done, click OK.
Related Topics
Parent topic: Editing Aggregate Nodes
Add Column Aggregation
In the Add Column Aggregation dialog box, you can define how a column is aggregated.
To add an attribute:
- Click the add icon.
- To provide a name, deselect Auto Name and type in the name. By default, Auto Name is selected. Output is the name of the column that holds the results of the aggregation.
- Select Columns to aggregate from the list.
- Select a Function to apply to the column. The functions available depend on the data type of the column. For example, you can specify average (AVG) for a numeric value. For DATE and TIMESTAMP data types, the available functions are COUNT(), COUNT(DISTINCT()), MAX(), MEDIAN(), MIN(), and STATS_MODE().
- To define a Sub Group By column, click Edit. The Edit Group By dialog box opens. It is not necessary to define a Sub Group By column.
- After you are done, click OK.
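To illustrate per-column aggregation on a DATE-typed column, the sketch below applies COUNT, MIN, and MAX; MEDIAN and STATS_MODE are Oracle-specific and are omitted so the example runs on SQLite. Table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", "2024-01-05"), ("A", "2024-03-10"), ("B", "2024-02-01")],
)

# COUNT, MIN, and MAX are among the functions offered for DATE columns.
rows = conn.execute(
    "SELECT cust, COUNT(*), MIN(order_date), MAX(order_date) "
    "FROM orders GROUP BY cust ORDER BY cust"
).fetchall()
print(rows)
# [('A', 2, '2024-01-05', '2024-03-10'), ('B', 1, '2024-02-01', '2024-02-01')]
```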
Related Topics
Parent topic: Editing Aggregate Nodes
Add Custom Aggregation
In the Add Custom Aggregation dialog box, you can define expressions for custom aggregation.
To add a custom aggregation, click the add icon and follow these steps:
Related Topics
Parent topic: Editing Aggregate Nodes
Aggregate Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Aggregate Node Properties pane has these sections:
- Columns: the columns for the aggregation.
- Cache: provides the option to generate a cache for output data.
- Details: displays the name of the node and any comments about it.
Related Topics
Parent topic: Aggregation
Cache
The Cache section provides the option to generate a cache for output data.
By default, the cache is not generated. You can generate the cache to optimize the viewing of results.
If you generate the cache, then specify the sampling size as either:
- Number of rows: The default is 2000 rows.
- Percent: The default is 60 percent.
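The two sampling-size options behave as sketched below. This is a hypothetical helper for illustration, not product code; the node performs its sampling in the database:

```python
import random

def sample_rows(rows, number=None, percent=None, seed=0):
    """Sample either a fixed number of rows or a percentage of rows,
    matching the two cache options (defaults there: 2000 rows or 60%)."""
    rnd = random.Random(seed)
    if number is not None:
        k = min(number, len(rows))  # cannot sample more rows than exist
    else:
        k = round(len(rows) * percent / 100)
    return rnd.sample(rows, k)

data = list(range(1000))
print(len(sample_rows(data, number=2000)))  # capped at the 1000 available rows
print(len(sample_rows(data, percent=60)))   # 600
```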
Parent topic: Aggregate Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
- Node Name
- Node Comments
Parent topic: Aggregate Node Properties
Aggregate Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click an Aggregation node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings: Opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors: Displayed only if there are errors.
- Show Validation Errors: Displayed if there are validation errors.
Related Topics
Parent topic: Aggregation
Data Viewer
You can view data when the Transform node is in a valid state.
To view data, right-click the node and select View Data from the context menu. The data viewer opens.
The data viewer has these tabs:
- Data
  The Data tab displays a sample of the data.
- Graph
  The Graph tab enables you to create graphs from numeric data.
- Columns
  The Columns tab is a list of all the columns that are output from the node.
- SQL
  In the SQL tab, the SQL Details text area displays the SQL code that generated the data displayed in the Data tab.
Parent topic: Transforms Nodes
Data
The Data tab displays a sample of the data.
The Data Viewer provides grid display of rows of data either from the sampling defined in the cache or pulled through the lineage of nodes back to the source tables.
The display is controlled with:
- Refresh: To refresh the display, click the refresh icon.
- View: Enables you to select either Cached Data or Actual Data.
- Sort: Displays the Select Column to Sort By dialog box.
- Filter: Enables you to type in a WHERE clause to select data.
Parent topic: Data Viewer
Select Column to Sort By
The Select Column to Sort By dialog box enables you to:
- Select multiple columns to sort.
- Determine the column ordering.
- Determine ascending or descending order by column.
- Specify Nulls First so that null values appear ahead of real data values.
The sort order is preserved until you clear it.
Column headers are also sort-enabled to provide a temporary override to the sort selection.
Parent topic: Data
Graph
The Graph tab enables you to create graphs from numeric data.
Related Topics
Parent topic: Data Viewer
Columns
The Columns tab is a list of all the columns that are output from the node.
The display in the tab depends on the following conditions:
- If the node has not run, then the Table or View structure provided by the database is displayed.
- If the node has run successfully, then the structure of the sampled table is displayed. This is based on the sampling defined when the node was specified.
For each column, the following are displayed:
- Name
- Data type
- Mining type
- Length
- Precision
- Scale (for floating point)
- Column ID
There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and) suffixes allow you to enter multiple strings separated by spaces. For example, if Name/Data Type/Mining Type(or) is selected, then the filter string A B produces all columns where the name or the data type or the mining type starts with the letter A or B.
Parent topic: Data Viewer
SQL
In the SQL tab, the SQL Details text area displays the SQL code that generated the data provided by the actual view displayed in the Data tab.
The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.
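The stacking works by wrapping each parent node's SQL as an inline view. A sketch with hypothetical node SQL (the real generated SQL depends on the workflow):

```python
# Each node's SQL wraps its parent's SQL as an inline view, so the
# query shown in the SQL tab is a stacked expression. Node queries
# and column names here are hypothetical.
source_sql = "SELECT * FROM MINING_DATA"                     # Data Source node
filter_sql = f"SELECT * FROM ({source_sql}) WHERE AGE > 30"  # Filter Rows node
agg_sql = (f"SELECT CUST_GENDER, COUNT(*) CNT "
           f"FROM ({filter_sql}) GROUP BY CUST_GENDER")      # Aggregate node
print(agg_sql)
```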
You can perform the following tasks:
- Copy and run the SQL query in a suitable SQL interface. The following options are enabled:
  - Select All (Ctrl+A)
  - Copy (Ctrl+C)
- Search text. The Search control is a standard search control that highlights matching text and searches forward and backward.
Parent topic: Data Viewer
Expression Builder
Expression Builder helps you to enter and validate SQL expressions, such as constraints for filters.
An expression is a SQL statement or clause that transforms data or specifies a restriction. Expression Builder displays the available columns, provides a selection of functions and commonly used operators, and validates expressions.
To build and validate expressions in Expression Builder:
- Click the Expression Builder icon in the Add Custom Transform dialog box. The Expression Builder dialog box opens.
- The Expression Builder dialog box has the following components:
  - Attributes: Lists attributes (columns) in the source data. To insert an attribute into the expression at the current character position, or to replace characters that you selected, double-click the attribute.
  - Functions: Lists the commonly used SQL functions, divided into folders. Double-click a folder to view the functions listed there. To insert a function into the expression at the current character position, or to replace characters that you selected, double-click the function.
  - Expression: The expression that you create is displayed in the Expression box. You can create an expression in any one of the following ways:
    - Type the expression in the Expression box directly.
    - Add attributes and functions by double-clicking them in the Attributes tab and the Functions tab respectively. To add an operator into the expression, click the operator.
  - Commonly used operators are listed below the Expression box. Click the appropriate operator, indicated by its symbol. If you prefer, you can enter an operator directly into the Expression box. Table 7-1 lists the operators that you can enter:
Table 7-1 Commonly Used Operators
- Less than: <
- Greater than: >
- Less than or equal to: <=
- Greater than or equal to: >=
- Not equal to: !=
- Equal to: =
- Or (Logical Or): ...
- And: ...
- Left parenthesis: (
- Right parenthesis: )
- Parallel symbol (concatenation): ||
- Add: +
- Minus: -
- Multiply: *
- Divide: /
- Percent: %
- Validation Results (read-only): Displays the validation results.
- Validate: Click Validate to validate the expression in the Expression box. The results appear in Validation Results.
When you have finished creating the expression, click OK.
- Functions
  Expression Builder includes a variety of functions that can be applied to character, numeric, and date data.
Related Topics
Parent topic: Transforms Nodes
Functions
Expression Builder includes a variety of functions that can be applied to character, numeric, and date data.
There are functions that support most of the common data preprocessing required for data mining, including missing values treatment. To see a list of the available functions, expand the category of interest.
Functions are divided into the following categories:
- Character: Includes concatenate, trim, length, substring, and others.
- Conversion: Converts to character, date, number, and others.
- Date: Calculates next day, inserts time stamp, truncates, rounds, and performs other date operations.
- Numeric: Includes functions such as absolute value, ceiling, floor, trigonometric functions, hyperbolic functions, logarithms, and exponential values.
- Analytical: Performs analytical functions.
- NULL Value Substitution: For dates, characters, and numbers.
The notation for the functions is that of SQL.
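For instance, a NULL Value Substitution function such as Oracle's NVL replaces missing values with a default. The sketch below uses the standard COALESCE equivalent so it runs on SQLite; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (income REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(50.0,), (None,), (70.0,)])

# NULL value substitution: Oracle's NVL(income, 0) corresponds to
# COALESCE(income, 0) in standard SQL, used here for portability.
rows = conn.execute("SELECT COALESCE(income, 0) FROM t").fetchall()
print([r[0] for r in rows])  # [50.0, 0, 70.0]
```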
Related Topics
Parent topic: Expression Builder
Filter Columns Node
Filter Columns filters out columns so that the columns are not used in subsequent workflow calculations.
For example, you might want to filter out or ignore columns that have more than 94 percent Null values.
Optionally, you can identify important attributes.
Filter Columns requires analysis after it runs. The transformation makes recommendations. You can decide which recommendation to accept.
Filter Columns can run in parallel.
These topics describe Filter Columns nodes:
- Creating Filter Columns Node
  You create a Filter Columns node to filter out columns so that the columns are not used in subsequent workflow calculations.
- Editing Filter Columns Node
  In the Edit Filter Columns Node dialog box, you can define or edit the filter performed by the Filter Columns node.
- Filter Columns Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Filter Columns Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating Filter Columns Node
You create a Filter Columns node to filter out columns so that the columns are not used in subsequent workflow calculations.
Related Topics
Parent topic: Filter Columns Node
Editing Filter Columns Node
In the Edit Filter Columns Node dialog box, you can define or edit the filter performed by the Filter Columns node.
For supervised cases, the icon next to the attribute name indicates that it is the target attribute. Also, in the top right corner, the target attribute is indicated by this icon. You can perform the following tasks:
- View attribute importance: Select Show Attribute Importance to view the attribute importance in the table columns. The information is displayed in the columns Rank and Importance.
- View data quality: Select Show Data Quality to display Filter Columns settings in terms of percentage of null values (% Null), percentage of values that are unique (% Unique), and percentage of constants (% Constant).
- Exclude columns: You can exclude columns when you first edit the Filter Columns node.
- Edit or view Filter Columns settings: You can edit or view Filter Columns settings when you first edit the Filter Columns node.
- Calculate important attributes: Click Settings to enable attribute importance.
- Evaluate hints and decide which columns to filter out: Additional information is available in the form of Hints after the Filter Columns node runs.
- Apply recommended settings: To apply recommended settings to the selected settings, click the apply icon.
- Revert output settings: After applying recommended settings to other settings, if you want to revert to the original settings, then click the revert icon.
- Explore pairwise dependency in the dataset: Click the corresponding icon to view the pairwise dependency between attributes in the Explore Dependencies dialog box.
- Edit dependency settings: Click the corresponding icon to view and edit the dependency settings in the Predictor Dependencies dialog box.
  Note: This option is enabled only after the node is run in Supervised mode.
- Exclude Columns
  By default, all columns are selected for output. That is, all columns are passed to the next node in the workflow.
- Define Filter Columns Settings
  You can create and edit Filter Columns settings in the Define Filter Columns Settings dialog box.
- Explore Dependencies
  In the Explore Dependencies dialog box, you can view the pairwise dependency between two attributes.
- Predictor Dependencies
  You can view the dependency of the selected attribute with other attributes, and set them to be considered as output, in the Predictor Dependencies window.
- Performing Tasks After Running Filter Columns Node
  Lists the tasks that you can perform after you run the Filter Columns node.
- Columns Filter Details Report
  When you run the Filter Columns node, the Columns Filter Details Report is generated in the Edit Column Details dialog box.
- Attribute Importance
  Oracle Data Miner ranks the attributes by significance in determining the target value.
Parent topic: Filter Columns Node
Exclude Columns
By default, all columns are selected for output. That is, all columns are passed to the next node in the workflow.
- To exclude a column, click the arrow in the Output column. The arrow changes to a crossed-out arrow, and the excluded column is ignored, that is, it is not passed on.
- To view or change settings, click Settings. The Define Filter Columns Settings dialog box opens.
Related Topics
Parent topic: Editing Filter Columns Node
Define Filter Columns Settings
You can create and edit Filter Columns settings in the Define Filter Columns Settings dialog box.
There are three kinds of settings:
- Data Quality: Allows Filter Columns settings in terms of percentage of null values, percentage of values that are unique, and percentage of constants. The default values for Data Quality are specified in preferences. You can change the defaults. You can specify the following Data Quality criteria:
  - % Nulls less than or equal: Indicates the largest acceptable percentage of Null values in a column of the data source. You may want to ignore columns that have a larger percentage of Null values. The default value is 95 percent.
  - % Unique less than or equal: Indicates the largest acceptable percentage of values that are unique in a column of the data source. If a column contains many unique values, then it may not contain useful information for model building. The default value is 95 percent.
  - % Constant less than or equal: Indicates the largest acceptable percentage of constant values in a column of the data source. If most of the values in a column are the same, then the column may not be useful for model building.
- Attribute Importance: Enables you to build an attribute importance model to identify important attributes. By default, this setting is turned OFF, and Filter Columns does not calculate Attribute Importance. Specify the following:
  - Target: The value for which to find important attributes. Usually the target of a classification problem.
  - Importance Cutoff: A number between 0 and 1.0. This value identifies the smallest value for importance that you want to accept. If the importance of an attribute is a negative number, then that attribute is not correlated with the target, so the cutoff should be nonnegative. The default cutoff is 0. The rank or importance of an attribute enables you to select the attributes to be used in building models.
  - Top N: The maximum number of attributes. The default is 100.
- Attribute Dependency: Select this option to generate pairwise dependency information. In supervised mode, you can modify the output columns that are used in the result. Attribute Dependency is selected by default if Attribute Importance is selected.
  Note: You must select Attribute Importance to generate Attribute Dependency.
- Sampling (Data Quality and Attribute Importance): Enables you to select the number of rows, which can be system determined or user specified. The default values for sampling are specified in preferences. You can change the default or even turn off sampling. The default sample size is 10,000 records.
Related Topics
Parent topic: Editing Filter Columns Node
Explore Dependencies
In the Explore Dependencies dialog box, you can view the pairwise dependency between two attributes.
Parent topic: Editing Filter Columns Node
Predictor Dependencies
You can view the dependency of the selected attribute with other attributes, and set them to be considered as output in the Predictor Dependencies window.
- Columns
- Importance
- Weighted Dependency
- Output: Indicates whether the attribute is considered as output or not. Click the green arrow to change it so that the attribute is not considered as output.

- Columns
- Dependency
- Output: Indicates if the column is used as an output or not. The green arrow indicates that the column is used in output, and the green arrow with a red cross indicates that it is not considered in the output.
Parent topic: Editing Filter Columns Node
Performing Tasks After Running Filter Columns Node
Lists the tasks that you can perform after you run the Filter Columns node.
You can perform the following tasks:
- View hints: To view hints, double-click the Filter Columns node. The Edit Column Filter Details Node dialog box displays Hints, which indicate attributes that did not pass the data quality checks. For more information, click the corresponding icon.
  - Summary information is displayed about data quality.
  - Values are indicated graphically in the Data Viewer.
  If you specified Attribute Importance, then:
  - Hints indicate attributes that do not have the minimum importance value.
  - The importance of each column is displayed.
- Exclude columns: Go to the Output column for the attribute and click the arrow icon. The icon in the Output column changes to a crossed-out arrow. The selected columns are ignored or excluded, which means that the columns are not passed on to subsequent nodes. It is not necessary to run the nodes again.
- Accept recommendations:
  - For several recommendations, select the attributes and click the accept icon.
  - For all recommendations, press Ctrl+A and click the accept icon.
- Apply recommended output settings: Attributes that have Hints are not passed on. Attributes that have no hints are not changed, and they are passed on.
- Create a Table or View node: The output of this node is a data flow. To create a table containing the results, use a Create Table or View node.
Related Topics
Parent topic: Editing Filter Columns Node
Columns Filter Details Report
When you run the Filter Columns node, the Columns Filter Details Report is generated in the Edit Column Details dialog box.
Columns of the grid summarize the data quality information.
The default settings show both Attribute Importance and Data Quality.
- If Attribute Importance is selected, then the following columns are displayed:
  - Rank
  - Importance
- If Data Quality is selected, then the following columns are displayed:
  - % Null
  - % Unique
  - % Constant
The Hints column in the grid indicates the columns in the data set that do not pass data quality checks or do not meet the minimum importance value.
Bar graphs give a visual indication of values.
For example, if the percentage of Null values is larger than the value specified for % Nulls less than or equal, then a hint is generated that indicates that the percentage of Null values is exceeded. If the percentage of NULL values is very large for a column, then you may want to exclude that column.
Parent topic: Editing Filter Columns Node
Attribute Importance
Oracle Data Miner ranks the attributes by significance in determining the target value.
If a data set has many attributes, then it is likely that not all attributes contribute to a predictive model. Some attributes may simply add noise, that is, they actually detract from the predictive value of the model. You can then filter out attributes that are not important in determining the target value.
Using fewer attributes does not necessarily result in loss of predictive accuracy. Using too many attributes can affect the model and degrade its performance and accuracy. Mining using the smallest number of attributes can save significant computing time and may build better models.
The following are applicable for Attribute Importance:
- Attribute Importance is most useful with Classification.
- The target for Attribute Importance in Filter Columns should be the same as the target of the Classification model that you plan to build.
- Attribute Importance calculates the rank and importance for each attribute.
- The rank of an attribute is an integer.
- The importance of an attribute is a real number, which can be negative.
- Specify these values for attribute importance:
  - Target: The value for which to find important attributes. Usually the target of a classification problem.
    Note: For Unsupervised Attribute Importance, the Target is set to Not Specified if the target is not specified by the user.
  - Importance Cutoff: A number between 0 and 1.0. This value identifies the smallest value for importance that you want to accept. If the importance of an attribute is a negative number, then that attribute is not correlated with the target, so the cutoff should be nonnegative. The default cutoff is 0. The rank or importance of an attribute enables you to select the attributes to be used in building models.
  - Top N: The maximum number of attributes. The default is 100.
  - Sample: Select a sampling technique for the Attribute Importance calculation. The default is system determined. You can also select Stratified or Random. System determined has a stratified cutoff value with a default value of 10.
    - If the distinct count of the selected column is greater than the cutoff value, then random sampling is used.
    - If the distinct count of the selected column is less than or equal to the cutoff value, then stratified sampling is used.
    Certain combinations of target and sampling may result in performance problems. You are given a warning if there is a performance problem.
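The system-determined choice between stratified and random sampling can be sketched as a simple rule on the target's distinct count (a hypothetical helper, assuming the default cutoff of 10):

```python
def choose_sampling(target_values, cutoff=10):
    """System-determined sampling rule: stratified when the target's
    distinct count is at or below the cutoff (default 10), random otherwise."""
    return "stratified" if len(set(target_values)) <= cutoff else "random"

print(choose_sampling(["Y", "N", "Y"]))   # stratified (2 distinct values)
print(choose_sampling(list(range(500))))  # random (500 distinct values)
```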
- Attribute Importance Viewer
To view an Attribute Importance model, build a Filter Columns node with Attribute Importance selected.
Parent topic: Editing Filter Columns Node
Attribute Importance Viewer
To view an Attribute Importance model, build a Filter Columns node with Attribute Importance selected.
Right-click the node and select View Data. Results are displayed in a new Filter Columns Details tab. The viewer has these tabs:
- Attribute Importance: Lists those attributes with Importance greater than or equal to 0. Attributes are listed in order of rank, from lowest rank (most important) to highest rank. The tab also displays the data type of each attribute. A blue bar indicates the rank. You can sort any of the columns by clicking the column heading.
  - To filter columns, that is, to limit the number of columns displayed, use the filter field.
  - To clear the filter definition, click the clear icon. You can also search by name, type, rank, or importance.
- Data: Lists the important attributes in order of importance, largest first. For each attribute, rank and importance values are listed. Only those attributes with an importance value greater than or equal to 0 are listed.
- Columns: Displays the columns created by Attribute Importance, that is, attribute name, rank, and importance value.
- SQL: The SQL that generates the details.
Parent topic: Attribute Importance
Filter Columns Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
Filter Columns Node Properties has these sections:
Related Topics
Parent topic: Filter Columns Node
Filter Columns Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Columns node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings: Opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors: Displayed if there are errors.
- Show Validation Errors: Displayed if there are validation errors.
Related Topics
Parent topic: Filter Columns Node
Filter Columns Details
The Filter Columns Details node creates a data flow that consists of the result of Attribute Importance.
For each attribute, the rank and importance values are listed.
Note:
Filter Columns Details must be connected to a Filter Columns Node that has Attribute Importance selected in Settings. Otherwise, the Filter Columns Details node is Invalid.
Filter Columns Details can run in parallel.
This section consists of the following topics:
- Creating the Filter Columns Details Node
  You create a Filter Columns Details node to create a data flow that consists of the result of Attribute Importance.
- Editing the Filter Columns Details Node
  You can define or edit the filter performed by the Filter Columns node.
- Filter Columns Details Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Filter Columns Details Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating the Filter Columns Details Node
You create a Filter Columns Details node to create a data flow that consists of the result of Attribute Importance.
Related Topics
Parent topic: Filter Columns Details
Editing the Filter Columns Details Node
You can define or edit the filter performed by the Filter Columns node.
- Exclude Columns: You can exclude columns when you first edit the Filter Columns node.
- Edit or view Filter Columns settings: You can edit or view Filter Columns settings when you first edit the Filter Columns node.
- Calculate important attributes: Click Settings to enable Attribute Importance.
- Evaluate: Evaluate hints and decide which columns to filter out. Additional information is available in the form of Hints after the node runs.
Parent topic: Filter Columns Details
Filter Columns Details Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Filter Columns Details Node Properties pane has these sections:
Parent topic: Filter Columns Details
Filter Columns Details Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Columns Details node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Filter Columns Details
Filter Rows
A Filter Rows node enables you to select rows by specifying a SQL statement that describes the rows.
For example, to select all rows where CUST_GENDER is F, specify: CUST_GENDER = 'F'
You can either write the SQL expression directly or use Expression Builder.
Filter Rows can run in parallel.
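Conceptually, the filter expression becomes the WHERE clause of the query that the node runs over its input. The following self-contained sketch shows the effect of the CUST_GENDER = 'F' filter; it uses Python's sqlite3 with a hypothetical CUSTOMERS table, while Oracle Data Miner generates and runs the real query in the database:

```python
import sqlite3

# Hypothetical CUSTOMERS table; Data Miner runs the generated query
# against your actual source table instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, cust_gender TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "F"), (102, "M"), (103, "F"), (104, "M")],
)

# The Filter Rows expression CUST_GENDER = 'F' becomes the WHERE
# clause of the query the node runs over its input.
rows = conn.execute(
    "SELECT cust_id FROM customers WHERE cust_gender = 'F'"
).fetchall()
print([r[0] for r in rows])  # [101, 103]
```

Any valid SQL Boolean expression over the input columns works the same way as the filter.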
This section consists of the following topics:
- Creating a Filter Rows Node
You create a Filter Rows node to select rows by specifying SQL statements, as applicable. - Edit Filter Rows
The Edit Filter Rows dialog box defines or edits the filter performed by the Filter Rows node. - Filter Rows Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Filter Rows Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Creating a Filter Rows Node
You create a Filter Rows node to select rows by specifying SQL statements, as applicable.
Related Topics
Parent topic: Filter Rows
Edit Filter Rows
The Edit Filter Rows dialog box defines or edits the filter performed by the Filter Rows node.
The Edit Filter Rows dialog has two tabs:
- Filter
The filter is one or more SQL expressions that describe the rows to select. - Columns
The Columns tab lists the output columns.
Parent topic: Filter Rows
Filter
The filter is one or more SQL expressions that describe the rows to select.
To create or edit a filter:
- Open the Expression Builder by clicking the Expression Builder icon.
- Write the SQL query to use for filtering.
- To delete an expression, select it and click the delete icon.
- After you are done, click OK. Data Miner validates the expression.
Related Topics
Parent topic: Edit Filter Rows
Columns
The Columns tab lists the output columns.
You can filter in several ways.
Click OK when you have finished. Data Miner validates the expression.
Related Topics
Parent topic: Edit Filter Rows
Filter Rows Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Filter Rows node Properties tab has these sections:
- Filter: The SQL expression created with Edit Filter Rows. You can modify the expression in the Properties pane by clicking the edit icon.
- Columns: The output data columns. For each column, the name, alias (if any), and data type are listed.
- Cache
- Details
Related Topics
Parent topic: Filter Rows
Filter Rows Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Rows node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Filter Rows
Join
A Join node combines data from two or more Data Source nodes into a new data source.
Technically, a Join node is a query that combines rows from two or more tables, views, or materialized views. For example, a Join node combines tables or views (specified in a FROM clause), selects only rows that meet specified criteria (WHERE clause), and uses projection to retrieve data from two columns (SELECT statement).
Join can run in parallel.
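As an illustration of the query shape described above, here is a minimal sketch using Python's sqlite3 with two hypothetical tables standing in for the Data Source nodes; Oracle Data Miner generates the equivalent query itself:

```python
import sqlite3

# Two hypothetical tables standing in for the Data Source nodes that
# feed a Join node; names and data are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales (cust_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 250.0), (1, 90.0), (2, 40.0)])

# The Join node builds a query of this shape: tables in the FROM
# clause, join criteria in the WHERE clause, and a projection in the
# SELECT statement.
rows = conn.execute(
    """SELECT c.name, s.amount
       FROM customers c, sales s
       WHERE c.cust_id = s.cust_id"""
).fetchall()
print(sorted(rows))  # [('Alice', 90.0), ('Alice', 250.0), ('Bob', 40.0)]
```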
This section contains the following topics:
- Create a Join Node
You create a Join node to combine data from two or more Data Source nodes into a new data source. - Edit a Join Node
In the Edit Join Node dialog box, you can specify or change the characteristics of the models to build. - Join Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Join Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Create a Join Node
You create a Join node to combine data from two or more Data Source nodes into a new data source.
Note:
To materialize a join input as a table or view, connect it to a Create Table or View node.
Related Topics
Parent topic: Join
Edit a Join Node
In the Edit Join Node dialog box, you can specify or change the characteristics of the models to build.
You can define a Join node in one of the following ways:
- Double-click the Join node, or right-click the node and select Edit. Then click the Join tab.
- Select the node. In the Properties pane, select the Join tab and click the edit icon.
In either case, the Edit Join Node dialog box opens.
- Edit Join Node
In the Edit Join Node dialog box, you can add columns, define filters and resolve issues related to the join specifications. - Edit Columns
The default setting is to use Automatic Settings for the list of columns displayed. - Edit Output Data Column
In the Edit Output Data Column dialog box, you can exclude columns from the output. - Resolve
In the Resolve dialog box, you can resolve issues related to join specifications which may become invalid due to different reasons.
Parent topic: Join
Edit Join Node
In the Edit Join Node dialog box, you can add columns, define filters and resolve issues related to the join specifications.
Click the Join tab if it is not displayed. In the Edit Join Node dialog box, you can perform the following tasks:
To add a new Join column:
- Click the add icon. The Edit Join Column dialog box opens.
- In the Edit Join Column dialog box, select the Data Sources: Source 1 and Source 2. You can search the columns in either source by Name or by Data Type.
- Select one entry in Source 1 and the corresponding entry in Source 2.
- Click Add. Data Miner selects an appropriate Join Type. Column 1 (from Source 1), Column 2 (from Source 2), and the Join Type are displayed in a grid. You can search this grid by Column 1, Column 2, or Join Type.
- After you are done, click OK.
To select the columns in the Join, click the Columns tab to display the Edit Columns dialog box.
To define a filter for the Join, select the Filter tab and enter an appropriate SQL expression. You can also use SQL Worksheet (part of SQL Developer) to write a filter.
If there are problems with the join, for example, if one of the data nodes is no longer connected to the Join node, then an information indicator is displayed. Click Resolve Issues to open the Resolve dialog box.
Related Topics
Parent topic: Edit a Join Node
Edit Columns
The default setting is to use Automatic Settings for the list of columns displayed.
To select columns, go to the Columns tab of Edit Joins Details in one of the following ways:
- Right-click the Join node and select Edit. Then click Columns.
- Select the Join node. In the Properties pane, click Columns.
To make changes, deselect Automatic Settings. You can perform the following tasks:
- Edit the list of columns: Click the edit icon to open the Edit Output Data Column dialog box.
- Delete a column from the output: Select the column and click the delete icon.
If the node was run, then you must run it again.
Related Topics
Parent topic: Edit a Join Node
Edit Output Data Column
In the Edit Output Data Column dialog box, you can exclude columns from the output.
The default setting is to include all columns from both tables in the output.
To remove a column from the output:
- Move the columns from the Selected Attributes list to the Available Attributes list.
- Click OK.
Parent topic: Edit a Join Node
Resolve
In the Resolve dialog box, you can resolve issues related to join specifications which may become invalid due to different reasons.
When a Data Source node is disconnected from a Join node, all the join specifications for that node are retained and are marked as Invalid. Before you run the Join node, you must resolve the issues. The Resolve dialog box provides two ways to resolve the join issues:
- Remove: Removes all invalid entries from all specifications (Apply and Data).
- Resolve: Displays a grid that enables you to associate an unassigned node with a missing node. The missing nodes are listed in the grid, and an action is suggested.
Parent topic: Edit a Join Node
Join Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Properties pane for a Join node has these sections:
- Join: Defines the Join.
- Columns: Displays the output columns of the Join. For each column, the name, node, alias (if any), and data type are displayed. Up to 1000 columns are displayed.
- Filter: Filters results by defining filter conditions using the Expression Builder. Open the Expression Builder by clicking the Expression Builder icon.
- Cache
- Details
Related Topics
Parent topic: Join
Join Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Join node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Join
JSON Query
The support for the JSON data format in Oracle Data Miner (SQL Developer 4.1 and later) is facilitated by the JSON Query node.
JSON, or JavaScript Object Notation, is a data format that enables users to store and communicate sets of values, lists, and key-value mappings across systems.
The JSON Query node projects the JSON data format to the relational format. It supports only one input data provider node, such as a Data Source node. You can perform the following tasks in the JSON Query node:
- Select any JSON attributes in the source data to project them as relational data
- Select relational columns in the source data to include in the output
- Define aggregation columns on JSON data
- Preview output data
- Construct a JSON Query based on user specifications
Note:
JSON Query Node is supported on Oracle Database 12.1.0.2 and later.
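To illustrate what projecting JSON to the relational format means, the following sketch uses Python's json module on a hypothetical document; the attribute names are invented for this example, and the JSON Query node performs the equivalent work inside the database:

```python
import json

# Hypothetical JSON document of the kind a JSON Query node projects;
# the attribute names are invented for this example.
doc = json.loads(
    '{"cust_id": 101,'
    ' "items": [{"product": "A", "qty": 2},'
    '           {"product": "B", "qty": 1}]}'
)

# Unnesting the "items" array yields one relational row per array
# element, with the parent attribute repeated on every row -- the
# essence of projecting JSON data to a relational format.
rows = [(doc["cust_id"], item["product"], item["qty"])
        for item in doc["items"]]
print(rows)  # [(101, 'A', 2), (101, 'B', 1)]
```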
- Create JSON Query Node
A JSON Query node should be connected to an input provider node, such as a Data Source node. - JSON Query Node Editor
In the Edit JSON Query Node dialog box, you can only operate on the input columns that are pseudo JSON types. - JSON Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - JSON Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Data Types and their Supported Operators
Lists the data types and their supported operators for JSON data type.
Parent topic: Transforms Nodes
Create JSON Query Node
A JSON Query node should be connected to an input provider node, such as a Data Source node.
To run the node successfully, the input provider node must contain JSON data.
To create a JSON Query node:
Parent topic: JSON Query
JSON Query Node Editor
In the Edit JSON Query Node dialog box, you can only operate on the input columns that are pseudo JSON types.
To open the Edit JSON Query Node dialog box:
- Double-click the JSON Query node, or
- Right-click the node and click Edit.
The Edit JSON Query Node dialog box consists of the following tabs:
- JSON
In the JSON tab, you can select JSON data, specify filters on attributes and so on. - Additional Output
In the Additional Output tab, you can select relational columns in the source data for the output. - Aggregate
You can define aggregation column definitions on JSON attributes in the Aggregate tab. - Preview
You can preview the node output in the Preview tab.
Parent topic: JSON Query
JSON
In the JSON tab, you can select JSON data, specify filters on attributes and so on.
In the Columns drop-down list, only the input columns containing JSON data (pseudo JSON data types) are listed. You can also specify filters on attributes of the following data types: ARRAY, BOOLEAN, NUMBER, and STRING. The filters are applied to the data in hierarchical order by using the logical operators that you specify, such as Match All or Any. Select an input column from the drop-down list.
Click the filter settings icon to set and apply filter settings. The Filter Settings dialog box opens.
The JSON tab consists of the following:
- Structure
In the Structure tab, the JSON data structure for the selected column is displayed. - Data
The Data tab displays the JSON data that is used to create the JSON structure. - Filter Settings
In the Filter Settings dialog box, you can specify filters on attributes for the data type: ARRAY, BOOLEAN, NUMBER and STRING.
Parent topic: JSON Query Node Editor
Structure
In the Structure tab, the JSON data structure for the selected column is displayed.
The structure or the Data Guide table must be generated in the parent source node, for example, a Data Source node. If the structure is not found, then a message is displayed.
The following information about the data structure is displayed:
- JSON Attribute: Displays the generated JSON structure in a hierarchical format. You can select one or more attributes to import. When you select a parent attribute, all child attributes are automatically selected.
- JSON Data Type: Displays the JSON data types of all attributes, derived from the JSON data.
- Unnest: All attributes inside an array are unnested into a relational format. By default, the Unnest option is enabled. When the Unnest option for an array attribute is disabled:
  - The child attributes are displayed but cannot be selected.
  - If the array attribute is selected for output, then the output column contains a JSON representation of the array.
You can perform the following tasks in the Structure tab:
- Set viewing preferences: In the View drop-down list, you can set your viewing preferences by clicking any of the following:
  - All: To view all attributes.
  - Only Selected: To view only the selected attributes.
  - Only with Filters: To view only those attributes, along with their parent attributes, that have filter definitions applied.
  - Selected and with Filters: To view only the selected attributes that have filter definitions.
- Select attributes: To select one or more attributes, click the check box against the attribute.
  Note: When you select an attribute, its parent attribute is selected automatically. If you select a parent attribute, then all of its child attributes are automatically selected, provided none of its immediate child attributes is already part of the selection.
- Copy filter: Click the copy icon to copy the filter of an attribute to the local cache. You can then apply the copied filter to another attribute with the same data type by using the paste option.
  Note: This option is enabled only if the selected attribute has a filter definition.
- Paste filter: After copying a filter from an attribute, click the attribute to which you want to paste the filter and click the paste icon.
  Note: Only attributes with a compatible data type can accept the copied filter. For example, a filter that is copied from an attribute with the NUMBER data type can be pasted to attributes with the NUMBER data type only.
- Clear filter: Select the attribute from which you want to remove the filter and click the clear filter icon.
- Edit filter: You can add or edit a filter for any attribute of type STRING, NUMBER, BOOLEAN, or ARRAY by using the in-place edit option. To edit or add a filter to an attribute:
  - Select the attribute and click the edit icon. Alternately, select the attribute and double-click the corresponding Filter column cell. The in-place edit option for the selected attribute is enabled, with the applicable operators listed in a drop-down list. Select an operator from the drop-down list.
  - Select the value from the corresponding field by clicking the browse icon. The Find Values dialog box opens.
  - In the Find Values dialog box, select the values and click OK. The filter and the values are now visible in the JSON Query Node Editor dialog box.
  To complete the editing, press the Enter key. To cancel editing, press the Esc key.
Parent topic: JSON
Data
The Data tab displays the JSON data that is used to create the JSON structure.
In the text panel, the data is displayed in read-only mode. You can select text for copy and paste operations.
To query the data to be viewed, click the query icon.
Parent topic: JSON
Filter Settings
In the Filter Settings dialog box, you can specify filters on attributes for the data type: ARRAY, BOOLEAN, NUMBER and STRING.
You can set the filter settings for:
- Edit Filter Settings for:
  - All
  - Any
- Apply Filter Settings to:
  - JSON Unnest: Applies the filter to the JSON source data to be used for projection to the relational data format. Only filtered data is projected.
  - Aggregations: Applies the filter to the JSON data to be used for aggregation only.
  - JSON Unnest and Aggregations: Applies the filter to both the JSON unnest and the data to be used for aggregation.
After you set the filter settings, click OK.
Parent topic: JSON
Additional Output
In the Additional Output tab, you can select relational columns in the source data for the output.
The input columns that are used by the aggregation definitions in the Aggregate tab are automatically added to the list for output.
You can perform the following tasks here:
- Add relational columns: Click the add icon to add relational columns in the Edit Output Data Column dialog box.
- Delete relational columns: Select the relational columns to delete and click the delete icon.
- Edit Output Data Column Dialog
In the Edit Output Data Column dialog box, all the relational columns available in the data source are listed. You can select one or more columns to be added to the output.
Parent topic: JSON Query Node Editor
Edit Output Data Column Dialog
In the Edit Output Data Column dialog box, all the relational columns available in the data source are listed. You can select one or more columns to be added to the output.
To add columns:
- In the Available Attributes list, select the columns that you want to include in the output.
- Click the right arrow to move the attributes to the Selected Attributes list. To exclude any columns from the output, select the attribute and click the left arrow.
- Click OK. This includes the columns in the output, and they are listed in the Additional Output tab.
Related Topics
Parent topic: Additional Output
Aggregate
You can define aggregation column definitions on JSON attributes in the Aggregate tab.
The Aggregate tab displays the information in two sections:
- Group By Attribute section: Lists the Group By attributes, along with the attribute count. You can perform the following tasks:
  - View JSON Paths: Click JSON Paths to display the attribute name with context information, for example, $."customers"."cust_id". If not enabled, then only the attribute name is displayed.
  - Edit and Add Attributes: Click the edit icon to add Group By attributes in the Edit Group By dialog box.
  - Delete Attributes: Select the attributes to delete and click the delete icon.
- Aggregate Attributes section: Displays the aggregation columns, along with the column count. You can perform the following tasks:
  - View JSON Paths: Click JSON Paths to display the attribute name with context information, for example, $."customers"."cust_id". If not enabled, then only the attribute name is displayed.
  - Define Aggregation Columns: Click the add icon to define an aggregation column in the Add Aggregations dialog box.
  - Delete Aggregation Columns: Select the columns to delete and click the delete icon.
- Add Aggregations
In the Add Aggregation dialog box, you can define functions for JSON attributes. - Edit Sub Group By
In the Edit Sub Group By dialog box, you can add Sub Group By attributes to the selected JSON attribute. - Edit Group By
The Edit Group By dialog box displays the relational columns above the JSON attribute collection.
Parent topic: JSON Query Node Editor
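As a rough sketch of what a Group By attribute plus an aggregation column amounts to, the following Python snippet groups unnested JSON rows and aggregates within each group; the data is hypothetical and SUM is chosen as the aggregation function for the example:

```python
from collections import defaultdict

# Hypothetical rows already unnested from JSON: "region" plays the
# role of a Group By attribute, "amount" the attribute to aggregate.
rows = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 50},
    {"region": "East", "amount": 25},
]

# One aggregation column, defined here with SUM as the function:
# accumulate the aggregated attribute per Group By value.
totals = defaultdict(int)
for row in rows:
    totals[row["region"]] += row["amount"]

print(dict(totals))  # {'East': 125, 'West': 50}
```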
Add Aggregations
In the Add Aggregation dialog box, you can define functions for JSON attributes.
The dialog box displays JSON structures in a hierarchical view. You can select multiple attributes and then apply an aggregation function to them.
Note:
Object and Array type attributes cannot be selected.
You can perform the following tasks:
- Define aggregation functions:
  - Select the JSON attributes. You can select multiple attributes by pressing the Ctrl key and clicking the attributes for which you want to define functions.
  - Click the function icon to select and apply a function for the selected attributes. The applicable functions are listed. Select the function that you want to apply.
    Alternately, click the corresponding row in the Function column. The applicable functions are listed in a drop-down list. Select the function that you want to apply. Using this option, you can define a function for only one attribute at a time.
  - Click OK.
- Clear Aggregation Definition: Select the attribute and click the clear icon. The defined function, along with the output and Sub Group By entries, is deleted.
- Edit Sub Group By Elements: Select the attribute and click the edit icon. The Edit Sub Group By dialog box opens.
- Search: Click the search icon to find attributes based on a partial attribute name.
Related Topics
Parent topic: Aggregate
Edit Sub Group By
In the Edit Sub Group By dialog box, you can add Sub Group By attributes to the selected JSON attribute.
To add attributes:
- In the upper pane, expand the Available Attributes folder.
- Select the attributes that you want to add as Sub Group By attributes. The selected attributes are listed in the lower pane, which also displays the count of attributes that are added.
- Click OK.
Parent topic: Aggregate
Edit Group By
The Edit Group By dialog box displays the relational columns above the JSON attribute collection.
You can add relational columns as part of the top level Group By. To add relational columns:
- In the upper pane, expand the Available Attributes folder.
- Select the columns that you want to add. The selected columns are listed in the lower pane.
- Click OK.
Parent topic: Aggregate
Preview
You can preview the node output in the Preview tab.
The output is displayed in two tabs:
Parent topic: JSON Query Node Editor
Output Columns
In the Output Columns tab, the columns in the header are displayed in a grid format. Click JSON Paths to view the source attribute name.
- If you click JSON Paths, then the source attribute name, along with the contextual information, is displayed, for example, $."customers"."cust_id".
- If you do not click JSON Paths, then only the attribute name is displayed, for example, cust_id.
The following details of the columns are displayed in the Output Columns tab:
- Name: Displays the output column names
- Data Type: Displays the data type of the output column
- Data Source: Displays the source of the attribute name
- JSON Paths: Displays the attribute source
- Aggregate: Displays the aggregation function used for the aggregations
- Group By: Displays the Group By attributes
- Sub Group By: Displays the Sub Group By attributes used in the aggregations
Parent topic: Preview
Output Data
In the Output Data tab, the query result of the top N rows is displayed. The query reflects the latest user specifications. The query result is displayed in a grid format.
Parent topic: Preview
JSON Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The JSON Query node Properties pane has these sections:
- Output
The Output section in the Properties pane displays the Output Columns in read-only mode. - Cache
The Cache section provides the option to generate cache for the output data. - Details
The Details section displays the name of the node and any comments about it.
Parent topic: JSON Query
Output
The Output section in the Properties pane displays the Output Columns in read-only mode.
Related Topics
Parent topic: JSON Query Node Properties
Cache
The Cache section provides the option to generate cache for the output data.
To generate cache output:
Parent topic: JSON Query Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and add comments in the fields here:
- Node Name
- Node Comments
Parent topic: JSON Query Node Properties
JSON Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
The context menu for a JSON Query Node has these selections:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
Related Topics
Parent topic: JSON Query
Data Types and their Supported Operators
Lists the data types and their supported operators for JSON data type.
Table 7-2 Data Types and their Supported Operators

Data Type | Supported Operators | Description
---|---|---
Array | In | Retrieves elements using specified indices, such as 0, 1, or an index range such as 2:4, or a combination of both.
Array | < | Retrieves elements that have indices less than the specified index.
Boolean | True | Retrieves elements that match the condition.
Boolean | False | Retrieves elements that do not match the condition.
Numbers | In | Retrieves numbers that are in the condition.
Numbers | Not In | Retrieves numbers that are not included in the condition.
Numbers | = | Retrieves numbers that are equal to the given condition.
Numbers | != | Retrieves numbers that are not equal to the given condition.
Numbers | > | Retrieves numbers that are greater than the given condition.
Numbers | >= | Retrieves numbers that are greater than or equal to the given condition.
Numbers | < | Retrieves numbers that are less than the given condition.
Numbers | <= | Retrieves numbers that are less than or equal to the given condition.
String | In | Retrieves elements that are in the condition.
String | Not In | Retrieves elements that are not in the condition.
String | Starts With | Retrieves elements that start with the string in the condition.
String | Contains | Retrieves elements that contain the string in the condition.
String | = | Retrieves elements that are equal to the condition.
String | != | Retrieves elements that are not equal to the condition.
String | > | Retrieves elements that are greater than the condition.
String | >= | Retrieves elements that are greater than or equal to the condition.
String | < | Retrieves elements that are less than the condition.
String | <= | Retrieves elements that are less than or equal to the condition.
Parent topic: JSON Query
Sample
You can sample your data in the Sample tab.
A Sample node enables you to sample data in one of the following ways:
- Random Sample: A sample in which every element of the data set has an equal chance of being selected.
- Top N Sample: The default sample, which selects the first N values.
- Stratified Sample: A sample that is created as follows:
  - First, the data set is divided into disjoint subsets, or strata.
  - Then, a random sample is taken from each subset.
  This technique is used when the distribution of target values is greatly skewed. For example, the response to a marketing campaign may have a positive target value 1 percent of the time or less.
Sampling nested data is best done with a Case ID. The Sample node can run in parallel.
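The stratified procedure described above can be sketched in a few lines of Python; the data set, target column, and 20 percent sampling fraction are invented for the example, and a fixed seed makes the draw reproducible, as with the node's Seed setting:

```python
import random
from collections import defaultdict

# Hypothetical data set with a skewed target: 10% positive responses.
rows = [{"id": i, "target": 1 if i % 10 == 0 else 0} for i in range(100)]

# Stratified sampling: divide the data set into disjoint strata by
# target value, then draw a random sample from each stratum. A fixed
# seed makes the draw reproducible.
rng = random.Random(12345)
strata = defaultdict(list)
for row in rows:
    strata[row["target"]].append(row)

sample = []
for value, members in strata.items():
    k = int(len(members) * 0.2)  # 20% fraction, invented for the example
    sample.extend(rng.sample(members, k))

print(len(sample))  # 18 majority rows + 2 minority rows = 20
```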
This section has the following topics:
- Sample Nested Data
Sampling nested data may require a case ID. - Creating a Sample Node
You create a Sample node to create samples of your data. - Edit Sample Node
In the Edit Sample Node dialog box, you can define and edit a sample. The settings describe the type of sample to create and the size of the sample. - Sample Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Sample Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Sample Nested Data
Sampling nested data may require a case ID.
If you do not specify a case ID, then the sample operation may fail for a nested column that is very dense and deep. Failure may occur if the amount of nested data per row exceeds the maximum of 30,000 for a specific column or row.
A case ID also allows Data Miner to perform stratified sorts on data that is dense and deep.
Parent topic: Sample
Creating a Sample Node
You create a Sample node to create samples of your data.
Related Topics
Parent topic: Sample
Edit Sample Node
In the Edit Sample Node dialog box, you can define and edit a sample. The settings describe the type of sample to create and the size of the sample.
To edit the settings for the Sample node:
- Random
- Top N
- Stratified
- Custom Balance
The Custom Balance dialog box enables you to specify exactly how the selected column is balanced.
Parent topic: Sample
Random
For a random sample, specify the following:
- Seed: The default seed is 12345. You can specify a different integer.
- Case ID (optional): Select a case ID from the drop-down list.
If you specify a seed and a case ID, then the sample is reproducible.
Parent topic: Edit Sample Node
Stratified
For a stratified sample, specify the following:
- Column: Select the column for stratification.
- Seed: The default seed is 12345. You can specify a different integer.
- Case ID (optional): Select a case ID from the drop-down list. If you specify a seed and a case ID, then the sample is reproducible.
- Distribution: Specify how the sample is to be created. There are three options:
  - Original: The distribution of the selected column in the sample is the same as the distribution in the data source. For example, if the column GENDER has M as the value for 95 percent of the cases, then in the sample, the value of GENDER is M for 95 percent of the cases.
  - Balanced: The distribution of the values of the column is equal in the sample, regardless of the distribution in the data source. For example, if the column is GENDER and GENDER has the two values M and F, then 50 percent of the time the value of GENDER is M.
  - Custom: You define how the values of the column are distributed in the sample. You must run the node once before you define the custom distribution. Click Edit to open the Custom Balance dialog box.
The Stratified dialog box displays a histogram of the values of the selected column at the bottom of the window. To see more details, click View to display the Custom Balance dialog box.
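The difference between the Original and Balanced distributions can be sketched as follows; the 95/5 GENDER split and the sample sizes are hypothetical:

```python
import random
from collections import Counter

# Hypothetical source with a 95/5 GENDER split, mirroring the example above.
rows = ["M"] * 95 + ["F"] * 5
rng = random.Random(12345)

# Original distribution: sample uniformly, so the skewed split is
# (approximately) preserved in the sample.
original = rng.sample(rows, 20)

# Balanced distribution: draw equally from each value, regardless of
# the split in the source.
m = [r for r in rows if r == "M"]
f = [r for r in rows if r == "F"]
balanced = rng.sample(m, 2) + rng.sample(f, 2)
print(Counter(balanced))  # Counter({'M': 2, 'F': 2})
```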
Related Topics
Parent topic: Edit Sample Node
Custom Balance
The Custom Balance dialog box enables you to specify exactly how the selected column is balanced.
You must run the node to collect statistics before you create a custom balance. After you run the node, edit it, select Custom Distribution, and click View. The Custom Balance dialog box opens.
You can either create custom entries for each value of the stratify attribute, or you can click Original or Balanced to provide a starting point. Click Reset to reset to the original values.
To create a custom value, select the attribute to change and click the edit icon. Change the value in the Sample Count column to the custom value and press Enter. The new sample is displayed as output in the lower part of the screen. Change as many values as required. Click OK when you have finished.
Parent topic: Edit Sample Node
Sample Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Sample node Properties pane has these sections:
-
Settings: You can specify the following:
-
Sample Size: Select a sample size in terms of:
-
Percent. Default=60%
-
Number of Rows. The default number of rows is 2000.
-
-
Sample Type: The options are:
-
Random (Default)
-
Stratified
-
Top N
-
-
Seed: The default seed is 12345. You can specify a different integer.
-
Case ID: This is an optional field. Select a case ID from the drop-down list. If you specify a seed and a case ID, then the sample is reproducible.
-
-
Cache
-
Details
- Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference. - Details
The Details section displays the name of the node and any comments about it.
Related Topics
Parent topic: Sample
Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.
You can perform the following tasks:
-
Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.
-
Sampling Size: You can select caching or override the default settings. The default sampling size is Number of Rows, with a default value of 2000.
-
Related Topics
Parent topic: Sample Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
-
Node Name
-
Node Comments
Parent topic: Sample Node Properties
Sample Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Right-click the Sample node. The following options are available in the context menu:
-
Edit
-
View Data
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed if there are errors.
-
Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Sample
Transform
A Transform node can use either sampled data or all data to calculate statistics.
You define transformations on a per-column basis. After you have defined a transformation, you can transform several columns in the same way. The Transform node can run in parallel. To use a Transform node, connect it to a data flow, that is, a Data Source node or some other node, such as a filtering node, that produces attributes. Then select the attributes to transform.
- Supported Transformations
The transformations available depend on the data type of the attribute. For example, normalization cannot be performed on character data. - Support for Date and Time Data Types
Lists the supported data types for Transform nodes. - Creating Transform Node
You create a Transform node to define transformation and transform columns. - Edit Transform Node
You can define and edit the Transform node using the Edit Transform Node dialog box. - Transform Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Transform Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Supported Transformations
The transformations available depend on the data type of the attribute. For example, normalization cannot be performed on character data.
You define transformations on a per-column basis. After you have defined a transformation, you can transform several columns in the same way.
You can use column statistics as a guide when defining transformations. The following transformations are supported:
- Binning
Binning is a transformation type that converts a continuous variable into a categorical variable, a continuous value into another continuous value, and a categorical value into a categorical variable with fewer values. - Custom
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common functions. - Missing Values
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common function. - Missing Values
The Missing Values Transform enables you to specify how missing values are treated. - Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0], such that x_new = (x_old-shift)/scale. Normalization applies only to numeric attributes. - Outlier
An Outlier is a data value that is not in the typical population of data, that is, extreme values. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.
Related Topics
Parent topic: Transform
Binning
Binning is a transformation type that converts a continuous variable into a categorical variable, a continuous value into another continuous value, and a categorical value into a categorical variable with fewer values.
Binning converts the following:
-
A continuous variable into a categorical variable.
-
A continuous value into a continuous value. For example, age can be converted to 10 groups from 1 to 10.
-
A categorical variable with many values into a categorical variable with fewer values.
For example, salary is a continuous variable. If you divide salary into 10 bins, then you convert salary into a categorical variable with 10 values, where each value represents a salary range.
You can bin both numbers and the character types VARCHAR2 and CHAR.
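As an illustration of binning a continuous value into a fixed number of groups (the age example above), the following sketch maps an age in [0, 100) to one of 10 equal-width groups. The bounds and the helper name are assumptions for the example, not part of the product:

```python
def age_group(age, lo=0, hi=100, bins=10):
    """Map a continuous age in [lo, hi] to an equal-width group number 1..bins."""
    if age >= hi:
        return bins                        # clamp the top edge into the last bin
    width = (hi - lo) / bins               # each bin covers an equal range
    return int((age - lo) // width) + 1

print(age_group(7))     # 1
print(age_group(42))    # 5
print(age_group(100))   # 10
```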
Parent topic: Supported Transformations
Recode
Oracle Data Miner does not support recode transformations. However, you can use custom binning to perform a recode transformation. For example, to recode the US states ME, NH, VT, CT, MA, and RI to the value NE, create a custom bin that puts the six states into a bin named NE.
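The recode-via-custom-bin idea amounts to a simple lookup, sketched below. The helper is hypothetical; in Oracle Data Miner you would define the NE bin in the Custom binning dialog box instead of writing code:

```python
# Hypothetical recode: map the New England state codes to the single bin NE.
NE_STATES = {"ME", "NH", "VT", "CT", "MA", "RI"}

def recode_state(state):
    """Return NE for a New England state code; pass all other values through."""
    return "NE" if state in NE_STATES else state

print(recode_state("VT"))   # NE
print(recode_state("CA"))   # CA
```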
Parent topic: Binning
Custom
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common functions.
Use Expression Builder to create the new attribute.
Related Topics
Parent topic: Supported Transformations
Missing Values
The Missing Values Transform enables you to specify how missing values are treated.
A data value can be missing for a variety of reasons:
-
The data value was not measured, that is, it has a null value.
-
The data value was not answered.
-
The data value was unknown.
-
The data value was lost.
Data Mining algorithms vary in the way they treat missing values:
-
Ignore missing values, or omit any records that contain missing values.
-
Replace missing values with the mode or mean.
-
Infer missing values from existing values.
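A minimal sketch of the mean and mode replacement strategies listed above (illustrative only; in the product, the treatment is configured in the Missing Values transformation rather than written as code):

```python
from statistics import mean, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean (numeric) or mode (categorical) of the rest."""
    present = [v for v in values if v is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [fill if v is None else v for v in values]

# Numeric column: the missing value becomes the mean of 10, 20, 30.
assert impute([10, None, 20, 30]) == [10, 20, 20, 30]
# Categorical column: the missing value becomes the most frequent value.
assert impute(["M", "M", None, "F"], "mode") == ["M", "M", "M", "F"]
```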
Parent topic: Supported Transformations
Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0], such that x_new = (x_old-shift)/scale. Normalization applies only to numeric attributes.
Oracle Data Miner enables you to specify the following kinds of normalization:
-
Min Max: Normalizes each attribute using the transformation x_new = (x_old-min)/(max-min).
-
Linear Scale: Normalizes each attribute using the transformation
x_new = (x_old-shift)/scale
-
Z-Score: Normalizes numeric attributes using the mean and standard deviation computed from the data. Normalizes each attribute using the transformation x_new = (x_old-mean)/standard_deviation.
-
Custom: The user defines normalization.
Normalization provides transformations that perform min-max normalization, scale normalization, and z-score normalization.
Note:
You cannot normalize character data.
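The min-max and z-score formulas above share the shape x_new = (x_old - shift)/scale, differing only in the shift and scale used. A sketch of both (assuming the population standard deviation for the z-score; the product may compute it differently):

```python
from statistics import mean, pstdev

def normalize(values, method="minmax"):
    """x_new = (x_old - shift) / scale, with shift and scale chosen per method."""
    if method == "minmax":
        shift, scale = min(values), max(values) - min(values)
    else:                                  # z-score
        shift, scale = mean(values), pstdev(values)
    return [(x - shift) / scale for x in values]

print(normalize([10, 20, 30]))            # [0.0, 0.5, 1.0]
print(normalize([10, 20, 30], "zscore"))  # centered on 0, symmetric about the mean
```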
Parent topic: Supported Transformations
Outlier
An Outlier is a data value that is not in the typical population of data, that is, an extreme value. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.
You specify a treatment by defining what constitutes an outlier (for example, all values in the top and bottom 5 percent of values) and how to replace outliers.
Note:
Usually, you can replace outliers with null or edge values.
For example, suppose the mean of an attribute distribution is 10 and the standard deviation is 5. Outliers are then values that are:
-
Less than -5 (the mean minus three times the standard deviation)
-
Greater than 25 (the mean plus three times the standard deviation)
In this case, you can replace the outlier -10 either with Null or with the edge value -5.
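The worked example above (mean 10, standard deviation 5) can be sketched as follows; the function name and signature are assumptions for illustration:

```python
def treat_outliers(values, mean_, sd, treatment="edge"):
    """Replace values outside mean +/- 3*sd with the nearest edge value, or with None."""
    lo, hi = mean_ - 3 * sd, mean_ + 3 * sd
    def fix(x):
        if lo <= x <= hi:
            return x                       # in the typical population: keep as-is
        return (lo if x < lo else hi) if treatment == "edge" else None
    return [fix(x) for x in values]

# Mean 10, standard deviation 5: outliers lie below -5 or above 25.
assert treat_outliers([-10, 0, 30], 10, 5) == [-5, 0, 25]
assert treat_outliers([-10, 0, 30], 10, 5, "null") == [None, 0, None]
```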
Parent topic: Supported Transformations
Support for Date and Time Data Types
Lists the supported data types for Transform nodes.
The Transform Node provides limited support for these data types for date and time:
-
DATE
-
TIMESTAMP
-
TIMESTAMP_WITH_TIMEZONE
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
You can bin date and time attributes using Equal Width or Custom binning. You can apply the Statistic and Missing Values transformations with either a Statistic or a Value treatment.
Parent topic: Transform
Creating Transform Node
You create a Transform node to define transformation and transform columns.
Related Topics
Parent topic: Transform
Edit Transform Node
You can define and edit the Transform node using the Edit Transform Node dialog box.
This dialog box has two tabs:
-
Transformations
-
Statistics
In the Transformations tab, the statistics for each column are displayed. If you do not want to see statistics, then deselect Show Statistics.
Note:
You must run the node to see statistics.
You can perform the following tasks in the Transformations tab:
-
Define Transformation: Select one or more original columns, that is, columns that are not transformed. Click .
The Add Transform dialog box opens if you selected a single column. Otherwise, the Apply Transform Wizard opens.
-
Define Custom Transformation: Select one or more original columns, that is, columns that are not transformed. Click .
The Add Custom Transform dialog box opens. You can add custom transformations here.
The default behavior is to ignore the original column and to use the transformed column as output. The values displayed in the Output column are indicated by:
-
: For a column that is included
-
: For a column that is ignored
-
-
Change Values in the Output Column: Click the icon displayed in the Output column to edit the values in the Add Transform dialog box.
-
Edit Transformed Column: You can edit transformed columns only. For example, you can edit AGE_BIN, but not AGE. To edit transforms, select one or more transformed columns and click . The Edit Transform dialog box opens if you selected a single column.
-
Delete Transform: Select one or more transformed columns and click .
-
Filter Column: To limit the number of columns displayed, click . You can search by:
-
Output Column
-
Transform
-
Source Column
-
-
Clear filter definition: To clear filter definitions, click .
-
View effects of transform: To view the effect of a transformation:
-
Run the node.
-
After running of the node is complete, double-click the node.
-
Select the transformed column to see histograms that compare the original column with the transformed column.
-
When a column has a transformation applied to it, a new row is added to the list of columns. Because each column must have a name, the name of the new row is based on the name of the original column and the type of transformation performed. Typically, you transform a column and want only the transformed column in the output of the Transform node, so the original column has an option set to prevent it from being passed on as an output column. For example, if you bin AGE creating AGE_BIN, then AGE is not passed on and AGE_BIN is passed on.
- Add Transform
The Add Transform dialog box allows you to add custom transformation such as Binning, Missing Values, Normalization and so on, depending on the attribute of the data type. - Add Custom Transform
In the Add Custom Transform dialog box, you can define a custom transformation. - Apply Transform Wizard
The Apply Transform wizard enables you to define or edit transformations for several columns at the same time. - Edit Transform
If the node has run, then the Edit Transform dialog box displays information about both the untransformed column and transformed version - Edit Custom Transform
In the Edit Custom Transform dialog box, you can edit expressions using the Expression Builder.
Parent topic: Transform
Add Transform
The Add Transform dialog box allows you to add custom transformation such as Binning, Missing Values, Normalization and so on, depending on the attribute of the data type.
To add a transformation:
- Binning
Binning is a type of transformation. - Bin Equal Width (Number)
The Bin Equal Width selection determines bins for numeric attributes by dividing the range of values into a specified number of bins of equal size. - Bin Quantile
The Bin Quantile selection divides attributes into bins so that each bin contains approximately the same number of cases. - Bin Top N
The Bin Top N type bins categorical attributes. The bin definition for each attribute is computed based on the occurrence frequency of values that are computed from the data. - Custom
Custom binning enables you to define custom bins. - Missing Values
Missing Values is a transformation type that replaces missing values with an appropriate value. - Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0] so that x_new = (x_old-shift)/scale. - Outlier
An outlier is a data value that does not come from the typical population of data. In other words, it is an extreme value. - Use Existing Column
The Use Existing Column option is available only when there is at least one transformation. - Add or Edit Multiple Transforms
You can define or edit transformations for several columns at a time. You can also apply an existing transformation to one or more columns.
Parent topic: Edit Transform Node
Binning
Binning is a type of transformation.
You can use Binning to:
-
Transform a continuous variable to a discrete one.
-
Transform a variable with a large number of discrete values to one with a smaller number of discrete values.
The default transformation type is Binning.
The types of binning supported depend on the data type of the column:
-
For the numeric data type NUMBER, the supported types of binning are:
-
Bin Equal Width (Number) (Default)
-
For the categorical data type VARCHAR2, the supported types of binning are:
-
For the date and time data types DATE, TIMESTAMP, TIMESTAMP WITH LOCAL TIMEZONE, and TIMESTAMP WITH TIMEZONE, the supported types of binning are:
-
Bin Equal Width (Number) (Default)
Note:
The number of bins must be two.
Parent topic: Add Transform
Bin Equal Width (Number)
The Bin Equal Width selection determines bins for numeric attributes by dividing the range of values into a specified number of bins of equal size.
Edit the following fields:
-
Bin Count: You can change the Bin Count to any number greater than or equal to 2. The default count is set to 10.
-
Bin Label: Select a different Bin Label scheme from the list. The default is set to Range.
Click OK when you have finished.
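The relationship between bin count and bin boundaries can be sketched as follows; the Range-style labels below are illustrative, and the exact label format in the product may differ:

```python
def equal_width_bins(min_val, max_val, count=10):
    """Boundary labels for equal-width bins over [min_val, max_val]."""
    width = (max_val - min_val) / count    # every bin spans the same range
    edges = [min_val + i * width for i in range(count + 1)]
    return [f"[{edges[i]:g}; {edges[i + 1]:g}]" for i in range(count)]

print(equal_width_bins(0, 100, count=4))
# ['[0; 25]', '[25; 50]', '[50; 75]', '[75; 100]']
```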
Parent topic: Add Transform
Bin Quantile
The Bin Quantile selection divides attributes into bins so that each bin contains approximately the same number of cases.
Edit the following fields:
-
Bin Count: You can change the Bin Count to any number greater than or equal to 2. The default count is set to 10.
-
Bin Label: You can select a different Bin Label scheme from the list. The default is set to Range.
Click OK when you have finished.
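Quantile binning aims for approximately equal case counts per bin, as sketched below with a simple cut-point scheme (Oracle's algorithm may place boundaries differently, for example when tied values straddle a cut point):

```python
def quantile_bins(values, count=4):
    """Split sorted values into bins containing approximately equal numbers of cases."""
    s = sorted(values)
    n = len(s)
    # Cut points at rounded multiples of n/count give near-equal bin sizes.
    return [s[round(i * n / count):round((i + 1) * n / count)] for i in range(count)]

bins = quantile_bins(range(1, 11), count=4)
print([len(b) for b in bins])   # [2, 3, 3, 2] -- each bin holds ~10/4 cases
```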
Parent topic: Add Transform
Bin Top N
The Bin Top N type bins categorical attributes. The bin definition for each attribute is computed based on the occurrence frequency of values that are computed from the data.
Specify N, the number of bins. Each of the bins bin_1, …, bin_N-1 contains the values with the highest frequencies. The last bin, bin_N, contains all remaining values.
You can change the Bin Count to any number greater than or equal to 3. The default count is set to 10.
Click OK when you have finished.
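The top-N scheme above can be sketched as follows; the "Other" label used for the last bin is an assumption for illustration (the product computes its own bin names):

```python
from collections import Counter

def top_n_bins(values, n=3):
    """Keep the n-1 most frequent values as their own bins; all others share a last bin."""
    top = {v for v, _ in Counter(values).most_common(n - 1)}
    return [v if v in top else "Other" for v in values]

colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["pink"]
print(sorted(set(top_n_bins(colors))))   # ['Other', 'blue', 'red']
```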
Parent topic: Add Transform
Custom
Custom binning enables you to define custom bins.
To define bins, click Bin Assignment and then modify the default bins. After you generate default bins, you can modify the generated bins in several ways:
-
Edit Bin Name: If it is a Range label.
-
Delete Bins: Select it and click .
-
Add Bins: Click .
-
Edit Bins: Select a bin and click .
- Bin Assignment
In the Bin Assignment dialog box, you can specify options for bin type, bin count, bin labels and so on for different data types. - Edit Bin
The data type of the attribute determines the way to edit bins. - Add Bin
You can add bins for Categorical and Numerical data types.
Parent topic: Add Transform
Bin Assignment
In the Bin Assignment dialog box, you can specify options for bin type, bin count, bin labels and so on for different data types.
To assign bins, select the following options:
-
Binning Type: The default type depends on the data type of the attribute that you are binning:
-
If the data type of the attribute is Number, then the default binning type is Bin Equal Width.
-
If the data type of the attribute is Character, then the default binning type is Bin Top N.
You can change the binning type for numbers.
-
-
Bin Count: The default count is 10. You can change this to any integer greater than 2.
-
Bin Labels: The default label for numbers is Range. You can change the bin label to Number.
-
Transform NULLs: If the Transform NULLs check box is selected for a binning transformation that produces the NUMBER data type, then null values are placed into the last bin. For example, if the AGE column has null values, Equal Width Binning is requested with a Bin Labels value of Number, and the number of bins is 10, then null values are placed in bin number 11. For this option, the following conditions apply:
-
When deselected, null values are excluded from the generated transformation SQL.
Note:
Applicable only for those binning transformations that produce VARCHAR2 data type after transformation.
-
This field is not editable for those binning transformations which produce numeric data type after transformation.
-
For legacy workflows, this field is selected by default, and the corresponding field contains the value Null bin.
-
Click OK when you have finished. You return to the Custom display where you modify the generated bins.
Parent topic: Custom
Edit Bin
The data type of the attribute determines the way to edit bins.
You can edit bins for number and characters in the following ways:
-
For numbers: Edit lower bounds in the grid. You cannot edit a bin without a lower bound. You cannot add a value that is less than the prior bin lower bound value or greater than the following bin lower bound value.
-
For characters: The Edit Custom Categorical Bins dialog box has two columns:
-
Bins: You can add bins, delete a selected bin and change the name of the selected bin.
-
Bin Assignment: You can delete values for the selected Bin.
-
Click OK when you have finished editing bins. If you are editing custom bins for a categorical attribute, then click OK twice; the first click closes the Edit Custom Categorical Bins dialog box.
Parent topic: Custom
Add Bin
You can add bins for Categorical and Numerical data types.
To add bins:
-
Categoricals: Open the Edit Custom Categorical Bins dialog box and click . The new bin has a default name that you can change. Add values to the bin in the Bin Assignment column.
-
Numericals: Select a bin and click . You can change the name of the bin and add a range of values.
Parent topic: Custom
Missing Values
Missing Values is a transformation type that replaces missing values with an appropriate value.
To specify a Missing Values transformation:
Parent topic: Add Transform
Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0] so that x_new = (x_old-shift)/scale.
Normalization usually results in values whose absolute value is less than or equal to 1.0.
Note:
Normalization applies only to numeric columns. Therefore, you can normalize numeric attributes only.
To normalize a column:
Parent topic: Add Transform
Outlier
An outlier is a data value that does not come from the typical population of data. In other words, it is an extreme value.
In a normal distribution, outliers are typically at least 3 standard deviations from the mean. Outliers are usually replaced with values that are not extreme, or they are replaced with Null.
Note:
You can define outlier treatments for numeric columns only.
To define an Outlier transformation:
Parent topic: Add Transform
Use Existing Column
The Use Existing Column option is available only when there is at least one transformation.
This selection is used when you add or edit multiple transformations.
Related Topics
Parent topic: Add Transform
Add or Edit Multiple Transforms
You can define or edit transformations for several column at a time. You can also apply an existing transformation to one or more columns.
To add or edit transformations for multiple transforms:
-
Double-click the Transform node. The Transformation Editor opens.
-
To define the same transform for several columns, select the columns. You can select columns with different but compatible data types. For example, CHAR and VARCHAR are both character types, and so are compatible. If there are no transformations that apply to all the columns, then you get a message. Click .
The Apply Transform Wizard opens.
-
Select the Transform type to apply to all columns.
-
Provide the specific details related to the transform type that you have selected.
-
Click Next.
-
Click Generate Statistic.
-
Click Finish.
-
-
If you have already transformed a column, then you can define the same transformation for several other columns.
Suppose you have binned AGE, creating AGE_BIN. To bin several columns in the same way, select AGE and the columns that you want to bin in the same way. Click . The Apply Transform Wizard opens.
-
For Transform Type, select Use Existing. AGE_BIN is listed as the Transformed column. You cannot change any other values.
-
Click Next. You can change the names of the output columns.
-
Select Generate Statistic on Finish.
-
Click Finish.
-
-
To edit several transformations at the same time, select the transformations and click .
The Apply Transform Wizard opens. Edit the transformation and click Finish.
Related Topics
Parent topic: Add Transform
Add Custom Transform
In the Add Custom Transform dialog box, you can define a custom transformation.
The default name for the new attribute is EXPRESSION. You can change this name.
In the Add Custom Transform dialog box, you can perform the following tasks:
-
Add an expression: Click . The Expression Builder opens. Use the Expression Builder to define an expression.
-
Validate the expression.
-
Click OK.
-
-
Edit a custom transformation.
-
Delete a custom Transformation: Click .
Related Topics
Parent topic: Edit Transform Node
Apply Transform Wizard
The Apply Transform wizard enables you to define or edit transformations for several columns at the same time.
The first step of the wizard is similar to the Add Transform dialog box. You cannot select custom transformations.
- Define Columns
The second step of the Apply Transform wizard enables you to specify names for the transformed columns. You can accept the names or change them.
Related Topics
Parent topic: Edit Transform Node
Define Columns
The second step of the Apply Transform wizard enables you to specify names for the transformed columns. You can accept the names or change them.
The default is not to generate statistics on finish. Select the check box to generate statistics.
Click Finish when you have finished.
Parent topic: Apply Transform Wizard
Edit Transform
If the node has run, then the Edit Transform dialog box displays information about both the untransformed column and the transformed version.
The Edit Transform dialog box is similar to the Add Transform dialog box.
The Edit dialog box displays information about both the untransformed column and transformed version in the following tabs:
-
The Histogram tab shows histograms for both the untransformed and the transformed attribute. On the left side of the tab are histograms for the untransformed column; on the right side are histograms for the transformed column.
-
The Statistics tab shows statistics for the transformed data and for the original data.
Note:
When you transform data, the transformed data may have a different data type from the type of the original data. For example, AGE has type NUMBER and AGE_BIN has type VARCHAR2.
Related Topics
Parent topic: Edit Transform Node
Edit Custom Transform
In the Edit Custom Transform dialog box, you can edit expressions using the Expression Builder.
To edit an expression:
- Select the attribute and click . The Expression Builder opens.
- Use the Expression Builder to modify the expression.
- Validate the expression.
- Click OK.
To delete an expression, click .
Related Topics
Parent topic: Edit Transform Node
Transform Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Transform node Properties pane has these sections:
-
Transform: Specifies how the transformations are defined. You can modify these values.
The transformations are summarized in a grid. For each column, the name, data type, transform, and output are displayed. If you bin AGE creating AGE_BIN, then AGE is not used as output, that is, it is not passed to subsequent nodes. -
Histogram: Specifies the number of bins used in histograms. You can specify a different number of bins for histograms created for numeric, categorical, and date data types. The default is 10 bins for all data types.
Related Topics
Parent topic: Transform
Transform Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Right-click the Transform node. The following options are available in the context menu:
-
Edit
-
View Data
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed if there are errors.
-
Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Transform