7 Transforms Nodes
A Transforms node performs one or more transformations on the table or tables identified in a Data node.
Transformations are available in the Transforms section in the Components pane. The Evaluate and Apply Data nodes must be prepared in the same way that build data was prepared.
- Aggregation
  Aggregation is the process of consolidating multiple values into a single value.
- Data Viewer
  You can view data when the Transform node is in a valid state.
- Expression Builder
  Expression Builder helps you to enter and validate SQL expressions, such as constraints for filters.
- Filter Columns Node
  Filter Columns filters out columns so that the columns are not used in subsequent workflow calculations.
- Filter Columns Details
  The Filter Columns Details node creates a data flow that consists of the result of Attribute Importance.
- Filter Rows
  A Filter Rows node enables you to select rows by specifying a SQL statement that describes the rows.
- Join
  A Join node combines data from two or more Data Source nodes into a new data source.
- JSON Query
  The support for the JSON data format in Oracle Data Miner (SQL Developer 4.1 and later) is facilitated by the JSON Query node.
- Sample
  You can sample your data in the Sample tab.
- Transform
  A Transform node can use either sampled data or all data to calculate statistics.
Aggregation
Aggregation is the process of consolidating multiple values into a single value.
For example, you could aggregate sales in several states into sales for a region that consists of several states. To perform aggregation, use an Aggregate node.
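As a sketch of what an Aggregate node produces, the state-to-region consolidation above corresponds to a SQL GROUP BY. The table and column names below are illustrative only, and Python's sqlite3 stands in for the database:

```python
import sqlite3

# Hypothetical sales table; names are illustrative, not from the product.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (state TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("CA", "West", 100.0), ("OR", "West", 50.0), ("NY", "East", 80.0)],
)

# The kind of SQL an Aggregate node generates: consolidate state-level
# rows into one row per region.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total_sales "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 80.0), ('West', 150.0)]
```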
These topics describe Aggregate nodes:
- Creating Aggregate Nodes
  You must identify a Data Source node and columns to aggregate to create an Aggregation node.
- Editing Aggregate Nodes
  You can define and edit aggregation elements of an Aggregate node in the Edit Aggregate Node dialog box.
- Aggregate Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Aggregate Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating Aggregate Nodes
You must identify a Data Source node and columns to aggregate to create an Aggregation node.
Related Topics
Parent topic: Aggregation
Editing Aggregate Nodes
You can define and edit aggregation elements of an Aggregate node in the Edit Aggregate Node dialog box.
To edit an Aggregate node:
- Edit Group By
  In the Edit Group By dialog box, you can edit the Group By settings for aggregation.
- Define Aggregation
  You can define aggregations using the Define Aggregations wizard.
- Edit Aggregate Element
  You can define or modify the individual elements of an aggregation.
- Add Column Aggregation
  In the Add Column Aggregation dialog box, you can define how a column is aggregated.
- Add Custom Aggregation
  In the Add Custom Aggregation dialog box, you can define expressions for custom aggregation.
Parent topic: Aggregation
Edit Group By
In the Edit Group By dialog box, you can edit the Group By settings for aggregation.
The default Type is Column. You can change the Type to Expression.
- If Type is Column, then select one or more columns in the Available Attributes list. You can search the list by name or by data type. Move the selected columns to the Selected Attributes list using the arrows.
- If Type is Expression, then type an appropriate expression in the Expression box. Click Validate to validate the expression.
After you are done, click OK.
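A Group By of type Expression groups on a computed value rather than a plain column. A minimal sketch, with a hypothetical table, run against SQLite through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO cust VALUES (?, ?)",
                 [("a", 23), ("b", 27), ("c", 41)])

# Group By with Type = Expression: the grouping key is a SQL expression
# rather than a column -- here, a decade-of-age bucket.
rows = conn.execute(
    "SELECT (age / 10) * 10 AS decade, COUNT(*) FROM cust "
    "GROUP BY (age / 10) * 10 ORDER BY decade"
).fetchall()
print(rows)  # [(20, 2), (40, 1)]
```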
Parent topic: Editing Aggregate Nodes
Define Aggregation
You can define aggregations using the Define Aggregations wizard.
To define aggregations:
Parent topic: Editing Aggregate Nodes
Edit Aggregate Element
You can define or modify the individual elements of an aggregation.
To define or modify the individual element:
- In Output, you can provide a name. To provide a name, deselect Auto Name and enter a name. By default, Auto Name is selected. Output is the name of the column that holds the results of the aggregation.
- Select or change the Column that is being aggregated.
- Select the function to apply to the column. The functions available depend on the data type of the column.
- Click Edit to define a new Sub-Group By column. The Edit Group By dialog box opens.
- Once done, click OK.
Related Topics
Parent topic: Editing Aggregate Nodes
Add Column Aggregation
In the Add Column Aggregation dialog box, you can define how a column is aggregated.
To add an attribute:
- Click the add icon.
- To provide a name, deselect Auto Name and type in the name. By default, Auto Name is selected. Output is the name of the column that holds the results of the aggregation.
- Select Columns to aggregate from the list.
- Select a Function to apply to the column. The functions available depend on the data type of the column. For example, you can specify average (AVG) for a numeric value. For DATE and TIMESTAMP data types, the available functions are COUNT(), COUNT(DISTINCT()), MAX(), MEDIAN(), MIN(), and STATS_MODE().
- To define a Sub Group By column, click Edit. The Edit Group By dialog box opens. It is not necessary to define a Sub Group By column.
- After you are done, click OK.
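To illustrate per-column aggregation on a DATE-typed column, the sketch below applies COUNT, MIN, and MAX; MEDIAN and STATS_MODE are Oracle-specific and are omitted so the example runs on SQLite. Table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", "2024-01-05"), ("A", "2024-03-10"), ("B", "2024-02-01")],
)

# COUNT, MIN, and MAX are among the functions offered for DATE columns.
rows = conn.execute(
    "SELECT cust, COUNT(*), MIN(order_date), MAX(order_date) "
    "FROM orders GROUP BY cust ORDER BY cust"
).fetchall()
print(rows)
# [('A', 2, '2024-01-05', '2024-03-10'), ('B', 1, '2024-02-01', '2024-02-01')]
```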
Related Topics
Parent topic: Editing Aggregate Nodes
Add Custom Aggregation
In the Add Custom Aggregation dialog box, you can define expressions for custom aggregation.
To add a custom aggregation, click the add icon and follow these steps:
Related Topics
Parent topic: Editing Aggregate Nodes
Aggregate Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Aggregate Node Properties pane has these sections:
- Columns: the columns for the aggregation.
- Cache: provides the option to generate a cache for output data.
- Details: displays the name of the node and any comments about it.
Related Topics
Parent topic: Aggregation
Cache
The Cache section provides the option to generate a cache for output data.
By default, the cache is not generated. You can generate the cache to optimize the viewing of results.
If you generate the cache, then specify the sampling size as either:
- Number of rows: The default is 2000 rows.
- Percent: The default is 60 percent.
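The two sampling-size options behave as sketched below. This is a hypothetical helper for illustration, not product code; the node performs its sampling in the database:

```python
import random

def sample_rows(rows, number=None, percent=None, seed=0):
    """Sample either a fixed number of rows or a percentage of rows,
    matching the two cache options (defaults there: 2000 rows or 60%)."""
    rnd = random.Random(seed)
    if number is not None:
        k = min(number, len(rows))  # cannot sample more rows than exist
    else:
        k = round(len(rows) * percent / 100)
    return rnd.sample(rows, k)

data = list(range(1000))
print(len(sample_rows(data, number=2000)))  # capped at the 1000 available rows
print(len(sample_rows(data, percent=60)))   # 600
```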
Parent topic: Aggregate Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
- Node Name
- Node Comments
Parent topic: Aggregate Node Properties
Aggregate Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click an Aggregation node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings: Opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors: Displayed only if there are errors.
- Show Validation Errors: Displayed if there are validation errors.
Related Topics
Parent topic: Aggregation
Data Viewer
You can view data when the Transform node is in a valid state.
To view data, right-click the node and select View Data from the context menu. The data viewer opens.
The data viewer has these tabs:
- Data
  The Data tab displays a sample of the data.
- Graph
  The Graph tab enables you to create graphs from numeric data.
- Columns
  The Columns tab is a list of all the columns that are output from the node.
- SQL
  In the SQL tab, the SQL Details text area displays the SQL code that generated the data displayed in the Data tab.
Parent topic: Transforms Nodes
Data
The Data tab displays a sample of the data.
The Data Viewer provides grid display of rows of data either from the sampling defined in the cache or pulled through the lineage of nodes back to the source tables.
The display is controlled with:
- Refresh: To refresh the display, click the refresh icon.
- View: Enables you to select either Cached Data or Actual Data.
- Sort: Displays the Select Column to Sort By dialog box.
- Filter: Enables you to type in a WHERE clause to select data.
Parent topic: Data Viewer
Select Column to Sort By
The Select Column to Sort By dialog box enables you to:
- Select multiple columns to sort.
- Determine the column ordering.
- Determine ascending or descending order by column.
- Specify Nulls First so that null values appear ahead of real data values.
The sort order is preserved until you clear it.
Column headers are also sort-enabled to provide a temporary override to the sort selection.
Parent topic: Data
Graph
The Graph tab enables you to create graphs from numeric data.
Related Topics
Parent topic: Data Viewer
Columns
The Columns tab is a list of all the columns that are output from the node.
The display in the tab depends on the following conditions:
- If the node has not run, then the Table or View structure provided by the database is displayed.
- If the node has run successfully, then the structure of the sampled table is displayed. This is based on the sampling defined when the node was specified.
For each column, the following are displayed:
- Name
- Data type
- Mining type
- Length
- Precision
- Scale (for floating point)
- Column ID
There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and) suffixes allow you to enter multiple strings separated by spaces. For example, if Name/Data Type/Mining Type(or) is selected, then the filter string A B produces all columns where the name or the data type or the mining type starts with the letter A or B.
Parent topic: Data Viewer
SQL
In the SQL tab, the SQL Details text area displays the SQL code that generated the data provided by the actual view displayed in the Data tab.
The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.
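The stacking works by wrapping each parent node's SQL as an inline view. A sketch with hypothetical node SQL (the real generated SQL depends on the workflow):

```python
# Each node's SQL wraps its parent's SQL as an inline view, so the
# query shown in the SQL tab is a stacked expression. Node queries
# and column names here are hypothetical.
source_sql = "SELECT * FROM MINING_DATA"                     # Data Source node
filter_sql = f"SELECT * FROM ({source_sql}) WHERE AGE > 30"  # Filter Rows node
agg_sql = (f"SELECT CUST_GENDER, COUNT(*) CNT "
           f"FROM ({filter_sql}) GROUP BY CUST_GENDER")      # Aggregate node
print(agg_sql)
```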
You can perform the following tasks:
- Copy and run the SQL query in a suitable SQL interface. The following options are enabled:
  - Select All (Ctrl+A)
  - Copy (Ctrl+C)
- Search text. The Search control is a standard search control that highlights matching text and searches forward and backward.
Parent topic: Data Viewer
Expression Builder
Expression Builder helps you to enter and validate SQL expressions, such as constraints for filters.
An expression is a SQL statement or clause that transforms data or specifies a restriction. Expression Builder displays the available columns, provides a selection of functions and commonly used operators, and validates expressions.
To build and validate expressions in Expression Builder:
- Click the Expression Builder icon in the Add Custom Transform dialog box. The Expression Builder dialog box opens.
- The Expression Builder dialog box has the following components:
  - Attributes: Lists attributes (columns) in the source data. To insert an attribute into the expression at the current character position, or to replace characters that you selected, double-click the attribute.
  - Functions: Lists the commonly used SQL functions, divided into folders. Double-click a folder to view the functions listed there. To insert a function into the expression at the current character position, or to replace characters that you selected, double-click the function.
  - Expression: The expression that you create is displayed in the Expression box. You can create an expression in any one of the following ways:
    - Type the expression in the Expression box directly.
    - Add attributes and functions by double-clicking them in the Attributes tab and the Functions tab respectively. To add an operator into the expression, click the operator.
  - Commonly used operators are listed below the Expression box. Click the appropriate operator, indicated by its symbol. If you prefer, you can enter an operator directly into the Expression box. Table 7-1 lists the operators that you can enter:
Table 7-1 Commonly Used Operators
- Less than: <
- Greater than: >
- Less than or equal to: <=
- Greater than or equal to: >=
- Not equal to: !=
- Equal to: =
- Or (Logical Or): ...
- And: ...
- Left parenthesis: (
- Right parenthesis: )
- Parallel symbol (concatenation): ||
- Add: +
- Minus: -
- Multiply: *
- Divide: /
- Percent: %
- Validation Results (read-only): Displays the validation results.
- Validate: Click Validate to validate the expression in the Expression box. The results appear in Validation Results.
When you have finished creating the expression, click OK.
- Functions
  Expression Builder includes a variety of functions that can be applied to character, numeric, and date data.
Related Topics
Parent topic: Transforms Nodes
Functions
Expression Builder includes a variety of functions that can be applied to character, numeric, and date data.
There are functions that support most of the common data preprocessing required for data mining, including missing values treatment. To see a list of the available functions, expand the category of interest.
Functions are divided into the following categories:
- Character: Includes concatenate, trim, length, substring, and others.
- Conversion: Converts to character, date, number, and others.
- Date: Calculates next day, inserts time stamp, truncates, rounds, and performs other date operations.
- Numeric: Includes functions such as absolute value, ceiling, floor, trigonometric functions, hyperbolic functions, logarithms, and exponential values.
- Analytical: Performs analytical functions.
- NULL Value Substitution: For dates, characters, and numbers.
The notation for the functions is that of SQL.
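For instance, a NULL Value Substitution function such as Oracle's NVL replaces missing values with a default. The sketch below uses the standard COALESCE equivalent so it runs on SQLite; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (income REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(50.0,), (None,), (70.0,)])

# NULL value substitution: Oracle's NVL(income, 0) corresponds to
# COALESCE(income, 0) in standard SQL, used here for portability.
rows = conn.execute("SELECT COALESCE(income, 0) FROM t").fetchall()
print([r[0] for r in rows])  # [50.0, 0, 70.0]
```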
Related Topics
Parent topic: Expression Builder
Filter Columns Node
Filter Columns filters out columns so that the columns are not used in subsequent workflow calculations.
For example, you might want to filter out or ignore columns that have more than 94 percent Null values.
Optionally, you can identify important attributes.
Filter Columns requires analysis after it runs. The transformation makes recommendations. You can decide which recommendation to accept.
Filter Columns can run in parallel.
These topics describe Filter Columns nodes:
- Creating Filter Columns Node
  You create a Filter Columns node to filter out columns so that the columns are not used in subsequent workflow calculations.
- Editing Filter Columns Node
  In the Edit Filter Columns Node dialog box, you can define or edit the filter performed by the Filter Columns node.
- Filter Columns Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Filter Columns Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating Filter Columns Node
You create a Filter Columns node to filter out columns so that the columns are not used in subsequent workflow calculations.
Related Topics
Parent topic: Filter Columns Node
Editing Filter Columns Node
In the Edit Filter Columns Node dialog box, you can define or edit the filter performed by the Filter Columns node.
For supervised cases, the icon next to the attribute name indicates that it is the target attribute. Also, in the top right corner, the target attribute is indicated by this icon. You can perform the following tasks:
- View attribute importance: Select Show Attribute Importance to view the attribute importance in the table columns. The information is displayed in the columns Rank and Importance.
- View data quality: Select Show Data Quality to display Filter Columns settings in terms of percentage of null values (% Null), percentage of values that are unique (% Unique), and percentage of constants (% Constant).
- Exclude columns: You can exclude columns when you first edit the Filter Columns node.
- Edit or view Filter Columns settings: You can edit or view Filter Columns settings when you first edit the Filter Columns node.
- Calculate important attributes: Click Settings to enable attribute importance.
- Evaluate hints and decide which columns to filter out: Additional information is available in the form of Hints after the Filter Columns node runs.
- Apply recommended settings: To apply recommended settings to the selected settings, click the apply icon.
- Revert output settings: After applying recommended settings to other settings, if you want to revert to the original settings, then click the revert icon.
- Explore pairwise dependency in the dataset: Click the corresponding icon to view the pairwise dependency between attributes in the Explore Dependencies dialog box.
- Edit dependency settings: Click the corresponding icon to view and edit the dependency settings in the Predictor Dependencies dialog box.
  Note: This option is enabled only after the node is run in Supervised mode.
- Exclude Columns
  By default, all columns are selected for output. That is, all columns are passed to the next node in the workflow.
- Define Filter Columns Settings
  You can create and edit Filter Columns settings in the Define Filter Columns Settings dialog box.
- Explore Dependencies
  In the Explore Dependencies dialog box, you can view the pairwise dependency between two attributes.
- Predictor Dependencies
  You can view the dependency of the selected attribute with other attributes, and set them to be considered as output, in the Predictor Dependencies window.
- Performing Tasks After Running Filter Columns Node
  Lists the tasks that you can perform after you run the Filter Columns node.
- Columns Filter Details Report
  When you run the Filter Columns node, the Columns Filter Details Report is generated in the Edit Column Details dialog box.
- Attribute Importance
  Oracle Data Miner ranks the attributes by significance in determining the target value.
Parent topic: Filter Columns Node
Exclude Columns
By default, all columns are selected for output. That is, all columns are passed to the next node in the workflow.
- To exclude a column, click the arrow in the Output column. The arrow changes to a crossed-out arrow, and the excluded column is ignored, that is, it is not passed on.
- To view or change settings, click Settings. The Define Filter Columns Settings dialog box opens.
Related Topics
Parent topic: Editing Filter Columns Node
Define Filter Columns Settings
You can create and edit Filter Columns settings in the Define Filter Columns Settings dialog box.
There are three kinds of settings:
- Data Quality: Allows Filter Columns settings in terms of percentage of null values, percentage of values that are unique, and percentage of constants. The default values for Data Quality are specified in preferences. You can change the defaults. You can specify the following Data Quality criteria:
  - % Nulls less than or equal: Indicates the largest acceptable percentage of Null values in a column of the data source. You may want to ignore columns that have a larger percentage of Null values. The default value is 95 percent.
  - % Unique less than or equal: Indicates the largest acceptable percentage of values that are unique in a column of the data source. If a column contains many unique values, then it may not contain useful information for model building. The default value is 95 percent.
  - % Constant less than or equal: Indicates the largest acceptable percentage of constant values in a column of the data source. If most of the values in a column are the same, then the column may not be useful for model building.
- Attribute Importance: Enables you to build an attribute importance model to identify important attributes. By default, this setting is turned OFF, and Filter Columns does not calculate Attribute Importance. Specify the following:
  - Target: The value for which to find important attributes. Usually the target of a classification problem.
  - Importance Cutoff: A number between 0 and 1.0. This value identifies the smallest value for importance that you want to accept. If the importance of an attribute is a negative number, then that attribute is not correlated with the target, so the cutoff should be nonnegative. The default cutoff is 0. The rank or importance of an attribute enables you to select the attributes to be used in building models.
  - Top N: The maximum number of attributes. The default is 100.
- Attribute Dependency: Select this option to generate pairwise dependency information. In supervised mode, you can modify the output columns that are used in the result. Attribute Dependency is selected by default if Attribute Importance is selected.
  Note: You must select Attribute Importance to generate Attribute Dependency.
- Sampling (Data Quality and Attribute Importance): Enables you to select the number of rows, which can be system determined or user specified. The default values for sampling are specified in preferences. You can change the default or even turn off sampling. The default sample size is 10,000 records.
Related Topics
Parent topic: Editing Filter Columns Node
Explore Dependencies
In the Explore Dependencies dialog box, you can view the pairwise dependency between two attributes.
Parent topic: Editing Filter Columns Node
Predictor Dependencies
You can view the dependency of the selected attribute with other attributes, and set them to be considered as output in the Predictor Dependencies window.
- Columns
- Importance
- Weighted Dependency
- Output: Indicates whether the attribute is considered as output or not. Click the green arrow to change it so that the attribute is not considered as output.

- Columns
- Dependency
- Output: Indicates if the column is used as an output or not. The green arrow indicates that the column is used in output, and the green arrow with a red cross indicates that it is not considered in the output.
Parent topic: Editing Filter Columns Node
Performing Tasks After Running Filter Columns Node
Lists the tasks that you can perform after you run the Filter Columns node.
You can perform the following tasks:
- View hints: To view hints, double-click the Filter Columns node. The Edit Column Filter Details Node dialog box displays Hints, which indicate attributes that did not pass the data quality checks. For more information, click the corresponding icon.
  - Summary information is displayed about data quality.
  - Values are indicated graphically in the Data Viewer.
  If you specified Attribute Importance, then:
  - Hints indicate attributes that do not have the minimum importance value.
  - The importance of each column is displayed.
- Exclude columns: Go to the Output column for the attribute and click the arrow icon. The icon in the Output column changes to a crossed-out arrow. The selected columns are ignored or excluded, which means that the columns are not passed on to subsequent nodes. It is not necessary to run the nodes again.
- Accept recommendations:
  - For several recommendations, select the attributes and click the accept icon.
  - For all recommendations, press Ctrl+A and click the accept icon.
- Apply recommended output settings: Attributes that have Hints are not passed on. Attributes that have no hints are not changed, and they are passed on.
- Create a Table or View node: The output of this node is a data flow. To create a table containing the results, use a Create Table or View node.
Related Topics
Parent topic: Editing Filter Columns Node
Columns Filter Details Report
When you run the Filter Columns node, the Columns Filter Details Report is generated in the Edit Column Details dialog box.
Columns of the grid summarize the data quality information.
The default settings show both Attribute Importance and Data Quality.
- If Attribute Importance is selected, then the following columns are displayed:
  - Rank
  - Importance
- If Data Quality is selected, then the following columns are displayed:
  - % Null
  - % Unique
  - % Constant
The Hints column in the grid indicates the columns in the data set that do not pass data quality checks or do not meet the minimum importance value.
Bar graphs give a visual indication of values.
For example, if the percentage of Null values is larger than the value specified for % Nulls less than or equal, then a hint is generated that indicates that the percentage of Null values is exceeded. If the percentage of NULL values is very large for a column, then you may want to exclude that column.
Parent topic: Editing Filter Columns Node
Attribute Importance
Oracle Data Miner ranks the attributes by significance in determining the target value.
If a data set has many attributes, then it is likely that not all attributes contribute to a predictive model. Some attributes may simply add noise, that is, they actually detract from the predictive value of the model. You can then filter out attributes that are not important in determining the target value.
Using fewer attributes does not necessarily result in loss of predictive accuracy. Using too many attributes can affect the model and degrade its performance and accuracy. Mining using the smallest number of attributes can save significant computing time and may build better models.
The following are applicable for Attribute Importance:
- Attribute Importance is most useful with Classification.
- The target for Attribute Importance in Filter Columns should be the same as the target of the Classification model that you plan to build.
- Attribute Importance calculates the rank and importance for each attribute.
- The rank of an attribute is an integer.
- The importance of an attribute is a real number, which can be negative.
- Specify these values for attribute importance:
  - Target: The value for which to find important attributes. Usually the target of a classification problem.
    Note: For Unsupervised Attribute Importance, the Target is set to Not Specified if the target is not specified by the user.
  - Importance Cutoff: A number between 0 and 1.0. This value identifies the smallest value for importance that you want to accept. If the importance of an attribute is a negative number, then that attribute is not correlated with the target, so the cutoff should be nonnegative. The default cutoff is 0. The rank or importance of an attribute enables you to select the attributes to be used in building models.
  - Top N: The maximum number of attributes. The default is 100.
  - Sample: Select a sampling technique for the Attribute Importance calculation. The default is system determined. You can also select Stratified or Random. System determined has a stratified cutoff value with a default value of 10.
    - If the distinct count of the selected column is greater than the cutoff value, then random sampling is used.
    - If the distinct count of the selected column is less than or equal to the cutoff value, then stratified sampling is used.
    Certain combinations of target and sampling may result in performance problems. You are given a warning if there is a performance problem.
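The system-determined choice between stratified and random sampling can be sketched as a simple rule on the target's distinct count (a hypothetical helper, assuming the default cutoff of 10):

```python
def choose_sampling(target_values, cutoff=10):
    """System-determined sampling rule: stratified when the target's
    distinct count is at or below the cutoff (default 10), random otherwise."""
    return "stratified" if len(set(target_values)) <= cutoff else "random"

print(choose_sampling(["Y", "N", "Y"]))   # stratified (2 distinct values)
print(choose_sampling(list(range(500))))  # random (500 distinct values)
```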
- Attribute Importance Viewer
To view an Attribute Importance model, build a Filter Columns node with Attribute Importance selected.
Parent topic: Editing Filter Columns Node
Attribute Importance Viewer
To view an Attribute Importance model, build a Filter Columns node with Attribute Importance selected.
Right-click the node and select View Data. Results are displayed in a new Filter Columns Details tab. The viewer has these tabs:
- Attribute Importance: Lists those attributes with Importance greater than or equal to 0. Attributes are listed in order of rank, from lowest rank (most important) to highest rank. The tab also displays the data type of each attribute. A blue bar indicates the rank. You can sort any of the columns by clicking the column heading.
  - To filter columns, that is, to limit the number of columns displayed, use the filter field.
  - To clear the filter definition, click the clear icon. You can also search by name, type, rank, or importance.
- Data: Lists the important attributes in order of importance, largest first. For each attribute, rank and importance values are listed. Only those attributes with an importance value greater than or equal to 0 are listed.
- Columns: Displays the columns created by Attribute Importance, that is, attribute name, rank, and importance value.
- SQL: The SQL that generates the details.
Parent topic: Attribute Importance
Filter Columns Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
Filter Columns Node Properties has these sections:
Related Topics
Parent topic: Filter Columns Node
Filter Columns Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Columns node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings: Opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors: Displayed if there are errors.
- Show Validation Errors: Displayed if there are validation errors.
Related Topics
Parent topic: Filter Columns Node
Filter Columns Details
The Filter Columns Details node creates a data flow that consists of the result of Attribute Importance.
For each attribute, the rank and importance values are listed.
Note:
Filter Columns Details must be connected to a Filter Columns Node that has Attribute Importance selected in Settings. Otherwise, the Filter Columns Details node is Invalid.
Filter Columns Details can run in parallel.
This section consists of the following topics:
- Creating the Filter Columns Details Node
  You create a Filter Columns Details node to create a data flow that consists of the result of Attribute Importance.
- Editing the Filter Columns Details Node
  You can define or edit the filter performed by the Filter Columns node.
- Filter Columns Details Node Properties
  In the Properties pane, you can examine and change the characteristics or properties of a node.
- Filter Columns Details Node Context Menu
  The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Creating the Filter Columns Details Node
You create a Filter Columns Details node to create a data flow that consists of the result of Attribute Importance.
Related Topics
Parent topic: Filter Columns Details
Editing the Filter Columns Details Node
You can define or edit the filter performed by the Filter Columns node.
- Exclude Columns: You can exclude columns when you first edit the Filter Columns node.
- Edit or view Filter Columns settings: You can edit or view Filter Columns settings when you first edit the Filter Columns node.
- Calculate important attributes: Click Settings to enable Attribute Importance.
- Evaluate: Evaluate hints and decide which columns to filter out. Additional information is available in the form of Hints after the node runs.
Parent topic: Filter Columns Details
Filter Columns Details Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Filter Columns Details Node Properties pane has these sections:
Parent topic: Filter Columns Details
Filter Columns Details Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Columns Details node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Filter Columns Details
Filter Rows
A Filter Rows node enables you to select rows by specifying a SQL statement that describes the rows.
For example, to select all rows where CUST_GENDER is F, specify: CUST_GENDER = 'F'
You can either write the SQL expression directly or use Expression Builder.
Filter Rows can run in parallel.
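Conceptually, the filter expression becomes the WHERE clause of the query that the node runs over its input. The following self-contained sketch shows the effect of the CUST_GENDER = 'F' filter; it uses Python's sqlite3 with a hypothetical CUSTOMERS table, while Oracle Data Miner generates and runs the real query in the database:

```python
import sqlite3

# Hypothetical CUSTOMERS table; Data Miner runs the generated query
# against your actual source table instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, cust_gender TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "F"), (102, "M"), (103, "F"), (104, "M")],
)

# The Filter Rows expression CUST_GENDER = 'F' becomes the WHERE
# clause of the query the node runs over its input.
rows = conn.execute(
    "SELECT cust_id FROM customers WHERE cust_gender = 'F'"
).fetchall()
print([r[0] for r in rows])  # [101, 103]
```

Any valid SQL Boolean expression over the input columns works the same way as the filter.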
This section consists of the following topics:
- Creating a Filter Rows Node
You create a Filter Rows node to select rows by specifying SQL statements, as applicable. - Edit Filter Rows
The Edit Filter Rows dialog box defines or edits the filter performed by the Filter Rows node. - Filter Rows Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Filter Rows Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Creating a Filter Rows Node
You create a Filter Rows node to select rows by specifying SQL statements, as applicable.
Related Topics
Parent topic: Filter Rows
Edit Filter Rows
The Edit Filter Rows dialog box defines or edits the filter performed by the Filter Rows node.
The Edit Filter Rows dialog has two tabs:
- Filter
The filter is one or more SQL expressions that describe the rows to select. - Columns
The Columns tab lists the output columns.
Parent topic: Filter Rows
Filter
The filter is one or more SQL expressions that describe the rows to select.
To create or edit a filter:
- Open the Expression Builder by clicking the Expression Builder icon.
- Write the SQL query to use for filtering.
- To delete an expression, select it and click the delete icon.
- After you are done, click OK. Data Miner validates the expression.
Related Topics
Parent topic: Edit Filter Rows
Columns
The Columns tab lists the output columns.
You can filter in several ways.
Click OK when you have finished. Data Miner validates the expression.
Related Topics
Parent topic: Edit Filter Rows
Filter Rows Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Filter Rows node Properties tab has these sections:
- Filter: The SQL expression created with Edit Filter Rows. You can modify the expression in the Properties pane by clicking the edit icon.
- Columns: The output data columns. For each column, the name, alias (if any), and data type are listed.
- Cache
- Details
Related Topics
Parent topic: Filter Rows
Filter Rows Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Filter Rows node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Filter Rows
Join
A Join node combines data from two or more Data Source nodes into a new data source.
Technically, a Join node is a query that combines rows from two or more tables, views, or materialized views. For example, a Join node combines tables or views (specified in a FROM clause), selects only rows that meet specified criteria (WHERE clause), and uses projection to retrieve data from two columns (SELECT statement).
Join can run in parallel.
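As an illustration of the query shape described above, here is a minimal sketch using Python's sqlite3 with two hypothetical tables standing in for the Data Source nodes; Oracle Data Miner generates the equivalent query itself:

```python
import sqlite3

# Two hypothetical tables standing in for the Data Source nodes that
# feed a Join node; names and data are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE sales (cust_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 250.0), (1, 90.0), (2, 40.0)])

# The Join node builds a query of this shape: tables in the FROM
# clause, join criteria in the WHERE clause, and a projection in the
# SELECT statement.
rows = conn.execute(
    """SELECT c.name, s.amount
       FROM customers c, sales s
       WHERE c.cust_id = s.cust_id"""
).fetchall()
print(sorted(rows))  # [('Alice', 90.0), ('Alice', 250.0), ('Bob', 40.0)]
```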
This section contains the following topics:
- Create a Join Node
You create a Join node to combine data from two or more Data Source nodes into a new data source. - Edit a Join Node
In the Edit Join Node dialog box, you can specify or change the characteristics of the models to build. - Join Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Join Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Related Topics
Parent topic: Transforms Nodes
Create a Join Node
You create a Join node to combine data from two or more Data Source nodes into a new data source.
Note:
To materialize a join input as a table or view, connect it to a Create Table or View node.
Related Topics
Parent topic: Join
Edit a Join Node
In the Edit Join Node dialog box, you can specify or change the characteristics of the models to build.
You can define a Join node in one of the following ways:
- Double-click the Join node, or right-click the node and select Edit. Then click the Join tab.
- Select the node. In the Properties pane, select the Join tab and click the edit icon.
In either case, the Edit Join Node dialog box opens.
- Edit Join Node
In the Edit Join Node dialog box, you can add columns, define filters and resolve issues related to the join specifications. - Edit Columns
The default setting is to use Automatic Settings for the list of columns displayed. - Edit Output Data Column
In the Edit Output Data Column dialog box, you can exclude columns from the output. - Resolve
In the Resolve dialog box, you can resolve issues related to join specifications which may become invalid due to different reasons.
Parent topic: Join
Edit Join Node
In the Edit Join Node dialog box, you can add columns, define filters and resolve issues related to the join specifications.
Click the Join tab if it is not displayed. In the Edit Join Node dialog box, you can perform the following tasks:
To add a new Join column:
- Click the add icon. The Edit Join Column dialog box opens.
- In the Edit Join Column dialog box, select the Data Sources: Source 1 and Source 2. You can search the columns in either source by Name or by Data Type.
- Select one entry in Source 1 and the corresponding entry in Source 2.
- Click Add. Data Miner selects an appropriate Join Type. Column 1 (from Source 1), Column 2 (from Source 2), and the Join Type are displayed in a grid. You can search this grid by Column 1, Column 2, or Join Type.
- After you are done, click OK.
To select the columns in the Join, click the Columns tab to display the Edit Columns dialog box.
To define a filter for the Join, select the Filter tab and enter an appropriate SQL expression. You can also use SQL Worksheet (part of SQL Developer) to write a filter.
If there are problems with the join, for example, if one of the data nodes is no longer connected to the Join node, then an information indicator is displayed. Click Resolve Issues to open the Resolve dialog box.
Related Topics
Parent topic: Edit a Join Node
Edit Columns
The default setting is to use Automatic Settings for the list of columns displayed.
To select columns, go to the Columns tab of Edit Joins Details in one of the following ways:
- Right-click the Join node and select Edit. Then click Columns.
- Select the Join node. In the Properties pane, click Columns.
To make changes, deselect Automatic Settings. You can perform the following tasks:
- Edit the list of columns: Click the edit icon to open the Edit Output Data Column dialog box.
- Delete a column from the output: Select the column and click the delete icon.
If the node was run, then you must run it again.
Related Topics
Parent topic: Edit a Join Node
Edit Output Data Column
In the Edit Output Data Column dialog box, you can exclude columns from the output.
The default setting is to include all columns from both tables in the output.
To remove a column from the output:
- Move the columns from the Selected Attributes list to the Available Attributes list.
- Click OK.
Parent topic: Edit a Join Node
Resolve
In the Resolve dialog box, you can resolve issues related to join specifications which may become invalid due to different reasons.
When a Data Source node is disconnected from a Join node, all the join specifications for that node are retained and are marked as Invalid. Before you run the Join node, you must resolve the issues. The Resolve dialog box provides two ways to resolve the join issues:
- Remove: Removes all invalid entries from all specifications (Apply and Data).
- Resolve: Displays a grid that enables you to associate an unassigned node with a missing node. The missing nodes are listed in the grid, and an action is suggested.
Parent topic: Edit a Join Node
Join Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Properties pane for a Join node has these sections:
- Join: Defines the Join.
- Columns: Displays the output columns of the Join. For each column, the name, node, alias (if any), and data type are displayed. Up to 1000 columns are displayed.
- Filter: Filters results by defining filter conditions using the Expression Builder. Open the Expression Builder by clicking the Expression Builder icon.
- Cache
- Details
Related Topics
Parent topic: Join
Join Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Join node. The following options are available in the context menu:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
- Show Runtime Errors. Displayed if there are errors.
- Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Join
JSON Query
The support for the JSON data format in Oracle Data Miner (SQL Developer 4.1 and later) is facilitated by the JSON Query node.
JSON, or JavaScript Object Notation, is a data format that enables users to store and communicate sets of values, lists, and key-value mappings across systems.
The JSON Query node projects the JSON data format to the relational format. It supports only one input data provider node, such as a Data Source node. You can perform the following tasks in the JSON Query node:
- Select any JSON attributes in the source data to project them as relational data
- Select relational columns in the source data to include in the output
- Define aggregation columns on JSON data
- Preview output data
- Construct a JSON Query based on user specifications
Note:
JSON Query Node is supported on Oracle Database 12.1.0.2 and later.
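To illustrate what projecting JSON to the relational format means, the following sketch uses Python's json module on a hypothetical document; the attribute names are invented for this example, and the JSON Query node performs the equivalent work inside the database:

```python
import json

# Hypothetical JSON document of the kind a JSON Query node projects;
# the attribute names are invented for this example.
doc = json.loads(
    '{"cust_id": 101,'
    ' "items": [{"product": "A", "qty": 2},'
    '           {"product": "B", "qty": 1}]}'
)

# Unnesting the "items" array yields one relational row per array
# element, with the parent attribute repeated on every row -- the
# essence of projecting JSON data to a relational format.
rows = [(doc["cust_id"], item["product"], item["qty"])
        for item in doc["items"]]
print(rows)  # [(101, 'A', 2), (101, 'B', 1)]
```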
- Create JSON Query Node
A JSON Query node should be connected to an input provider node, such as a Data Source node. - JSON Query Node Editor
In the Edit JSON Query Node dialog box, you can only operate on the input columns that are pseudo JSON types. - JSON Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - JSON Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Data Types and their Supported Operators
Lists the data types and their supported operators for JSON data type.
Parent topic: Transforms Nodes
Create JSON Query Node
A JSON Query node should be connected to an input provider node, such as a Data Source node.
To run the node successfully, the input provider node must contain JSON data.
To create a JSON Query node:
Parent topic: JSON Query
JSON Query Node Editor
In the Edit JSON Query Node dialog box, you can only operate on the input columns that are pseudo JSON types.
To open the Edit JSON Query Node dialog box:
- Double-click the JSON Query node, or
- Right-click the node and click Edit.
The Edit JSON Query Node dialog box consists of the following tabs:
- JSON
In the JSON tab, you can select JSON data, specify filters on attributes and so on. - Additional Output
In the Additional Output tab, you can select relational columns in the source data for the output. - Aggregate
You can define aggregation column definitions on JSON attributes in the Aggregate tab. - Preview
You can preview the node output in the Preview tab.
Parent topic: JSON Query
JSON
In the JSON tab, you can select JSON data, specify filters on attributes and so on.
In the Columns drop-down list, only the input columns containing JSON data (pseudo JSON data types) are listed. You can also specify filters on attributes of the following data types: ARRAY, BOOLEAN, NUMBER, and STRING. The filters are applied to the data in hierarchical order by using the logical operators that you specify, such as Match All or Any. Select an input column from the drop-down list.
Click the filter settings icon to set and apply filter settings. The Filter Settings dialog box opens.
The JSON tab consists of the following:
- Structure
In the Structure tab, the JSON data structure for the selected column is displayed. - Data
The Data tab displays the JSON data that is used to create the JSON structure. - Filter Settings
In the Filter Settings dialog box, you can specify filters on attributes for the data type: ARRAY, BOOLEAN, NUMBER and STRING.
Parent topic: JSON Query Node Editor
Structure
In the Structure tab, the JSON data structure for the selected column is displayed.
The structure or the Data Guide table must be generated in the parent source node, for example, a Data Source node. If the structure is not found, then a message is displayed.
The following information about the data structure is displayed:
- JSON Attribute: Displays the generated JSON structure in a hierarchical format. You can select one or more attributes to import. When you select a parent attribute, all child attributes are automatically selected.
- JSON Data Type: Displays the JSON data types of all attributes, derived from the JSON data.
- Unnest: All attributes inside an array are unnested into a relational format. By default, the Unnest option is enabled. When the Unnest option for an array attribute is disabled:
  - The child attributes are displayed but cannot be selected.
  - If the array attribute is selected for output, then the output column contains a JSON representation of the array.
You can perform the following tasks in the Structure tab:
- Set viewing preferences: In the View drop-down list, you can set your viewing preferences by clicking any of the following:
  - All: To view all attributes.
  - Only Selected: To view only the selected attributes.
  - Only with Filters: To view only those attributes, along with their parent attributes, that have filter definitions applied.
  - Selected and with Filters: To view only the selected attributes that have filter definitions.
- Select attributes: To select one or more attributes, click the check box against the attribute.
  Note: When you select an attribute, its parent attribute is selected automatically. If you select a parent attribute, then all of its child attributes are automatically selected, provided none of its immediate child attributes is already part of the selection.
- Copy filter: Click the copy icon to copy the filter of an attribute to the local cache. You can then apply the copied filter to another attribute with the same data type by using the paste option.
  Note: This option is enabled only if the selected attribute has a filter definition.
- Paste filter: After copying a filter from an attribute, click the attribute to which you want to paste the filter and click the paste icon.
  Note: Only attributes with a compatible data type can accept the copied filter. For example, a filter that is copied from an attribute with the NUMBER data type can be pasted to attributes with the NUMBER data type only.
- Clear filter: Select the attribute from which you want to remove the filter and click the clear filter icon.
- Edit filter: You can add or edit a filter for any attribute of type STRING, NUMBER, BOOLEAN, or ARRAY by using the in-place edit option. To edit or add a filter to an attribute:
  - Select the attribute and click the edit icon. Alternately, select the attribute and double-click the corresponding Filter column cell. The in-place edit option for the selected attribute is enabled, with the applicable operators listed in a drop-down list. Select an operator from the drop-down list.
  - Select the value from the corresponding field by clicking the browse icon. The Find Values dialog box opens.
  - In the Find Values dialog box, select the values and click OK. The filter and the values are now visible in the JSON Query Node Editor dialog box.
  To complete the editing, press the Enter key. To cancel editing, press the Esc key.
Parent topic: JSON
Data
The Data tab displays the JSON data that is used to create the JSON structure.
In the text panel, the data is displayed in read-only mode. You can select text for copy and paste operations.
To query the data to be viewed, click the query icon.
Parent topic: JSON
Filter Settings
In the Filter Settings dialog box, you can specify filters on attributes for the data type: ARRAY, BOOLEAN, NUMBER and STRING.
You can set the filter settings for:
- Edit Filter Settings for:
  - All
  - Any
- Apply Filter Settings to:
  - JSON Unnest: Applies the filter to the JSON source data to be used for projection to the relational data format. Only filtered data is projected.
  - Aggregations: Applies the filter to the JSON data to be used for aggregation only.
  - JSON Unnest and Aggregations: Applies the filter to both the JSON unnest and the data to be used for aggregation.
After you set the filter settings, click OK.
Parent topic: JSON
Additional Output
In the Additional Output tab, you can select relational columns in the source data for the output.
The input columns that are used by the aggregation definitions in the Aggregate tab are automatically added to the list for output.
You can perform the following tasks here:
- Add relational columns: Click the add icon to add relational columns in the Edit Output Data Column dialog box.
- Delete relational columns: Select the relational columns to delete and click the delete icon.
- Edit Output Data Column Dialog
In the Edit Output Data Column dialog box, all the relational columns available in the data source are listed. You can select one or more columns to be added to the output.
Parent topic: JSON Query Node Editor
Edit Output Data Column Dialog
In the Edit Output Data Column dialog box, all the relational columns available in the data source are listed. You can select one or more columns to be added to the output.
To add columns:
- In the Available Attributes list, select the columns that you want to include in the output.
- Click the right arrow to move the attributes to the Selected Attributes list. To exclude any columns from the output, select the attribute and click the left arrow.
- Click OK. This includes the columns in the output, and they are listed in the Additional Output tab.
Related Topics
Parent topic: Additional Output
Aggregate
You can define aggregation column definitions on JSON attributes in the Aggregate tab.
The Aggregate tab displays the information in two sections:
- Group By Attribute section: Lists the Group By attributes, along with the attribute count. You can perform the following tasks:
  - View JSON Paths: Click JSON Paths to display the attribute name with context information, for example, $."customers"."cust_id". If not enabled, then only the attribute name is displayed.
  - Edit and Add Attributes: Click the edit icon to add Group By attributes in the Edit Group By dialog box.
  - Delete Attributes: Select the attributes to delete and click the delete icon.
- Aggregate Attributes section: Displays the aggregation columns, along with the column count. You can perform the following tasks:
  - View JSON Paths: Click JSON Paths to display the attribute name with context information, for example, $."customers"."cust_id". If not enabled, then only the attribute name is displayed.
  - Define Aggregation Columns: Click the add icon to define an aggregation column in the Add Aggregations dialog box.
  - Delete Aggregation Columns: Select the columns to delete and click the delete icon.
- Add Aggregations
In the Add Aggregation dialog box, you can define functions for JSON attributes. - Edit Sub Group By
In the Edit Sub Group By dialog box, you can add Sub Group By attributes to the selected JSON attribute. - Edit Group By
The Edit Group By dialog box displays the relational columns above the JSON attribute collection.
Parent topic: JSON Query Node Editor
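As a rough sketch of what a Group By attribute plus an aggregation column amounts to, the following Python snippet groups unnested JSON rows and aggregates within each group; the data is hypothetical and SUM is chosen as the aggregation function for the example:

```python
from collections import defaultdict

# Hypothetical rows already unnested from JSON: "region" plays the
# role of a Group By attribute, "amount" the attribute to aggregate.
rows = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 50},
    {"region": "East", "amount": 25},
]

# One aggregation column, defined here with SUM as the function:
# accumulate the aggregated attribute per Group By value.
totals = defaultdict(int)
for row in rows:
    totals[row["region"]] += row["amount"]

print(dict(totals))  # {'East': 125, 'West': 50}
```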
Add Aggregations
In the Add Aggregation dialog box, you can define functions for JSON attributes.
The dialog box displays JSON structures in a hierarchical view. You can select multiple attributes and then apply an aggregation function to them.
Note:
Object and Array type attributes cannot be selected.
You can perform the following tasks:
- Define aggregation functions:
  - Select the JSON attributes. You can select multiple attributes by pressing the Ctrl key and clicking the attributes for which you want to define functions.
  - Click the function icon to select and apply a function for the selected attributes. The applicable functions are listed. Select the function that you want to apply.
    Alternately, click the corresponding row in the Function column. The applicable functions are listed in a drop-down list. Select the function that you want to apply. Using this option, you can define a function for only one attribute at a time.
  - Click OK.
- Clear Aggregation Definition: Select the attribute and click the clear icon. The defined function, along with the output and Sub Group By entries, is deleted.
- Edit Sub Group By Elements: Select the attribute and click the edit icon. The Edit Sub Group By dialog box opens.
- Search: Click the search icon to find attributes based on a partial attribute name.
Related Topics
Parent topic: Aggregate
Edit Sub Group By
In the Edit Sub Group By dialog box, you can add Sub Group By attributes to the selected JSON attribute.
To add attributes:
- In the upper pane, expand the Available Attributes folder.
- Select the attributes that you want to add as Sub Group By attributes. The selected attributes are listed in the lower pane, which also displays the count of attributes that are added.
- Click OK.
Parent topic: Aggregate
Edit Group By
The Edit Group By dialog box displays the relational columns above the JSON attribute collection.
You can add relational columns as part of the top level Group By. To add relational columns:
- In the upper pane, expand the Available Attributes folder.
- Select the columns that you want to add. The selected columns are listed in the lower pane.
- Click OK.
Parent topic: Aggregate
Preview
You can preview the node output in the Preview tab.
The output is displayed in two tabs:
Parent topic: JSON Query Node Editor
Output Columns
In the Output Columns tab, the columns in the header are displayed in a grid format. Click JSON Paths to view the source attribute name.
- If you click JSON Paths, then the source attribute name, along with the contextual information, is displayed, for example, $."customers"."cust_id".
- If you do not click JSON Paths, then only the attribute name is displayed, for example, cust_id.
The following details of the columns are displayed in the Output Columns tab:
- Name: Displays the output column names
- Data Type: Displays the data type of the output column
- Data Source: Displays the source of the attribute name
- JSON Paths: Displays the attribute source
- Aggregate: Displays the aggregation function used for the aggregations
- Group By: Displays the Group By attributes
- Sub Group By: Displays the Sub Group By attributes used in the aggregations
Parent topic: Preview
Output Data
In the Output Data tab, the query result of the top N rows is displayed. The query reflects the latest user specifications. The query result is displayed in a grid format.
Parent topic: Preview
JSON Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The JSON Query node Properties pane has these sections:
- Output
The Output section in the Properties pane displays the Output Columns in read-only mode. - Cache
The Cache section provides the option to generate cache for the output data. - Details
The Details section displays the name of the node and any comments about it.
Parent topic: JSON Query
Output
The Output section in the Properties pane displays the Output Columns in read-only mode.
Related Topics
Parent topic: JSON Query Node Properties
Cache
The Cache section provides the option to generate cache for the output data.
To generate cache output:
Parent topic: JSON Query Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and add comments in the fields here:
- Node Name
- Node Comments
Parent topic: JSON Query Node Properties
JSON Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
The context menu for a JSON Query Node has these selections:
- Edit
- View Data
- Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
Related Topics
Parent topic: JSON Query
Data Types and their Supported Operators
Lists the data types and their supported operators for JSON data type.
Table 7-2 Data Types and their Supported Operators

Data Type | Supported Operators | Description
---|---|---
Array | In | Retrieves elements using specified indices, such as 0, 1, or an index range such as 2:4, or a combination of both.
Array | < | Retrieves elements that have indices less than the specified index.
Boolean | True | Retrieves elements that match the condition.
Boolean | False | Retrieves elements that do not match the condition.
Numbers | In | Retrieves numbers that are in the condition.
Numbers | Not In | Retrieves numbers that are not included in the condition.
Numbers | = | Retrieves numbers that are equal to the given condition.
Numbers | != | Retrieves numbers that are not equal to the given condition.
Numbers | > | Retrieves numbers that are greater than the given condition.
Numbers | >= | Retrieves numbers that are greater than or equal to the given condition.
Numbers | < | Retrieves numbers that are less than the given condition.
Numbers | <= | Retrieves numbers that are less than or equal to the given condition.
String | In | Retrieves elements that are in the condition.
String | Not In | Retrieves elements that are not in the condition.
String | Starts With | Retrieves elements that start with the string in the condition.
String | Contains | Retrieves elements that contain the string in the condition.
String | = | Retrieves elements that are equal to the condition.
String | != | Retrieves elements that are not equal to the condition.
String | > | Retrieves elements that are greater than the condition.
String | >= | Retrieves elements that are greater than or equal to the condition.
String | < | Retrieves elements that are less than the condition.
String | <= | Retrieves elements that are less than or equal to the condition.
Parent topic: JSON Query
Sample
You can sample your data in the Sample tab.
A Sample node enables you to sample data in one of the following ways:
- Random Sample: A sample in which every element of the data set has an equal chance of being selected.
- Top N Sample: The default sample, which selects the first N values.
- Stratified Sample: A sample that is created as follows:
  - First, the data set is divided into disjoint subsets, or strata.
  - Then, a random sample is taken from each subset.
  This technique is used when the distribution of target values is greatly skewed. For example, the response to a marketing campaign may have a positive target value 1 percent of the time or less.
Sampling nested data is best done with a Case ID. The Sample node can run in parallel.
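The stratified procedure described above can be sketched in a few lines of Python; the data set, target column, and 20 percent sampling fraction are invented for the example, and a fixed seed makes the draw reproducible, as with the node's Seed setting:

```python
import random
from collections import defaultdict

# Hypothetical data set with a skewed target: 10% positive responses.
rows = [{"id": i, "target": 1 if i % 10 == 0 else 0} for i in range(100)]

# Stratified sampling: divide the data set into disjoint strata by
# target value, then draw a random sample from each stratum. A fixed
# seed makes the draw reproducible.
rng = random.Random(12345)
strata = defaultdict(list)
for row in rows:
    strata[row["target"]].append(row)

sample = []
for value, members in strata.items():
    k = int(len(members) * 0.2)  # 20% fraction, invented for the example
    sample.extend(rng.sample(members, k))

print(len(sample))  # 18 majority rows + 2 minority rows = 20
```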
This section has the following topics:
- Sample Nested Data
Sampling nested data may require a case ID. - Creating a Sample Node
You create a Sample node to create samples of your data. - Edit Sample Node
In the Edit Sample Node dialog box, you can define and edit a sample. The settings describe the type of sample to create and the size of the sample. - Sample Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Sample Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Sample Nested Data
Sampling nested data may require a case ID.
If you do not specify a case ID, then the sample operation may fail for a nested column that is very dense and deep. Failure may occur if the amount of nested data per row exceeds the maximum of 30,000 for a specific column or row.
A case ID also allows Data Miner to perform stratified sorts on data that is dense and deep.
Parent topic: Sample
Creating a Sample Node
You create a Sample node to create samples of your data.
Related Topics
Parent topic: Sample
Edit Sample Node
In the Edit Sample Node dialog box, you can define and edit a sample. The settings describe the type of sample to create and the size of the sample.
To edit the settings for the Sample node:
- Random
- Top N
- Stratified
- Custom Balance
The Custom Balance dialog box enables you to specify exactly how the selected column is balanced.
Parent topic: Sample
Random
For a random sample, specify the following:
- Seed: The default seed is 12345. You can specify a different integer.
- Case ID (optional): Select a case ID from the drop-down list.
If you specify a seed and a case ID, then the sample is reproducible.
Parent topic: Edit Sample Node
Stratified
For a stratified sample, specify the following:
- Column: Select the column for stratification.
- Seed: The default seed is 12345. You can specify a different integer.
- Case ID (optional): Select a case ID from the drop-down list. If you specify a seed and a case ID, then the sample is reproducible.
- Distribution: Specify how the sample is to be created. There are three options:
  - Original: The distribution of the selected column in the sample is the same as the distribution in the data source. For example, if the column GENDER has M as the value for 95 percent of the cases, then in the sample, the value of GENDER is M for 95 percent of the cases.
  - Balanced: The distribution of the values of the column is equal in the sample, regardless of the distribution in the data source. For example, if the column is GENDER and GENDER has the two values M and F, then 50 percent of the time the value of GENDER is M.
  - Custom: You define how the values of the column are distributed in the sample. You must run the node once before you define the custom distribution. Click Edit to open the Custom Balance dialog box.
The Stratified dialog box displays a histogram of the values of the selected column at the bottom of the window. To see more details, click View to display the Custom Balance dialog box.
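The difference between the Original and Balanced distributions can be sketched as follows; the 95/5 GENDER split and the sample sizes are hypothetical:

```python
import random
from collections import Counter

# Hypothetical source with a 95/5 GENDER split, mirroring the example above.
rows = ["M"] * 95 + ["F"] * 5
rng = random.Random(12345)

# Original distribution: sample uniformly, so the skewed split is
# (approximately) preserved in the sample.
original = rng.sample(rows, 20)

# Balanced distribution: draw equally from each value, regardless of
# the split in the source.
m = [r for r in rows if r == "M"]
f = [r for r in rows if r == "F"]
balanced = rng.sample(m, 2) + rng.sample(f, 2)
print(Counter(balanced))  # Counter({'M': 2, 'F': 2})
```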
Related Topics
Parent topic: Edit Sample Node
Custom Balance
The Custom Balance dialog box enables you to specify exactly how the selected column is balanced.
You must run the node to collect statistics before you create a custom balance. After you run the node, edit it, select Custom Distribution, and click View. The Custom Balance dialog box opens.
You can either create custom entries for each value of the stratify attribute, or you can click Original or Balanced to provide a starting point. Click Reset to reset to the original values.
To create a custom value, select the attribute to change and click the edit icon. Change the value in the Sample Count column to the custom value and press Enter. The new sample is displayed as output in the lower part of the screen. Change as many values as required. Click OK when you have finished.
Parent topic: Edit Sample Node
Sample Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Sample node Properties pane has these sections:
-
Settings: You can specify the following:
-
Sample Size: Select a sample size in terms of:
-
Percent. Default=60%
-
Number of Rows. The default number of rows is 2000.
-
-
Sample Type: The options are:
-
Random (Default)
-
Stratified
-
Top N
-
-
Seed: The default seed is 12345. You can specify a different integer.
-
Case ID: This is an optional field. Select a case ID from the drop-down list. If you specify a seed and a case ID, then the sample is reproducible.
-
-
Cache
-
Details
- Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference. - Details
The Details section displays the name of the node and any comments about it.
Related Topics
Parent topic: Sample
Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.
You can perform the following tasks:
-
Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.
-
Sampling Size: You can select caching or override the default settings. The default sampling size is Number of Rows, with a default value of 2000.
-
Related Topics
Parent topic: Sample Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
-
Node Name
-
Node Comments
Parent topic: Sample Node Properties
Sample Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Right-click the Sample node. The following options are available in the context menu:
-
Edit
-
View Data
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed if there are errors.
-
Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Sample
Transform
A Transform node can use either sampled data or all data to calculate statistics.
You define transformations on a per-column basis. After you have defined a transformation, you can transform several columns in the same way. The Transform node can run in parallel. To use a Transform node, connect it to a data flow, that is, a Data Source node or some other node, such as a filtering node, that produces attributes. Then select the attributes to transform.
- Supported Transformations
The transformations available depend on the data type of the attribute. For example, normalization cannot be performed on character data. - Support for Date and Time Data Types
Lists the supported data types for Transform nodes. - Creating Transform Node
You create a Transform node to define transformation and transform columns. - Edit Transform Node
You can define and edit the Transform node using the Edit Transform Node dialog box. - Transform Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node. - Transform Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Parent topic: Transforms Nodes
Supported Transformations
The transformations available depend on the data type of the attribute. For example, normalization cannot be performed on character data.
You define transformations on a per-column basis. After you have defined a transformation, you can transform several columns in the same way.
You can use column statistics as a guide when defining transformations. The following transformations are supported:
- Binning
Binning is a transformation type that converts a continuous variable into a categorical variable, a continuous value into another continuous value, and a categorical value into a categorical variable with fewer values. - Custom
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common functions. - Missing Values
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common function. - Missing Values
The Missing Values Transform enables you to specify how missing values are treated. - Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0], such that x_new = (x_old-shift)/scale. Normalization applies only to numeric attributes. - Outlier
An Outlier is a data value that is not in the typical population of data, that is, extreme values. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.
Related Topics
Parent topic: Transform
Binning
Binning is a transformation type that converts a continuous variable into a categorical variable, a continuous value into another continuous value, and a categorical value into a categorical variable with fewer values.
Binning converts the following:
-
A continuous variable into a categorical variable.
-
A continuous value into a continuous value. For example, age can be converted to 10 groups from 1 to 10.
-
A categorical variable with many values into a categorical variable with fewer values.
For example, salary is a continuous variable. If you divide salary into 10 bins, then you convert salary into a categorical variable with 10 values, where each value represents a salary range.
You can bin both numbers and the character types VARCHAR2 and CHAR.
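As an illustration of binning a continuous value into a fixed number of groups (the age example above), the following sketch maps an age in [0, 100) to one of 10 equal-width groups. The bounds and the helper name are assumptions for the example, not part of the product:

```python
def age_group(age, lo=0, hi=100, bins=10):
    """Map a continuous age in [lo, hi] to an equal-width group number 1..bins."""
    if age >= hi:
        return bins                        # clamp the top edge into the last bin
    width = (hi - lo) / bins               # each bin covers an equal range
    return int((age - lo) // width) + 1

print(age_group(7))     # 1
print(age_group(42))    # 5
print(age_group(100))   # 10
```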
Parent topic: Supported Transformations
Recode
Oracle Data Miner does not support recode transformations. However, you can use custom binning to perform a recode transformation. For example, to recode the US states ME, NH, VT, CT, MA, and RI to the value NE, create a custom bin that puts the six states into a bin named NE.
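The recode-via-custom-bin idea amounts to a simple lookup, sketched below. The helper is hypothetical; in Oracle Data Miner you would define the NE bin in the Custom binning dialog box instead of writing code:

```python
# Hypothetical recode: map the New England state codes to the single bin NE.
NE_STATES = {"ME", "NH", "VT", "CT", "MA", "RI"}

def recode_state(state):
    """Return NE for a New England state code; pass all other values through."""
    return "NE" if state in NE_STATES else state

print(recode_state("VT"))   # NE
print(recode_state("CA"))   # CA
```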
Parent topic: Binning
Custom
In the Custom dialog box, you can compute a new value for a field based on combinations of existing attributes and common functions.
Use Expression Builder to create the new attribute.
Related Topics
Parent topic: Supported Transformations
Missing Values
The Missing Values Transform enables you to specify how missing values are treated.
A data value can be missing for a variety of reasons:
-
The data value was not measured, that is, it has a null value.
-
The data value was not answered.
-
The data value was unknown.
-
The data value was lost.
Data Mining algorithms vary in the way they treat missing values:
-
Ignore missing values, or omit any records that contain missing values.
-
Replace missing values with the mode or mean.
-
Infer missing values from existing values.
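A minimal sketch of the mean and mode replacement strategies listed above (illustrative only; in the product, the treatment is configured in the Missing Values transformation rather than written as code):

```python
from statistics import mean, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean (numeric) or mode (categorical) of the rest."""
    present = [v for v in values if v is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [fill if v is None else v for v in values]

# Numeric column: the missing value becomes the mean of 10, 20, 30.
assert impute([10, None, 20, 30]) == [10, 20, 20, 30]
# Categorical column: the missing value becomes the most frequent value.
assert impute(["M", "M", None, "F"], "mode") == ["M", "M", "M", "F"]
```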
Parent topic: Supported Transformations
Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0], such that x_new = (x_old-shift)/scale. Normalization applies only to numeric attributes.
Oracle Data Miner enables you to specify the following kinds of normalization:
-
Min Max: Normalizes each attribute using the transformation x_new = (x_old-min)/(max-min).
-
Linear Scale: Normalizes each attribute using the transformation
x_new = (x_old-shift)/scale
-
Z-Score: Normalizes numeric attributes using the mean and standard deviation computed from the data. Normalizes each attribute using the transformation x_new = (x_old-mean)/standard_deviation.
-
Custom: The user defines normalization.
Normalization provides transformations that perform min-max normalization, scale normalization, and z-score normalization.
Note:
You cannot normalize character data.
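The min-max and z-score formulas above share the shape x_new = (x_old - shift)/scale, differing only in the shift and scale used. A sketch of both (assuming the population standard deviation for the z-score; the product may compute it differently):

```python
from statistics import mean, pstdev

def normalize(values, method="minmax"):
    """x_new = (x_old - shift) / scale, with shift and scale chosen per method."""
    if method == "minmax":
        shift, scale = min(values), max(values) - min(values)
    else:                                  # z-score
        shift, scale = mean(values), pstdev(values)
    return [(x - shift) / scale for x in values]

print(normalize([10, 20, 30]))            # [0.0, 0.5, 1.0]
print(normalize([10, 20, 30], "zscore"))  # centered on 0, symmetric about the mean
```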
Parent topic: Supported Transformations
Outlier
An Outlier is a data value that is not in the typical population of data, that is, an extreme value. In a normal distribution, outliers are typically at least 3 standard deviations from the mean.
You specify a treatment by defining what constitutes an outlier (for example, all values in the top and bottom 5 percent of values) and how to replace outliers.
Note:
Usually, you can replace outliers with null or edge values.
For example, suppose the mean of an attribute distribution is 10 and the standard deviation is 5. Outliers are then values that are:
-
Less than -5 (the mean minus three times the standard deviation)
-
Greater than 25 (the mean plus three times the standard deviation)
In this case, you can replace the outlier -10 either with Null or with the edge value -5.
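The worked example above (mean 10, standard deviation 5) can be sketched as follows; the function name and signature are assumptions for illustration:

```python
def treat_outliers(values, mean_, sd, treatment="edge"):
    """Replace values outside mean +/- 3*sd with the nearest edge value, or with None."""
    lo, hi = mean_ - 3 * sd, mean_ + 3 * sd
    def fix(x):
        if lo <= x <= hi:
            return x                       # in the typical population: keep as-is
        return (lo if x < lo else hi) if treatment == "edge" else None
    return [fix(x) for x in values]

# Mean 10, standard deviation 5: outliers lie below -5 or above 25.
assert treat_outliers([-10, 0, 30], 10, 5) == [-5, 0, 25]
assert treat_outliers([-10, 0, 30], 10, 5, "null") == [None, 0, None]
```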
Parent topic: Supported Transformations
Support for Date and Time Data Types
Lists the supported data types for Transform nodes.
The Transform Node provides limited support for these data types for date and time:
-
DATE
-
TIMESTAMP
-
TIMESTAMP_WITH_TIMEZONE
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
You can bin date and time attributes using Equal Width or Custom binning. You can apply the Statistic and Missing Values transformations with either a Statistic or a Value treatment.
Parent topic: Transform
Creating Transform Node
You create a Transform node to define transformation and transform columns.
Related Topics
Parent topic: Transform
Edit Transform Node
You can define and edit the Transform node using the Edit Transform Node dialog box.
This dialog box has two tabs:
-
Transformations
-
Statistics
In the Transformations tab, the statistics for each column are displayed. If you do not want to see statistics, then deselect Show Statistics.
Note:
You must run the node to see statistics.
You can perform the following tasks in the Transformations tab:
-
Define Transformation: Select one or more original columns, that is, columns that are not transformed. Click .
The Add Transform dialog box opens if you selected a single column. Otherwise, the Apply Transform Wizard opens.
-
Define Custom Transformation: Select one or more original columns, that is, columns that are not transformed. Click .
The Add Custom Transform dialog box opens. You can add custom transformations here.
The default behavior is to ignore the original column and to use the transformed column as output. The values displayed in the Output column are indicated by:
-
: For a column that is included
-
: For a column that is ignored
-
-
Change Values in the Output Column: Click the icon displayed in the Output column to edit the values in the Add Transform dialog box.
-
Edit Transformed Column: You can edit transformed columns only. For example, you can edit AGE_BIN, but not AGE. To edit transforms, select one or more transformed columns and click . The Edit Transform dialog box opens if you selected a single column.
-
Delete Transform: Select one or more transformed columns and click .
-
Filter Column: To limit the number of columns displayed, click . You can search by:
-
Output Column
-
Transform
-
Source Column
-
-
Clear filter definition: To clear filter definitions, click .
-
View effects of transform: To view the effect of a transformation:
-
Run the node.
-
After running of the node is complete, double-click the node.
-
Select the transformed column to see histograms that compare the original column with the transformed column.
-
When a column has a transformation applied to it, a new row is added to the list of columns. Because each column must have a name, the name of the new row is based on the name of the original column and the type of transformation performed. Typically, you transform a column and want only the transformed column in the output of the Transform node, so the original column has an option set to prevent it from being passed on as an output column. For example, if you bin AGE creating AGE_BIN, then AGE is not passed on and AGE_BIN is passed on.
- Add Transform
The Add Transform dialog box allows you to add custom transformation such as Binning, Missing Values, Normalization and so on, depending on the attribute of the data type. - Add Custom Transform
In the Add Custom Transform dialog box, you can define a custom transformation. - Apply Transform Wizard
The Apply Transform wizard enables you to define or edit transformations for several columns at the same time. - Edit Transform
If the node has run, then the Edit Transform dialog box displays information about both the untransformed column and transformed version - Edit Custom Transform
In the Edit Custom Transform dialog box, you can edit expressions using the Expression Builder.
Parent topic: Transform
Add Transform
The Add Transform dialog box allows you to add custom transformation such as Binning, Missing Values, Normalization and so on, depending on the attribute of the data type.
To add a transformation:
- Binning
Binning is a type of transformation. - Bin Equal Width (Number)
The Bin Equal Width selection determines bins for numeric attributes by dividing the range of values into a specified number of bins of equal size. - Bin Quantile
The Bin Quantile selection divides attributes into bins so that each bin contains approximately the same number of cases. - Bin Top N
The Bin Top N type bins categorical attributes. The bin definition for each attribute is computed based on the occurrence frequency of values that are computed from the data. - Custom
Custom binning enables you to define custom bins. - Missing Values
Missing Values is a transformation type that replaces missing values with an appropriate value. - Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0] so that x_new = (x_old-shift)/scale. - Outlier
An outlier is a data value that does not come from the typical population of data. In other words, it is an extreme value. - Use Existing Column
The Use Existing Column option is available only when there is at least one transformation. - Add or Edit Multiple Transforms
You can define or edit transformations for several columns at a time. You can also apply an existing transformation to one or more columns.
Parent topic: Edit Transform Node
Binning
Binning is a type of transformation.
You can use Binning to:
-
Transform a continuous variable to a discrete one.
-
Transform a variable with a large number of discrete values to one with a smaller number of discrete values.
The default transformation type is Binning.
The types of binning supported depend on the data type of the column:
-
For the numeric data type NUMBER, the supported types of binning are:
-
Bin Equal Width (Number) (Default)
-
For the categorical data type VARCHAR2, the supported types of binning are:
-
For the date and time data types DATE, TIMESTAMP, TIMESTAMP WITH LOCAL TIMEZONE, and TIMESTAMP WITH TIMEZONE, the supported types of binning are:
-
Bin Equal Width (Number) (Default)
Note:
The number of bins must be two.
Parent topic: Add Transform
Bin Equal Width (Number)
The Bin Equal Width selection determines bins for numeric attributes by dividing the range of values into a specified number of bins of equal size.
Edit the following fields:
-
Bin Count: You can change the Bin Count to any number greater than or equal to 2. The default count is set to 10.
-
Bin Label: Select a different Bin Label scheme from the list. The default is set to Range.
Click OK when you have finished.
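The relationship between bin count and bin boundaries can be sketched as follows; the Range-style labels below are illustrative, and the exact label format in the product may differ:

```python
def equal_width_bins(min_val, max_val, count=10):
    """Boundary labels for equal-width bins over [min_val, max_val]."""
    width = (max_val - min_val) / count    # every bin spans the same range
    edges = [min_val + i * width for i in range(count + 1)]
    return [f"[{edges[i]:g}; {edges[i + 1]:g}]" for i in range(count)]

print(equal_width_bins(0, 100, count=4))
# ['[0; 25]', '[25; 50]', '[50; 75]', '[75; 100]']
```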
Parent topic: Add Transform
Bin Quantile
The Bin Quantile selection divides attributes into bins so that each bin contains approximately the same number of cases.
Edit the following fields:
-
Bin Count: You can change the Bin Count to any number greater than or equal to 2. The default count is set to 10.
-
Bin Label: You can select a different Bin Label scheme from the list. The default is set to Range.
Click OK when you have finished.
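Quantile binning aims for approximately equal case counts per bin, as sketched below with a simple cut-point scheme (Oracle's algorithm may place boundaries differently, for example when tied values straddle a cut point):

```python
def quantile_bins(values, count=4):
    """Split sorted values into bins containing approximately equal numbers of cases."""
    s = sorted(values)
    n = len(s)
    # Cut points at rounded multiples of n/count give near-equal bin sizes.
    return [s[round(i * n / count):round((i + 1) * n / count)] for i in range(count)]

bins = quantile_bins(range(1, 11), count=4)
print([len(b) for b in bins])   # [2, 3, 3, 2] -- each bin holds ~10/4 cases
```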
Parent topic: Add Transform
Bin Top N
The Bin Top N type bins categorical attributes. The bin definition for each attribute is computed based on the occurrence frequency of values that are computed from the data.
Specify N, the number of bins. Each of the bins bin_1, …, bin_N-1 contains the values with the highest frequencies. The last bin, bin_N, contains all remaining values.
You can change the Bin Count to any number greater than or equal to 3. The default count is set to 10.
Click OK when you have finished.
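The top-N scheme above can be sketched as follows; the "Other" label used for the last bin is an assumption for illustration (the product computes its own bin names):

```python
from collections import Counter

def top_n_bins(values, n=3):
    """Keep the n-1 most frequent values as their own bins; all others share a last bin."""
    top = {v for v, _ in Counter(values).most_common(n - 1)}
    return [v if v in top else "Other" for v in values]

colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["pink"]
print(sorted(set(top_n_bins(colors))))   # ['Other', 'blue', 'red']
```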
Parent topic: Add Transform
Custom
Custom binning enables you to define custom bins.
To define bins, click Bin Assignment and then modify the default bins. After you generate default bins, you can modify the generated bins in several ways:
-
Edit Bin Name: If it is a Range label.
-
Delete Bins: Select it and click .
-
Add Bins: Click .
-
Edit Bins: Select a bin and click .
- Bin Assignment
In the Bin Assignment dialog box, you can specify options for bin type, bin count, bin labels and so on for different data types. - Edit Bin
The data type of the attribute determines the way to edit bins. - Add Bin
You can add bins for Categorical and Numerical data types.
Parent topic: Add Transform
Bin Assignment
In the Bin Assignment dialog box, you can specify options for bin type, bin count, bin labels and so on for different data types.
To assign bins, select the following options:
-
Binning Type: The default type depends on the data type of the attribute that you are binning:
-
If the data type of the attribute is Number, then the default binning type is Bin Equal Width.
-
If the data type of the attribute is Character, then the default binning type is Bin Top N.
You can change the binning type for numbers.
-
-
Bin Count: The default count is 10. You can change this to any integer greater than 2.
-
Bin Labels: The default label for numbers is Range. You can change the bin label to Number.
-
Transform NULLs: If the Transform NULLs check box is selected for a binning transformation that produces the NUMBER data type, then null values are placed into the last bin. For example, if the AGE column has null values, Equal Width Binning is requested with a Bin Labels value of Number, and the number of bins is 10, then null values are placed in bin number 11. For this option, the following conditions apply:
-
When deselected, null values are excluded from the generated transformation SQL.
Note:
Applicable only for those binning transformations that produce VARCHAR2 data type after transformation.
-
This field is not editable for those binning transformations which produce numeric data type after transformation.
-
For legacy workflows, this field is selected by default, and the corresponding field contains the value Null bin.
-
Click OK when you have finished. You return to the Custom display where you modify the generated bins.
Parent topic: Custom
Edit Bin
The data type of the attribute determines the way to edit bins.
You can edit bins for number and characters in the following ways:
-
For numbers: Edit lower bounds in the grid. You cannot edit a bin without a lower bound. You cannot add a value that is less than the prior bin lower bound value or greater than the following bin lower bound value.
-
For characters: The Edit Custom Categorical Bins dialog box has two columns:
-
Bins: You can add bins, delete a selected bin and change the name of the selected bin.
-
Bin Assignment: You can delete values for the selected Bin.
-
Click OK when you have finished editing bins. If you are editing custom bins for a categorical attribute, then click OK twice; the first click closes the Edit Custom Categorical Bins dialog box.
Parent topic: Custom
Add Bin
You can add bins for Categorical and Numerical data types.
To add bins:
-
Categoricals: Open the Edit Custom Categorical Bins dialog box and click . The new bin has a default name that you can change. Add values to the bin in the Bin Assignment column.
-
Numericals: Select a bin and click . You can change the name of the bin and add a range of values.
Parent topic: Custom
Missing Values
Missing Values is a transformation type that replaces missing values with an appropriate value.
To specify a Missing Values transformation:
Parent topic: Add Transform
Normalization
Normalization consists of transforming numeric values into a specific range, such as [–1.0,1.0] or [0.0,1.0] so that x_new = (x_old-shift)/scale.
Normalization usually results in values whose absolute value is less than or equal to 1.0.
Note:
Normalization applies only to numeric columns. Therefore, you can normalize numeric attributes only.
To normalize a column:
Parent topic: Add Transform
Outlier
An outlier is a data value that does not come from the typical population of data. In other words, it is an extreme value.
In a normal distribution, outliers are typically at least 3 standard deviations from the mean. Outliers are usually replaced with values that are not extreme, or they are replaced with Null.
Note:
You can define outlier treatments for numeric columns only.
To define an Outlier transformation:
Parent topic: Add Transform
Use Existing Column
The Use Existing Column option is available only when there is at least one transformation.
This selection is used when you add or edit multiple transformations.
Related Topics
Parent topic: Add Transform
Add or Edit Multiple Transforms
You can define or edit transformations for several column at a time. You can also apply an existing transformation to one or more columns.
To add or edit transformations for multiple transforms:
-
Double-click the Transform node. The Transformation Editor opens.
-
To define the same transform for several columns, select the columns. You can select columns with different but compatible data types. For example, CHAR and VARCHAR are both character types, and so are compatible. If there are no transformations that apply to all the columns, then you get a message. Click .
The Apply Transform Wizard opens.
-
Select the Transform type to apply to all columns.
-
Provide the specific details related to the transform type that you have selected.
-
Click Next.
-
Click Generate Statistic.
-
Click Finish.
-
-
If you have already transformed a column, then you can define the same transformation for several other columns.
Suppose you have binned AGE, creating AGE_BIN. To bin several columns in the same way, select AGE and the columns that you want to bin in the same way. Click . The Apply Transform Wizard opens.
-
For Transform Type, select Use Existing. AGE_BIN is listed as the Transformed column. You cannot change any other values.
-
Click Next. You can change the names of the output columns.
-
Select Generate Statistic on Finish.
-
Click Finish.
-
-
To edit several transformations at the same time, select the transformations and click .
The Apply Transform Wizard opens. Edit the transformation and click Finish.
Related Topics
Parent topic: Add Transform
Add Custom Transform
In the Add Custom Transform dialog box, you can define a custom transformation.
The default name for the new attribute is EXPRESSION. You can change this name.
In the Add Custom Transform dialog box, you can perform the following tasks:
-
Add an expression: Click . The Expression Builder opens. Use the Expression Builder to define an expression.
-
Validate the expression.
-
Click OK.
-
-
Edit a custom transformation.
-
Delete a custom Transformation: Click .
Related Topics
Parent topic: Edit Transform Node
Apply Transform Wizard
The Apply Transform wizard enables you to define or edit transformations for several columns at the same time.
The first step of the wizard is similar to the Add Transform dialog box. You cannot select custom transformations.
- Define Columns
The second step of the Apply Transform wizard enables you to specify names for the transformed columns. You can accept the names or change them.
Related Topics
Parent topic: Edit Transform Node
Define Columns
The second step of the Apply Transform wizard enables you to specify names for the transformed columns. You can accept the names or change them.
The default is not to generate statistics on finish. Select the check box to generate statistics.
Click Finish when you have finished.
Parent topic: Apply Transform Wizard
Edit Transform
If the node has run, then the Edit Transform dialog box displays information about both the untransformed column and the transformed version.
The Edit Transform dialog box is similar to the Add Transform dialog box.
The Edit dialog box displays information about both the untransformed column and transformed version in the following tabs:
-
The Histogram tab shows histograms for both the untransformed and the transformed attribute. On the left side of the tab are histograms for the untransformed column; on the right side are histograms for the transformed column.
-
The Statistics tab shows statistics for the transformed data and for the original data.
Note:
When you transform data, the transformed data may have a different data type from the type of the original data. For example, AGE has type NUMBER and AGE_BIN has type VARCHAR2.
Related Topics
Parent topic: Edit Transform Node
Edit Custom Transform
In the Edit Custom Transform dialog box, you can edit expressions using the Expression Builder.
To edit an expression:
- Select the attribute and click . The Expression Builder opens.
- Use the Expression Builder to modify the expression.
- Validate the expression.
- Click OK.
To delete an expression, click .
Related Topics
Parent topic: Edit Transform Node
Transform Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternatively, right-click the node and click Go to Properties.
The Transform node Properties pane has these sections:
-
Transform: Specifies how the transformations are defined. You can modify these values.
The transformations are summarized in a grid. For each column, the name, data type, transform, and output are displayed. If you bin AGE creating AGE_BIN, then AGE is not used as output, that is, it is not passed to subsequent nodes. -
Histogram: Specifies the number of bins used in histograms. You can specify a different number of bins for histograms created for numeric, categorical, and date data types. The default is 10 bins for all data types.
Related Topics
Parent topic: Transform
Transform Node Context Menu
The context menu options depend on the type of the node. It provides shortcuts to perform various tasks and view information related to the node.
Right-click the Transform node. The following options are available in the context menu:
-
Edit
-
View Data
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed if there are errors.
-
Show Validation Errors. Displayed if there are validation errors.
Related Topics
Parent topic: Transform