5 Data Nodes
Data nodes specify data for a mining operation, to transform data, or to save data to a table. The input for Oracle Data Mining operations is a table or view in Oracle Database or an Oracle Data Miner node that is part of a data flow.
The Data nodes are available in the Data section in the Components pane. The following Data nodes enable you to specify data in a workflow and to create and modify tables:
- Create Table or View Node
The Create Table or View node is a type of node that enables you to save the results in a table in the schema to which you are connected. - Data Source Node
A Data Source node defines source data for the workflow. For example, a Data Source node specifies the build data for a model. - Explore Data Node
The Explore Data node provides the profile of any input data source. Explore Data Statistics can either be based on all data or on a sample of the data. - Graph Node
A Graph node creates a two-dimensional graph of numeric data. - SQL Query Node
SQL Query node writes SQL queries to perform special data preparation. Use the SQL Query node to provide input for a model build. - Update Table Node
An Update Table node updates an existing table with selected columns from the data input to the node. Input columns are mapped to the existing table columns. - Target Values Selection
The Target Values Selection dialog box displays the number of target values selected. The default setting isAutomatic.
Create Table or View Node
The Create Table or View node is a type of node that enables you to save the results in a table in the schema to which you are connected.
For example, use a Create Table or View node to save the results of an Apply node to a table. The Create Table node automatically uses compression when possible. The Create Table node can run in parallel.
The benefits of Create Table or View node are:
-
Data Persistence: A Create Table or View node saves the data that flow into the node as a view or table in the database. If you create a table, then the actual data persists. If you create a view, then the SQL definition (full lineage) persists. Output from this node is the data provided by the view or table.
-
Performance Improvement: If you perform one or more complex transformations, such as Joins and Aggregations, and save the result of the transformations as a table, then the subsequent operations are faster. For example, you can perform an Aggregation and a Join, create a table that contains the results of the transformation, and then use the table as input for building the model. Therefore, you will create a table from the Join node. The Classification models are built against this table.
- Working with the Create Table or View Node
Lists the tasks you can perform using a Create Table or View node.
Related Topics
Parent topic: Data Nodes
Working with the Create Table or View Node
Lists the tasks you can perform using a Create Table or View node.
- Creating a Create Table or View Node
You create a Create Table or View node to save a data flow to a table or view. - Create Table Node and Compression
Oracle Data Miner creates tables in the Create Table node, and creates split data sets for testing in the Classification and Regression model build. - Edit Create Table or View Node
You can modify the operation of the Create Table or View node in the Edit Create Table or View dialog box. - Select Columns
By default, all columns are selected. You must select at least one attribute for the Create Table or View definition to be complete. - View Create Table or View Data
You can view the data of a node in the Data Source Node viewer. - Create Table or View Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Create the Table or View Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Parent topic: Create Table or View Node
Creating a Create Table or View Node
You create a Create Table or View node to save a data flow to a table or view.
You can connect a Create Table or View node to any node that create a data flow, such as an Apply node. To create a Create Table or View node:
Related Topics
Parent topic: Working with the Create Table or View Node
Create Table Node and Compression
Oracle Data Miner creates tables in the Create Table node, and creates split data sets for testing in the Classification and Regression model build.
When Oracle Data Miner creates a table, and if Oracle Data Miner determines that the data does not have nested columns DM_NESTED_
*, then it uses compression to create the table.
Table compression does the following:
-
Saves disk space
-
Reduces memory use in the buffer cache
-
Speeds up the running of queries during reads
For the Create Table node, Oracle Data Miner also verifies that a primary key is not defined and no indexes are defined.
Parent topic: Working with the Create Table or View Node
Edit Create Table or View Node
You can modify the operation of the Create Table or View node in the Edit Create Table or View dialog box.
To edit a Create Table or View node:
- Edit Storage Settings
In the Edit Storage Settings dialog box, you can set the logging settings and data compression method.
Related Topics
Parent topic: Working with the Create Table or View Node
Edit Storage Settings
In the Edit Storage Settings dialog box, you can set the logging settings and data compression method.
To edit logging settings and data comprehension method:
-
Select Logging On to override any other settings specified as part of Parallel Processing settings.
-
In the Compression Level section, set the data compression method by selecting any one of the following options:
-
None: In this method, the data is not compressed.
-
Low: This method results in the best query performance.
-
Medium: This method optimizes the data for DML operations and compresses the IM column
-
High: This method results in excellent query performance.
-
Higher: This method results in good query performance.
-
Highest: This method results in a fair query performance.
-
Parent topic: Edit Create Table or View Node
Select Columns
By default, all columns are selected. You must select at least one attribute for the Create Table or View definition to be complete.
You can perform the following tasks:
-
Remove columns: Select the attribute in the Selected Attribute section and move it to the Available Attribute section. Click OK.
-
Add columns: Select the attribute in theAvailable Attribute section and move it to the Selected Attribute section. Click OK.
Parent topic: Working with the Create Table or View Node
View Create Table or View Data
You can view the data of a node in the Data Source Node viewer.
After the node runs successfully, right-click the node and select View Data. The data is displayed in the Data Source Node viewer.
Related Topics
Parent topic: Working with the Create Table or View Node
Create Table or View Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
You can perform the following tasks:
- Connect
Use the Connect option to link nodes in a workflow. - Run
Use the Run option to execute the tasks specified in the nodes that comprise the workflow. - Force Run
Use the Force Run option to rerun one or more nodes that are complete. - Create Schedule
Use the Create Schedule option to define a schedule to run a workflow at a definite time and date. - Edit
Use the Edit option to edit the default settings of a node. - View Data
Use the View Data option to view the data contained in a Data node. - View Models
Use the View Models option to view the details of the models that are built after running the workflow. - Generate Apply Chain
Use the Generate Apply Chain to create a new node that contains the specification of a node that performs a transformation. - Refresh Input Data Definition
Use the Refresh Input Data Definition option if you want to update the workflow with new columns, that are either added or removed. - Show Event Log
Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages. - Deploy
Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow. - Show Graph
The Show Graph option opens the Graph Node Editor. - Cut
Use the Cut option to remove the selected object, which could be a node or connection. - Copy
Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow. - Paste
Use the Paste option to paste the copied object in the workflow. - Extended Paste
Use the Extended Paste option to preserve node and model names while pasting them. - Select All
Use the Select All option to select all the nodes in a workflow. - Performance Settings
Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes. - Toolbar Actions
Use the Toolbar Action option to select actions in the toolbar from the context menu. - Show Runtime Errors
Use the Show Runtime Errors to view errors related to node failure during runtime. This option is displayed only when running of the node fails at runtime. - Show Validation Errors
Use the Show Validation Errors option to view validation errors, if any. - Save SQL
Use the Save SQL option to generate SQL script for the selected node. - Validate Parents
Use the Validate Parents option to validate all parent nodes of the current node. - Compare Test Results
Use the Compare Test Results option to view and compare test results of models that are built successfully. - View Test Results
Use the View Test Results option to view the test results of the selected model. This option is applicable only for Classification and Regression models. - Go to Properties
Use the Go to Properties option to open the Properties pane of the selected node. - Navigate
Use the Navigate option to view the links available from the selected node.
Related Topics
Parent topic: Working with the Create Table or View Node
Connect
Use the Connect option to link nodes in a workflow.
To connect nodes:
Related Topics
Parent topic: Create Table or View Context Menu
Run
Use the Run option to execute the tasks specified in the nodes that comprise the workflow.
The Data Miner server runs workflows asynchronously. The client does not have to be connected. You can run one or more nodes in the workflow:
-
To run one node: Right-click the node and select Run.
-
To run multiple nodes simultaneously: Select the nodes by holding down the Ctrl key and click each individual node. Then right-click any selected node and select Run.
If a node depends on outputs of one or more parent nodes, then the parent node runs automatically only if the outputs required by the running node are missing.
Parent topic: Create Table or View Context Menu
Force Run
Use the Force Run option to rerun one or more nodes that are complete.
Force Run deletes any existing models before building them once again.
To select more than one node, click the nodes while holding down the Ctrl
key.
You can Force Run a node at any location in a workflow. Depending on the location of the node in the workflow, you have the following choices for running the node using Force Run:
-
Selected Node
-
Selected Node and Children
(available if the node has child nodes) -
Child Node Only
(available if the node one or more child nodes) -
Selected Node and Parents
(available if the node has parent nodes)
Parent topic: Create Table or View Context Menu
Create Schedule
Use the Create Schedule option to define a schedule to run a workflow at a definite time and date.
To save the workflow schedule settings, click . You can provide a name for the schedule in the Save a Schedule dialog box.
Related Topics
Parent topic: Create Table or View Context Menu
Edit
Use the Edit option to edit the default settings of a node.
Nodes have default algorithms and settings. When you edit a node, the default algorithms and settings are modified. You can edit a node in any one of the following ways:
-
Edit nodes using the Edit dialog box
-
Edit nodes through Properties UI
Parent topic: Create Table or View Context Menu
View Data
Use the View Data option to view the data contained in a Data node.
The Data nodes are Create Table or View node, Data Source node, Explore Data node, Graph node, SQL Query node, and Update Table node.
Related Topics
Parent topic: Create Table or View Context Menu
View Models
Use the View Models option to view the details of the models that are built after running the workflow.
To view models, you must select a model from the list to open the model viewer. A model must be built successfully before it can be viewed.
Parent topic: Create Table or View Context Menu
Generate Apply Chain
Use the Generate Apply Chain to create a new node that contains the specification of a node that performs a transformation.
If you have several transformations performed in sequence, for example, Sample
followed by a Custom transform
, then you must select Generate Apply Chain for each transformation in the sequence.You must connect the individual nodes and connect them to an appropriate data source.
Generate Apply Chain helps you create a sequence of transformations that you can use to ensure that new data is prepared in the same way as existing data. For example, to ensure that Apply data is prepared in the same way as Build data, use this option.
The Generate Apply Chain option is not valid for all nodes. For example, it does not copy the specification of a Build node.
Parent topic: Create Table or View Context Menu
Refresh Input Data Definition
Use the Refresh Input Data Definition option if you want to update the workflow with new columns, that are either added or removed.
SELECT*
capability in the input source. The option allows you to quickly refresh your workflow definitions to include or exclude columns, as applicable.
Note:
The Refresh Input Data Definition option is available as a context menu option in Data Source nodes and SQL Query nodes.Parent topic: Create Table or View Context Menu
Show Event Log
Use the Show Event Log option to view information about events in the current connection, errors, warnings, and information messages.
Clicking the Show Event Log option opens the View and Event Log dialog box.
Related Topics
Parent topic: Create Table or View Context Menu
Deploy
Use the Deploy option to deploy a node or workflow by creating SQL scripts that perform the tasks specified in the workflow.
The scripts generated by Deploy
are saved to a directory.
Note:
You must run a node before deploying it.
You can generate a script that replicates the behavior of the entire workflow. Such a script can serve as the basis for application integration or as a light-weight deployment than the alternative of installing the Data Miner repository and workflows in the target and production system.
To deploy a workflow or part of a workflow:
Parent topic: Create Table or View Context Menu
Show Graph
The Show Graph option opens the Graph Node Editor.
All graphs are displayed in the Graph Node Editor.
Related Topics
Parent topic: Create Table or View Context Menu
Cut
Use the Cut option to remove the selected object, which could be a node or connection.
You can also delete objects by selecting them and pressing DELETE on your keyboard.
Parent topic: Create Table or View Context Menu
Copy
Use the Copy option to copy one or more nodes and paste them into the same workflow or a different workflow.
To copy and paste nodes:
Note:
Copying and pasting nodes do not carry any mining models or results from the original node.
Parent topic: Create Table or View Context Menu
Paste
Use the Paste option to paste the copied object in the workflow.
To paste an object, right-click the workflow and click Paste. Alternately, you can press Ctrl+V.
Note:
Node names and model names are changed to avoid naming collisions. To preserve names, use the option Extended Paste.
Related Topics
Parent topic: Create Table or View Context Menu
Extended Paste
Use the Extended Paste option to preserve node and model names while pasting them.
The default behavior of Paste is to change node names and model names to avoid naming collisions.
To go to the Extended Paste option, right-click the workflow and click Extended Paste. Alternately, you can press Control+Shift+V.
Note:
If model names are not unique, then the models may be overwritten when they are rebuilt.
Related Topics
Parent topic: Create Table or View Context Menu
Select All
Use the Select All option to select all the nodes in a workflow.
The selected nodes and links are highlighted in a dark blue border.
Parent topic: Create Table or View Context Menu
Performance Settings
Use the Performance Settings option to edit Parallel settings and In-Memory settings of the nodes.
If you click Performance Settings in the context menu, or if you click Performance Options in the workflow toolbar, then the Edit Selected Node Settings dialog box opens. It lists all the nodes that comprise the workflow. To edit the settings in the Edit Selected Node Settings dialog box:
-
Click Parallel Settings and select:
-
Enable: To enable parallel settings in the selected nodes in the workflow.
-
Disable: To disable parallel settings in the selected nodes in the workflow.
-
All: To turn on parallel processing for all nodes in the workflow.
-
None: To turn off parallel processing for all nodes in the workflow.
-
-
Click In-Memory Settings and select:
-
Enable: To enable In-Memory settings for the selected nodes in the workflow.
-
Disable: To disable In-Memory settings for the selected nodes in the workflow.
-
All: To turn on In-Memory settings for the selected nodes in the workflow.
-
None: To turn off In-Memory settings for all nodes in the workflow
-
-
Click to set the Degree of Parallel, and In-Memory settings such as Compression Method, and Priority Levels in the Edit Node Performance Settings dialog box.
If you specify parallel settings for at least one node, then this indication appears in the workflow title bar:
Performance Settings is either On for Selected nodes, On (for All nodes), or Off. You can click Performance Options to open the Edit Selected Node Settings dialog box.
-
Click to edit default the preferences for parallel processing.
-
Edit Node Default Settings: You can edit the Parallel Settings and In-Memory settings for the selected node in the Performance Options dialog box. You can access the Performance Options dialog box from the Preferences options in the SQL Developer Tools menu.
-
Change Settings to Default
-
Parent topic: Create Table or View Context Menu
Toolbar Actions
Use the Toolbar Action option to select actions in the toolbar from the context menu.
Current actions are Zoom In and Zoom Out.
Related Topics
Parent topic: Create Table or View Context Menu
Show Runtime Errors
Use the Show Runtime Errors to view errors related to node failure during runtime. This option is displayed only when running of the node fails at runtime.
The Event Log opens with a list of errors. Select the error to see the exact message and details.
Related Topics
Parent topic: Create Table or View Context Menu
Show Validation Errors
Use the Show Validation Errors option to view validation errors, if any.
This option is displayed only when there are validation errors. For example, if an Association node is not connected to a Data Source node, then select Show Validation Errors to view the validation error No build data input node connected.
You can also view validation errors by moving the mouse over the node. The errors are displayed in a tool tip.
Parent topic: Create Table or View Context Menu
Save SQL
Use the Save SQL option to generate SQL script for the selected node.
To generate SQL script for the selected node:
Related Topics
Parent topic: Create Table or View Context Menu
Validate Parents
Use the Validate Parents option to validate all parent nodes of the current node.
To validate parent nodes of a node, right-click the node and select Validate Parents.
You can validate parent nodes when the node is in Ready, Complete
and Error
state. All parent nodes must be in completed state.
Parent topic: Create Table or View Context Menu
Compare Test Results
Use the Compare Test Results option to view and compare test results of models that are built successfully.
For Classification and Regression models, this option displays the test results for all successfully built models to allow you to pick the model that best solves the problem.
Parent topic: Create Table or View Context Menu
View Test Results
Use the View Test Results option to view the test results of the selected model. This option is applicable only for Classification and Regression models.
The test results are displayed in the test viewer of the respective models:
Related Topics
Parent topic: Create Table or View Context Menu
Go to Properties
Use the Go to Properties option to open the Properties pane of the selected node.
Related Topics
Parent topic: Create Table or View Context Menu
Navigate
Use the Navigate option to view the links available from the selected node.
Note:
The Navigate option is enabled only if there are links to other nodes.Navigate displays the collection of links available from this node. Selecting one of the links selects the link and the selected link is highlighted in the workflow. The link itself has context menu options as well so you can right click and continue with the Navigate option. You can also use the arrow keys to progress to the next node.
Parent topic: Create Table or View Context Menu
Create the Table or View Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To manage the Create Table or View node from Properties pane:
- Table
Table displays the name of the Table or View. - Columns
Columns displays the columns of the table. - Automatic Behavior
If you select the Auto Input Column Selection option, then columns in the node are added or removed automatically, subject to certain scenarios.
Related Topics
Parent topic: Working with the Create Table or View Node
Table
Table displays the name of the Table or View.
You can perform the following tasks:
-
Change the name of the Table or View: If you change the default name of the table, then the name of the node changes to match the name of the table. For example, if you change the default name of the table to
PREDICTIONS,
then the name of the Create Table or View node also changes toPREDICTIONS.
-
Change the object type from table to view: The default type is
Table.
To create a view, click View.
Parent topic: Create the Table or View Node Properties
Columns
Columns displays the columns of the table.
The default setting enables automatic behavior. If you deselect Auto Input Column Selection, then you can manually select and edit the columns in the table. You can:
-
Delete columns: Select the column to delete and click .
-
Edit columns: Select the column and click . Edit the required changes in:
-
Select Column dialog box
-
Edit Data Guide dialog box for JSON data
-
Related Topics
Parent topic: Create the Table or View Node Properties
Automatic Behavior
If you select the Auto Input Column Selection option, then columns in the node are added or removed automatically, subject to certain scenarios.
If Auto Input Column Selection is selected, then the possible scenarios are:
-
Scenario 1: When input is connected, all columns in the input node are selected. The node becomes valid. It assumes that at least one column is included in the specification.
-
Scenario 2: When input is disconnected, all columns are automatically removed. The node becomes invalid.
-
Scenario 3: When the input node is edited, the possible edit scenarios are:
-
Add column: The column is added to the Create Table or View node, if it is compatible.
-
Remove column: The column is removed from the Create Table or View node.
-
Edit column: If a column is edited, then any change in the column data type could trigger an invalid state in the node.
-
Note:
If the Auto Input Column Selection option is deselected, then you must manually add and remove column specifications from the Create Table or View node.
Parent topic: Create the Table or View Node Properties
Data Source Node
A Data Source node defines source data for the workflow. For example, a Data Source node specifies the build data for a model.
Any table or view accessible to the user can be selected as the source. This node does not allow any input node connections. Data Source nodes rely on database resources for their definitions. It may be necessary to refresh a node definition if the database resources change. For example, if the resources are deleted or re-created.
Data Source nodes can run in parallel.
Only certain data types are allowed in a Data Source node. Columns with other data types are excluded.
- Supported Data Types for Data Source Nodes
Most of the basic Oracle data types are supported by the Data Source node. - Support for Date and Time Data
Data and time data types cannot be used for other functions in Oracle Data Miner. In particular, date and time data types cannot be the targets of model builds. - Working with the Data Source Node
Lists the tasks you can perform using the Data Source nodes. - Data Source Node Viewer
The Data Viewer contains information related to data, graphs, columns and SQL queries contained in the node. - Data Source Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Related Topics
Parent topic: Data Nodes
Supported Data Types for Data Source Nodes
Most of the basic Oracle data types are supported by the Data Source node.
Object-based data types can be included, but each object type has to be well understood. Object data types require storage clauses to be defined at appropriate levels with the object hierarchy.
These data types are fully supported:
-
VARCHAR2
-
CHAR
-
FLOAT
-
NUMBER
-
CLOB
-
NESTED_NUMERICALS
-
NESTED_CATEGORICALS
These data types are supported by Oracle Data Mining 12c Release 1 (12.1) and later:
-
BINARY_DOUBLE
-
BINARY_FLOAT
-
DM_NESTED_BINARY_DOUBLES
-
DM_NESTED_BINARY_DOUBLES
-
BLOB
, for Text only
The BINARY
data types and BLOB
are supported by Oracle Data Miner if you are connected to Oracle Data Mining 12c Release 1 and later.
These data types for date and time are partially supported:
-
DATE
-
TIMESTAMP
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
-
TIMESTAMP_WITH_TIMEZONE
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
Oracle Database 12.1.0.2 supports JSON data in the following data types. Oracle Data Miner derives pseudo JSON data types from these data types:
-
VARCHAR2
-
CLOB
-
BLOB
-
RAW
-
NCLOB
-
NVARCHAR2
-
RAW
-
ROWID
-
UROWID
-
URITYPE.
TheURITYPE
data type provides many sub type instances, which are also supported by Oracle Data Miner 18.4. They are:-
HTTPURITYPE
-
DBURITYPE
-
XDBURITYPE
-
Related Topics
Parent topic: Data Source Node
Support for Date and Time Data
Data and time data types cannot be used for other functions in Oracle Data Miner. In particular, date and time data types cannot be the targets of model builds.
The Transform transformation partially supports the data and time data types DATE
, TIMESTAMP
, TIMESTAMP_WITH_LOCAL_TIMEZONE
, and TIMESTAMP_WITH_TIMEZONE
, as follows:
-
Attributes with data and time data types can be binned using Equal Width or Custom binning.
-
You can apply Statistic or Value missing values treatments to attributes with the data and time data types.
Attributes with data and time types are analyzed by the Explore Data node.
Related Topics
Parent topic: Data Source Node
Working with the Data Source Node
Lists the tasks you can perform using the Data Source nodes.
- Create a Data Source Node
You create a Data Source node after creating a workflow. - Define a Data Source Node
If you import a workflow that uses tables or views that are not available, then you can use the Define a Data Source Node wizard to define a data source to replace the missing one. - Edit Data Guide
The Edit Data Guide dialog box allows you to specify how data guide should be generated for a selected JSON type column. - JSON Settings
In the JSON Parsing Settings dialog box, you can specify node settings that determine how data guides are generated. - Edit a Data Source Node
You can modify the operation of the Data Source node in the Edit Data Source Node dialog box. - Running a Data Source Node
To run a valid Data Source node, right-click the node and select Run. - Data Source Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Parent topic: Data Source Node
Create a Data Source Node
You create a Data Source node after creating a workflow.
To create a Data Source node and attach data to it:
- Edit Schema List
By default, tables and views from the schema to which you are connected, are listed.
Related Topics
Parent topic: Working with the Data Source Node
Edit Schema List
By default, tables and views from the schema to which you are connected, are listed.
To add other schemas:
-
If this is the first time that you are adding schemas, then click Add Schemas.
-
If you have already added schemas, then click Edit Schemas.
The Edit Schema List dialog box opens. Available Schemas is a list of the schemas to which you have access. To display tables and views in one of these schemas, move the name to Selected Schemas.
For example, to display tables in the SH schema, move SH to the Selected Schemas list. Click OK.
You return to the Define Data Source wizard. Select Include Tables from Other Schemas. Tables and views in the added schemas are listed. For example, if you selected SH, then you can see tables, such as SH.CUSTOMERS
, in the Available Tables/Views list. You can select these tables as data sources.
Parent topic: Create a Data Source Node
Define a Data Source Node
If you import a workflow that uses tables or views that are not available, then you can use the Define a Data Source Node wizard to define a data source to replace the missing one.
The wizard also detects input columns with JSON data types. You can use this interface to:
-
Define table and attributes for a new Data Source node.
-
Edit an existing data source node.
The wizard has two steps:
Related Topics
Parent topic: Working with the Data Source Node
Edit Data Guide
The Edit Data Guide dialog box allows you to specify how data guide should be generated for a selected JSON type column.
The Edit Data Guide dialog box has two tabs:
-
Structure: Displays the JSON data structure for the selected column
-
Data: Displays the content of the imported or generated Data Guide table
You can perform the following tasks:
-
Generate Data Guide: You can generate a new data guide table whenever the node is run or rerun. To generate a data guide, select any of the following options from the Data Guide Generation drop-down list:
-
Default: If the option
Generate Data Guide if necessary
in the JSON Settings dialog box is selected, then the data guide table is generated whenever the node is run or rerun. Otherwise, no data guide is generated. -
On: Select this option to generate a new data guide table whenever the node is run or rerun.
-
Off: Select this option to not generate a new data guide table whenever the node is run or rerun.
-
Import from Workflow: Select this option to import a data guide from an existing JSON type column defined in a Data Source node or Create Table node within the same workflow or from a different workflow using the Select Data Guide dialog box.
-
Import from file: Select this option to import a data guide from a CSV file. The process validates the data guide, that is, it verifies the correctness of the column header, JSON path format, JSON types, and so on during an import operation.
-
-
Remove Data Guide: To delete the current Data Guide table, click .
-
Export Data Guide: Select this option to export the current data guide table to a CSV file. If you find the generated data guide does not fully represent the underlying JSON data, then you have the option to export the data guide and add any missing JSON paths. You can then import the data guide back to replace the generated one.
- Select Data Guide
The Select Data Guide dialog box enables you to import a data guide table from an existing JSON type column defined in a Data Source node or Create Table node within the same workflow or from a different workflow.
Related Topics
Parent topic: Working with the Data Source Node
Select Data Guide
The Select Data Guide dialog box enables you to import a data guide table from an existing JSON type column defined in a Data Source node or Create Table node within the same workflow or from a different workflow.
Only completed nodes with generated JSON schema are displayed. To import a data guide table:
- In the Show field, select a workflow from the drop-down list.
- Select the nodes to import.
- Click OK.
Parent topic: Edit Data Guide
JSON Settings
In the JSON Parsing Settings dialog box, you can specify node settings that determine how data guides are generated.
A data guide is used whenever a JSON structure is present in the UI, for example, JSON Query node. Because the data guide table generation could be time consuming, especially for large JSON data, the following settings offers some control on the table generation:
-
Generate Data Guide if necessary: By default, this option is selected. It generates a data guide for the JSON type columns. Deselect this option if JSON data is not used in the product. Therefore, no Data Guide will be generated.
-
Sampling: Defines how many JSON documents stored in a column are to be processed, to generate the data guide table. The JSON document contains the entire content of the JSON column for a given row.
-
Max number of documents:
2000
(Default). Use the arrows to change this setting.
-
-
Limit Document Values to Process: Defines how many JSON values (Number, String, Boolean) are to be parsed in the document to generate the data guide table.
-
Max number per document:
10,000
(Default). Use the arrows to change this setting.
-
Parent topic: Working with the Data Source Node
Edit a Data Source Node
You can modify the operation of the Data Source node in the Edit Data Source Node dialog box.
To edit an existing Data Source node:
Related Topics
Parent topic: Working with the Data Source Node
Running a Data Source Node
To run a valid Data Source node, right-click the node and select Run.
The Oracle Data Miner server generates a sample of the selected table or view. The size of the table and type of sampling used is determined by the sample settings of the node.
If the node is complete but is required to provide data to a child node that is being run, then the node is run for validation to ensure that the columns and table still exist. If there are any errors, then the node state is set to Error
and affected attributes are set to Invalid.
Parent topic: Working with the Data Source Node
Data Source Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
Right-click the Data Source node. The following options are available in the context-menu:
-
Edit: Opens the Edit a Data Source Node dialog box.
-
Attributes: Opens the Select Attributes dialog box.
-
Run
-
View Data: Opens the Data Source Node Viewer dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
- Select Attributes
The Select Attributes dialog box enables you to move attributes between the Available Attributes list and the Selected Attributes list.
Parent topic: Working with the Data Source Node
Select Attributes
The Select Attributes dialog box enables you to move attributes between the Available Attributes list and the Selected Attributes list.
To deselect an attribute, move it to the Available Attributes list.
You can search the Available Attributes list for attributes.
Use the shuttle controls to move attributes between the lists.
When you have finished selecting attributes, click OK.
Parent topic: Data Source Node Context Menu
Data Source Node Viewer
The Data Viewer contains information related to data, graphs, columns and SQL queries contained in the node.
To view data, right-click the node and select View Data from the context menu. The Data Viewer opens.
Note:
You can view data only when the Data Source node is in the Valid
state.
The data viewer has these tabs:
- Data
The Data tab displays a sample of the data. - Graph
In the Graph tab, you can create graphs based on the numeric data. - Columns
The Column tab is a list of all the columns that are output from the node. - SQL
The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.
Parent topic: Data Source Node
Data
The Data tab displays a sample of the data.
The Data Viewer provides a grid display of the rows of data either from the sampling, or pulled through the lineage of nodes back to the source tables.
You can perform the following tasks:
-
Refresh: Click to refresh the data.
-
View: Click View and select either Actual Data or Cached Data. Cached data is available only if you cache data in the Cache section of the Properties pane.
-
Sort: Click Sort to sort data according to applicable criteria. Displays the Select Column to Sort By dialog box.
-
Filter: In the Filter field, enter a
WHERE
clause to select data.
Select Column to Sort By
This dialog box enables you to:
-
Select multiple columns to sort
-
Determine the column ordering
-
Determine ascending or descending order by column
-
Specify
Nulls First
so that null values appear before real data values
The sort order is preserved until you clear it.
Column headers are also sort-enabled to provide a temporary override to the sort settings.
Parent topic: Data
Graph
In the Graph tab, you can create graphs based on the numeric data.
Related Topics
Parent topic: Data Source Node Viewer
Columns
The Column tab is a list of all the columns that are output from the node.
For each column, the Name, Data Type, Mining Type, Length, Precision and Scale (for floating point), and Column ID are displayed.
-
If the node has not run, then the table or view structure provided by the database is displayed.
-
If the node has run successfully, then the structure of the sample table is displayed, based on the sampling defined when the node was specified.
There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and)
suffixes allow you to enter multiple strings separated by spaces. For example, if the Name/Data Type/Mining Type(or)
is selected, then the filter string A B
produces all columns where the name or the data type or the mining type starts with the letter A or B.
Parent topic: Data Source Node Viewer
SQL
The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.
The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.
You can copy the SQL and run it in a suitable SQL interface. Select All (Ctrl+A) and Copy (Ctrl+C) are enabled.
The Search control is a standard search control that highlights matching text and searches forward and backward.
Parent topic: Data Source Node Viewer
Data Source Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties. The Properties pane for a Data Source node has these sections:
- Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view. - Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference. - Details
The Details section displays the name of the node and any comments about it.
Related Topics
Parent topic: Data Source Node
Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view.
The Data section consists of the following:
-
Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click … to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.
-
Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:
-
Create an alias for an attributed by entering the alias in the appropriate cell.
-
Filter attributes.
-
Delete attributes. Select the attribute and click .
-
Edit attributes. Select the attribute and click .
-
Select Attributes to include in the Data Source.
-
Refresh the node. Click .
-
Related Topics
Parent topic: Data Source Node Properties
Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.
You can perform the following tasks:
-
Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.
-
Sampling Size: You can select caching or override the default settings. Default sampling size is
Number of Rows
Default Value=2000
-
Related Topics
Parent topic: Data Source Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
-
Node Name
-
Node Comments
Parent topic: Data Source Node Properties
Explore Data Node
The Explore Data node provides the profile of any input data source. Explore Data Statistics can either be based on all data or on a sample of the data.
The Explore Data node enables you to do the following:
-
View common statistics and a histogram for each column. Optionally, you can select a Group By attribute and have multivariate view of histograms generated.
-
Generate an output flow containing the results of all the statistical analysis. You can connect any source of data input to an Explore Data node. For example, you can attach an Explore Data node to an Apply node.
-
Analyze attributes with all supported data types and the following data types for date and time:
-
DATE
-
TIMESTAMP
-
TIMESTAMP_WITH_TIMEZONE
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
-
-
Run Explore Data node in parallel.
-
Create graphs.
-
Export the statistic generated by the Explore Data node using SQL Developer.
- Create an Explore Data Node
You create an Explore Data node and connect it to a data source to analyze the data in the data source. You can connect an Explore Data node to any data source. - Edit the Explore Data Node
You can view and edit information related to inputs and statistics in the Edit Explore Data Node dialog box. - Explore Data Node Viewer
The Explore Data Node viewer displays the statistics and other analyses performed by the Explore Data node. - Export Node Calculations
When an Explore node runs, it writes the statistics that it calculates to a database table. You can export the statistics calculated by the Explore node to a Microsoft Excel spreadsheet. - Perform Tasks from the Explore Data Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Explore Data Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Related Topics
Parent topic: Data Nodes
Create an Explore Data Node
You create an Explore Data node and connect it to a data source to analyze the data in the data source. You can connect an Explore Data node to any data source.
To create an Explore Data node:
By default, all attributes in the data source are analyzed. You can specify specific attributes to analyze.
Related Topics
Parent topic: Explore Data Node
Edit the Explore Data Node
You can view and edit information related to inputs and statistics in the Edit Explore Data Node dialog box.
You can edit the Explore Data node by either double-clicking the node or by right-clicking the node and selecting Edit.
The Edit Explore Data Node editor has these tabs:
- Input (Explore)
In the Input tab, you can specify the attributes to analyze. By default, all attributes in the data source are analyzed. - Statistics (Explore)
The statistics tab contains a list of statistics available. For each statistic, there is a brief definition and an indication of the cost to calculate that statistic.
Parent topic: Explore Data Node
Input (Explore)
In the Input tab, you can specify the attributes to analyze. By default, all attributes in the data source are analyzed.
The default is to select no Group By attribute. Group By displays a sorted list of attributes that are limited to data types that can be binned by Explore Data. Selecting a Group By attribute enables you to analyze data based on the selected attribute. For example, suppose that the data contains AGE and GENDER as attributes. If you select AGE as the Group By attribute, then the histograms for GENDER show the age composition of each value of GENDER.
Change the list of attributes to analyze in either of these ways
-
Right-click the Explore Data node and select Edit. The Select Attributes dialog box opens.
-
Select the Explore Data node. The Explore tab in the Explore Data node Properties pane enables you to change the list of attributes.
- Select Attributes
In the Select Attribute dialog box, you can choose the attributes of the data source for which you want to view the statistics. By default, all attributes of the data source are selected.
Related Topics
Parent topic: Edit the Explore Data Node
Select Attributes
In the Select Attribute dialog box, you can choose the attributes of the data source for which you want to view the statistics. By default, all attributes of the data source are selected.
If you do not want to view statistics for an attribute, then move it from the Selected Attributes list to the Available Attributes list.
You can sort the lists by Name or by Data Type.
When you have finished, click OK.
Parent topic: Input (Explore)
Statistics (Explore)
The statistics tab contains a list of statistics available. For each statistic, there is a brief definition and an indication of the cost to calculate that statistic.
Click the Statistics tab to specify the statistics to calculate. You can also change statistics in the Statistics section of Explore node Properties pane.
If you change any of the default selections, then click Restore Defaults to change all selections to the default selection.
You can search for an individual statistic by name.
By default, Oracle Data Miner calculates these statistics:
-
Average
-
Distinct percent
-
Maximum
-
Median
-
Minimum
-
Mode sampled
-
Percent NULLs
-
Standard deviation
-
Variance
The default statistics have a low or medium cost to calculate.
Skewness and kurtosis have a high cost to calculate. They are not selected by default. You can select them if necessary.
Calculating mode using a sample is a low cost operation. Calculating the mode using all available data is a very high cost operation. To calculate mode using sample data or all available data, click Mode (Sampled) in the Edit Explore Data Node dialog box.
Mode (Sampled)
The default is to calculate mode using a sample of 2000 records. You can calculate the mode using all available data. This calculation has a very high cost.
To use all available data, click the Available Data option and click OK.
Parent topic: Statistics (Explore)
Explore Data Node Viewer
The Explore Data Node viewer displays the statistics and other analyses performed by the Explore Data node.
After the node runs successfully, you can view the data in the Explore Data Node viewer.
To view the data, right-click the node and select View Data. The viewer opens in a new tab.
The viewer displays the information in the following tabs:
- Statistics
The Statistics tab displays the statistics calculated by the Explore Data node. - Columns
The Column tab is a list of all the columns that are output from the node. - Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view. - SQL
The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.
Parent topic: Explore Data Node
Statistics
The Statistics tab displays the statistics calculated by the Explore Data node.
Attributes are listed the Statistics grid. For each attribute, the name, data type, histogram, and summary of the statistics are displayed.
To view a large version of the histogram for an attribute, select the attribute. The histogram is displayed below the grid. If the attribute has null values, then the histogram has a separate bin labeled Null bin
. The histogram may also contain an Others
bin consisting of values that are not NULL
but are not in any other bin.
For more information about the histogram, go to the large histogram and right-click the attribute name. These choices are available:
-
Copy to Clipboard: Copies the histogram to the Microsoft Windows clipboard. You can paste this histogram in to any rich editor, such as a word processor or image editor.
-
Save Image As: Exports the image to a PNG file that can be stored in the file system.
-
View Data: Displays the data used to create the histogram in a pop up window. You can search for attribute name, attribute value, or attribute percent. Click Close when you have finished.
Histograms created by Oracle Data Miner use a sample of the data and can be slightly different each time they are displayed.
For all data types, a histogram is created. The number of distinct values, the percentage of NULL
values, and the number of distinct values are displayed.
Additional statistics calculated depend on the data type of the attribute:
-
For character attributes
VARCHAR2,
the Mode is calculated. -
For numeric attributes
NUMBER
orFLOAT,
the following are calculated:-
Average
-
Minimum value
-
Maximum value
-
Standard deviation
-
Skewness (a measure of the asymmetry of the distribution)
-
Kurtosis (a measure of flatness or peakedness of the curve describing a frequency distribution in the region about its mode)
-
Parent topic: Explore Data Node Viewer
Columns
The Column tab is a list of all the columns that are output from the node.
For each column, the Name, Data Type, Mining Type, Length, Precision and Scale (for floating point), and Column ID are displayed.
-
If the node has not run, then the table or view structure provided by the database is displayed.
-
If the node has run successfully, then the structure of the sample table is displayed, based on the sampling defined when the node was specified.
There are several filtering options that limit the columns displayed. The filter settings with the (or)/(and)
suffixes allow you to enter multiple strings separated by spaces. For example, if the Name/Data Type/Mining Type(or)
is selected, then the filter string A B
produces all columns where the name or the data type or the mining type starts with the letter A or B.
Parent topic: Explore Data Node Viewer
Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view.
The Data section consists of the following:
-
Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click … to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.
-
Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:
-
Create an alias for an attributed by entering the alias in the appropriate cell.
-
Filter attributes.
-
Delete attributes. Select the attribute and click .
-
Edit attributes. Select the attribute and click .
-
Select Attributes to include in the Data Source.
-
Refresh the node. Click .
-
Related Topics
Parent topic: Explore Data Node Viewer
SQL
The SQL tab provides the SQL Details text area. The text area displays the SQL code that generates the data provided by the actual view shown in the Data tab.
The SQL can be a stacked expression that includes SQL from parent nodes, depending on what lineage is required to access the actual data.
You can copy the SQL and run it in a suitable SQL interface. Select All (Ctrl+A) and Copy (Ctrl+C) are enabled.
The Search control is a standard search control that highlights matching text and searches forward and backward.
Parent topic: Explore Data Node Viewer
Export Node Calculations
When an Explore node runs, it writes the statistics that it calculates to a database table. You can export the statistics calculated by the Explore node to a Microsoft Excel spreadsheet.
To export the statistics:
The spreadsheet contains the statistics. It contains the names of the histograms generated by the Explore Data node.
To export an individual histogram, right-click the histogram and save the graphic.
Parent topic: Explore Data Node
Perform Tasks from the Explore Data Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
The context menu for an Explore Data node has these selections:
-
Edit: Opens the Select Attributes dialog box.
-
View Data: Opens the Data Source Node Viewer dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Related Topics
Parent topic: Explore Data Node
Explore Data Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Explore Data node Properties pane has these sections:
- Input (Properties)
The Input section lists the attributes that are analyzed. - Statistics
The Statistics section lists the calculated statistics. - Output
The Output section lists the columns in the data source. - Histogram
Use bins to create histograms. The Histogram tab lists the default number of bins. - Sample
The data is sampled to support data analysis.
Parent topic: Explore Data Node
Input (Properties)
The Input section lists the attributes that are analyzed.
Attributes are listed in a grid. For each attribute, the name and the data type are displayed. To sort the list of attributes by name or data type, click the heading in the grid.
You can specify a Group By attribute. Select an attribute from the list.
You can either choose the default settings or select columns:
-
To select columns, deselect Auto Input Columns Selection.
-
To delete an attribute, select it and click .
-
To edit an attribute, click .
The Select Attributes dialog box opens.
Related Topics
Parent topic: Explore Data Node Properties
Statistics
The Statistics section lists the calculated statistics.
Related Topics
Parent topic: Explore Data Node Properties
Output
The Output section lists the columns in the data source.
The names and data type of each column is listed in a grid. You can search the grid by name (the default) or by data type.
To clear the search, click .
Parent topic: Explore Data Node Properties
Histogram
Use bins to create histograms. The Histogram tab lists the default number of bins.
The default number of bins for the following types of bins:
-
Numerical Bins
-
Categorical Bins
-
Date Bins
-
Null Values
-
Other Values
The default number of bins for all of these bin types is 10.
The default specifies a maximum number of bins.
Parent topic: Explore Data Node Properties
Sample
The data is sampled to support data analysis.
The default is to use a sample. The Sample tab has the following selections:
-
Use All Data: By default, Use All Data is deselected.
-
Sampling Size: The default is
Number of Rows
with a value of 2000. You can change sampling size toPercent
. The default is 60 percent.
Parent topic: Explore Data Node Properties
Graph Node
A Graph node creates a two-dimensional graph of numeric data.
A Graph node is not a data provider, which means that it cannot be connected to another node.
Note:
You cannot generate code from a Graph node.
You can perform the following tasks:
-
Create and edit one or more graphs of different types such as line, scatter, bar, histogram, and box.
-
Create graphs with actual data and sample data.
-
Run a Graph node in parallel.
- Types of Graphs
Graph node enables you to create two-dimensional graphs of numeric data in several ways. - Supported Data Types for Graph Nodes
Graph nodes supportNUMBER, FLOAT, DATE, AND TIMESTAMP
data types. - Graph Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Create a Graph Node
You can create a Graph node to visualize numeric data and relationships between numeric variables. - New Graph
The New Graph dialog box defines a graph with a default name. - Graph Node Editor
The Graph Node Editor displays all defined graphs. You can modify an existing graph or add graphs to the node. - Edit Graph
You can edit a graph to change it current attributes and to specify a different graph type. - Graph Node Properties
Graph node properties is identified by the node name that you are viewing.In the Properties pane, you can examine and change the characteristics or properties of a node.
Related Topics
Parent topic: Data Nodes
Types of Graphs
Graph node enables you to create two-dimensional graphs of numeric data in several ways.
You can create the following types of graphs:
-
Line Plot: Uses a line to connect the data points. Line plots are useful for identifying if two variable are correlated. Oracle Data Miner supports two-dimensional line plots.
-
Bar Plot: Compares values. The height of the bar represents the measured value or frequency.
-
Histogram: Shows frequencies, as adjacent rectangles, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval.
-
Scatter Plot: Display values in two discrete sets. One variable determines the position on the horizontal axis and the other variable determines the position on the vertical axis.
-
Box Plot or Box Graph: Graphically depicts groups of numeric data using the quantiles of the data.
Graph nodes require you to specify one or more axes. Axes must consists of numeric data.
- Box Plot or Box Graph
A box plot graphically depicts groups of numeric data using the quantiles of the data.
Related Topics
Parent topic: Graph Node
Box Plot or Box Graph
A box plot graphically depicts groups of numeric data using the quantiles of the data.
The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.
Parent topic: Types of Graphs
Supported Data Types for Graph Nodes
Graph nodes support NUMBER, FLOAT, DATE, AND TIMESTAMP
data types.
The supported data types:
-
NUMBER
-
FLOAT
-
DATE
-
TIMESTAMP
-
TIMESTAMP_WITH_TIMEZONE
-
TIMESTAMP_WITH_LOCAL_TIMEZONE
Parent topic: Graph Node
Graph Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
To view the context menu, right-click a Graph node. The context menu provides the following options:
-
Edit. Opens the Edit a Data Source Node dialog box.
-
Run.
-
Save SQL: This option is disabled. It indicates that you cannot generate SQL query for this node.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Validation Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed if there are validation errors.
- Running Graph Node
Run the Graph node to generate sample data. - Show Graph
The Show Graph option opens the Graph Node Editor.
Related Topics
Parent topic: Graph Node
Running Graph Node
Run the Graph node to generate sample data.
To run the Graph node, right-click the Graph node and click Run.If you do not generate sample data, then graphs are created using the data provided to the node.
Note:
If you import workflow that contains Graph nodes, then you must run the Graph node to view the graphs.
Parent topic: Graph Node Context Menu
Show Graph
The Show Graph option opens the Graph Node Editor.
All graphs are displayed in the Graph Node Editor.
Related Topics
Parent topic: Graph Node Context Menu
Create a Graph Node
You can create a Graph node to visualize numeric data and relationships between numeric variables.
To create a Graph node:
Related Topics
Parent topic: Graph Node
New Graph
The New Graph dialog box defines a graph with a default name.
You can change the name of the graph here.
Select the type of graph to create, and follow the steps to define it.
The graph that you define is displayed in a container. After you have created a graph, you can edit the definition. You can also add graphs, delete graphs, view the data used to construct a graph, and save the graph as a graphic.
- Line or Scatter
Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data. - Bar
A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis). - Histogram
A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis. - Box
A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data. - Settings (Graph Node)
You can provide settings specific to the data type of the selected attribute in the graph.
Related Topics
Parent topic: Graph Node
Line or Scatter
Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data.
To create a Line or Scatter:
Parent topic: New Graph
Bar
A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis).
To specify a bar graph:
Related Topics
Parent topic: New Graph
Histogram
A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis.
To create a histogram:
Related Topics
Parent topic: New Graph
Box
A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data.
The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.
To specify a box plot or box graph:
Related Topics
Parent topic: New Graph
Settings (Graph Node)
You can provide settings specific to the data type of the selected attribute in the graph.
Depending on the data type of the attribute selected, one of these dialog boxes is displayed when you click Settings.
- Select Values to Display
In the Select Values to Display dialog box, you can provide specific attribute values when defining a graph. - Axis Treatment
If you specify a numeric attribute for an axis or Group By attribute, then click Settings to display the Axis Treatment Settings dialog box.
Parent topic: New Graph
Select Values to Display
In the Select Values to Display dialog box, you can provide specific attribute values when defining a graph.
If you specify a categorical attribute for an axis or the Group By attribute, then click Settings to display the Select Values to Display dialog box.
The values of the attribute are listed in the Values column. Select values in one of the following:
-
All
-
None
-
Default
By default, the most frequent values using Top N is selected.
-
Select specific values by clicking the check box for the values.
You can search for values using the search box.
When you have finished, click OK to return to the definition of the graph.
Parent topic: Settings (Graph Node)
Axis Treatment
If you specify a numeric attribute for an axis or Group By attribute, then click Settings to display the Axis Treatment Settings dialog box.
The options in the Axis Treatment Settings dialog box are:
-
Raw Values (as is)
-
Binned Automatically: Use Equal width binning for the axis values. The default number of bins is
10.
You can change this value. The default is to not show null values. To show null values, click the check box.
Click OK to return to the definition of the graph.
Parent topic: Settings (Graph Node)
Graph Node Editor
The Graph Node Editor displays all defined graphs. You can modify an existing graph or add graphs to the node.
If at least one graph is defined for a Graph node, then double-click the Graph node to open the Graph Node Editor. If no graphs are defined, then the New Graph dialog box opens.
Each graph is in a container. The graph containers are laid out in a grid.
You can also zoom in to view details of a graph.
These icons are applicable to all graphs in the node:
-
To add a new graph, click to open the New Graph dialog box.
-
To refresh the display, click .
-
To select data used to create the graph, click View. The options are:
-
Actual Data: Displays the default unless you have run the Graph node. In this case, the node creates the graph using whatever data is provided.
-
Sample Data: Available only if you have run the Graph node to generate sample data.
-
The name of the new or existing graph is displayed at the top of the container for the graph, along with these controls:
-
To adjust the size of the graph (zoom in and out), click .
-
To edit the current graph, click . When you edit or add a graph, the results are automatically displayed in the editor.
-
To delete the current graph, click .
-
To examine specific values of the graph, zoom the graph.
- Zoom Graph
In the Graph node editor, you have the option to zoom in on selected values to view details. - Viewing Data used to Create Graph
In the Graph node editor, you have the option to save the graph as a graphic.
Related Topics
Parent topic: Graph Node
Zoom Graph
In the Graph node editor, you have the option to zoom in on selected values to view details.
To examine details of a graph, zoom in on the selected values. For example, to see more detail for selected x-axis values, draw a selection box with the mouse that encloses the values. The display shows values in the selection box in more detail and the axis expands. You can zoom in multiple times. When you have finished viewing the selected values, click the graph once for each time that you zoomed in to return to the original graph.
Parent topic: Graph Node Editor
Viewing Data used to Create Graph
In the Graph node editor, you have the option to save the graph as a graphic.
To view the data used to create the graph or to save the graph as a graphic:
Parent topic: Graph Node Editor
Edit Graph
You can edit a graph to change it current attributes and to specify a different graph type.
When you edit a graph, you can do the following:
-
Change the attributes of the existing graph type. For example, if you edit a line graph, then you can change the x-axis and y-axis or add a Group By attribute. You can change the axis treatment or values to display if the graph uses these items.
-
Specify a different graph type. You can change the type of graph. For example, you can change a line graph to a histogram. To change the type of the graph, click the button at the top of the window and specify the required information:
- Bar
A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis). - Box
A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data. - Histogram
A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis. - Line or Scatter
Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data.
Parent topic: Graph Node
Bar
A bar graph plots the values of a selected attribute (x-axis) against frequency counts (y-Axis).
To specify a bar graph:
Related Topics
Parent topic: Edit Graph
Box
A box plot summarizes binned data of a selected attribute (X axis). It graphically depicts groups of numeric data using the quantiles of the data.
The bottom and top of the box are the first and third quartiles, and the band inside the box is the second quartile (the median). In the type of box plot created by Data Miner, the whiskers show the minimum and maximum values of all the data.
To specify a box plot or box graph:
Related Topics
Parent topic: Edit Graph
Histogram
A histogram plots the values of a selected attribute along the x-axis against frequency counts along the y-axis.
To create a histogram:
Related Topics
Parent topic: Edit Graph
Line or Scatter
Line charts display continuous data over time that is set against a common scale. Scatter charts are commonly used for displaying and comparing numeric values, such as scientific, statistical, and engineering data.
To create a Line or Scatter:
Parent topic: Edit Graph
Graph Node Properties
Graph node properties is identified by the node name that you are viewing.In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
The Properties pane of a Graph node consists of these sections:
- Cache (Graph Node)
You have the option to generate cache of the output data. - Details
The Details section displays the node name and comments about the node. - Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view.
Related Topics
Parent topic: Graph Node
Cache (Graph Node)
You have the option to generate cache of the output data.
After you generate sample data, the option Generate Cache of Output Data to Optimize Viewing Results is selected.
The default value is 2000
records. You can change this value.
Parent topic: Graph Node Properties
Details
The Details section displays the node name and comments about the node.
You can change the name of the node and edit the comments in this section. The new node name and comments must meet the requirements.
Related Topics
Parent topic: Graph Node Properties
Data
The Data section in Properties displays information related to the data and the data source, which can be a table or view.
The Data section consists of the following:
-
Source Table: Displays the name of the source table or view for the Data Source node. If no source table is associated with the node, then click … to the right of Source Table. A list of tables and views that are accessible from the data mining account is displayed. You can select the table or view. You can also use this process to change the table or view.
-
Data: Displays the attributes in a grid. For each attribute, the name, the alias, and the data type are displayed. You can perform the following tasks:
-
Create an alias for an attributed by entering the alias in the appropriate cell.
-
Filter attributes.
-
Delete attributes. Select the attribute and click .
-
Edit attributes. Select the attribute and click .
-
Select Attributes to include in the Data Source.
-
Refresh the node. Click .
-
Related Topics
Parent topic: Graph Node Properties
SQL Query Node
SQL Query node writes SQL queries to perform special data preparation. Use the SQL Query node to provide input for a model build.
You can do the following:
-
Manually enter a SQL query with the least amount of constraints possible.
-
Incorporate a broader array of methodologies as part of an Oracle Data Miner workflow. You can insert any SQL query as a source of data or as a transformation by using existing data.
-
Run Oracle R Enterprise scripts that are registered for the database.
-
Run parallel processes or parallel queries.
- Input for SQL Query Node
The input for a SQL Query node can be data provider nodes and model provider nodes. - SQL Query Restriction
SQL is limited to a query that returns a flow of data. - Create a SQL Query Node
You create a SQL Query node to write a SQL query. - SQL Query Node Editor
The SQL Query Node Editor enables you to define and validate a SQL query. - Oracle R Enterprise Script Support
Oracle R Enterprise, a component of the Oracle Advanced Analytics option, makes the open source R statistical programming language and environment ready for the enterprise and big data. - SQL Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - SQL Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Parent topic: Data Nodes
Input for SQL Query Node
The input for a SQL Query node can be data provider nodes and model provider nodes.
A SQL Query node requires the following input:
-
Zero to many data provider nodes, such as Data Source nodes and Transform nodes.
-
Zero to many input model provider nodes, such as Model Build nodes and Model nodes.
Data provider nodes are used as follows:
-
If there are no data provider nodes, then you define an originating Data Source node using an
SQL SELECT
statement that is not constrained by any input sources that must be defined within Oracle Data Miner. The statement can contain its own internal table references, such as:Select * from a, b where a.id = b.id
When there are no data provider nodes, the source tables or views are hidden from Data Miner. Code generation cannot parameterize such tables in the generated SQL script.
-
If there are one or more data provider nodes, then you can reference each data flow within the expression builder interface. You can continue to expose all data sources within the Oracle Data Miner workflow.
When a model provider node is connected as input, it enables you to see the list of model names contained in the node. This is useful in creating SQL that requires a model name.
Oracle Data Miner includes snippets that help you write SQL queries.
Related Topics
Parent topic: SQL Query Node
SQL Query Restriction
SQL is limited to a query that returns a flow of data.
A SQL Query node can provide data to any node that requires data, such as a node that builds a model.
Parent topic: SQL Query Node
Create a SQL Query Node
You create a SQL Query node to write a SQL query.
To create a SQL Query node:
Related Topics
Parent topic: SQL Query Node
SQL Query Node Editor
The SQL Query Node Editor enables you to define and validate a SQL query.
The editor provides specifications that can be dragged and dropped or double-clicked into the query build text area. The tabs on the left of the window provide help in writing a query:
-
Source is a list of input nodes that includes both Data Provider nodes and Model Provider nodes. When you select an item in the Source list, information appears in the Message box.
-
Snippets are standard SQL Developer Snippets (SQL code fragments) arranged by the type of calculation.
The snippet category Predictive Query helps write queries using the
DBMS_PREDICTIVE_ANALYTICS PL/SQL
package.For information about snippets, see Oracle SQL Developer User's Guide.
For a brief description of snippet functionality, move the cursor over the name. The information is displayed in a tool tip.
-
PL/SQL Functions is a list of PL/SQL functions by user schema.
-
R Script is a list of registered R Scripts; this tab is displayed if Oracle R Enterprise is installed.
Write SQL in the text area.
Below the text area are:
-
Columns—List the columns and data types.
-
Preview—Displays a small number of the rows returned by the query.
When you click OK or Validate, the following are verified:
-
The query generates at least one column of output.
-
All data types are data types that Oracle Data Miner supports. Most scalar data types are supported. The most common data types that are not supported are user custom object types.
If unsupported data types are found, then a table is created and an error message is displayed above the column panel. All unacceptable data types in the column list are marked with an invalid icon next to the column name.
-
If no validation errors occur during the parsing of the query, then the Columns tab shows data types of the columns and the Data tab shows a small sample of results.
If a validation error occurs, then the validation panel is displayed for the Column and Data tabs.
Related Topics
Parent topic: SQL Query Node
Oracle R Enterprise Script Support
Oracle R Enterprise, a component of the Oracle Advanced Analytics option, makes the open source R statistical programming language and environment ready for the enterprise and big data.
Oracle R Enterprise integrates R with Oracle Database. It is designed for problems that involve large amounts of data.
R users can develop, refine, and deploy R scripts that leverage the parallelism and scalability of the database to perform predictive analytics and data analysis.
The SQL Query node in Oracle Data Miner provides a simplified interface for integrating R scripts that have been registered with the database. This enables R developers to provide useful scripts for analyzing data.
You can run embedded R scripts that use theses interfaces:
-
rqEval
-
rqTableEval
-
rqRowEval
-
rqGroupEval
- Oracle R Enterprise Database Roles
The Oracle R Enterprise database roles are added to theOMDRUSER
role.
Parent topic: SQL Query Node
Oracle R Enterprise Database Roles
The Oracle R Enterprise database roles are added to the OMDRUSER
role.
The OMDRUSER
role includes both these two roles:
-
RQUSER
-
RQADMIN
If the RQUSER
and RQAMIN
roles are not available in the database configuration at the time of Oracle Data Miner repository installation, then the roles must be added by the DBA manually to the ODMRUSER
role after Oracle R Enterprise is installed.
You must register R scripts before you can use them. You can register scripts using SQL*Plus or SQL Worksheet, using connection with DBA privileges.
Registered R Scripts are listed on the R Scripts tab of the SQL Query node. There are also R code snippets, which are code fragments that help you write scripts. Some snippets are just syntax, and others are examples. You can insert and edit snippets when you are using SQL Worksheet or creating or editing R code using SQL Query node.
Parent topic: Oracle R Enterprise Script Support
SQL Query Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
To see the context menu, right-click a Data Source node. The following options are available in the context menu:
-
Edit. Opens the SQL Query Node Editor.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed if there are validation errors.
Parent topic: SQL Query Node
SQL Query Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
SQL Query node Properties contains these sections:
Parent topic: SQL Query Node
Update Table Node
An Update Table node updates an existing table with selected columns from the data input to the node. Input columns are mapped to the existing table columns.
Update Table node can run in parallel. Update Table nodes rely on database resources for their definitions. It may be necessary to refresh a node definition if the database resources change. For example, if the resources are deleted or re-created.
- Input and Output for Update Table Node
The input for an Update Table node is any node that produces a data flow. You can connect only one node to an Update Table node. - Data Types for Update Table Node
Update Table node supports certain data types. It can also support other types, such as B, and support these additional types with manual mapping. - Create Update Table Node
You create an Update Table node to update an existing table with data from selected columns in the table. You can connect an Update Table node to any node that creates a data flow, such as an Apply node. - Edit Update Table Node
The Edit Update Table dialog box provides the specification for the Update Table node. - Update Table Node Data Viewer
After the node runs successfully, you can view the data in the Update Table Node Data Viewer. - Update Table Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node. - Update Table Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
Related Topics
Parent topic: Data Nodes
Input and Output for Update Table Node
The input for an Update Table node is any node that produces a data flow. You can connect only one node to an Update Table node.
The output of an Update Table node is a data flow. You can use the data flow as a data source. To save the output as a table or view, use a Create Table or View node.
Related Topics
Parent topic: Update Table Node
Data Types for Update Table Node
Update Table node supports certain data types. It can also support other types, such as B, and support these additional types with manual mapping.
You can manually map columns that do not have exact data type matches but the column data types have reasonable, safe implicit conversion default. For example, you can map BINARY_DOUBLE
to NUMBER,
or NVARCHAR2
to VARCHAR2.
There could be some loss of data for such mappings:
-
Mapping
BINARY_DOUBLE
orBINARY_FLOAT
toNUMBER
could result in the loss of precision -
Mapping
NVARCHAR2
andNCHAR
toVARCHAR2
can result in the loss of data becauseNVARCHAR2
andNCHAR
are based on a potentially different character set thanVARCHAR2.
For mappings to work, your database must be set up so thatNVARCHAR2
andNCHAR
are based on the same character set asVARCHAR2.
Related Topics
Parent topic: Update Table Node
Create Update Table Node
You create an Update Table node to update an existing table with data from selected columns in the table. You can connect an Update Table node to any node that creates a data flow, such as an Apply node.
To create an Update Table node:
- Update Table Node Automatic Behavior
The automatic behavior of Update Table node depends on the selection of the Auto Input Column Selection option.
Related Topics
Parent topic: Update Table Node
Update Table Node Automatic Behavior
The automatic behavior of Update Table node depends on the selection of the Auto Input Column Selection option.
If the Auto Input Column Selection option is selected, then the behavior under the following scenarios are:
-
When input is connected—If the Update Table node has a selected table to update, then any column that matches the existing columns in the table are mapped to that column automatically. The node becomes valid, if at least one column was included in the specification.
-
When input is disconnected—All columns are automatically removed. The node becomes invalid.
-
When the input node is edited:
-
If a column is added to the input node, then the column is added to the Update node, if there is a matching column in the existing table.
-
If a column is removed, then the column is removed from the node.
-
If a column is edited, then there are two possibilities:
-
If the edited column no longer matches an existing column, then it is removed.
-
If the edited column matches an existing column, then it is added.
-
-
If Auto Input Column Selection is not selected, then you must manually add and remove column specifications from the Update Table node.
Parent topic: Create Update Table Node
Edit Update Table Node
The Edit Update Table dialog box provides the specification for the Update Table node.
To edit the Update Table node:
-
Name: Displays the name of the table. You can select an existing table by clicking Browse Table and select an existing table. Or you can create a new table by clicking New.
-
Auto Input Columns Selection: This option is selected by default. To select other columns, deselect this option. If you deselect this option, then click to select an input column. Use the arrows to move attributes from Available Attributes to Selected Attributes.
-
Drop Existing Rows: If this option is selected, then existing rows in the table are dropped before the table is updated. By default, the option is not selected.
-
Columns are listed in the Data grid.
- Create Table (Update Table)
All of the attributes of the attached table are listed. - Edit Columns (Update Table)
The Edit Columns tab displays the columns of the table to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed.
Related Topics
Parent topic: Update Table Node
Create Table (Update Table)
All of the attributes of the attached table are listed.
You can either accept the name for the new table or select a different name.
-
To delete an attribute, click .
-
To edit an attribute, click .
The Select Attributes dialog box opens.
- Select Attributes (Update Table)
All columns are selected by default.
Parent topic: Edit Update Table Node
Select Attributes (Update Table)
All columns are selected by default.
-
Move the attribute from Selected Attributes to Available Attributes.
-
Click OK.
Parent topic: Create Table (Update Table)
Edit Columns (Update Table)
The Edit Columns tab displays the columns of the table to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed.
By default, the option Auto Input Columns Selection is selected. If you deselect Auto Input Columns Selection, then you must manually select input columns by clicking .
Use the arrows to move the attributes from Available Attributes to Selected Attributes.
Related Topics
Parent topic: Edit Update Table Node
Update Table Node Data Viewer
After the node runs successfully, you can view the data in the Update Table Node Data Viewer.
To view the data, right-click the node and select View Data. The data viewer is the same as Data Source Node Viewer.
Related Topics
Parent topic: Update Table Node
Update Table Node Context Menu
The context menu options depend on the type of the node. It provides the shortcut to perform various tasks and view information related to the node.
To see the context menu, right-click a Data Source node. The following options are available in the context menu:
-
Edit. Opens the Edit Update Table Node dialog box.
-
View Data. Opens the Update Table Node Data Viewer dialog box.
-
Performance Settings. This opens the Edit Selected Node Settings dialog box, where you can set Parallel Settings and In-Memory settings for the node.
-
Show Runtime Errors. Displayed only if there is an error.
-
Show Validation Errors. Displayed only if there are validation errors.
Related Topics
Parent topic: Update Table Node
Update Table Node Properties
In the Properties pane, you can examine and change the characteristics or properties of a node.
To view the properties of a node, click the node and click Properties. If the Properties pane is closed, then go to View and click Properties. Alternately, right-click the node and click Go to Properties.
Update Table Properties consists of these sections:
- Table (Update Table)
The Table tab displays the name of the table to update. - Columns (Update Table)
The Columns tab displays the columns of the table that are to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed. - Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference. - Details
The Details section displays the name of the node and any comments about it.
Parent topic: Update Table Node
Table (Update Table)
The Table tab displays the name of the table to update.
The default is to not drop existing rows. To drop existing rows, select Drop Existing Rows.
Parent topic: Update Table Node Properties
Columns (Update Table)
The Columns tab displays the columns of the table that are to be updated. For each column, the Input Name, Target Name and Target (data) type are displayed.
By default, the Auto Input Column Selection is selected.
If you deselect the Auto Input Column Selection then you must manually select input columns by clicking . Use the arrows to move the attributes from Available Attributes list to Selected Attributes list.
If data on the server changes, then it may be necessary to refresh the nodes. To refresh, click .
Related Topics
Parent topic: Update Table Node Properties
Cache
The Cache section provides the option to generate a cache for output data. You can change this default using the transform preference.
You can perform the following tasks:
-
Generate Cache of Output Data to Optimize Viewing of Results: Select this option to generate a cache. The default setting is to not generate a cache.
-
Sampling Size: You can select caching or override the default settings. Default sampling size is
Number of Rows
Default Value=2000
-
Related Topics
Parent topic: Update Table Node Properties
Details
The Details section displays the name of the node and any comments about it.
You can change the name and comments in the fields here:
-
Node Name
-
Node Comments
Parent topic: Update Table Node Properties
Target Values Selection
The Target Values Selection dialog box displays the number of target values selected. The default setting is Automatic.
It uses the top 10 Target Class Values by frequency. You can change the number of target values by changing Frequency Count.
You can also select Use Lowest Occurring.
To select custom values:
- Select Custom.
- Move the values from Available Values to Selected Values.
- After you are done, click OK.
Parent topic: Data Nodes