52 DBMS_DATA_MINING
The DBMS_DATA_MINING package is the application programming interface for creating, evaluating, and querying Oracle Machine Learning for SQL models.
In Oracle Database Release 21c, Oracle Data Mining has been rebranded to Oracle Machine Learning for SQL (OML4SQL). The PL/SQL package name, however, has not changed and remains DBMS_DATA_MINING.
This chapter contains the following topics:
52.1 DBMS_DATA_MINING Overview
Oracle Machine Learning for SQL supports both supervised and unsupervised machine learning. Supervised machine learning predicts a target value based on historical data. Unsupervised machine learning discovers natural groupings and does not use a target. You can use Oracle Machine Learning for SQL procedures on structured data and unstructured text.
Supervised machine learning techniques include:

Classification

Regression

Feature Selection (Attribute Importance)

Time Series
Unsupervised machine learning techniques include:

Clustering

Association

Feature Extraction

Anomaly Detection
The steps you use to build and apply a machine learning model depend on the machine learning technique and the algorithm being used. The algorithms supported by Oracle Machine Learning for SQL are listed in the following table.
Table 52-1 Oracle Machine Learning for SQL Algorithms

Algorithm  Abbreviation  Function

Apriori  AR  Association
CUR Matrix Decomposition  CUR  Attribute importance
Decision Tree  DT  Classification
Expectation Maximization  EM  Clustering
Explicit Semantic Analysis  ESA  Feature extraction, classification
Exponential Smoothing  ESM  Time series
Generalized Linear Model  GLM  Classification, regression
k-Means  KM  Clustering
Minimum Description Length  MDL  Attribute importance
Multivariate State Estimation Technique - Sequential Probability Ratio Test  MSET-SPRT  Anomaly detection, classification
Naive Bayes  NB  Classification
Neural Network  NN  Classification, regression
Non-Negative Matrix Factorization  NMF  Feature extraction
O-Cluster  OC  Clustering
Random Forest  RF  Classification
Singular Value Decomposition and Principal Component Analysis  SVD and PCA  Feature extraction
Support Vector Machine  SVM  Classification, regression, anomaly detection
XGBoost  XGBoost  Classification, regression
Oracle Machine Learning for SQL supports more than one algorithm for the classification, regression, clustering, and feature extraction machine learning techniques. Each of these machine learning techniques has a default algorithm, as shown in the following table.
Table 52-2 Oracle Machine Learning for SQL Default Algorithms

Mining Function  Default Algorithm

Classification  Naive Bayes
Clustering  k-Means
Feature Extraction  Non-Negative Matrix Factorization
Feature Selection  Minimum Description Length
Regression  Support Vector Machine
Time Series  Exponential Smoothing
52.2 DBMS_DATA_MINING Security Model
The DBMS_DATA_MINING package is owned by user SYS and is installed as part of database installation. Execution privilege on the package is granted to public. The routines in the package are run with invoker's rights (run with the privileges of the current user).
The DBMS_DATA_MINING package exposes APIs that are leveraged by Oracle Machine Learning for SQL. Users who wish to create machine learning models in their own schema require the CREATE MINING MODEL system privilege. Users who wish to create machine learning models in other schemas require the CREATE ANY MINING MODEL system privilege.
Users have full control over managing models that exist within their own schema. Additional system privileges necessary for managing machine learning models in other schemas include ALTER ANY MINING MODEL, DROP ANY MINING MODEL, SELECT ANY MINING MODEL, COMMENT ANY MINING MODEL, and AUDIT ANY.
Individual object privileges on machine learning models, ALTER MINING MODEL and SELECT MINING MODEL, can be used to selectively grant privileges on a model to a different user.
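For example, the owner of a model can grant these object privileges to another user. A minimal sketch (the model name CLASS_SAMPLE and the user JSMITH are hypothetical):

```sql
-- Allow user JSMITH to score with (but not modify) the model
GRANT SELECT ON MINING MODEL class_sample TO jsmith;

-- Additionally allow JSMITH to rename or change the model
GRANT ALTER ON MINING MODEL class_sample TO jsmith;
```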
See Also:
Oracle Machine Learning for SQL User's Guide for more information about the security features of Oracle Machine Learning for SQL
52.3 DBMS_DATA_MINING — Machine Learning Functions
A machine learning function refers to the methods for solving a given class of machine learning problems.
The machine learning function must be specified when a model is created. You specify a machine learning function with the mining_function parameter of the CREATE_MODEL Procedure or the CREATE_MODEL2 Procedure.
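For example, a classification model could be created with CREATE_MODEL2 as sketched below. The model, view, and column names are hypothetical; an empty SETTING_LIST accepts all defaults (Naive Bayes for classification):

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;  -- left empty: all defaults apply
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'CLASS_SAMPLE',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM mining_data_build_v',
    set_list            => v_setlist,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD');
END;
/
```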
Table 52-3 Machine Learning Functions

Value  Description

ASSOCIATION
Association is a descriptive machine learning function. An association model identifies relationships and the probability of their occurrence within a data set. Association models use the Apriori algorithm.

ATTRIBUTE_IMPORTANCE
Attribute importance is a predictive machine learning function, also known as feature selection. An attribute importance model identifies the relative importance of an attribute in predicting a given outcome. Attribute importance models can use Minimum Description Length (MDL) or CUR Matrix Decomposition. MDL is the default.

CLASSIFICATION
Classification is a predictive machine learning function. A classification model uses historical data to predict a categorical target. Classification models can use: Decision Tree, logistic regression, Multivariate State Estimation Technique - Sequential Probability Ratio Test, Naive Bayes, Support Vector Machine (SVM), or XGBoost. The default is Naive Bayes. The classification function can also be used for anomaly detection. For anomaly detection, you can use the Multivariate State Estimation Technique - Sequential Probability Ratio Test algorithm or the SVM algorithm with a null target (One-Class SVM).

CLUSTERING
Clustering is a descriptive machine learning function. A clustering model identifies natural groupings within a data set. Clustering models can use k-Means, O-Cluster, or Expectation Maximization. The default is k-Means.

FEATURE_EXTRACTION
Feature extraction is a descriptive machine learning function. A feature extraction model creates an optimized data set on which to base a model. Feature extraction models can use Explicit Semantic Analysis, Non-Negative Matrix Factorization, Singular Value Decomposition, or Principal Component Analysis. Non-Negative Matrix Factorization is the default.

REGRESSION
Regression is a predictive machine learning function. A regression model uses historical data to predict a numerical target. Regression models can use linear regression, Support Vector Machine, or XGBoost. The default is Support Vector Machine.

TIME_SERIES
Time series is a predictive machine learning function. A time series model forecasts the future values of a time-ordered series of historical numeric data over a user-specified time window. Time series models use the Exponential Smoothing algorithm.
See Also:
Oracle Machine Learning for SQL Concepts for more information about mining functions
52.4 DBMS_DATA_MINING — Model Settings
Oracle Machine Learning for SQL uses settings to specify the algorithm and other characteristics of a model. Some settings are general, some are specific to a machine learning function, and some are specific to an algorithm.
All settings have default values. If you want to override one or more of the settings for a model, then you must create a settings table. The settings table must have the column names and data types shown in the following table.
Table 52-4 Required Columns in the Model Settings Table

Column Name  Data Type

SETTING_NAME  VARCHAR2(30)
SETTING_VALUE  VARCHAR2(4000)
The information you provide in the settings table is used by the model at build time. The name of the settings table is an optional argument to the CREATE_MODEL Procedure. You can also provide these settings through the CREATE_MODEL2 Procedure.
The settings used by a model can be found by querying the data dictionary view ALL_MINING_MODEL_SETTINGS. This view displays the model settings used by the machine learning models to which you have access. All of the default and user-specified setting values are included in the view.
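For example, the settings of a single model can be inspected as follows (the model name is hypothetical):

```sql
SELECT setting_name, setting_value
  FROM all_mining_model_settings
 WHERE model_name = 'CLASS_SAMPLE'
 ORDER BY setting_name;
```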
See Also:

ALL_MINING_MODEL_SETTINGS
in Oracle Database Reference 
Oracle Machine Learning for SQL User’s Guide for information about specifying model settings
52.4.1 DBMS_DATA_MINING — Algorithm Names
The ALGO_NAME setting specifies the model algorithm.
The values for the ALGO_NAME setting are listed in the following table.
Table 52-5 Algorithm Names

ALGO_NAME Value  Description  Machine Learning Function

ALGO_AI_MDL  Minimum Description Length  Attribute importance
ALGO_APRIORI_ASSOCIATION_RULES  Apriori  Association rules
ALGO_CUR_DECOMPOSITION  CUR Matrix Decomposition  Attribute importance
ALGO_DECISION_TREE  Decision Tree  Classification
ALGO_EXPECTATION_MAXIMIZATION  Expectation Maximization  Clustering
ALGO_EXPLICIT_SEMANTIC_ANALYS  Explicit Semantic Analysis  Feature extraction, classification
ALGO_EXPONENTIAL_SMOOTHING  Exponential Smoothing  Time series
ALGO_EXTENSIBLE_LANG  Language used for extensible algorithm  All mining functions supported
ALGO_GENERALIZED_LINEAR_MODEL  Generalized Linear Model  Classification, regression; also feature selection and generation
ALGO_KMEANS  Enhanced k-Means  Clustering
ALGO_MSET_SPRT  Multivariate State Estimation Technique - Sequential Probability Ratio Test  Classification
ALGO_NAIVE_BAYES  Naive Bayes  Classification
ALGO_NEURAL_NETWORK  Neural Network  Classification
ALGO_NONNEGATIVE_MATRIX_FACTOR  Non-Negative Matrix Factorization  Feature extraction
ALGO_O_CLUSTER  O-Cluster  Clustering
ALGO_RANDOM_FOREST  Random Forest  Classification
ALGO_SINGULAR_VALUE_DECOMP  Singular Value Decomposition  Feature extraction
ALGO_SUPPORT_VECTOR_MACHINES  Support Vector Machine  Classification and regression
ALGO_XGBOOST  XGBoost  Classification and regression
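For example, ALGO_NAME can be supplied through a settings table passed to the CREATE_MODEL procedure. A sketch (the table name is hypothetical):

```sql
-- Settings table with the required columns
CREATE TABLE rf_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

-- Choose Random Forest instead of the default classification algorithm
INSERT INTO rf_settings (setting_name, setting_value)
VALUES ('ALGO_NAME', 'ALGO_RANDOM_FOREST');
COMMIT;
```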
See Also:
Oracle Machine Learning for SQL Concepts for information about algorithms
52.4.2 DBMS_DATA_MINING — Automatic Data Preparation
Oracle Machine Learning for SQL supports fully Automatic Data Preparation (ADP), user-directed general data preparation, and user-specified embedded data preparation. The PREP_* settings enable the user to request fully automated or user-directed general data preparation. By default, fully Automatic Data Preparation (PREP_AUTO_ON) is enabled.
When you enable ADP, the model uses heuristics to transform the build data according to the requirements of the algorithm. Instead of fully ADP, the user can request that the data be shifted and/or scaled with the PREP_SCALE* and PREP_SHIFT* settings. The transformation instructions are stored with the model and reused whenever the model is applied. The model settings can be viewed in USER_MINING_MODEL_SETTINGS.
You can choose to supplement Automatic Data Preparation by specifying additional transformations in the xform_list parameter when you build the model. See "CREATE_MODEL Procedure" and "CREATE_MODEL2 Procedure".
If you do not use ADP and do not specify transformations in the xform_list parameter to CREATE_MODEL, then you must implement your own transformations separately in the build, test, and scoring data. You must take special care to implement exactly the same transformations in each data set.
If you do not use ADP, but you do specify transformations in the xform_list parameter to CREATE_MODEL, then OML4SQL embeds the transformation definitions in the model and prepares the test and scoring data to match the build data.
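A minimal sketch of an embedded transformation supplied through xform_list (the AGE column is hypothetical):

```sql
DECLARE
  v_xforms dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  -- Log-transform AGE; the reverse expression maps model output
  -- back to the original scale
  dbms_data_mining_transform.SET_TRANSFORM(
    xform_list         => v_xforms,
    attribute_name     => 'AGE',
    attribute_subname  => NULL,
    expression         => 'LOG(10, AGE)',
    reverse_expression => 'POWER(10, AGE)');
  -- v_xforms would then be passed as the xform_list argument
  -- of CREATE_MODEL or CREATE_MODEL2
END;
/
```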
The values for the PREP_* settings are described in the following table.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.PREP_AUTO_ON. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'ON'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-6 PREP_* Settings
Setting Name  Constant Value  String Value Equivalent
Description

PREP_AUTO  PREP_AUTO_ON  ON
This setting enables fully automated data preparation. This is the default.

PREP_AUTO  PREP_AUTO_OFF  OFF
Disables fully automated data preparation.

PREP_SCALE_2DNUM  PREP_SCALE_STDDEV  STDDEV
This setting enables scaling data preparation for two-dimensional numeric columns. The column values are divided by the standard deviation; this is often provided together with PREP_SHIFT_MEAN to produce z-score normalization.

PREP_SCALE_2DNUM  PREP_SCALE_RANGE  RANGE
A request to divide the column values by the range of values; often provided together with PREP_SHIFT_MIN to produce values in the [0, 1] range.

PREP_SCALE_NNUM  PREP_SCALE_MAXABS  MAXABS
This setting enables scaling data preparation for nested numeric columns.

PREP_SHIFT_2DNUM  PREP_SHIFT_MEAN  MEAN
This setting enables centering data preparation for two-dimensional numeric columns. The mean of the column is subtracted from each value.

PREP_SHIFT_2DNUM  PREP_SHIFT_MIN  MIN
Results in subtracting the minimum of the column from each value.
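For example, ADP could be disabled in favor of z-score-style shift and scale preparation, using the string values listed above (a sketch only; the model build call is omitted):

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('PREP_AUTO')        := 'OFF';     -- turn off fully automatic prep
  v_setlist('PREP_SHIFT_2DNUM') := 'MEAN';    -- center on the mean
  v_setlist('PREP_SCALE_2DNUM') := 'STDDEV';  -- scale by standard deviation
  -- pass v_setlist as the set_list argument of CREATE_MODEL2
END;
/
```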
See Also:
Oracle Machine Learning for SQL User's Guide for information about data transformations
52.4.3 DBMS_DATA_MINING — Machine Learning Function Settings
The settings described in this table apply to a machine learning function.
Table 52-7 Machine Learning Function Settings

Machine Learning Function  Setting Name  Setting Value
Description

Association  ASSO_MAX_RULE_LENGTH  a positive integer
Maximum rule length for association rules. The default is 4.

Association  ASSO_MIN_CONFIDENCE  a value between 0 and 1
Minimum confidence for association rules. The default is 0.1.

Association  ASSO_MIN_SUPPORT  a value between 0 and 1
Minimum support for association rules. The default is 0.1.

Association  ASSO_MIN_SUPPORT_INT  a positive integer
Minimum absolute support that each rule must satisfy. The value must be an integer. The default is 1.

Association  ASSO_MIN_REV_CONFIDENCE  a value between 0 and 1
Sets the minimum reverse confidence that each rule should satisfy. The reverse confidence of a rule is defined as the number of transactions in which the rule occurs divided by the number of transactions in which the consequent occurs. The value is a real number between 0 and 1. The default is 0.
Association  ASSO_IN_RULES  comma-separated list of items
Sets the including rules applied to each association rule: it specifies a list of items, at least one of which must appear in each reported association rule, either as antecedent or as consequent. The value is a comma-separated string containing the list of including items. If this setting is not specified, the filtering is not applied.

Association  ASSO_EX_RULES  comma-separated list of items
Sets the excluding rules applied to each association rule: it specifies a list of items, none of which can appear in a reported association rule. The value is a comma-separated string containing the list of excluded items. No rule can contain any item in the list. The default is NULL.

Association  ASSO_ANT_IN_RULES  comma-separated list of items
Sets the including rules for the antecedent: it specifies a list of items, at least one of which must appear in the antecedent part of each reported association rule. The value is a comma-separated string containing the list of including items. The antecedent part of each rule must contain at least one item in the list. The default is NULL.

Association  ASSO_ANT_EX_RULES  comma-separated list of items
Sets the excluding rules for the antecedent: it specifies a list of items, none of which can appear in the antecedent part of a reported association rule. The value is a comma-separated string containing the list of excluded items. No rule can contain any item in the list in its antecedent part. The default is NULL.

Association  ASSO_CONS_IN_RULES  comma-separated list of items
Sets the including rules for the consequent: it specifies a list of items, one of which must appear in the consequent part of each reported association rule. The value is a comma-separated string containing the list of including items. The consequent of each rule must be an item in the list. The default is NULL.

Association  ASSO_CONS_EX_RULES  comma-separated list of items
Sets the excluding rules for the consequent: it specifies a list of items, none of which can appear in the consequent part of a reported association rule. The value is a comma-separated string containing the list of excluded items. No rule can have any item in the list as its consequent. The excluding rules can be used to reduce the data that must be stored, but the user may be required to build an extra model in order to run different including or excluding rules. The default is NULL.

Association  ASSO_AGGREGATES  comma-separated list of column names
Specifies the columns to be aggregated, as a comma-separated string containing the names of the columns for aggregation. The number of columns in the list must not exceed 10. The default is NULL. For each item, the user may supply several columns to aggregate; this requires more memory to buffer the extra data and can affect performance because of the larger input data set and the additional operations.
Association  ASSO_ABS_ERROR  0 < ASSO_ABS_ERROR <= MAX(ASSO_MIN_SUPPORT, ASSO_MIN_CONFIDENCE)
Specifies the absolute error for the association rules sampling. A smaller absolute error produces a more accurate sample.

Association  ASSO_CONF_LEVEL  0 <= ASSO_CONF_LEVEL <= 1
Specifies the confidence level for an association rules sample. A larger confidence level produces a more accurate sample.
Classification  CLAS_COST_TABLE_NAME  table_name
(Decision Tree only) Name of a table that stores a cost matrix to be used by the algorithm in building the model. The cost matrix specifies the costs associated with misclassifications. Only Decision Tree models can use a cost matrix at build time. All classification algorithms can use a cost matrix at apply time. The cost matrix table is user-created. See "ADD_COST_MATRIX Procedure" for the column requirements. See Oracle Machine Learning for SQL Concepts for information about costs.

Classification  CLAS_PRIORS_TABLE_NAME  table_name
(Naive Bayes) Name of a table that stores prior probabilities to offset differences in distribution between the build data and the scoring data. The priors table is user-created. See Oracle Machine Learning for SQL User's Guide for the column requirements. See Oracle Machine Learning for SQL Concepts for additional information about priors.

Classification  CLAS_WEIGHTS_TABLE_NAME  table_name
(GLM and SVM only) Name of a table that stores weighting information for individual target values in SVM classification and GLM logistic regression models. The weights are used by the algorithm to bias the model in favor of higher weighted classes. The class weights table is user-created. See Oracle Machine Learning for SQL User's Guide for the column requirements. See Oracle Machine Learning for SQL Concepts for additional information about class weights.

Classification  CLAS_WEIGHTS_BALANCED  ON or OFF
This setting indicates that the algorithm must create a model that balances the target distribution. This setting is most relevant in the presence of rare targets, as balancing the distribution may enable better average accuracy (average of per-class accuracy) instead of overall accuracy (which favors the dominant class). The default value is OFF.

Classification  CLAS_MAX_SUP_BINS  For Decision Tree: 2 <= value <= 2147483647; For Random Forest: 2 <= value <= 254
This parameter specifies the maximum number of bins for each attribute. The default value is 32.
Clustering  CLUS_NUM_CLUSTERS  a positive integer
The maximum number of leaf clusters generated by a clustering algorithm. The algorithm may return fewer clusters, depending on the data. Enhanced k-Means usually produces the exact number of clusters specified by CLUS_NUM_CLUSTERS, unless there are fewer distinct data points. Expectation Maximization (EM) may return fewer clusters than the number specified by CLUS_NUM_CLUSTERS. For EM, the default value of CLUS_NUM_CLUSTERS is system-determined.

Feature extraction  FEAT_NUM_FEATURES  a positive integer
The number of features to be extracted by a feature extraction model. The default is estimated from the data by the algorithm. If the matrix rank is smaller than this number, fewer features will be returned. For CUR Matrix Decomposition, the FEAT_NUM_FEATURES value is the same as the CURS_SVD_RANK value.
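For example, association settings might be supplied as follows (the model name, source view, and thresholds are illustrative only):

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('ASSO_MIN_SUPPORT')     := '0.05';  -- at least 5% support
  v_setlist('ASSO_MIN_CONFIDENCE')  := '0.1';   -- at least 10% confidence
  v_setlist('ASSO_MAX_RULE_LENGTH') := '3';     -- rules of up to 3 items
  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'ASSO_SAMPLE',
    mining_function     => 'ASSOCIATION',
    data_query          => 'SELECT * FROM sales_trans',
    set_list            => v_setlist,
    case_id_column_name => 'CUST_ID');
END;
/
```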
See Also:
Oracle Machine Learning for SQL Concepts for information about machine learning functions
52.4.4 DBMS_DATA_MINING — Global Settings
The configuration settings in this table are applicable to any type of model, but are currently only implemented for specific algorithms.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.ODMS_MISSING_VALUE_MEAN_MODE. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'ODMS_MISSING_VALUE_MEAN_MODE'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-8 Global Settings
Setting Name  Setting Value  String Value Equivalent
Description

ODMS_ITEM_ID_COLUMN_NAME  column_name  column_name
(Association rules only) Name of a column that contains the items in a transaction. When this setting is specified, the algorithm expects the data to be presented in a native transactional format, consisting of two columns: a case ID column and an item ID column.
A typical example of transactional data is market basket data, wherein a case represents a basket that may contain many items. Each item is stored in a separate row, and many rows may be needed to represent a case. The case ID values do not uniquely identify each row. Transactional data is also called multi-record case data. The association rules function is normally used with transactional data, but it can also be applied to single-record case data (similar to other algorithms). For more information about single-record and multi-record case data, see Oracle Machine Learning for SQL User's Guide.

ODMS_ITEM_VALUE_COLUMN_NAME  column_name  column_name
(Association rules only) Name of a column that contains a value associated with each item in a transaction. This setting is only used when a value has been specified for ODMS_ITEM_ID_COLUMN_NAME. If ASSO_AGGREGATES, case ID, and item ID columns are present, then the item value column may or may not appear.
The item value column may specify information such as the number of items (for example, three apples) or the type of the item (for example, macintosh apples). For details on ASSO_AGGREGATES, see "DBMS_DATA_MINING — Machine Learning Function Settings".

ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_MEAN_MODE  ODMS_MISSING_VALUE_MEAN_MODE
Indicates how to treat missing values in the training data. This setting does not affect the scoring data. The default value is ODMS_MISSING_VALUE_AUTO. When ODMS_MISSING_VALUE_MEAN_MODE is specified, missing numeric values are replaced with the mean and missing categorical values are replaced with the mode.

ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_DELETE_ROW  ODMS_MISSING_VALUE_DELETE_ROW
The value ODMS_MISSING_VALUE_DELETE_ROW deletes the rows in the training data that contain missing values. If you want to replicate this missing value treatment in the scoring data, then you must perform the transformation explicitly.

ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO  ODMS_MISSING_VALUE_AUTO
ODMS_MISSING_VALUE_AUTO performs different strategies for different algorithms.

ODMS_ROW_WEIGHT_COLUMN_NAME  column_name  column_name
(GLM only) Name of a column in the training data that contains a weighting factor for the rows. The column data type must be NUMBER. Row weights can be used as a compact representation of repeated rows, as in the design of experiments where a specific configuration is repeated several times. Row weights can also be used to emphasize certain rows during model construction. For example, to bias the model towards rows that are more recent and away from potentially obsolete data.

ODMS_TEXT_POLICY_NAME  The name of an Oracle Text POLICY created using CTX_DDL.CREATE_POLICY  The name of an Oracle Text POLICY created using CTX_DDL.CREATE_POLICY
Affects how individual tokens are extracted from unstructured text. For details about Oracle Text policies, see Oracle Text Reference.

ODMS_TEXT_MAX_FEATURES  1 <= value  1 <= value
The maximum number of distinct features, across all text attributes, to use from a document set passed to CREATE_MODEL. The default is 3000.

ODMS_TEXT_MIN_DOCUMENTS  Non-negative value  Non-negative value
A text processing setting that controls in how many documents a token needs to appear to be used as a feature. The default is 1.

ODMS_PARTITION_COLUMNS  Comma-separated list of machine learning attributes  Comma-separated list of machine learning attributes
This setting indicates a request to build a partitioned model. The setting value is a comma-separated list of the machine learning attributes to be used to determine the in-list partition key values. These machine learning attributes are taken from the input columns unless an XFORM_LIST parameter is specified; if it is, then the machine learning attributes are taken from the attributes produced by those transformations.

ODMS_MAX_PARTITIONS  a positive integer  a positive integer
This setting indicates the maximum number of partitions allowed for the model. The default is 1000.

ODMS_SAMPLING  ODMS_SAMPLING_ENABLE  ODMS_SAMPLING_ENABLE
This setting allows the user to request a sampling of the build data. The default is ODMS_SAMPLING_DISABLE.

ODMS_SAMPLING  ODMS_SAMPLING_DISABLE  ODMS_SAMPLING_DISABLE
Disables sampling of the build data.

ODMS_SAMPLE_SIZE  0 < value  0 < value
This setting determines how many rows will be sampled (approximately). It can be set only if ODMS_SAMPLING is enabled. The default value is system-determined.



ODMS_PARTITION_BUILD_TYPE  ODMS_PARTITION_BUILD_INTRA  ODMS_PARTITION_BUILD_INTRA
This setting controls the parallel build of partitioned models. ODMS_PARTITION_BUILD_INTRA builds each partition in parallel using all slaves. The default mode is ODMS_PARTITION_BUILD_HYBRID.

ODMS_PARTITION_BUILD_TYPE  ODMS_PARTITION_BUILD_INTER  ODMS_PARTITION_BUILD_INTER
Each partition is built entirely in a single slave, but multiple partitions may be built at the same time because multiple slaves are active.

ODMS_PARTITION_BUILD_TYPE  ODMS_PARTITION_BUILD_HYBRID  ODMS_PARTITION_BUILD_HYBRID
A combination of the other two types; recommended for most situations to adapt to dynamic environments.

ODMS_TABLESPACE_NAME  tablespace_name  tablespace_name
This setting controls the storage specification. If you explicitly set this to the name of a tablespace (for which you have sufficient quota), then the resulting model content is stored in that tablespace. If you do not provide this setting, then the model content is stored in the default tablespace of the user.

ODMS_RANDOM_SEED  Non-negative integer  Non-negative integer
The hash function with a random number seed generates a random number with uniform distribution. Users can control the random number seed with this setting. The default is 0. This setting is used by Random Forest, Neural Network, and CUR Matrix Decomposition.

ODMS_DETAILS  ODMS_ENABLE  ODMS_ENABLE
This setting controls whether model details are created; disabling it reduces the space that is used while creating a model, especially a partitioned model. The default value is ODMS_ENABLE. When the setting is ODMS_ENABLE, model views and model detail tables are created. The reduction in space depends on the model; a reduction on the order of 10x can be achieved.

ODMS_DETAILS  ODMS_DISABLE  ODMS_DISABLE
When the setting is ODMS_DISABLE, model views and model detail tables are not created.
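For example, a partitioned model could be requested as follows (the partition column and limit are illustrative; the model build call is omitted):

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('ODMS_PARTITION_COLUMNS') := 'CUST_GENDER';  -- partition key
  v_setlist('ODMS_MAX_PARTITIONS')    := '16';           -- cap partition count
  -- pass v_setlist as the set_list argument of CREATE_MODEL2
END;
/
```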
See Also:
Oracle Machine Learning for SQL Concepts for information about GLM
Oracle Machine Learning for SQL Concepts for information about association rules
Oracle Machine Learning for SQL User’s Guide for information about machine learning unstructured text
52.5 DBMS_DATA_MINING — Algorithm-Specific Model Settings
Oracle Machine Learning for SQL uses algorithm-specific settings to define the characteristics of a model.
All settings have default values. If you want to override one or more of the settings for a model, then you must specify those settings.
The information you provide in the settings table is used by the model at build time. The name of the settings table is an optional argument to the CREATE_MODEL Procedure. You can also provide these settings through the CREATE_MODEL2 Procedure.
The settings used by a model can be found by querying the data dictionary view ALL_MINING_MODEL_SETTINGS. This view displays the model settings used by the machine learning models to which you have access. All of the default and user-specified setting values are included in the view.
See Also:

ALL_MINING_MODEL_SETTINGS
in Oracle Database Reference 
Oracle Machine Learning for SQL User’s Guide for information about specifying model settings
52.5.1 DBMS_DATA_MINING — Algorithm Settings: ALGO_EXTENSIBLE_LANG
The settings listed in the following table configure the behavior of the machine learning model with an extensible algorithm. The model is built in the R language.
The RALG_*_FUNCTION settings specify the R scripts that are used to build, score, and view an R model; the scripts must be registered in the Oracle Machine Learning for R script repository. The R scripts are registered through Oracle Machine Learning for R with special privileges. When ALGO_EXTENSIBLE_LANG is set to R in the MINING_MODEL_SETTING table, the machine learning model is built in the R language. After the R model is built, the names of the R scripts are recorded in the MINING_MODEL_SETTING table in the SYS schema. The scripts must exist in the script repository for the R model to function. The amount of R memory used to build, score, and view the R model through these R scripts can be controlled by Oracle Machine Learning for R.
All algorithm-independent DBMS_DATA_MINING subprograms can operate on an R model for machine learning functions such as association, attribute importance, classification, clustering, feature extraction, and regression.
The supported DBMS_DATA_MINING subprograms include, but are not limited to, the following:

ADD_COST_MATRIX Procedure

COMPUTE_CONFUSION_MATRIX Procedure

COMPUTE_LIFT Procedure

COMPUTE_ROC Procedure

CREATE_MODEL Procedure

DROP_MODEL Procedure

EXPORT_MODEL Procedure

GET_MODEL_COST_MATRIX Function

IMPORT_MODEL Procedure

REMOVE_COST_MATRIX Procedure

RENAME_MODEL Procedure
Table 52-9 ALGO_EXTENSIBLE_LANG Settings

Setting Name  Setting Value
Description

RALG_BUILD_FUNCTION  R script name
Specifies the name of an existing registered R script for the R algorithm machine learning model build function. The R script defines an R function that takes the training data as its first input argument and returns an R model object. For clustering and feature extraction machine learning function model builds, the R attributes dm$nclus and dm$nfeat must be set on the R model to indicate the number of clusters and features respectively. The RALG_BUILD_FUNCTION setting is required.

RALG_BUILD_PARAMETER  SELECT value param_name, ... FROM DUAL
Specifies a list of numeric and string scalars for optional input parameters of the model build function.

RALG_SCORE_FUNCTION  R script name
Specifies the name of an existing registered R script to score data. The script returns a data.frame containing the corresponding prediction results.

RALG_WEIGHT_FUNCTION  R script name
Specifies the name of an existing registered R script for the R algorithm that computes the weight (contribution) for each attribute in scoring. The script returns a data.frame containing the contribution weight for each attribute in a row.

RALG_DETAILS_FUNCTION  R script name
Specifies the name of an existing registered R script for the R algorithm that produces the model information. This setting is required to generate a model view.

RALG_DETAILS_FORMAT  SELECT type_value column_name, ... FROM DUAL
Specifies the SELECT query for the list of numeric and string scalars that define the column types and the column names of the generated model view.
52.5.2 DBMS_DATA_MINING — Algorithm Settings: CUR Matrix Decomposition
The following settings affect the behavior of the CUR Matrix Decomposition algorithm.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.CURS_ROW_IMP_DISABLE. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'CURS_ROW_IMP_DISABLE'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-10 CUR Matrix Decomposition Settings
Setting Name  Constant Value  String Value Equivalent
Description

CURS_APPROX_ATTR_NUM  The value must be a positive integer  The value must be a positive integer
Defines the approximate number of attributes to be selected. The default value is the number of attributes.

CURS_ROW_IMPORTANCE  CURS_ROW_IMP_ENABLE  CURS_ROW_IMP_ENABLE
Defines the flag indicating whether or not to perform row selection. This value enables row selection. The default value is CURS_ROW_IMP_DISABLE.

CURS_ROW_IMPORTANCE  CURS_ROW_IMP_DISABLE  CURS_ROW_IMP_DISABLE
Disables row selection.

CURS_APPROX_ROW_NUM  The value must be a positive integer  The value must be a positive integer
Defines the approximate number of rows to be selected. This parameter is only used when users decide to perform row selection (CURS_ROW_IMP_ENABLE). The default value is the total number of rows.

CURS_SVD_RANK  The value must be a positive integer  The value must be a positive integer
Defines the rank parameter used in the column/row leverage score calculation. If users do not provide an input value, the value is determined by the system.
See Also:
Oracle Machine Learning for SQL Concepts for information about CUR Matrix Decomposition
52.5.3 DBMS_DATA_MINING — Algorithm Settings: Decision Tree
These settings configure the behavior of the Decision Tree algorithm. Note that the Decision Tree settings are also used to configure the behavior of Random Forest as it constructs each individual decision tree.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.TREE_IMPURITY_ENTROPY. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'TREE_IMPURITY_ENTROPY'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-11 Decision Tree Settings
Setting Name  Constant Value  String Value Equivalent
Description

TREE_IMPURITY_METRIC  TREE_IMPURITY_ENTROPY  TREE_IMPURITY_ENTROPY
Tree impurity metric for Decision Tree. Tree algorithms seek the best test question for splitting data at each node. The best splitter and split values are those that result in the largest increase in target value homogeneity (purity) for the entities in the node. Purity is measured by a metric. By default, the algorithm uses TREE_IMPURITY_GINI.

TREE_IMPURITY_METRIC  TREE_IMPURITY_GINI  TREE_IMPURITY_GINI
Decision trees can use either Gini (TREE_IMPURITY_GINI) or entropy (TREE_IMPURITY_ENTROPY) as the purity metric.

TREE_TERM_MAX_DEPTH  For Decision Tree: 2 <= value <= 20; For Random Forest: 2 <= value <= 100  For Decision Tree: 2 <= value <= 20; For Random Forest: 2 <= value <= 100
Criteria for splits: maximum tree depth (the maximum number of nodes between the root and any leaf node, including the leaf node). For Decision Tree, the default is 7. For Random Forest, the default is 16.

TREE_TERM_MINPCT_NODE  percentage value  percentage value
The minimum number of training rows in a node expressed as a percentage of the rows in the training data. The default is 0.05.

TREE_TERM_MINPCT_SPLIT  percentage value  percentage value
The minimum number of rows required to consider splitting a node expressed as a percentage of the training rows. The default is 0.1.

TREE_TERM_MINREC_NODE  positive integer  positive integer
The minimum number of rows in a node. The default is 10.

TREE_TERM_MINREC_SPLIT  positive integer  positive integer
Criteria for splits: minimum number of records in a parent node expressed as a value. No split is attempted if the number of records is below this value. The default is 20.
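For example, Decision Tree settings might be combined as follows (the values are illustrative; the model build call is omitted):

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('ALGO_NAME')            := 'ALGO_DECISION_TREE';
  v_setlist('TREE_IMPURITY_METRIC') := 'TREE_IMPURITY_ENTROPY';  -- entropy splits
  v_setlist('TREE_TERM_MAX_DEPTH')  := '5';                      -- shallow tree
  -- pass v_setlist as the set_list argument of CREATE_MODEL2
END;
/
```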
See Also:
Oracle Machine Learning for SQL Concepts for information about Decision Tree
52.5.4 DBMS_DATA_MINING — Algorithm Settings: Expectation Maximization
These algorithm settings configure the behavior of the Expectation Maximization algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For
example,
DBMS_DATA_MINING.EMCS_ATTR_FILTER_ENABLE
.
Alternatively, you can specify the corresponding
string value from the String Value Equivalent
column without the DBMS_DATA_MINING
prefix, in single quotes. For example,
'EMCS_ATTR_FILTER_ENABLE'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.See Also:
Oracle Data Mining Concepts for information about Expectation Maximization
Table 5212 Expectation Maximization Settings for Data Preparation and Analysis
Setting Name  Constant Value  String Value Equivalent  Description 




Whether or not to include
uncorrelated attributes in the model. When
Note: This setting applies only to attributes that are not nested. Default is system-determined. 


Includes uncorrelated attributes in the model. 




Maximum number of correlated attributes to include in the model. Note: This setting applies only to attributes that are not nested (2D). Default is




The distribution for modeling numeric attributes. Applies to the input table or view as a whole and does not allow per-attribute specifications. The options include Bernoulli, Gaussian, or system-determined distribution. When Bernoulli or Gaussian distribution is chosen, all numeric attributes are modeled using the same type of distribution. Default is



Models all numeric attributes using Gaussian distribution. 



When the distribution is system-determined, individual attributes may use different distributions (either Bernoulli or Gaussian), depending on the data. 




Number of equi-width bins that will be used for gathering cluster statistics for numeric columns. Default is




Specifies the number of projections that will be used for each nested column. If a column has fewer distinct attributes than the specified number of projections, the data will not be projected. The setting applies to all nested columns. Default is 



Specifies the number of quantile bins that will be used for modeling numeric columns with multi-valued Bernoulli distributions. Default is system-determined. 



Specifies the number of top-N bins that will be used for modeling categorical columns with multi-valued Bernoulli distributions. Default is system-determined. 
Table 52-13 Expectation Maximization Settings for Learning
Setting Name  Constant Value  String Value Equivalent  Description 




The convergence criterion for EM. The convergence criterion may be based on a held-aside data set, or it may be the Bayesian Information Criterion.
Default is system-determined. 


Uses the Bayesian Information Criterion (BIC) for convergence. 




When the convergence criterion is
based on a held-aside data set
( Default value is




Maximum number of components in the model. If model search is enabled, the algorithm automatically determines the number of components based on improvements in the likelihood function or based on regularization, up to the specified maximum. The number of components must be greater than or equal to the number of clusters. Default is 20. 



Specifies the maximum number of iterations in the EM algorithm. Default is




This setting enables model search in EM where different model sizes are explored and a best size is selected. The default is



Deactivates model search. 




This setting allows the EM algorithm to remove a small component from the solution. The default is



Prevents the EM algorithm from removing small components. 


Nonnegative integer 
Nonnegative integer 
This setting controls the seed of
the random generator used in EM. The default is

Table 52-14 Expectation Maximization Settings for Component Clustering
Setting Name  Constant Value  String Value Equivalent  Description 




Enables or disables the grouping of EM components into high-level clusters. When disabled, the components themselves are treated as clusters. When component clustering is
enabled, model scoring through the SQL
Default is



When clustering is disabled, the





Dissimilarity threshold that controls the clustering of EM components. When the dissimilarity measure is less than the threshold, the components are combined into a single cluster. A lower threshold may produce more clusters that are more compact. A higher threshold may produce fewer clusters that are more spread out. Default is 



Allows the specification of a linkage function for the agglomerative clustering step.
Default is








Table 52-15 Expectation Maximization Settings for Cluster Statistics
Setting Name  Constant Value  String Value Equivalents  Description 




Enables or disables the gathering of descriptive statistics for clusters (centroids, histograms, and rules). Default is



When statistics are disabled, model size is
reduced, and 




Minimum support required for including an attribute in the cluster rule. The support is the percentage of the data rows assigned to a cluster that must have non-null values for the attribute. Default is

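As a minimal sketch (object names hypothetical, values arbitrary), EM settings are supplied the same way as for any algorithm, through a SETTING_LIST passed to CREATE_MODEL2:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')         := 'ALGO_EXPECTATION_MAXIMIZATION';
  v_setlst('CLUS_NUM_CLUSTERS') := '5';  -- generic clustering setting

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'EM_DEMO',                        -- hypothetical
    mining_function     => 'CLUSTERING',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID');
END;
/
```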
52.5.5 DBMS_DATA_MINING — Algorithm Settings: Explicit Semantic Analysis
Explicit Semantic Analysis (ESA) is a useful technique for extracting meaningful and interpretable features.
Table 52-16 Explicit Semantic Analysis Settings
Setting Name  Constant Value  String Value Equivalent  Description 


Nonnegative number 
Nonnegative number 
This setting specifies a threshold for small attribute
weights in the transformed build data. The default is


Text input: 100. Non-text input: 0. 
Text input: 100. Non-text input: 0. 
This setting determines the minimum number of non-zero entries that need to be present in an input row. The default is 100 for text input and 0 for non-text input. 

A positive integer 
A positive integer 
This setting controls the maximum number of features
per attribute. The default is 
See Also:
Oracle Machine Learning for SQL Concepts for information about ESA.
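A sketch of supplying the ESA settings above when building a feature-extraction model; the data object names are hypothetical and the feature-limit value is arbitrary:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')          := 'ALGO_EXPLICIT_SEMANTIC_ANALYS';
  v_setlst('ESAS_TOPN_FEATURES') := '2000';  -- illustrative value

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'ESA_DEMO',                  -- hypothetical
    mining_function     => 'FEATURE_EXTRACTION',
    data_query          => 'SELECT * FROM demo_docs',   -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'DOC_ID');
END;
/
```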
52.5.6 DBMS_DATA_MINING — Algorithm Settings: Exponential Smoothing
These settings configure the behavior of the Exponential Smoothing (ESM) algorithm.
The Constant Value column specifies constants
using the prefix DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.EXSM_SIMPLE
. Alternatively, you can specify the
corresponding string value from the String Value Equivalent column without
the DBMS_DATA_MINING
prefix, in single quotes. For example,
'EXSM_SIMPLE'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-17 Exponential Smoothing Settings
Setting Name  Constant Value  String Value Equivalent  Description 

EXSM_MODEL 

EXSM_SIMPLE 
This setting specifies the model.
The default value is

EXSM_SIMPLE_MULT 
EXSM_SIMPLE_MULT_ERR 
Forecasts data as a weighted moving average, with the influence of past observations declining exponentially with the length of time since the observation occurred. Errors in estimation are assumed to be proportional to the level of the prior estimate.  
EXSM_HOLT 
EXSM_HOLT 
Applies Holt's linear exponential smoothing method, designed to forecast data with an underlying linear trend.  
EXSM_HOLT_DMP 
EXSM_HOLT_DAMPED 
Applies Holt's linear exponential smoothing with a damping factor to progressively reduce the strength of the trend over time.  
EXSM_MUL_TRND 
EXSM_MULT_TREND 
Applies an exponential smoothing framework with a multiplicative trend component, effectively capturing data where trends are not linear but grow or decay over time.  
EXSM_MULTRD_DMP 
EXSM_MULT_TREND_DAMPED 
Applies an exponential smoothing algorithm with a multiplicative trend that diminishes over time, providing a conservative approach to trend estimation.  
EXSM_SEAS_ADD 
EXSM_SEASON_ADD 
Applies an exponential smoothing with an additive seasonal component, isolating and accounting for seasonal variations without incorporating a trend.  
EXSM_SEAS_MUL 
EXSM_SEASON_MUL 
Executes exponential smoothing with a multiplicative seasonal component, capturing seasonal effects that increase or decrease in proportion to the level of the series.  
EXSM_HW 
EXSM_WINTERS 
Applies the HoltWinters method with additive trends and multiplicative seasonality, offering a robust model for data with both linear trend and proportional seasonal variation.  
EXSM_HW_DMP 
EXSM_WINTERS_DAMPED 
Applies the HoltWinters method with a damped trend and multiplicative seasonality, moderating the linear trend over time while still capturing proportional seasonal changes.  
EXSM_HW_ADDSEA 
EXSM_ADDWINTERS 
Applies the HoltWinters additive model to simultaneously smooth data with linear trends and additive seasonal effects.  
EXSM_DHW_ADDSEA 
EXSM_ADDWINTERS_DAMPED 
Applies the HoltWinters additive approach with a damping mechanism, reducing the impact of the trend and seasonal components over time.  
EXSM_HWMT 
EXSM_WINTERS_MULT_TREND 
Applies the HoltWinters model with both trend and seasonality components being multiplicative, suited for series where the seasonal variations and trends are both increasing or decreasing proportional to level.  
EXSM_HWMT_DMP 
EXSM_WINTERS_MUL_TREND_DMP 
Applies the HoltWinters model with a damped multiplicative trend, effectively moderating the exponential increase or decrease of both trend and seasonal components over time.  
EXSM_SEASONALITY 
positive integer > 1 
positive integer > 1 
This setting specifies a positive integer value as the length of the seasonal cycle. The value must be larger than
This setting is applicable and must be provided only for models with seasonality; otherwise the model throws an error. When 
EXSM_INTERVAL 

EXSM_INTERVAL_YEAR 
This setting applies and must be provided only when
the time column ( The model throws an error if the time column of
the input table is of datetime type and setting

EXSM_INTERVAL_QTR 
EXSM_INTERVAL_QTR 
This option sets the spacing interval to a quarter, aggregating the data for every three months.  
EXSM_INTERVAL_MONTH 
EXSM_INTERVAL_MONTH 
This option adjusts the spacing interval to one month. The accumulated time series represent aggregated or summarized data for each month.  
EXSM_INTERVAL_WEEK 
EXSM_INTERVAL_WEEK 
With this option, data is aggregated or summarized on a weekly basis, setting the spacing interval to one week.  
EXSM_INTERVAL_DAY 
EXSM_INTERVAL_DAY 
This option adjusts the spacing interval to one day. It's suitable for scenarios where daily aggregated insights are required.  
EXSM_INTERVAL_HOUR 
EXSM_INTERVAL_HOUR 
For more granular insights, this option sets the spacing interval to one hour. It's especially useful when analyzing data that changes significantly within a day.  
EXSM_INTERVAL_MIN 
EXSM_INTERVAL_MINUTE 
With this option, the spacing is set to one minute. This provides a very detailed view of data, suitable for applications like high-frequency trading or real-time monitoring systems.  
EXSM_INTERVAL_SEC 
EXSM_INTERVAL_SECOND 
For the most granular detail, this option sets the spacing interval to one second. It's tailored for scenarios requiring real-time or near-real-time analysis.  
EXSM_INITVL_OPTIMIZE 


The setting The default value is



Note: EXSM_INITVL_OPTIMIZE can only be set
to EXSM_INITVL_OPTIMIZE_DISABLE if the user has set
EXSM_MODEL to EXSM_HW or
EXSM_HW_ADDSEA . If EXSM_MODEL
is set to another model type or is not specified, you get error
ORA-40213 (conflicting settings) and the model is not built.


EXSM_ACCUMULATE 


This setting only applies and must be provided when the time column has datetime type. It specifies how to generate the value of the accumulated time series from the input time series.
The default value is



This option computes the standard deviation of the time series values within a specified interval. It helps you understand the amount of variation or dispersion in your data. 



By selecting this option, the maximum value of the time series within a specified interval will be determined. It helps in identifying the peak value in the given range. 



This option focuses on determining the minimum value of the time series within a specified interval. It is useful for identifying the lowest value in the time series for the given datetime range. 



This specifies the average value of your time series within a specified interval. It calculates the mean value of all data points in the specified range. 



This option provides the median of the time series values within the given interval. The median gives a central value, which can be especially useful if your data contains outliers. 



This option counts the number of time series values within the specified interval. It is helpful if you want to know how many data points are present in a certain datetime range. 

EXSM_SETMISSING 
Specify an option:


This setting specifies how to handle missing values, which may come from input data and/or the accumulation process of time series. You can specify either a number or an option. If a number is specified, all the missing values are set to that number.



Replaces missing value with maximum of the accumulated time series. 



Replaces missing value with average of the accumulated time series. 



Replaces missing value with median of the accumulated time series. 



Replaces missing value with last nonmissing value of the accumulated time series. 



Replaces missing value with first nonmissing value of the accumulated time series. 



Replaces missing value with the previous nonmissing value of the accumulated time series. 



Replaces missing value with the next nonmissing value of the accumulated time series. 



EXSM model treats the input data as an irregular (non-uniformly spaced) time series. If this setting is not provided,


EXSM_PREDICTION_STEP 
It must be set to a number between 1 and 30.  It must be set to a number between 1 and 30. 
This setting specifies how many steps ahead the predictions are to be made. If it is not set, the default value is

EXSM_CONFIDENCE_LEVEL 
It must be a number between 0 and 1, exclusive.  It must be a number between 0 and 1, exclusive. 
This setting specifies the desired confidence level for prediction. The lower and upper bounds of the specified
confidence interval are reported. If this setting is not
specified, the default confidence level is

EXSM_OPT_CRITERION 


This setting specifies the desired optimization criterion. The optimization criterion is useful as a diagnostic for comparing models' fit to the same data.
The default value is



This provides the mean squared error pertaining to the model. 



This denotes the average of the mean squared error over a time window as specified by the user. 



This metric captures the standard deviation of the residuals of the model. 



This metric conveys the average absolute error associated with the model. It measures the size of the error. 

EXSM_NMSE 
positive integer  positive integer  This setting specifies the length of the window used in computing the error metric Average Mean Square Error (AMSE). 
See Also:
Oracle Machine Learning for SQL Concepts for information about ESM.
https://github.com/oracle-samples/oracle-db-examples/tree/main/machine-learning/sql: browse to the release folder
and click the oml4sql-time-series-exponential-smoothing.sql
example.
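A minimal ESM sketch tying the settings above together; the table and column names are hypothetical and the seasonal and forecast values are arbitrary:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')            := 'ALGO_EXPONENTIAL_SMOOTHING';
  v_setlst('EXSM_MODEL')           := 'EXSM_WINTERS';        -- Holt-Winters
  v_setlst('EXSM_INTERVAL')        := 'EXSM_INTERVAL_MONTH'; -- monthly accumulation
  v_setlst('EXSM_SEASONALITY')     := '12';                  -- 12-month cycle
  v_setlst('EXSM_PREDICTION_STEP') := '4';                   -- forecast 4 steps ahead

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'ESM_DEMO',                            -- hypothetical
    mining_function     => 'TIME_SERIES',
    data_query          => 'SELECT time_id, amount FROM demo_sales',
    set_list            => v_setlst,
    case_id_column_name => 'TIME_ID',
    target_column_name  => 'AMOUNT');
END;
/
```

Because the case ID column here is of datetime type, EXSM_INTERVAL must be supplied, as described in the table above.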
52.5.7 DBMS_DATA_MINING — Algorithm Settings: Generalized Linear Model
The settings listed in the following table configure the behavior of the Generalized Linear Model algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.GLMS_FTR_GEN_QUADRATIC
. Alternatively, you can specify
the corresponding string value from the String Value Equivalent column
without the DBMS_DATA_MINING
prefix, in single quotes. For example,
'GLMS_FTR_GEN_QUADRATIC'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-18 DBMS_DATA_MINING GLM Settings
Setting Name  Constant Value  String Value Equivalent  Description 




The confidence level for coefficient confidence intervals. The default confidence level is




Whether feature generation is quadratic or cubic. When feature generation is enabled, the algorithm automatically chooses the most appropriate feature generation method based on the data.



Generates features using a cubic method. 




Whether or not feature generation is enabled for GLM. By default, feature generation is not enabled. Note: Feature generation can only be enabled when feature selection is also enabled. 


Disables feature generation for GLM (default). 




Feature selection penalty criterion for adding a feature to the model. When feature selection is enabled, the algorithm automatically chooses the penalty criterion based on the data.



Uses Schwarz Bayesian Information Criterion for feature selection. 



Uses Risk Inflation Criterion for feature selection.  


Uses Alpha Inverse Criterion for feature selection.  



Whether or not feature selection is enabled for GLM. By default, feature selection is not enabled. 


Disables feature selection. 




When feature selection is enabled, this setting specifies the maximum number of features that can be selected for the final model. By default, the algorithm limits the number of features to ensure sufficient memory. 



Enables or disables pruning of features in the final model. Pruning is based on t-Test statistics for linear regression, or Wald Test statistics for logistic regression. Features are pruned in a loop until all features are statistically significant with respect to the full data. When feature selection is enabled, the algorithm automatically performs pruning. 


Disables pruning of features. 


target_value 
target_value 
The target value used as the reference class in a binary logistic regression model. Probabilities are produced for the other class. By default, the algorithm chooses the value with the highest prevalence (the most cases) for the reference class. 



Enables or disables ridge regression. Ridge applies to both regression and classification machine learning techniques. When ridge is enabled, prediction bounds are not
produced by the Note: Ridge can be enabled only when feature selection is not specified, or has been explicitly disabled. If ridge regression and feature selection are both explicitly enabled, then an exception is raised. 


Disables ridge regression. 




The value of the ridge parameter. This setting is only used when the algorithm is configured to use ridge regression. If ridge regression is enabled internally by the algorithm, then the ridge parameter is determined by the algorithm. 
GLMS_ROW_DIAGNOSTICS



Enables or disables row diagnostics.



Disables row diagnostics (default). 


The range is ( 
The range is ( 
Convergence tolerance setting of the GLM algorithm. The default value is system-determined. 

Positive integer 
Positive integer 
Maximum number of iterations for the GLM algorithm. The default value is systemdetermined. 



Number of rows in a batch used by the SGD solver. The value of this parameter sets the size of the batch for the SGD solver. An input of 0 triggers a data-driven batch size estimate. The default is 



This setting allows the user to choose the GLM solver.
The solver cannot be selected if
The default value is system-determined. See Also: GLM Solvers 


Solves generalized linear models using the Cholesky decomposition method, which provides a stable and efficient solution by factoring the system matrix into a lower triangular matrix and its conjugate transpose. 



Utilizes the QR decomposition technique to solve generalized linear models, ensuring numerical stability and accuracy by decomposing the problem into an orthogonal matrix Q and an upper triangular matrix R. 



Combines L-BFGS, a limited-memory approximation of the Broyden-Fletcher-Goldfarb-Shanno optimization algorithm, with the alternating direction method of multipliers (ADMM) for solving large-scale generalized linear model problems efficiently. 




This setting allows the user to use a sparse solver if it
is available. The default value is



Disables sparse solver. 
Related Topics
See Also:
Oracle Machine Learning for SQL Concepts for information about GLM.
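A sketch of enabling feature selection together with feature generation, the combination the table above requires; object names are hypothetical:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')           := 'ALGO_GENERALIZED_LINEAR_MODEL';
  -- Feature generation requires feature selection to be enabled as well.
  v_setlst('GLMS_FTR_SELECTION')  := 'GLMS_FTR_SELECTION_ENABLE';
  v_setlst('GLMS_FTR_GENERATION') := 'GLMS_FTR_GENERATION_ENABLE';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'GLM_DEMO',                       -- hypothetical
    mining_function     => 'REGRESSION',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AGE');
END;
/
```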
52.5.8 DBMS_DATA_MINING — Algorithm Settings: k-Means
The settings listed in the following table configure the behavior of the k-Means algorithm.
The Constant Value column specifies constants using the
prefix DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.KMNS_COSINE
. Alternatively, you can specify the
corresponding string value from the String Value Equivalent column without
the DBMS_DATA_MINING
prefix, in single quotes. For example,
'KMNS_COSINE'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-19 k-Means Settings
Setting Name  Constant Value  String Value Equivalent  Description 




Minimum Convergence Tolerance for k-Means. The
algorithm iterates until the minimum Convergence Tolerance is
satisfied or until the maximum number of iterations, specified in
Decreasing the Convergence Tolerance produces a more accurate solution but may result in longer run times. The default Convergence Tolerance is




Distance function for k-Means.
The default distance function is



Uses Euclidean distance for k-Means. 




Maximum number of iterations for k-Means. The
algorithm iterates until either the maximum number of iterations is
reached or the minimum Convergence Tolerance, specified in
The default number of iterations is 



Minimum percentage of attribute values that must be non-null in order for the attribute to be included in the rule description for the cluster. If the data is sparse or includes many missing values, a minimum support that is too high can cause very short rules or even empty rules. The default minimum support is 



Number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins with the exception of attributes with a single value that have only one bin. The default number of histogram bins is




Split criterion for k-Means. The split criterion controls the initialization of new k-Means clusters. The algorithm builds a binary tree and adds one new cluster at a time. When the split criterion is based on size, the new cluster is placed in the area where the largest current cluster is located. The default split criterion is the



When the split criterion is based on the variance, the new cluster is placed in the area of the most spread-out cluster. 

KMNS_RANDOM_SEED 
Nonnegative integer 
Nonnegative integer 
This setting controls the seed of the random generator used during the k-Means initialization. It must be a non-negative integer value. The default is 



This setting determines the level of cluster detail that is computed during the build.



Cluster hierarchy and cluster record counts are computed. This is the default value. 



Cluster hierarchy, record counts, descriptive statistics (means, variances, modes, histograms, and rules) are computed. 
See Also:
Oracle Machine Learning for SQL Concepts for information about k-Means
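A minimal k-Means sketch using the settings above; object names are hypothetical and the cluster count is arbitrary:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')         := 'ALGO_KMEANS';
  v_setlst('CLUS_NUM_CLUSTERS') := '3';
  v_setlst('KMNS_DISTANCE')     := 'KMNS_COSINE';
  v_setlst('KMNS_DETAILS')      := 'KMNS_DETAILS_ALL';  -- full cluster statistics

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'KM_DEMO',                        -- hypothetical
    mining_function     => 'CLUSTERING',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID');
END;
/
```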
52.5.9 DBMS_DATA_MINING — Algorithm Settings: Multivariate State Estimation Technique - Sequential Probability Ratio Test
Settings that configure the training calibration behavior of the Multivariate State Estimation Technique  Sequential Probability Ratio Test algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.MSET_ADB_HEIGHT
. Alternatively, you can specify the
corresponding string value from the String Value Equivalent column without
the DBMS_DATA_MINING
prefix, in single quotes. For example,
'MSET_ADB_HEIGHT'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-20 MSET-SPRT Settings
Setting Name  Setting Value  String Value Equivalent  Description 


A positive double 
A positive double 
Estimates the band within which signal values normally oscillate. The default value is 

A positive integer 
A positive integer 
The number of the last n signals (the alert window) that should have passed the threshold to raise an alert. The alert count must be less than or equal to the alert window. The default value is 

A positive integer greater than or equal to

A positive integer greater than or equal to

The number of signals to consider in the SPRT hypothesis consolidation logic. The default value is 

A positive double between 0 and 1 
A positive double between 0 and 1 
False Alarm Probability FAP (false positive). The default is 

A positive double between 0 and 1 
A positive double between 0 and 1 
Missed Alarm Probability MAP (false negative). The default is 

A positive integer 
A positive integer 
The approximate number of data rows used for MSET model calibration. You can use The default value is 

A positive integer 
A positive integer 
The default value is data driven. 

A positive integer >0, <=10000 
A positive integer >0, <=10000 
Specifies whether to use random projections. When the number of sensors exceeds the setting value, random projections are used. To turn off random projections, set the threshold to a value that is equal to or greater than the number of sensors. The default value is 

A positive integer 
A positive integer 
The tolerance in standard deviations used in the SPRT calculation. The default value is 
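A sketch of supplying MSET-SPRT settings when building a model; the data object names are hypothetical, the setting values are arbitrary, and the target column is omitted here in the style of an anomaly-detection build:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')         := 'ALGO_MSET_SPRT';
  v_setlst('MSET_ALPHA_PROB')   := '0.1';  -- false alarm probability
  v_setlst('MSET_ALERT_COUNT')  := '3';
  v_setlst('MSET_ALERT_WINDOW') := '5';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'MSET_DEMO',                       -- hypothetical
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM demo_sensor_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'TIME_ID');
END;
/
```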
52.5.10 DBMS_DATA_MINING — Algorithm Settings: Naive Bayes
The settings listed in the following table configure the behavior of the Naive Bayes algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.NABS_PAIRWISE_THRESHOLD
. Alternatively, you can
specify the corresponding string value from the String Value Equivalent
column without the DBMS_DATA_MINING
prefix, in single quotes. For
example, 'NABS_PAIRWISE_THRESHOLD'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-21 Naive Bayes Settings
Setting Name  Setting Value  String Value Equivalent  Description 




Value of the pairwise threshold for the NB algorithm. Default is 



Value of the singleton threshold for the NB algorithm. Default value is 
See Also:
Oracle Machine Learning for SQL Concepts for information about Naive Bayes
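A minimal Naive Bayes sketch using the two thresholds above; object names are hypothetical and the threshold values are arbitrary:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')                := 'ALGO_NAIVE_BAYES';
  v_setlst('NABS_PAIRWISE_THRESHOLD')  := '0.01';
  v_setlst('NABS_SINGLETON_THRESHOLD') := '0.01';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'NB_DEMO',                        -- hypothetical
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD');
END;
/
```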
52.5.11 DBMS_DATA_MINING — Algorithm Settings: Neural Network
The settings listed in the following table configure the behavior of the Neural Network algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.NNET_SOLVER_ADAM
. Alternatively, you can specify the
corresponding string value from the String Value Equivalent column without
the DBMS_DATA_MINING
prefix, in single quotes. For example,
'NNET_SOLVER_ADAM'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-22 DBMS_DATA_MINING Neural Network Settings
Setting Name  Constant Value  String Value Equivalents  Description 


One of the following strings:


Specifies the method of optimization. The default value is system-determined.



Uses the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization method. 


One or more of the following strings:


Specifies the activation functions for the hidden layers. You can specify a single activation function, which is then applied to each hidden layer, or you can specify an activation function for each layer individually. Different layers can have different activation functions. To apply a different activation function to one or more
of the layers, you must specify an activation function for each
layer. The number of activation functions you specify must be
consistent with the For example, if you have three hidden layers, you could specify the use of the same activation function for all three layers with the following settings value:
The following settings value specifies a different activation function for each layer:
Note: You specify the different activation functions as strings within a single string. All quotes are single and two single quotes are used to escape a single quote in SQL statements and PL/SQL blocks.
The default value is



Uses the bipolar sigmoid activation function. 



Uses the linear activation function. 



Uses the logistic sigmoid activation function. 



Uses the rectified linear unit activation function. 



Uses the hyperbolic tangent activation function. 

NNET_HELDASIDE_MAX_FAIL 
A positive integer 
A positive integer 
With The default value is 



Defines the held ratio for the held-aside method. The default value is 

A positive integer 
A positive integer 
Defines the topology by the number of hidden layers. The default value is 

A positive integer 
A positive integer 
Specifies the maximum number of iterations in the Neural Network algorithm. For the For the 

A positive integer or a list of positive integers 
A positive integer or a list of positive integers 
Defines the topology by the number of nodes per layer. Different layers can have different numbers of nodes. To specify the same number of nodes for each layer, you can provide a single value, which is then applied to each layer. To specify a different number of nodes for one or more
layers, provide a list of commaseparated positive integers, one for
each layer. For example, The default number of nodes per layer is the number of
attributes or 



Defines the L2 regularization parameter lambda. This cannot
be set together with
The default value is 

One of the following strings:


Regularization setting for Neural Network algorithm.



Applies L2 regularization, which penalizes the sum of squared weights. 



Disables regularization. If the total number of training rows is less
than or equal to 50000, the default is





Defines the convergence tolerance setting of the Neural Network algorithm. The default value is 

A real number 
A real number 
The setting specifies the lower bound of the region
where weights are randomly initialized.
NNET_WEIGHT_LOWER_BOUND and
NNET_WEIGHT_UPPER_BOUND must be set together.
Setting one and not setting the other raises an error.
NNET_WEIGHT_LOWER_BOUND must not be greater
than NNET_WEIGHT_UPPER_BOUND . The default value is
–sqrt(6/(l_nodes+r_nodes)) . The value of
l_nodes for:
The value of 

A real number 
A real number 
This setting specifies the upper bound of the region
where weights are initialized. It should be set in pairs with
The default value is

Related Topics
See Also:
Oracle Machine Learning for SQL Concepts for information about Neural Network.
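A sketch showing the per-layer activation syntax described above, in particular the doubled single quotes used to escape the quoted function names inside a single settings string; object names are hypothetical:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')          := 'ALGO_NEURAL_NETWORK';
  v_setlst('NNET_HIDDEN_LAYERS') := '3';
  -- One activation function per hidden layer; each name is itself a quoted
  -- string, so the inner single quotes are doubled to escape them.
  v_setlst('NNET_ACTIVATIONS')   :=
    '''NNET_ACTIVATIONS_BIPOLAR_SIG'', ''NNET_ACTIVATIONS_TANH'', ''NNET_ACTIVATIONS_LOG_SIG''';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'NN_DEMO',                        -- hypothetical
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD');
END;
/
```

The number of activation functions must match the number of hidden layers, as noted in the table above.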
52.5.12 DBMS_DATA_MINING — Algorithm Settings: Non-Negative Matrix Factorization
The settings listed in the following table configure the behavior of the Non-negative Matrix Factorization algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.NMFS_NONNEG_SCORING_ENABLE
. Alternatively, you can
specify the corresponding string value from the String Value Equivalent
column without the DBMS_DATA_MINING
prefix, in single quotes. For
example, 'NMFS_NONNEG_SCORING_ENABLE'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
You can query the data dictionary view
*_MINING_MODEL_SETTINGS
(using the ALL
,
USER
, or DBA
prefix) to find the setting
values for a model. See Oracle Database Reference for information about
*_MINING_MODEL_SETTINGS
.
Table 52-23 NMF Settings
Setting Name  Constant Value  String Value Equivalent  Description 




Convergence tolerance for the NMF algorithm. Default is 



Whether negative numbers should be allowed in scoring results. When set to Default is



When set to 




Number of iterations for the NMF algorithm. Default is 



Random seed for NMF algorithm. Default is 
See Also:
Oracle Machine Learning for SQL Concepts for information about NMF
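A minimal NMF sketch (object names hypothetical, seed value arbitrary) that forces non-negative scoring results:

```sql
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME')           := 'ALGO_NONNEGATIVE_MATRIX_FACTOR';
  v_setlst('NMFS_NONNEG_SCORING') := 'NMFS_NONNEG_SCORING_ENABLE';
  v_setlst('NMFS_RANDOM_SEED')    := '1';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'NMF_DEMO',                       -- hypothetical
    mining_function     => 'FEATURE_EXTRACTION',
    data_query          => 'SELECT * FROM demo_train_data',  -- hypothetical
    set_list            => v_setlst,
    case_id_column_name => 'CUST_ID');
END;
/
```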
52.5.13 DBMS_DATA_MINING — Algorithm Settings: O-Cluster
The settings in the table configure the behavior of the O-Cluster algorithm.
The Constant Value column specifies constants using the prefix
DBMS_DATA_MINING
. For example,
DBMS_DATA_MINING.OCLT_SENSITIVITY
. Alternatively, you can specify the
corresponding string value from the String Value Equivalent column without
the DBMS_DATA_MINING
prefix, in single quotes. For example,
'OCLT_SENSITIVITY'
.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-24 O-Cluster Settings
Setting Name  Constant Value  String Value Equivalent  Description 




A fraction that specifies the peak density required for separating a new cluster. The fraction is related to the global uniform density. Default is 
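As a hedged illustration, a clustering model using the OCluster algorithm with an explicit sensitivity value might be built as follows. The settings table, model name, and data view are hypothetical; 0.3 is simply an example value in the valid range.

```sql
-- Hypothetical sketch: build an OCluster model with an
-- explicit sensitivity setting (string value equivalents shown).
CREATE TABLE oc_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO oc_settings VALUES ('ALGO_NAME', 'ALGO_O_CLUSTER');
  INSERT INTO oc_settings VALUES ('OCLT_SENSITIVITY', '0.3');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'oc_sample',
    mining_function     => DBMS_DATA_MINING.CLUSTERING,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    settings_table_name => 'oc_settings');
END;
/
```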
See Also:
Oracle Machine Learning for SQL Concepts for information about OCluster
52.5.14 DBMS_DATA_MINING — Algorithm Settings: Random Forest
These settings configure the behavior of the Random Forest algorithm. Random Forest makes use of the Decision Tree settings to configure the construction of individual trees.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.RFOR_MTRY. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'RFOR_MTRY'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-25 Random Forest Settings
Setting Name  Constant Value  String Value Equivalent  Description 




Size of the random subset of columns to be considered
when choosing a split at a node. For each node, the size of the pool
remains the same, but the specific candidate columns change. The
default is half of the columns in the model signature. The special
value 



Number of trees in the forest. Default is 



Fraction of the training data to be randomly sampled for use in the construction of an individual tree. The default is half of the number of rows in the training data. 
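For illustration, the settings above can be combined in a settings table when building a Random Forest classifier. The table, model, column, and view names below are hypothetical sample-schema-style names, and the setting values are example overrides, not recommendations.

```sql
-- Hypothetical sketch: build a Random Forest classifier,
-- overriding the number of trees and the sampling ratio.
CREATE TABLE rf_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO rf_settings VALUES ('ALGO_NAME', 'ALGO_RANDOM_FOREST');
  INSERT INTO rf_settings VALUES ('RFOR_NUM_TREES', '50');
  INSERT INTO rf_settings VALUES ('RFOR_SAMPLING_RATIO', '0.6');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'rf_sample',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'rf_settings');
END;
/
```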
Related Topics
See Also:
Oracle Machine Learning for SQL Concepts for information about Random Forest
52.5.15 DBMS_DATA_MINING — Algorithm Constants and Settings: Singular Value Decomposition
The following constant affects the behavior of the Singular Value Decomposition algorithm.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.SVDS_MAX_NUM_FEATURES. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'SVDS_MAX_NUM_FEATURES'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-26 Singular Value Decomposition Constant
Constant Name  Constant Value  Description 


2500 
The maximum number of features supported by SVD. 
The following settings configure the behavior of the Singular Value Decomposition algorithm.
Table 52-27 Singular Value Decomposition Settings
Setting Name  Constant Value  String Value Equivalent  Description 




Specifies whether to persist the U Matrix produced by SVD.
When Default is 


Does not persist the U Matrix. 




Whether to use SVD or PCA scoring for the model. When the build data is scored with SVD, the projections will be the same as the U matrix. Default is 


When the build data is scored with PCA, the projections will be the product of the U and S matrices. 




This setting indicates the solver to be used for computing the SVD of the data. In the case of PCA, the solver setting indicates the type of SVD solver used to compute the PCA for the data. When this setting is not specified, the solver type selection is data-driven: if the number of attributes is greater than 3240, then the default wide solver is used; otherwise, the default narrow solver is selected. The solvers fall into the following groups:
For narrow data solvers:
For wide data solvers:



Tall-Skinny SVD using eigenvalue computation for matrices with up to 11500 attributes. This is the default solver for narrow data. 



Stochastic SVD using QR computation for matrices with up to 1 million attributes. This is the default solver for wide data. 



Stochastic SVD using eigenvalue computations for matrices with up to 1 million attributes.  

Range [ 
Range [ 
This setting is used to prune features. It defines the minimum value, expressed as a share of the first eigenvalue, that the eigenvalue of a feature must have in order not to be pruned. The default value is data-driven. 

Range [ 
Range [ 
The random seed value is used for initializing the
sampling matrix used by the Stochastic SVD solver. The default is


Range [ 
Range [ 
This setting configures the number of columns in the
sampling matrix used by the Stochastic SVD solver. The number of
columns in this matrix is equal to the requested number of features
plus the oversampling setting. The SVD Solver must be set to


Range [ 
Range [ 
The power iteration setting improves the accuracy of the
SSVD solver. The default is 
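To illustrate the scoring-mode and U-matrix settings above, the following hedged sketch builds a PCA model through the SVD algorithm, using the string value equivalents. The settings table, model name, and data view are hypothetical.

```sql
-- Hypothetical sketch: request PCA scoring from the SVD
-- algorithm and persist the U matrix.
CREATE TABLE svd_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO svd_settings VALUES
    ('ALGO_NAME', 'ALGO_SINGULAR_VALUE_DECOMP');
  INSERT INTO svd_settings VALUES
    ('SVDS_SCORING_MODE', 'SVDS_SCORING_PCA');
  INSERT INTO svd_settings VALUES
    ('SVDS_U_MATRIX_OUTPUT', 'SVDS_U_MATRIX_ENABLE');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'pca_sample',
    mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    settings_table_name => 'svd_settings');
END;
/
```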
See Also:
52.5.16 DBMS_DATA_MINING — Algorithm Settings: Support Vector Machine
The settings listed in the following table configure the behavior of the Support Vector Machine algorithm.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.SVMS_GAUSSIAN. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'SVMS_GAUSSIAN'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-28 SVM Settings
Setting Name  Constant Value  String Value Equivalent  Description 




Regularization setting that balances the complexity of the model against model robustness to achieve good generalization on new data. SVM uses a data-driven approach to finding the complexity factor. Value of the complexity factor for the SVM algorithm (both classification and regression). The default value is estimated from the data by the algorithm. 



Convergence tolerance for SVM algorithm. Default is 



Regularization setting for regression, similar to complexity factor. Epsilon specifies the allowable residuals, or noise, in the data. Value of epsilon factor for SVM regression. Default is 



Kernel for Support Vector Machine: Linear or Gaussian. The default value is SVMS_LINEAR.



Uses the Linear kernel for SVM. This is the default option. 




The desired rate of outliers in the training data. Valid for One-Class SVM models only (anomaly detection). Default is 



Controls the spread of the Gaussian kernel function. SVM uses a data-driven approach to find a standard deviation value that is on the same scale as distances between typical cases. Value of the standard deviation for the SVM algorithm. This is applicable only for the Gaussian kernel. The default value is estimated from the data by the algorithm. 

Positive integer 
Positive integer 
This setting sets an upper limit on the number of SVM iterations. The default is system determined because it depends on the SVM solver. 

Range [ 
Range [ 
This setting sets an upper limit on the number of pivots
used in the Incomplete Cholesky decomposition. It can be set only
for nonlinear kernels. The default value is 

Positive integer 
Positive integer 
This setting applies to SVM models with a linear kernel. It sets the size of the batch for the SGD solver. An input of 0 triggers a data-driven batch size estimate. The default is




This setting controls the type of regularization that the SGD SVM solver uses. The setting can be used only for linear SVM models. The default is system determined because it depends on the potential model size.



Uses L2 regularization. 




Enables you to choose the SVM solver. The SGD solver cannot be selected if the kernel is nonlinear. The default value is system determined.



Uses Interior Point Method solver. 
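As a hedged illustration of the kernel setting above, the following sketch builds an SVM classifier with the Gaussian kernel; the complexity factor and standard deviation are left to their data-driven defaults. All object names are hypothetical.

```sql
-- Hypothetical sketch: build an SVM classifier with the
-- Gaussian kernel (string value equivalents shown).
CREATE TABLE svm_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO svm_settings VALUES
    ('ALGO_NAME', 'ALGO_SUPPORT_VECTOR_MACHINES');
  INSERT INTO svm_settings VALUES
    ('SVMS_KERNEL_FUNCTION', 'SVMS_GAUSSIAN');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'svm_sample',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'svm_settings');
END;
/
```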
See Also:
Oracle Machine Learning for SQL Concepts for information about SVM
52.5.17 DBMS_DATA_MINING — Algorithm Settings: XGBoost
Settings that configure the behavior of the XGBoost gradient boosting algorithm.
The Constant Value column specifies constants using the prefix DBMS_DATA_MINING. For example, DBMS_DATA_MINING.xgboost_booster. Alternatively, you can specify the corresponding string value from the String Value Equivalent column without the DBMS_DATA_MINING prefix, in single quotes. For example, 'booster'.
Note:
The distinction between Constant Value and String Value Equivalent for this algorithm is applicable to Oracle Database 19c and Oracle Database 21c.
Table 52-29 General Settings
Constant Name  String Value Equivalent  Setting Value  Description 



A string that is one of the following:

The booster to use:
The default value is 


A nonnegative integer. 
The number of rounds for boosting. The default value is 
Table 52-30 Settings for Tree Boosting
Constant Name  String Value Equivalent  Setting Value  Description 


alpha 
A nonnegative number 
L1 regularization term on weights. Increasing this value makes the model more conservative. The default value is 


A number in the range (0, 1] 
Subsample ratio of columns for each split, in each
level. Subsampling occurs each time a new split is made. This
parameter has no effect when The default value is 


A number in the range (0, 1] 
The subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level. The default value is 


A number in the range (0, 1] 
Subsample ratio of columns when constructing each tree. Subsampling occurs once in every boosting iteration. The default value is 


A number in the range [0, 1] 
Step size shrinkage used in the update step to
prevent overfitting. After each boosting step,
The default value is 


A number in the range [0, ∞] 
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the gamma value is, the more conservative the algorithm is. The default value is 

grow_policy 
A string; one of the following:

Controls the way new nodes are added to the tree:
Valid only if The default value is 


A nonnegative number 
L2 regularization term on weights. The default value is 


A nonnegative integer 
Maximum number of discrete bins to bucket continuous features. Increasing this number improves the optimality of splits at the cost of higher computation time. This parameter is valid only when
The default value is 


A number in the range [0, ∞] 
Maximum delta step allowed for each leaf output. Setting this to a positive value can help make the update step more conservative. Usually this parameter is not needed, but it might help in logistic regression when the class is extremely imbalanced. Setting it to value from 1 to 10 might help control the update. The default value is 


An integer in the range [0, ∞] 
Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. Setting this to 0 indicates no limit. Note: You must set a max_depth limit when the grow_policy setting is depthwise.
The default value is 


A nonnegative number 
Maximum number of nodes to add. Use this setting only when
The default value is 


A number in the range [0, ∞] 
Minimum sum of instance weight (hessian) needed in a
child. If the tree partition step results in a leaf node with a
sum of instance weight less than
The default value is 


A nonnegative integer 
Number of parallel trees constructed during each iteration. Use this option to support a boosted random forest. The default value is 


A nonnegative number 
Controls the balance of positive and negative
weights, which is useful for unbalanced classes. A typical value
to consider: The default value is 


A number in the range (0, 1] 
Increases enumeration accuracy. Valid only for the approximate greedy tree method. Compared to directly selecting the number of bins, this setting comes with a theoretical guarantee with sketch accuracy. You usually do not need to change this setting, but you might consider setting a lower number for more accurate enumeration. The default value is 


A number in the range (0, 1] 
Subsample ratio of the training instances. A setting of 0.5 means that XGBoost randomly samples half of the training data prior to growing trees, which prevents overfitting. Subsampling occurs once in every boosting iteration. The default value is 


A string that is one of the following:

Tree construction algorithm used in XGBoost:
The default value is 


A commaseparated string; one or more of the following:

Defines the sequence of tree updaters to run, which provides a modular way to construct and to modify the trees. This is an advanced parameter that is usually set automatically, depending on some other parameters. However, you can also explicitly specify a setting. The setting values are:

Table 52-31 Settings for the Dart Booster
Constant Name  String Value Equivalent  Setting Value  Description 



A number that is 0 or 1 
When set to 1, at least one tree is always dropped during the dropout. When set to 0, a tree is not necessarily dropped during the dropout. The default value is 


A string; either:

Type of normalization algorithm:
The default value is 


A number in the range [0.0, 1.0] 
Dropout rate (a fraction of the previous trees to drop during the dropout). The default value is 


A string; either:

Type of sampling algorithm:
The default value is 


A number in the range [0.0, 1.0] 
Probability of skipping the dropout procedure during
a boosting iteration. If a dropout is skipped, new trees are
added in the same manner as A nonzero The default value is 
Table 52-32 Settings for the Linear Booster
Constant Name  String Value Equivalent  Setting Value  Description 



A nonnegative number 
L1 regularization term on weights, normalized to the number of training examples. Increasing this value makes the model more conservative. The default value is 


A string that is one of the following:

Feature selection and ordering method:
The default value is 


A nonnegative number 
L2 regularization term on weights, normalized to the number of training examples. Increasing this value makes the model more conservative. The default value is 


A nonnegative integer 
Number of top features to select for the
The default value is 


A string that is one of the following:

Algorithm to fit the linear model:
The default value is 
Table 52-33 Settings for Tweedie Regression
Constant Name  String Value Equivalent  Setting Value  Description 



A number in the range (1, 2) 
Controls the variance of the Tweedie distribution. A setting closer to 1 shifts towards a Poisson distribution. A setting closer to 2 shifts towards a gamma distribution. The default value is 
Some XGBoost objectives apply only to classification function models and other objectives apply only to regression function models. If you specify an incompatible objective value, an error is raised. In the DBMS_DATA_MINING.CREATE_MODEL procedure, if you specify DBMS_DATA_MINING.CLASSIFICATION as the function, then the only objective values that you can use are the binary and multi values. The one exception is binary:logitraw, which produces a continuous value and applies only to a regression model. If you specify DBMS_DATA_MINING.REGRESSION as the function, then you can specify binary:logitraw or any of the count, rank, reg, and survival values as the objective.
Table 52-34 Settings for Learning Tasks
Constant Name  String Value Equivalent  Setting Value  Description 



For a classification model, a string that is one of the following:
For a regression model, a string that is one of the following:

Settings for a Classification model:
The default Settings for a Regression model:
The default 


A number 
Initial prediction score of all instances, global bias. For a sufficient number of iterations, changing this value does not have much effect. The default value is 


A commaseparated string; one or more of the following:

Evaluation metrics for validation data. You can specify one or more of these evaluation metrics:
A default metric is assigned according to the objective:



A nonnegative integer 
Random number seed. The default value is 
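For illustration, XGBoost settings can be passed by their string names through a settings list. The following hedged sketch uses the CREATE_MODEL2 procedure with hypothetical object names and example setting values (not recommendations):

```sql
-- Hypothetical sketch: build an XGBoost classifier with
-- CREATE_MODEL2, passing XGBoost settings by string name.
DECLARE
  v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlst('ALGO_NAME') := 'ALGO_XGBOOST';
  v_setlst('booster')   := 'gbtree';
  v_setlst('max_depth') := '6';
  v_setlst('eta')       := '0.3';
  v_setlst('num_round') := '100';

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'xgb_sample',
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM mining_data_build_v',
    set_list            => v_setlst,
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card');
END;
/
```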
See Also:
https://github.com/oracle/oracle-db-examples/tree/master/machine-learning/sql/20c GitHub repository for an example of XGBoost.
52.6 DBMS_DATA_MINING — Solver Settings
Oracle Machine Learning for SQL algorithms can use different solvers. Solver settings can be provided at build time in the settings table.
52.6.1 DBMS_DATA_MINING — Solver Settings: Adam
These settings configure the behavior of the Adaptive Moment Estimation (Adam) solver.
Neural Network models use these settings.
Table 52-35 DBMS_DATA_MINING Adam Settings
Setting Name  Setting Value  Description 


A nonnegative double precision floating point number in the interval (0, 1] 
The learning rate for Adam. The default value is 

A positive integer 
The number of rows per batch. The default value is 

A positive double precision floating point number in the interval [0, 1) 
The exponential decay rate for the 1st moment estimates. The default value is 

A positive double precision floating point number in the interval [0, 1) 
The exponential decay rate for the 2nd moment estimates. The default value is 

A positive double precision floating point number 
The gradient infinity norm tolerance for Adam. The default value is 
52.6.2 DBMS_DATA_MINING — Solver Settings: ADMM
The settings listed in the following table configure the behavior of Alternating Direction Method of Multipliers (ADMM). The Generalized Linear Model (GLM) algorithm uses these settings.
Table 52-36 DBMS_DATA_MINING ADMM Settings
Related Topics
See Also:
Oracle Machine Learning for SQL Concepts for information about neural network
52.6.3 DBMS_DATA_MINING — Solver Settings: LBFGS
The settings listed in the following table configure the behavior of the LBFGS solver. Neural Network and Generalized Linear Model (GLM) use these settings.
Table 52-37 DBMS_DATA_MINING LBFGS Settings
Setting Name  Setting Value  Description 



Defines the gradient infinity norm tolerance for LBFGS. Default value is 

The value must be a positive integer. 
Defines the number of historical copies kept by the LBFGS solver. The default value is 


Defines whether to scale the Hessian in LBFGS. Default value is 
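As a hedged illustration, the LBFGS settings can be supplied alongside the algorithm setting when building a Neural Network model. The object names and setting values below are hypothetical examples:

```sql
-- Hypothetical sketch: build a Neural Network model and
-- tune the LBFGS solver through its settings.
CREATE TABLE nnet_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000));

BEGIN
  INSERT INTO nnet_settings VALUES
    ('ALGO_NAME', 'ALGO_NEURAL_NETWORK');
  INSERT INTO nnet_settings VALUES
    ('LBFGS_HISTORY_DEPTH', '20');
  INSERT INTO nnet_settings VALUES
    ('LBFGS_SCALE_HESSIAN', 'LBFGS_SCALE_HESSIAN_ENABLE');
  COMMIT;

  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'nnet_sample',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'nnet_settings');
END;
/
```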
See Also:
Oracle Machine Learning for SQL Concepts for information about neural network
52.7 DBMS_DATA_MINING Datatypes
The DBMS_DATA_MINING package defines object data types for processing transactional data. The package also defines a type for user-specified transformations. These types are called DM_NESTED_n, where n identifies the Oracle data type of the nested attributes.
The Oracle Machine Learning for SQL object data types are described in the following table:
Table 52-38 DBMS_DATA_MINING Summary of Data Types
Datatype  Description 


The name and value of a numerical attribute of type 

A collection of 

The name and value of a numerical attribute of type 

A collection of 

The name and value of a categorical attribute of type 

A collection of 

The name and value of a numerical attribute of type 

A collection of 

A table of 

A list of userspecified transformations for a model. Accepted as a parameter by the CREATE_MODEL Procedure. This collection type is defined in the DBMS_DATA_MINING_TRANSFORM package. 
For more information about processing nested data, see Oracle Machine Learning for SQL User’s Guide.
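For illustration, a query can assemble relational columns into nested attributes with the DM_NESTED_NUMERICAL constructor. The view mining_data_build_v and its columns are hypothetical sample-schema-style names:

```sql
-- Hypothetical sketch: pack two numeric columns into a
-- DM_NESTED_NUMERICALS collection in a query.
SELECT cust_id,
       DM_NESTED_NUMERICALS(
         DM_NESTED_NUMERICAL('AGE', age),
         DM_NESTED_NUMERICAL('YRS_RESIDENCE', yrs_residence)) AS num_attrs
  FROM mining_data_build_v;
```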
Note:
Starting from Oracle Database 12c Release 2, the *GET_MODEL_DETAILS functions are deprecated and are replaced with model detail views. See Oracle Machine Learning for SQL User’s Guide.
52.7.1 Deprecated Types
This topic contains tables listing deprecated types.
The DBMS_DATA_MINING package defines object datatypes for storing information about model attributes. Most of these types are returned by the table functions GET_n, where n identifies the type of information to return. These functions take a model name as input and return the requested information as a collection of rows.
For a list of the GET
functions, see "Summary of DBMS_DATA_MINING Subprograms".
All the table functions use pipelining, which causes each row of output to be materialized as it is read from model storage, without waiting for the generation of the complete table object. For more information on pipelined, parallel table functions, consult the Oracle Database PL/SQL Language Reference.
Table 52-39 DBMS_DATA_MINING Summary of Deprecated Datatypes
Datatype  Description 


The centroid of a cluster. 

A collection of 

A child node of a cluster. 

A collection of 

A cluster. A cluster includes See also, Table 52-41. 

A collection of See also, Table 52-41. 

The conditional probability of an attribute in a Naive Bayes model. 

A collection of 

The actual and predicted values in a cost matrix. 

A collection of 

A component of an Expectation Maximization model. 

A collection of 

A projection of an Expectation Maximization model. 

A collection of 

The coefficient and associated statistics of an attribute in a Generalized Linear Model. 

A collection of 

A histogram associated with a cluster. 

A collection of See also, Table 52-41. 

An item in an association rule. 

A collection of 

A collection of 

A collection of 

Highlevel statistics about a model. 

A collection of 

Information about an attribute in a Naive Bayes model. 

A collection of 

An attribute in a feature of a NonNegative Matrix Factorization model. 

A collection of 

A feature in a NonNegative Matrix Factorization model. 

A collection of 

Antecedent and consequent in a rule. 

A collection of See also, Table 52-41. 

An attribute ranked by its importance in an Attribute Importance model. 

A collection of 

A rule that defines a conditional relationship. The rule can be one of the association rules returned by GET_ASSOCIATION_RULES Function, or it can be a rule associated with a cluster in the collection of clusters returned by GET_MODEL_DETAILS_KM Function and GET_MODEL_DETAILS_OC Function. See also, Table 52-41. 

A collection of See also, Table 52-41. 

A factorized matrix S, V, or U returned by a Singular Value Decomposition model. 

A collection of 

The name, value, and coefficient of an attribute in a Support Vector Machine model. 

A collection of 

The linear coefficient of each attribute in a Support Vector Machine model. 

A collection of 

The transformation and reverse transformation expressions for an attribute. 

A collection of 
Return Values for Clustering Algorithms
The table contains description of DM_CLUSTER
return value columns, nested table columns, and rows.
Table 52-40 DM_CLUSTER Return Values for Clustering Algorithms
Return Value  Description 


A set of rows of type:
(id NUMBER,
 cluster_id VARCHAR2(4000),
 record_count NUMBER,
 parent NUMBER,
 tree_level NUMBER,
 dispersion NUMBER,
 split_predicate DM_PREDICATES,
 child DM_CHILDREN,
 centroid DM_CENTROIDS,
 histogram DM_HISTOGRAMS,
 rule DM_RULE)
DM_PREDICATE 
The (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2)/*=,<>,<,>,<=,>=*/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER) 
DM_CLUSTER Fields
The following table describes DM_CLUSTER
fields.
Table 52-41 DM_CLUSTER Fields
Column Name  Description 


Cluster identifier 

The ID of a cluster in the model 

Specifies the number of records 

Parent ID 

Specifies the number of splits from the root 

A measure used to quantify whether a set of observed occurrences is dispersed compared to a standard statistical model. 

The (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), conditional_operator CHAR(2) /*=,<>,<,>,<=,>=*/, attribute_num_value NUMBER, attribute_str_value VARCHAR2(4000), attribute_support NUMBER, attribute_confidence NUMBER) Note: The Expectation Maximization algorithm uses all the fields except 

The 

The (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), mean NUMBER, mode_value VARCHAR2(4000), variance NUMBER) 

The (attribute_name VARCHAR2(4000), attribute_subname VARCHAR2(4000), bin_id NUMBER, lower_bound NUMBER, upper_bound NUMBER, label VARCHAR2(4000), count NUMBER) 

The (rule_id INTEGER, antecedent DM_PREDICATES, consequent DM_PREDICATES, rule_support NUMBER, rule_confidence NUMBER, rule_lift NUMBER, antecedent_support NUMBER, consequent_support NUMBER, number_of_items INTEGER) 
Usage Notes

The table function pipes out rows of type DM_CLUSTER. For information on Oracle Machine Learning for SQL data types and piped output from table functions, see "Data Types".
For descriptions of predicates (DM_PREDICATE) and rules (DM_RULE), see GET_ASSOCIATION_RULES Function.
52.8 Summary of DBMS_DATA_MINING Subprograms
This table summarizes the subprograms included in the DBMS_DATA_MINING package.
The GET_* interfaces are replaced by model views. Oracle recommends that users leverage model detail views instead. For more information, refer to Model Detail Views in Oracle Machine Learning for SQL User’s Guide and Static Data Dictionary Views: ALL_ALL_TABLES to ALL_OUTLINES in Oracle Database Reference.
Table 52-42 DBMS_DATA_MINING Package Subprograms
Subprogram  Purpose 

Adds a cost matrix to a classification model 

ADD_PARTITION Procedure 
Adds single or multiple partitions in an existing partition model 
Changes the reverse transformation expression to an expression that you specify 

Applies a model to a data set (scores the data) 

Computes the confusion matrix for a classification model 

COMPUTE_CONFUSION_MATRIX_PART Procedure 
Computes the evaluation matrix for partitioned models 
Computes lift for a classification model 

COMPUTE_LIFT_PART Procedure 
Computes lift for partitioned models 
Computes Receiver Operating Characteristic (ROC) for a classification model 

COMPUTE_ROC_PART Procedure 
Computes Receiver Operating Characteristic (ROC) for a partitioned model 
Creates a model 

CREATE_MODEL2 Procedure 
Creates a model without extra persistent stages 
Create Model Using Registration Information 
Fetches setting information from JSON object 
DROP_ALGORITHM Procedure 
Drops the registered algorithm information. 
DROP_PARTITION Procedure 
Drops a single partition 
Drops a model 

Exports a model to a dump file 

Exports a model in a serialized format 

Fetches and reads JSON schema from 

Returns the cost matrix for a model 

Imports a model into a user schema 

Imports a serialized model back into the database 

Displays flexibility in creating JSON schema for R Extensible 

Registers a new algorithm 

Ranks the predictions from the 

Removes a cost matrix from a model 

Renames a model 
Deprecated GET_MODEL_DETAILS
Starting from Oracle Database 12c Release 2, the following GET_MODEL_DETAILS functions are deprecated:
Table 52-43 Deprecated GET_MODEL_DETAILS Functions
Subprogram  Purpose 

Returns the rules from an association model 

Returns the frequent itemsets for an association model 

Returns details about an attribute importance model 

Returns details about an Expectation Maximization model 

Returns details about the parameters of an Expectation Maximization model 

Returns details about the projects of an Expectation Maximization model 

Returns details about a Generalized Linear Model model 

Returns highlevel statistics about a model 

Returns details about a kMeans model 

Returns details about a Naive Bayes model 

Returns details about a NonNegative Matrix Factorization model 

Returns details about an OCluster model 

Returns the settings used to build the given model This function is replaced with 

Returns the list of columns from the build input table This function is replaced with 

Returns details about a Singular Value Decomposition model 

Returns details about a Support Vector Machine model with a linear kernel 

Returns the transformations embedded in a model This function is replaced with 

Returns details about a Decision Tree model 

Converts between two different transformation specification formats 
52.8.1 ADD_COST_MATRIX Procedure
The ADD_COST_MATRIX procedure associates a cost matrix table with a classification model. The cost matrix biases the model by assigning costs or benefits to specific model outcomes.
The cost matrix is stored with the model and taken into account when the model is scored.
You can also specify a cost matrix inline when you invoke an Oracle Machine Learning for SQL function for scoring. To view the scoring matrix for a model, query the DM$VC
prefixed model view. Refer to Model Detail View for Classification Algorithm.
To obtain the default scoring matrix for a model, query the DM$VC
prefixed model view. To remove the default scoring matrix from a model, use the REMOVE_COST_MATRIX
procedure. See REMOVE_COST_MATRIX Procedure.
See Also:

"Biasing a Classification Model" in Oracle Machine Learning for SQL Concepts for more information about costs

Oracle Database SQL Language Reference for syntax of inline cost matrix

Specifying Costs in Oracle Machine Learning for SQL User’s Guide
Syntax
DBMS_DATA_MINING.ADD_COST_MATRIX (
      model_name               IN VARCHAR2,
      cost_matrix_table_name   IN VARCHAR2,
      cost_matrix_schema_name  IN VARCHAR2 DEFAULT NULL,
      partition_name           IN VARCHAR2 DEFAULT NULL);
Parameters
Table 52-44 ADD_COST_MATRIX Procedure Parameters
Parameter  Description 


Name of the model in the form [schema_name.]model_name. If you do not specify a schema, then your own schema is assumed. 

Name of the cost matrix table (described in Table 52-45). 

Schema of the cost matrix table. If no schema is specified, then the current schema is used. 

Name of the partition in a partitioned model 
Usage Notes

If the model is not in your schema, then ADD_COST_MATRIX requires the ALTER ANY MINING MODEL system privilege or the ALTER object privilege for the machine learning model.
The cost matrix table must have the columns shown in Table 52-45.
Table 52-45 Required Columns in a Cost Matrix Table
Column Name  Data Type 
ACTUAL_TARGET_VALUE  Valid target data type 
PREDICTED_TARGET_VALUE  Valid target data type 
COST  NUMBER, FLOAT, BINARY_DOUBLE, or BINARY_FLOAT 
See Also:
Oracle Machine Learning for SQL User’s Guide for valid target data types

The types of the actual and predicted target values must be the same as the type of the model target. For example, if the target of the model is BINARY_DOUBLE, then the actual and predicted values must be BINARY_DOUBLE. If the actual and predicted values are CHAR or VARCHAR, then ADD_COST_MATRIX treats them as VARCHAR2 internally.
If the types do not match, or if the actual or predicted value is not a valid target value, then the ADD_COST_MATRIX procedure raises an error.
Note:
If a reverse transformation is associated with the target, then the actual and predicted values must be consistent with the target after the reverse transformation has been applied.
See “Reverse Transformations and Model Transparency” under the “About Transformation Lists” section in DBMS_DATA_MINING_TRANSFORM Operational Notes for more information.

Since a benefit can be viewed as a negative cost, you can specify a benefit for a given outcome by providing a negative number in the costs column of the cost matrix table.
All classification algorithms can use a cost matrix for scoring. The Decision Tree algorithm can also use a cost matrix at build time. If you want to build a Decision Tree model with a cost matrix, specify the cost matrix table name in the CLAS_COST_TABLE_NAME setting in the settings table for the model. See Table 52-7.
The cost matrix used to create a Decision Tree model becomes the default scoring matrix for the model. If you want to specify different costs for scoring, use the REMOVE_COST_MATRIX procedure to remove the cost matrix and the ADD_COST_MATRIX procedure to add a new one.
Scoring on a partitioned model is partition-specific. Scoring cost matrices can be added to or removed from an individual partition in a partitioned model. If PARTITION_NAME is NOT NULL, then the model must be a partitioned model. The COST_MATRIX is added to that partition of the partitioned model.
If the PARTITION_NAME is NULL, but the model is a partitioned model, then the COST_MATRIX table is added to every partition in the model.
Example
This example creates a cost matrix table called COSTS_NB and adds it to a Naive Bayes model called NB_SH_CLAS_SAMPLE. The model has a binary target: 1 means that the customer responds to a promotion; 0 means that the customer does not respond. The cost matrix assigns a cost of .25 to misclassifications of customers who do not respond and a cost of .75 to misclassifications of customers who do respond. This means that it is three times more costly to misclassify responders than it is to misclassify non-responders.
CREATE TABLE costs_nb (
  actual_target_value    NUMBER,
  predicted_target_value NUMBER,
  cost                   NUMBER);
INSERT INTO costs_nb VALUES (0, 0, 0);
INSERT INTO costs_nb VALUES (0, 1, .25);
INSERT INTO costs_nb VALUES (1, 0, .75);
INSERT INTO costs_nb VALUES (1, 1, 0);
COMMIT;

EXEC dbms_data_mining.add_cost_matrix('nb_sh_clas_sample', 'costs_nb');

SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(nb_sh_clas_sample COST MODEL
                  USING cust_marital_status, education, household_size) = 1
 GROUP BY cust_gender
 ORDER BY cust_gender;

C        CNT    AVG_AGE
- ---------- ----------
F         72         39
M        555         44
52.8.2 ADD_PARTITION Procedure
The ADD_PARTITION procedure adds one or more partitions to an existing partitioned model.
The ADD_PARTITION procedure derives build settings and user-defined expressions from the existing model. The target column must exist in the input data query when adding partitions to a supervised model.
Syntax
DBMS_DATA_MINING.ADD_PARTITION (
      model_name   IN VARCHAR2,
      data_query   IN CLOB,
      add_options  IN VARCHAR2 DEFAULT ERROR);
Parameters
Table 52-46 ADD_PARTITION Procedure Parameters
Parameter  Description 

model_name 
Name of the model in the form [schema_name.]model_name. If you do not specify a schema, then your own schema is used. 
data_query 
An arbitrary SQL statement that provides data to the model build. The user must have privilege to evaluate this query. 
add_options 
Allows users to control the conditional behavior of ADD for cases where rows in the input dataset conflict with existing partitions in the model. The following are the possible values:
REPLACE: Replaces the existing partition for which the conflicting keys are found.
ERROR: Terminates the procedure without adding any partitions.
IGNORE: Eliminates the rows having the conflicting keys.
Note: For better performance, Oracle recommends using DROP_PARTITION followed by ADD_PARTITION instead of using the REPLACE option.
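This section has no worked example, so here is a minimal sketch assuming a partitioned model named part_clas_model and a staging table new_region_data that holds rows for new partition-key values (both names are hypothetical):

```sql
-- Add partitions for any new partition-key values found in the query.
-- With the default ERROR option, rows whose keys conflict with
-- existing partitions terminate the procedure without adding anything.
BEGIN
  DBMS_DATA_MINING.ADD_PARTITION(
    model_name  => 'part_clas_model',            -- hypothetical model
    data_query  => 'SELECT * FROM new_region_data',
    add_options => 'ERROR');
END;
/
```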

52.8.3 ALTER_REVERSE_EXPRESSION Procedure
This procedure replaces a reverse transformation expression with an expression that you specify. If the attribute does not have a reverse expression, the procedure creates one from the specified expression.
You can also use this procedure to customize the output of clustering, feature extraction, and anomaly detection models.
Syntax
DBMS_DATA_MINING.ALTER_REVERSE_EXPRESSION (
      model_name         VARCHAR2,
      expression         CLOB,
      attribute_name     VARCHAR2 DEFAULT NULL,
      attribute_subname  VARCHAR2 DEFAULT NULL);
Parameters
Table 52-47 ALTER_REVERSE_EXPRESSION Procedure Parameters
Parameter  Description 

model_name 
Name of the model in the form [schema_name.]model_name. If you do not specify a schema, your own schema is used. 

expression 
An expression to replace the reverse transformation associated with the attribute. 

attribute_name 
Name of the attribute. Specify NULL if you wish to apply expression to a cluster, feature, or target. 

attribute_subname 
Name of the nested attribute if attribute_name is a nested column; otherwise NULL. 
Usage Notes

For purposes of model transparency, Oracle Machine Learning for SQL provides reverse transformations for transformations that are embedded in a model. Reverse transformations are applied to the attributes returned in model detail views and to the scored target of predictive models.
See Also:
 “About Transformation Lists” under DBMS_DATA_MINING_TRANSFORM Operational Notes
 Model Detail Views in Oracle Machine Learning for SQL User’s Guide

If you alter the reverse transformation for the target of a model that has a cost matrix, you must specify a transformation expression that has the same type as the actual and predicted values in the cost matrix. Also, the reverse transformation that you specify must result in values that are present in the cost matrix.
See Also:
"ADD_COST_MATRIX Procedure" and Oracle Machine Learning for SQL Concepts for information about cost matrixes.

To prevent reverse transformation of an attribute, you can specify NULL for expression.
The reverse transformation expression can contain a reference to a PL/SQL function that returns a valid Oracle data type. For example, you could define a function like the following for a categorical attribute named blood_pressure that has values 'Low', 'Medium' and 'High'.
CREATE OR REPLACE FUNCTION numx(c CHAR)
  RETURN NUMBER IS
BEGIN
  CASE c
    WHEN 'Low'    THEN RETURN 1;
    WHEN 'Medium' THEN RETURN 2;
    WHEN 'High'   THEN RETURN 3;
    ELSE RETURN NULL;
  END CASE;
END numx;
/
Then you could invoke ALTER_REVERSE_EXPRESSION for blood_pressure as follows.
EXEC dbms_data_mining.alter_reverse_expression(
       '<model_name>', 'NUMX(blood_pressure)', 'blood_pressure');

You can use ALTER_REVERSE_EXPRESSION to label clusters produced by clustering models and features produced by feature extraction.
You can use ALTER_REVERSE_EXPRESSION to replace the zeros and ones returned by anomaly detection models. By default, anomaly detection models label anomalous records with 0 and all other records with 1.
See Also:
Oracle Machine Learning for SQL Concepts for information about anomaly detection
Examples

In this example, the target (affinity_card) of the model CLASS_MODEL is manipulated internally as yes or no instead of 1 or 0, but returned as 1s and 0s when scored. The ALTER_REVERSE_EXPRESSION procedure causes the target values to be returned as TRUE or FALSE.
DECLARE
  v_xlst dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  dbms_data_mining_transform.SET_TRANSFORM(v_xlst,
    'affinity_card', NULL,
    'decode(affinity_card, 1, ''yes'', ''no'')',
    'decode(affinity_card, ''yes'', 1, 0)');
  dbms_data_mining.CREATE_MODEL(
    model_name           => 'CLASS_MODEL',
    mining_function      => dbms_data_mining.classification,
    data_table_name      => 'mining_data_build',
    case_id_column_name  => 'cust_id',
    target_column_name   => 'affinity_card',
    settings_table_name  => NULL,
    data_schema_name     => 'oml_user',
    settings_schema_name => NULL,
    xform_list           => v_xlst);
END;
/
SELECT cust_income_level, occupation,
       PREDICTION(CLASS_MODEL USING *) predict_response
  FROM mining_data_test
 WHERE age = 60 AND cust_gender IN 'M'
 ORDER BY cust_income_level;

CUST_INCOME_LEVEL     OCCUPATION  PREDICT_RESPONSE
--------------------- ----------- ----------------
A: Below 30,000       Transp.     1
E: 90,000 - 109,999   Transp.     1
E: 90,000 - 109,999   Sales       1
G: 130,000 - 149,999  Handler     0
G: 130,000 - 149,999  Crafts      0
H: 150,000 - 169,999  Prof.       1
J: 190,000 - 249,999  Prof.       1
J: 190,000 - 249,999  Sales       1

BEGIN
  dbms_data_mining.ALTER_REVERSE_EXPRESSION (
    model_name     => 'CLASS_MODEL',
    expression     => 'decode(affinity_card, ''yes'', ''TRUE'', ''FALSE'')',
    attribute_name => 'affinity_card');
END;
/
column predict_response on
column predict_response format a20

SELECT cust_income_level, occupation,
       PREDICTION(CLASS_MODEL USING *) predict_response
  FROM mining_data_test
 WHERE age = 60 AND cust_gender IN 'M'
 ORDER BY cust_income_level;

CUST_INCOME_LEVEL     OCCUPATION  PREDICT_RESPONSE
--------------------- ----------- ----------------
A: Below 30,000       Transp.     TRUE
E: 90,000 - 109,999   Transp.     TRUE
E: 90,000 - 109,999   Sales       TRUE
G: 130,000 - 149,999  Handler     FALSE
G: 130,000 - 149,999  Crafts      FALSE
H: 150,000 - 169,999  Prof.       TRUE
J: 190,000 - 249,999  Prof.       TRUE
J: 190,000 - 249,999  Sales       TRUE

This example specifies labels for the clusters that result from the sh_clus model. The labels consist of the word "Cluster" and the internal numeric identifier for the cluster.
BEGIN
  dbms_data_mining.ALTER_REVERSE_EXPRESSION(
    'sh_clus', '''Cluster '' || value');
END;
/

SELECT cust_id, cluster_id(sh_clus USING *) cluster_id
  FROM sh_aprep_num
 WHERE cust_id < 100011
 ORDER BY cust_id;

   CUST_ID CLUSTER_ID
---------- ----------
    100001 Cluster 18
    100002 Cluster 14
    100003 Cluster 14
    100004 Cluster 18
    100005 Cluster 19
    100006 Cluster 7
    100007 Cluster 18
    100008 Cluster 14
    100009 Cluster 8
    100010 Cluster 8
52.8.4 APPLY Procedure
The APPLY procedure applies a machine learning model to the data of interest, and generates the results in a table. The APPLY procedure is also referred to as scoring.
For predictive machine learning functions, the APPLY procedure generates predictions in a target column. For descriptive machine learning functions such as clustering, the APPLY procedure assigns each case to a cluster with a probability.
In Oracle Machine Learning for SQL, the APPLY procedure is not applicable to Association models and Attribute Importance models.
Note:
Scoring can also be performed directly in SQL using the Oracle Machine Learning for SQL functions. See

Oracle Machine Learning for SQL Functions in Oracle Database SQL Language Reference
 Scoring and Deployment in Oracle Machine Learning for SQL User’s Guide
Syntax
DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);
Parameters
Table 52-48 APPLY Procedure Parameters
Parameter  Description 

model_name 
Name of the model in the form [schema_name.]model_name. If you do not specify a schema, then your own schema is used. 

data_table_name 
Name of table or view containing the data to be scored 

case_id_column_name 
Name of the case identifier column 

result_table_name 
Name of the table in which to store apply results 

data_schema_name 
Name of the schema containing the data to be scored 
Usage Notes

The data provided for APPLY must undergo the same preprocessing as the data used to create and test the model. When you use Automatic Data Preparation, the preprocessing required by the algorithm is handled for you by the model, both at build time and apply time. (See "Automatic Data Preparation".)
APPLY creates a table in the user's schema to hold the results. The columns are algorithm-specific. The columns in the results table are listed in Table 52-49 through Table 52-53. The case ID column name in the results table will match the case ID column name provided by you. The type of the incoming case ID column is also preserved in APPLY output.
Note:
Make sure that the case ID column does not have the same name as one of the columns that will be created by APPLY. For example, when applying a Classification model, the case ID in the scoring data must not be PREDICTION or PROBABILITY (see Table 52-49).
The data type for the PREDICTION, CLUSTER_ID, and FEATURE_ID output columns is influenced by any reverse expression that is embedded in the model by the user. If the user does not provide a reverse expression that alters the scored value type, then the types will conform to the descriptions in the following tables. See "ALTER_REVERSE_EXPRESSION Procedure".
If the model is partitioned, the result_table_name can contain results from different partitions depending on the data from the input data table. An additional column called PARTITION_NAME is added to the result table, indicating the partition name that is associated with each row. For a non-partitioned model, the behavior does not change.
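For instance, assuming a partitioned model whose APPLY results were written to a table named part_apply_results (a hypothetical name), the per-partition rows can be inspected like this:

```sql
-- Count scored rows per model partition; PARTITION_NAME is the extra
-- column that APPLY adds to the results of a partitioned model.
SELECT partition_name, COUNT(*) AS scored_rows
  FROM part_apply_results
 GROUP BY partition_name
 ORDER BY partition_name;
```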
Classification
The results table for classification has the columns described in Table 52-49. If the target of the model is categorical, the PREDICTION column will have a VARCHAR2 data type. If the target has a binary type, the PREDICTION column will have the binary type of the target.
Table 52-49 APPLY Results Table for Classification
Column Name  Data Type 

Case ID column name 
Type of the case ID 

PREDICTION 
Type of the target 

PROBABILITY 
BINARY_DOUBLE 
Anomaly Detection
The results table for anomaly detection has the columns described in Table 52-50.
Table 52-50 APPLY Results Table for Anomaly Detection
Column Name  Data Type 

Case ID column name 
Type of the case ID 

PREDICTION 
NUMBER 

PROBABILITY 
BINARY_DOUBLE 
Regression
The results table for regression has the columns described in Table 52-51.
Table 52-51 APPLY Results Table for Regression
Column Name  Data Type 

Case ID column name 
Type of the case ID 

PREDICTION 
Type of the target 
Clustering
Clustering is an unsupervised machine learning function, and hence there are no targets. The results of an APPLY procedure contain simply the cluster identifier corresponding to a case, and the associated probability. The results table has the columns described in Table 52-52.
Table 52-52 APPLY Results Table for Clustering
Column Name  Data Type 

Case ID column name 
Type of the case ID 

CLUSTER_ID 
NUMBER 

PROBABILITY 
BINARY_DOUBLE 
Feature Extraction
Feature extraction is also an unsupervised machine learning function, hence there are no targets. The results of an APPLY procedure contain simply the feature identifier corresponding to a case, and the associated match quality. The results table has the columns described in Table 52-53.
Table 52-53 APPLY Results Table for Feature Extraction
Column Name  Data Type 

Case ID column name 
Type of the case ID 

FEATURE_ID 
NUMBER 

MATCH_QUALITY 
BINARY_DOUBLE 
Examples
This example applies the GLM regression model GLMR_SH_REGR_SAMPLE to the data in the MINING_DATA_APPLY_V view. The APPLY results are output to the table REGRESSION_APPLY_RESULT.
SQL> BEGIN
       DBMS_DATA_MINING.APPLY (
         model_name          => 'glmr_sh_regr_sample',
         data_table_name     => 'mining_data_apply_v',
         case_id_column_name => 'cust_id',
         result_table_name   => 'regression_apply_result');
     END;
     /

SQL> SELECT * FROM regression_apply_result WHERE cust_id > 101485;

   CUST_ID PREDICTION
---------- ----------
    101486 22.8048824
    101487 25.0261101
    101488 48.6146619
    101489   51.82595
    101490 22.6220714
    101491 61.3856816
    101492 24.1400748
    101493  58.034631
    101494 45.7253149
    101495 26.9763318
    101496 48.1433425
    101497 32.0573434
    101498 49.8965531
    101499  56.270656
    101500 21.1153047
52.8.5 COMPUTE_CONFUSION_MATRIX Procedure
This procedure computes a confusion matrix, stores it in a table in the user's schema, and returns the model accuracy.
A confusion matrix is a test metric for classification models. It compares the predictions generated by the model with the actual target values in a set of test data. The confusion matrix lists the number of times each class was correctly predicted and the number of times it was predicted to be one of the other classes.
COMPUTE_CONFUSION_MATRIX accepts three input streams:

- The predictions generated on the test data. The information is passed in three columns:
  - Case ID column
  - Prediction column
  - Scoring criterion column containing either probabilities or costs
- The known target values in the test data. The information is passed in two columns:
  - Case ID column
  - Target column containing the known target values
- (Optional) A cost matrix table with predefined columns. See the Usage Notes for the column requirements.
See Also:
Oracle Machine Learning for SQL Concepts for more details about confusion matrixes and other test metrics for classification
Syntax
DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
      accuracy                    OUT NUMBER,
      apply_result_table_name     IN  VARCHAR2,
      target_table_name           IN  VARCHAR2,
      case_id_column_name         IN  VARCHAR2,
      target_column_name          IN  VARCHAR2,
      confusion_matrix_table_name IN  VARCHAR2,
      score_column_name           IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name IN  VARCHAR2 DEFAULT 'PROBABILITY',
      cost_matrix_table_name      IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name    IN  VARCHAR2 DEFAULT NULL,
      target_schema_name          IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name     IN  VARCHAR2 DEFAULT NULL,
      score_criterion_type        IN  VARCHAR2 DEFAULT 'PROBABILITY');
Parameters
Table 52-54 COMPUTE_CONFUSION_MATRIX Procedure Parameters
Parameter  Description 

accuracy 
Output parameter containing the overall percentage accuracy of the predictions. 

apply_result_table_name 
Table containing the predictions. 

target_table_name 
Table containing the known target values from the test data. 

case_id_column_name 
Case ID column in the apply results table. Must match the case identifier in the targets table. 

target_column_name 
Target column in the targets table. Contains the known target values from the test data. 

confusion_matrix_table_name 
Table containing the confusion matrix. The table will be created by the procedure in the user's schema. The columns in the confusion matrix table are described in the Usage Notes. 

score_column_name 
Column containing the predictions in the apply results table. The default column name is PREDICTION. 

score_criterion_column_name 
Column containing the scoring criterion in the apply results table. Contains either the probabilities or the costs that determine the predictions. By default, scoring is based on probability; the class with the highest probability is predicted for each case. If scoring is based on cost, the class with the lowest cost is predicted. The score_criterion_type parameter indicates whether probabilities or costs are in this column. The default column name is 'PROBABILITY'. See the Usage Notes for additional information. 

cost_matrix_table_name 
(Optional) Table that defines the costs associated with misclassifications. If a cost matrix table is provided and the score_criterion_type parameter is set to 'COST', the costs are used as the scoring criterion. The columns in a cost matrix table are described in the Usage Notes. 

apply_result_schema_name 
Schema of the apply results table. If null, the user's schema is assumed. 

target_schema_name 
Schema of the table containing the known targets. If null, the user's schema is assumed. 

cost_matrix_schema_name 
Schema of the cost matrix table, if one is provided. If null, the user's schema is assumed. 

score_criterion_type 
Whether to use probabilities or costs as the scoring criterion. Probabilities or costs are passed in the column identified in the score_criterion_column_name parameter. The default value of score_criterion_type is 'PROBABILITY'. To use costs, specify 'COST'. If score_criterion_type is set to 'COST' but no cost matrix table is provided, and a scoring cost matrix is associated with the model, the associated costs are used for scoring. See the Usage Notes and the Examples. 
Usage Notes

The predictive information you pass to
COMPUTE_CONFUSION_MATRIX
may be generated using SQLPREDICTION
functions, theDBMS_DATA_MINING.APPLY
procedure, or some other mechanism. As long as you pass the appropriate data, the procedure can compute the confusion matrix. 
Instead of passing a cost matrix to COMPUTE_CONFUSION_MATRIX, you can use a scoring cost matrix associated with the model. A scoring cost matrix can be embedded in the model or it can be defined dynamically when the model is applied. To use a scoring cost matrix, invoke the SQL PREDICTION_COST function to populate the score criterion column.
The predictions that you pass to COMPUTE_CONFUSION_MATRIX are in a table or view specified in apply_result_table_name.
CREATE TABLE apply_result_table_name AS (
    case_id_column_name          VARCHAR2,
    score_column_name            VARCHAR2,
    score_criterion_column_name  VARCHAR2);
A cost matrix must have the columns described in Table 52-55.
Table 52-55 Columns in a Cost Matrix
Column Name  Data Type 

actual_target_value 
Type of the target column in the build data 

predicted_target_value 
Type of the predicted target in the test data. The type of the predicted target must be the same as the type of the actual target unless the predicted target has an associated reverse transformation. 

cost 
BINARY_DOUBLE 
See Also:
Oracle Machine Learning for SQL User’s Guide for valid target data types
Oracle Machine Learning for SQL Concepts for more information about cost matrixes

The confusion matrix created by COMPUTE_CONFUSION_MATRIX has the columns described in Table 52-56.
Table 52-56 Columns in a Confusion Matrix
Column Name  Data Type 

actual_target_value 
Type of the target column in the build data 

predicted_target_value 
Type of the predicted target in the test data. The type of the predicted target is the same as the type of the actual target unless the predicted target has an associated reverse transformation. 

value 
BINARY_DOUBLE 
See Also:
Oracle Machine Learning for SQL Concepts for more information about confusion matrixes
Examples
These examples use the Naive Bayes model nb_sh_clas_sample.
Compute a Confusion Matrix Based on Probabilities
The following statement applies the model to the test data and stores the predictions and probabilities in a table.
CREATE TABLE nb_apply_results AS
  SELECT cust_id,
         PREDICTION(nb_sh_clas_sample USING *) prediction,
         PREDICTION_PROBABILITY(nb_sh_clas_sample USING *) probability
    FROM mining_data_test_v;
Using probabilities as the scoring criterion, you can compute the confusion matrix as follows.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    cost_matrix_table_name      => null,
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'PROBABILITY');
  DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
The confusion matrix and model accuracy are shown as follows.
**** MODEL ACCURACY ****: .7847

SQL> SELECT * FROM nb_confusion_matrix;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         60
                  0                      0        891
                  1                      1        286
                  0                      1        263
Compute a Confusion Matrix Based on a Cost Matrix Table
The confusion matrix in the previous example shows a high rate of false positives. For 263 cases, the model predicted 1 when the actual value was 0. You could use a cost matrix to minimize this type of error.
The cost matrix table nb_cost_matrix specifies that a false positive is 3 times more costly than a false negative.
SQL> SELECT * FROM nb_cost_matrix;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE       COST
------------------- ---------------------- ----------
                  0                      0          0
                  0                      1        .75
                  1                      0        .25
                  1                      1          0
This statement shows how to generate the predictions using APPLY.
BEGIN
  DBMS_DATA_MINING.APPLY(
    model_name          => 'nb_sh_clas_sample',
    data_table_name     => 'mining_data_test_v',
    case_id_column_name => 'cust_id',
    result_table_name   => 'nb_apply_results');
END;
/
This statement computes the confusion matrix using the cost matrix table. The score criterion column is named 'PROBABILITY', which is the name generated by APPLY.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    cost_matrix_table_name      => 'nb_cost_matrix',
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'COST');
  DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
The resulting confusion matrix shows a decrease in false positives (212 instead of 263).
**** MODEL ACCURACY ****: .798

SQL> SELECT * FROM nb_confusion_matrix;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         91
                  0                      0        942
                  1                      1        255
                  0                      1        212
Compute a Confusion Matrix Based on Embedded Costs
You can use the ADD_COST_MATRIX procedure to embed a cost matrix in a model. The embedded costs can be used instead of probabilities for scoring. This statement adds the previously defined cost matrix to the model.
BEGIN
  DBMS_DATA_MINING.ADD_COST_MATRIX ('nb_sh_clas_sample', 'nb_cost_matrix');
END;
/
The following statement applies the model to the test data using the embedded costs and stores the results in a table.
CREATE TABLE nb_apply_results AS
  SELECT cust_id,
         PREDICTION(nb_sh_clas_sample COST MODEL USING *) prediction,
         PREDICTION_COST(nb_sh_clas_sample COST MODEL USING *) cost
    FROM mining_data_test_v;
You can compute the confusion matrix using the embedded costs.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'COST',
    cost_matrix_table_name      => null,
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'COST');
END;
/
The results are:
**** MODEL ACCURACY ****: .798

SQL> SELECT * FROM nb_confusion_matrix;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         91
                  0                      0        942
                  1                      1        255
                  0                      1        212
52.8.6 COMPUTE_CONFUSION_MATRIX_PART Procedure
The COMPUTE_CONFUSION_MATRIX_PART procedure computes a confusion matrix, stores it in a table in the user's schema, and returns the model accuracy.
COMPUTE_CONFUSION_MATRIX_PART supports computation of evaluation metrics per partition for partitioned models. For non-partitioned models, refer to "COMPUTE_CONFUSION_MATRIX Procedure".
A confusion matrix is a test metric for classification models. It compares the predictions generated by the model with the actual target values in a set of test data. The confusion matrix lists the number of times each class was correctly predicted and the number of times it was predicted to be one of the other classes.
COMPUTE_CONFUSION_MATRIX_PART accepts three input streams:

- The predictions generated on the test data. The information is passed in three columns:
  - Case ID column
  - Prediction column
  - Scoring criterion column containing either probabilities or costs
- The known target values in the test data. The information is passed in two columns:
  - Case ID column
  - Target column containing the known target values
- (Optional) A cost matrix table with predefined columns. See the Usage Notes for the column requirements.
See Also:
Oracle Machine Learning for SQL Concepts for more details about confusion matrixes and other test metrics for classification
Syntax
DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX_PART (
      accuracy                    OUT DM_NESTED_NUMERICALS,
      apply_result_table_name     IN  VARCHAR2,
      target_table_name           IN  VARCHAR2,
      case_id_column_name         IN  VARCHAR2,
      target_column_name          IN  VARCHAR2,
      confusion_matrix_table_name IN  VARCHAR2,
      score_column_name           IN  VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name IN  VARCHAR2 DEFAULT 'PROBABILITY',
      score_partition_column_name IN  VARCHAR2 DEFAULT 'PARTITION_NAME',
      cost_matrix_table_name      IN  VARCHAR2 DEFAULT NULL,
      apply_result_schema_name    IN  VARCHAR2 DEFAULT NULL,
      target_schema_name          IN  VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name     IN  VARCHAR2 DEFAULT NULL,
      score_criterion_type        IN  VARCHAR2 DEFAULT NULL);
Parameters
Table 52-57 COMPUTE_CONFUSION_MATRIX_PART Procedure Parameters
Parameter  Description 

accuracy 
Output parameter containing the overall percentage accuracy of the predictions. The output argument is changed from NUMBER to DM_NESTED_NUMERICALS. 

apply_result_table_name 
Table containing the predictions 

target_table_name 
Table containing the known target values from the test data 

case_id_column_name 
Case ID column in the apply results table. Must match the case identifier in the targets table. 

target_column_name 
Target column in the targets table. Contains the known target values from the test data. 

confusion_matrix_table_name 
Table containing the confusion matrix. The table will be created by the procedure in the user's schema. The columns in the confusion matrix table are described in the Usage Notes. 

score_column_name 
Column containing the predictions in the apply results table. The default column name is PREDICTION. 

score_criterion_column_name 
Column containing the scoring criterion in the apply results table. Contains either the probabilities or the costs that determine the predictions. By default, scoring is based on probability; the class with the highest probability is predicted for each case. If scoring is based on cost, then the class with the lowest cost is predicted. The score_criterion_type parameter indicates whether probabilities or costs are in this column. The default column name is 'PROBABILITY'. See the Usage Notes for additional information. 

score_partition_column_name 
(Optional) Parameter indicating the column which contains the name of the partition. This column slices the input test results such that each partition has independent evaluation matrices computed. The default column name is 'PARTITION_NAME'. 

cost_matrix_table_name 
(Optional) Table that defines the costs associated with misclassifications. If a cost matrix table is provided and the score_criterion_type parameter is set to 'COST', the costs are used as the scoring criterion. The columns in a cost matrix table are described in the Usage Notes. 

apply_result_schema_name 
Schema of the apply results table. If null, then the user's schema is assumed. 

target_schema_name 
Schema of the table containing the known targets. If null, then the user's schema is assumed. 

cost_matrix_schema_name 
Schema of the cost matrix table, if one is provided. If null, then the user's schema is assumed. 

score_criterion_type 
Whether to use probabilities or costs as the scoring criterion. Probabilities or costs are passed in the column identified in the score_criterion_column_name parameter. The default value of score_criterion_type is 'PROBABILITY'. To use costs, specify 'COST'. If score_criterion_type is set to 'COST' but no cost matrix table is provided, and a scoring cost matrix is associated with the model, the associated costs are used for scoring. See the Usage Notes and the Examples. 
Usage Notes

The predictive information you pass to COMPUTE_CONFUSION_MATRIX_PART may be generated using SQL PREDICTION functions, the DBMS_DATA_MINING.APPLY procedure, or some other mechanism. As long as you pass the appropriate data, the procedure can compute the confusion matrix.
Instead of passing a cost matrix to COMPUTE_CONFUSION_MATRIX_PART, you can use a scoring cost matrix associated with the model. A scoring cost matrix can be embedded in the model or it can be defined dynamically when the model is applied. To use a scoring cost matrix, invoke the SQL PREDICTION_COST function to populate the score criterion column.
The predictions that you pass to COMPUTE_CONFUSION_MATRIX_PART are in a table or view specified in apply_result_table_name.
CREATE TABLE apply_result_table_name AS (
    case_id_column_name          VARCHAR2,
    score_column_name            VARCHAR2,
    score_criterion_column_name  VARCHAR2);
A cost matrix must have the columns described in Table 52-58.
Table 52-58 Columns in a Cost Matrix
Column Name  Data Type 

actual_target_value 
Type of the target column in the test data 

predicted_target_value 
Type of the predicted target in the test data. The type of the predicted target must be the same as the type of the actual target unless the predicted target has an associated reverse transformation. 

cost 
BINARY_DOUBLE 
See Also:
Oracle Machine Learning for SQL User’s Guide for valid target data types
Oracle Machine Learning for SQL Concepts for more information about cost matrixes

The confusion matrix created by COMPUTE_CONFUSION_MATRIX_PART has the columns described in Table 52-59.
Table 52-59 Columns in a Confusion Matrix Part
Column Name  Data Type 

actual_target_value 
Type of the target column in the test data 

predicted_target_value 
Type of the predicted target in the test data. The type of the predicted target is the same as the type of the actual target unless the predicted target has an associated reverse transformation. 

value 
BINARY_DOUBLE 
See Also:
Oracle Machine Learning for SQL Concepts for more information about confusion matrixes
Examples
These examples use the Naive Bayes model nb_sh_clas_sample.
Compute a Confusion Matrix Based on Probabilities
The following statement applies the model to the test data and stores the predictions and probabilities in a table.
CREATE TABLE nb_apply_results AS
  SELECT cust_id,
         PREDICTION(nb_sh_clas_sample USING *) prediction,
         PREDICTION_PROBABILITY(nb_sh_clas_sample USING *) probability
    FROM mining_data_test_v;
Using probabilities as the scoring criterion, you can compute the confusion matrix as follows.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX_PART (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    score_partition_column_name => 'PARTITION_NAME',
    cost_matrix_table_name      => null,
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'PROBABILITY');
  DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
The confusion matrix and model accuracy are shown as follows.
**** MODEL ACCURACY ****: .7847

SELECT * FROM NB_CONFUSION_MATRIX;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         60
                  0                      0        891
                  1                      1        286
                  0                      1        263
Compute a Confusion Matrix Based on a Cost Matrix Table
The confusion matrix in the previous example shows a high rate of false positives. For 263 cases, the model predicted 1 when the actual value was 0. You could use a cost matrix to minimize this type of error.
The cost matrix table nb_cost_matrix specifies that a false positive is 3 times more costly than a false negative.
SELECT * FROM NB_COST_MATRIX;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE       COST
------------------- ---------------------- ----------
                  0                      0          0
                  0                      1        .75
                  1                      0        .25
                  1                      1          0
This statement shows how to generate the predictions using APPLY.
BEGIN
  DBMS_DATA_MINING.APPLY(
    model_name          => 'nb_sh_clas_sample',
    data_table_name     => 'mining_data_test_v',
    case_id_column_name => 'cust_id',
    result_table_name   => 'nb_apply_results');
END;
/
This statement computes the confusion matrix using the cost matrix table. The score criterion column is named 'PROBABILITY', which is the name generated by APPLY.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX_PART (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    score_partition_column_name => 'PARTITION_NAME',
    cost_matrix_table_name      => 'nb_cost_matrix',
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'COST');
  DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
The resulting confusion matrix shows a decrease in false positives (212 instead of 263).
**** MODEL ACCURACY ****: .798

SELECT * FROM NB_CONFUSION_MATRIX;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         91
                  0                      0        942
                  1                      1        255
                  0                      1        212
Compute a Confusion Matrix Based on Embedded Costs
You can use the ADD_COST_MATRIX procedure to embed a cost matrix in a model. The embedded costs can be used instead of probabilities for scoring. This statement adds the previously defined cost matrix to the model.
BEGIN
  DBMS_DATA_MINING.ADD_COST_MATRIX('nb_sh_clas_sample', 'nb_cost_matrix');
END;
/
The following statement applies the model to the test data using the embedded costs and stores the results in a table.
CREATE TABLE nb_apply_results AS
SELECT cust_id,
       PREDICTION(nb_sh_clas_sample COST MODEL USING *) prediction,
       PREDICTION_COST(nb_sh_clas_sample COST MODEL USING *) cost
FROM mining_data_test_v;
You can compute the confusion matrix using the embedded costs.
DECLARE
  v_accuracy NUMBER;
BEGIN
  DBMS_DATA_MINING.COMPUTE_CONFUSION_MATRIX_PART (
    accuracy                    => v_accuracy,
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    confusion_matrix_table_name => 'nb_confusion_matrix',
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'COST',
    score_partition_column_name => 'PARTITION_NAME',
    cost_matrix_table_name      => null,
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'COST');
  DBMS_OUTPUT.PUT_LINE('**** MODEL ACCURACY ****: ' || ROUND(v_accuracy,4));
END;
/
The results are:
**** MODEL ACCURACY ****: .798

SELECT * FROM NB_CONFUSION_MATRIX;

ACTUAL_TARGET_VALUE PREDICTED_TARGET_VALUE      VALUE
------------------- ---------------------- ----------
                  1                      0         91
                  0                      0        942
                  1                      1        255
                  0                      1        212
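If you later want scoring to revert to probability-based predictions, the embedded cost matrix can be removed with the package's REMOVE_COST_MATRIX procedure. A minimal sketch, using the model from this example:

```sql
-- Remove the embedded cost matrix from the model so that subsequent
-- scoring is no longer cost-based.
BEGIN
  DBMS_DATA_MINING.REMOVE_COST_MATRIX('nb_sh_clas_sample');
END;
/
```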
52.8.7 COMPUTE_LIFT Procedure
This procedure computes lift and stores the results in a table in the user's schema.
Lift is a test metric for binary classification models. To compute lift, one of the target values must be designated as the positive class. COMPUTE_LIFT compares the predictions generated by the model with the actual target values in a set of test data. Lift measures the degree to which the model's predictions of the positive class are an improvement over random chance.
Lift is computed on scoring results that have been ranked by probability (or cost) and divided into quantiles. Each quantile includes the scores for the same number of cases.
COMPUTE_LIFT calculates quantile-based and cumulative statistics. The number of quantiles and the positive class are user-specified. Additionally, COMPUTE_LIFT accepts three input streams:

- The predictions generated on the test data. The information is passed in three columns:
  - Case ID column
  - Prediction column
  - Scoring criterion column containing either probabilities or costs associated with the predictions

- The known target values in the test data. The information is passed in two columns:
  - Case ID column
  - Target column containing the known target values

- (Optional) A cost matrix table with predefined columns. See the Usage Notes for the column requirements.
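As a conceptual illustration only (COMPUTE_LIFT produces these statistics, and more, for you), per-quantile lift can be computed by hand by ranking the scored cases and comparing each quantile's target density with the overall density. The sketch below assumes the nb_apply_results table and mining_data_test_v view used in the Examples section:

```sql
-- Sketch: cumulative lift per decile, computed manually.
-- Assumes nb_apply_results holds one row per (case, class) from
-- PREDICTION_SET, as in the Examples section below.
WITH scored AS (
  SELECT a.cust_id,
         CASE WHEN t.affinity_card = 1 THEN 1 ELSE 0 END AS is_target,
         NTILE(10) OVER (ORDER BY a.probability DESC)    AS quantile_number
  FROM   nb_apply_results a
  JOIN   mining_data_test_v t ON t.cust_id = a.cust_id
  WHERE  a.prediction = 1)   -- keep the probability of the positive class
SELECT quantile_number,
       COUNT(*)        AS quantile_total_count,
       SUM(is_target)  AS quantile_target_count,
       -- cumulative target density divided by overall target density
       (SUM(SUM(is_target)) OVER (ORDER BY quantile_number)
          / SUM(COUNT(*))   OVER (ORDER BY quantile_number))
         / (SUM(SUM(is_target)) OVER () / SUM(COUNT(*)) OVER ())
                       AS lift_cumulative
FROM   scored
GROUP  BY quantile_number
ORDER  BY quantile_number;
```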
See Also:
Oracle Machine Learning for SQL Concepts for more details about lift and test metrics for classification
Syntax
DBMS_DATA_MINING.COMPUTE_LIFT (
      apply_result_table_name     IN VARCHAR2,
      target_table_name           IN VARCHAR2,
      case_id_column_name         IN VARCHAR2,
      target_column_name          IN VARCHAR2,
      lift_table_name             IN VARCHAR2,
      positive_target_value       IN VARCHAR2,
      score_column_name           IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name IN VARCHAR2 DEFAULT 'PROBABILITY',
      num_quantiles               IN NUMBER   DEFAULT 10,
      cost_matrix_table_name      IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name    IN VARCHAR2 DEFAULT NULL,
      target_schema_name          IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name     IN VARCHAR2 DEFAULT NULL,
      score_criterion_type        IN VARCHAR2 DEFAULT 'PROBABILITY');
Parameters
Table 52-60 COMPUTE_LIFT Procedure Parameters
Parameter  Description

apply_result_table_name
  Table containing the predictions.

target_table_name
  Table containing the known target values from the test data.

case_id_column_name
  Case ID column in the apply results table. Must match the case identifier in the targets table.

target_column_name
  Target column in the targets table. Contains the known target values from the test data.

lift_table_name
  Table containing the lift statistics. The table will be created by the procedure in the user's schema. The columns in the lift table are described in the Usage Notes.

positive_target_value
  The positive class. This should be the class of interest, for which you want to calculate lift. If the target column is a NUMBER, you can use the TO_CHAR() operator to provide the value as a string.

score_column_name
  Column containing the predictions in the apply results table. The default column name is 'PREDICTION'.

score_criterion_column_name
  Column containing the scoring criterion in the apply results table. Contains either the probabilities or the costs that determine the predictions. By default, scoring is based on probability; the class with the highest probability is predicted for each case. If scoring is based on cost, the class with the lowest cost is predicted. The default column name is 'PROBABILITY'. See the Usage Notes for additional information.

num_quantiles
  Number of quantiles to be used in calculating lift. The default is 10.

cost_matrix_table_name
  (Optional) Table that defines the costs associated with misclassifications. If a cost matrix table is provided and score_criterion_type is set to 'COST', the costs are used as the scoring criterion. The columns in a cost matrix table are described in the Usage Notes.

apply_result_schema_name
  Schema of the apply results table. If null, the user's schema is assumed.

target_schema_name
  Schema of the table containing the known targets. If null, the user's schema is assumed.

cost_matrix_schema_name
  Schema of the cost matrix table, if one is provided. If null, the user's schema is assumed.

score_criterion_type
  Whether to use probabilities or costs as the scoring criterion. Probabilities or costs are passed in the column identified in score_criterion_column_name. The default value is 'PROBABILITY'. If 'COST' is specified, the costs are used for scoring. See the Usage Notes and the Examples.
Usage Notes

- The predictive information you pass to COMPUTE_LIFT may be generated using SQL PREDICTION functions, the DBMS_DATA_MINING.APPLY procedure, or some other mechanism. As long as you pass the appropriate data, the procedure can compute the lift.

- Instead of passing a cost matrix to COMPUTE_LIFT, you can use a scoring cost matrix associated with the model. A scoring cost matrix can be embedded in the model or it can be defined dynamically when the model is applied. To use a scoring cost matrix, invoke the SQL PREDICTION_COST function to populate the score criterion column.

- The predictions that you pass to COMPUTE_LIFT are in a table or view specified in apply_result_table_name, with this shape:

  CREATE TABLE apply_result_table_name AS (
        case_id_column_name         VARCHAR2,
        score_column_name           VARCHAR2,
        score_criterion_column_name VARCHAR2);
- A cost matrix must have the columns described in Table 52-61.

  Table 52-61 Columns in a Cost Matrix

  Column Name             Data Type
  actual_target_value     Type of the target column in the build data
  predicted_target_value  Type of the predicted target in the test data. The type of the predicted target must be the same as the type of the actual target unless the predicted target has an associated reverse transformation.
  cost                    NUMBER
See Also:
Oracle Machine Learning for SQL Concepts for more information about cost matrixes

- The table created by COMPUTE_LIFT has the columns described in Table 52-62.

  Table 52-62 Columns in a Lift Table

  Column Name                 Data Type
  quantile_number             NUMBER
  probability_threshold       NUMBER
  gain_cumulative             NUMBER
  quantile_total_count        NUMBER
  quantile_target_count       NUMBER
  percent_records_cumulative  NUMBER
  lift_cumulative             NUMBER
  target_density_cumulative   NUMBER
  targets_cumulative          NUMBER
  non_targets_cumulative      NUMBER
  lift_quantile               NUMBER
  target_density              NUMBER
See Also:
Oracle Machine Learning for SQL Concepts for details about the information in the lift table

- When a cost matrix is passed to COMPUTE_LIFT, the cost threshold is returned in the probability_threshold column of the lift table.
Examples
This example uses the Naive Bayes model nb_sh_clas_sample
.
The example illustrates lift based on probabilities. For examples that show computation based on costs, see "COMPUTE_CONFUSION_MATRIX Procedure".
The following statement applies the model to the test data and stores the predictions and probabilities in a table.
CREATE TABLE nb_apply_results AS
SELECT cust_id, t.prediction, t.probability
FROM mining_data_test_v, TABLE(PREDICTION_SET(nb_sh_clas_sample USING *)) t;
Using probabilities as the scoring criterion, you can compute lift as follows.
BEGIN
  DBMS_DATA_MINING.COMPUTE_LIFT (
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    lift_table_name             => 'nb_lift',
    positive_target_value       => to_char(1),
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    num_quantiles               => 10,
    cost_matrix_table_name      => null,
    apply_result_schema_name    => null,
    target_schema_name          => null,
    cost_matrix_schema_name     => null,
    score_criterion_type        => 'PROBABILITY');
END;
/
This query displays some of the statistics from the resulting lift table.
SQL> SELECT quantile_number, probability_threshold, gain_cumulative,
            quantile_total_count
       FROM nb_lift;

QUANTILE_NUMBER PROBABILITY_THRESHOLD GAIN_CUMULATIVE QUANTILE_TOTAL_COUNT
--------------- --------------------- --------------- --------------------
              1            .989335775       .15034965                   55
              2            .980534911       .26048951                   55
              3            .968506098      .374125874                   55
              4            .958975196      .493006993                   55
              5            .946705997      .587412587                   55
              6            .927454174       .66958042                   55
              7            .904403627      .748251748                   55
              8            .836482525      .839160839                   55
             10            .500184953               1                   54
52.8.8 COMPUTE_LIFT_PART Procedure
The COMPUTE_LIFT_PART procedure computes lift and stores the results in a table in the user's schema. This procedure supports the computation of evaluation metrics per partition for partitioned models.
Lift is a test metric for binary classification models. To compute lift, one of the target values must be designated as the positive class. COMPUTE_LIFT_PART compares the predictions generated by the model with the actual target values in a set of test data. Lift measures the degree to which the model's predictions of the positive class are an improvement over random chance.
Lift is computed on scoring results that have been ranked by probability (or cost) and divided into quantiles. Each quantile includes the scores for the same number of cases.
COMPUTE_LIFT_PART calculates quantile-based and cumulative statistics. The number of quantiles and the positive class are user-specified. Additionally, COMPUTE_LIFT_PART accepts three input streams:

- The predictions generated on the test data. The information is passed in three columns:
  - Case ID column
  - Prediction column
  - Scoring criterion column containing either probabilities or costs associated with the predictions

- The known target values in the test data. The information is passed in two columns:
  - Case ID column
  - Target column containing the known target values

- (Optional) A cost matrix table with predefined columns. See the Usage Notes for the column requirements.
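For a partitioned model, the call mirrors COMPUTE_LIFT but additionally names the column that identifies each partition in the apply results. A sketch, reusing the table and column names from the COMPUTE_LIFT example in the previous section (the lift table name here is hypothetical), assuming the apply results include a PARTITION_NAME column:

```sql
-- Sketch: per-partition lift for a partitioned model. Assumes
-- nb_apply_results carries a PARTITION_NAME column alongside the
-- prediction and probability columns.
BEGIN
  DBMS_DATA_MINING.COMPUTE_LIFT_PART (
    apply_result_table_name     => 'nb_apply_results',
    target_table_name           => 'mining_data_test_v',
    case_id_column_name         => 'cust_id',
    target_column_name          => 'affinity_card',
    lift_table_name             => 'nb_lift_part',
    positive_target_value       => to_char(1),
    score_column_name           => 'PREDICTION',
    score_criterion_column_name => 'PROBABILITY',
    score_partition_column_name => 'PARTITION_NAME',
    num_quantiles               => 10,
    score_criterion_type        => 'PROBABILITY');
END;
/
```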
See Also:
Oracle Machine Learning for SQL Concepts for more details about lift and test metrics for classification
"COMPUTE_CONFUSION_MATRIX Procedure"
Syntax
DBMS_DATA_MINING.COMPUTE_LIFT_PART (
      apply_result_table_name     IN VARCHAR2,
      target_table_name           IN VARCHAR2,
      case_id_column_name         IN VARCHAR2,
      target_column_name          IN VARCHAR2,
      lift_table_name             IN VARCHAR2,
      positive_target_value       IN VARCHAR2,
      score_column_name           IN VARCHAR2 DEFAULT 'PREDICTION',
      score_criterion_column_name IN VARCHAR2 DEFAULT 'PROBABILITY',
      score_partition_column_name IN VARCHAR2 DEFAULT 'PARTITION_NAME',
      num_quantiles               IN NUMBER   DEFAULT 10,
      cost_matrix_table_name      IN VARCHAR2 DEFAULT NULL,
      apply_result_schema_name    IN VARCHAR2 DEFAULT NULL,
      target_schema_name          IN VARCHAR2 DEFAULT NULL,
      cost_matrix_schema_name     IN VARCHAR2 DEFAULT NULL,
      score_criterion_type        IN VARCHAR2 DEFAULT NULL);
Parameters
Table 52-63 COMPUTE_LIFT_PART Procedure Parameters
Parameter  Description

apply_result_table_name
  Table containing the predictions.

target_table_name
  Table containing the known target values from the test data.

case_id_column_name
  Case ID column in the apply results table. Must match the case identifier in the targets table.