The CREATE_MODEL Procedure

36.5 The CREATE_MODEL Procedure

The CREATE_MODEL procedure of the DBMS_DATA_MINING package uses the specified data to create a machine learning model with the specified name and machine learning function.

The model can be created with configuration settings and user-specified transformations.

PROCEDURE CREATE_MODEL(
                  model_name            IN VARCHAR2,
                  mining_function       IN VARCHAR2,
                  data_table_name       IN VARCHAR2,
                  case_id_column_name   IN VARCHAR2,
                  target_column_name    IN VARCHAR2 DEFAULT NULL,
                  settings_table_name   IN VARCHAR2 DEFAULT NULL,
                  data_schema_name      IN VARCHAR2 DEFAULT NULL,
                  settings_schema_name  IN VARCHAR2 DEFAULT NULL,
                  xform_list            IN TRANSFORM_LIST DEFAULT NULL);

You can also rename the model using the RENAME_MODEL procedure of the DBMS_DATA_MINING package. The procedure changes the value of the machine learning model specified against MODEL_NAME with another name that you specify.

The following example builds a classification model using the Support Vector Machine algorithm.

 Create the settings table 
CREATE TABLE svm_model_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify SVM. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used.
BEGIN 
  INSERT INTO svm_model_settings (setting_name, setting_value) VALUES
     (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines);
  INSERT INTO svm_model_settings (setting_name, setting_value) VALUES
     (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
  COMMIT;
END;
/
-- Create the model using the specified settings 
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'svm_model',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'svm_model_settings');
END;
/

Related Topics

36.5.1 Choose the Machine Learning Technique

Describes providing an Oracle Machine Learning for SQL machine learning function for the CREATE_MODEL and CREATE_MODEL2 procedure.

An OML4SQL machine learning technique specifies a class of problems that can be modeled and solved. You specify a machine learning with the mining_function argument of the CREATE_MODEL and CREATE_MODEL2 procedure.

OML4SQL machine learning functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes. Supervised functions are predictive. Unsupervised functions are descriptive.

Note:

In OML4SQL terminology, a function is a general type of problem to be solved by a given approach to machine learning. In SQL language terminology, a function is an operation that returns a result.

In OML4SQL documentation, the term function, or machine learning function refers to an OML4SQL machine learning function; the term SQL function or SQL machine learning function refers to a SQL function for scoring (applying machine learning models).

You can specify any of the values in the following table for the mining_function parameter to the CREATE_MODEL and CREATE_MODEL2 procedure.

Table 36-7 Oracle Machine Learning mining_function Values

mining_function Value	Description
`ASSOCIATION`	Association is a descriptive machine learning function. An association model identifies relationships and the probability of their occurrence within a data set (association rules). Association models use the Apriori algorithm.
`ATTRIBUTE_IMPORTANCE`	Attribute importance is a predictive machine learning function. An attribute importance model identifies the relative importance of attributes in predicting a given outcome. Attribute importance models use the Minimum Description Length algorithm and CUR Matrix Decomposition.
`CLASSIFICATION`	Classification is a predictive machine learning function. A classification model uses historical data to predict a categorical target. Classification models can use Naive Bayes, Neural Network, Decision Tree, logistic regression, Random Forest, Support Vector Machine, Explicit Semantic Analysis, or XGBoost. The default is Naive Bayes. You can also specify the classification machine learning function for anomaly detection for a One-Class SVM model and a Multivariate State Estimation Technique - Sequential Probability Ratio Test model.
`CLUSTERING`	Clustering is a descriptive machine learning function. A clustering model identifies natural groupings within a data set. Clustering models can use k-Means, O-Cluster, or Expectation Maximization. The default is k-Means.
`FEATURE_EXTRACTION`	Feature extraction is a descriptive machine learning function. A feature extraction model creates a set of optimized attributes. Feature extraction models can use Non-Negative Matrix Factorization, Singular Value Decomposition (which can also be used for Principal Component Analysis) or Explicit Semantic Analysis. The default is Non-Negative Matrix Factorization.
`REGRESSION`	Regression is a predictive machine learning function. A regression model uses historical data to predict a numerical target. Regression models can use Support Vector Machine, GLM regression, or XGBoost. The default is Support Vector Machine.
`TIME_SERIES`	Time series is a predictive machine learning function. A time series model forecasts the future values of a time-ordered series of historical numeric data over a user-specified time window. Time series models use the Exponential Smoothing algorithm. The default is Exponential Smoothing.

36.5.2 Choose the Algorithm

Learn about providing the algorithm settings for a model.

The ALGO_NAME setting specifies the algorithm for a model. If you use the default algorithm for the machine learning technique, or if there is only one algorithm available for the machine learning technique, then you do not need to specify the ALGO_NAME setting.

Table 36-8 Oracle Machine Learning Algorithms

ALGO_NAME Value	Algorithm	Default?	Machine Learning Model Function
`ALGO_AI_MDL`	Minimum Description Length	—	Attribute importance
`ALGO_APRIORI_ASSOCIATION_RULES`	Apriori	—	Association
`ALGO_CUR_DECOMPOSITION`	CUR Matrix Decomposition	—	Attribute importance
`ALGO_DECISION_TREE`	Decision Tree	—	Classification
`ALGO_EXPECTATION_MAXIMIZATION`	Expectation Maximization	—	Clustering and Anomaly Detection
`ALGO_EXPLICIT_SEMANTIC_ANALYS`	Explicit Semantic Analysis	—	Feature extraction and classification
`ALGO_EXPONENTIAL_SMOOTHING`	Exponential Smoothing	—	Time series and time series regression
`ALGO_EXTENSIBLE_LANG`	Language used for an extensible algorithm	—	All machine learning functions are supported
`ALGO_GENERALIZED_LINEAR_MODEL`	Generalized Linear Model	—	Classification and regression
`ALGO_KMEANS`	k-Means	yes	Clustering
`ALGO_MSET_SPRT`	Multivariate State Estimation Technique - Sequential Probability Ratio Test	—	Anomaly detection (classification with no target)
`ALGO_NAIVE_BAYES`	Naive Bayes	yes	Classification
`ALGO_NEURAL_NETWORK`	Neural Network	—	Classification
`ALGO_NONNEGATIVE_MATRIX_FACTOR`	Non-Negative Matrix Factorization	yes	Feature extraction
`ALGO_O_CLUSTER`	O-Cluster	—	Clustering
`ALGO_RANDOM_FOREST`	Random Forest	—	Classification
`ALGO_SINGULAR_VALUE_DECOMP`	Singular Value Decomposition (can also be used for Principal Component Analysis)	—	Feature extraction
`ALGO_SUPPORT_VECTOR_MACHINES`	Support Vector Machine	yes	Default regression algorithm; regression, classification, and anomaly detection (classification with no target)
`ALGO_XGBOOST`	XGBoost	—	Classification and regression

36.5.3 Supply Transformations

Use xform_list to specify transformations in the model creation procedures.

You can optionally specify transformations for the build data in the xform_list parameter to CREATE_MODEL2 and CREATE_MODEL procedures. The transformation instructions are embedded in the model and reapplied whenever the model is applied to new data.

36.5.3.1 Create a Transformation List

You can create a transformation list using the DBMS_DATA_MINING_TRANSFORM package.

The following are the ways to create a transformation list:

The STACK interface in DBMS_DATA_MINING_TRANSFORM.

The STACK interface offers a set of pre-defined transformations that you can apply to an attribute or to a group of attributes. For example, you can specify supervised binning for all categorical attributes.
The SET_TRANSFORM procedure in DBMS_DATA_MINING_TRANSFORM.

The SET_TRANSFORM procedure applies a specified SQL expression to a specified attribute. For example, the following statement appends a transformation instruction for country_id to a list of transformations called my_xforms. The transformation instruction divides country_id by 10 before algorithmic processing begins. The reverse transformation multiplies country_id by 10.
```
  dbms_data_mining_transform.SET_TRANSFORM (my_xforms,
     'country_id', NULL, 'country_id/10', 'country_id*10');
```
The reverse transformation is applied in the model details. If country_id is the target of a supervised model, the reverse transformation is also applied to the scored target.

36.5.3.2 Transformation List and Automatic Data Preparation

You can provide transformation list and Automatic Data Preparation (ADP) to customize the data transformation.

The transformation list argument to CREATE_MODEL2 and CREATE_MODEL interacts with the PREP_AUTO setting, which controls ADP:

When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model. The transformations that you specify are processed before the automatic transformations.
When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.
When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.
When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model.

Related Topics