Before Creating a Model

Explains the preparation steps before creating a model.

Models are database schema objects that perform machine learning. The DBMS_DATA_MINING PL/SQL package is the API for creating, configuring, and evaluating machine learning models and for querying their model details.

Before you create a model, you must decide what you want the model to do. You must identify the training data and determine whether transformations are required. You can also specify model settings to influence the model's behavior. The preparation steps are summarized in the following table.

Table 3-11 Preparation for Creating an Oracle Machine Learning for SQL Model

Preparation Step Description

Choose the machine learning function

See Choose the Machine Learning Technique

Choose the algorithm

See Choose the Algorithm

Identify the build (training) data

See Data Preparation

For classification and regression models, identify the test data

See Splitting the Data

Determine your data transformation strategy and, if needed, create and populate a settings table

See Specify Model Settings

Choose the Machine Learning Technique

Describes how to specify an Oracle Machine Learning for SQL machine learning function for the CREATE_MODEL and CREATE_MODEL2 procedures.

An OML4SQL machine learning function specifies a class of problems that can be modeled and solved. You specify a machine learning function with the mining_function argument of the CREATE_MODEL and CREATE_MODEL2 procedures.

OML4SQL machine learning functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes. Supervised functions are predictive. Unsupervised functions are descriptive.

Note:

In OML4SQL terminology, a function is a general type of problem to be solved by a given approach to machine learning. In SQL language terminology, a function is an operation that returns a result.

In OML4SQL documentation, the term function, or machine learning function, refers to an OML4SQL machine learning function; the term SQL function, or SQL machine learning function, refers to a SQL function for scoring (applying machine learning models).

You can specify any of the values in the following table for the mining_function parameter of the CREATE_MODEL and CREATE_MODEL2 procedures.

Table 3-12 Oracle Machine Learning mining_function Values

mining_function Value Description

ASSOCIATION

Association is a descriptive machine learning function. An association model identifies relationships and the probability of their occurrence within a data set (association rules).

Association models use the Apriori algorithm.

ATTRIBUTE_IMPORTANCE

Attribute importance is a predictive machine learning function. An attribute importance model identifies the relative importance of attributes in predicting a given outcome.

Attribute importance models can use the Minimum Description Length algorithm or CUR Matrix Decomposition.

CLASSIFICATION

Classification is a predictive machine learning function. A classification model uses historical data to predict a categorical target.

Classification models can use Naive Bayes, Neural Network, Decision Tree, logistic regression (Generalized Linear Model), Random Forest, Support Vector Machine, Explicit Semantic Analysis, or XGBoost. The default is Naive Bayes.

You can also specify the classification machine learning function, with no target, for anomaly detection with a One-Class SVM model or a Multivariate State Estimation Technique - Sequential Probability Ratio Test (MSET-SPRT) model.

CLUSTERING

Clustering is a descriptive machine learning function. A clustering model identifies natural groupings within a data set.

Clustering models can use k-Means, O-Cluster, or Expectation Maximization. The default is k-Means.

FEATURE_EXTRACTION

Feature extraction is a descriptive machine learning function. A feature extraction model creates a set of optimized attributes.

Feature extraction models can use Non-Negative Matrix Factorization, Singular Value Decomposition (which can also be used for Principal Component Analysis) or Explicit Semantic Analysis. The default is Non-Negative Matrix Factorization.

REGRESSION

Regression is a predictive machine learning function. A regression model uses historical data to predict a numerical target.

Regression models can use Support Vector Machine, Generalized Linear Model (GLM) regression, or XGBoost. The default is Support Vector Machine.

TIME_SERIES

Time series is a predictive machine learning function. A time series model forecasts the future values of a time-ordered series of historical numeric data over a user-specified time window. Time series models use the Exponential Smoothing algorithm, which is the default.
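Table 3-12 can be read as a lookup keyed by mining_function. The following Python sketch is a client-side illustration only, not part of DBMS_DATA_MINING or any Oracle client library; the names FUNCTIONS, is_supervised, and default_algorithm are hypothetical. It records each function's kind (predictive functions are supervised, descriptive ones unsupervised), the algorithms the table lists, and the default the table states, using None where the table names no default.

```python
# Client-side sketch of Table 3-12; not part of the DBMS_DATA_MINING API.
# FUNCTIONS, is_supervised, and default_algorithm are hypothetical names.
from typing import Dict, Optional, Tuple

# mining_function -> (kind, algorithms listed in the table, stated default or None)
FUNCTIONS: Dict[str, Tuple[str, Tuple[str, ...], Optional[str]]] = {
    "ASSOCIATION": ("descriptive", ("Apriori",), None),
    "ATTRIBUTE_IMPORTANCE": (
        "predictive",
        ("Minimum Description Length", "CUR Matrix Decomposition"),
        None,
    ),
    "CLASSIFICATION": (
        "predictive",
        ("Naive Bayes", "Neural Network", "Decision Tree",
         "Generalized Linear Model",  # logistic regression
         "Random Forest", "Support Vector Machine",
         "Explicit Semantic Analysis", "XGBoost"),
        "Naive Bayes",
    ),
    "CLUSTERING": (
        "descriptive",
        ("k-Means", "O-Cluster", "Expectation Maximization"),
        "k-Means",
    ),
    "FEATURE_EXTRACTION": (
        "descriptive",
        ("Non-Negative Matrix Factorization", "Singular Value Decomposition",
         "Explicit Semantic Analysis"),
        "Non-Negative Matrix Factorization",
    ),
    "REGRESSION": (
        "predictive",
        ("Support Vector Machine", "Generalized Linear Model", "XGBoost"),
        "Support Vector Machine",
    ),
    "TIME_SERIES": ("predictive", ("Exponential Smoothing",), "Exponential Smoothing"),
}


def is_supervised(mining_function: str) -> bool:
    """Predictive functions are supervised; descriptive functions are not."""
    return FUNCTIONS[mining_function.upper()][0] == "predictive"


def default_algorithm(mining_function: str) -> Optional[str]:
    """Return the default the table states, or None if it states none."""
    return FUNCTIONS[mining_function.upper()][2]


print(is_supervised("clustering"))
print(default_algorithm("REGRESSION"))
```

The lookup is case-insensitive because it upper-cases its argument, so is_supervised("clustering") returns False and default_algorithm("REGRESSION") returns "Support Vector Machine".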

Choose the Algorithm

Learn about providing the algorithm settings for a model.

The ALGO_NAME setting specifies the algorithm for a model. If you use the default algorithm for the machine learning function, or if only one algorithm is available for that function, you do not need to specify the ALGO_NAME setting.

Table 3-13 Oracle Machine Learning Algorithms

ALGO_NAME Value | Algorithm | Default? | Machine Learning Model Function
ALGO_AI_MDL | Minimum Description Length | | Attribute importance
ALGO_APRIORI_ASSOCIATION_RULES | Apriori | | Association
ALGO_CUR_DECOMPOSITION | CUR Matrix Decomposition | | Attribute importance
ALGO_DECISION_TREE | Decision Tree | | Classification
ALGO_EXPECTATION_MAXIMIZATION | Expectation Maximization | | Clustering and anomaly detection
ALGO_EXPLICIT_SEMANTIC_ANALYS | Explicit Semantic Analysis | | Feature extraction and classification
ALGO_EXPONENTIAL_SMOOTHING | Exponential Smoothing | | Time series and time series regression
ALGO_EXTENSIBLE_LANG | Language used for an extensible algorithm | | All machine learning functions are supported
ALGO_GENERALIZED_LINEAR_MODEL | Generalized Linear Model | | Classification and regression
ALGO_KMEANS | k-Means | yes | Clustering
ALGO_MSET_SPRT | Multivariate State Estimation Technique - Sequential Probability Ratio Test | | Anomaly detection (classification with no target)
ALGO_NAIVE_BAYES | Naive Bayes | yes | Classification
ALGO_NEURAL_NETWORK | Neural Network | | Classification
ALGO_NONNEGATIVE_MATRIX_FACTOR | Non-Negative Matrix Factorization | yes | Feature extraction
ALGO_O_CLUSTER | O-Cluster | | Clustering
ALGO_RANDOM_FOREST | Random Forest | | Classification
ALGO_SINGULAR_VALUE_DECOMP | Singular Value Decomposition (can also be used for Principal Component Analysis) | | Feature extraction
ALGO_SUPPORT_VECTOR_MACHINES | Support Vector Machine | yes | Default regression algorithm; regression, classification, and anomaly detection (classification with no target)
ALGO_XGBOOST | XGBoost | | Classification and regression
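Table 3-13, combined with the ALGO_NAME rule under Choose the Algorithm, can be checked before a model is built. The following Python sketch is a client-side illustration under stated assumptions, not part of DBMS_DATA_MINING: the names SUPPORTED, IMPLICIT_ALGO, and settings_for are hypothetical, and a plain dict stands in for the name/value model settings. It assumes anomaly detection is requested as CLASSIFICATION with no target (as the MSET-SPRT and Support Vector Machine rows state) and extends that, as an assumption, to Expectation Maximization. Because the tables mark no default for attribute importance, the sketch always requires ALGO_NAME for that function.

```python
# Client-side sketch of Table 3-13 and the ALGO_NAME rule; not part of the
# DBMS_DATA_MINING API. SUPPORTED, IMPLICIT_ALGO, and settings_for are hypothetical.
from typing import Dict, Optional

ALL_FUNCTIONS = frozenset({
    "ASSOCIATION", "ATTRIBUTE_IMPORTANCE", "CLASSIFICATION", "CLUSTERING",
    "FEATURE_EXTRACTION", "REGRESSION", "TIME_SERIES",
})

# ALGO_NAME value -> mining_function values it supports.
# Anomaly detection is modeled as CLASSIFICATION with no target; applying that
# to ALGO_EXPECTATION_MAXIMIZATION is an assumption of this sketch.
SUPPORTED = {
    "ALGO_AI_MDL": {"ATTRIBUTE_IMPORTANCE"},
    "ALGO_APRIORI_ASSOCIATION_RULES": {"ASSOCIATION"},
    "ALGO_CUR_DECOMPOSITION": {"ATTRIBUTE_IMPORTANCE"},
    "ALGO_DECISION_TREE": {"CLASSIFICATION"},
    "ALGO_EXPECTATION_MAXIMIZATION": {"CLUSTERING", "CLASSIFICATION"},
    "ALGO_EXPLICIT_SEMANTIC_ANALYS": {"FEATURE_EXTRACTION", "CLASSIFICATION"},
    "ALGO_EXPONENTIAL_SMOOTHING": {"TIME_SERIES"},
    "ALGO_EXTENSIBLE_LANG": set(ALL_FUNCTIONS),
    "ALGO_GENERALIZED_LINEAR_MODEL": {"CLASSIFICATION", "REGRESSION"},
    "ALGO_KMEANS": {"CLUSTERING"},
    "ALGO_MSET_SPRT": {"CLASSIFICATION"},
    "ALGO_NAIVE_BAYES": {"CLASSIFICATION"},
    "ALGO_NEURAL_NETWORK": {"CLASSIFICATION"},
    "ALGO_NONNEGATIVE_MATRIX_FACTOR": {"FEATURE_EXTRACTION"},
    "ALGO_O_CLUSTER": {"CLUSTERING"},
    "ALGO_RANDOM_FOREST": {"CLASSIFICATION"},
    "ALGO_SINGULAR_VALUE_DECOMP": {"FEATURE_EXTRACTION"},
    "ALGO_SUPPORT_VECTOR_MACHINES": {"REGRESSION", "CLASSIFICATION"},
    "ALGO_XGBOOST": {"CLASSIFICATION", "REGRESSION"},
}

# Algorithm used when ALGO_NAME is omitted: the stated default, or the only
# built-in algorithm (ALGO_EXTENSIBLE_LANG aside). ATTRIBUTE_IMPORTANCE is
# deliberately absent, so ALGO_NAME is always required for it here.
IMPLICIT_ALGO = {
    "ASSOCIATION": "ALGO_APRIORI_ASSOCIATION_RULES",
    "CLASSIFICATION": "ALGO_NAIVE_BAYES",
    "CLUSTERING": "ALGO_KMEANS",
    "FEATURE_EXTRACTION": "ALGO_NONNEGATIVE_MATRIX_FACTOR",
    "REGRESSION": "ALGO_SUPPORT_VECTOR_MACHINES",
    "TIME_SERIES": "ALGO_EXPONENTIAL_SMOOTHING",
}


def settings_for(mining_function: str, algo_name: Optional[str] = None) -> Dict[str, str]:
    """Return the ALGO_NAME setting to pass, or {} when it can be omitted."""
    fn = mining_function.upper()
    if fn not in ALL_FUNCTIONS:
        raise ValueError(f"unknown mining_function: {mining_function}")
    if algo_name is None:
        if fn not in IMPLICIT_ALGO:
            raise ValueError(f"{fn}: no default algorithm; set ALGO_NAME")
        return {}
    algo = algo_name.upper()
    if fn not in SUPPORTED.get(algo, set()):
        raise ValueError(f"{algo} does not support {fn}")
    if IMPLICIT_ALGO.get(fn) == algo:
        return {}  # default (or only) algorithm: ALGO_NAME may be omitted
    return {"ALGO_NAME": algo}


print(settings_for("clustering", "ALGO_O_CLUSTER"))
print(settings_for("CLUSTERING"))
```

For example, settings_for("clustering", "ALGO_O_CLUSTER") returns {"ALGO_NAME": "ALGO_O_CLUSTER"}, settings_for("CLUSTERING") returns an empty dict because k-Means is the default, and settings_for("REGRESSION", "ALGO_NAIVE_BAYES") raises ValueError because Naive Bayes supports only classification. Setting ALGO_NAME explicitly to the default is also acceptable; the sketch omits it only to mirror the rule stated under Choose the Algorithm.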