Interfaces to Oracle Machine Learning for SQL

Oracle Machine Learning Modeling, Transformations, and Convenience Functions

You can access PL/SQL interface to perform data modeling, transformations, and predictive analytics.

The following table displays the PL/SQL packages for Oracle Machine Learning. In Oracle Database releases prior to Release 21c, Oracle Machine Learning was named Oracle Data Mining.

Table 3-1 Oracle Machine Learning PL/SQL Packages

Package Name	Description
`DBMS_DATA_MINING`	Routines for creating and managing machine learning models
`DBMS_DATA_MINING_TRANSFORM`	Routines for transforming the data for machine learning
`DBMS_PREDICTIVE_ANALYTICS`	Routines that perform predictive analytics

Related Topics

DBMS_DATA_MINING

The DBMS_DATA_MINING package contains routines for creating machine learning models, for performing operations on the models, and for querying them.

The package includes routines for:

Creating, dropping, and performing other DDL operations on machine learning models
Obtaining detailed information about model attributes, rules, and other information internal to the model (model details)
Computing test metrics for classification models
Specifying costs for classification models
Exporting and importing models
Building models using Oracle Machine Learning native algorithms as well as algorithms written in R

Related Topics

Oracle Database PL/SQL Packages and Types Reference

About Oracle Machine Learning Models

Machine learning models are database schema objects that perform machine learning techniques.

As with all schema objects, access to machine learning models is controlled by database privileges. Models can be exported and imported. They support comments and they can be tracked in the Oracle Database auditing system.

Machine learning models are created by the CREATE_MODEL2 or the CREATE_MODEL procedures in the DBMS_DATA_MINING PL/SQL package. Models are created for a specific machine learning technique, and they use a specific algorithm to perform that function. Machine learning technique is a term that refers to a class of machine learning problems to be solved. Examples of machine learning techniques are: regression, classification, attribute importance, clustering, anomaly detection, and feature selection. OML4SQL supports one or more algorithms for each machine learning technique.

Along with the machine learning technique, in the CREATE_MODEL2 procedure, you can specify an algorithm and other characteristics of a model. In CREATE_MODEL procedure you can specify a settings table to specify an algorithm and other characteristics of a model. Some settings are general, some are specific to a machine learning technique, and some are specific to an algorithm.

Note:

Most types of machine learning models can be used to score data. However, it is possible to score data without applying a model. Dynamic scoring and predictive analytics return scoring results without a user-supplied model. They create and apply transient models that are not visible to you.

Related Topics

DBMS_DATA_MINING_TRANSFORM

The DBMS_DATA_MINING_TRANSFORM package contains routines that perform data transformations such as binning, normalization, and outlier treatment.

The package includes routines for:

Specifying transformations in a format that can be embedded in a machine learning model.
Specifying transformations as relational views (external to machine learning model objects).
Specifying distinct properties for columns in the build data. For example, you can specify that the column must be interpreted as unstructured text, or that the column must be excluded from Automatic Data Preparation.

Related Topics

Oracle Database PL/SQL Packages and Types Reference

Transformation Methods in DBMS_DATA_MINING_TRANSFORM

Summarizes the methods for transforming data in DBMS_DATA_MINING_TRANSFORM package.

Table 3-2 DBMS_DATA_MINING_TRANSFORM Transformation Methods

Transformation Method	Description
`XFORM` interface	`CREATE`, `INSERT`, and `XFORM` routines specify transformations in external views
`STACK` interface	`CREATE`, `INSERT`, and `XFORM` routines specify transformations for embedding in a model
`SET_TRANSFORM`	Specifies transformations for embedding in a model

The statements in the following example create a Support Vector Machine (SVM) classification model called T_SVM_Clas_sample with an embedded transformation that causes the comments attribute to be treated as unstructured text data. The T_SVM_CLAS_SAMPLE model is created by oml4sql-classification-text-mining-svm.sql example.

Example 3-1 Sample Embedded Transformation

DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null, 'TEXT');
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'T_SVM_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 't_svmc_sample_settings',
    xform_list => xformlist);
END;
/

Predictive Analytics

Predictive analytics is a technology that captures Oracle Machine Learning for SQL processes in simple routines.

Sometimes called "one-click machine learning," predictive analytics simplifies and automates the machine learning process.

Predictive analytics uses OML4SQL technology, but knowledge of OML4SQL is not needed to use predictive analytics. You can use predictive analytics by specifying an operation to perform on your data. You do not need to create or use OML4SQL models or understand the OML4SQL functions and algorithms summarized in "Oracle Machine Learning for SQL Basics ".

Oracle Machine Learning for SQL predictive analytics operations are described in the following table:

Table 3-3 Oracle Predictive Analytics Operations

Operation	Description
`EXPLAIN`	Explains how individual predictors (columns) affect the variation of values in a target column
`PREDICT`	For each case (row), predicts the values in a target column
`PROFILE`	Creates a set of rules for cases (rows) that imply the same target value

The Oracle predictive analytics operations are implemented in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package. They are also available in Oracle Data Miner.

Related Topics

About Oracle Machine Learning for SQL

DBMS_PREDICTIVE_ANALYTICS

The DBMS_PREDICTIVE_ANALYTICS package contains routines that perform an automated form of machine learning known as predictive analytics. With predictive analytics, you do not need to be aware of model building or scoring. All machine learning activities are handled internally by the procedure.

The DBMS_PREDICTIVE_ANALYTICS package includes these routines:

EXPLAIN ranks attributes in order of influence in explaining a target column.
PREDICT predicts the value of a target column based on values in the input data.
PROFILE generates rules that describe the cases from the input data.

The EXPLAIN statement in the following example lists attributes in the view mining_data_build_v in order of their importance in predicting affinity_card.

Example 3-2 Sample EXPLAIN Statement

BEGIN 
    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN( 
        data_table_name      => 'mining_data_build_v', 
        explain_column_name  => 'affinity_card', 
        result_table_name    => 'explain_results'); 
END; 
/

Related Topics

Oracle Database PL/SQL Packages and Types Reference

Oracle Machine Learning Data Dictionary Views

Lists Oracle Machine Learning data dictionary views.

The data dictionary views for Oracle Machine Learning are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.

Table 3-4 Data Dictionary Views for Oracle Machine Learning

View Name	Description
ALL_MINING_MODELS	Provides information about all accessible machine learning models
ALL_MINING_MODEL_ATTRIBUTES	Provides information about the attributes of all accessible machine learning models
ALL_MINING_MODEL_PARTITIONS	Provides information about the partitions of all accessible partitioned machine learning models
ALL_MINING_MODEL_SETTINGS	Provides information about the configuration settings for all accessible machine learning models
ALL_MINING_MODEL_VIEWS	Provides information about the model views for all accessible machine learning models
ALL_MINING_MODEL_XFORMS	Provides the user-specified transformations embedded in all accessible machine learning models.

SQL Functions

Oracle Machine Learning for SQL supports SQL functions for performing prediction, clustering, and feature extraction.

The functions score data by applying an OML4SQL model object or by running an analytic clause that performs dynamic scoring.

The following example shows a query that applies the classification model svmc_sh_clas_sample to the data in the view mining_data_apply_v. The query returns the average age of customers who are likely to use an affinity card. The results are broken out by gender.

Example 3-3 The PREDICTION Function

SELECT cust_gender,
       COUNT(*) AS cnt,
       ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(svmc_sh_clas_sample USING *) = 1
GROUP BY cust_gender
ORDER BY cust_gender;

The output is as follows:


C        CNT    AVG_AGE
- ---------- ----------
F         59         41
M        409         45

Related Topics

In-Database Scoring

Oracle Machine Learning for SQL Scoring Functions

Use OML4SQL functions score data. Functions can apply a machine learning model schema object to data or dynamically mine it with an analytic clause. SQL functions exist for all OML4SQL scoring algorithms.

All OML4SQL functions, as listed in the following table can operate on an R machine learning model with the corresponding OML4SQL function. However, the functions are not limited to the ones listed here.

Table 3-5 OML4SQL Functions

Function	Description
`CLUSTER_ID`	Returns the ID of the predicted cluster
`CLUSTER_DETAILS`	Returns detailed information about the predicted cluster
`CLUSTER_DISTANCE`	Returns the distance from the centroid of the predicted cluster
`CLUSTER_PROBABILITY`	Returns the probability of a case belonging to a given cluster
`CLUSTER_SET`	Returns a list of all possible clusters to which a given case belongs along with the associated probability of inclusion
`FEATURE_COMPARE`	Compares two similar and dissimilar set of texts from two different documents or keyword phrases or a combination of both
`FEATURE_ID`	Returns the ID of the feature with the highest coefficient value
`FEATURE_DETAILS`	Returns detailed information about the predicted feature
`FEATURE_SET`	Returns a list of objects containing all possible features along with the associated coefficients
`FEATURE_VALUE`	Returns the value of the predicted feature
`ORA_DM_PARTITION_NAME`	Returns the partition names for a partitioned model
`PREDICTION`	Returns the best prediction for the target
`PREDICTION_BOUNDS`	(GLM only) Returns the upper and lower bounds of the interval wherein the predicted values (linear regression) or probabilities (logistic regression) lie.
`PREDICTION_COST`	Returns a measure of the cost of incorrect predictions
`PREDICTION_DETAILS`	Returns detailed information about the prediction
`PREDICTION_PROBABILITY`	Returns the probability of the prediction
`PREDICTION_SET`	Returns the results of a classification model, including the predictions and associated probabilities for each case
`VECTOR_EMBEDDING`	Generates a single vector embedding for different data types

The following example shows a query that returns the results of the CLUSTER_ID function. The query applies the model em_sh_clus_sample, which finds groups of customers that share certain characteristics. The query returns the identifiers of the clusters and the number of customers in each cluster. The em_sh_clus_sample model is created by the oml4sql-clustering-expectation-maximization.sql example.

Example 3-4 CLUSTER_ID Function

-- -List the clusters into which the customers in this
-- -data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt 
  FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;


-- List the clusters into which the customers in this
-- data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt
FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;

The output is as follows:


      CLUS        CNT
---------- ----------
         9        311
         3        294
         7        215
        12        201
        17        123
        16        114
        14         86
        19         64
        15         56
        18         36

Related Topics

Oracle Machine Learning for SQL Functions

Oracle Machine Learning for SQL Statistical Functions

Various SQL statistical functions are available in Oracle Database to explore and analyze data.

A variety of scalable statistical functions are accessible through SQL in Oracle Database. These statistical functions are implemented as SQL functions. The SQL statistical functions can be used to compute standard univariate statistics such as MEAN, MAX, MIN, MEDIAN, MODE, and standard deviation on the data. Users can also perform various other statistical functions such as t-test, f-test, aggregate functions, analytic functions, or ANOVA. The functions listed in the following table are available from SQL.

Table 3-6 SQL Statistical Functions Supported by OML4SQL

Function	Description
`APPROX_COUNT`	Returns approximate count of an expression
`APPROX_SUM`	Returns approximate sum of an expression
`APPROX_RANK`	Returns approximate value in a group of values
`CORR`	Retuns the coefficient of correlation of a set of number pairs
`CORR_S`	Calculates the Spearman's rho correlation coefficient
`CORR_K`	Calculates the Kendall's tau-b correlation coefficient
`COVAR_POP`	Returns the population covariance of a set of number pairs
`COVAR_SAMP`	Returns the sample covariance of a set of number pairs.
`LAG`	`LAG` is an analytic function. It provides access to more than one row of a table at the same time without a self join.
`LEAD`	`LEAD` is an analytic function. It provides access to more than one row of a table at the same time without a self join.
`STATS_BINOMIAL_TEST`	`STATS_BINOMIAL_TEST` is an exact probability test used for dichotomous variables, where only two possible values exist.
`STATS_CROSSTAB`	`STATS_CROSSTAB` is a method used to analyze two nominal variables.
`STATS_F_TEST`	`STATS_F_TEST` tests whether two variances are significantly different.
`STATS_KS_TEST`	`STATS_KS_TEST` is a Kolmogorov-Smirnov function that compares two samples to test whether they are from the same population or from populations that have the same distribution.
`STATS_MODE`	Takes as its argument a set of values and returns the value that occurs with the greatest frequency
`STATS_MW_TEST`	A Mann Whitney test compares two independent samples to test the null hypothesis that two populations have the same distribution function against the alternative hypothesis that the two distribution functions are different.
`STATS_ONE_WAY_ANOVA`	Tests differences in means (for groups or variables) for statistical significance by comparing two different estimates of variance
`STATS_T_TEST_*`	The t-test measures the significance of a difference of means
`STATS_T_TEST_ONE`	A one-sample t-test
`STATS_T_TEST_PAIRED`	A two-sample, paired t-test (also known as a crossed t-test)
`STATS_T_TEST_INDEP` and `STATS_T_TEST_INDEPU`	A t-test of two independent groups with the same variance (pooled variances) A t-test of two independent groups with unequal variance (unpooled variances)
`STDDEV`	returns the sample standard deviation of a set of numbers
`STDDEV_POP`	Computes the population standard deviation and returns the square root of the population variance
`STDDEV_SAMP`	Computes the cumulative sample standard deviation and returns the square root of the sample variance
`SUM`	Returns the sum of values

DBMS_STAT_FUNCS PL/SQL package is also available for users.

Related Topics

R Operators and Functions Supported by Oracle Machine Learning for R

Oracle Data Miner

Oracle Machine Learning for SQL supports a graphical interface called Oracle Data Miner.

Oracle Data Miner is a graphical interface to OML4SQL. Oracle Data Miner is an extension to Oracle SQL Developer, which is available for download free of charge on the Oracle Technology Network.

Oracle Data Miner uses a work flow paradigm to capture, document, and automate the process of building, evaluating, and applying OML4SQL models. Within a work flow, you can specify data transformations, build and evaluate multiple models, and score multiple data sets. You can then save work flows and share them with other users.

Figure 3-1 An Oracle Data Miner Workflow

Description of "Figure 3-1 An Oracle Data Miner Workflow"

For information about Oracle Data Miner, including installation instructions, visit Oracle Technology Network.

Related Topics

Oracle Data Miner