Interfaces to Oracle Machine Learning for SQL

Introduces supported interfaces for Oracle Machine Learning for SQL.

The programmatic interfaces to Oracle Machine Learning for SQL are PL/SQL for building and maintaining models and a family of SQL functions for scoring. OML4SQL also supports a graphical user interface, which is implemented as an extension to Oracle SQL Developer.

Oracle Predictive Analytics, a set of simplified OML4SQL routines, is built on top of OML4SQL and is implemented as a PL/SQL package.

Oracle Machine Learning Modeling, Transformations, and Convenience Functions

You can access PL/SQL interface to perform data modeling, transformations, and predictive analytics.

The following table displays the PL/SQL packages for Oracle Machine Learning. In Oracle Database releases prior to Release 21c, Oracle Machine Learning was named Oracle Data Mining.

Table 3-1 Oracle Machine Learning PL/SQL Packages

Package Name Description

DBMS_DATA_MINING

Routines for creating and managing machine learning models

DBMS_DATA_MINING_TRANSFORM

Routines for transforming the data for machine learning

DBMS_PREDICTIVE_ANALYTICS

Routines that perform predictive analytics

DBMS_DATA_MINING

The DBMS_DATA_MINING package contains routines for creating machine learning models, for performing operations on the models, and for querying them.

The package includes routines for:

  • Creating, dropping, and performing other DDL operations on machine learning models

  • Obtaining detailed information about model attributes, rules, and other information internal to the model (model details)

  • Computing test metrics for classification models

  • Specifying costs for classification models

  • Exporting and importing models

  • Building models using Oracle Machine Learning native algorithms as well as algorithms written in R

About Oracle Machine Learning Models

Machine learning models are database schema objects that perform machine learning techniques.

As with all schema objects, access to machine learning models is controlled by database privileges. Models can be exported and imported. They support comments and they can be tracked in the Oracle Database auditing system.

Machine learning models are created by the CREATE_MODEL2 or the CREATE_MODEL procedures in the DBMS_DATA_MINING PL/SQL package. Models are created for a specific machine learning technique, and they use a specific algorithm to perform that function. Machine learning technique is a term that refers to a class of machine learning problems to be solved. Examples of machine learning techniques are: regression, classification, attribute importance, clustering, anomaly detection, and feature selection. OML4SQL supports one or more algorithms for each machine learning technique.

Along with the machine learning technique, in the CREATE_MODEL2 procedure, you can specify an algorithm and other characteristics of a model. In CREATE_MODEL procedure you can specify a settings table to specify an algorithm and other characteristics of a model. Some settings are general, some are specific to a machine learning technique, and some are specific to an algorithm.

Note:

Most types of machine learning models can be used to score data. However, it is possible to score data without applying a model. Dynamic scoring and predictive analytics return scoring results without a user-supplied model. They create and apply transient models that are not visible to you.

DBMS_DATA_MINING_TRANSFORM

The DBMS_DATA_MINING_TRANSFORM package contains routines that perform data transformations such as binning, normalization, and outlier treatment.

The package includes routines for:

  • Specifying transformations in a format that can be embedded in a machine learning model.

  • Specifying transformations as relational views (external to machine learning model objects).

  • Specifying distinct properties for columns in the build data. For example, you can specify that the column must be interpreted as unstructured text, or that the column must be excluded from Automatic Data Preparation.

Transformation Methods in DBMS_DATA_MINING_TRANSFORM

Summarizes the methods for transforming data in DBMS_DATA_MINING_TRANSFORM package.

Table 3-2 DBMS_DATA_MINING_TRANSFORM Transformation Methods

Transformation Method Description

XFORM interface

CREATE, INSERT, and XFORM routines specify transformations in external views

STACK interface

CREATE, INSERT, and XFORM routines specify transformations for embedding in a model

SET_TRANSFORM

Specifies transformations for embedding in a model

The statements in the following example create a Support Vector Machine (SVM) classification model called T_SVM_Clas_sample with an embedded transformation that causes the comments attribute to be treated as unstructured text data. The T_SVM_CLAS_SAMPLE model is created by oml4sql-classification-text-mining-svm.sql example.

Example 3-1 Sample Embedded Transformation

DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist, 'comments', null, 'comments', null, 'TEXT');
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'T_SVM_Clas_sample',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 't_svmc_sample_settings',
    xform_list => xformlist);
END;
/ 

Predictive Analytics

Predictive analytics is a technology that captures Oracle Machine Learning for SQL processes in simple routines.

Sometimes called "one-click machine learning," predictive analytics simplifies and automates the machine learning process.

Predictive analytics uses OML4SQL technology, but knowledge of OML4SQL is not needed to use predictive analytics. You can use predictive analytics by specifying an operation to perform on your data. You do not need to create or use OML4SQL models or understand the OML4SQL functions and algorithms summarized in "Oracle Machine Learning for SQL Basics ".

Oracle Machine Learning for SQL predictive analytics operations are described in the following table:

Table 3-3 Oracle Predictive Analytics Operations

Operation Description

EXPLAIN

Explains how individual predictors (columns) affect the variation of values in a target column

PREDICT

For each case (row), predicts the values in a target column

PROFILE

Creates a set of rules for cases (rows) that imply the same target value

The Oracle predictive analytics operations are implemented in the DBMS_PREDICTIVE_ANALYTICS PL/SQL package. They are also available in Oracle Data Miner.

DBMS_PREDICTIVE_ANALYTICS

The DBMS_PREDICTIVE_ANALYTICS package contains routines that perform an automated form of machine learning known as predictive analytics. With predictive analytics, you do not need to be aware of model building or scoring. All machine learning activities are handled internally by the procedure.

The DBMS_PREDICTIVE_ANALYTICS package includes these routines:

  • EXPLAIN ranks attributes in order of influence in explaining a target column.

  • PREDICT predicts the value of a target column based on values in the input data.

  • PROFILE generates rules that describe the cases from the input data.

The EXPLAIN statement in the following example lists attributes in the view mining_data_build_v in order of their importance in predicting affinity_card.

Example 3-2 Sample EXPLAIN Statement

BEGIN 
    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN( 
        data_table_name      => 'mining_data_build_v', 
        explain_column_name  => 'affinity_card', 
        result_table_name    => 'explain_results'); 
END; 
/

Oracle Machine Learning Data Dictionary Views

Lists Oracle Machine Learning data dictionary views.

The data dictionary views for Oracle Machine Learning are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.

Table 3-4 Data Dictionary Views for Oracle Machine Learning

View Name Description

ALL_MINING_MODELS

Provides information about all accessible machine learning models

ALL_MINING_MODEL_ATTRIBUTES

Provides information about the attributes of all accessible machine learning models

ALL_MINING_MODEL_PARTITIONS

Provides information about the partitions of all accessible partitioned machine learning models

ALL_MINING_MODEL_SETTINGS

Provides information about the configuration settings for all accessible machine learning models

ALL_MINING_MODEL_VIEWS

Provides information about the model views for all accessible machine learning models

ALL_MINING_MODEL_XFORMS

Provides the user-specified transformations embedded in all accessible machine learning models.

SQL Functions

Oracle Machine Learning for SQL supports SQL functions for performing prediction, clustering, and feature extraction.

The functions score data by applying an OML4SQL model object or by running an analytic clause that performs dynamic scoring.

The following example shows a query that applies the classification model svmc_sh_clas_sample to the data in the view mining_data_apply_v. The query returns the average age of customers who are likely to use an affinity card. The results are broken out by gender.

Example 3-3 The PREDICTION Function

SELECT cust_gender,
       COUNT(*) AS cnt,
       ROUND(AVG(age)) AS avg_age
  FROM mining_data_apply_v
 WHERE PREDICTION(svmc_sh_clas_sample USING *) = 1
GROUP BY cust_gender
ORDER BY cust_gender;
The output is as follows:

C        CNT    AVG_AGE
- ---------- ----------
F         59         41
M        409         45

Related Topics

Oracle Machine Learning for SQL Scoring Functions

Use OML4SQL functions score data. Functions can apply a machine learning model schema object to data or dynamically mine it with an analytic clause. SQL functions exist for all OML4SQL scoring algorithms.

All OML4SQL functions, as listed in the following table can operate on an R machine learning model with the corresponding OML4SQL function. However, the functions are not limited to the ones listed here.

Table 3-5 OML4SQL Functions

Function Description

CLUSTER_ID

Returns the ID of the predicted cluster

CLUSTER_DETAILS

Returns detailed information about the predicted cluster

CLUSTER_DISTANCE

Returns the distance from the centroid of the predicted cluster

CLUSTER_PROBABILITY

Returns the probability of a case belonging to a given cluster

CLUSTER_SET

Returns a list of all possible clusters to which a given case belongs along with the associated probability of inclusion

FEATURE_COMPARE Compares two similar and dissimilar set of texts from two different documents or keyword phrases or a combination of both

FEATURE_ID

Returns the ID of the feature with the highest coefficient value

FEATURE_DETAILS

Returns detailed information about the predicted feature

FEATURE_SET

Returns a list of objects containing all possible features along with the associated coefficients

FEATURE_VALUE

Returns the value of the predicted feature

ORA_DM_PARTITION_NAME Returns the partition names for a partitioned model

PREDICTION

Returns the best prediction for the target

PREDICTION_BOUNDS

(GLM only) Returns the upper and lower bounds of the interval wherein the predicted values (linear regression) or probabilities (logistic regression) lie.

PREDICTION_COST

Returns a measure of the cost of incorrect predictions

PREDICTION_DETAILS

Returns detailed information about the prediction

PREDICTION_PROBABILITY

Returns the probability of the prediction

PREDICTION_SET

Returns the results of a classification model, including the predictions and associated probabilities for each case

VECTOR_EMBEDDING

Generates a single vector embedding for different data types

The following example shows a query that returns the results of the CLUSTER_ID function. The query applies the model em_sh_clus_sample, which finds groups of customers that share certain characteristics. The query returns the identifiers of the clusters and the number of customers in each cluster. The em_sh_clus_sample model is created by the oml4sql-clustering-expectation-maximization.sql example.

Example 3-4 CLUSTER_ID Function

-- -List the clusters into which the customers in this
-- -data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt 
  FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;


-- List the clusters into which the customers in this
-- data set have been grouped.
--
SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt
FROM mining_data_apply_v
GROUP BY CLUSTER_ID(em_sh_clus_sample USING *)
ORDER BY cnt DESC;
 

The output is as follows:


      CLUS        CNT
---------- ----------
         9        311
         3        294
         7        215
        12        201
        17        123
        16        114
        14         86
        19         64
        15         56
        18         36

Oracle Machine Learning for SQL Statistical Functions

Various SQL statistical functions are available in Oracle Database to explore and analyze data.

A variety of scalable statistical functions are accessible through SQL in Oracle Database. These statistical functions are implemented as SQL functions. The SQL statistical functions can be used to compute standard univariate statistics such as MEAN, MAX, MIN, MEDIAN, MODE, and standard deviation on the data. Users can also perform various other statistical functions such as t-test, f-test, aggregate functions, analytic functions, or ANOVA. The functions listed in the following table are available from SQL.

Table 3-6 SQL Statistical Functions Supported by OML4SQL

Function Description
APPROX_COUNT Returns approximate count of an expression
APPROX_SUM Returns approximate sum of an expression
APPROX_RANK Returns approximate value in a group of values
CORR Retuns the coefficient of correlation of a set of number pairs
CORR_S Calculates the Spearman's rho correlation coefficient
CORR_K Calculates the Kendall's tau-b correlation coefficient
COVAR_POP Returns the population covariance of a set of number pairs
COVAR_SAMP Returns the sample covariance of a set of number pairs.
LAG LAG is an analytic function. It provides access to more than one row of a table at the same time without a self join.
LEAD LEAD is an analytic function. It provides access to more than one row of a table at the same time without a self join.
STATS_BINOMIAL_TEST STATS_BINOMIAL_TEST is an exact probability test used for dichotomous variables, where only two possible values exist.
STATS_CROSSTAB STATS_CROSSTAB is a method used to analyze two nominal variables.
STATS_F_TEST STATS_F_TEST tests whether two variances are significantly different.
STATS_KS_TEST STATS_KS_TEST is a Kolmogorov-Smirnov function that compares two samples to test whether they are from the same population or from populations that have the same distribution.
STATS_MODE Takes as its argument a set of values and returns the value that occurs with the greatest frequency
STATS_MW_TEST A Mann Whitney test compares two independent samples to test the null hypothesis that two populations have the same distribution function against the alternative hypothesis that the two distribution functions are different.
STATS_ONE_WAY_ANOVA Tests differences in means (for groups or variables) for statistical significance by comparing two different estimates of variance
STATS_T_TEST_* The t-test measures the significance of a difference of means
STATS_T_TEST_ONE A one-sample t-test
STATS_T_TEST_PAIRED A two-sample, paired t-test (also known as a crossed t-test)
STATS_T_TEST_INDEP and STATS_T_TEST_INDEPU A t-test of two independent groups with the same variance (pooled variances)

A t-test of two independent groups with unequal variance (unpooled variances)

STDDEV returns the sample standard deviation of a set of numbers
STDDEV_POP Computes the population standard deviation and returns the square root of the population variance
STDDEV_SAMP Computes the cumulative sample standard deviation and returns the square root of the sample variance
SUM Returns the sum of values

DBMS_STAT_FUNCS PL/SQL package is also available for users.

Oracle Data Miner

Oracle Machine Learning for SQL supports a graphical interface called Oracle Data Miner.

Oracle Data Miner is a graphical interface to OML4SQL. Oracle Data Miner is an extension to Oracle SQL Developer, which is available for download free of charge on the Oracle Technology Network.

Oracle Data Miner uses a work flow paradigm to capture, document, and automate the process of building, evaluating, and applying OML4SQL models. Within a work flow, you can specify data transformations, build and evaluate multiple models, and score multiple data sets. You can then save work flows and share them with other users.

Figure 3-1 An Oracle Data Miner Workflow

Description of Figure 3-1 follows
Description of "Figure 3-1 An Oracle Data Miner Workflow"

For information about Oracle Data Miner, including installation instructions, visit Oracle Technology Network.

Related Topics