Interfaces to Oracle Machine Learning for SQL
Introduces supported interfaces for Oracle Machine Learning for SQL.
The programmatic interfaces to Oracle Machine Learning for SQL are PL/SQL for building and maintaining models and a family of SQL functions for scoring. OML4SQL also supports a graphical user interface, which is implemented as an extension to Oracle SQL Developer.
Oracle Predictive Analytics, a set of simplified OML4SQL routines, is built on top of OML4SQL and is implemented as a PL/SQL package.
Oracle Machine Learning Modeling, Transformations, and Convenience Functions
You can access PL/SQL interface to perform data modeling, transformations, and predictive analytics.
The following table displays the PL/SQL packages for Oracle Machine Learning. In Oracle Database releases prior to Release 21c, Oracle Machine Learning was named Oracle Data Mining.
Table 3-1 Oracle Machine Learning PL/SQL Packages
Package Name | Description |
---|---|
|
|
|
|
|
Related Topics
DBMS_DATA_MINING
The DBMS_DATA_MINING
package contains routines for creating machine learning models, for performing operations on the models, and for querying them.
The package includes routines for:
-
Creating, dropping, and performing other DDL operations on machine learning models
-
Obtaining detailed information about model attributes, rules, and other information internal to the model (model details)
-
Computing test metrics for classification models
-
Specifying costs for classification models
-
Exporting and importing models
-
Building models using Oracle Machine Learning native algorithms as well as algorithms written in R
Related Topics
About Oracle Machine Learning Models
Machine learning models are database schema objects that perform machine learning techniques.
As with all schema objects, access to machine learning models is controlled by database privileges. Models can be exported and imported. They support comments and they can be tracked in the Oracle Database auditing system.
Machine learning models are created by the
CREATE_MODEL2
or the CREATE_MODEL
procedures in
the DBMS_DATA_MINING
PL/SQL package. Models are created for a specific
machine learning technique, and they use a specific
algorithm to perform that function. Machine learning technique
is a term that refers to a class of machine learning problems to be
solved. Examples of machine learning techniques
are: regression, classification, attribute importance, clustering, anomaly detection, and feature selection. OML4SQL supports one or more algorithms for each machine learning technique.
Along with the machine learning technique, in the CREATE_MODEL2
procedure, you can specify an algorithm and other characteristics of a model. In CREATE_MODEL
procedure you can specify a settings table to specify an algorithm and other characteristics of a model. Some settings are general, some are specific to a machine learning technique, and some are specific to an algorithm.
Note:
Most types of machine learning models can be used to score data. However, it is possible to score data without applying a model. Dynamic scoring and predictive analytics return scoring results without a user-supplied model. They create and apply transient models that are not visible to you.
DBMS_DATA_MINING_TRANSFORM
The DBMS_DATA_MINING_TRANSFORM
package contains routines that perform data transformations such as binning, normalization, and outlier treatment.
The package includes routines for:
-
Specifying transformations in a format that can be embedded in a machine learning model.
-
Specifying transformations as relational views (external to machine learning model objects).
-
Specifying distinct properties for columns in the build data. For example, you can specify that the column must be interpreted as unstructured text, or that the column must be excluded from Automatic Data Preparation.
Related Topics
Transformation Methods in DBMS_DATA_MINING_TRANSFORM
Summarizes the methods for transforming data in DBMS_DATA_MINING_TRANSFORM
package.
Table 3-2 DBMS_DATA_MINING_TRANSFORM Transformation Methods
Transformation Method | Description |
---|---|
|
|
|
|
|
Specifies transformations for embedding in a model |
oml4sql-classification-text-mining-svm.sql
example.
Example 3-1 Sample Embedded Transformation
DECLARE xformlist dbms_data_mining_transform.TRANSFORM_LIST; BEGIN dbms_data_mining_transform.SET_TRANSFORM( xformlist, 'comments', null, 'comments', null, 'TEXT'); DBMS_DATA_MINING.CREATE_MODEL( model_name => 'T_SVM_Clas_sample', mining_function => dbms_data_mining.classification, data_table_name => 'mining_build_text', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', settings_table_name => 't_svmc_sample_settings', xform_list => xformlist); END; /
Predictive Analytics
Predictive analytics is a technology that captures Oracle Machine Learning for SQL processes in simple routines.
Sometimes called "one-click machine learning," predictive analytics simplifies and automates the machine learning process.
Predictive analytics uses OML4SQL technology, but knowledge of OML4SQL is not needed to use predictive analytics. You can use predictive analytics by specifying an operation to perform on your data. You do not need to create or use OML4SQL models or understand the OML4SQL functions and algorithms summarized in "Oracle Machine Learning for SQL Basics ".
Oracle Machine Learning for SQL predictive analytics operations are described in the following table:
Table 3-3 Oracle Predictive Analytics Operations
Operation | Description |
---|---|
|
Explains how individual predictors (columns) affect the variation of values in a target column |
|
For each case (row), predicts the values in a target column |
|
Creates a set of rules for cases (rows) that imply the same target value |
The Oracle predictive analytics operations are implemented in the DBMS_PREDICTIVE_ANALYTICS
PL/SQL package. They are also available in Oracle Data Miner.
Related Topics
DBMS_PREDICTIVE_ANALYTICS
The DBMS_PREDICTIVE_ANALYTICS
package contains routines that perform an automated form of machine learning known as predictive analytics. With predictive analytics, you do not need to be aware of model building or scoring. All machine learning activities are handled internally by the procedure.
The DBMS_PREDICTIVE_ANALYTICS
package includes these routines:
-
EXPLAIN
ranks attributes in order of influence in explaining a target column. -
PREDICT
predicts the value of a target column based on values in the input data. -
PROFILE
generates rules that describe the cases from the input data.
The EXPLAIN
statement in the following example lists attributes in the view mining_data_build_v
in order of their importance in predicting affinity_card
.
Related Topics
Oracle Machine Learning Data Dictionary Views
Lists Oracle Machine Learning data dictionary views.
The data dictionary views for Oracle Machine Learning are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.
Table 3-4 Data Dictionary Views for Oracle Machine Learning
View Name | Description |
---|---|
Provides information about all accessible machine learning models |
|
Provides information about the attributes of all accessible machine learning models |
|
Provides information about the partitions of all accessible partitioned machine learning models |
|
Provides information about the configuration settings for all accessible machine learning models |
|
Provides information about the model views for all accessible machine learning models |
|
Provides the user-specified transformations embedded in all accessible machine learning models. |
SQL Functions
Oracle Machine Learning for SQL supports SQL functions for performing prediction, clustering, and feature extraction.
The functions score data by applying an OML4SQL model object or by running an analytic clause that performs dynamic scoring.
The following example shows a query that applies the classification model svmc_sh_clas_sample
to the data in the view mining_data_apply_v
. The query returns the average age of customers who are likely to use an affinity card. The results are broken out by gender.
Example 3-3 The PREDICTION Function
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION(svmc_sh_clas_sample USING *) = 1 GROUP BY cust_gender ORDER BY cust_gender;The output is as follows:
C CNT AVG_AGE
- ---------- ----------
F 59 41
M 409 45
Related Topics
Oracle Machine Learning for SQL Scoring Functions
Use OML4SQL functions score data. Functions can apply a machine learning model schema object to data or dynamically mine it with an analytic clause. SQL functions exist for all OML4SQL scoring algorithms.
All OML4SQL functions, as listed in the following table can operate on an R machine learning model with the corresponding OML4SQL function. However, the functions are not limited to the ones listed here.
Table 3-5 OML4SQL Functions
Function | Description |
---|---|
Returns the ID of the predicted cluster |
|
Returns detailed information about the predicted cluster |
|
Returns the distance from the centroid of the predicted cluster |
|
Returns the probability of a case belonging to a given cluster |
|
Returns a list of all possible clusters to which a given case belongs along with the associated probability of inclusion |
|
FEATURE_COMPARE |
Compares two similar and dissimilar set of texts from two different documents or keyword phrases or a combination of both |
Returns the ID of the feature with the highest coefficient value |
|
Returns detailed information about the predicted feature |
|
Returns a list of objects containing all possible features along with the associated coefficients |
|
Returns the value of the predicted feature |
|
ORA_DM_PARTITION_NAME |
Returns the partition names for a partitioned model |
Returns the best prediction for the target |
|
(GLM only) Returns the upper and lower bounds of the interval wherein the predicted values (linear regression) or probabilities (logistic regression) lie. |
|
Returns a measure of the cost of incorrect predictions |
|
Returns detailed information about the prediction |
|
Returns the probability of the prediction |
|
Returns the results of a classification model, including the predictions and associated probabilities for each case |
|
Generates a single vector embedding for different data types |
The following example shows a query that returns the
results of the CLUSTER_ID
function. The query applies the model
em_sh_clus_sample, which finds groups of customers that share certain characteristics. The
query returns the identifiers of the clusters and the number of customers in each cluster.
The em_sh_clus_sample model is created by the
oml4sql-clustering-expectation-maximization.sql
example.
Example 3-4 CLUSTER_ID Function
-- -List the clusters into which the customers in this -- -data set have been grouped. -- SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt FROM mining_data_apply_v GROUP BY CLUSTER_ID(em_sh_clus_sample USING *) ORDER BY cnt DESC; -- List the clusters into which the customers in this -- data set have been grouped. -- SELECT CLUSTER_ID(em_sh_clus_sample USING *) AS clus, COUNT(*) AS cnt FROM mining_data_apply_v GROUP BY CLUSTER_ID(em_sh_clus_sample USING *) ORDER BY cnt DESC;
The output is as follows:
CLUS CNT
---------- ----------
9 311
3 294
7 215
12 201
17 123
16 114
14 86
19 64
15 56
18 36
Related Topics
Oracle Machine Learning for SQL Statistical Functions
Various SQL statistical functions are available in Oracle Database to explore and analyze data.
A variety of scalable statistical functions are accessible through SQL in Oracle Database. These statistical functions are implemented as SQL functions. The SQL statistical functions can be used to compute standard univariate statistics such as MEAN
, MAX
, MIN
, MEDIAN
, MODE
, and standard deviation on the data. Users can also perform various other statistical functions such as t-test, f-test, aggregate functions, analytic functions, or ANOVA. The functions listed in the following table are available from SQL.
Table 3-6 SQL Statistical Functions Supported by OML4SQL
Function | Description |
---|---|
APPROX_COUNT |
Returns approximate count of an expression |
APPROX_SUM |
Returns approximate sum of an expression |
APPROX_RANK |
Returns approximate value in a group of values |
CORR |
Retuns the coefficient of correlation of a set of number pairs |
CORR_S |
Calculates the Spearman's rho correlation coefficient |
CORR_K |
Calculates the Kendall's tau-b correlation coefficient |
COVAR_POP |
Returns the population covariance of a set of number pairs |
COVAR_SAMP |
Returns the sample covariance of a set of number pairs. |
LAG |
LAG is an analytic function. It provides access to more than one row of a table at the same time without a self join.
|
LEAD |
LEAD is an analytic function. It provides access to more than one row of a table at the same time without a self join.
|
STATS_BINOMIAL_TEST |
STATS_BINOMIAL_TEST is an exact probability test used for dichotomous variables, where only two possible values exist.
|
STATS_CROSSTAB |
STATS_CROSSTAB is a method used to analyze two nominal variables.
|
STATS_F_TEST |
STATS_F_TEST tests whether two variances are significantly different.
|
STATS_KS_TEST |
STATS_KS_TEST is a Kolmogorov-Smirnov function that compares two samples to test whether they are from the same population or from populations that have the same distribution.
|
STATS_MODE |
Takes as its argument a set of values and returns the value that occurs with the greatest frequency |
STATS_MW_TEST |
A Mann Whitney test compares two independent samples to test the null hypothesis that two populations have the same distribution function against the alternative hypothesis that the two distribution functions are different. |
STATS_ONE_WAY_ANOVA |
Tests differences in means (for groups or variables) for statistical significance by comparing two different estimates of variance |
STATS_T_TEST_* |
The t-test measures the significance of a difference of means |
STATS_T_TEST_ONE |
A one-sample t-test |
STATS_T_TEST_PAIRED |
A two-sample, paired t-test (also known as a crossed t-test) |
STATS_T_TEST_INDEP and STATS_T_TEST_INDEPU |
A t-test of two independent groups with the same variance (pooled variances)
A t-test of two independent groups with unequal variance (unpooled variances) |
STDDEV |
returns the sample standard deviation of a set of numbers |
STDDEV_POP |
Computes the population standard deviation and returns the square root of the population variance |
STDDEV_SAMP |
Computes the cumulative sample standard deviation and returns the square root of the sample variance |
SUM |
Returns the sum of values |
DBMS_STAT_FUNCS
PL/SQL package is also available for users.
Oracle Data Miner
Oracle Machine Learning for SQL supports a graphical interface called Oracle Data Miner.
Oracle Data Miner is a graphical interface to OML4SQL. Oracle Data Miner is an extension to Oracle SQL Developer, which is available for download free of charge on the Oracle Technology Network.
Oracle Data Miner uses a work flow paradigm to capture, document, and automate the process of building, evaluating, and applying OML4SQL models. Within a work flow, you can specify data transformations, build and evaluate multiple models, and score multiple data sets. You can then save work flows and share them with other users.
For information about Oracle Data Miner, including installation instructions, visit Oracle Technology Network.
Related Topics