1 Introduction to Oracle Machine Learning for SQL
Introduces Oracle Machine Learning for SQL to perform a variety of machine learning tasks.
1.1 About Oracle Machine Learning for SQL
Oracle Machine Learning for SQL (OML4SQL) provides scalable in-database machine learning algorithms through PL/SQL and SQL APIs. The algorithms are fast and scalable, support algorithm-specific automatic data preparation, and can score in batch or real-time.
OML4SQL provides a powerful, state-of-the-art machine learning capability within Oracle Database. The parallelized algorithms in the database keep data under database control. There is no need to extract data to separate machine learning engines, which adds latency to data access and raises concerns about data security, storage, and recency. The algorithms are fast and scalable, support algorithm-specific automatic data preparation, and can score in batch or real-time. You can use OML4SQL to build and deploy predictive and descriptive machine learning applications, to add intelligent capabilities to existing applications, and to generate predictive queries for data exploration. OML4SQL provides explanatory prediction details when scoring data, so you can understand why an individual prediction is made.
OML4SQL offers a broad set of in-database algorithms for performing a variety of machine learning tasks, such as classification, regression, anomaly detection, feature extraction, clustering, and market basket analysis. The algorithms can work on standard case data, transactional data, star schemas, and unstructured text data. OML4SQL is uniquely suited to the analysis of very large data sets.
Oracle Machine Learning for SQL, along with Oracle Machine Learning for R and Oracle Machine Learning for Python, is a component of Oracle Machine Learning that provides three powerful APIs for in-database machine learning, among other features.
1.2 Oracle Machine Learning for SQL in the Database Kernel
Learn about the implementation of Oracle Machine Learning for SQL (OML4SQL) in Oracle Database kernel and its advantages.
OML4SQL is implemented in the Oracle Database kernel. OML4SQL models are first class database objects. Oracle Machine Learning for SQL processes use built-in features of Oracle Database to maximize scalability and make efficient use of system resources.
OML4SQL within Oracle Database offers many advantages:
-
No Data Movement: Some machine learning products require that the data be exported from a corporate database and converted to a specialized format. With OML4SQL, no data movement or conversion is needed. This makes the entire process less complex, time-consuming, and error-prone, and it allows for the analysis of very large data sets.
-
Security: Your data is protected by the extensive security mechanisms of Oracle Database. Moreover, specific database privileges are needed for different machine learning activities. Only users with the appropriate privileges can define, manipulate, or apply machine learning model objects.
-
Data Preparation and Administration: Most data must be cleansed, filtered, normalized, sampled, and transformed in various ways before it can be mined. Up to 80% of the effort in a machine learning project is often devoted to data preparation. OML4SQL can automatically manage key steps in the data preparation process. Additionally, Oracle Database provides extensive administrative tools for preparing and managing data.
-
Ease of Data Refresh: Machine learning processes within Oracle Database have ready access to refreshed data. OML4SQL can easily deliver machine learning results based on current data, thereby maximizing its timeliness and relevance.
-
Oracle Database Analytics: Oracle Database offers many features for advanced analytics and business intelligence. You can easily integrate machine learning with other analytical features of the database, such as statistical analysis and analytic views.
-
Oracle Technology Stack: You can take advantage of all aspects of Oracle's technology stack to integrate machine learning within a larger framework for business intelligence or scientific inquiry.
-
Domain Environment: Machine learning models have to be built, tested, validated, managed, and deployed in their appropriate application domain environments. Machine learning results may need to be post-processed as part of domain specific computations (for example, calculating estimated risks and response probabilities) and then stored into permanent repositories or data warehouses. With OML4SQL, the pre- and post-machine learning activities can all be accomplished within the same environment.
-
Application Programming Interfaces: The PL/SQL API and SQL language operators provide direct access to OML4SQL functionality in Oracle Database.
Related Topics
1.3 Oracle Machine Learning for SQL in Oracle Exadata
Understand how complex scoring and algorithmic processing is done using Oracle Exadata.
Scoring refers to the process of applying a OML4SQL model to data to generate predictions. The scoring process may require significant system resources. Vast amounts of data may be involved, and algorithmic processing may be very complex.
With OML4SQL, scoring can be off-loaded to intelligent Oracle Exadata Storage Servers where processing is extremely performant.
Oracle Exadata Storage Servers combine Oracle's smart storage software and Oracle's industry-standard hardware to deliver the industry's highest database storage performance. For more information about Oracle Exadata, visit the Oracle Technology Network.
Related Topics
1.4 About Partitioned Models
Introduces partitioned models to organize and represent multiple models.
When you build a model on your data set and apply it to new data, sometimes the
prediction may be generic that performs badly when run on new and evolving data. To
overcome this, the data set can be divided into different parts based on some
characteristics. Oracle Machine Learning for SQL supports
partitioned model. Partitioned models allow users to build a type of ensemble model for
each data partition. The top-level model has sub models that are automatically produced.
The sub models are based on the attribute options. For example, if your data set has an
attribute called REGION
with four values and you have defined it as the
partitioned attribute. Then, four sub models are created for this attribute. The sub
models are automatically managed and used as a single model. The partitioned model
automates a typical machine learning task and can potentially
achieve better accuracy through multiple targeted models.
The partitioned model and its sub models reside as first class, persistent database objects. Persistent means that the partitioned model has an on-disk representation. In a partition model, the performance of partitioned models with a large number of partitions is enhanced, and dropping a single model within a partition model is also improved.
To create a partitioned model, include the ODMS_PARTITION_COLUMNS
setting. To define the number of partitions, include the ODMS_MAX_PARTITIONS
setting. When you are making predictions, you must use the top-level model. The correct sub model is selected automatically based on the attribute, the attribute options, and the partition setting. You must include the partition columns as part of the USING
clause when scoring. The GROUPING
hint is an optional hint that applies to machine learning scoring functions when scoring partitioned models.
The partition names, key values, and the structure of the partitioned model are available in the ALL_MINING_MODEL_PARTITIONS
view.
Related Topics
See Also:
Oracle Database SQL Language Reference on how to useGROUPING
hint.
Oracle Machine Learning for SQL User’s Guide to understand more about partitioned models.
1.5 Interfaces to Oracle Machine Learning for SQL
Introduces supported interfaces for Oracle Machine Learning for SQL.
The programmatic interfaces to Oracle Machine Learning for SQL are PL/SQL for building and maintaining models and a family of SQL functions for scoring. OML4SQL also supports a graphical user interface, which is implemented as an extension to Oracle SQL Developer.
Oracle Predictive Analytics, a set of simplified OML4SQL routines, is built on top of OML4SQL and is implemented as a PL/SQL package.
1.5.1 PL/SQL API
The OML4SQL PL/SQL API is built into the DBMS_DATA_MINING
PL/SQL package, which has routines for building, testing, and maintaining machine learning models. This package also has a batch apply operation.
The following example shows part of a simple PL/SQL script for creating an SVM classification model called SVMC_SH_Clas_sample. The model build uses weights, specified in a weights table, and settings, specified in a settings table. The weights influence the weighting of target classes. The settings override default behavior. The model uses Automatic Data Preparation (prep_auto_on
setting). The model is trained on the data in mining_data_build_v.
Example 1-1 Creating a Classification Model
----------------------- CREATE AND POPULATE A CLASS WEIGHTS TABLE ------------ CREATE TABLE svmc_sh_sample_class_wt ( target_value NUMBER, class_weight NUMBER); INSERT INTO svmc_sh_sample_class_wt VALUES (0,0.35); INSERT INTO svmc_sh_sample_class_wt VALUES (1,0.65); COMMIT; ----------------------- CREATE AND POPULATE A SETTINGS TABLE ------------------ CREATE TABLE svmc_sh_sample_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(4000)); BEGIN INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines); INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.svms_kernel_function, dbms_data_mining.svms_linear); INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.clas_weights_table_name, 'svmc_sh_sample_class_wt'); INSERT INTO svmc_sh_sample_settings (setting_name, setting_value) VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on); END; / ------------------------ CREATE THE MODEL ------------------------------------- BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'SVMC_SH_Clas_sample', mining_function => dbms_data_mining.classification, data_table_name => 'mining_data_build_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', settings_table_name => 'svmc_sh_sample_settings'); END; /
1.5.2 SQL Functions
Oracle Machine Learning for SQL supports SQL functions for performing prediction, clustering, and feature extraction.
The functions score data by applying an OML4SQL model object or by running an analytic clause that performs dynamic scoring.
The following example shows a query that applies the classification model svmc_sh_clas_sample
to the data in the view mining_data_apply_v
. The query returns the average age of customers who are likely to use an affinity card. The results are broken out by gender.
Example 1-2 The PREDICTION Function
SELECT cust_gender, COUNT(*) AS cnt, ROUND(AVG(age)) AS avg_age FROM mining_data_apply_v WHERE PREDICTION(svmc_sh_clas_sample USING *) = 1 GROUP BY cust_gender ORDER BY cust_gender;The output is as follows:
C CNT AVG_AGE
- ---------- ----------
F 59 41
M 409 45
Related Topics
1.5.3 Oracle Data Miner
Oracle Machine Learning for SQL supports a graphical interface called Oracle Data Miner.
Oracle Data Miner is a graphical interface to OML4SQL. Oracle Data Miner is an extension to Oracle SQL Developer, which is available for download free of charge on the Oracle Technology Network.
Oracle Data Miner uses a work flow paradigm to capture, document, and automate the process of building, evaluating, and applying OML4SQL models. Within a work flow, you can specify data transformations, build and evaluate multiple models, and score multiple data sets. You can then save work flows and share them with other users.
For information about Oracle Data Miner, including installation instructions, visit Oracle Technology Network.
Related Topics
1.5.4 Predictive Analytics
Predictive analytics is a technology that captures Oracle Machine Learning for SQL processes in simple routines.
Sometimes called "one-click machine learning," predictive analytics simplifies and automates the machine learning process.
Predictive analytics uses OML4SQL technology, but knowledge of OML4SQL is not needed to use predictive analytics. You can use predictive analytics by specifying an operation to perform on your data. You do not need to create or use OML4SQL models or understand the OML4SQL functions and algorithms summarized in "Oracle Machine Learning for SQL Basics ".
Oracle Machine Learning for SQL predictive analytics operations are described in the following table:
Table 1-1 Oracle Predictive Analytics Operations
Operation | Description |
---|---|
|
Explains how individual predictors (columns) affect the variation of values in a target column |
|
For each case (row), predicts the values in a target column |
|
Creates a set of rules for cases (rows) that imply the same target value |
The Oracle predictive analytics operations are implemented in the DBMS_PREDICTIVE_ANALYTICS
PL/SQL package. They are also available in Oracle Data Miner.
1.6 Overview of Database Analytics
Oracle Database supports native analytical features. Since all these features are on a common server, they can be combined efficiently. Analytical results can be integrated with Oracle Business Intelligence Suite Enterprise Edition and other BI tools.
The possibilities for combining different analytics are virtually limitless. Example 1-3 shows Oracle Machine Learning for SQL and text processing within a single SQL query. The query selects all customers who have a high propensity to attrite (> 80% chance), are valuable customers (customer value rating > 90), and have had a recent conversation with customer services regarding a Checking Plus account. The propensity to attrite information is computed using a OML4SQL model called tree_model
. The query uses the Oracle Text CONTAINS
operator to search call center notes for references to Checking Plus accounts.
The following table shows some of the built-in analytics that Oracle Database can do:
Table 1-2 Oracle Database Native Analytics
Analytical Feature | Description | Documented In... |
---|---|---|
Complex data transformations |
Data transformation is a key aspect of analytical applications and ETL (extract, transform, and load). You can use SQL expressions to implement data transformations, or you can use the
|
|
Oracle Database provides a long list of SQL statistical functions with support for: hypothesis testing (such as t-test, F-test), correlation computation (such as pearson correlation), cross-tab statistics, and descriptive statistics (such as median and mode). The |
Oracle Database SQL Language Reference and Oracle Database PL/SQL Packages and Types Reference |
|
Window and analytic SQL functions |
Oracle Database supports analytic and windowing functions for computing cumulative, moving, and centered aggregates. With windowing aggregate functions, you can calculate moving and cumulative versions of |
|
Linear algebra |
The |
|
OLAP |
Oracle OLAP supports multidimensional analysis and can be used to improve performance of multidimensional queries. Oracle OLAP provides functionality previously found only in specialized OLAP databases. Moving beyond drill-downs and roll-ups, Oracle OLAP also supports time-series analysis, modeling, and forecasting. |
|
Spatial analytics |
Oracle Spatial provides advanced spatial features to support high-end GIS and LBS solutions. Oracle Spatial's analysis and machine learning capabilities include functions for binning, detection of regional patterns, spatial correlation, colocation machine learning, and spatial clustering. Oracle Spatial also includes support for topology and network data models and analytics. The topology data model of Oracle Spatial allows one to work with data about nodes, edges, and faces in a topology. It includes network analysis functions for computing shortest path, minimum cost spanning tree, nearest-neighbors analysis, traveling salesman problem, among others. |
|
Graph |
The Property Graph delivers advanced graph query and analytics capabilities in Oracle Database. The in-memory graph server (PGX) provides a machine learning library, which supports graph-empowered machine learning algorithms. The machine learning library supports DeepWalk, supervised GraphWise, and Pg2vec algorithms. |
|
Text Analysis |
Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the web. Oracle Text also supports automatic classification and clustering of document collections. Many of the analytical features of Oracle Text are layered on top of Oracle Machine Learning functionality. |
Example 1-3 SQL Query Combining Oracle Machine Learning for SQL and Oracle Text
SELECT A.cust_name, A.contact_info FROM customers A WHERE PREDICTION_PROBABILITY(tree_model, 'attrite' USING A.*) > 0.8 AND A.cust_value > 90 AND A.cust_id IN (SELECT B.cust_id FROM call_center B WHERE B.call_date BETWEEN '01-Jan-2005' AND '30-Jun-2005' AND CONTAINS(B.notes, 'Checking Plus', 1) > 0);