8.1 About Machine Learning Classes and Algorithms

These classes provide access to in-database machine learning algorithms.

Algorithm Classes

| Class | Algorithm | Function of Algorithm | Description |
|---|---|---|---|
| oml.ai | Minimum Description Length | Attribute importance for classification or regression | Ranks attributes according to their importance in predicting a target. |
| oml.ar | Apriori | Association rules | Performs market basket analysis by identifying co-occurring items (frequent itemsets) within a set. |
| oml.dt | Decision Tree | Classification | Extracts predictive information in the form of human-understandable rules. The rules are if-then-else expressions; they explain the decisions that lead to the prediction. |
| oml.em | Expectation Maximization | Clustering | Performs probabilistic clustering based on a density estimation algorithm. |
| oml.esa | Explicit Semantic Analysis | Feature extraction | Extracts text-based features from a corpus of documents. Performs document similarity comparisons. |
| oml.glm | Generalized Linear Model | Classification, Regression | Implements logistic regression for classification of binary targets and linear regression for continuous targets. |
| oml.km | k-Means | Clustering | Uses unsupervised learning to group data based on similarity into a predetermined number of clusters. |
| oml.nb | Naive Bayes | Classification | Makes predictions by deriving the probability of a prediction from the underlying evidence, as observed in the data. |
| oml.nn | Neural Network | Classification, Regression | Learns from examples and tunes the weights of the connections among the neurons during the learning process. |
| oml.rf | Random Forest | Classification | Provides an ensemble learning technique for classification of data. |
| oml.svd | Singular Value Decomposition | Feature extraction | Performs orthogonal linear transformations that capture the underlying variance of the data by decomposing a rectangular matrix into three matrices. |
| oml.svm | Support Vector Machine | Anomaly detection, Classification, Regression | Builds a model that is a profile of a class, which, when the model is applied, identifies cases that are somehow different from that profile. |

Repeatable Results

You can use the case_id parameter in the fit method of the OML4Py machine learning algorithm classes to achieve repeatable sampling, data splits (train and held aside), and random data shuffling.
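
To see why a stable case identifier makes a split repeatable, consider the following minimal sketch. It is plain Python and does not show how OML4Py works internally: the helper name split_by_case_id and the hashing scheme are assumptions made for illustration. Each case is assigned to the training or held-aside set from a hash of its ID, so the assignment does not depend on row order.

```python
import hashlib


def split_by_case_id(case_ids, train_fraction=0.8):
    """Assign each case to the train or held-aside set from a hash of its ID.

    Hypothetical helper for illustration only; it is not part of OML4Py.
    """
    train, held_aside = [], []
    for cid in case_ids:
        digest = hashlib.sha256(str(cid).encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        (train if bucket < train_fraction else held_aside).append(cid)
    return train, held_aside


ids = list(range(1, 101))
train_a, held_a = split_by_case_id(ids)
# Reordering the input does not change which IDs land in each set.
train_b, held_b = split_by_case_id(list(reversed(ids)))
```

Because the assignment depends only on the ID, rerunning the split on the same cases yields the same two sets, which is the property a case ID column provides.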

Persisting Models

In-database models created through the OML4Py API exist as temporary objects that are dropped when the database connection ends unless you take one of the following actions:

  • Save a default-named model object in a datastore, as in the following example:
    regr2 = oml.glm("regression")
    oml.ds.save({'regr2': regr2}, 'regression2')
  • Use the model_name parameter in the fit method when building the model, as in the following example:
    regr2 = regr2.fit(X, y, model_name = 'regression2')
  • Change the name of an existing model using the model_name function of the model, as in the following example:
    regr2(model_name = 'myRegression2')

To drop a persistent named model, use the oml.drop function.
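
The naming and dropping steps can be combined into a small sketch like the one below. The helper names fit_and_persist and drop_persisted are hypothetical, the oml argument stands for the imported OML4Py module with an active database connection, and the model keyword of oml.drop is an assumption to verify against the API reference.

```python
def fit_and_persist(oml, X, y, name):
    """Build a GLM regression model that persists under `name`.

    `oml` is assumed to be the imported OML4Py module with an active
    connection; X and y are assumed to be oml.DataFrame proxy objects.
    """
    model = oml.glm("regression")
    # Naming the model at fit time keeps it after the connection ends.
    return model.fit(X, y, model_name=name)


def drop_persisted(oml, name):
    """Drop a named in-database model (assumes the `model` keyword)."""
    oml.drop(model=name)
```

Passing the module in as an argument keeps the helpers importable without a database connection; in a notebook session you would typically call oml.glm and oml.drop directly.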

Creating a Model from an Existing In-Database Model

You can create an OML4Py model as a proxy object for an existing in-database machine learning model. The in-database model could have been created through OML4Py, OML4SQL, or OML4R. To do so, when creating the OML4Py model object, specify the name of the existing model and, optionally, the name of the owner of the model, as in the following example.

ar_mod = oml.ar(model_name = 'existing_ar_model', model_owner = 'SH', **setting)

An OML4Py model created this way persists until you drop it with the oml.drop function.

Scoring New Data with a Model

For most of the OML4Py machine learning classes, you can use the predict and predict_proba methods of the model object to score new data.
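
As a minimal sketch of that scoring pattern, the helper below calls both methods on a fitted model. The name score_new_data is hypothetical, and the code assumes a classifier that exposes predict and predict_proba and accepts the new data as its only required argument.

```python
def score_new_data(model, new_data):
    """Return predictions and class probabilities for new_data.

    `model` is assumed to be a fitted OML4Py classifier (or any object
    exposing predict and predict_proba); `new_data` is assumed to hold
    the same predictor columns that were used to build the model.
    """
    predictions = model.predict(new_data)
    probabilities = model.predict_proba(new_data)
    return predictions, probabilities
```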

For in-database models, you can use the SQL PREDICTION function on model proxy objects, which scores directly in the database. You can use in-database models directly from SQL if you prepare the data properly. For open source models, you can use Embedded Python Execution and enable data-parallel execution for performance and scalability.
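
For the SQL route, a query that applies the PREDICTION function to every row of a table might be assembled as in the sketch below. The helper prediction_query, the table new_customers, and the column cust_id are hypothetical names used for illustration; the model name reuses regression2 from the earlier example.

```python
def prediction_query(model_name, table, id_column):
    """Build a SQL statement that scores every row of `table`.

    Identifiers are interpolated directly into the text, so pass only
    trusted names. USING * supplies all columns of the row as predictors.
    """
    return (
        f"SELECT {id_column}, "
        f"PREDICTION({model_name} USING *) AS prediction "
        f"FROM {table}"
    )


# Hypothetical table and column names.
sql = prediction_query("regression2", "new_customers", "cust_id")
```

The resulting statement can be run from any SQL client connected to the schema that owns the model, provided the table columns match the predictors the model was built on.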

Deploying Models Through a REST API

The REST API for Oracle Machine Learning Services provides REST endpoints hosted on an Oracle Autonomous Database instance. These endpoints allow you to store OML models along with their metadata, and to create scoring endpoints for the models.
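
A call to a scoring endpoint could be prepared along the lines of the sketch below, which builds the HTTP request without sending it. The path layout (/omlmod/v1/deployment/<uri>/score), the inputRecords key, the bearer-token header, and every value passed in are assumptions for illustration; check them against the OML Services REST API reference for your instance.

```python
import json
import urllib.request


def build_score_request(base_url, deployment_uri, token, records):
    """Build (but do not send) a scoring request for a deployed model.

    The URL path and the 'inputRecords' payload key are assumptions,
    not confirmed by this document.
    """
    url = f"{base_url.rstrip('/')}/omlmod/v1/deployment/{deployment_uri}/score"
    body = json.dumps({"inputRecords": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host, deployment name, token, and record.
req = build_score_request(
    "https://example.invalid", "my_model", "TOKEN", [{"AGE": 42}]
)
```

Sending the request with urllib.request.urlopen(req) would return the scoring response, assuming the token is valid and the deployment exists.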