Changes in This Release for Oracle Data Mining Concepts Guide

Changes in this release for Oracle Data Mining Concepts.

Changes in Oracle Data Mining 18c

The following changes are documented in Oracle Data Mining Concepts for 18c.

New Features

The following features are new in this release:

New Mining Function

Time Series

Time series analysis forecasts future values based on past history, for example, forecasting sales from the prior sequence of sales. Forecasting is a critical component of business and governmental decision making.

See Time Series.
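
As an illustration of forecasting from past history, the following minimal Python sketch extends the average past change of a series into the future (the "drift" baseline). It is not the Oracle Data Mining implementation. The function name `drift_forecast` and the sample sales figures are hypothetical and chosen only for this example.

```python
def drift_forecast(history, horizon):
    """Forecast `horizon` future values by extending the average past change.

    Hypothetical helper for illustration only; not an Oracle API.
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # Average change per step between the first and last observation.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    # Project the last observed value forward one step at a time.
    return [history[-1] + slope * h for h in range(1, horizon + 1)]


# Made-up sales history, used only to exercise the function.
sales = [100.0, 110.0, 120.0, 130.0]
print(drift_forecast(sales, 2))  # prints [140.0, 150.0]
```

A baseline like this is useful mainly as a point of comparison for richer forecasting models.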

New Algorithms

  • Random Forest

    Random Forest is a powerful machine learning algorithm. It uses an ensemble method that combines multiple trees built with random feature selection. Effectively, individual trees are built in random subspaces and combined using the bagging ensemble method.

    Random Forest is a very popular algorithm that performs well on a number of benchmarks. It is part of Oracle R Enterprise (ORE), but that implementation is based on a public R package. Implementing it as kernel code brings significant performance and scalability benefits.

    See Random Forest.

  • Enhanced Explicit Semantic Analysis Machine Learning Algorithm to Support Classification

    Explicit Semantic Analysis (ESA) is exposed in Oracle Database 12c Release 2 as a topic model only, under FEATURE_EXTRACTION. It typically uses hundreds of thousands of explicit features. The algorithm can be easily adapted to perform Classification and so address use cases with hundreds of thousands of classes, a challenging Classification problem that is not appropriately addressed by the current Oracle Advanced Analytics (OAA) algorithms.

    The task of large text classification is very important in the context of big data. Extending ESA to Classification significantly enhances our offering in the text classification domain and allows OAA to address use cases which are currently intractable for the product.

    See Explicit Semantic Analysis.

  • Neural Network

    The Neural Network algorithm is a biologically inspired approach where a collection of interconnected units (neurons) learn to approximate a function. Neural Networks are appropriate for nonlinear approximation in both Classification and Regression problems.

    Neural networks are powerful algorithms that can learn arbitrary nonlinear functions. They have been successfully used in a number of hard problems, including nonlinear regression, time series, computer vision, and speech recognition.

    See Neural Network.

  • CUR Decomposition-based Algorithm for Attribute and Row Importance

    The CUR algorithm allows users to find the columns and features that best explain their data. This algorithm has gained popularity because it allows users to gain insight into their data using easily understandable terms. In contrast, decomposition methods such as Singular Value Decomposition (SVD) derive implicit features that are hard to interpret. CUR tries to use the insights derived from SVD but translates them in terms of the original rows and columns.

    CUR-based attribute and row importance can be used to provide data insight and to filter data ahead of additional analytical processing. This is the first Oracle Advanced Analytics (OAA) algorithm that singles out not only important columns but also important rows.

    See CUR Matrix Decomposition.

  • Exponential Smoothing

    Exponential Smoothing Methods (ESM) are widely used for forecasting from time series data. Originally thought to be less flexible and accurate than competitors such as ARIMA, ESM has more recently been shown to cover a broader class of models and has been extended to increase both its descriptive realism and accuracy. Oracle ESM includes many of these recent extensions, a total of 14 models, including the popular Holt (trend) and Holt-Winters (trend and seasonality) models, and the ability to handle irregular time series intervals.

    See Exponential Smoothing.
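
The Holt (trend) model named under Exponential Smoothing can be sketched in a few lines of Python. This is the textbook additive-trend recursion, not Oracle's ESM implementation. The function name `holt_forecast`, the smoothing parameters `alpha` and `beta`, the initialization from the first two observations, and the sample data are all assumptions made for illustration.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear-trend exponential smoothing (textbook form).

    Hypothetical helper for illustration only; not an Oracle API.
    alpha smooths the level, beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the newly observed change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Made-up, perfectly linear series: the forecast continues the line.
print(holt_forecast([10, 12, 14, 16], alpha=0.5, beta=0.5, horizon=2))
```

Holt-Winters adds a third smoothed component for seasonality on top of the level and trend shown here.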

Algorithm Enhancements

  • Algorithm Meta Data Registration

    Algorithm meta data registration simplifies and streamlines the integration of new algorithms in the R extensibility framework. This feature provides a uniform, consistent approach to registering new algorithm functions and their settings.

    The integration of new algorithms in the extensibility framework will be simplified. The GUI will be able to seamlessly pick up and support such new algorithms.

    See About Algorithm Meta Data Registration.

  • Alternating Direction Method of Multipliers (ADMM)

    A new distributed solver for Generalized Linear Models (GLM), Alternating Direction Method of Multipliers (ADMM), is introduced.

    See GLM Solvers.

  • Association Rules Sampling

    A new specialized sampling approach is introduced for Association Rules.

    See Improved Sampling.

New Administrative Tasks

IMPORT and EXPORT Serialized Models

Machine learning models can be exported in a serialized object form. Serialized models can be moved to another platform for scoring.

See Oracle Data Mining User’s Guide and Oracle Database PL/SQL Packages and Types Reference.

Deprecated Features

Features deprecated in this release may be desupported in a future release. See Oracle Database Upgrade Guide for a complete list of deprecated features in this release.

Desupported Features

See Oracle Database Upgrade Guide for a complete list of desupported features in this release.

Other Changes

The following are additional changes in Oracle Data Mining Concepts for 18c:

  • The "Oracle Data Mining with R Extensibility" topic is moved from the chapter Introduction to Oracle Data Mining to a new chapter, R Extensibility.

  • Throughout the document, "mining function(s)" was replaced with "mining technique(s)" to distinguish between a mining methodology and an API or SQL function.