Automatic Data Preparation
Most algorithms require some form of data transformation. During the model build process, Oracle Machine Learning for SQL can automatically perform the transformations required by the algorithm.
You can choose to supplement the automatic transformations with additional transformations of your own, or you can choose to manage all the transformations yourself.
In calculating automatic transformations, OML4SQL uses heuristics that address the common requirements of a given algorithm. This process results in reasonable model quality in most cases.
Binning and normalization are transformations that are commonly needed by machine learning algorithms.
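To make the idea of normalization concrete, the following minimal sketch rescales a numeric column linearly onto the range 0 to 1 (min-max normalization). It is an illustration only, not the OML4SQL implementation: the function name `min_max_normalize` and the sample data are hypothetical, and ADP selects the actual normalization method for each algorithm.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data, for illustration only.
ages = [20, 30, 40, 60]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)  # [0.0, 0.25, 0.5, 1.0]
```

After rescaling, attributes measured on very different scales contribute on comparable terms to distance-based or gradient-based algorithms.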
Binning
Binning, also called discretization, is a technique for reducing the cardinality of continuous and discrete data. Binning groups related values together in bins to reduce the number of distinct values.
Binning can improve resource utilization and model build response time dramatically without significant loss in model quality. Binning can improve model quality by strengthening the relationship between attributes.
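The cardinality reduction described above can be sketched with simple equal-width binning. This is one of several possible binning strategies and is shown here only as an assumption-laden illustration: the function name `equal_width_bins`, the bin count, and the sample data are hypothetical and do not represent the method OML4SQL applies.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise fall into bin n_bins, so clamp it
    # into the last valid bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data, for illustration only.
incomes = [12, 15, 18, 40, 42, 95, 99, 100]
bin_ids = equal_width_bins(incomes, 4)
print(bin_ids)  # [0, 0, 0, 1, 1, 3, 3, 3]
```

Eight distinct values collapse into three occupied bins, which is the kind of reduction that can shorten model build time.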
Supervised binning is a form of intelligent binning in which important characteristics of the data are used to determine the bin boundaries. In supervised binning, the bin boundaries are identified by a single-predictor decision tree that takes into account the joint distribution with the target. Supervised binning can be used for both numerical and categorical attributes.
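The following sketch illustrates the general idea of deriving bin boundaries from a single-predictor decision tree for a numerical attribute. It is not Oracle's algorithm: the use of Gini impurity, the `max_depth` limit, the midpoint thresholds, and all function names and sample data are assumptions chosen to keep the example small.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(pairs):
    """Return (threshold, weighted_impurity) of the best binary split, or None.

    `pairs` is a list of (value, target) tuples sorted by value.
    """
    best = None
    n = len(pairs)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            # Place the threshold midway between the two neighboring values.
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best


def supervised_bin_boundaries(values, targets, max_depth=2):
    """Collect split thresholds from a depth-limited single-predictor tree."""
    def grow(pairs, depth):
        labels = [t for _, t in pairs]
        if depth == 0 or gini(labels) == 0.0:
            return []  # depth exhausted or node already pure
        split = best_split(pairs)
        if split is None or split[1] >= gini(labels):
            return []  # no split reduces impurity
        threshold = split[0]
        left = [p for p in pairs if p[0] <= threshold]
        right = [p for p in pairs if p[0] > threshold]
        return grow(left, depth - 1) + [threshold] + grow(right, depth - 1)

    return grow(sorted(zip(values, targets)), max_depth)


# Hypothetical sample data: the target is "yes" only for middle-aged customers.
ages = [18, 22, 25, 31, 35, 42, 50, 58, 63, 70]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "no", "no", "no"]
boundaries = supervised_bin_boundaries(ages, bought)
print(boundaries)  # [28.0, 54.0]
```

The two boundaries fall where the target value changes, so the resulting three bins follow the relationship between the attribute and the target rather than an arbitrary equal-width grid.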
How ADP Transforms the Data
The following table shows how ADP prepares the data for each algorithm.
Table 35-2 Oracle Machine Learning Algorithms With ADP
See Also:
- Part III, Algorithms, in Oracle Machine Learning for SQL Concepts for more information about algorithm-specific data preparation