This chapter describes Support Vector Machines, a powerful algorithm based on statistical learning theory. Support Vector Machines is implemented by Oracle Data Mining for classification, regression, and anomaly detection.
Reference:
Milenova, B.L., Yarmus, J.S., Campos, M.M., "SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines", Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
Support Vector Machines (SVM) is a powerful, state-of-the-art algorithm with strong theoretical foundations based on the Vapnik-Chervonenkis theory. SVM has strong regularization properties. Regularization refers to the generalization of the model to new data.
SVM models have a functional form similar to that of neural networks and radial basis functions, both popular data mining techniques. However, neither of these algorithms has the well-founded theoretical approach to regularization that forms the basis of SVM. The quality of generalization and the ease of training of SVM are far beyond the capacities of these more traditional methods.
SVM can model complex, real-world problems such as text and image classification, handwriting recognition, and bioinformatics and biosequence analysis.
SVM performs well on data sets that have many attributes, even if there are very few cases on which to train the model. There is no upper limit on the number of attributes; the only constraints are those imposed by hardware. Traditional neural nets do not perform well under these circumstances.
Oracle Data Mining has its own proprietary implementation of SVM, which exploits the many benefits of the algorithm while compensating for some of the limitations inherent in the SVM framework. Oracle Data Mining SVM provides the scalability and usability that are needed in a production quality data mining system.
Usability is a major enhancement, because SVM has often been viewed as a tool for experts. The algorithm typically requires data preparation, tuning, and optimization. Oracle Data Mining minimizes these requirements. You do not need to be an expert to build a quality SVM model in Oracle Data Mining. For example:
Data preparation is not required in most cases. (See "Data Preparation for SVM".)
Default tuning parameters are generally adequate. (See "Tuning an SVM Model".)
When dealing with very large data sets, sampling is often required. However, sampling is not required with Oracle Data Mining SVM, because the algorithm itself uses stratified sampling to reduce the size of the training data as needed.
Oracle Data Mining SVM is highly optimized. It builds a model incrementally by optimizing small working sets toward a global solution. The model is trained until convergence on the current working set, then the model adapts to the new data. The process continues iteratively until the convergence conditions are met. The Gaussian kernel uses caching techniques to manage the working sets. See "Kernel-Based Learning".
Oracle Data Mining SVM supports active learning, an optimization method that builds a smaller, more compact model while reducing the time and memory resources required for training the model. See "Active Learning".
SVM is a kernel-based algorithm. A kernel is a function that transforms the input data to a high-dimensional space where the problem is solved. Kernel functions can be linear or nonlinear.
Oracle Data Mining supports linear and Gaussian (nonlinear) kernels.
In Oracle Data Mining, the linear kernel function reduces to a linear equation on the original attributes in the training data. A linear kernel works well when there are many attributes in the training data.
The Gaussian kernel transforms each case in the training data to a point in an n-dimensional space, where n is the number of cases. The algorithm attempts to separate the points into subsets with homogeneous target values. The Gaussian kernel uses nonlinear separators, but within the kernel space it constructs a linear equation.
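The two kernel types described above can be illustrated with a minimal Python sketch. It uses the common textbook forms of the linear and Gaussian kernels; the function names and the sigma parameter are illustrative assumptions and are not taken from the Oracle Data Mining implementation.

```python
import math

def linear_kernel(x, y):
    """Dot product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Textbook Gaussian kernel: exp(-||x - y||^2 / (2 * sigma^2)).
    sigma (assumed name) plays the role of the kernel's standard deviation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(cases, kernel):
    """Row i holds the kernel value of case i against every training case,
    that is, the coordinates of case i in an n-dimensional space
    where n is the number of cases."""
    return [[kernel(a, b) for b in cases] for a in cases]

# Three hypothetical training cases with two numeric attributes each.
cases = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
K = kernel_matrix(cases, gaussian_kernel)
```

With three cases, K is a 3 x 3 matrix: each case is represented by its similarity to all three cases, which is one way to read the statement that n equals the number of cases.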
Active learning is an optimization method for controlling model growth and reducing model build time. Without active learning, SVM models grow as the size of the build data set increases, which effectively limits SVM models to small and medium size training sets (less than 100,000 cases). Active learning provides a way to overcome this restriction. With active learning, SVM models can be built on very large training sets.
Active learning forces the SVM algorithm to restrict learning to the most informative training examples and not to attempt to use the entire body of data. In most cases, the resulting models have predictive accuracy comparable to that of a standard (exact) SVM model.
Active learning provides a significant improvement in both linear and Gaussian SVM models, whether for classification, regression, or anomaly detection. However, active learning is especially advantageous for the Gaussian kernel, because nonlinear models can otherwise grow to be very large and can place considerable demands on memory and other system resources.
SVM has built-in mechanisms that automatically choose appropriate settings based on the data. You may need to override the system-determined settings for some domains.
The build settings described in Table 18-1 are available for configuring SVM models. Settings pertain to regression, classification, and anomaly detection unless otherwise specified.
Table 18-1 Build Settings for Support Vector Machines
Configures: Description
Kernel: Linear or Gaussian. The algorithm automatically uses the kernel function that is most appropriate to the data. SVM uses the linear kernel when there are many attributes (more than 100) in the training data, otherwise it uses the Gaussian kernel. See "Kernel-Based Learning". The number of attributes does not correspond to the number of columns in the training data. SVM explodes categorical attributes to binary, numeric attributes. In addition, Oracle Data Mining interprets each row in a nested column as a separate attribute. See "Data Preparation for SVM".
Standard deviation for Gaussian kernel: Controls the spread of the Gaussian kernel function. SVM uses a data-driven approach to find a standard deviation value that is on the same scale as distances between typical cases.
Cache size for Gaussian kernel: Amount of memory allocated to the Gaussian kernel cache maintained in memory to improve model build time. The default cache size is 50 MB.
Active learning: Whether or not to use active learning. This setting is especially important for nonlinear (Gaussian) SVM models. By default, active learning is enabled. See "Active Learning".
Complexity factor: Regularization setting that balances the complexity of the model against model robustness to achieve good generalization on new data. SVM uses a data-driven approach to finding the complexity factor.
Convergence tolerance: The criterion for completing the model training process. The default is 0.001.
Epsilon factor for regression: Regularization setting for regression, similar to complexity factor. Epsilon specifies the allowable residuals, or noise, in the data.
Outliers for anomaly detection: The expected outlier rate in anomaly detection. The default rate is 0.1.
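The default kernel choice described for the Kernel setting can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the sketch counts attributes after exploding categorical columns but does not model nested columns, and it is not the algorithm's actual selection code.

```python
def count_attributes(rows):
    """Count attributes after exploding categorical columns.
    Each numeric column counts once; each distinct (column, category)
    pair counts as one binary attribute. Nested columns are not modeled."""
    attributes = set()
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue  # missing values add no new attribute
            if isinstance(value, str):
                attributes.add((column, value))   # one attribute per category
            else:
                attributes.add((column, None))    # one attribute per numeric column
    return len(attributes)

def choose_kernel(n_attributes, threshold=100):
    """Linear when there are more than `threshold` attributes, else Gaussian."""
    return "linear" if n_attributes > threshold else "gaussian"

# Hypothetical training rows: two columns, but three attributes once exploded.
rows = [
    {"age": 34, "marital_status": "married"},
    {"age": 51, "marital_status": "single"},
]
n = count_attributes(rows)
kernel = choose_kernel(n)
```

Here the two columns yield three attributes (age, married, single), which shows why the attribute count can differ from the column count.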
See Also:
Oracle Database PL/SQL Packages and Types Reference for details about SVM settings
The SVM algorithm operates natively on numeric attributes. The algorithm automatically "explodes" categorical data into a set of binary attributes, one per category value. For example, a character column for marital status with values married or single would be transformed to two numeric attributes: married and single. The new attributes could have the value 1 (true) or 0 (false).
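The marital status example can be written out as a short Python sketch. The function name explode_categorical is an assumption made for illustration; the sketch shows the general idea of one binary attribute per category value, not Oracle's internal representation.

```python
def explode_categorical(values):
    """Turn a list of category values into one dict per case,
    with one binary (1/0) attribute for each distinct category."""
    categories = sorted(set(values))
    return [
        {category: 1 if value == category else 0 for category in categories}
        for value in values
    ]

exploded = explode_categorical(["married", "single", "married"])
# exploded[0] is {"married": 1, "single": 0}
# exploded[1] is {"married": 0, "single": 1}
```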
When there are missing values in columns with simple data types (not nested), SVM interprets them as missing at random. The algorithm automatically replaces missing categorical values with the mode and missing numerical values with the mean.
When there are missing values in nested columns, SVM interprets them as sparse. The algorithm automatically replaces sparse numerical data with zeros and sparse categorical data with zero vectors.
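The two treatments of missing values can be sketched as follows, assuming that None marks a missing value and that a sparse nested row is given as a dictionary holding only the attributes that are present. All names are illustrative.

```python
from statistics import mean, mode

def impute_numeric(values):
    """Replace missing (None) numeric values with the mean of the present values."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]

def impute_categorical(values):
    """Replace missing (None) categorical values with the most frequent value."""
    present = [v for v in values if v is not None]
    fill = mode(present)
    return [fill if v is None else v for v in values]

def densify_sparse(row, attribute_names):
    """Treat attributes absent from a sparse row as zeros."""
    return [row.get(name, 0) for name in attribute_names]

ages = impute_numeric([10, None, 20])                      # missing at random
status = impute_categorical(["single", None, "single", "married"])
dense = densify_sparse({"a": 2}, ["a", "b"])               # sparse data
```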
SVM requires the normalization of numeric input. Normalization places the values of numeric attributes on the same scale and prevents attributes with a large original scale from biasing the solution. Normalization also minimizes the likelihood of overflows and underflows. Furthermore, normalization brings the numerical attributes to the same scale (0,1) as the exploded categorical data.
The SVM algorithm automatically handles missing value treatment and the transformation of categorical data, but normalization and outlier detection must be handled by ADP or prepared manually. ADP performs min-max normalization for SVM.
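Min-max normalization can be sketched with the textbook formula (value - min) / (max - min), which maps each numeric attribute onto the range 0 to 1. The handling of a constant column below is an assumption made to keep the sketch runnable, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = min_max_normalize([20000, 50000, 80000])
# incomes is [0.0, 0.5, 1.0]
```

After this step, an income attribute measured in tens of thousands sits on the same scale as the 0/1 attributes produced by exploding categorical data.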
Note:
Oracle Corporation recommends that you use Automatic Data Preparation with SVM. The transformations performed by ADP are appropriate for most models.
SVM classification is based on the concept of decision planes that define decision boundaries. A decision plane is one that separates sets of objects having different class memberships. SVM finds the vectors ("support vectors") that define the separators giving the widest separation of classes.
SVM classification supports both binary and multiclass targets.
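For a linear separator, the decision plane and the width of the separation can be sketched in Python. The sketch assumes the standard formulation in which the plane is w . x + b = 0 and the margin width is 2 / ||w||; the weights and bias are hypothetical values, not the output of a trained model.

```python
import math

def decision_value(weights, bias, x):
    """Signed score of a case relative to the decision plane w . x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Assign class 1 on or above the plane and class -1 below it."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1

def margin_width(weights):
    """Distance between the two margin planes, 2 / ||w||, in the standard formulation."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

weights, bias = [1.0, 0.0], -1.0   # hypothetical separator: x1 = 1
label_a = predict(weights, bias, [3.0, 0.0])
label_b = predict(weights, bias, [0.0, 5.0])
width = margin_width(weights)
```

A smaller weight vector gives a wider margin, which is the sense in which SVM looks for the widest separation of classes.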
In SVM classification, weights are a biasing mechanism for specifying the relative importance of target values (classes).
SVM models are automatically initialized to achieve the best average prediction across all classes. However, if the training data does not represent a realistic distribution, you can bias the model to compensate for class values that are underrepresented. If you increase the weight for a class, the percent of correct predictions for that class should increase.
The Oracle Data Mining APIs use priors to specify class weights for SVM. To use priors in training a model, you create a priors table and specify its name as a build setting for the model.
Priors are associated with probabilistic models to correct for biased sampling procedures. SVM uses priors as a weight vector that biases optimization and favors one class over another.
See Also:
"Priors"
Oracle Data Mining uses SVM as the one-class classifier for anomaly detection. When SVM is used for anomaly detection, it has the classification mining function but no target.
One-class SVM models, when applied, produce a prediction and a probability for each case in the scoring data. If the prediction is 1, the case is considered typical. If the prediction is 0, the case is considered anomalous. This behavior reflects the fact that the model is trained with normal data.
You can specify the percentage of the data that you expect to be anomalous with the SVMS_OUTLIER_RATE build setting. If you have some knowledge that the number of "suspicious" cases is a certain percentage of your population, then you can set the outlier rate to that percentage. The model will identify approximately that many "rare" cases when applied to the general population. The default is 10%, which is probably high for many anomaly detection problems.
SVM uses an epsilon-insensitive loss function to solve regression problems.
SVM regression tries to find a continuous function such that the maximum number of data points lie within the epsilon-wide insensitivity tube. Predictions falling within epsilon distance of the true target value are not interpreted as errors.
The epsilon factor is a regularization setting for SVM regression. It balances the margin of error with model robustness to achieve the best generalization to new data. See Table 18-1 for descriptions of build settings for SVM.
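The epsilon-insensitive loss described above can be sketched in Python using its standard definition, max(0, |y - f(x)| - epsilon). The function name and the sample values are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon-wide tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

inside = epsilon_insensitive_loss(10.0, 10.05, epsilon=0.1)   # within the tube
outside = epsilon_insensitive_loss(10.0, 10.5, epsilon=0.1)   # outside the tube
```

The first prediction falls within epsilon of the true value and contributes no error. The second is penalized only for the part of the residual that exceeds epsilon.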