Low Rank Decomposition

Singular Value Decomposition (SVD) keeps lower-order bases (the ones with the largest singular values) and ignores higher-order bases (the ones with the smallest singular values) to capture the most important aspects of the data.

To reduce dimensionality, SVD keeps lower-order bases and ignores higher-order bases. The rationale behind this strategy is that the low-order bases retain the characteristics of the data that contribute most to its variance and are likely to capture the most important aspects of the data.

Given a data set X (nxm), where n is the number of rows and m is the number of attributes, a low-rank SVD uses only k components (k <= min(m, n)). In typical implementations of SVD, the value of k requires a visual inspection of the ranked singular values associated with the individual components. In Oracle Machine Learning for SQL, SVD automatically estimates the cutoff point, which corresponds to a significant drop in the explained variance.

SVD produces two sets of orthonormal bases (U and V). Either of these bases can be used as a new coordinate system. In Oracle Machine Learning for SQL, SVD, V is the new coordinate system, and U represents the projection of X in this coordinate system. The algorithm computes the projection of new data as follows:

Figure 7-13 Computing Projection of New Data

Description of Figure 7-13 follows
Description of "Figure 7-13 Computing Projection of New Data"

where X (nxk) is the projected data in the reduced data space, defined by the first k components, and Vk and Sk define the reduced component set.