CUR Matrix Decomposition

16 CUR Matrix Decomposition

Learn to use the CUR Matrix Decomposition algorithm for identifying important attributes.

About CUR Matrix Decomposition
CUR Matrix Decomposition is a low-rank matrix decomposition algorithm that is explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix.
Singular Vectors
Singular Value Decomposition (SVD) initiates CUR Matrix Decomposition by providing singular vectors essential for calculating leverage scores.
Statistical Leverage Score
Statistical leverage scores highlight the most representative columns or rows, aiding in the selection of important data points.
Column (Attribute) Selection and Row Selection
CUR Matrix Decomposition selects the attributes and rows with high leverage scores and ranks them by importance.
CUR Matrix Decomposition Algorithm Configuration
Configure the CUR Matrix Decomposition algorithm settings to build your model.


16.1 About CUR Matrix Decomposition

CUR Matrix Decomposition is a low-rank matrix decomposition algorithm that is explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix.

CUR Matrix Decomposition was developed as an alternative to Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). It selects the columns and rows of the data matrix that exhibit high statistical leverage, that is, large influence. By applying the CUR Matrix Decomposition algorithm, you can identify a small number of the most important attributes and/or rows in the original data matrix. This makes CUR Matrix Decomposition an important tool for exploratory data analysis. It can be applied in a variety of areas and facilitates regression, classification, and clustering.

Related Topics

Data Preparation for SVD


16.2 Singular Vectors

Singular Value Decomposition (SVD) initiates CUR Matrix Decomposition by providing singular vectors essential for calculating leverage scores.

SVD returns left and right singular vectors for calculating column and row leverage scores. Perform SVD on the following matrix:

$$A \in \mathbb{R}^{m \times n}$$

The matrix is factorized as follows:

$$A = U \Sigma V^T$$

where $U = [u^1\ u^2 \ldots u^m]$ and $V = [v^1\ v^2 \ldots v^n]$ are orthogonal matrices.

$\Sigma$ is a diagonal $m \times n$ matrix with non-negative real numbers $\sigma_1, \ldots, \sigma_\rho$ on the diagonal, where $\rho = \min\{m, n\}$ and $\sigma_\xi$ is the $\xi$-th singular value of $A$.

Let $u^\xi$ and $v^\xi$ be the $\xi$-th left and right singular vectors of $A$. The $j$-th column of $A$ can then be approximated by the top $k$ singular vectors and the corresponding singular values as:

$$A^j \approx \sum_{\xi=1}^{k} \left(\sigma_\xi u^\xi\right) v^\xi_j$$

where $v^\xi_j$ is the $j$-th coordinate of the $\xi$-th right singular vector.
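
The column approximation can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix values and the helper name `approx_column` are made up for illustration and are not part of Oracle Machine Learning.

```python
import numpy as np

# Small illustrative matrix (values are made up for this sketch).
A = np.array([
    [3.0, 1.0, 0.0, 2.0],
    [1.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 1.0],
])

# Full SVD: U is m x m, Vt is n x n (V transposed),
# and s holds the min(m, n) singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)


def approx_column(j, k):
    """Approximate column j of A from the top-k singular triplets (hypothetical helper)."""
    # Sum over xi of (sigma_xi * u^xi) * v^xi_j, where v^xi_j is Vt[xi, j].
    return sum(s[xi] * U[:, xi] * Vt[xi, j] for xi in range(k))


rho = min(A.shape)
col_full = approx_column(j=1, k=rho)  # using all rho terms reproduces the column
col_k2 = approx_column(j=1, k=2)      # rank-2 approximation of the same column

print("exact column:    ", A[:, 1])
print("all terms:       ", col_full)
print("top-2 terms only:", col_k2)
```

With all $\rho$ terms the sum reproduces the column up to floating-point error; with fewer terms, the error of any single column is bounded by the first discarded singular value.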


16.3 Statistical Leverage Score

Statistical leverage scores highlight the most representative columns or rows, aiding in the selection of important data points.

Leverage scores are statistics that determine which columns (or rows) are most representative with respect to a rank-k subspace of a matrix. The statistical leverage scores represent column (or attribute) importance and row importance. The normalized statistical leverage scores for all columns are computed from the top k right singular vectors as follows:

$$\pi_j = \frac{1}{k} \sum_{\xi=1}^{k} \left(v^\xi_j\right)^2$$

where $k$ is called the rank parameter and $j = 1, \ldots, n$. Given that $\pi_j \ge 0$ and $\sum_{j=1}^{n} \pi_j = 1$, these scores form a probability distribution over the $n$ columns.

Similarly, the normalized statistical leverage scores for all rows are computed from the top k left singular vectors as:

$$\pi'_i = \frac{1}{k} \sum_{\xi=1}^{k} \left(u^\xi_i\right)^2$$

where $i = 1, \ldots, m$.
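
The two leverage score definitions can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy; the function name `leverage_scores` and the sample matrix are hypothetical, and the code does not reflect how Oracle Machine Learning computes the scores internally.

```python
import numpy as np


def leverage_scores(A, k):
    """Return normalized column and row leverage scores for rank parameter k.

    Hypothetical helper: k must satisfy 1 <= k <= min(m, n).
    """
    # Thin SVD: columns of U are left singular vectors,
    # rows of Vt are right singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Column scores: mean of squared j-th coordinates of the top-k right vectors.
    col_scores = np.sum(Vt[:k, :] ** 2, axis=0) / k
    # Row scores: mean of squared i-th coordinates of the top-k left vectors.
    row_scores = np.sum(U[:, :k] ** 2, axis=1) / k
    return col_scores, row_scores


# Made-up data: 5 rows (records) by 4 columns (attributes).
A = np.array([
    [2.0, 0.5, 1.0, 0.0],
    [1.0, 3.0, 0.0, 0.5],
    [0.0, 1.0, 4.0, 1.0],
    [0.5, 0.0, 1.0, 2.0],
    [1.5, 2.0, 0.5, 0.0],
])

col_scores, row_scores = leverage_scores(A, k=2)
print("column scores:", col_scores, "sum =", col_scores.sum())
print("row scores:   ", row_scores, "sum =", row_scores.sum())
```

Because each singular vector has unit length, the $k$ squared norms add up to $k$, and dividing by $k$ makes each set of scores sum to 1.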


16.4 Column (Attribute) Selection and Row Selection

CUR Matrix Decomposition selects the attributes and rows with high leverage scores and ranks them by importance.

The CUR matrix decomposition in Oracle Machine Learning is designed for attribute and/or row importance. It returns attributes and rows with high importance, ranked by their leverage (importance) scores. Column (attribute) selection and row selection form the final stage of CUR.

Attribute selection: Selects attributes with high leverage scores and reports their names, scores (as importance), and ranks (by importance).

Row selection: Selects rows with high leverage scores and reports their names, scores (as importance), and ranks (by importance).

CUR first selects the $j$-th column (or attribute) of $A$ with probability $p_j = \min\{1, c\pi_j\}$ for all $j \in \{1, \ldots, n\}$.
If row selection is enabled, CUR selects the $i$-th row of $A$ with probability $p'_i = \min\{1, r\pi'_i\}$ for all $i \in \{1, \ldots, m\}$.
CUR then reports the name (or ID) and leverage score (as importance) for all selected attributes (if row importance is disabled) or for all selected attributes and rows (if row importance is enabled).

Here, $c$ is the approximate (or expected) number of columns that you want to select, and $r$ is the approximate (or expected) number of rows that you want to select.

To perform column and row selection, calculate the selection probability of each column and row.

Calculate the probability for each column as follows:

$$p_j = \min\{1, c\pi_j\}$$

Calculate the probability for each row as follows:

$$p'_i = \min\{1, r\pi'_i\}$$

A column or row is selected if the probability is greater than some threshold.
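
The selection step can be illustrated with plain Python. The sketch below is an assumption-laden example: the attribute names, the leverage scores, and the threshold value of 0.5 are all made up, and the source does not state which threshold Oracle Machine Learning actually uses.

```python
def selection_probabilities(scores, expected_count):
    """Compute min{1, expected_count * score} for each leverage score."""
    return [min(1.0, expected_count * pi) for pi in scores]


def select_by_threshold(names, scores, expected_count, threshold):
    """Return (rank, name, score) for items whose probability exceeds threshold.

    Hypothetical helper: the threshold is an illustrative parameter.
    """
    probs = selection_probabilities(scores, expected_count)
    picked = [(name, pi) for name, pi, p in zip(names, scores, probs) if p > threshold]
    # Rank the selected items by leverage score, highest first.
    picked.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, pi) for rank, (name, pi) in enumerate(picked, start=1)]


# Made-up attribute names and normalized leverage scores (they sum to 1).
names = ["age", "income", "tenure", "region"]
scores = [0.5, 0.3, 0.15, 0.05]

# c = 2 expected columns gives probabilities of about 1.0, 0.6, 0.3, and 0.1.
selected = select_by_threshold(names, scores, expected_count=2, threshold=0.5)
for rank, name, importance in selected:
    print(rank, name, importance)
```

With these inputs, only `age` and `income` exceed the assumed threshold, so they are reported with ranks 1 and 2. The same functions apply to rows by passing row identifiers, row leverage scores, and $r$ as the expected count.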


16.5 CUR Matrix Decomposition Algorithm Configuration

Configure the CUR Matrix Decomposition algorithm settings to build your model.

Create a model with the algorithm-specific settings. Define the algorithm name as ALGO_CUR_DECOMPOSITION and the mining function as ATTRIBUTE_IMPORTANCE.