Column (Attribute) Selection and Row Selection

CUR matrix decomposition selects the attributes and rows with the highest leverage scores and ranks them by importance.

The CUR matrix decomposition in Oracle Machine Learning is designed for attribute importance, row importance, or both. It returns the attributes and rows of high importance, ranked by their leverage (importance) scores. Column (attribute) selection and row selection are the final stage in CUR.

Attribute selection: Selects attributes with high leverage scores and reports their names, scores (as importance), and ranks (by importance).

Row selection: Selects rows with high leverage scores and reports their names, scores (as importance) and ranks (by importance).

CUR first selects the j-th column (or attribute) of A with probability p_j = min{1, cπ_j} for all j ∈ {1, ..., n}, where π_j is the leverage score of column j.
If row selection is enabled, CUR selects the i-th row of A with probability p′_i = min{1, rπ′_i} for all i ∈ {1, ..., m}, where π′_i is the leverage score of row i.
CUR reports the name (or ID) and leverage score (as importance) of all selected attributes (if row importance is disabled) or of all selected attributes and rows (if row importance is enabled).

Here, c is the approximate (expected) number of columns that users want to select, and r is the approximate (expected) number of rows that users want to select.
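The following minimal sketch illustrates why c can be read as an expected count. It assumes the column leverage scores are normalized so that they sum to 1, which this section does not state explicitly. Under that assumption, the probabilities min{1, cπ_j} sum to c whenever none of them is capped at 1. The function name and the sample scores are hypothetical and are not part of any Oracle API.

```python
def selection_probabilities(scores, k):
    """Return min(1, k * score) for each leverage score.

    scores: leverage scores (assumed here to be normalized to sum to 1)
    k:      expected number of columns (c) or rows (r) to select
    """
    return [min(1.0, k * s) for s in scores]


# Hypothetical normalized leverage scores for n = 5 columns.
pi = [0.40, 0.25, 0.20, 0.10, 0.05]
c = 2

p = selection_probabilities(pi, c)

# If each column is kept independently with probability p_j, the expected
# number of kept columns is the sum of the probabilities.
expected_columns = sum(p)
print(p)
print(expected_columns)
```

With a larger c, some products cπ_j exceed 1 and are capped, so the expected count falls slightly below c.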

To perform column and row selection, calculate the selection probability of each column and each row.

Calculate the probability for each column as follows:

p_j = min{1, cπ_j}

Calculate the probability for each row as follows:

p′_i = min{1, rπ′_i}

A column or row is selected if its selection probability is greater than a threshold.
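The threshold rule can be sketched as follows. This is an illustration only: the threshold value, the attribute names, and the function name are assumptions made for the example, and the sketch does not claim to reproduce how Oracle Machine Learning chooses its threshold internally.

```python
def select_by_threshold(names, scores, k, threshold):
    """Select items whose probability min(1, k * score) exceeds threshold.

    Returns (name, score, rank) tuples, ranked by descending leverage score.
    """
    selected = []
    for name, score in zip(names, scores):
        prob = min(1.0, k * score)
        if prob > threshold:
            selected.append((name, score))

    # Rank the selected items by importance (leverage score), highest first.
    selected.sort(key=lambda item: item[1], reverse=True)
    return [(name, score, rank)
            for rank, (name, score) in enumerate(selected, start=1)]


# Hypothetical attribute names and leverage scores.
names = ["AGE", "INCOME", "REGION", "TENURE"]
scores = [0.10, 0.50, 0.05, 0.35]

# c = 2 expected columns; 0.5 is an arbitrary threshold for illustration.
result = select_by_threshold(names, scores, k=2, threshold=0.5)
for name, score, rank in result:
    print(rank, name, score)
```

The same function applies to rows by passing row identifiers, row leverage scores, and r in place of c.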