12 Row Importance
Use row importance as an unsupervised technique to preprocess data before model building with other machine learning techniques.
12.1 About Row Importance
Row importance captures the influence of the rows or cases in a data set.
The row importance technique is used for dimensionality reduction of large data sets. It identifies the most influential rows of the data matrix and ranks them by their importance scores. The importance of a row is determined by its statistical leverage score: the higher the leverage score, the more important the row. In CUR matrix decomposition, row importance is often combined with column (attribute) importance. Row importance can serve as a data preprocessing step before building regression, classification, and clustering models.
12.2 Selecting Important Rows
Rows are ranked by their importance scores. The importance of a row is determined by its statistical leverage score, and rows with high leverage scores are considered important.
Row importance results, that is, the rows with high leverage scores, are reported as names (as case_id), scores (as importance), and ranks (by importance).
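For reference, one common definition of the statistical leverage score from the CUR literature is sketched below. The symbols A, U_k, k, and \ell_i are notation introduced here for illustration and do not come from this document. The product may compute or normalize the scores differently.

```latex
% A: m x n data matrix.
% U_k: m x k matrix whose columns are the top-k left singular vectors of A.
\ell_i = \frac{1}{k} \sum_{j=1}^{k} \left( U_k \right)_{ij}^{2},
\qquad i = 1, \dots, m,
\qquad \sum_{i=1}^{m} \ell_i = 1 .
```

Under this definition, a row with a larger \ell_i contributes more to the dominant rank-k structure of the data, which is why it receives a higher rank.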
12.3 Row Importance Algorithms
Oracle Machine Learning supports the CUR matrix decomposition algorithm to determine row and column (attribute) importance.
Popular algorithms for dimensionality reduction are Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and CUR Matrix Decomposition. All these algorithms apply low-rank matrix decomposition.
In CUR matrix decomposition, the attributes include 2-dimensional (2D) numerical columns, levels of exploded 2D categorical columns, and attribute name, subname, or value pairs for nested columns. To arrive at row importance or selection, the algorithm computes singular vectors, calculates leverage scores, and then selects rows. Row importance is performed only when users specify CURS_ROW_IMP_ENABLE for the CURS_ROW_IMPORTANCE parameter in the settings table and the case_id column is present. Row importance is not performed unless users explicitly enable it.
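The three steps above (compute singular vectors, calculate leverage scores, select rows) can be illustrated with a minimal NumPy sketch. This is not Oracle's implementation: the function name row_importance, the parameters k and n_rows, and the sample data are hypothetical. It also simply keeps the top-scoring rows, which is an assumption, because the text does not say how the product selects rows.

```python
import numpy as np

def row_importance(X, case_ids, k, n_rows=None):
    """Rank rows of X by rank-k statistical leverage score (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Step 1: singular vectors. Columns of U are the left singular vectors.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    # Step 2: leverage score of row i is the normalized squared norm
    # of row i of the top-k left singular vectors.
    scores = np.sum(U[:, :k] ** 2, axis=1) / k
    # Step 3: select rows. Assumption: keep the highest-scoring rows.
    order = np.argsort(-scores, kind="stable")
    if n_rows is not None:
        order = order[:n_rows]
    # Report (case_id, importance, rank), mirroring the output described above.
    return [(case_ids[i], float(scores[i]), rank)
            for rank, i in enumerate(order, start=1)]

# Hypothetical data: the last row lies far from the others.
X = [[1.0, 1.0],
     [1.0, 1.1],
     [0.9, 1.0],
     [5.0, -5.0]]
case_ids = ["a", "b", "c", "d"]

ranked = row_importance(X, case_ids, k=1)
for case_id, importance, rank in ranked:
    print(rank, case_id, round(importance, 4))
```

In this toy data set the row that deviates most from the others dominates the leading singular vector, so it is ranked first. The choice of k controls how many singular vectors contribute to each score.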