About Row Importance
Identify and rank influential rows in a data set using statistical leverage scores for dimensionality reduction.
The row importance technique is used in dimensionality reduction of large data sets. It identifies the most influential rows of the data matrix and ranks them by their importance scores. A row's importance is determined by its statistical leverage score: rows with high leverage scores are the most important. In CUR matrix decomposition, row importance is often combined with column (attribute) importance. Row importance can serve as a data preprocessing step before building regression, classification, or clustering models.
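To make "statistical leverage score" concrete, the sketch below computes rank-k row leverage scores in the form commonly used with CUR decomposition: the squared norm of each row of the top-k left singular vectors, divided by k so the scores sum to 1. The function names and the choice of k are assumptions for illustration, not the Oracle Machine Learning API.

```python
import numpy as np


def row_leverage_scores(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k statistical leverage score of every row of A.

    Score of row i = ||U_k[i, :]||^2 / k, where U_k holds the top-k
    left singular vectors. For k <= rank(A) the scores sum to 1.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Squaring removes the arbitrary sign of each singular vector.
    return np.sum(U_k ** 2, axis=1) / k


def rank_rows(A: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (row_index, score) pairs, most important row first."""
    scores = row_leverage_scores(A, k)
    order = np.argsort(-scores, kind="stable")
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    # Row 3 points in a direction no other row shares, so it dominates
    # the top singular vector and should rank first.
    A = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 5.0]])
    for idx, score in rank_rows(A, k=1):
        print(idx, round(score, 4))
```

Rows that point in directions few other rows share carry most of the weight of the leading singular vectors, which is why they receive high leverage scores.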
Row Importance Algorithms
Oracle Machine Learning supports the CUR matrix decomposition algorithm to determine row and column (attribute) importance.
Popular algorithms for dimensionality reduction include Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and CUR matrix decomposition. All of these algorithms apply low-rank matrix decomposition.
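Unlike PCA and SVD, CUR builds its low-rank approximation A ≈ C·W·R from actual columns (C) and actual rows (R) of the data, chosen by leverage score. The sketch below assumes deterministic top-n selection (the literature also uses randomized sampling proportional to the scores) and uses W = C⁺AR⁺, the middle factor that minimizes the Frobenius error for fixed C and R. The names and selection rule are illustrative assumptions, not how Oracle Machine Learning implements CUR.

```python
import numpy as np


def leverage(vectors: np.ndarray, k: int) -> np.ndarray:
    """Rank-k leverage scores for each row of a matrix of singular vectors."""
    return np.sum(vectors[:, :k] ** 2, axis=1) / k


def cur_decomposition(A: np.ndarray, k: int, n_cols: int, n_rows: int):
    """Approximate A as C @ W @ R using top-leverage columns and rows."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    col_scores = leverage(Vt.T, k)  # column (attribute) importance
    row_scores = leverage(U, k)     # row importance
    # Keep the highest-scoring columns and rows, in their original order.
    cols = np.sort(np.argsort(-col_scores)[:n_cols])
    rows = np.sort(np.argsort(-row_scores)[:n_rows])
    C = A[:, cols]
    R = A[rows, :]
    # Middle factor minimizing ||A - C W R||_F for this C and R.
    W = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, W, R, cols, rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))  # rank 2
    C, W, R, cols, rows = cur_decomposition(A, k=2, n_cols=2, n_rows=2)
    print("columns:", cols, "rows:", rows)
    print("max error:", np.abs(C @ W @ R - A).max())
```

For an exactly rank-2 matrix, any two generic columns and rows span its column and row spaces, so the reconstruction is exact up to rounding. On real data the selected rows and columns remain interpretable because they are original records and attributes.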
In CUR matrix decomposition, the attributes include two-dimensional (2D) numerical columns, levels of exploded 2D categorical columns, and attribute name/subname/value pairs for nested columns. To determine row importance (row selection), the algorithm computes singular vectors, calculates leverage scores, and then selects rows. Row importance is performed when users specify CURS_ROW_IMP_ENABLE for the CURS_ROW_IMPORTANCE parameter in the settings table and the case_id column is present. Unless users explicitly enable it, row importance is not performed.
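The three steps and the enabling conditions above can be sketched as follows. The two setting strings are copied from the text; everything else is a hypothetical stand-in: the settings table is modeled as a dict, "exploded" categorical levels are assumed to mean one 0/1 attribute per level, nested columns are not modeled, and rows are selected deterministically by top score. This is not the Oracle Machine Learning API.

```python
import numpy as np

# Setting name and value copied from the text; the dict lookup is hypothetical.
ROW_IMP_SETTING = "CURS_ROW_IMPORTANCE"
ROW_IMP_ENABLE = "CURS_ROW_IMP_ENABLE"


def explode_categorical(values):
    """Assumed explosion: one 0/1 attribute per categorical level."""
    levels = sorted(set(values))
    onehot = np.array([[1.0 if v == lvl else 0.0 for lvl in levels] for v in values])
    return onehot, levels


def select_rows(settings, case_ids, numeric, categorical, k, n_rows):
    """Return the case_ids of the selected rows, or None if row importance is off."""
    # Row importance runs only when explicitly enabled and case IDs exist.
    if settings.get(ROW_IMP_SETTING) != ROW_IMP_ENABLE or case_ids is None:
        return None
    onehot, _ = explode_categorical(categorical)
    A = np.hstack([np.asarray(numeric, dtype=float), onehot])
    # Step 1: compute singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Step 2: calculate rank-k leverage scores for each row.
    scores = np.sum(U[:, :k] ** 2, axis=1) / k
    # Step 3: select the highest-scoring rows (assumed top-n rule).
    top = np.argsort(-scores, kind="stable")[:n_rows]
    return [case_ids[i] for i in top]


if __name__ == "__main__":
    settings = {ROW_IMP_SETTING: ROW_IMP_ENABLE}
    print(select_rows(settings, [101, 102, 103, 104],
                      [[1.0], [1.0], [1.0], [50.0]], ["a", "a", "a", "b"],
                      k=1, n_rows=1))
```

Returning None when the setting is absent mirrors the default described above: without an explicit request, no row importance is computed.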