MySQL HeatWave User Guide

10.2.8 ML_EXPLAIN_TABLE

ML_EXPLAIN_TABLE explains predictions for an entire table of unlabeled data and saves results to an output table. It limits explanations to the 100 most relevant features.

ML_EXPLAIN_TABLE Overview

Note

ML_EXPLAIN_TABLE is a very memory-intensive process. We recommend using the batch_size option to limit operations to batches of 10 to 100 rows by splitting large tables into smaller tables. For tables with over ten columns, we suggest a batch_size value of 10.

A call to ML_EXPLAIN_TABLE can include columns that were not present during ML_TRAIN. A table can include extra columns, and still use the MySQL HeatWave AutoML model. This allows side by side comparisons of target column labels, ground truth, and explanations in the same table. ML_EXPLAIN_TABLE ignores any extra columns, and appends them to the results.

A loaded model and trained with the appropriate prediction explainer is required to run ML_EXPLAIN_TABLE. See Generate Prediction Explanations for a Table.

The output table includes a primary key:

  • If the input table has a primary key, the output table will have the same primary key.

  • If the input table does not have a primary key, the output table will have a new primary key column that auto increments.

    • As of MySQL 8.4.1, the name of the new primary key column is _4aad19ca6e_pk_id. The input table must not have a column with the name _4aad19ca6e_pk_id that is not a primary key.

    • Before MySQL 8.4.1, the name of the new primary key column is _id. The input table must not have a column with the name _id that is not a primary key.

ML_EXPLAIN_TABLE does not support recommendation, anomaly detection, and topic modeling models. A call with one of these models produces an error.

ML_EXPLAIN_TABLE does not support the following model types:

  • Forecasting

  • Recommendation

  • Anomaly detection

  • Anomaly detection for logs

  • Topic modeling

ML_EXPLAIN_TABLE Syntax

mysql> CALL sys.ML_EXPLAIN_TABLE(table_name, model_handle, output_table_name, [options]);

options: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      ['prediction_explainer', {'permutation_importance'|'shap'}|NULL]
      ['batch_size', 'N']
      }
}

Required ML_EXPLAIN_TABLE Parameters

Set the following required parameters.

  • table_name: Specifies the fully qualified name of the input table (database_name.table_name). The input table should contain the same feature columns as the table used to train the model. If the target column is included in the input table, it is not considered when generating prediction explanations.

  • model_handle: Specifies the model handle or a session variable containing the model handle. See Work with Model Handles.

  • output_table_name: Specifies the table where explanation data is stored. A fully qualified table name must be specified (database_name.table_name). If the table already exists, an error is returned.

ML_EXPLAIN_TABLE Options

Set the following options as needed.

  • prediction_explainer: The name of the prediction explainer that you have trained for this model using ML_EXPLAIN.

    • permutation_importance: The default prediction explainer.

    • shap: The SHAP prediction explainer, which produces feature importance values based on Shapley values.

  • batch_size: The size of each batch. You can set a value between 1 and 100. For tables with over ten columns, we recommend a value of 10.

Syntax Examples

  • The following example generates explanations for a table of data with the default Permutation Importance prediction explainer. The ML_EXPLAIN_TABLE call specifies the fully qualified name of the table to generate explanations for, the session variable containing the model handle, and the fully qualified output table name.

    mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_permutation', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    

    To view ML_EXPLAIN_TABLE results, query the output table. The SELECT statement retrieves explanation data from the output table. The table includes the primary key, _4aad19ca6e_pk_id, and the ml_results column, which uses JSON format:

    mysql> SELECT * FROM census_train_permutation LIMIT 3;
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | age | workclass | fnlwgt | education  | education-num | marital-status     | occupation      | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | education-num_attribution | marital-status_attribution | education_attribution | hours-per-week_attribution | relationship_attribution | race_attribution | sex_attribution | workclass_attribution | fnlwgt_attribution | capital-gain_attribution | Notes                                                                                                                                                      | ml_results                                                                                                                                                                                                                                                                                                                                                                                                                                     |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                 1 |  37 | Private   |  99146 | Bachelors  |            13 | Married-civ-spouse | Exec-managerial | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      |            -0.1 |                     -0.08 |                      -0.05 |                 -0.05 |                      -0.03 |                    -0.03 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | race (White) had the largest impact towards predicting =50K, whereas age (37) contributed the most against predicting <=50K                               | {"attributions": {"age": -0.1, "education-num": -0.08, "marital-status": -0.05, "education": -0.05, "hours-per-week": -0.03, "relationship": -0.03, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K"}                          |
    |                 2 |  34 | Private   |  27409 | 9th        |             5 | Married-civ-spouse | Craft-repair    | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      |               0 |                         0 |                      -0.04 |                  0.06 |                      -0.03 |                     0.02 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": -0.04, "education": 0.06, "hours-per-week": -0.03, "relationship": 0.02, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K"} |
    |                 3 |  30 | Private   | 299507 | Assoc-acdm |            12 | Separated          | Other-service   | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      |               0 |                         0 |                          0 |                     0 |                          0 |                     0.03 |             0.01 |            0.02 |                     0 |                  0 |                        0 | relationship (Unmarried) had the largest impact towards predicting <=50K                                                                                   | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": 0.0, "education": 0.0, "hours-per-week": 0.0, "relationship": 0.03, "race": 0.01, "sex": 0.02, "workclass": 0.0, "fnlwgt": -0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "relationship (Unmarried) had the largest impact towards predicting <=50K"}                                                                                         |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

See Also