MySQL HeatWave User Guide

6.5.6.2 Generate Prediction Explanations for a Table

ML_EXPLAIN_TABLE explains predictions for an entire table of unlabeled data and saves results to an output table. Explanations are performed in parallel.

Note

ML_EXPLAIN_TABLE is a very memory-intensive process. We recommend using the batch_size option to limit operations to batches of 10 to 100 rows by splitting large tables into smaller tables. For tables with over ten columns, we suggest a batch_size value of 10. See ML_EXPLAIN_TABLE and Track Progress for MySQL HeatWave AutoML Routines to learn more.

Before You Begin
Unsupported Model Types

You cannot generate prediction explanations on a table for the following model types:

  • Forecasting

  • Recommendation

  • Anomaly detection

  • Anomaly detection for logs

  • Topic modeling

Prepare to Generate Explanations for a Table

Before running ML_EXPLAIN_TABLE, you must train, and then load the model you want to use.

  1. The following example trains a dataset with the classification machine learning task.

    mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
    
  2. The following example loads the trained model.

    mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
    

For more information about training and loading models, see Train a Model and Load a Model.

After training and loading the model, you can generate prediction explanations for a table. For parameter and option descriptions, see ML_EXPLAIN_TABLE.

Generate Explanations for a Table with the Default Permutation Importance Explainer

After training and loading a model, you can run ML_EXPLAIN_TABLE to generate a table of prediction explanations with the default Permutation Importance explainer>. However, if you train the shap prediction explainer with ML_EXPLAIN, you need to run ML_EXPLAIN again with the permutation_importance explainer before running ML_EXPLAIN_TABLE with the same explainer.

  1. Run the ML_EXPLAIN_TABLE routine.

    mysql> CALL sys.ML_EXPLAIN_TABLE(table_name, model_handle, output_table_name, [options]);
    

    The following example runs ML_EXPLAIN_TABLE with the permutation_importance explainer.

    mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_permutation', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • @census_model is the session variable for the trained model.

    • census_data.census_train_permutation is the fully qualified name of the output table that contains the explanations (schema_name.table_name).

    • prediction_explainer is set to permutation_importance for the Permutation Importance prediction explainer.

  2. Query the output table to review a sample of the results.

    mysql> SELECT * FROM table_name LIMIT N;
    

    The following example queries the top three rows of the output table.

    mysql> SELECT * FROM census_train_permutation LIMIT 3;
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | age | workclass | fnlwgt | education  | education-num | marital-status     | occupation      | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | education-num_attribution | marital-status_attribution | education_attribution | hours-per-week_attribution | relationship_attribution | race_attribution | sex_attribution | workclass_attribution | fnlwgt_attribution | capital-gain_attribution | Notes                                                                                                                                                      | ml_results                                                                                                                                                                                                                                                                                                                                                                                                                                     |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                 1 |  37 | Private   |  99146 | Bachelors  |            13 | Married-civ-spouse | Exec-managerial | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      |            -0.1 |                     -0.08 |                      -0.05 |                 -0.05 |                      -0.03 |                    -0.03 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K                               | {"attributions": {"age": -0.1, "education-num": -0.08, "marital-status": -0.05, "education": -0.05, "hours-per-week": -0.03, "relationship": -0.03, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K"}                          |
    |                 2 |  34 | Private   |  27409 | 9th        |             5 | Married-civ-spouse | Craft-repair    | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      |               0 |                         0 |                      -0.04 |                  0.06 |                      -0.03 |                     0.02 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": -0.04, "education": 0.06, "hours-per-week": -0.03, "relationship": 0.02, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K"} |
    |                 3 |  30 | Private   | 299507 | Assoc-acdm |            12 | Separated          | Other-service   | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      |               0 |                         0 |                          0 |                     0 |                          0 |                     0.03 |             0.01 |            0.02 |                     0 |                  0 |                        0 | relationship (Unmarried) had the largest impact towards predicting <=50K                                                                                   | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": 0.0, "education": 0.0, "hours-per-week": 0.0, "relationship": 0.03, "race": 0.01, "sex": 0.02, "workclass": 0.0, "fnlwgt": -0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "relationship (Unmarried) had the largest impact towards predicting <=50K"}                                                                                         |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    The results display information on the columns that had the largest impact towards the predictions and the columns that contributed the most against the prediction.

    A warning displays if the model is of low quality.

Generate Explanations for a Table with the SHAP Explainer

To generate a table of prediction explanations with the SHAP explainer, you must first run the SHAP explainer with ML_EXPLAIN.

  1. Run the ML_EXPLAIN routine.

    mysql> CALL sys.ML_EXPLAIN ('table_name', 'target_column_name', model_handle, [options]);
    

    The following example run the shap prediction explainer.

    mysql> CALL sys.ML_EXPLAIN('census_data.census_train', 'revenue', @census_model, JSON_OBJECT('prediction_explainer', 'shap'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • revenue is the name of the target column, which contains ground truth values.

    • @census_model is the session variable for the trained model.

    • prediction_explainer is set to shap for the SHAP prediction explainer.

  2. Run the ML_EXPLAIN_TABLE routine.

    mysql> CALL sys.ML_EXPLAIN_TABLE(table_name, model_handle, output_table_name, [options]);
    

    The following example runs the shap prediction explainer.

    mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_explanations', JSON_OBJECT('prediction_explainer', 'shap'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • @census_model is the session variable for the trained model.

    • census_data.census_train_explanations is the fully qualified name of the output table that contains the explanations (schema_name.table_name).

    • prediction_explainer is set to shap for the SHAP prediction explainer.

  3. Query the output table to review a sample of the results.

    mysql> SELECT * FROM table_name LIMIT N;
    

    The following example queries the top three rows of the output table.

    mysql> SELECT * FROM census_train_explanations LIMIT 3;
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | age | workclass | fnlwgt | education  | education-num | marital-status     | occupation      | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | capital-gain_attribution | education_attribution | education-num_attribution | fnlwgt_attribution | hours-per-week_attribution | marital-status_attribution | race_attribution | relationship_attribution | sex_attribution | workclass_attribution | ml_results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                 1 |  37 | Private   |  99146 | Bachelors  |            13 | Married-civ-spouse | Exec-managerial | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      |      -0.0302428 |                0.0105781 |            -0.0250973 |                -0.0307275 |         -0.0394281 |                 -0.0115214 |                 -0.0417846 |       0.00787589 |               -0.0510622 |      -0.0265045 |             0.0125372 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": -0.03024279141207572, "capital-gain_attribution": 0.01057805203258716, "education_attribution": -0.025097336669631397, "education-num_attribution": -0.03072748073245629, "fnlwgt_attribution": -0.03942808683149933, "hours-per-week_attribution": -0.01152140445039615, "marital-status_attribution": -0.041784639464290424, "race_attribution": 0.00787588729783209, "relationship_attribution": -0.05106219218656989, "sex_attribution": -0.026504470317501083, "workclass_attribution": 0.012537165912050216}}   |
    |                 2 |  34 | Private   |  27409 | 9th        |             5 | Married-civ-spouse | Craft-repair    | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      |       0.0208705 |                0.0108355 |             0.0445232 |                0.00589651 |          -0.042798 |                 -0.0120815 |                 -0.0419652 |       0.00772465 |               -0.0284059 |      -0.0263092 |             0.0133885 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.020870525610603933, "capital-gain_attribution": 0.010835454035912382, "education_attribution": 0.044523246311996556, "education-num_attribution": 0.005896512933976113, "fnlwgt_attribution": -0.04279802962032341, "hours-per-week_attribution": -0.012081540592359136, "marital-status_attribution": -0.04196518165858853, "race_attribution": 0.007724645581626567, "relationship_attribution": -0.028405864878017332, "sex_attribution": -0.026309175183378983, "workclass_attribution": 0.013388466297739862}} |
    |                 3 |  30 | Private   | 299507 | Assoc-acdm |            12 | Separated          | Other-service   | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      |       0.0415136 |                0.0104095 |             0.0205295 |                0.00435879 |        -0.00299281 |                  0.0095282 |                  0.0237676 |       0.00823583 |                0.0288422 |       0.0345606 |            0.00340862 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.04151362477220458, "capital-gain_attribution": 0.01040951100227322, "education_attribution": 0.020529478225449593, "education-num_attribution": 0.004358785226045173, "fnlwgt_attribution": -0.0029928127703645266, "hours-per-week_attribution": 0.009528199786418556, "marital-status_attribution": 0.023767622241075377, "race_attribution": 0.008235833643263954, "relationship_attribution": 0.028842206656526016, "sex_attribution": 0.03456064630884248, "workclass_attribution": 0.0034086183477553336}}    |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    The results display feature importance values for each column.

    A warning displays if the model is of low quality.

What's Next