MySQL AI User Guide

4.5.6.2 Generating Prediction Explanations for a Table

ML_EXPLAIN_TABLE explains predictions for an entire table of unlabeled data. Explanations are performed in parallel.

Note

ML_EXPLAIN_TABLE is a very memory-intensive process. We recommend limiting the input table to a maximum of 100 rows. If the input table has more than ten columns, limit it to ten rows.

Before You Begin
Unsupported Model Types

You cannot generate prediction explanations on a table for the following model types:

  • Forecasting

  • Recommendation

  • Anomaly detection

  • Anomaly detection for logs

  • Topic modeling

Input Tables and Output Tables

You can specify the output table and the input table as the same table if all the following conditions are met:

  • The input table does not have the columns that are created for the output table when generating predictions. Output columns are specific to each machine learning task. Some of these columns include:

    • Prediction

    • ml_results

    • [input_column_name]_attribution

  • The input table does not have a primary key, and it does not have a column named _4aad19ca6e_pk_id. This is because ML_EXPLAIN_TABLE adds a column as the primary key with the name _4aad19ca6e_pk_id to the output table.

If you specify the output table and the input table as the same name, the predictions are inserted into the input table.

Preparing to Generate Explanations for a Table

Before running ML_EXPLAIN_TABLE, you must train, and then load the model you want to use.

  1. The following example trains a dataset with the classification machine learning task.

    mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
    
  2. The following example loads the trained model.

    mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
    

For more information about training and loading models, see Train a Model and Load a Model.

After training and loading the model, you can generate prediction explanations for a table. For parameter and option descriptions, see ML_EXPLAIN_TABLE.

Generating Explanations for a Table with the Default Permutation Importance Explainer

After training and loading a model, you can run ML_EXPLAIN_TABLE to generate a table of prediction explanations with the default Permutation Importance explainer>. However, if you train the shap prediction explainer with ML_EXPLAIN, you need to run ML_EXPLAIN again with the permutation_importance explainer before running ML_EXPLAIN_TABLE with the same explainer.

  1. Run the ML_EXPLAIN_TABLE routine.

    mysql> CALL sys.ML_EXPLAIN_TABLE(table_name, model_handle, output_table_name, [options]);
    

    The following example runs ML_EXPLAIN_TABLE with the permutation_importance explainer.

    mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_permutation', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • @census_model is the session variable for the trained model.

    • census_data.census_train_permutation is the fully qualified name of the output table that contains the explanations (schema_name.table_name).

    • prediction_explainer is set to permutation_importance for the Permutation Importance prediction explainer.

  2. Query the output table to review a sample of the results.

    mysql> SELECT * FROM table_name LIMIT N;
    

    The following example queries the top three rows of the output table.

    mysql> SELECT * FROM census_train_permutation LIMIT 3;
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | age | workclass | fnlwgt | education  | education-num | marital-status     | occupation      | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | education-num_attribution | marital-status_attribution | education_attribution | hours-per-week_attribution | relationship_attribution | race_attribution | sex_attribution | workclass_attribution | fnlwgt_attribution | capital-gain_attribution | Notes                                                                                                                                                      | ml_results                                                                                                                                                                                                                                                                                                                                                                                                                                     |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                 1 |  37 | Private   |  99146 | Bachelors  |            13 | Married-civ-spouse | Exec-managerial | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      |            -0.1 |                     -0.08 |                      -0.05 |                 -0.05 |                      -0.03 |                    -0.03 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K                               | {"attributions": {"age": -0.1, "education-num": -0.08, "marital-status": -0.05, "education": -0.05, "hours-per-week": -0.03, "relationship": -0.03, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K"}                          |
    |                 2 |  34 | Private   |  27409 | 9th        |             5 | Married-civ-spouse | Craft-repair    | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      |               0 |                         0 |                      -0.04 |                  0.06 |                      -0.03 |                     0.02 |             0.02 |           -0.02 |                  0.01 |                  0 |                        0 | education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": -0.04, "education": 0.06, "hours-per-week": -0.03, "relationship": 0.02, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K"} |
    |                 3 |  30 | Private   | 299507 | Assoc-acdm |            12 | Separated          | Other-service   | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      |               0 |                         0 |                          0 |                     0 |                          0 |                     0.03 |             0.01 |            0.02 |                     0 |                  0 |                        0 | relationship (Unmarried) had the largest impact towards predicting <=50K                                                                                   | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": 0.0, "education": 0.0, "hours-per-week": 0.0, "relationship": 0.03, "race": 0.01, "sex": 0.02, "workclass": 0.0, "fnlwgt": -0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "relationship (Unmarried) had the largest impact towards predicting <=50K"}                                                                                         |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    The results display information on the columns that had the largest impact towards the predictions and the columns that contributed the most against the prediction.

    A warning displays if the model is of low quality.

Generating Explanations for a Table with the SHAP Explainer

To generate a table of prediction explanations with the SHAP explainer, you must first run the SHAP explainer with ML_EXPLAIN.

  1. Run the ML_EXPLAIN routine.

    mysql> CALL sys.ML_EXPLAIN ('table_name', 'target_column_name', model_handle, [options]);
    

    The following example run the shap prediction explainer.

    mysql> CALL sys.ML_EXPLAIN('census_data.census_train', 'revenue', @census_model, JSON_OBJECT('prediction_explainer', 'shap'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • revenue is the name of the target column, which contains ground truth values.

    • @census_model is the session variable for the trained model.

    • prediction_explainer is set to shap for the SHAP prediction explainer.

  2. Run the ML_EXPLAIN_TABLE routine.

    mysql> CALL sys.ML_EXPLAIN_TABLE(table_name, model_handle, output_table_name, [options]);
    

    The following example runs the shap prediction explainer.

    mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_explanations', JSON_OBJECT('prediction_explainer', 'shap'));
    

    Where:

    • census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).

    • @census_model is the session variable for the trained model.

    • census_data.census_train_explanations is the fully qualified name of the output table that contains the explanations (schema_name.table_name).

    • prediction_explainer is set to shap for the SHAP prediction explainer.

  3. Query the output table to review a sample of the results.

    mysql> SELECT * FROM table_name LIMIT N;
    

    The following example queries the top three rows of the output table.

    mysql> SELECT * FROM census_train_explanations LIMIT 3;
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | _4aad19ca6e_pk_id | age | workclass | fnlwgt | education  | education-num | marital-status     | occupation      | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | capital-gain_attribution | education_attribution | education-num_attribution | fnlwgt_attribution | hours-per-week_attribution | marital-status_attribution | race_attribution | relationship_attribution | sex_attribution | workclass_attribution | ml_results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                 1 |  37 | Private   |  99146 | Bachelors  |            13 | Married-civ-spouse | Exec-managerial | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      |      -0.0302428 |                0.0105781 |            -0.0250973 |                -0.0307275 |         -0.0394281 |                 -0.0115214 |                 -0.0417846 |       0.00787589 |               -0.0510622 |      -0.0265045 |             0.0125372 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": -0.03024279141207572, "capital-gain_attribution": 0.01057805203258716, "education_attribution": -0.025097336669631397, "education-num_attribution": -0.03072748073245629, "fnlwgt_attribution": -0.03942808683149933, "hours-per-week_attribution": -0.01152140445039615, "marital-status_attribution": -0.041784639464290424, "race_attribution": 0.00787588729783209, "relationship_attribution": -0.05106219218656989, "sex_attribution": -0.026504470317501083, "workclass_attribution": 0.012537165912050216}}   |
    |                 2 |  34 | Private   |  27409 | 9th        |             5 | Married-civ-spouse | Craft-repair    | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      |       0.0208705 |                0.0108355 |             0.0445232 |                0.00589651 |          -0.042798 |                 -0.0120815 |                 -0.0419652 |       0.00772465 |               -0.0284059 |      -0.0263092 |             0.0133885 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.020870525610603933, "capital-gain_attribution": 0.010835454035912382, "education_attribution": 0.044523246311996556, "education-num_attribution": 0.005896512933976113, "fnlwgt_attribution": -0.04279802962032341, "hours-per-week_attribution": -0.012081540592359136, "marital-status_attribution": -0.04196518165858853, "race_attribution": 0.007724645581626567, "relationship_attribution": -0.028405864878017332, "sex_attribution": -0.026309175183378983, "workclass_attribution": 0.013388466297739862}} |
    |                 3 |  30 | Private   | 299507 | Assoc-acdm |            12 | Separated          | Other-service   | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      |       0.0415136 |                0.0104095 |             0.0205295 |                0.00435879 |        -0.00299281 |                  0.0095282 |                  0.0237676 |       0.00823583 |                0.0288422 |       0.0345606 |            0.00340862 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.04151362477220458, "capital-gain_attribution": 0.01040951100227322, "education_attribution": 0.020529478225449593, "education-num_attribution": 0.004358785226045173, "fnlwgt_attribution": -0.0029928127703645266, "hours-per-week_attribution": 0.009528199786418556, "marital-status_attribution": 0.023767622241075377, "race_attribution": 0.008235833643263954, "relationship_attribution": 0.028842206656526016, "sex_attribution": 0.03456064630884248, "workclass_attribution": 0.0034086183477553336}}    |
    +-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    The results display feature importance values for each column.

    A warning displays if the model is of low quality.

What's Next