MySQL HeatWave User Guide
ML_EXPLAIN_TABLE
explains
predictions for an entire table of unlabeled data and saves
results to an output table. Explanations are performed in
parallel.
ML_EXPLAIN_TABLE
is a very
memory-intensive process. We recommend using the
batch_size
option to limit operations to
batches of 10 to 100 rows by splitting large tables into
smaller tables. For tables with over ten columns, we suggest
a batch_size
value of
10
. See
ML_EXPLAIN_TABLE
and Track
Progress for MySQL HeatWave AutoML Routines to learn more.
Review the following:
You cannot generate prediction explanations on a table for the following model types:
Forecasting
Recommendation
Anomaly detection
Anomaly detection for logs
Topic modeling
Before running
ML_EXPLAIN_TABLE
, you must
train, and then load the model you want to use.
The following example trains a dataset with the classification machine learning task.
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
The following example loads the trained model.
mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
For more information about training and loading models, see Train a Model and Load a Model.
After training and loading the model, you can generate prediction explanations for a table. For parameter and option descriptions, see ML_EXPLAIN_TABLE.
After training and loading a model, you can run
ML_EXPLAIN_TABLE
to generate
a table of prediction explanations with the default
Permutation Importance explainer>. However, if you train the
shap
prediction explainer with
ML_EXPLAIN
, you need to run
ML_EXPLAIN
again with the
permutation_importance
explainer before
running ML_EXPLAIN_TABLE
with
the same explainer.
Run the ML_EXPLAIN_TABLE
routine.
mysql> CALL sys.ML_EXPLAIN_TABLE(table_name
, model_handle
, output_table_name
, [options
]);
The following example runs
ML_EXPLAIN_TABLE
with the
permutation_importance
explainer.
mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_permutation', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
Where:
census_data.census_train
is the
fully qualified name of the table that contains the
training dataset
(schema_name.table_name
).
@census_model
is the session
variable for the trained model.
census_data.census_train_permutation
is the fully qualified name of the output table that
contains the explanations
(schema_name.table_name
).
prediction_explainer
is set to
permutation_importance
for the
Permutation Importance prediction explainer.
Query the output table to review a sample of the results.
mysql> SELECT * FROM table_name
LIMIT N
;
The following example queries the top three rows of the output table.
mysql> SELECT * FROM census_train_permutation LIMIT 3;
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | education-num_attribution | marital-status_attribution | education_attribution | hours-per-week_attribution | relationship_attribution | race_attribution | sex_attribution | workclass_attribution | fnlwgt_attribution | capital-gain_attribution | Notes | ml_results |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | -0.1 | -0.08 | -0.05 | -0.05 | -0.03 | -0.03 | 0.02 | -0.02 | 0.01 | 0 | 0 | race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K | {"attributions": {"age": -0.1, "education-num": -0.08, "marital-status": -0.05, "education": -0.05, "hours-per-week": -0.03, "relationship": -0.03, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K"} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | 0 | 0 | -0.04 | 0.06 | -0.03 | 0.02 | 0.02 | -0.02 | 0.01 | 0 | 0 | education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": -0.04, "education": 0.06, "hours-per-week": -0.03, "relationship": 0.02, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K"} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | 0 | 0 | 0 | 0 | 0 | 0.03 | 0.01 | 0.02 | 0 | 0 | 0 | relationship (Unmarried) had the largest impact towards predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": 0.0, "education": 0.0, "hours-per-week": 0.0, "relationship": 0.03, "race": 0.01, "sex": 0.02, "workclass": 0.0, "fnlwgt": -0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "relationship (Unmarried) had the largest impact towards predicting <=50K"} |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The results display information on the columns that had the largest impact towards the predictions and the columns that contributed the most against the prediction.
A warning displays if the model is of low quality.
To generate a table of prediction explanations with the SHAP
explainer, you must first run the SHAP explainer with
ML_EXPLAIN
.
Run the ML_EXPLAIN
routine.
mysql> CALL sys.ML_EXPLAIN ('table_name
', 'target_column_name
', model_handle
, [options]);
The following example run the shap
prediction explainer.
mysql> CALL sys.ML_EXPLAIN('census_data.census_train', 'revenue', @census_model, JSON_OBJECT('prediction_explainer', 'shap'));
Where:
census_data.census_train
is the
fully qualified name of the table that contains the
training dataset
(schema_name.table_name
).
revenue
is the name of the target
column, which contains ground truth values.
@census_model
is the session
variable for the trained model.
prediction_explainer
is set to
shap
for the SHAP prediction
explainer.
Run the ML_EXPLAIN_TABLE
routine.
mysql> CALL sys.ML_EXPLAIN_TABLE(table_name
, model_handle
, output_table_name
, [options
]);
The following example runs the shap
prediction explainer.
mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_explanations', JSON_OBJECT('prediction_explainer', 'shap'));
Where:
census_data.census_train
is the
fully qualified name of the table that contains the
training dataset
(schema_name.table_name
).
@census_model
is the session
variable for the trained model.
census_data.census_train_explanations
is the fully qualified name of the output table that
contains the explanations
(schema_name.table_name
).
prediction_explainer
is set to
shap
for the SHAP prediction
explainer.
Query the output table to review a sample of the results.
mysql> SELECT * FROM table_name
LIMIT N
;
The following example queries the top three rows of the output table.
mysql> SELECT * FROM census_train_explanations LIMIT 3;
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | capital-gain_attribution | education_attribution | education-num_attribution | fnlwgt_attribution | hours-per-week_attribution | marital-status_attribution | race_attribution | relationship_attribution | sex_attribution | workclass_attribution | ml_results |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | -0.0302428 | 0.0105781 | -0.0250973 | -0.0307275 | -0.0394281 | -0.0115214 | -0.0417846 | 0.00787589 | -0.0510622 | -0.0265045 | 0.0125372 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": -0.03024279141207572, "capital-gain_attribution": 0.01057805203258716, "education_attribution": -0.025097336669631397, "education-num_attribution": -0.03072748073245629, "fnlwgt_attribution": -0.03942808683149933, "hours-per-week_attribution": -0.01152140445039615, "marital-status_attribution": -0.041784639464290424, "race_attribution": 0.00787588729783209, "relationship_attribution": -0.05106219218656989, "sex_attribution": -0.026504470317501083, "workclass_attribution": 0.012537165912050216}} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | 0.0208705 | 0.0108355 | 0.0445232 | 0.00589651 | -0.042798 | -0.0120815 | -0.0419652 | 0.00772465 | -0.0284059 | -0.0263092 | 0.0133885 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.020870525610603933, "capital-gain_attribution": 0.010835454035912382, "education_attribution": 0.044523246311996556, "education-num_attribution": 0.005896512933976113, "fnlwgt_attribution": -0.04279802962032341, "hours-per-week_attribution": -0.012081540592359136, "marital-status_attribution": -0.04196518165858853, "race_attribution": 0.007724645581626567, "relationship_attribution": -0.028405864878017332, "sex_attribution": -0.026309175183378983, "workclass_attribution": 0.013388466297739862}} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | 0.0415136 | 0.0104095 | 0.0205295 | 0.00435879 | -0.00299281 | 0.0095282 | 0.0237676 | 0.00823583 | 0.0288422 | 0.0345606 | 0.00340862 | {"predictions": {"revenue": "<=50K"}, "attributions": {"age_attribution": 0.04151362477220458, "capital-gain_attribution": 0.01040951100227322, "education_attribution": 0.020529478225449593, "education-num_attribution": 0.004358785226045173, "fnlwgt_attribution": -0.0029928127703645266, "hours-per-week_attribution": 0.009528199786418556, "marital-status_attribution": 0.023767622241075377, "race_attribution": 0.008235833643263954, "relationship_attribution": 0.028842206656526016, "sex_attribution": 0.03456064630884248, "workclass_attribution": 0.0034086183477553336}} |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+--------------------------+-----------------------+---------------------------+--------------------+----------------------------+----------------------------+------------------+--------------------------+-----------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The results display feature importance values for each column.
A warning displays if the model is of low quality.
Review ML_EXPLAIN_TABLE for parameter descriptions and options.
Learn how to Score a Model to get insight into the quality of the model.