MySQL HeatWave User Guide
ML_EXPLAIN_TABLE
explains
predictions for an entire table of unlabeled data and saves
results to an output table. It limits explanations to the 100
most relevant features.
ML_EXPLAIN_TABLE
is a very
memory-intensive process. We recommend using the
batch_size
option to limit operations to
batches of 10 to 100 rows by splitting large tables into
smaller tables. For tables with over ten columns, we suggest
a batch_size
value of
10
.
A call to ML_EXPLAIN_TABLE
can
include columns that were not present during
ML_TRAIN
. A table can include
extra columns, and still use the MySQL HeatWave AutoML model. This allows
side by side comparisons of target column labels, ground
truth, and explanations in the same table.
ML_EXPLAIN_TABLE
ignores any
extra columns, and appends them to the results.
A loaded model and trained with the appropriate prediction
explainer is required to run
ML_EXPLAIN_TABLE
. See
Generate
Prediction Explanations for a Table.
The output table includes a primary key:
If the input table has a primary key, the output table will have the same primary key.
If the input table does not have a primary key, the output table will have a new primary key column that auto increments.
As of MySQL 8.4.1, the name of the new primary key
column is _4aad19ca6e_pk_id
. The
input table must not have a column with the name
_4aad19ca6e_pk_id
that is not a
primary key.
Before MySQL 8.4.1, the name of the new primary key
column is _id
. The input table must
not have a column with the name _id
that is not a primary key.
ML_EXPLAIN_TABLE
does not
support recommendation, anomaly detection, and topic modeling
models. A call with one of these models produces an error.
ML_EXPLAIN_TABLE
does not
support the following model types:
Forecasting
Recommendation
Anomaly detection
Anomaly detection for logs
Topic modeling
mysql>CALL sys.ML_EXPLAIN_TABLE(
table_name
,model_handle
,output_table_name
, [options
]);options
: { JSON_OBJECT("key
","value
"[,"key
","value
"] ...)"key","value"
: { ['prediction_explainer', {'permutation_importance'|'shap'}|NULL] ['batch_size', 'N
'] } }
Set the following required parameters.
table_name
: Specifies the fully
qualified name of the input table
(database_name.table_name
). The
input table should contain the same feature columns as the
table used to train the model. If the target column is
included in the input table, it is not considered when
generating prediction explanations.
model_handle
: Specifies the model
handle or a session variable containing the model handle.
See Work with
Model Handles.
output_table_name
: Specifies the table
where explanation data is stored. A fully qualified table
name must be specified
(database_name.table_name
). If
the table already exists, an error is returned.
Set the following options as needed.
prediction_explainer
: The name of the
prediction explainer that you have trained for this model
using ML_EXPLAIN
.
permutation_importance
: The default
prediction explainer.
shap
: The SHAP prediction
explainer, which produces feature importance values
based on Shapley values.
batch_size
: The size of each batch. You
can set a value between 1 and 100. For tables with over
ten columns, we recommend a value of
10
.
The following example generates explanations for a table
of data with the default Permutation Importance prediction
explainer. The
ML_EXPLAIN_TABLE
call
specifies the fully qualified name of the table to
generate explanations for, the session variable containing
the model handle, and the fully qualified output table
name.
mysql> CALL sys.ML_EXPLAIN_TABLE('census_data.census_train', @census_model, 'census_data.census_train_permutation', JSON_OBJECT('prediction_explainer', 'permutation_importance'));
To view ML_EXPLAIN_TABLE
results, query the output table. The
SELECT
statement retrieves explanation
data from the output table. The table includes the primary
key, _4aad19ca6e_pk_id
, and the
ml_results
column, which uses
JSON
format:
mysql> SELECT * FROM census_train_permutation LIMIT 3;
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | age_attribution | education-num_attribution | marital-status_attribution | education_attribution | hours-per-week_attribution | relationship_attribution | race_attribution | sex_attribution | workclass_attribution | fnlwgt_attribution | capital-gain_attribution | Notes | ml_results |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | -0.1 | -0.08 | -0.05 | -0.05 | -0.03 | -0.03 | 0.02 | -0.02 | 0.01 | 0 | 0 | race (White) had the largest impact towards predicting =50K, whereas age (37) contributed the most against predicting <=50K | {"attributions": {"age": -0.1, "education-num": -0.08, "marital-status": -0.05, "education": -0.05, "hours-per-week": -0.03, "relationship": -0.03, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "race (White) had the largest impact towards predicting <=50K, whereas age (37) contributed the most against predicting <=50K"} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | 0 | 0 | -0.04 | 0.06 | -0.03 | 0.02 | 0.02 | -0.02 | 0.01 | 0 | 0 | education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": -0.04, "education": 0.06, "hours-per-week": -0.03, "relationship": 0.02, "race": 0.02, "sex": -0.02, "workclass": 0.01, "fnlwgt": 0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "education (9th) had the largest impact towards predicting <=50K, whereas marital-status (Married-civ-spouse) contributed the most against predicting <=50K"} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | 0 | 0 | 0 | 0 | 0 | 0.03 | 0.01 | 0.02 | 0 | 0 | 0 | relationship (Unmarried) had the largest impact towards predicting <=50K | {"attributions": {"age": 0.0, "education-num": 0.0, "marital-status": 0.0, "education": 0.0, "hours-per-week": 0.0, "relationship": 0.03, "race": 0.01, "sex": 0.02, "workclass": 0.0, "fnlwgt": -0.0, "capital-gain": 0.0}, "predictions": {"revenue": "<=50K"}, "notes": "relationship (Unmarried) had the largest impact towards predicting <=50K"} |
+-------------------+-----+-----------+--------+------------+---------------+--------------------+-----------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+-----------------+---------------------------+----------------------------+-----------------------+----------------------------+--------------------------+------------------+-----------------+-----------------------+--------------------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+