MySQL HeatWave User Guide
ML_PREDICT_TABLE
generates
predictions for an entire table of trained data and saves the
results to an output table. Predictions are performed in
parallel.
ML_PREDICT_TABLE
is a compute
intensive process. Limiting operations to batches of 10 to 100
rows by splitting large tables into smaller tables is
recommended. Use batch processing with the
batch_size
option. See
ML_PREDICT_TABLE
and Track Progress
for MySQL HeatWave AutoML Routines to learn more.
Review the following:
Before running
ML_PREDICT_TABLE
, you must
train, and then load the model you want to use.
The following example trains a dataset with the classification machine learning task.
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
The following example loads the trained model.
mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
For more information about training and loading models, see Train a Model and Load a Model.
After training and loading the model, you can generate predictions for a table of data. For parameter and option descriptions, see ML_PREDICT_TABLE.
To generate predictions for a table, define the input table, the model handle, the output table, and any addtional options.
mysql> CALL sys.ML_PREDICT_TABLE(table_name
, model_handle
, output_table_name
, [options
]);
The following example generates predictions for the entire table in the trained and loaded model.
mysql> CALL sys.ML_PREDICT_TABLE('census_data.census_train', @census_model, 'census_data.census_train_predictions', NULL);
Where:
census_data.census_train
is the fully
qualified name of the test dataset table
(schema_name.table_name
). The
table must have the same feature column names as the
training dataset. The target column is not required. If
it present in the table, it is not considered when
generating predictions.
@census_model
is the session variable
that contains the model handle. Learn more about
Model
Handles.
census_data.census_train_predictions
is the output table where predictions are stored. A
fully qualified table name must be specified
(schema_name.table_name
). If
the table already exists, an error is returned.
NULL
sets no options to the routine.
When the output table is created, you can query a sample of the table to review predictions.
mysql> SELECT * FROM table_name
LIMIT N
;
Replace table_name
with your own
table name, and N
with the number
of rows from the table you want to view.
The following example queries the top five rows of the output table.
mysql> SELECT * FROM census_train_predictions LIMIT 5;
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | ml_results |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| 1 | 37 | Private | 99146 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 1977 | 50 | United-States | >50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.58, ">50K": 0.42}} |
| 2 | 34 | Private | 27409 | 9th | 5 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.76, ">50K": 0.24}} |
| 3 | 30 | Private | 299507 | Assoc-acdm | 12 | Separated | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.99, ">50K": 0.01}} |
| 4 | 62 | Self-emp-not-inc | 102631 | Some-college | 10 | Widowed | Farming-fishing | Unmarried | White | Female | 0 | 0 | 50 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.9, ">50K": 0.1}} |
| 5 | 51 | Private | 153486 | Some-college | 10 | Married-civ-spouse | Handlers-cleaners | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K | <=50K | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.7, ">50K": 0.3}} |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
5 rows in set (0.0014 sec)
The predictions and associated probabilities are displayed
in the ml_results
column. You can compare
the predicted revenue values with the real revenue values in
the table. If needed, you can refine and train different
sets of data to try and generate more reliable predictions.
Review ML_PREDICT_TABLE for parameter descriptions and options.
After generating predictions on a table, learn how to Generate Explanations on a table to get insights into which features have the most influence on the predictions.
Learn how to Generate Predictions for a Row of Data.
Learn how to Score a Model to get insight into the quality of the model.