MySQL HeatWave User Guide

6.5.4.2 Generate Predictions for a Table

ML_PREDICT_TABLE generates predictions for an entire table of trained data and saves the results to an output table. Predictions are performed in parallel.

ML_PREDICT_TABLE is a compute intensive process. Limiting operations to batches of 10 to 100 rows by splitting large tables into smaller tables is recommended. Use batch processing with the batch_size option. See ML_PREDICT_TABLE and Track Progress for MySQL HeatWave AutoML Routines to learn more.

Before You Begin
Prepare to Generate Predictions for a Table

Before running ML_PREDICT_TABLE, you must train, and then load the model you want to use.

  1. The following example trains a dataset with the classification machine learning task.

    mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
    
  2. The following example loads the trained model.

    mysql> CALL sys.ML_MODEL_LOAD(@census_model, NULL);
    

For more information about training and loading models, see Train a Model and Load a Model.

After training and loading the model, you can generate predictions for a table of data. For parameter and option descriptions, see ML_PREDICT_TABLE.

Generate Predictions for a Table

To generate predictions for a table, define the input table, the model handle, the output table, and any addtional options.

mysql> CALL sys.ML_PREDICT_TABLE(table_name, model_handle, output_table_name, [options]);

The following example generates predictions for the entire table in the trained and loaded model.

mysql> CALL sys.ML_PREDICT_TABLE('census_data.census_train', @census_model, 'census_data.census_train_predictions', NULL);

Where:

  • census_data.census_train is the fully qualified name of the test dataset table (schema_name.table_name). The table must have the same feature column names as the training dataset. The target column is not required. If it present in the table, it is not considered when generating predictions.

  • @census_model is the session variable that contains the model handle. Learn more about Model Handles.

  • census_data.census_train_predictions is the output table where predictions are stored. A fully qualified table name must be specified (schema_name.table_name). If the table already exists, an error is returned.

  • NULL sets no options to the routine.

When the output table is created, you can query a sample of the table to review predictions.

mysql> SELECT * FROM table_name LIMIT N;

Replace table_name with your own table name, and N with the number of rows from the table you want to view.

The following example queries the top five rows of the output table.

mysql> SELECT * FROM census_train_predictions LIMIT 5;
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
| _4aad19ca6e_pk_id | age | workclass        | fnlwgt | education    | education-num | marital-status     | occupation        | relationship | race  | sex    | capital-gain | capital-loss | hours-per-week | native-country | revenue | Prediction | ml_results                                                                            |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
|                 1 |  37 | Private          |  99146 | Bachelors    |            13 | Married-civ-spouse | Exec-managerial   | Husband      | White | Male   |            0 |         1977 |             50 | United-States  | >50K    | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.58, ">50K": 0.42}} |
|                 2 |  34 | Private          |  27409 | 9th          |             5 | Married-civ-spouse | Craft-repair      | Husband      | White | Male   |            0 |            0 |             50 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.76, ">50K": 0.24}} |
|                 3 |  30 | Private          | 299507 | Assoc-acdm   |            12 | Separated          | Other-service     | Unmarried    | White | Female |            0 |            0 |             40 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.99, ">50K": 0.01}} |
|                 4 |  62 | Self-emp-not-inc | 102631 | Some-college |            10 | Widowed            | Farming-fishing   | Unmarried    | White | Female |            0 |            0 |             50 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.9, ">50K": 0.1}}   |
|                 5 |  51 | Private          | 153486 | Some-college |            10 | Married-civ-spouse | Handlers-cleaners | Husband      | White | Male   |            0 |            0 |             40 | United-States  | <=50K   | <=50K      | {"predictions": {"revenue": "<=50K"}, "probabilities": {"<=50K": 0.7, ">50K": 0.3}}   |
+-------------------+-----+------------------+--------+--------------+---------------+--------------------+-------------------+--------------+-------+--------+--------------+--------------+----------------+----------------+---------+------------+---------------------------------------------------------------------------------------+
5 rows in set (0.0014 sec)

The predictions and associated probabilities are displayed in the ml_results column. You can compare the predicted revenue values with the real revenue values in the table. If needed, you can refine and train different sets of data to try and generate more reliable predictions.

What's Next