10.2.5 ML_PREDICT

10.2.5 ML_PREDICT_ROW

ML_PREDICT_ROW generates predictions for one or more rows of unlabeled data specified in JSON format. Invoke ML_PREDICT_ROW with a SELECT statement.

A call to ML_PREDICT_ROW can include columns that were not present during ML_TRAIN. A table can include extra columns, and still use the MySQL HeatWave AutoML model. This allows side by side comparisons of target column labels, ground truth, and predictions in the same table. ML_PREDICT_ROW ignores any extra columns, and appends them to the results.

ML_PREDICT_ROW does not support the following model types:

Forecasting
Anomaly detection for logs
Recommendation models trained with the TwoTower mdoel.

This topic has the following sections.

ML_PREDICT_ROW Syntax

mysql> SELECT sys.ML_PREDICT_ROW(input_data, model_handle), options);

options: {JSON_OBJECT(keyvalue[, keyvalue]...)|NULL}
keyvalue:
{  
  'threshold', 'N'
  |'topk', 'N'
  |'recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'|NULL}
  |'remove_seen', {'true'|'false'}
  |'additional_details', {'true'|'false'}
}

Required ML_PREDICT_ROW Parameters

Set the following required parameters:

input_data: Define the data to generate predictions for. The column names must match the feature column names in the table used to train the model. You can define the input data in the following ways:
Specify a single row of data in JSON format.
```
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("column_name", value, "column_name", value, ...), model_handle, options);
```
Run ML_PREDICT_ROW on multiple rows of data by specifying the columns as key-value pairs in JSON format and select from a table.
```
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("output_col_name", schema.`input_col_name`, "output_col_name", schema.`input_col_name`, ...), 
       model_handle, options) FROM input_table_name LIMIT N;
```
model_handle: Define the model handle or a session variable that contains the model handle. See Work with Model Handles.

Review the following options in JSON format.

ML_PREDICT ROW Option for Data Drift Detection

To view data drift detection values for classification, regression, and anomaly detection (as of MySQL 9.3.2) models, set the additional_details option to true. The ml_results includes the drift JSON object literal. See Analyze Data Drift.

ML_PREDICT_ROW Options for Recommendation Models

Set the following options as needed for Recommendation models.

topk: Specify the number of recommendations to provide as a positive integer. The default is 3.
recommend: Specify what to recommend.
- ratings: Use this option to predict ratings. This is the default value.
  The target column is prediction, and the values are float.
  The input table must contain at least two columns with the same names as the user column and item column from the training model.
- items: Use this option to recommend items for users.
  The target column is item_recommendation, and the values are:
```
JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
```
  The input table must contain at least one column with the same name as the user column from the training model.
- users: Use this option to recommend users for items.
  The target column is user_recommendation, and the values are:
```
JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
```
  The input table must contain at least one column with the same name as the item column from the training model.
- users_to_items: This is the same as items.
- items_to_users: This is the same as users.
- items_to_items: Use this option to recommend similar items for items.
  The target column is item_recommendation, and the values are:
```
JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))
```
  The input table must contain at least contain a column with the same name as the item column from the training model.
- users_to_users: Use this option to recommend similar users for users.
  The target column is user_recommendation, and the values are:
```
JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))
```
  The input table must contain at least one column with the same name as the user column from the training model.
remove_seen: If the input table overlaps with the training table, and remove_seen is true, then the model will not repeat existing interactions. The default is true. Set remove_seen to false to repeat existing interactions from the training table.

Options for Anomaly Detection Models

Set the following options as needed for anomaly detection models.

threshold: The threshold you set on anomaly detection models determines which rows in the output table are labeled as anomalies with an anomaly score of 1, or normal with an anomaly score of 0. The value for the threshold is the degree to which a row of data or log segment is considered for anomaly detection. Any sample with an anomaly score above the threshold is classified an anomaly. The default value is (1 - contamination)-th percentile of all the anomaly scores.

Syntax Examples

The following example generates a prediction on a single row of data. The results include the ml_results field, which uses JSON format. Optionally use \G to display the information in an easily readable format.

mysql> SET @row_input = JSON_OBJECT( 
          "age", 25, 
          "workclass", "Private", 
          "fnlwgt", 226802, 
          "education", "11th", 
          "education-num", 7, 
          "marital-status", "Never-married", 
          "occupation", "Machine-op-inspct", 
          "relationship", "Own-child", 
          "race", "Black", 
          "sex", "Male", 
          "capital-gain", 0, 
          "capital-loss", 0, 
          "hours-per-week", 40, 
          "native-country", "United-States"); 
mysql> SELECT sys.ML_PREDICT_ROW(@row_input, @census_model, NULL)\G
*************************** 1. row ***************************
sys.ML_PREDICT_ROW(@row_input, @census_model, NULL): 
{
    "age": 25,
    "sex": "Male",
    "race": "Black",
    "fnlwgt": 226802,
    "education": "11th",
    "workclass": "Private",
    "Prediction": "<=50K",
    "ml_results": {
        "predictions": {
            "revenue": "<=50K"
        },
        "probabilities": {
            ">50K": 0.0032,
            "<=50K": 0.9968
        }
    },
    "occupation": "Machine-op-inspct",
    "capital-gain": 0,
    "capital-loss": 0,
    "relationship": "Own-child",
    "education-num": 7,
    "hours-per-week": 40,
    "marital-status": "Never-married",
    "native-country": "United-States"
}
1 row in set (2.2218 sec)

The following example generates predictions on two rows of data from the input table. Optionally use \G to display the information in an easily readable format.

mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT(
	"age", census_train.`age`,
	"workclass", census_train.`workclass`,
	"fnlwgt", census_train.`fnlwgt`,
	"education", census_train.`education`,
	"education-num", census_train.`education-num`,
	"marital-status", census_train.`marital-status`,
	"occupation", census_train.`occupation`,
	"relationship", census_train.`relationship`,
	"race", census_train.`race`,
	"sex", census_train.`sex`,
	"capital-gain", census_train.`capital-gain`,
	"capital-loss", census_train.`capital-loss`,
	"hours-per-week", census_train.`hours-per-week`,
	"native-country", census_train.`native-country`),
	@census_model, NULL)FROM census_data.census_train LIMIT 2\G
*************************** 1. row ***************************
sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_train.`age`,
"workclass", census_train.`workclass`,
"fnlwgt", census_train.`fnlwgt`,
"education", census_train.`education`,
"education-num", census_train.`education-num`,
"marital-status", census_train.`marita: {
                                            "age": 62,
                                            "sex": "Female",
                                            "race": "White",
                                            "fnlwgt": 123582,
                                            "education": "10th",
                                            "workclass": "Private",
                                            "Prediction": "<=50K",
                                            "ml_results": {
                                                "predictions": {
                                                    "revenue": "<=50K"
                                                },
                                                "probabilities": {
                                                    ">50K": 0.0106,
                                                    "<=50K": 0.9894
                                                }
                                            },
                                            "occupation": "Other-service",
                                            "capital-gain": 0,
                                            "capital-loss": 0,
                                            "relationship": "Unmarried",
                                            "education-num": 6,
                                            "hours-per-week": 40,
                                            "marital-status": "Divorced",
                                            "native-country": "United-States"
                                        }
*************************** 2. row ***************************
sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_train.`age`,
"workclass", census_train.`workclass`,
"fnlwgt", census_train.`fnlwgt`,
"education", census_train.`education`,
"education-num", census_train.`education-num`,
"marital-status", census_train.`marita: {
                                            "age": 32,
                                            "sex": "Female",
                                            "race": "White",
                                            "fnlwgt": 174215,
                                            "education": "Bachelors",
                                            "workclass": "Federal-gov",
                                            "Prediction": "<=50K",
                                            "ml_results": {
                                                "predictions": {
                                                    "revenue": "<=50K"
                                                },
                                                "probabilities": {
                                                    ">50K": 0.3249,
                                                    "<=50K": 0.6751
                                                }
                                            },
                                            "occupation": "Exec-managerial",
                                            "capital-gain": 0,
                                            "capital-loss": 0,
                                            "relationship": "Not-in-family",
                                            "education-num": 13,
                                            "hours-per-week": 60,
                                            "marital-status": "Never-married",
                                            "native-country": "United-States"
                                        }
2 rows in set (9.6548 sec)

The following example uses explicit feedback and runs the ML_PREDICT_ROW routine to predict the top 3 items that a particular user will like with the users_to_items option.

mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3));
+----------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "846"}', @model,  JSON_OBJECT("recommend", "users_to_items", "topk", 3))             |
+----------------------------------------------------------------------------------------------------------------------+
| {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483", "64"], "rating": [4.06, 4.05, 4.04]}}"} |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.2811 sec)

The following example generates predictions on ten rows from an input table. The additional_details parameter is set to TRUE, so you can review data drift details.

mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT(
	"age", census_test.`age`,
	"workclass", census_test.`workclass`,
	"fnlwgt", census_test.`fnlwgt`,
	"education", census_test.`education`,
	"education-num", census_test.`education-num`,
	"marital-status", census_test.`marital-status`,
	"occupation", census_test.`occupation`,
	"relationship", census_test.`relationship`,
	"race", census_test.`race`,
	"sex", census_test.`sex`,
	"capital-gain", census_test.`capital-gain`,
	"capital-loss", census_test.`capital-loss`,
	"hours-per-week", census_test.`hours-per-week`,
	"native-country", census_test.`native-country`),
	@census_model, JSON_OBJECT('additional_details', TRUE))FROM census_data.census_test LIMIT 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_test.`age`,
"workclass", census_test.`workclass`,
"fnlwgt", census_test.`fnlwgt`,
"education", census_test.`education`,
"education-num", census_test.`education-num`,
"ma                                                                                                                                                                                                                                                                                                                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"age": 37, "sex": "Male", "race": "White", "fnlwgt": 99146, "education": "Bachelors", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "fnlwgt": 46.67, "capital-gain": 0.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.42, "<=50K": 0.58}}, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 1977, "relationship": "Husband", "education-num": 13, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"}                |
| {"age": 34, "sex": "Male", "race": "White", "fnlwgt": 27409, "education": "9th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.1, "attribution_percent": {"fnlwgt": 25.0, "education": 33.31, "workclass": 16.22}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Craft-repair", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 5, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"}                       |
| {"age": 30, "sex": "Female", "race": "White", "fnlwgt": 299507, "education": "Assoc-acdm", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.26, "attribution_percent": {"relationship": 21.36, "education-num": 28.33, "hours-per-week": 33.21}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.01, "<=50K": 0.99}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 12, "hours-per-week": 40, "marital-status": "Separated", "native-country": "United-States"} |
| {"age": 62, "sex": "Female", "race": "White", "fnlwgt": 102631, "education": "Some-college", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"fnlwgt": 5.93, "relationship": 26.58, "hours-per-week": 35.69}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.1, "<=50K": 0.9}}, "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 10, "hours-per-week": 50, "marital-status": "Widowed", "native-country": "United-States"} |
| {"age": 51, "sex": "Male", "race": "White", "fnlwgt": 153486, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"sex": 7.84, "workclass": 7.84, "education-num": 83.96}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.3, "<=50K": 0.7}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 10, "hours-per-week": 40, "marital-status": "Married-civ-spouse", "native-country": "United-States"}         |
| {"age": 34, "sex": "Male", "race": "Black", "fnlwgt": 434292, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 2.46, "attribution_percent": {"education": 12.36, "relationship": 22.07, "education-num": 19.92}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.33, "<=50K": 0.67}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 30, "marital-status": "Married-civ-spouse", "native-country": "United-States"}     |
| {"age": 28, "sex": "Male", "race": "White", "fnlwgt": 240172, "education": "Masters", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.23, "attribution_percent": {"sex": 17.41, "fnlwgt": 21.67, "workclass": 17.41}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Prof-specialty", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 14, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"}             |
| {"age": 56, "sex": "Male", "race": "White", "fnlwgt": 219426, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.06, "attribution_percent": {"age": 27.74, "race": 22.22, "education-num": 25.1}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.13, "<=50K": 0.87}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 6, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"}                  |
| {"age": 46, "sex": "Female", "race": "White", "fnlwgt": 295791, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"race": 9.66, "capital-gain": 41.59, "marital-status": 38.47}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.17, "<=50K": 0.83}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 9, "hours-per-week": 30, "marital-status": "Divorced", "native-country": "United-States"}              |
| {"age": 46, "sex": "Male", "race": "White", "fnlwgt": 114032, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "capital-gain": 0.0, "education-num": 100.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.34, "<=50K": 0.66}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 1887, "relationship": "Husband", "education-num": 10, "hours-per-week": 45, "marital-status": "Married-civ-spouse", "native-country": "United-States"}        |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (6.8109 sec)

The following example uses a recommendation model with implicit feedback to predict the ranking for a particular user and item.

mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model,  NULL);
+---------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model,  NULL)             |
+---------------------------------------------------------------------------------------+
| {"item_id": "226", "user_id": "836", "ml_results": {"predictions": {"rating": 2.46}}} |
+---------------------------------------------------------------------------------------+
1 row in set (0.1390 sec)

The following example uses a recommendation model with explicit feedback to predict the top two items that a particular user will like with the users_to_items option.

mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2));
+----------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "846"}', @model,  JSON_OBJECT("recommend", "users_to_items", "topk", 2))             |
+----------------------------------------------------------------------------------------------------------------------+
| {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483"], "rating": [4.06, 4.05]}}"} |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.2811 sec)

The following example uses a recommendation model with explicit feedback to predict the top two items similar to another item.

mysql> SELECT sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2));
+------------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"item_id": "524"}', @model,  JSON_OBJECT("recommend", "items_to_items", "topk", 2))               |
+------------------------------------------------------------------------------------------------------------------------+
| {"item_id": "524", "ml_results": "{"predictions": {"item_id": ["665", "633"], "similarity": [1.0, 1.0]}}"} |
+------------------------------------------------------------------------------------------------------------------------+