MySQL HeatWave User Guide
ML_PREDICT_ROW
generates
predictions for one or more rows of unlabeled data specified in
JSON
format. Invoke
ML_PREDICT_ROW
with a
SELECT
statement.
A call to ML_PREDICT_ROW
can
include columns that were not present during
ML_TRAIN
. A table can include
extra columns, and still use the MySQL HeatWave AutoML model. This allows
side by side comparisons of target column labels, ground truth,
and predictions in the same table.
ML_PREDICT_ROW
ignores any extra
columns, and appends them to the results.
ML_PREDICT_ROW
does not support
the following model types:
Forecasting
Anomaly detection for logs
This topic has the following sections.
mysql>SELECT sys.ML_PREDICT_ROW(
input_data
,model_handle
), [options
]);options
: { JSON_OBJECT("key
","value
"[,"key
","value
"] ...)"key","value"
: { ['threshold', 'N
'] ['topk', 'N
'] ['recommend', {'ratings'|'items'|'users'|'users_to_items'|'items_to_users'|'items_to_items'|'users_to_users'}|NULL] ['remove_seen', {'true'|'false'}] ['additional_details', {'true'|'false'}] } }
Set the following required parameters:
input_data
: Define the data to generate
predictions for. The column names must match the feature
column names in the table used to train the model. You can
define the input data in the following ways:
Specify a single row of data in JSON
format.
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("column_name
", value
, "column_name
", value
, ...), model_handle
, options
);
Run ML_PREDICT_ROW
on
multiple rows of data by specifying the columns as
key-value pairs in JSON
format and
select from a table.
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT("output_col_name
", schema
.`input_col_name
`, "output_col_name
", schema
.`input_col_name
`, ...), model_handle
, options
)
FROM input_table_name
LIMIT N
;
model_handle
: Define the model handle
or a session variable that contains the model handle. See
Work with Model
Handles.
Review the following options in JSON
format.
To view data drift detection values for classification,
regression, and anomaly detection (as of MySQL 9.3.2) models,
set the additional_details
option to
true
. The ml_results
includes the drift
JSON object literal. See
Analyze Data
Drift.
Set the following options as needed for Recommendation models.
topk
: Specify the number of
recommendations to provide as a positive integer. The
default is 3
.
recommend
: Specify what to recommend.
ratings
: Use this option to predict
ratings. This is the default value.
The target column is prediction
,
and the values are float
.
The input table must contain at least two columns with the same names as the user column and item column from the training model.
items
: Use this option to recommend
items for users.
The target column is
item_recommendation
, and the values
are:
JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
The input table must contain at least one column with the same name as the user column from the training model.
users
: Use this option to recommend
users for items.
The target column is
user_recommendation
, and the values
are:
JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"), "column_rating_name" , JSON_ARRAY(rating_1, ..., rating_k))
The input table must contain at least one column with the same name as the item column from the training model.
users_to_items
: This is the same as
items
.
items_to_users
: This is the same as
users
.
items_to_items
: Use this option to
recommend similar items for items.
The target column is
item_recommendation
, and the values
are:
JSON_OBJECT("column_item_id_name", JSON_ARRAY("item_1", ... , "item_k"))
The input table must contain at least contain a column with the same name as the item column from the training model.
users_to_users
: Use this option to
recommend similar users for users.
The target column is
user_recommendation
, and the values
are:
JSON_OBJECT("column_user_id_name", JSON_ARRAY("user_1", ... , "user_k"))
The input table must contain at least one column with the same name as the user column from the training model.
remove_seen
: If the input table
overlaps with the training table, and
remove_seen
is true
,
then the model will not repeat existing interactions. The
default is true
. Set
remove_seen
to false
to repeat existing interactions from the training table.
Set the following options as needed for anomaly detection models.
threshold
: The threshold you set on
anomaly detection models determines which rows in the
output table are labeled as anomalies with an anomaly
score of 1
, or normal with an anomaly
score of 0
. The value for the threshold
is the degree to which a row of data or log segment is
considered for anomaly detection. Any sample with an
anomaly score above the threshold is classified an
anomaly. The default value is (1 -
contamination
)-th percentile of all the
anomaly scores.
The following example generates a prediction on a single
row of data. The results include the
ml_results
field, which uses
JSON
format. Optionally use
\G
to display the information in an
easily readable format.
mysql>SET @row_input = JSON_OBJECT( "age", 25, "workclass", "Private", "fnlwgt", 226802, "education", "11th", "education-num", 7, "marital-status", "Never-married", "occupation", "Machine-op-inspct", "relationship", "Own-child", "race", "Black", "sex", "Male", "capital-gain", 0, "capital-loss", 0, "hours-per-week", 40, "native-country", "United-States");
mysql>SELECT sys.ML_PREDICT_ROW(@row_input, @census_model, NULL)\G
*************************** 1. row *************************** sys.ML_PREDICT_ROW(@row_input, @census_model, NULL): { "age": 25, "sex": "Male", "race": "Black", "fnlwgt": 226802, "education": "11th", "workclass": "Private", "Prediction": "<=50K", "ml_results": { "predictions": { "revenue": "<=50K" }, "probabilities": { ">50K": 0.0032, "<=50K": 0.9968 } }, "occupation": "Machine-op-inspct", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 7, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States" } 1 row in set (2.2218 sec)
The following example generates predictions on two rows of
data from the input table. Optionally use
\G
to display the information in an
easily readable format.
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_train.`age`,
"workclass", census_train.`workclass`,
"fnlwgt", census_train.`fnlwgt`,
"education", census_train.`education`,
"education-num", census_train.`education-num`,
"marital-status", census_train.`marital-status`,
"occupation", census_train.`occupation`,
"relationship", census_train.`relationship`,
"race", census_train.`race`,
"sex", census_train.`sex`,
"capital-gain", census_train.`capital-gain`,
"capital-loss", census_train.`capital-loss`,
"hours-per-week", census_train.`hours-per-week`,
"native-country", census_train.`native-country`),
@census_model, NULL)FROM census_data.census_train LIMIT 2\G
*************************** 1. row ***************************
sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_train.`age`,
"workclass", census_train.`workclass`,
"fnlwgt", census_train.`fnlwgt`,
"education", census_train.`education`,
"education-num", census_train.`education-num`,
"marital-status", census_train.`marita: {
"age": 62,
"sex": "Female",
"race": "White",
"fnlwgt": 123582,
"education": "10th",
"workclass": "Private",
"Prediction": "<=50K",
"ml_results": {
"predictions": {
"revenue": "<=50K"
},
"probabilities": {
">50K": 0.0106,
"<=50K": 0.9894
}
},
"occupation": "Other-service",
"capital-gain": 0,
"capital-loss": 0,
"relationship": "Unmarried",
"education-num": 6,
"hours-per-week": 40,
"marital-status": "Divorced",
"native-country": "United-States"
}
*************************** 2. row ***************************
sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_train.`age`,
"workclass", census_train.`workclass`,
"fnlwgt", census_train.`fnlwgt`,
"education", census_train.`education`,
"education-num", census_train.`education-num`,
"marital-status", census_train.`marita: {
"age": 32,
"sex": "Female",
"race": "White",
"fnlwgt": 174215,
"education": "Bachelors",
"workclass": "Federal-gov",
"Prediction": "<=50K",
"ml_results": {
"predictions": {
"revenue": "<=50K"
},
"probabilities": {
">50K": 0.3249,
"<=50K": 0.6751
}
},
"occupation": "Exec-managerial",
"capital-gain": 0,
"capital-loss": 0,
"relationship": "Not-in-family",
"education-num": 13,
"hours-per-week": 60,
"marital-status": "Never-married",
"native-country": "United-States"
}
2 rows in set (9.6548 sec)
The following example uses explicit feedback and runs the
ML_PREDICT_ROW
routine to
predict the top 3 items that a particular user will like
with the users_to_items
option.
mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3));
+----------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 3)) |
+----------------------------------------------------------------------------------------------------------------------+
| {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483", "64"], "rating": [4.06, 4.05, 4.04]}}"} |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.2811 sec)
The following example generates predictions on ten rows
from an input table. The
additional_details
parameter is set to
TRUE, so you can review data drift details.
mysql> SELECT sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_test.`age`,
"workclass", census_test.`workclass`,
"fnlwgt", census_test.`fnlwgt`,
"education", census_test.`education`,
"education-num", census_test.`education-num`,
"marital-status", census_test.`marital-status`,
"occupation", census_test.`occupation`,
"relationship", census_test.`relationship`,
"race", census_test.`race`,
"sex", census_test.`sex`,
"capital-gain", census_test.`capital-gain`,
"capital-loss", census_test.`capital-loss`,
"hours-per-week", census_test.`hours-per-week`,
"native-country", census_test.`native-country`),
@census_model, JSON_OBJECT('additional_details', TRUE))FROM census_data.census_test LIMIT 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW(JSON_OBJECT(
"age", census_test.`age`,
"workclass", census_test.`workclass`,
"fnlwgt", census_test.`fnlwgt`,
"education", census_test.`education`,
"education-num", census_test.`education-num`,
"ma |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"age": 37, "sex": "Male", "race": "White", "fnlwgt": 99146, "education": "Bachelors", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "fnlwgt": 46.67, "capital-gain": 0.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.42, "<=50K": 0.58}}, "occupation": "Exec-managerial", "capital-gain": 0, "capital-loss": 1977, "relationship": "Husband", "education-num": 13, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} |
| {"age": 34, "sex": "Male", "race": "White", "fnlwgt": 27409, "education": "9th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.1, "attribution_percent": {"fnlwgt": 25.0, "education": 33.31, "workclass": 16.22}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Craft-repair", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 5, "hours-per-week": 50, "marital-status": "Married-civ-spouse", "native-country": "United-States"} |
| {"age": 30, "sex": "Female", "race": "White", "fnlwgt": 299507, "education": "Assoc-acdm", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.26, "attribution_percent": {"relationship": 21.36, "education-num": 28.33, "hours-per-week": 33.21}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.01, "<=50K": 0.99}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 12, "hours-per-week": 40, "marital-status": "Separated", "native-country": "United-States"} |
| {"age": 62, "sex": "Female", "race": "White", "fnlwgt": 102631, "education": "Some-college", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"fnlwgt": 5.93, "relationship": 26.58, "hours-per-week": 35.69}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.1, "<=50K": 0.9}}, "occupation": "Farming-fishing", "capital-gain": 0, "capital-loss": 0, "relationship": "Unmarried", "education-num": 10, "hours-per-week": 50, "marital-status": "Widowed", "native-country": "United-States"} |
| {"age": 51, "sex": "Male", "race": "White", "fnlwgt": 153486, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"sex": 7.84, "workclass": 7.84, "education-num": 83.96}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.3, "<=50K": 0.7}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 10, "hours-per-week": 40, "marital-status": "Married-civ-spouse", "native-country": "United-States"} |
| {"age": 34, "sex": "Male", "race": "Black", "fnlwgt": 434292, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 2.46, "attribution_percent": {"education": 12.36, "relationship": 22.07, "education-num": 19.92}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.33, "<=50K": 0.67}}, "occupation": "Other-service", "capital-gain": 0, "capital-loss": 0, "relationship": "Husband", "education-num": 9, "hours-per-week": 30, "marital-status": "Married-civ-spouse", "native-country": "United-States"} |
| {"age": 28, "sex": "Male", "race": "White", "fnlwgt": 240172, "education": "Masters", "workclass": "Self-emp-not-inc", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.23, "attribution_percent": {"sex": 17.41, "fnlwgt": 21.67, "workclass": 17.41}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.24, "<=50K": 0.76}}, "occupation": "Prof-specialty", "capital-gain": 0, "capital-loss": 0, "relationship": "Own-child", "education-num": 14, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} |
| {"age": 56, "sex": "Male", "race": "White", "fnlwgt": 219426, "education": "10th", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.06, "attribution_percent": {"age": 27.74, "race": 22.22, "education-num": 25.1}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.13, "<=50K": 0.87}}, "occupation": "Handlers-cleaners", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 6, "hours-per-week": 40, "marital-status": "Never-married", "native-country": "United-States"} |
| {"age": 46, "sex": "Female", "race": "White", "fnlwgt": 295791, "education": "HS-grad", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"race": 9.66, "capital-gain": 41.59, "marital-status": 38.47}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.17, "<=50K": 0.83}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 0, "relationship": "Not-in-family", "education-num": 9, "hours-per-week": 30, "marital-status": "Divorced", "native-country": "United-States"} |
| {"age": 46, "sex": "Male", "race": "White", "fnlwgt": 114032, "education": "Some-college", "workclass": "Private", "Prediction": "<=50K", "ml_results": {"drift": {"metric": 0.0, "attribution_percent": {"age": 0.0, "capital-gain": 0.0, "education-num": 100.0}}, "predictions": {"revenue": "<=50K"}, "probabilities": {">50K": 0.34, "<=50K": 0.66}}, "occupation": "Tech-support", "capital-gain": 0, "capital-loss": 1887, "relationship": "Husband", "education-num": 10, "hours-per-week": 45, "marital-status": "Married-civ-spouse", "native-country": "United-States"} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (6.8109 sec)
The following example uses a recommendation model with implicit feedback to predict the ranking for a particular user and item.
mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL);
+---------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "836", "item_id": "226"}', @model, NULL) |
+---------------------------------------------------------------------------------------+
| {"item_id": "226", "user_id": "836", "ml_results": {"predictions": {"rating": 2.46}}} |
+---------------------------------------------------------------------------------------+
1 row in set (0.1390 sec)
The following example uses a recommendation model with
explicit feedback to predict the top two items that a
particular user will like with the
users_to_items
option.
mysql> SELECT sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2));
+----------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"user_id": "846"}', @model, JSON_OBJECT("recommend", "users_to_items", "topk", 2)) |
+----------------------------------------------------------------------------------------------------------------------+
| {"user_id": "846", "ml_results": "{"predictions": {"item_id": ["313", "483"], "rating": [4.06, 4.05]}}"} |
+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.2811 sec)
The following example uses a recommendation model with explicit feedback to predict the top two items similar to another item.
mysql> SELECT sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2));
+------------------------------------------------------------------------------------------------------------------------+
| sys.ML_PREDICT_ROW('{"item_id": "524"}', @model, JSON_OBJECT("recommend", "items_to_items", "topk", 2)) |
+------------------------------------------------------------------------------------------------------------------------+
| {"item_id": "524", "ml_results": "{"predictions": {"item_id": ["665", "633"], "similarity": [1.0, 1.0]}}"} |
+------------------------------------------------------------------------------------------------------------------------+