MySQL AI User Guide
ML_SCORE
scores a model by generating predictions using the feature
columns in a labeled dataset as input and comparing the
predictions to ground truth values in the target column of the
labeled dataset. The dataset used with
ML_SCORE
should have the same feature columns as the dataset used to
train the model but the data should be different. For example,
you might reserve 20 to 30 percent of the labeled training data
for scoring.
ML_SCORE
returns a computed metric indicating the quality of the model.
mysql> CALL sys.ML_SCORE(table_name, target_column_name, model_handle, metric, score, [options]);

options: {
    JSON_OBJECT("key","value"[,"key","value"] ...)
        "key","value": {
            ['threshold', 'N']
            ['topk', 'N']
            ['remove_seen', {'true'|'false'}]
            ['item_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)]
            ['user_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)]
        }
}
Set the following required parameters.
table_name: Specifies the fully
qualified name of the table used to compute model quality
(database_name.table_name). The
table must contain the same columns as the training
dataset.
target_column_name: If scoring a
supervised or semi-supervised model, specify the name of
the target column containing ground truth values. If
scoring an unsupervised model, set to
NULL. See
AutoML
Learning Types.
model_handle: Specifies the model
handle or a session variable containing the model handle.
See Work with
Model Handles.
metric: Specifies the name of the
metric. The metric selected must be compatible with the
task type used for training the model.
See Optimization
and Scoring Metrics.
score: Specifies the user-defined
variable name for the computed score. The
ML_SCORE
routine populates the variable. User variables are written
as @var_name. Any valid name for a user-defined variable
is permitted.
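For instance, a minimal call that supplies only the required
parameters might look like the following sketch. The table name,
target column, and model handle shown here are placeholders rather
than objects defined in this guide, and options is
set to NULL.

mysql> CALL sys.ML_SCORE('my_schema.my_test_data', 'my_target', @my_model, 'accuracy', @score, NULL);
mysql> SELECT @score;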
The following options in JSON format are
available for recommendation and anomaly detection models.
Set the following options as needed for recommendation models.
threshold: The optional threshold that
defines what counts as positive feedback and, therefore,
which samples are considered relevant. Only use
with ranking metrics. It can be used for either explicit
or implicit feedback.
topk: The optional top number of
recommendations to provide. The default is
3. Set a positive integer between 1 and
the number of rows in the table.
A recommendation task that uses ranking
metrics can use both threshold and
topk.
remove_seen: If the input table
overlaps with the training table, and
remove_seen is true,
then the model will not repeat existing interactions. The
default is true. Set
remove_seen to false
to repeat existing interactions from the training table.
item_metadata: Defines the table that
has item descriptions. It is a JSON object that has the
table_name option as a key, which
specifies the table that has item descriptions. One column
must be the same as the item_id in the
input table.
user_metadata: Defines the table that
has user descriptions. It is a JSON object that has the
table_name option as a key, which
specifies the table that has user descriptions. One column
must be the same as the user_id in the
input table.
table_name: To be used with the
item_metadata and
user_metadata options. It specifies
the table name that has item or user descriptions. It
must be a string in a fully qualified format
(schema_name.table_name) that specifies the table
name.
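As an illustrative sketch, a recommendation model could be scored
with a ranking metric such as precision_at_k
while combining the threshold,
topk, remove_seen, and
item_metadata options. The table names and model
handle below are assumptions for illustration only, not objects
defined elsewhere in this guide.

mysql> CALL sys.ML_SCORE('rec_data.ratings_test', 'rating', @rec_model,
           'precision_at_k', @rec_score,
           JSON_OBJECT('threshold', 4,
                       'topk', 5,
                       'remove_seen', 'false',
                       'item_metadata', JSON_OBJECT('table_name', 'rec_data.item_details')));
mysql> SELECT @rec_score;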
Set the following options as needed for anomaly detection models.
threshold: The threshold you set on
anomaly detection models determines which rows in the
output table are labeled as anomalies, with an anomaly
score of 1, or as normal, with an anomaly
score of 0. The threshold value represents
the degree to which a row of data or log segment is
considered anomalous. Any sample with an
anomaly score above the threshold is classified as an
anomaly. The default value is the (1 -
contamination)-th percentile of all the
anomaly scores.
topk: The optional top K rows to
display with the highest anomaly scores. Set a positive
integer between 1 and the number of rows in the table. If
topk is not set,
ML_SCORE
uses threshold.
Do not set both threshold and
topk. Use threshold
or topk, or set
options to
NULL.
The following example generates a score by using the
balanced_accuracy metric. Query the
score with the session variable passed to the
ML_SCORE
routine.
mysql> CALL sys.ML_SCORE('census_data.census_train', 'revenue', 'census_data.census_train_admin_1745439945171', 'balanced_accuracy', @score, NULL);
Query OK, 0 rows affected (3.0536 sec)

mysql> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.8151071071624756 |
+--------------------+
1 row in set (0.0411 sec)
The following example uses the accuracy
metric with a threshold set to 90%.
mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
Query OK, 0 rows affected (1.86 sec)

mysql> SELECT @score;
+--------------------+
| @score             |
+--------------------+
| 0.9791129231452942 |
+--------------------+
1 row in set (0.00 sec)
The following example uses the
precision_at_k metric with a
topk value of 10.
mysql> CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
Query OK, 0 rows affected (5.84 sec)

mysql> SELECT @score;
+---------------------+
| @score              |
+---------------------+
| 0                   |
+---------------------+
1 row in set (0.00 sec)
The following example overrides the
supervised_submodel_weight value used for the
ensemble score by the
ML_TRAIN
routine with a new value of 0.5.
mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", @semsup_gknn,
'precision_at_k', @semsup_score_gknn_weighted,
CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));