MySQL AI User Guide
ML_SCORE
scores a model by generating predictions using the feature
columns in a labeled dataset as input and comparing the
predictions to ground truth values in the target column of the
labeled dataset. The dataset used with
ML_SCORE
should have the same feature columns as the dataset used to
train the model but the data should be different. For example,
you might reserve 20 to 30 percent of the labeled training data
for scoring.
ML_SCORE
returns a computed metric indicating the quality of the model.
mysql>CALL sys.ML_SCORE(
table_name
,target_column_name
,model_handle
,metric
,score
, [options
]);options
: { JSON_OBJECT("key
","value
"[,"key
","value
"] ...)"key","value"
: { ['threshold', 'N
'] ['topk', 'N
'] ['remove_seen', {'true'|'false'}] } }
Set the following required parameters.
table_name
: Specifies the fully
qualified name of the table used to compute model quality
(database_name.table_name
). The
table must contain the same columns as the training
dataset.
target_column_name
: If scoring a
supervised or semi-supervised model, specify the name of
the target column containing ground truth values. If
scoring an unsupervised model, set to
NULL
. See
MySQL HeatWave AutoML
Learning Types.
model_handle
: Specifies the model
handle or a session variable containing the model handle.
See Work with
Model Handles.
metric
: Specifies the name of the
metric. The metric selected must be compatible with the
task
type used for training the model.
See Optimization
and Scoring Metrics.
score
: Specifies the user-defined
variable name for the computed score. The
ML_SCORE
routine populates the variable. User variables are written
as
@
.
Any valid name for a user-defined variable is permitted.
var_name
The following options in JSON
format are
available for recommendation and anomaly detection models.
Set the following options as needed for recommendation models.
threshold
: The optional threshold that
defines positive feedback, and a relevant sample. Only use
with ranking metrics. It can be used for either explicit
or implicit feedback.
topk
: The optional top number of
recommendations to provide. The default is
3
. Set a positive integer between 1 and
the number of rows in the table.
A recommendation
task and ranking
metrics can use both threshold
and
topk
.
remove_seen
: If the input table
overlaps with the training table, and
remove_seen
is true
,
then the model will not repeat existing interactions. The
default is true
. Set
remove_seen
to false
to repeat existing interactions from the training table.
Set the following options as needed for anomaly detection models.
threshold
: The threshold you set on
anomaly detection models determines which rows in the
output table are labeled as anomalies with an anomaly
score of 1
, or normal with an anomaly
score of 0
. The value for the threshold
is the degree to which a row of data or log segment is
considered for anomaly detection. Any sample with an
anomaly score above the threshold is classified an
anomaly. The default value is (1 -
contamination
)-th percentile of all the
anomaly scores.
topk
: The optional top K rows to
display with the highest anomaly scores. Set a positive
integer between 1 and the number of rows in the table. If
topk
is not set,
ML_SCORE
uses threshold
.
Do not set both threshold
and
topk
. Use threshold
or topk
, or set
options
to
NULL
.
The following example runs generates a score by using the
balanced_accuracy
metric. Query the
score with the session variable for the
ML_SCORE
routine.
mysql>CALL sys.ML_SCORE('census_data.census_train', 'revenue', 'census_data.census_train_admin_1745439945171', 'balanced_accuracy', @score, NULL);
Query OK, 0 rows affected (3.0536 sec) mysql>SELECT @score;
+--------------------+ | @score | +--------------------+ | 0.8151071071624756 | +--------------------+ 1 row in set (0.0411 sec)
The following example uses the accuracy
metric with a threshold
set to 90%.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
Query OK, 0 rows affected (1.86 sec) mysql>SELECT @score;
+--------------------+ | @score | +--------------------+ | 0.9791129231452942 | +--------------------+ 1 row in set (0.00 sec)
The following example uses the
precision_at_k
metric with a
topk
value of 10.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
Query OK, 0 rows affected (5.84 sec) mysql>SELECT @score;
+---------------------+ | @score | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec)
The following example overrides the
ensemble_score
value from the
ML_TRAIN
routine to a new value of 0.5.
mysql> CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted,
CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON));
See also: