MySQL HeatWave User Guide
ML_SCORE
scores a model by
generating predictions using the feature columns in a labeled
dataset as input and comparing the predictions to ground truth
values in the target column of the labeled dataset. The dataset
used with ML_SCORE
should have
the same feature columns as the dataset used to train the model
but the data should be different. For example, you might reserve
20 to 30 percent of the labeled training data for scoring.
ML_SCORE
returns a computed
metric indicating the quality of the model.
ML_SCORE
does not support the
topic modeling model type.
mysql>CALL sys.ML_SCORE(
table_name
,target_column_name
,model_handle
,metric
,score
, [options
]);options
: { JSON_OBJECT("key
","value
"[,"key
","value
"] ...)"key","value"
: { ['threshold', 'N
'] ['topk', 'N
'] ['remove_seen', {'true'|'false'}] } }
Set the following required parameters.
table_name
: Specifies the fully
qualified name of the table used to compute model quality
(database_name.table_name
). The
table must contain the same columns as the training
dataset.
target_column_name
: If scoring a
supervised or semi-supervised model, specify the name of
the target column containing ground truth values. If
scoring an unsupervised model, set to
NULL
. See
MySQL HeatWave AutoML
Learning Types.
model_handle
: Specifies the model
handle or a session variable containing the model handle.
See Work with
Model Handles.
metric
: Specifies the name of the
metric. The metric selected must be compatible with the
task
type used for training the model.
See Optimization and
Scoring Metrics.
score
: Specifies the user-defined
variable name for the computed score. The
ML_SCORE
routine populates
the variable. User variables are written as
@
.
Any valid name for a user-defined variable is permitted.
var_name
The following options in JSON
format are
available for recommendation and anomaly detection models.
Set the following options as needed for recommendation models.
threshold
: The optional threshold that
defines positive feedback, and a relevant sample. Only use
with ranking metrics. It can be used for either explicit
or implicit feedback.
topk
: The optional top number of
recommendations to provide. The default is
3
. Set a positive integer between 1 and
the number of rows in the table.
A recommendation
task and ranking
metrics can use both threshold
and
topk
.
remove_seen
: If the input table
overlaps with the training table, and
remove_seen
is true
,
then the model will not repeat existing interactions. The
default is true
. Set
remove_seen
to false
to repeat existing interactions from the training table.
Set the following options as needed for anomaly detection models.
threshold
: The threshold you set on
anomaly detection models determines which rows in the
output table are labeled as anomalies with an anomaly
score of 1
, or normal with an anomaly
score of 0
. The value for the threshold
is the degree to which a row of data or log segment is
considered for anomaly detection. Any sample with an
anomaly score above the threshold is classified an
anomaly. The default value is (1 -
contamination
)-th percentile of all the
anomaly scores.
topk
: The optional top K rows to
display with the highest anomaly scores. Set a positive
integer between 1 and the number of rows in the table. If
topk
is not set,
ML_SCORE
uses
threshold
.
Do not set both threshold
and
topk
. Use threshold
or topk
, or set
options
to
NULL
.
The following example runs generates a score by using the
balanced_accuracy
metric. Query the
score with the session variable for the
ML_SCORE
routine.
mysql>CALL sys.ML_SCORE('census_data.census_train', 'revenue', 'census_data.census_train_admin_1745439945171', 'balanced_accuracy', @score, NULL);
Query OK, 0 rows affected (3.0536 sec) mysql>SELECT @score;
+--------------------+ | @score | +--------------------+ | 0.8151071071624756 | +--------------------+ 1 row in set (0.0411 sec)
The following example uses the accuracy
metric with a threshold
set to 90%.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));
Query OK, 0 rows affected (1.86 sec) mysql>SELECT @score;
+--------------------+ | @score | +--------------------+ | 0.9791129231452942 | +--------------------+ 1 row in set (0.00 sec)
The following example uses the
precision_at_k
metric with a
topk
value of 10.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));
Query OK, 0 rows affected (5.84 sec) mysql>SELECT @score;
+---------------------+ | @score | +---------------------+ | 0.30000001192092896 | +---------------------+ 1 row in set (0.0443 sec)
The following example overrides the
ensemble_score
value from the
ML_TRAIN
routine to a new
value of 0.5.
mysql>CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted, CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON)); Query OK, 0 rows affected (1.7685 sec)
mysql>SELECT @score;
+---------------------+ | @score | +---------------------+ | 0.98844456000887996 | +---------------------+ 1 row in set (0.0443 sec)
See also: