MySQL HeatWave User Guide
Run the ML_TRAIN routine on a training dataset to produce a trained machine learning model.
Review how to Prepare Data.
ML_TRAIN supports training of the following models:
Classification: Assign items to defined categories.
Regression: Generate a prediction based on the relationship between a dependent variable and one or more independent variables.
Forecasting: Use a timeseries dataset to generate forecasting predictions.
Anomaly Detection: Detect unusual patterns in data.
Recommendation: Generate user and product recommendations.
Topic Modeling: Generate words and similar expressions that best characterize a set of documents (as of MySQL 9.0.1-u1).
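As a rough illustration of how the task type is selected, the sketch below trains a regression model rather than a classification model by changing only the task value in the options argument. The table mlcorpus.house_prices_train, the target column price, the handle house_price_test, and the session variable @house_model are hypothetical names used for illustration only. Some task types, such as forecasting, require additional options; see ML_TRAIN.

```sql
-- Hypothetical handle, table, and column names, used for illustration only.
SET @house_model = 'house_price_test';

-- Same call shape as a classification run; only the 'task' value differs.
CALL sys.ML_TRAIN('mlcorpus.house_prices_train', 'price',
                  JSON_OBJECT('task', 'regression'), @house_model);
```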
The training dataset used with ML_TRAIN must reside in a table on the DB System.
ML_TRAIN stores machine learning models in the MODEL_CATALOG table. See The Model Catalog to learn more.
Training a model can take from a few minutes to a few hours, depending on the following:
The number of rows and columns in the dataset. MySQL HeatWave AutoML supports tables up to 10 GB in size, with a maximum of 100 million rows and 1017 columns.
The specified ML_TRAIN parameters.
The size of the MySQL HeatWave Cluster.
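Before training, it can help to compare the training table against these limits. The following is a minimal sketch that uses the standard information_schema views and assumes the census_data.census_train table from the example later in this section. The size reported here is the InnoDB on-disk size, which is only an approximation of the size that MySQL HeatWave AutoML measures, and TABLE_ROWS is an estimate for InnoDB tables.

```sql
-- Approximate row count and on-disk size of the training table.
-- TABLE_ROWS is an estimate for InnoDB; run SELECT COUNT(*) for an exact figure.
SELECT TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / POWER(1024, 3), 2) AS size_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'census_data'
  AND TABLE_NAME = 'census_train';

-- Number of columns in the training table.
SELECT COUNT(*) AS column_count
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'census_data'
  AND TABLE_NAME = 'census_train';
```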
To learn more about ML_TRAIN requirements and options, see ML_TRAIN or Machine Learning Use Cases.
The quality and reliability of a trained model can be assessed using the ML_SCORE routine. For more information, see Score a Model.
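As a hedged sketch of what scoring might look like for the census example used later in this section, the calls below load the trained model and compute a balanced accuracy score. The validation table census_data.census_validate and the @score variable are assumptions for illustration; see Score a Model for the supported metrics and options.

```sql
-- Load the trained model before scoring it.
CALL sys.ML_MODEL_LOAD(@census_model, NULL);

-- census_data.census_validate is an assumed validation table that has the
-- same columns as the training table, including the target column.
CALL sys.ML_SCORE('census_data.census_validate', 'revenue', @census_model,
                  'balanced_accuracy', @score, NULL);

-- Inspect the computed score.
SELECT @score;
```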
ML_TRAIN displays the following message if a trained model has a low score: Model Has a low training score, expect low quality model explanations.
Before training a model, it is good practice to define your own model handle instead of relying on an automatically generated one. A user-defined handle is easy to remember for later routines on the trained model, so you do not have to query for it or depend on the session variable, which is no longer available after the current connection terminates. See Define Model Handle to learn more.
To train a machine learning model:
Optionally, set the value of the session variable, which sets the model handle to this same value.
mysql> SET @variable = 'model_handle';
Replace @variable and model_handle with your own definitions. For example:
mysql> SET @census_model = 'census_test';
The model handle is set to census_test.
Run the ML_TRAIN routine.
mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), @variable);
Replace table_name, target_column_name, task_name, and variable with your own values.
The following example runs ML_TRAIN on the census_data.census_train training dataset.
mysql> CALL sys.ML_TRAIN('census_data.census_train', 'revenue', JSON_OBJECT('task', 'classification'), @census_model);
Where:
census_data.census_train is the fully qualified name of the table that contains the training dataset (schema_name.table_name).
revenue is the name of the target column, which contains ground truth values.
JSON_OBJECT('task', 'classification') specifies the machine learning task type.
@census_model is the session variable set earlier, which assigns the user-defined name census_test as the model handle. If you do not define the model handle before training the model, the model handle is automatically generated, and the session variable only stores the model handle for the duration of the connection. User variables are written as @var_name. Any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.
When the training completes, query the model catalog for the model handle and the name of the training table to confirm that the model handle is correctly set. Replace user1 with your own user name.
mysql> SELECT model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG;
+--------------+--------------------------+
| model_handle | train_table_name         |
+--------------+--------------------------+
| census_test  | census_data.census_train |
+--------------+--------------------------+
1 row in set (0.0450 sec)
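When the catalog holds several models, the same check can be narrowed to a single handle. This is a minimal sketch that assumes the @census_model session variable from the earlier step is still set in the current connection, and that user1 is replaced with your own user name.

```sql
-- Return only the catalog row for the handle stored in @census_model.
SELECT model_handle, train_table_name
FROM ML_SCHEMA_user1.MODEL_CATALOG
WHERE model_handle = @census_model;
```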
When done working with a trained model, it is good practice to unload it. See Unload a Model.
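A minimal sketch of unloading the census model with the ML_MODEL_UNLOAD routine, assuming the model was loaded earlier in the session and that @census_model still holds its handle:

```sql
-- Unload the model identified by the handle stored in @census_model.
CALL sys.ML_MODEL_UNLOAD(@census_model);
```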
For details on all training options and to view more examples for task-specific models, see ML_TRAIN.
Learn how to Load a Model.