MySQL HeatWave User Guide
After preparing the data for topic modeling, you can train the model.
Review and complete all the tasks to Prepare Data for Topic Modeling.
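Before training, it can help to confirm that the prepared table actually contains text to model. The following is a minimal sketch, assuming the topic_modeling_data.movies table and its description column that the training example later in this topic uses. The column aliases total_rows and empty_descriptions are illustrative names only.

```sql
-- Count all rows and the rows whose text column is NULL or blank.
-- Assumes the table and column names used in the training example.
SELECT COUNT(*) AS total_rows,
       SUM(description IS NULL OR TRIM(description) = '') AS empty_descriptions
FROM topic_modeling_data.movies;
```

If empty_descriptions is close to total_rows, revisit the data preparation steps before you train the model.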
Define the following required parameters for topic modeling.
Set the task parameter to topic_modeling.
document_column: Define the column that contains the text that the model uses to generate topics and tags as output. The output is an array of word groups that best characterize the text.
When MySQL HeatWave AutoML runs topic modeling, the operation is based on a single algorithm that does not require the tuning of hyperparameters. Moreover, topic modeling is an unsupervised task, which means there are no labels. Therefore, you cannot use the following options for topic modeling:
model_list
optimization_metric
exclude_model_list
exclude_column_list
include_column_list
You cannot run the following routines for topic modeling:
Train the model with the ML_TRAIN routine and use the movies table previously created. Before training the model, it is good practice to define the model handle instead of automatically creating one. See Define Model Handle.
Optionally, set the value of the session variable, which sets the model handle to this same value.
mysql> SET @variable = 'model_handle';
Replace @variable and model_handle with your own definitions. For example:
mysql> SET @model='topic_modeling_use_case';
The model handle is set to topic_modeling_use_case.
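To confirm the value before training, you can read the variable back in the same session. This is a small sketch; the column alias model_handle is only an illustrative label.

```sql
-- Set the handle, then read it back in the same connection.
SET @model = 'topic_modeling_use_case';
SELECT @model AS model_handle;
```

User-defined variables belong to the current connection, so run this check in the same session that later calls ML_TRAIN.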
Run the ML_TRAIN routine.
mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), model_handle);
Replace table_name, target_column_name, task_name, and model_handle with your own values.
The following example runs ML_TRAIN on the dataset previously created.
mysql> CALL sys.ML_TRAIN('topic_modeling_data.movies', NULL, JSON_OBJECT('task', 'topic_modeling', 'document_column', 'description'), @model);
Where:
topic_modeling_data.movies is the fully qualified name of the table that contains the training dataset (database_name.table_name).
NULL is set for the target column because topic modeling uses unlabeled data, so you cannot set a target column.
JSON_OBJECT('task', 'topic_modeling', 'document_column', 'description') specifies the machine learning task type and sets description as the column that contains the text.
@model is the session variable previously set that defines the model handle to the name defined by the user: topic_modeling_use_case. If you do not define the model handle before training the model, the model handle is automatically generated, and the session variable only stores the model handle for the duration of the connection. User variables are written as @var_name. Any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.
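The options can also be built in a user-defined variable first, which makes them easier to inspect before training. This is a sketch only: it assumes ML_TRAIN accepts the options through a variable in the same way it accepts an inline JSON_OBJECT, and @tm_options is a hypothetical variable name.

```sql
-- @tm_options is a hypothetical name; any valid user variable works.
SET @tm_options = JSON_OBJECT(
    'task', 'topic_modeling',
    'document_column', 'description'
);

-- Inspect the options before training.
SELECT JSON_PRETTY(@tm_options);

-- Assumption: the variable is accepted in place of the inline JSON_OBJECT.
CALL sys.ML_TRAIN('topic_modeling_data.movies', NULL, @tm_options, @model);
```

Only the two required options appear here, because options such as model_list and optimization_metric do not apply to topic modeling.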
When the training operation finishes, the model handle is assigned to the @model session variable, and the model is stored in the model catalog. View the entry in the model catalog with the following query. Replace user1 with your MySQL account name.
mysql> SELECT model_id, model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG WHERE model_handle = 'topic_modeling_use_case';
+----------+-------------------------+----------------------------+
| model_id | model_handle            | train_table_name           |
+----------+-------------------------+----------------------------+
|        8 | topic_modeling_use_case | topic_modeling_data.movies |
+----------+-------------------------+----------------------------+
1 row in set (0.0449 sec)
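Because the handle is also held in the @model session variable, the same lookup can be written without repeating the literal. This is a minimal sketch that assumes you are still in the connection that ran ML_TRAIN; replace user1 with your MySQL account name as before.

```sql
-- Look up the catalog entry by the handle stored in the session variable.
SELECT model_id, model_handle, train_table_name
FROM ML_SCHEMA_user1.MODEL_CATALOG
WHERE model_handle = @model;
```

In a new connection @model is unset, so use the literal handle in the WHERE clause instead.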
Learn how to Generate Predictions for Topic Modeling.