MySQL HeatWave User Guide

6.7.3.4 Train a Forecasting Model

After preparing the data for a forecasting model, you can train the model.

This topic has the following sections.

Before You Begin
Requirements for Forecasting Training

Define the following required parameters to train a forecasting model.

  • Set the task parameter to forecasting.

  • datetime_index: Define the column that has date and time data. The model uses this column as an index for the forecast variable. The following data types for this column are supported: DATETIME, TIMESTAMP, DATE, TIME, and YEAR, or an auto-incrementing index.

    The forecast models SARIMAXForecaster, VARMAXForecaster, and DynFactorForecaster cannot back test, that is forecast into training data, when using exogenous_variables. Therefore, the predict table must not overlap the datetime_index with the training table. The start date in the predict table must be a date immediately following the last date in the training table when exogenous_variables are used. For example, the predict table has to start with year 2024 if the training table with YEAR data type datetime_index ends with year 2023. The predict table cannot start with year, for example, 2025 or 2030, because that would miss out 1 and 6 years, respectively.

    When options do not include exogenous_variables , the predict table can overlap the datetime_index with the training table. This supports back testing, with the exception of the following models: SARIMAXForecaster, VARMAXForecaster, and DynFactorForecaster.

    The valid range of years for datetime_index dates must be between 1678 and 2261. An error generates if any part of the training table or predict table has dates outside this range. The last date in the training table plus the predict table length must still be inside the valid year range. For example, if the datetime_index in the training table has YEAR data type, and the last date is year 2023, the predict table length must be less than 238 rows: 2261 minus 2023 equals 238 rows.

  • endogenous_variables: Define the column or columns to be forecast. One of these columns must also be specified as the target_column_name.

Forecasting Options

Based on the type of forecasting model you train, set the appropriate JSON options:

  • exogenous_variables: Define the column or columns that have independent, non-forecast, predictive variables. These optional variables are not forecast, but help to predict the future values of the forecast variables. These variables affect a model without being affected by it. For example, for sales forecasting these variables might be advertising expenditure, occurrence of promotional events, weather, or holidays. Review Forecasting Models to see which models support exogenous variables.

  • model_list: Set the type of forecasting model algorithm. See Forecasting Models to review supported algorithms.

  • include_column_list: Define the columns of exogenous_variables that must be included for training and should not be dropped.

Unsupported Routines

You cannot run the following routines for a trained forecasting model:

Train Model

After following the steps to Prepare Data for Forecasting Model, train the model with the ML_TRAIN routine and use the electricity_demand_training table previously created. Before training the model, it is good practice to define the model handle instead of automatically creating one. See Define Model Handle.

  1. Optionally, set the value of the session variable, which sets the model handle to this same value.

    mysql> SET @variable = 'model_handle';
    

    Replace @variable and model_handle with your own definitions. For example:

    mysql> SET @model='forecasting_use_case';
    

    The model handle is set to forecasting_use_case.

  2. Run the ML_TRAIN routine.

    mysql> CALL sys.ML_TRAIN('table_name', 'target_column_name', JSON_OBJECT('task', 'task_name'), model_handle);
    

    Replace table_name, target_column_name, task_name, and model_handle with your own values.

    The following example runs ML_TRAIN on the training dataset previously created.

    mysql> CALL sys.ML_TRAIN('forecasting_data.electricity_demand_train', 'demand', 
           JSON_OBJECT('task', 'forecasting', 'datetime_index', 'date', 'endogenous_variables', JSON_ARRAY('demand')), @model);
    

    Where:

    • forecasting_data.electricity_demand_train is the fully qualified name of the table that contains the training dataset (database_name.table_name).

    • demand is the name of the target column, which contains ground truth values.

    • The JSON_OBJECT defines the following:

      • 'task', 'forecasting' specifies the machine learning task type.

      • 'datetime_index', 'date' defines the date column as the one with data and time data.

      • 'endogenous_variables', JSON_ARRAY('demand') defines the endogenous variables in a JSON_ARRAY. Since it is a univariate model, the only endogenous variable is demand.

    • @model is the session variable previously set that defines the model handle to the name defined by the user: forecasting_use_case. If you do not define the model handle before training the model, the model handle is automatically generated, and the session variable only stores the model handle for the duration of the connection. User variables are written as @var_name. Any valid name for a user-defined variable is permitted. See Work with Model Handles to learn more.

  3. When the training operation finishes, the model handle is assigned to the @model session variable, and the model is stored in the model catalog. View the entry in the model catalog with the following query. Replace user1 with your MySQL account name.

    mysql> SELECT model_id, model_handle, train_table_name FROM ML_SCHEMA_user1.MODEL_CATALOG WHERE model_handle = 'forecasting_use_case';
    +----------+----------------------------------------------+--------------------------------------------+
    | model_id | model_handle                                 | train_table_name                           |
    +----------+----------------------------------------------+--------------------------------------------+
    |        3 | forecasting_use_case                         | forecasting_data.electricity_demand_train  |
    +----------+----------------------------------------------+--------------------------------------------+
    
What's Next