{ "cells": [ { "cell_type": "markdown", "id": "e2dba652", "metadata": {}, "source": [ "***\n", "# Building a Forecaster using AutoMLx\n", "
by the Oracle AutoMLx Team
\n", "\n", "***" ] }, { "cell_type": "markdown", "id": "72e85a5a", "metadata": {}, "source": [ "Forecasting Demo notebook.\n", "\n", "Copyright © 2025, Oracle and/or its affiliates.\n", "\n", "Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/" ] }, { "cell_type": "markdown", "id": "10766224", "metadata": {}, "source": [ "## Overview of this Notebook\n", "\n", "In this notebook we will build a forecaster using the Oracle AutoMLx tool for three real-world datasets. We explore the various options available in the Oracle AutoMLx Forecasting module, which let the user control the AutoML training process. We then evaluate the forecasting algorithms using built-in visualization tools. Finally, we provide an overview of the capabilities that Oracle AutoMLx offers for explaining the predictions of the tuned model.\n", "\n", "---\n", "## Prerequisites\n", "\n", " - Experience level: Novice (Python and Machine Learning)\n", " - Professional experience: Some industry experience\n", "---\n", "\n", "## Business Use\n", "\n", "Forecasting uses historical time series data as input to make informed estimates of future trends. Learning an accurate forecasting model requires expertise in data science and statistics. This process typically comprises the following steps:\n", "- Preprocess the dataset (clean, impute, engineer features, normalize).\n", "- Pick an appropriate model for the given dataset and prediction task.\n", "- Tune the chosen model’s hyperparameters for the given dataset.\n", "\n", "These steps are time-consuming and rely heavily on data science expertise. To make the problem harder, the best feature subset, model, and hyperparameter choice vary widely with the dataset and the prediction task. Hence, there is no one-size-fits-all solution for achieving reasonably good model performance. 
Using a simple Python API, AutoML can quickly jump-start the data science process with an accurately tuned model and appropriate features for a given prediction task.\n", "\n", "## Table of Contents\n", "\n", "- Setup\n", "- Load the M4 Forecasting Competition dataset\n", "- Univariate time series (single-target forecasting)\n", " - Split data into train and test for the forecasting task\n", " - Setting the execution engine\n", " - Create an instance of Oracle AutoMLx\n", " - Train a forecasting model using AutoMLx\n", " - Generate and visualize forecasts\n", " - Analyze the AutoML optimization process\n", " - Algorithm Selection\n", " - Hyperparameter Tuning\n", "- Multivariate time series\n", " - Single-target Forecasting with Exogenous Variables\n", " - Multi-target Forecasting with Exogenous Variables\n", " - Advanced AutoML Configuration\n", " - Train a model using Oracle AutoMLx\n", " - Specify the number of cross-validation (CV) folds\n", " - Make predictions\n", " - Visualization\n", "- Machine Learning Explainability (MLX)\n", " - Initialize an MLExplainer\n", " - Prediction Explanations (Comparative Feature Importance)\n", "- References" ] }, { "cell_type": "markdown", "id": "4f8cd973", "metadata": {}, "source": [ "\n", "## Setup\n", "\n", "Basic setup for the Notebook." ] }, { "cell_type": "code", "execution_count": 1, "id": "0ee2deb5", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:13:42.269625Z", "iopub.status.busy": "2025-04-25T10:13:42.269212Z", "iopub.status.idle": "2025-04-25T10:13:42.802212Z", "shell.execute_reply": "2025-04-25T10:13:42.801679Z" } }, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "bf19498b", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Load the required modules." 
] }, { "cell_type": "code", "execution_count": 2, "id": "7f98170e", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:13:42.804699Z", "iopub.status.busy": "2025-04-25T10:13:42.804220Z", "iopub.status.idle": "2025-04-25T10:13:48.150979Z", "shell.execute_reply": "2025-04-25T10:13:48.150414Z" } }, "outputs": [], "source": [ "import datetime\n", "import time\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "try:\n", "    from sktime.forecasting.model_selection import temporal_train_test_split\n", "except ImportError:\n", "    try:\n", "        # Newer sktime releases expose the same function from sktime.split\n", "        from sktime.split import temporal_train_test_split\n", "    except ImportError as e:\n", "        raise ImportError(\"Failed to import temporal_train_test_split from sktime. \"\n", "                          \"Please ensure you have the correct version of sktime installed.\") from e\n", "\n", "plt.rcParams[\"figure.figsize\"] = [15, 5]\n", "plt.rcParams[\"font.size\"] = 15\n", "\n", "import automlx\n", "from automlx import init" ] }, { "cell_type": "markdown", "id": "7909aaa6", "metadata": {}, "source": [ "\n", "### Load the M4 Forecasting Competition dataset\n", "\n", "\n", "We fetch the series from the repository of the [M4 forecasting competition](https://mofc.unic.ac.cy/m4/) to use throughout this demo." ] }, { "cell_type": "code", "execution_count": 3, "id": "e6b68f77", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:13:48.153400Z", "iopub.status.busy": "2025-04-25T10:13:48.152987Z", "iopub.status.idle": "2025-04-25T10:13:50.716740Z", "shell.execute_reply": "2025-04-25T10:13:50.716027Z" } }, "outputs": [], "source": [ "! wget https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/Train/Weekly-train.csv -q\n", "! 
wget https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/M4-info.csv -q" ] }, { "cell_type": "code", "execution_count": 4, "id": "90c3c3a0", "metadata": { "execution": { "iopub.execute_input": "2025-04-25T10:13:50.719199Z", "iopub.status.busy": "2025-04-25T10:13:50.718731Z", "iopub.status.idle": "2025-04-25T10:13:50.926656Z", "shell.execute_reply": "2025-04-25T10:13:50.926090Z" } }, "outputs": [], "source": [ "\n", "all_series = pd.read_csv(\"Weekly-train.csv\", index_col=0) # contains hundreds of weekly series\n", "metadata_csv = pd.read_csv(\"M4-info.csv\", index_col=0) # describes their datetime index" ] }, { "cell_type": "markdown", "id": "2465a691", "metadata": {}, "source": [ "\n", "# Univariate time series (single-target forecasting)\n", "The Oracle AutoMLx solution for forecasting can process both univariate (where only a single time series is available) and multivariate time series (where multiple time series are available). We start with an example for univariate time series and address multivariate data at the end of this notebook.\n", "

| | Finance_W142 |
|---|---|
| 2016-03-06 12:00:00 | 5015.306711 |
| 2016-03-13 12:00:00 | 5013.432026 |
| 2016-03-20 12:00:00 | 4998.150861 |
| 2016-03-27 12:00:00 | 5013.397612 |
| 2016-04-03 12:00:00 | 5028.193875 |
| 2016-04-10 12:00:00 | 5052.535771 |
| 2016-04-17 12:00:00 | 5066.283916 |
| 2016-04-24 12:00:00 | 5079.054044 |
| 2016-05-01 12:00:00 | 5102.910187 |
| 2016-05-08 12:00:00 | 5088.244409 |
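
The rows above are consecutive weekly observations, so a hold-out set for forecasting should be the most recent slice of the series rather than a random sample. The following is a minimal, dependency-free sketch of such a chronological split using the ten values shown above. The helper name `chronological_split` and the three-week test horizon are illustrative assumptions, not part of AutoMLx or sktime.

```python
from datetime import datetime, timedelta

# The ten weekly observations of Finance_W142 shown in the table above.
start = datetime(2016, 3, 6, 12, 0, 0)
values = [5015.306711, 5013.432026, 4998.150861, 5013.397612, 5028.193875,
          5052.535771, 5066.283916, 5079.054044, 5102.910187, 5088.244409]
series = [(start + timedelta(weeks=i), v) for i, v in enumerate(values)]


def chronological_split(observations, test_size):
    """Hypothetical helper: keep the last `test_size` observations as the test set."""
    if not 0 < test_size < len(observations):
        raise ValueError("test_size must be between 1 and len(observations) - 1")
    return observations[:-test_size], observations[-test_size:]


train, test = chronological_split(series, test_size=3)

# Every training timestamp precedes every test timestamp, so no future data leaks
# into training.
print(train[-1][0], "<", test[0][0])
```

In the notebook itself, `temporal_train_test_split` from sktime plays this role for pandas objects.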

| |
|---|
| (174, 2) |
| None |
| TimeSeriesCV(Shuffle=False, Seed=7) |
| neg_sym_mean_abs_percent_error |
| ETSForecaster |
| {'error': 'add', 'trend': 'add', 'damped_trend': True, 'seasonal': 'add', 'seasonal_periods': 52, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} |
| 25.2.1 |
| 3.9.21 (main, Dec 11 2024, 16:24:11) [GCC 11.2.0] |
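
The metric name `neg_sym_mean_abs_percent_error` in the summary above suggests a negated symmetric mean absolute percentage error (sMAPE), so values closer to zero are better. The exact formula used by AutoMLx is not shown in this notebook; the sketch below assumes one common definition, the mean of `2 * |y - y_hat| / (|y| + |y_hat|)`, and the function name `neg_smape` is hypothetical.

```python
def neg_smape(y_true, y_pred):
    """Negated sMAPE under an assumed definition: -mean(2*|y - yhat| / (|y| + |yhat|))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    terms = []
    for actual, forecast in zip(y_true, y_pred):
        denom = abs(actual) + abs(forecast)
        # Treat a 0/0 term as zero error to avoid dividing by zero.
        terms.append(0.0 if denom == 0 else 2.0 * abs(actual - forecast) / denom)
    return -sum(terms) / len(terms)


# Two toy observations: forecasts are off by 10 and 20 units respectively.
score = neg_smape([100.0, 200.0], [110.0, 180.0])
print(score)
```

Because the error is negated, a maximizing optimizer can treat it like any other score: a perfect forecast gives 0 and worse forecasts give more negative values.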

| Step | # Samples | # Features | Algorithm | Hyperparameters | Score (neg_sym_mean_abs_percent_error) | All Metrics | Runtime (Seconds) | Memory Usage (GB) | Finished |
|---|---|---|---|---|---|---|---|---|---|
| Model Selection | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | ETSForecaster | {'error': 'add', 'trend': 'add', 'damped_trend': False, 'seasonal': 'add', 'seasonal_periods': 52, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} | -0.0361 | {'neg_sym_mean_abs_percent_error': -0.03605508323686911} | 5.5388 | 0.2991 | Fri Apr 25 03:14:22 2025 |
| Model Selection | {2: 122, 1: 109, 3: 135, 4: 148, 5: 161} | 2 | STLwARIMAForecaster | {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 0, 'period': 52, 'arima_p': 2, 'arima_d': 1, 'arima_q': 2, 'arima_trend': 'n', 'concentrate_scale': True} | -0.0855 | {'neg_sym_mean_abs_percent_error': -0.08549289627140871} | 1.6676 | 0.3153 | Fri Apr 25 03:14:22 2025 |
| Model Selection | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | XGBForecaster | {'differencing_order': 1, 'acf_local_maxima': '[52, 35, 17, 26, 22, 4, 48, 30, 13, 56, 39, 61, 44, 9, 65, 1]', 'use_X': False, 'n_estimators': 50, 'max_depth': 5} | -0.0951 | {'neg_sym_mean_abs_percent_error': -0.09511310353650583} | 4.3645 | 0.3241 | Fri Apr 25 03:14:24 2025 |
| Model Selection | {2: 122, 5: 161, 1: 109, 3: 135, 4: 148} | 2 | ExtraTreesForecaster | {'differencing_order': 1, 'acf_local_maxima': '[52, 35, 17, 26, 22, 4, 48, 30, 13, 56, 39, 61, 44, 9, 65, 1]', 'use_X': False, 'n_estimators': 40, 'min_samples_leaf': 0.030458} | -0.1252 | {'neg_sym_mean_abs_percent_error': -0.12522681848759207} | 4.9313 | 0.3217 | Fri Apr 25 03:14:23 2025 |
| Model Selection | {2: 122, 3: 135, 1: 109, 4: 148, 5: 161} | 2 | LGBMForecaster | {'differencing_order': 1, 'acf_local_maxima': '[52, 35, 17, 26, 22, 4, 48, 30, 13, 56, 39, 61, 44, 9, 65, 1]', 'use_X': False, 'max_depth': 4, 'n_estimators': 37} | -0.13 | {'neg_sym_mean_abs_percent_error': -0.13003654391153155} | 5.0983 | 0.3536 | Fri Apr 25 03:14:25 2025 |
| Model Selection | {2: 122, 3: 135, 1: 109, 5: 161, 4: 148} | 2 | ThetaForecaster | {'sp': 52, 'deseasonalize': False, 'initial_level': None} | -0.2091 | {'neg_sym_mean_abs_percent_error': -0.2090874120377979} | 13.0709 | 0.3049 | Fri Apr 25 03:14:21 2025 |
| Model Selection | {1: 109, 2: 122, 4: 148, 3: 135, 5: 161} | 2 | ExpSmoothForecaster | {'trend': 'add', 'damped_trend': True, 'seasonal': None, 'sp': 52, 'use_boxcox': False} | -0.2475 | {'neg_sym_mean_abs_percent_error': -0.2474953233635102} | 10.3261 | 0.2994 | Fri Apr 25 03:14:21 2025 |
| Model Selection | {4: 148, 3: 135, 1: 109, 5: 161, 2: 122} | 2 | NaiveForecaster | {'strategy': 'last', 'sp': 52, 'window_length': None} | -0.4494 | {'neg_sym_mean_abs_percent_error': -0.4493967043524097} | 10.5111 | 0.2871 | Fri Apr 25 03:14:21 2025 |
| Model Selection | {1: 109, 2: 122, 4: 148, 3: 135, 5: 161} | 2 | STLwESForecaster | {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 1, 'period': 52, 'es_trend': 'add', 'es_damped_trend': True, 'concentrate_scale': True} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 1.9590 | 0.3116 | Fri Apr 25 03:14:22 2025 |
| Model Selection | {1: 109, 3: 135, 2: 122, 4: 148, 5: 161} | 2 | SARIMAXForecaster | {'sp': 52, 'p': 2, 'd': 1, 'q': 2, 'P': 1, 'D': 0, 'Q': 1, 'trend': 'n', 'use_X': False, 'enforce_stationarity': True, 'enforce_invertibility': True, 'method': 'lbfgs', 'disp': -1, 'concentrate_scale': False} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 19.8842 | 0.3231 | Fri Apr 25 03:14:26 2025 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Model Tuning | {2: 122, 1: 109, 3: 135, 5: 161, 4: 148} | 2 | ETSForecaster | {'error': 'add', 'trend': 'add', 'damped_trend': False, 'seasonal': None, 'seasonal_periods': 1, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} | -0.2919 | {'neg_sym_mean_abs_percent_error': -0.29186918028720626} | 0.5513 | 0.3295 | Fri Apr 25 03:14:31 2025 |
| Model Tuning | None | 0 | ETSForecaster | None | -inf | None | 0.8307 | 0.3423 | -1 |
| Model Tuning | None | 0 | STLwARIMAForecaster | None | -inf | None | 0.3418 | 0.3551 | -1 |
| Model Tuning | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | ETSForecaster | {'error': 'add', 'trend': 'add', 'damped_trend': False, 'seasonal': 'add', 'seasonal_periods': 35, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 7.1072 | 0.3547 | Fri Apr 25 03:14:32 2025 |
| Model Tuning | {2: 122, 1: 109, 4: 148, 5: 161, 3: 135} | 2 | ETSForecaster | {'error': 'add', 'trend': 'mul', 'damped_trend': False, 'seasonal': 'add', 'seasonal_periods': 52, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 6.7682 | 0.3423 | Fri Apr 25 03:14:33 2025 |
| Model Tuning | {2: 122, 1: 109, 3: 135, 4: 148, 5: 161} | 2 | ETSForecaster | {'error': 'mul', 'trend': 'add', 'damped_trend': False, 'seasonal': 'add', 'seasonal_periods': 52, 'initialization_method': 'estimated', 'initial_level': None, 'initial_trend': None, 'initial_seasonal': None, 'bounds': None, 'dates': None, 'freq': None, 'missing': 'none', 'start_params': None, 'maxiter': 1500, 'disp': -1, 'return_params': False} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 9.8725 | 0.3547 | Fri Apr 25 03:14:30 2025 |
| Model Tuning | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | STLwARIMAForecaster | {'seasonal_deg': 0, 'trend_deg': 0, 'low_pass_deg': 0, 'period': 52, 'arima_p': 0, 'arima_d': 1, 'arima_q': 0, 'arima_trend': 'n', 'concentrate_scale': True} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 0.5099 | 0.3397 | Fri Apr 25 03:14:37 2025 |
| Model Tuning | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | STLwARIMAForecaster | {'seasonal_deg': 0, 'trend_deg': 1, 'low_pass_deg': 0, 'period': 52, 'arima_p': 2, 'arima_d': 1, 'arima_q': 2, 'arima_trend': 'n', 'concentrate_scale': True} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 1.1175 | 0.3551 | Fri Apr 25 03:14:36 2025 |
| Model Tuning | {1: 109, 2: 122, 3: 135, 4: 148, 5: 161} | 2 | STLwARIMAForecaster | {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 0, 'period': 1, 'arima_p': 2, 'arima_d': 1, 'arima_q': 2, 'arima_trend': 'n', 'concentrate_scale': True} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 0.4096 | 0.3551 | Fri Apr 25 03:14:36 2025 |
| Model Tuning | {1: 109, 3: 135, 2: 122, 4: 148, 5: 161} | 2 | STLwARIMAForecaster | {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 0, 'period': 52, 'arima_p': 5, 'arima_d': 1, 'arima_q': 2, 'arima_trend': 'n', 'concentrate_scale': True} | -inf | {'neg_sym_mean_abs_percent_error': -inf} | 2.0701 | 0.3547 | Fri Apr 25 03:14:36 2025 |
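
Since the score is a negated error, higher (closer to zero) is better, and the `-inf` entries in the trials above appear to mark trials that did not produce a usable score. A small sketch of how the Model Selection rows could be ranked in plain Python follows. The scores are copied from the table above; the list-of-tuples layout is an illustrative assumption rather than an AutoMLx data structure.

```python
import math

# (algorithm, score) pairs copied from the Model Selection rows above.
trials = [
    ("ETSForecaster", -0.0361),
    ("STLwARIMAForecaster", -0.0855),
    ("XGBForecaster", -0.0951),
    ("ExtraTreesForecaster", -0.1252),
    ("LGBMForecaster", -0.13),
    ("ThetaForecaster", -0.2091),
    ("ExpSmoothForecaster", -0.2475),
    ("NaiveForecaster", -0.4494),
    ("STLwESForecaster", -math.inf),
    ("SARIMAXForecaster", -math.inf),
]

# Drop trials without a finite score, then sort best-first (higher is better).
ranked = sorted(
    (trial for trial in trials if math.isfinite(trial[1])),
    key=lambda trial: trial[1],
    reverse=True,
)

best_algorithm, best_score = ranked[0]
print(best_algorithm, best_score)
```

The top-ranked algorithm here matches the `ETSForecaster` entry reported in the training summary.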

| | W142 | W143 | W142_ci_lower | W142_ci_upper |
|---|---|---|---|---|
| 90 | 1525.364311 | 4120.705204 | 1516.139510 | 1534.589111 |
| 91 | 1532.862846 | 4110.992426 | 1519.246154 | 1546.479538 |
| 92 | 1522.772589 | 4118.850533 | 1506.048922 | 1539.496256 |
| 93 | 1520.196238 | 4119.049832 | 1501.057987 | 1539.334490 |
| 94 | 1520.707582 | 4115.975299 | 1499.581227 | 1541.833938 |
| 95 | 1514.154962 | 4119.741508 | 1491.327947 | 1536.981976 |
| 96 | 1514.379038 | 4116.655394 | 1490.057821 | 1538.700255 |
| 97 | 1509.617328 | 4118.423102 | 1483.957322 | 1535.277333 |
| 98 | 1513.422128 | 4111.623366 | 1486.544654 | 1540.299602 |
| 99 | 1505.548897 | 4116.366154 | 1477.551476 | 1533.546318 |
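
One way to read the forecast table above is to look at how wide the interval around each `W142` prediction is. The sketch below assumes that `W142_ci_lower` and `W142_ci_upper` bound the `W142` forecast at each step and computes the interval width per row; the values are copied from the table and the variable names are illustrative.

```python
# (row index, W142_ci_lower, W142_ci_upper) copied from the forecast table above.
intervals = [
    (90, 1516.139510, 1534.589111),
    (91, 1519.246154, 1546.479538),
    (92, 1506.048922, 1539.496256),
    (93, 1501.057987, 1539.334490),
    (94, 1499.581227, 1541.833938),
    (95, 1491.327947, 1536.981976),
    (96, 1490.057821, 1538.700255),
    (97, 1483.957322, 1535.277333),
    (98, 1486.544654, 1540.299602),
    (99, 1477.551476, 1533.546318),
]

# Width of the interval at each forecast step.
widths = [(step, upper - lower) for step, lower, upper in intervals]

for step, width in widths:
    print(step, round(width, 2))
```

For these ten rows the interval widens at every step, which is the usual pattern when uncertainty accumulates over the forecast horizon.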

| | Feature | Attribution |
|---|---|---|
| 0 | Yearly periodicity of Finance_W142 | 67.238872 |
| 1 | Trend of Finance_W142 | 17.761128 |