Exponential Smoothing Method

9.20 Exponential Smoothing Method

The oml.esm class uses the Exponential Smoothing Method (ESM) algorithm to create a time series model.

Exponential Smoothing Methods have been widely used in forecasting for over half a century. They have applications at the strategic, tactical, and operational levels. For example, at a strategic level, forecasting is used for projecting return on investment, growth, and the effect of innovations. At a tactical level, forecasting is used for projecting costs, inventory requirements, and customer satisfaction. At an operational level, forecasting is used for setting targets and predicting quality and conformance with standards.

In its simplest form, Exponential Smoothing is a moving average method with a single parameter that models an exponentially decreasing effect of past levels on future values. With a variety of extensions, Exponential Smoothing covers a broader class of models than other well-known approaches, such as the Box-Jenkins auto-regressive integrated moving average (ARIMA) approach. Oracle Machine Learning implements Exponential Smoothing using a state-of-the-art state space method that incorporates a single source of error (SSOE) assumption, which provides theoretical and performance advantages.
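
As a minimal sketch of the simplest form described above, the following Python function applies simple exponential smoothing in its error-correction form: each new level is the previous level plus a fraction alpha of the one-step-ahead error. The function name, its parameters, and the sample numbers are illustrative assumptions. This is not the in-database implementation.

```python
def simple_exponential_smoothing(values, alpha, initial_level):
    """Return one-step-ahead forecasts and the final level (illustrative only)."""
    level = initial_level
    forecasts = []
    for y in values:
        # The forecast for this observation is the level before seeing it.
        forecasts.append(level)
        # Single error term drives the update: level += alpha * error.
        error = y - level
        level = level + alpha * error
    return forecasts, level


forecasts, next_forecast = simple_exponential_smoothing(
    [10.0, 20.0], alpha=0.5, initial_level=10.0
)
print(forecasts, next_forecast)
```

With a small alpha, the level barely moves and old observations keep most of their weight. With alpha close to 1, the forecast tracks the most recent observation.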

Settings for an ESM model

The following table lists settings for ESM models.

Table 9-18 ESM Model Settings

Setting Name	Setting Value	Description
`EXSM_MODEL`	One of {EXSM_SIMPLE, EXSM_SIMPLE_MULT, EXSM_HOLT, EXSM_HOLT_DMP, EXSM_MUL_TRND, EXSM_MULTRD_DMP, EXSM_SEAS_ADD, EXSM_SEAS_MUL, EXSM_HW, EXSM_HW_DMP, EXSM_HW_ADDSEA, EXSM_DHW_ADDSEA, EXSM_HWMT, EXSM_HWMT_DMP}	This setting specifies the model. `EXSM_SIMPLE:` Simple exponential smoothing model. `EXSM_SIMPLE_MULT:` Simple exponential smoothing model with multiplicative error. `EXSM_HOLT:` Holt linear exponential smoothing model. `EXSM_HOLT_DMP:` Holt linear exponential smoothing model with damped trend. `EXSM_MUL_TRND:` Exponential smoothing model with multiplicative trend. `EXSM_MULTRD_DMP:` Exponential smoothing model with multiplicative damped trend. `EXSM_SEAS_ADD:` Exponential smoothing with additive seasonality, but no trend. `EXSM_SEAS_MUL:` Exponential smoothing with multiplicative seasonality, but no trend. `EXSM_HW:` Holt-Winters triple exponential smoothing model with additive trend and multiplicative seasonality. `EXSM_HW_DMP:` Holt-Winters multiplicative exponential smoothing model with damped additive trend and multiplicative seasonality. `EXSM_HW_ADDSEA:` Holt-Winters additive exponential smoothing model with additive trend and additive seasonality. `EXSM_DHW_ADDSEA:` Holt-Winters additive exponential smoothing model with damped additive trend and additive seasonality. `EXSM_HWMT:` Holt-Winters multiplicative exponential smoothing model with multiplicative trend and multiplicative seasonality. `EXSM_HWMT_DMP:` Holt-Winters multiplicative exponential smoothing model with damped multiplicative trend and multiplicative seasonality. The default value is `EXSM_SIMPLE`.
`EXSM_SEASONALITY`	Positive integer greater than 1	This setting specifies the length of the seasonal cycle as a positive integer. The value must be larger than `1`. For example, a value of `4` means that every group of four observations forms a seasonal cycle. This setting applies only to models with seasonality and must be provided for them; otherwise, the model throws an error. When `EXSM_INTERVAL` is not set, this setting applies to the original input time series. When `EXSM_INTERVAL` is set, this setting applies to the accumulated time series.
`EXSM_INTERVAL`	One of {EXSM_INTERVAL_YEAR, EXSM_INTERVAL_QTR, EXSM_INTERVAL_MONTH, EXSM_INTERVAL_WEEK, EXSM_INTERVAL_DAY, EXSM_INTERVAL_HOUR, EXSM_INTERVAL_MIN, EXSM_INTERVAL_SEC}	This setting applies only when the time column (`case_id` column) has datetime type, in which case it must be provided. It specifies the spacing interval of the accumulated, equally spaced time series. The model throws an error if the time column of the input table has datetime type and `EXSM_INTERVAL` is not provided. The model also throws an error if the time column of the input table has Oracle NUMBER type and `EXSM_INTERVAL` is provided.
`EXSM_ACCUMULATE`	One of {EXSM_ACCU_TOTAL, EXSM_ACCU_STD, EXSM_ACCU_MAX, EXSM_ACCU_MIN, EXSM_ACCU_AVG, EXSM_ACCU_MEDIAN, EXSM_ACCU_COUNT}	This setting applies only when the time column has datetime type. It specifies how to generate the value of the accumulated time series from the input time series. In the listing for Example 9-20, which does not set it, the model reports `EXSM_ACCU_TOTAL`.
`EXSM_SETMISSING`	A number or an option from the set {EXSM_MISS_MIN, EXSM_MISS_MAX, EXSM_MISS_AVG, EXSM_MISS_MEDIAN, EXSM_MISS_LAST, EXSM_MISS_FIRST, EXSM_MISS_PREV, EXSM_MISS_NEXT, EXSM_MISS_AUTO}	This setting specifies how to handle missing values, which may come from the input data, from the accumulation process of the time series, or from both. You can specify either a number or an option. If a number is specified, all missing values are set to that number. `EXSM_MISS_MIN:` Replaces missing values with the minimum of the accumulated time series. `EXSM_MISS_MAX:` Replaces missing values with the maximum of the accumulated time series. `EXSM_MISS_AVG:` Replaces missing values with the average of the accumulated time series. `EXSM_MISS_MEDIAN:` Replaces missing values with the median of the accumulated time series. `EXSM_MISS_LAST:` Replaces missing values with the last non-missing value of the accumulated time series. `EXSM_MISS_FIRST:` Replaces missing values with the first non-missing value of the accumulated time series. `EXSM_MISS_PREV:` Replaces each missing value with the previous non-missing value of the accumulated time series. `EXSM_MISS_NEXT:` Replaces each missing value with the next non-missing value of the accumulated time series. `EXSM_MISS_AUTO:` The model treats the input data as an irregular (non-uniformly spaced) time series and views missing values as gaps. If this setting is not provided, `EXSM_MISS_AUTO` is the default value.
`EXSM_PREDICTION_STEP`	A number from 1 to 30	This setting specifies how many steps ahead to predict. If it is not set, the default value is `1`, and the model gives a one-step-ahead prediction. A value greater than `30` results in an error.
`EXSM_CONFIDENCE_LEVEL`	A number between 0 and 1, exclusive	This setting specifies the desired confidence level for prediction. The lower and upper bounds of the specified confidence interval are reported. If this setting is not specified, the default confidence level is `0.95` (95%).
`EXSM_OPTIMIZATION_CRIT`	One of {EXSM_OPT_CRIT_LIK, EXSM_OPT_CRIT_MSE, EXSM_OPT_CRIT_AMSE, EXSM_OPT_CRIT_SIG, EXSM_OPT_CRIT_MAE}	This setting specifies the desired optimization criterion. The optimization criterion is useful as a diagnostic for comparing how well different models fit the same data. `EXSM_OPT_CRIT_LIK:` Minus twice the log-likelihood of a model. `EXSM_OPT_CRIT_MSE:` Mean square error of a model. `EXSM_OPT_CRIT_AMSE:` Average mean square error over a user-specified time window. `EXSM_OPT_CRIT_SIG:` Model's standard deviation of residuals. `EXSM_OPT_CRIT_MAE:` Mean absolute error of a model. The default value is `EXSM_OPT_CRIT_LIK`.
`EXSM_NMSE`	positive integer	This setting specifies the length of the window used in computing the error metric average mean square error (AMSE).
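
To make the constraints in the table concrete, here is a small client-side sketch that checks a settings dictionary before it is passed to oml.esm. The helper name `check_esm_settings` is hypothetical and is not part of OML4Py. It covers only a few of the rules stated in the table, and it assumes numeric settings are given as Python numbers.

```python
# Models that the table describes as having a seasonal component.
SEASONAL_MODELS = {
    "EXSM_SEAS_ADD", "EXSM_SEAS_MUL", "EXSM_HW", "EXSM_HW_DMP",
    "EXSM_HW_ADDSEA", "EXSM_DHW_ADDSEA", "EXSM_HWMT", "EXSM_HWMT_DMP",
}


def check_esm_settings(settings):
    """Return a list of problems found in an ESM settings dict (hypothetical helper)."""
    problems = []

    model = settings.get("EXSM_MODEL", "EXSM_SIMPLE")
    seasonality = settings.get("EXSM_SEASONALITY")
    if model in SEASONAL_MODELS and seasonality is None:
        problems.append("EXSM_SEASONALITY is required for seasonal models")
    if seasonality is not None:
        if not isinstance(seasonality, int) or seasonality <= 1:
            problems.append("EXSM_SEASONALITY must be an integer greater than 1")

    step = settings.get("EXSM_PREDICTION_STEP", 1)
    if not 1 <= step <= 30:
        problems.append("EXSM_PREDICTION_STEP must be from 1 to 30")

    level = settings.get("EXSM_CONFIDENCE_LEVEL", 0.95)
    if not 0 < level < 1:
        problems.append("EXSM_CONFIDENCE_LEVEL must be between 0 and 1, exclusive")

    return problems


quarterly = {"EXSM_MODEL": "EXSM_HW", "EXSM_SEASONALITY": 4,
             "EXSM_PREDICTION_STEP": 4}
print(check_esm_settings(quarterly))
print(check_esm_settings({"EXSM_MODEL": "EXSM_HW"}))
```

A check like this only catches mistakes early on the client. The database remains the authority on which combinations of settings are valid.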

Example 9-20 Using the oml.esm Class

This example creates an ESM model and uses some of the methods of the oml.esm class.

import oml
import pandas as pd

df = pd.DataFrame({'EVENT': ['A', 'B', 'C', 'D'],
                   'START': ['2021-10-04 13:29:00', '2021-10-07 12:30:00',
                             '2021-10-15 04:20:00', '2021-10-18 15:45:03'],
                   'END':   ['2021-10-08 11:29:06', '2021-10-15 10:30:07',
                             '2021-10-29 05:50:15', '2021-10-22 15:40:03']})

df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'])
df['DURATION'] = df['END'] - df['START']  
df['HOURS'] = df['DURATION'] / pd.Timedelta(hours=1)
df['MINUTES'] = df['DURATION'] / pd.Timedelta(minutes=1)
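# For an on-premises database, connect to the database as follows.
oml.connect("<username>","<password>", dsn="<dsn>")
dat = oml.create(df, table='DF')

# Datetime time column: START (column 1) is the time sequence, HOURS (column 4) is the target.
train_x = dat[:, 1]
train_y = dat[:, 4]

# EXSM_INTERVAL is required when the time column has datetime type.
setting = {'EXSM_INTERVAL':'EXSM_INTERVAL_DAY'}
esm_mod = oml.esm(**setting).fit(train_x, train_y, time_seq = 'START')

esm_mod

# Float time column: HOURS (column 4) is the time sequence, MINUTES (column 5) is the target.
train_x = dat[:, 4]
train_y = dat[:, 5]
esm_mod = oml.esm().fit(train_x, train_y, time_seq = 'HOURS')

esm_mod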

Listing for This Example

Create a pandas DataFrame with start and end dates for each event. Convert the start and end date columns to datetime, and create a new column that contains the timedelta between the start and end dates. Then convert the timedelta into the total number of hours and the total number of minutes.

>>> import oml
>>> import pandas as pd

>>> df = pd.DataFrame({'EVENT': ['A', 'B', 'C', 'D'],
                   'START': ['2021-10-04 13:29:00', '2021-10-07 12:30:00',
                             '2021-10-15 04:20:00', '2021-10-18 15:45:03'],
                   'END':   ['2021-10-08 11:29:06', '2021-10-15 10:30:07',
                             '2021-10-29 05:50:15', '2021-10-22 15:40:03']})

>>> df['START'] = pd.to_datetime(df['START'])
>>> df['END'] = pd.to_datetime(df['END'])
>>> df['DURATION'] = df['END'] - df['START']  
>>> df['HOURS'] = df['DURATION'] / pd.Timedelta(hours=1)
>>> df['MINUTES'] = df['DURATION'] / pd.Timedelta(minutes=1)

>>> # For an on-premises database, connect to the database as follows.
>>> oml.connect("<username>","<password>", dsn="<dsn>")
>>> dat = oml.create(df, table='DF')
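
Before fitting with a datetime time column, it can help to picture what an accumulated daily series looks like. The following pandas sketch is a rough client-side analogue of `EXSM_INTERVAL_DAY` with `EXSM_ACCU_TOTAL`, plus a forward fill that resembles `EXSM_MISS_PREV`. It is an approximation for illustration only: the variable names are assumptions, and the in-database accumulation may differ in detail.

```python
import pandas as pd

events = pd.DataFrame({
    'START': pd.to_datetime(['2021-10-04 13:29:00', '2021-10-07 12:30:00',
                             '2021-10-15 04:20:00', '2021-10-18 15:45:03']),
    'END':   pd.to_datetime(['2021-10-08 11:29:06', '2021-10-15 10:30:07',
                             '2021-10-29 05:50:15', '2021-10-22 15:40:03']),
})
events['HOURS'] = (events['END'] - events['START']) / pd.Timedelta(hours=1)

# Sum HOURS per calendar day. min_count=1 leaves days that have no
# observations as NaN, so they stay visible as gaps in the series.
daily_total = events.set_index('START')['HOURS'].resample('D').sum(min_count=1)

# Fill each gap with the previous non-missing daily value.
filled_prev = daily_total.ffill()

print(daily_total.dropna())
```

The four non-missing daily totals fall on the same dates that appear in the TIME_SEQ column of the datetime listing in this example.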


Using Datetime type

>>> train_x = dat[:, 1]
>>> train_y = dat[:, 4]

>>> setting = {'EXSM_INTERVAL':'EXSM_INTERVAL_DAY'}
>>> esm_mod = oml.esm(**setting).fit(train_x, train_y, time_seq = 'START')

>>> esm_mod

Algorithm Name: Exponential Smoothing

Mining Function: TIME_SERIES

Target: HOURS

Settings:
                    setting name               setting value
0                      ALGO_NAME  ALGO_EXPONENTIAL_SMOOTHING
1                EXSM_ACCUMULATE             EXSM_ACCU_TOTAL
2          EXSM_CONFIDENCE_LEVEL                         .95
3                  EXSM_INTERVAL           EXSM_INTERVAL_DAY
4                      EXSM_NMSE                           3
5         EXSM_OPTIMIZATION_CRIT           EXSM_OPT_CRIT_LIK
6           EXSM_PREDICTION_STEP                           1
7                EXSM_SETMISSING              EXSM_MISS_AUTO
8                    ODMS_BOXCOX          ODMS_BOXCOX_ENABLE
9                   ODMS_DETAILS                 ODMS_ENABLE
10  ODMS_MISSING_VALUE_TREATMENT     ODMS_MISSING_VALUE_AUTO
11                 ODMS_SAMPLING       ODMS_SAMPLING_DISABLE
12                     PREP_AUTO                          ON

Computed Settings:
  setting name setting value
0   EXSM_MODEL   EXSM_SIMPLE

Global Statistics:
       attribute name attribute value
0   -2 LOG-LIKELIHOOD        -21.1618
1                 AIC         48.3236
2                AICC            None
3               ALPHA     0.000100034
4          ALPHA DISC          0.9999
5                AMSE         12175.3
6                 BIC         46.4825
7           CONVERGED             YES
8       INITIAL ALPHA     0.000100034
9       INITIAL LEVEL         179.353
10                MAE          84.403
11                MSE          9843.9
12           NUM_ROWS               4
13              SIGMA         140.313
14                STD         140.313

Attributes:

Partition: NO

Prediction:

    TIME_SEQ       VALUE  PREDICTION      LOWER       UPPER
0 2021-10-04   94.001667  179.352705        NaN         NaN
1 2021-10-07  190.001944  179.344167        NaN         NaN
2 2021-10-15  337.504167  179.345233        NaN         NaN
3 2021-10-18   95.916667  179.361069        NaN         NaN
4 2021-10-19         NaN  179.352712 -95.656158  454.361582
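
The LOWER and UPPER values in the forecast row can be related to the SIGMA statistic and the confidence level. The sketch below computes a symmetric normal-approximation interval around a one-step-ahead prediction. The function name is an assumption, and the calculation is offered as a plausibility check on the listing, not as a statement of how the database computes the bounds.

```python
from statistics import NormalDist


def normal_interval(prediction, sigma, confidence_level=0.95):
    """Symmetric interval: prediction +/- z * sigma (illustrative only)."""
    # Two-sided quantile of the standard normal for the given level.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    return prediction - z * sigma, prediction + z * sigma


# PREDICTION and SIGMA are copied from the listing above.
lower, upper = normal_interval(179.352712, 140.313)
print(round(lower, 2), round(upper, 2))
```

With these inputs, the result is close to the listed bounds of about -95.66 and 454.36. Small differences are expected because SIGMA is printed with limited precision.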


Using Float type

>>> train_x = dat[:, 4]
>>> train_y = dat[:, 5]
>>> esm_mod = oml.esm().fit(train_x, train_y, time_seq = 'HOURS')

>>> esm_mod

Algorithm Name: Exponential Smoothing

Mining Function: TIME_SERIES

Target: MINUTES

Settings:
                    setting name               setting value
0                      ALGO_NAME  ALGO_EXPONENTIAL_SMOOTHING
1          EXSM_CONFIDENCE_LEVEL                         .95
2                      EXSM_NMSE                           3
3         EXSM_OPTIMIZATION_CRIT           EXSM_OPT_CRIT_LIK
4           EXSM_PREDICTION_STEP                           1
5                EXSM_SETMISSING              EXSM_MISS_AUTO
6                    ODMS_BOXCOX          ODMS_BOXCOX_ENABLE
7                   ODMS_DETAILS                 ODMS_ENABLE
8   ODMS_MISSING_VALUE_TREATMENT     ODMS_MISSING_VALUE_AUTO
9                  ODMS_SAMPLING       ODMS_SAMPLING_DISABLE
10                     PREP_AUTO                          ON

Computed Settings:
  setting name setting value
0   EXSM_MODEL     EXSM_HOLT

Global Statistics:
       attribute name attribute value
0   -2 LOG-LIKELIHOOD         4.47424
1                 AIC         1.05153
2                AICC            None
3               ALPHA     0.000104161
4                AMSE       0.0190133
5                BETA     0.000104153
6                 BIC          -2.017
7           CONVERGED             YES
8       INITIAL LEVEL         8.00977
9       INITIAL TREND        0.452033
10             LAMBDA     4.08563e-05
11                MAE         1175.53
12                MSE       0.0266914
13           NUM_ROWS               4
14              SIGMA        0.188649
15                STD        0.188649

Attributes:

Partition: NO

Prediction:

   TIME_SEQ         VALUE    PREDICTION        LOWER         UPPER
0        94   5640.100000   4807.666451          NaN           NaN
1        95   5755.000000   7554.329741          NaN           NaN
2       190  11400.116667  11869.239245          NaN           NaN
3       337  20250.250000  18649.004898          NaN           NaN
4       338           NaN  29301.840039  19894.31833  41663.104953
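
The second fit reports `EXSM_HOLT` as its computed model. As a rough illustration of what a Holt linear model tracks, the sketch below updates a level and a trend from a single one-step-ahead error. The function name and the additive error-correction parameterization are assumptions. The sketch ignores the Box-Cox transformation that the listing reports through LAMBDA, so it is not expected to reproduce the numbers above.

```python
def holt_linear(values, alpha, beta, initial_level, initial_trend):
    """Return one-step-ahead forecasts and the next forecast (illustrative only)."""
    level, trend = initial_level, initial_trend
    forecasts = []
    for y in values:
        # The forecast extends the current level by one step of trend.
        forecast = level + trend
        forecasts.append(forecast)
        # The same error term corrects both the level and the trend.
        error = y - forecast
        level = forecast + alpha * error
        trend = trend + beta * error
    return forecasts, level + trend


forecasts, next_forecast = holt_linear(
    [1.0, 2.0, 3.0], alpha=0.5, beta=0.5, initial_level=0.0, initial_trend=1.0
)
print(forecasts, next_forecast)
```

On a perfectly linear series whose starting level and trend are already correct, every error is zero and the forecast simply continues the line.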
