Theoretical Engine Models

This chapter contains reference information for the theoretical models that the Analytical Engine uses.

Introduction

Note: Oracle provides two different modes for the Analytical Engine: PE mode and DP mode.

For each model, this chapter indicates which engine modes that model can be used with.

Flags on Causal Factors

You use the Business Modeler to apply the following flags to the causal factors; see "Configuring Global and Local Causal Factors" and "Configuring Promotional Causal Factors":

Flag* Meaning
short For use by the short models (BWINT, IREGR, LOGREGR, LOGISTIC, and REGR). These models use all causal factors that they are given.
long For use by the long models (ARLOGISTIC, CMREGR, ELOG, ICMREGR, and MRIDGE). These models examine all the causal factors they are given, but choose the ones that give the best results.
non-seasonal For use by the non-seasonal models (ARIX and ARX). The only causal factors that should be flagged as non-seasonal are ones that are not a predictable function of time. For example, price varies with time, but randomly, so price should be flagged as non-seasonal.
multiplicative group 1, multiplicative group 2 For use only by the DMULT model. If you are using this model, each causal factor should use one of these flags. See "DMULT".
*Name of flag as displayed in the Causal Factors screen or in the Promotional Causal Factors screen.

Models not listed here use other mechanisms to choose their causal factors or do not use causal factors at all.

ARIX

ARIX includes integrated auto-regression terms at lag 1 and an unknown seasonal lag k, and linear causal factors.

The value of k is chosen from a set of possible seasonal indexes to produce the best fit. Causal factors include the constant and events (without seasonal causal factors and without time).

Availability

ARIX can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes*
DP mode Yes
*The ARIX model is never used on promotional nodes. See "Summary of the Forecasting Process".

Causal Factors Used by This Model

ARIX uses the non-seasonal causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
Possible Season* For daily data: 2, 3, 4, 5, 6, 7, 14, 30, 31, 90, 91, 92, 182, 365
For weekly data: 2, 4, 5, 13, 14, 26, 52
For monthly data: 3, 6, 12.
A vector of possible seasonal patterns of the series.
The parameter is of type vector (other parameters are defined as double in PARAM_TYPE column), with an increasing index (PARAM_INDEX) for each new PARAM_VALUE.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

The ARIX parameters also apply to the ARX model.

ARLOGISTIC

ARLOGISTIC is an extension of the LOGISTIC model and includes auto-regression and logistic regression terms.

Availability

ARLOGISTIC can be used with the following engine modes:

Engine Mode Supported?
PE mode No (disable model if using this mode)
DP mode Yes

Causal Factors Used by This Model

ARLOGISTIC uses the long causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

ARLOGISTIC uses the same parameters as LOGISTIC; see "Parameters Used by This Model" under "LOGISTIC".

ARX

ARX includes auto-regression terms at lag 1 and an unknown seasonal lag k, and linear causal factors. The value of k is chosen from a set of possible seasonal indexes to produce the best fit. Causal factors include the constant and events (without seasonal causal factors and without time).
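The choice of the seasonal lag can be illustrated with a minimal sketch. It assumes that "best fit" means the smallest sum of squared errors, and it leaves out event causal factors. The names fit_arx and solve are hypothetical and are not part of the Analytical Engine.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_arx(series, seasons):
    """Return (k, coefficients) for the candidate seasonal lag k with the lowest SSE."""
    start = max(seasons)  # assumption: same fitting window for every candidate lag
    target = series[start:]
    best = None
    for k in seasons:
        # Regressors: constant, lag 1, and seasonal lag k.
        rows = [[1.0, series[t - 1], series[t - k]] for t in range(start, len(series))]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(3)]
        coeff = solve(xtx, xty)
        sse = sum((y - sum(c * x for c, x in zip(coeff, r))) ** 2
                  for r, y in zip(rows, target))
        if best is None or sse < best[0]:
            best = (sse, k, coeff)
    return best[1], best[2]


# Illustrative data: a four-period pattern with a small non-seasonal disturbance.
pattern = [10.0, 20.0, 15.0, 7.0]
series = [pattern[t % 4] + 0.01 * (t * t % 7) for t in range(24)]
best_k, coefficients = fit_arx(series, [2, 4, 5])
```

The candidate list plays the role of the Possible Season parameter. A candidate of 1 would duplicate the lag 1 term and make the system singular, which is consistent with the default lists starting at 2 or 3.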

Availability

ARX can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes*
DP mode Yes
*The ARX model is never used on promotional nodes. See "Summary of the Forecasting Process".

Causal Factors Used by This Model

ARX uses the non-seasonal causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

ARX uses the same parameters as ARIX; see "ARIX".

BWINT

BWINT (the Multiplicative Regression-Winters model) runs a multiplicative regression on the causal factors, exponentially smooths the resulting residuals in the manner of the HOLT model, and then runs a multiple regression on the smoothed residuals. BWINT models trend, seasonality, and causality.

Availability

BWINT can be used with the following engine modes:

Engine Mode Supported?
PE mode No (disable model if using this mode)
DP mode Yes

Causal Factors Used by This Model

BWINT uses the short causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
Alpha* 0.1 The manually set level renovation coefficient, valid only when OptimizedBwint* = 0.
Gamma* 0.3 The manually set trend renovation coefficient, valid only when OptimizedBwint* = 0.
OptimizedAlphaIter* 3 The number of values on the Alpha grid for parameter optimization.
OptimizedBwint* 0 Specifies whether the parameter values (Alpha & Gamma) of the Holt procedure used here are to be optimized (1) or preset (0).
OptimizedGammaIter* 10 The number of values on the Gamma grid for parameter optimization.
Phi* 0.9 The trend damping coefficient, always set manually.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

CMREGR

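CMREGR (the Markov Chain Monte-Carlo model) fits to data an assortment of linear functions of the form:

Series = Causals * Coeff + Resid

Where Series is the historical time series, Causals is the set of causal factor values, Coeff is the set of regression coefficients, and Resid is the set of residuals.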

The first set of causal factors consists of a collection of factors along with the lagged time series. Then, for a given chain length, a Markov chain of that length is generated and its path is traversed. The states of the chain are subsets of factors. The transition probabilities between neighboring states are based on the ratio of their BICs (Bayesian Information Criteria) and are zero for non-neighboring states. Neighboring states are states that differ by exactly one member. At each step, a factor is chosen at random. If the current model does not contain this factor, the factor joins the group, with the calculated transition probability, to form the next model. Thus, the greater the improvement in the model (as measured by BIC), the more likely the new model is to be used. If the current model already contains this factor, the factor leaves the group, with the calculated transition probability.

Also, a special causal factor Lag is used; this is merely the original series lagged back by one time period. When the procedure finds this causal factor useful for modeling, there is a significant autoregressive component in the data, which indicates the presence of random trends. If the influence of Lag is dominant over other factors, which is indicated by a large Lag coefficient, the fit will inherit the lagging effect and, when plotted on the same graph as the original series, will seem to "echo" previous observations. This means that the model was unable to pick up any systematic behavior in the series, and the best it can do is to highly correlate fitted values with lagged data.
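The chain over subsets of factors can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: the acceptance probability exp(-(BIC difference)/2) is a common textbook choice standing in for the "ratio of BICs", the Lag factor is omitted, and the names solve, bic, and subset_chain are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def bic(columns, y, subset):
    """BIC of a least-squares fit on a constant plus the chosen factor columns."""
    n = len(y)
    rows = [[1.0] + [columns[j][t] for j in subset] for t in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    coeff = solve(xtx, xty)
    sse = sum((v - sum(c * x for c, x in zip(coeff, r))) ** 2 for r, v in zip(rows, y))
    return n * math.log(max(sse, 1e-12) / n) + p * math.log(n)


def subset_chain(columns, y, chain_length=500, seed=0):
    """Walk the chain and return how often each factor was in the current model."""
    rng = random.Random(seed)  # fixed seed, so repeated runs give the same result
    current = frozenset()
    current_bic = bic(columns, y, sorted(current))
    visits = [0] * len(columns)
    for _ in range(chain_length):
        j = rng.randrange(len(columns))   # pick one factor at random
        proposal = current ^ {j}          # neighbor state: add or drop that factor
        proposal_bic = bic(columns, y, sorted(proposal))
        # Assumed acceptance rule: always accept a lower BIC, sometimes a higher one.
        accept = math.exp(min(0.0, (current_bic - proposal_bic) / 2.0))
        if rng.random() < accept:
            current, current_bic = proposal, proposal_bic
        for k in current:
            visits[k] += 1
    return [v / chain_length for v in visits]
```

In this sketch, chain_length corresponds loosely to the ChainLength parameter, and the fixed seed mirrors the effect of resetting the seed at each run. A factor that strongly improves the fit stays in almost every visited model, while a weak factor moves in and out.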

Availability

CMREGR can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

CMREGR uses the long causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
Reset_Seed* 1 Specifies whether to reset the seed for random number generation at each run or simulation. If the seed is not reset, each run produces different results, and simulation results differ from batch results. 1 = reset seed; 0 = do not reset seed.
Theoretically the model assumes that the seed is not reset.
ChainLength* 500 Number of models considered for averaging.
Need_Lag*   Specifies whether to use the Lag as a causal factor. Lag - the previous actual observation explains the next one.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
UseEnvelope no Specifies whether Demantra will use the envelope function described in "Causal Factor Testing (Envelope Function)".
ENVELOPE_RESET_SEED* 0 Specifies whether to reset the randomization seed for the envelope function, which evaluates different sets of causal factors for different engine models.
ENVELOPE_CHAIN_LENGTH* 50 Specifies the number of variations of causal factors to try, for each model.
BestOrMix* 0 Specifies whether to use the best set of causal factors (0) or to use a mix of the causal factors (1).
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

DMULT

DMULT, the Multiplicative Multi-Seasonal Regression model, divides causal factors into two groups and combines them in a multiplicative linear function of the following form:

(sum of values in causal factor group 1) * (sum of values in causal factor group 2)

This function can be used, for example, to combine daily and monthly seasonality.
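As a minimal sketch of this form, the following function evaluates the product for one time bucket. It assumes that each sum is a coefficient-weighted sum of causal factor values; the function name and the sample numbers are hypothetical.

```python
def dmult_value(group1_values, group1_coeff, group2_values, group2_coeff):
    """Evaluate (weighted sum over group 1) * (weighted sum over group 2)."""
    sum1 = sum(c * v for c, v in zip(group1_coeff, group1_values))
    sum2 = sum(c * v for c, v in zip(group2_coeff, group2_values))
    return sum1 * sum2


# Group 1: two day-of-week indicators; the second day is active.
# Group 2: a constant plus one monthly indicator; the month is active.
value = dmult_value([0.0, 1.0], [0.8, 1.2], [1.0, 1.0], [100.0, 20.0])
```

Here the day-of-week factor scales the monthly level, which is how one group can carry daily seasonality while the other carries monthly seasonality.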

Availability

DMULT can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

When you define causal factors and promotional causal factors, the Causal Factors screen and the Promotional Causal Factors screen enable you to place each factor into multiplicative group 1 or multiplicative group 2.

These options correspond to the DAILY_VAL (multiplicative group 1) and MONTHLY_VAL (multiplicative group 2) columns in the causal_factors and the promotional_causal_factors tables.

Typically, one group contains daily causal factors such as the days of the week D1,D2,..,D7. The other group contains the remaining causal factors. Each group should include at least one causal factor, and each causal factor should be in only one group.

Parameters Used by This Model

Parameter Default Description
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
MAX_ITERATIONS* 3 Specifies the maximum number of iterations used by this model. This parameter must be a whole number greater than or equal to 3. If it is less than 3, the Analytical Engine uses the value 3.
SET2_COEFF_INI* 0 Specifies the initial values for the coefficients in multiplicative group 2.
The default is 0, which means that the initial values for these coefficients are zero, except for the coefficient for the constant causal factor.
* This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

ELOG

ELOG (the Logarithmic CMREGR model) performs the CMREGR procedure on the log-transformed time series.

As with the CMREGR model, this model uses a special causal factor (Lag); see "CMREGR".

Availability

ELOG can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

ELOG uses the long causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
ChainLength* 500 Length of the generated Markov Chain, that is, the number of models considered for averaging.
need_lag*   Specifies whether to use the Lag as a causal factor. Lag - the previous actual observation explains the next one.
reset_seed* 1 Specifies whether to reset the seed for random number generation at each run or simulation. If the seed is not reset, each run produces different results, and simulation results differ from batch results.
1 = reset seed; 0 = do not reset seed.
Theoretically the model assumes that the seed is not reset.
LogCorrection 1 Specifies whether to use (1) or not (0) the correct form of the expectation of a lognormal variable.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
UseEnvelope no Specifies whether Demantra will use the envelope function described in "Causal Factor Testing (Envelope Function)".
ENVELOPE_RESET_SEED* 0 Specifies whether to reset the randomization seed for the envelope function, which evaluates different sets of causal factors for different engine models.
ENVELOPE_CHAIN_LENGTH* 50 Specifies the number of variations of causal factors to try, for each model.
BestOrMix* 0 Specifies whether to use the best set of causal factors (0) or to use a mix of the causal factors (1).
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

FCROST

FCROST (the Croston Model for Intermittent Demand) is useful for intermittent demand, which can be viewed as the demand by a distributor that supplies the product to end customers. What is visible to the demand planner is the bulk demand by the distributor, while the periodic demand of retailers is unknown. Thus, the quantities most probably reflect replenishment orders, rather than demand. Visually the data consists of peaks of random height with random intervals between the peaks.

This model is useful for data involving a substantial number of zeros, and is particularly relevant for forecasting demand of slow-moving parts. The model utilizes the Holt procedure for forecasting both quantities and inter-event times.
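The separation of quantities from inter-event times can be sketched as follows. Note the substitution: this is the classic Croston method with simple exponential smoothing, used here for brevity in place of the Holt procedure that the model applies. The function name, the argument names, and the initialization from the first event are assumptions.

```python
def croston_forecast(demand, alpha_q=0.1, alpha_t=0.1):
    """Return the expected demand per period for an intermittent series."""
    size = None        # smoothed nonzero quantity
    gap = None         # smoothed number of periods between events
    periods_since = 0
    for y in demand:
        periods_since += 1
        if y == 0:
            continue   # nothing to update in a period without demand
        if size is None:
            # Assumed initialization: take the first event as the starting estimate.
            size, gap = float(y), float(periods_since)
        else:
            size += alpha_q * (y - size)
            gap += alpha_t * (periods_since - gap)
        periods_since = 0
    if size is None:
        return 0.0     # no demand was ever observed
    return size / gap


rate = croston_forecast([0, 0, 6, 0, 0, 6, 0, 0, 6])
```

In this sketch, alpha_q and alpha_t play roles similar to AlphaQ and AlphaT. The trend and damping terms (GammaQ, GammaT, Phi, PhiQ) have no counterpart here because simple smoothing has no trend component.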

Availability

FCROST can be used with the following engine modes:

Engine Mode Supported?
PE mode No (disable model if using this mode)
DP mode Yes

Causal Factors Used by This Model

None.

Parameters Used by This Model

Parameter Default Description
AlphaQ* 0.1 Level innovation coefficient for quantities, manually set.
AlphaT* 0.1 Level innovation coefficient for inter-event times, always manually set.
GammaQ* 0.3 Trend innovation coefficient for quantities, manually set.
GammaT* 0.3 Trend innovation coefficient for inter-event times, always manually set.
OptimizedAlphaIter* 3 The number of values on the Alpha grid for parameter optimization.
OptimizedFcrost* 0 For forecasting the inter-event times only. Specifies whether the parameter values (AlphaQ & GammaQ) of the quantities-forecasting Holt procedure used here are to be optimized (1) or preset (0).
OptimizedGammaIter* 10 The number of values on the Gamma grid for parameter optimization.
Phi* 0.9 Trend damping coefficient for inter-event times, always manually set.
PhiQ* 0.9 Trend damping coefficient for quantities, always manually set.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

HOLT

HOLT (the Double Exponential Smoothing model) implements the Holt damped two-parameter exponential smoothing algorithm. The forecast is a projection of the current level estimate, shifted by the damped trend estimate. The level estimates are computed recursively from the data as weighted averages of the current series value and the previous one-step-ahead forecast. The trend (change of level) estimates are computed as weighted averages of the currently predicted level change and the damped, previously predicted trend. The weights and the damping coefficient are either user-supplied or optimized. If parameter optimization is chosen, the parameters are set so that the MAPE (Mean Absolute Percentage Error) is minimized.

The HOLT model is suitable for modeling time series with a slowly changing linear trend. It is usually used only to model short series (for example, 52 or fewer data points for a weekly system).
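The recursion described above can be sketched in its usual textbook form. The function name, the initialization of level and trend from the first two observations, and the horizon argument are assumptions; the defaults for alpha, gamma, and phi follow the Alpha, Gamma, and Phi defaults listed for this model.

```python
def damped_holt(series, alpha=0.1, gamma=0.3, phi=0.9, horizon=4):
    """Damped two-parameter exponential smoothing; returns `horizon` forecasts."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]   # assumed initialization
    for y in series[1:]:
        one_step = level + phi * trend                   # previous one-step-ahead forecast
        new_level = alpha * y + (1 - alpha) * one_step   # weighted average of data and forecast
        # Weighted average of the new level change and the damped previous trend.
        trend = gamma * (new_level - level) + (1 - gamma) * phi * trend
        level = new_level
    forecasts = []
    damping = 0.0
    for h in range(1, horizon + 1):
        damping += phi ** h          # phi + phi^2 + ... + phi^h
        forecasts.append(level + damping * trend)
    return forecasts
```

With phi below 1, each further step adds a smaller share of the trend, so the forecast flattens out instead of extrapolating the trend indefinitely. With phi equal to 1, the sketch reduces to undamped Holt smoothing.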

Availability

HOLT can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes*
DP mode Yes
*The HOLT model is used on promotional nodes only if no other models can be used. See "Summary of the Forecasting Process".

Causal Factors Used by This Model

None.

Parameters Used by This Model

Parameter Default Description
Alpha* 0.1 The manually set level renovation coefficient, valid only when OptimizedHolt* = 0.
Gamma* 0.3 The manually set trend renovation coefficient, valid only when OptimizedHolt* = 0.
OptimizedAlphaIter* 3 The number of values on the Alpha grid for parameter optimization.
OptimizedGammaIter* 10 The number of values on the Gamma grid for parameter optimization.
OptimizedHolt* 0 Specifies whether the parameter values (Alpha & Gamma) are to be optimized (1) or preset (0).
Phi* 0.9 The trend damping coefficient, always set manually.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

ICMREGR

ICMREGR (the Intermittent CMREGR model) is an extension of both the CMREGR and IREGR models.

Availability

ICMREGR can be used with the following engine modes:

Engine Mode Supported?
PE mode No (disable model if using this mode)
DP mode Yes

Causal Factors Used by This Model

ICMREGR uses the long causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
ChainLength* 500 Length of the generated Markov Chain, that is, the number of models considered for averaging.
need_lag*   Specifies whether to use the Lag as a causal factor. Lag - the previous actual observation explains the next one.
reset_seed* 1 Specifies whether to reset the seed for random number generation at each run or simulation. If the seed is not reset, each run produces different results, and simulation results differ from batch results. 1 = reset seed; 0 = do not reset seed.
Theoretically the model assumes that the seed is not reset.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

IREGR

IREGR (the Intermittent Regression model) is useful because the Croston model fails to consider the obvious interdependency between quantities and times between occurrences of demands in intermittent series. Moreover, due to the nature of the Holt model used by Croston, causalities and seasonality are not modeled. IREGR spreads the data into a continuous series and fits to it a regression model with unequal variances. The resulting fit and forecast may be lumped back to form spikes, after being processed by the Bayesian blending procedure.

Availability

IREGR can be used with the following engine modes:

Engine Mode Supported?
PE mode No (disable model if using this mode)
DP mode Yes

Causal Factors Used by This Model

IREGR uses the short causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.

LOG

LOG (the Multiple Logarithmic Regression model) performs a logarithmic regression. Using logarithms is often a good way to find linear relationships in non-linear data.

This model fits to data a linear function of the form:

ln(Series+ones*Shift) = Causals*Coeff + Resid

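Where Series is the historical time series, ones is a vector of ones, Shift is a constant offset added to every value of the series, Causals is the set of causal factor values, Coeff is the set of regression coefficients, and Resid is the set of residuals.

Forecast values are obtained by back-transforming the projected regression, while considering the theoretical form of the expectation of a log-normal random variable.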
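The back-transformation can be illustrated with a minimal sketch that uses a single causal factor plus a constant. It relies on the standard result that a log-normal variable with log-mean mu and log-variance sigma^2 has expectation exp(mu + sigma^2 / 2). The function names, the single-factor restriction, and the variance estimate with n - 2 degrees of freedom are assumptions.

```python
import math


def fit_log_regression(series, causal, shift=0.0):
    """Fit ln(series + shift) = intercept + slope * causal; return the residual variance too."""
    n = len(series)
    logs = [math.log(y + shift) for y in series]
    mean_x = sum(causal) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in causal)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(causal, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_x
    resid = [l - (intercept + slope * x) for x, l in zip(causal, logs)]
    sigma2 = sum(r * r for r in resid) / (n - 2)   # needs more than two observations
    return intercept, slope, sigma2


def log_forecast(x, intercept, slope, sigma2, shift=0.0, log_correction=True):
    """Back-transform the projected regression to the original scale."""
    mu = intercept + slope * x
    if log_correction:
        mu += sigma2 / 2.0   # expectation of a log-normal variable
    return math.exp(mu) - shift
```

The log_correction argument mirrors the intent of the LogCorrection parameter: without the correction, exp(mu) is the median of the log-normal distribution rather than its mean, so the forecast is lower whenever the residual variance is positive.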

Availability

LOG can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

LOG uses the short causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
LogCorrection 1 Specifies whether to use (1) or not (0) the correct form of the expectation of a lognormal variable.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
UseEnvelope no Specifies whether Demantra will use the envelope function described in "Causal Factor Testing (Envelope Function)".
ENVELOPE_RESET_SEED* 0 Specifies whether to reset the randomization seed for the envelope function, which evaluates different sets of causal factors for different engine models.
ENVELOPE_CHAIN_LENGTH* 50 Specifies the number of variations of causal factors to try, for each model.
BestOrMix* 0 Specifies whether to use the best set of causal factors (0) or to use a mix of the causal factors (1).
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

LOGISTIC

LOGISTIC runs logistic regression on the causal factors.

Availability

LOGISTIC can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

LOGISTIC uses the short causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

The LOGISTIC parameters also apply to the ARLOGISTIC model.

Parameter Default Description
Potential* 1.5 Specifies the upper bound of market effort effect, as a multiple of maximum historical sales.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

Moving Average

The Moving Average model considers the most recent time buckets, computes the average, and uses that for the forecast, resulting in a flat line. This forecast is generally suitable only in the near future.

This model is provided as a possible substitute for the NAIVE model, for use when all other models have failed. It does not generally interact well with other models and so is recommended only for use if no other forecast models have worked.

See "Forecast Failure", and also see "NAIVE".
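The flat-line behavior described above can be sketched in a few lines. The function and argument names are hypothetical; the buckets argument stands for the number of recent time buckets, which in the engine is taken from the NaiveEnable setting when that setting is 2 or higher.

```python
def moving_average_forecast(history, buckets, horizon):
    """Average the most recent `buckets` values and repeat that average."""
    if buckets < 2:
        raise ValueError("buckets must be 2 or higher")
    recent = history[-buckets:]
    average = sum(recent) / len(recent)
    return [average] * horizon   # the same value for every future bucket


forecast = moving_average_forecast([10, 20, 30, 40], buckets=2, horizon=3)
```

Because every future bucket receives the same value, the forecast carries no trend or seasonality, which is why it is generally suitable only for the near future.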

Availability

The Moving Average model can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes (no lift is generated, however)
DP mode Yes

Causal Factors Used by This Model

None.

Parameters Used by This Model

Parameter Default Description
NaiveEnable   Specifies what to do at the highest forecast level, upon failure of all models.
  • no (0): Do not enable either NAIVE or Moving Average models. Do not generate a forecast.
  • yes (1): Enable use of the NAIVE model.
  • 2 or higher: Enable use of the Moving Average model. In this case, the setting of NaiveEnable specifies the number of recent time buckets to use in calculating the moving average.

MRIDGE

MRIDGE (the Modified Ridge Regression model) produces regression coefficients of moderate magnitude, thus assuring that lifts associated with events are of moderate size. This is equivalent to constraining the coefficients to a spherical region centered at zero. In the literature, this model belongs to the shrinkage family.
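The shrinkage effect can be illustrated with standard ridge regression, which solves (X'X + kI) b = X'y. This is a sketch of the general technique rather than of the modified variant used by the engine; in particular, the scaling of the causal factors is omitted, and the names solve and ridge_coefficients are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_coefficients(rows, y, ridge_k):
    """Minimize the squared error plus ridge_k times the squared size of the coefficients."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge_k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
plain = ridge_coefficients(rows, y, 0.0)    # ordinary least squares
shrunk = ridge_coefficients(rows, y, 1.0)   # coefficients pulled toward zero
```

With ridge_k set to 0, the penalty vanishes and the result is the ordinary least-squares solution; larger values pull the coefficients closer to zero.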

Availability

MRIDGE can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

MRIDGE uses the long causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
RIDGEK* 1 The larger the value of RIDGEK, the more shrinkage occurs. When RIDGEK=0, the model is equivalent to REGR.
METRIC NORM* 2 Chooses the norm for scaling the input causal factors.
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
UseEnvelope no Specifies whether Demantra will use the envelope function described in "Causal Factor Testing (Envelope Function)".
BestOrMix* 0 Specifies whether to use the best set of causal factors (0) or to use a mix of the causal factors (1).
*This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.

NAIVE

The NAIVE model is used only at the highest forecast level, and is used only if all other models (including HOLT) have failed. See "Forecast Failure", and also see "Moving Average".

It uses a simple averaging procedure.

Availability

NAIVE can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes (no lift is generated, however)
DP mode Yes

Causal Factors Used by This Model

None.

Parameters Used by This Model

Parameter Default Description
NaiveEnable   Specifies what to do at the highest forecast level, upon failure of all models.
  • no (0): Do not enable either NAIVE or Moving Average models. Do not generate a forecast.
  • yes (1): Enable use of the NAIVE model.
  • 2 or higher: Enable use of the Moving Average model. In this case, the setting of NaiveEnable specifies the number of recent time buckets to use in calculating the moving average.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.

Note: When generating a naive forecast at the highest forecast level, one of two methods is used. If Holt was not attempted for this node, a simplified version of the Holt model is used and the combination is marked with the letter T. If Holt was previously attempted, a moving-average-based model is used instead and the node is marked with N for Naive.

REGR

REGR (the Multiple Regression model) fits to data a linear function of the form:

Series = Causals * Coeff + Resid

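Where Series is the historical time series, Causals is the set of causal factor values, Coeff is the set of regression coefficients, and Resid is the set of residuals.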

Using this additive model, we are assuming that a linear relationship exists. The dependent variable is linearly related to each of the independent variables.

The regression parameter estimates are obtained by the method of least squares.

Regression coefficients that are not statistically significant are identified by special tests and assigned the value 0.

Note: All regression-based models use REGR implicitly.

Availability

REGR can be used with the following engine modes:

Engine Mode Supported?
PE mode Yes
DP mode Yes

Causal Factors Used by This Model

REGR uses the short causal factors; see "Flags on Causal Factors".

Parameters Used by This Model

Parameter Default Description
UseNonNegRegr no Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation.
AllowNegative no Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero.
UseEnvelope no Specifies whether Demantra will use the envelope function described in "Causal Factor Testing (Envelope Function)".
ENVELOPE_RESET_SEED* 0 Specifies whether to reset the randomization seed for the envelope function, which evaluates different sets of causal factors for different engine models.
ENVELOPE_CHAIN_LENGTH* 50 Specifies the number of variations of causal factors to try, for each model.
BestOrMix* 0 Specifies whether to use the best set of causal factors (0) or to use a mix of the causal factors (1).
* This parameter is model-specific and is not displayed in the Business Modeler; see the Parameters table.