This chapter contains reference information for the theoretical models that the Analytical Engine uses.
Note: Oracle provides two different modes for the Analytical Engine:
In PE mode, the engine is suitable for use with Promotion Effectiveness.
In DP mode, the engine is suitable for use in demand planning applications.
For each model, this chapter indicates which engine modes that model can be used with.
You use the Business Modeler to apply the following flags to the causal factors; see "Configuring Global and Local Causal Factors" and "Configuring Promotional Causal Factors":
Flag* | Meaning |
---|---|
short | For use by the short models (BWINT, IREGR, LOGREGR, LOGISTIC, and REGR). These models use all causal factors that they are given. |
long | For use by the long models (ARLOGISTIC, CMREGR, ELOG, ICMREGR, and MRIDGE). These models examine all the causal factors they are given, but choose the ones that give the best results. |
non-seasonal | For use by the non-seasonal models (ARIX and ARX). The only causal factors that should be flagged as non-seasonal are ones that are not a predictable function of time. For example, price varies with time, but randomly, so price should be flagged as non-seasonal. |
multiplicative group 1, multiplicative group 2 | For use only by the DMULT model. If you are using this model, each causal factor should use one of these flags. See "DMULT". |
*Name of flag as displayed in the Causal Factors screen or in the Promotional Causal Factors screen. |
Models not listed here use other mechanisms to choose their causal factors or do not use causal factors at all.
ARIX includes integrated auto-regression terms at lag 1 and an unknown seasonal lag k, and linear causal factors.
The value of k is chosen from a set of possible seasonal indexes to produce the best fit. Causal factors include the constant and events (without seasonal causal factors and without time).
Availability
ARIX can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes* |
DP mode | Yes |
*The ARIX model is never used on promotional nodes. See "Summary of the Forecasting Process". |
Causal Factors Used by This Model
ARIX uses the non-seasonal causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
The ARIX parameters also apply to the ARX model.
ARLOGISTIC is an extension of the LOGISTIC model and includes auto-regression and logistic regression terms.
Availability
ARLOGISTIC can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | No (disable model if using this mode) |
DP mode | Yes |
Causal Factors Used by This Model
ARLOGISTIC uses the long causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
ARLOGISTIC uses the same parameters as LOGISTIC; see "LOGISTIC".
ARX includes auto-regression terms at lag 1 and an unknown seasonal lag k, and linear causal factors. The value of k is chosen from a set of possible seasonal indexes to produce the best fit. Causal factors include the constant and events (without seasonal causal factors and without time).
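The choice of the seasonal lag k can be illustrated with a minimal sketch. It assumes ordinary least squares for y(t) = c + a1*y(t-1) + ak*y(t-k), scores each candidate k by its mean squared residual, and omits event causal factors for brevity. The function names and the candidate set are hypothetical; the document does not say which fit criterion or which candidate seasonal indexes the engine uses.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: List[List[float]], y: List[float]) -> Tuple[List[float], float]:
    """Least squares through the normal equations; returns (coefficients, residual sum of squares)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    return coef, rss


def fit_arx(series: Sequence[float], seasonal_lags: Sequence[int]) -> Tuple[int, List[float]]:
    """Return (best k, [constant, lag-1 coefficient, lag-k coefficient]).

    Each candidate k must be greater than 1; k = 1 would duplicate the lag-1 term.
    """
    best = None
    for k in seasonal_lags:
        ts = range(k, len(series))
        # Regressors: constant, value one period back, value k periods back.
        rows = [[1.0, series[t - 1], series[t - k]] for t in ts]
        y = [series[t] for t in ts]
        coef, rss = ols(rows, y)
        score = rss / len(y)  # mean squared residual
        if best is None or score < best[0]:
            best = (score, k, coef)
    return best[1], best[2]


# A series with an exact period of 4: the lag-4 term should explain it fully.
k, coef = fit_arx([1.0, 5.0, 2.0, 8.0] * 6, [2, 3, 4])
print(k)  # → 4
```

The mean squared residual is used instead of the raw sum because a larger k leaves fewer usable observations.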
Availability
ARX can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes* |
DP mode | Yes |
*The ARX model is never used on promotional nodes. See "Summary of the Forecasting Process". |
Causal Factors Used by This Model
ARX uses the non-seasonal causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
ARX uses the same parameters as ARIX; see "ARIX".
BWINT (the Multiplicative Regression-Winters model) runs multiplicative regression on the causal factors, exponentially smooths the resulting residuals in the manner of HOLT, and then runs multiple regression on the smoothed residuals. BWINT models trend, seasonality, and causality.
Availability
BWINT can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | No (disable model if using this mode) |
DP mode | Yes |
Causal Factors Used by This Model
BWINT uses the short causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
CMREGR (the Markov Chain Monte Carlo model) fits to the data an assortment of linear functions of the form Series = Causals*Coeff + Resid.
Where:
Causals are various subsets of causal factors, chosen by a random process from all possible combinations of factors.
The initial pool of candidate causal factors consists of the supplied factors together with the lagged time series. For a given length, a Markov chain of that length is generated and its path is traversed. The states of the chain are subsets of factors. Neighboring states are states that differ by exactly one member; the transition probabilities between neighboring states are based on the ratio of their BICs (Bayesian Information Criteria), and the transition probabilities between non-neighboring states are zero. At each step, a factor is chosen at random. If the current model does not contain this factor, the factor joins the model, with the calculated transition probability, to form the next model. Thus, the greater the improvement in the model (as measured by BIC), the more likely the new model is to be adopted. If the current model already contains the factor, the factor leaves the model with the calculated transition probability.
A special causal factor, Lag, is also used; it is simply the original series lagged back by one time period. When the procedure finds this causal factor useful, the data has a significant autoregressive component, which indicates the presence of random trends. If Lag dominates the other factors, as indicated by a large Lag coefficient, the fit inherits the lagging effect: when plotted on the same graph as the original series, it seems to "echo" previous observations. This means that the model was unable to pick up any systematic behavior in the series, and the best it can do is to correlate the fitted values highly with the lagged data.
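The random walk over factor subsets can be sketched as follows, in the spirit of Markov chain Monte Carlo model composition. This is a minimal illustration under stated assumptions, not the engine's procedure: every model includes a constant, BIC is computed as n*ln(RSS/n) + (number of coefficients)*ln(n) for an ordinary least squares fit, and the probability of moving to a neighboring subset is taken as min(1, exp((BIC_current - BIC_proposed)/2)), one common way to turn a BIC comparison into a probability. The chain length, the seed, the function names, and the choice to report the lowest-BIC subset visited are all hypothetical.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple


def rss_of_fit(cols: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares for least squares on a constant plus the given columns."""
    rows = [[1.0] + [c[t] for c in cols] for t in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gaussian elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yt for r, yt in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= f * a[col][c]
    beta = [0.0] * p
    for k in range(p - 1, -1, -1):
        beta[k] = (a[k][p] - sum(a[k][c] * beta[c] for c in range(k + 1, p))) / a[k][k]
    return sum((yt - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yt in zip(rows, y))


def bic(cols: List[List[float]], y: List[float]) -> float:
    """BIC = n*ln(RSS/n) + (number of coefficients)*ln(n); lower is better."""
    n = len(y)
    rss = max(rss_of_fit(cols, y), 1e-12)  # guard against log(0) on an exact fit
    return n * math.log(rss / n) + (len(cols) + 1) * math.log(n)


def accept_prob(bic_current: float, bic_proposed: float) -> float:
    """Assumed rule: ratio exp(-BIC_proposed/2) / exp(-BIC_current/2), capped at 1."""
    delta = (bic_current - bic_proposed) / 2.0
    return 1.0 if delta >= 0 else math.exp(delta)


def mc3_walk(series: Sequence[float], factors: List[List[float]], steps: int,
             seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Walk over subsets of candidate factors; return the lowest-BIC subset visited.

    Indices 0..len(factors)-1 are the supplied factors; the last index is Lag.
    """
    y = list(series[1:])  # drop the first period so that Lag is defined everywhere
    candidates = [list(f[1:]) for f in factors] + [list(series[:-1])]
    rng = random.Random(seed)
    state: FrozenSet[int] = frozenset()
    current = bic([], y)
    best_state, best_bic = state, current
    for _ in range(steps):
        j = rng.randrange(len(candidates))  # choose one factor at random
        # Neighboring state: the current subset with factor j added or removed.
        proposal = state - {j} if j in state else state | {j}
        proposed = bic([candidates[i] for i in sorted(proposal)], y)
        if rng.random() < accept_prob(current, proposed):
            state, current = proposal, proposed
            if current < best_bic:
                best_state, best_bic = state, current
    return best_state, best_bic


n = 40
f0 = [float(t % 5) for t in range(n)]  # informative factor
f1 = [math.sin(t) for t in range(n)]   # irrelevant factor
series = [3.0 * f0[t] + 0.1 * math.cos(3 * t) for t in range(n)]
best_state, best_bic = mc3_walk(series, [f0, f1], steps=200, seed=1)
print(sorted(best_state))  # index 2 would denote Lag
```

Because the move probability is 1 whenever a move lowers BIC, a factor that clearly improves the fit joins as soon as it is proposed and is very unlikely to leave again.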
Availability
CMREGR can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
CMREGR uses the long causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
DMULT, the Multiplicative Multi-Seasonal Regression model, divides causal factors into two groups and combines them in a multiplicative linear function of the following form:
(sum of values in causal factor group 1) * (sum of values in causal factor group 2)
This function can be used, for example, to combine daily and monthly seasonality.
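A minimal sketch of evaluating the DMULT form for one time bucket. It reads each group's term as a weighted sum of its causal factors (coefficient times value), which is one reading of the "multiplicative linear function" above. The coefficient values, the dummy factor names M1..M12, and the function names are hypothetical, and the sketch does not show how the engine estimates the coefficients.

```python
from typing import Dict


def one_hot(prefix: str, count: int, active: int) -> Dict[str, float]:
    """Dummy causal factors such as D1..D7: 1.0 for the active index, 0.0 for the others."""
    return {f"{prefix}{i}": (1.0 if i == active else 0.0) for i in range(1, count + 1)}


def dmult_value(group1: Dict[str, float], group2: Dict[str, float],
                coef1: Dict[str, float], coef2: Dict[str, float]) -> float:
    """(weighted sum over group 1 factors) * (weighted sum over group 2 factors)."""
    term1 = sum(coef1[name] * value for name, value in group1.items())
    term2 = sum(coef2[name] * value for name, value in group2.items())
    return term1 * term2


# Hypothetical coefficients: day-of-week multipliers (group 1) and monthly levels (group 2).
day_coef = dict(zip([f"D{i}" for i in range(1, 8)], [0.75, 1.25, 1.0, 1.0, 1.0, 1.5, 0.5]))
month_coef = {f"M{i}": 100.0 + 10.0 * i for i in range(1, 13)}

# Day D2 of the week in month M3: 1.25 * 130.0
print(dmult_value(one_hot("D", 7, 2), one_hot("M", 12, 3), day_coef, month_coef))  # → 162.5
```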
Availability
DMULT can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
When you define causal factors and promotional causal factors, the Causal Factors screen and the Promotional Causal Factors screen enable you to place each factor into multiplicative group 1 or multiplicative group 2.
These options correspond to the DAILY_VAL (multiplicative group 1) and MONTHLY_VAL (multiplicative group 2) columns in the causal_factors and the promotional_causal_factors tables.
Typically, one group contains daily causal factors such as the days of the week D1, D2, ..., D7. The other group contains the remaining causal factors. Each group should include at least one causal factor, and each causal factor should be in only one group.
Parameters Used by This Model
ELOG (the Logarithmic CMREGR model) performs the CMREGR procedure on the log-transformed time series.
As with the CMREGR model, this model uses a special causal factor (Lag); see "CMREGR".
Availability
ELOG can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
ELOG uses the long causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
FCROST (the Croston Model for Intermittent Demand) is useful for intermittent demand, which can be viewed as the demand of a distributor that supplies the product to end customers. The demand planner sees the bulk demand by the distributor, while the periodic demand of retailers is unknown. Thus, the quantities most probably reflect replenishment orders rather than demand. Visually, the data consists of peaks of random height with random intervals between the peaks.
This model is useful for data involving a substantial number of zeros, and is particularly relevant for forecasting demand for slow-moving parts. The model uses the Holt procedure to forecast both quantities and inter-event times.
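The classic Croston decomposition can be sketched as follows. Note one deliberate substitution: this sketch smooths the demand sizes and the inter-demand intervals with simple exponential smoothing, the textbook form of Croston's method, whereas the engine is described as using the Holt procedure. The smoothing constant alpha and the initialization from the first nonzero demand are assumptions.

```python
from typing import Optional, Sequence


def croston(series: Sequence[float], alpha: float = 0.1) -> float:
    """Flat per-period forecast: smoothed nonzero demand size / smoothed inter-demand interval."""
    size: Optional[float] = None      # smoothed size of the nonzero demands
    interval: Optional[float] = None  # smoothed number of periods between nonzero demands
    periods_since = 1                 # periods elapsed since the previous nonzero demand
    for value in series:
        if value > 0:
            if size is None or interval is None:
                # Initialize from the first nonzero demand.
                size, interval = value, float(periods_since)
            else:
                size += alpha * (value - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None or interval is None:
        return 0.0  # no demand observed at all
    return size / interval


print(croston([0.0, 0.0, 6.0] * 3, alpha=0.5))  # → 2.0
```

Only periods with nonzero demand update the estimates, which is what keeps long runs of zeros from dragging the size estimate toward zero.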
Availability
FCROST can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | No (disable model if using this mode) |
DP mode | Yes |
Causal Factors Used by This Model
None.
Parameters Used by This Model
HOLT (the Double Exponential Smoothing model) implements the Holt damped two-parameter exponential smoothing algorithm. The forecast is a projection of the current level estimate shifted by the damped trend estimate. The level estimates are computed recursively from the data as weighted averages of the current series value and the previous one-step-ahead forecast. The trend (change of level) estimates are computed as weighted averages of the currently predicted level change and the damped previous trend estimate. The weights and the damping coefficient are either user-supplied or optimized. If parameter optimization is chosen, the parameters are set so that the MAPE (Mean Absolute Percentage Error) is minimized.
The HOLT model is suitable for modeling time series with a slowly changing linear trend. It is usually used only to model short series (for example, 52 or fewer data points for a weekly system).
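The damped Holt recursions can be sketched as follows. This is a minimal illustration, not the engine's code: it assumes the level starts at the first observation and the trend at the first difference, uses a small hypothetical grid for the weights and damping coefficient, and computes MAPE as the mean absolute percentage error of the one-step-ahead forecasts, skipping zero actuals.

```python
import itertools
from typing import List, Sequence, Tuple


def damped_holt(series: Sequence[float], alpha: float, beta: float, phi: float,
                horizon: int) -> Tuple[List[float], List[float]]:
    """Return (one-step-ahead fitted values for series[1:], forecasts for steps 1..horizon)."""
    level, trend = series[0], series[1] - series[0]  # assumed initialization
    fitted = []
    for y in series[1:]:
        one_step = level + phi * trend  # previous one-step-ahead forecast
        fitted.append(one_step)
        new_level = alpha * y + (1 - alpha) * one_step
        # Weighted average of the current level change and the damped previous trend.
        trend = beta * (new_level - level) + (1 - beta) * phi * trend
        level = new_level
    forecasts, damp_sum = [], 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return fitted, forecasts


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping periods whose actual value is zero."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("inf")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def optimize_holt(series: Sequence[float], grid=(0.1, 0.3, 0.5, 0.7, 0.9),
                  phis=(0.8, 0.9, 0.98)) -> Tuple[float, float, float, float]:
    """Grid search returning (MAPE, alpha, beta, phi) with the lowest one-step-ahead MAPE."""
    best = None
    for alpha, beta, phi in itertools.product(grid, grid, phis):
        fitted, _ = damped_holt(series, alpha, beta, phi, horizon=0)
        err = mape(series[1:], fitted)
        if best is None or err < best[0]:
            best = (err, alpha, beta, phi)
    return best


print(damped_holt([10.0, 12.0, 14.0, 16.0, 18.0], 0.5, 0.5, 1.0, 3)[1])  # → [20.0, 22.0, 24.0]
```

With phi below 1, the sum phi + phi^2 + ... converges, so long-horizon forecasts flatten out instead of extending the trend indefinitely.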
Availability
HOLT can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes* |
DP mode | Yes |
*The HOLT model is used on promotional nodes only if no other models can be used. See "Summary of the Forecasting Process". |
Causal Factors Used by This Model
None.
Parameters Used by This Model
ICMREGR (the Intermittent CMREGR model) is an extension of both the CMREGR and IREGR models.
Availability
ICMREGR can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | No (disable model if using this mode) |
DP mode | Yes |
Causal Factors Used by This Model
ICMREGR uses the long causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
IREGR (the Intermittent Regression model) addresses two limitations of the Croston model. First, the Croston model does not consider the interdependency between demand quantities and the times between demand occurrences in intermittent series. Second, because the Croston model relies on the Holt model, it does not model causality or seasonality. IREGR spreads the data into a continuous series and fits a regression model with unequal variances to it. After processing by the Bayesian blending procedure, the resulting fit and forecast may be lumped back together to form spikes.
Availability
IREGR can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | No (disable model if using this mode) |
DP mode | Yes |
Causal Factors Used by This Model
IREGR uses the short causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
Parameter | Default | Description |
---|---|---|
UseNonNegRegr | no | Specifies whether to constrain the regression coefficients to nonnegative values, within the core least squares estimation. |
AllowNegative | no | Specifies whether negative values of fit and forecast are allowed. If negative values are not allowed, then any non-positive fitted and forecasted values are set to zero. |
LOG (the Multiple Logarithmic Regression model) performs a logarithmic regression. Using logarithms is often a good way to find linear relationships in non-linear data.
This model fits to data a linear function of the form:
ln(Series + ones*Shift) = Causals*Coeff + Resid
Where:
Resid is the vector of residuals.
ones is a column vector of ones.
Shift is a calculated value to shift the series away from non-positive values, before the logarithmic transformation.
Forecast values are obtained by back-transforming the projected regression, taking into account the theoretical form of the expectation of a log-normal random variable.
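A minimal sketch of the log-regression fit and back-transformation with a single causal factor plus a constant. Two points are assumptions: Shift is set to 1 - min(Series) when the series has non-positive values (the document does not say how Shift is calculated), and the back-transformation uses the log-normal mean exp(mu + sigma²/2), with sigma² estimated from the regression residuals. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_log_regression(series: Sequence[float],
                       x: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit ln(Series + Shift) = a + b*x + Resid; return (a, b, residual variance, Shift)."""
    low = min(series)
    shift = 1.0 - low if low <= 0 else 0.0  # assumed rule: move every value to at least 1
    z = [math.log(v + shift) for v in series]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)  # x must not be constant
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    a = mz - b * mx
    resid = [zi - (a + b * xi) for xi, zi in zip(x, z)]
    s2 = sum(e * e for e in resid) / (n - 2) if n > 2 else 0.0
    return a, b, s2, shift


def forecast_log_regression(a: float, b: float, s2: float, shift: float,
                            x_new: Sequence[float]) -> List[float]:
    """Back-transform with the log-normal mean exp(mu + s2/2), then remove the shift."""
    return [math.exp(a + b * xi + s2 / 2.0) - shift for xi in x_new]


x = [float(i) for i in range(10)]
series = [math.exp(1.0 + 0.5 * xi) for xi in x]  # noise-free, so s2 is essentially 0
a, b, s2, shift = fit_log_regression(series, x)
print(forecast_log_regression(a, b, s2, shift, [12.0]))  # close to exp(7.0)
```

The s2/2 term matters on noisy data: exponentiating the fitted log mean alone would estimate the median of a log-normal variable, not its mean.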
Availability
LOG can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
LOG uses the short causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
LOGISTIC runs logistic regression on the causal factors.
Availability
LOGISTIC can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
LOGISTIC uses the short causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
The LOGISTIC parameters also apply to the ARLOGISTIC model.
The Moving Average model takes the most recent time buckets, computes their average, and uses that average as the forecast, resulting in a flat line. This forecast is generally suitable only for the near future.
This model is provided as a possible substitute for the NAIVE model, for use when all other models have failed. It does not generally interact well with other models and so is recommended only for use if no other forecast models have worked.
See "Forecast Failure", and also see "NAIVE".
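A minimal sketch of the flat moving-average forecast. The window length (how many recent buckets are averaged) is a hypothetical parameter here, since the document does not name the setting that controls it.

```python
from typing import List, Sequence


def moving_average_forecast(history: Sequence[float], window: int, horizon: int) -> List[float]:
    """Average the most recent `window` buckets and repeat that value over the horizon."""
    if window <= 0 or not history:
        raise ValueError("need a positive window and at least one historical value")
    recent = list(history[-window:])  # all of the history if it is shorter than the window
    level = sum(recent) / len(recent)
    return [level] * horizon  # a flat line


print(moving_average_forecast([5.0, 7.0, 9.0, 11.0], window=2, horizon=3))  # → [10.0, 10.0, 10.0]
```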
Availability
The Moving Average model can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes (no lift is generated, however) |
DP mode | Yes |
Causal Factors Used by This Model
None.
Parameters Used by This Model
MRIDGE (the Modified Ridge Regression model) produces regression coefficients of moderate magnitude, which ensures that lifts associated with events are of moderate size. This is equivalent to constraining the coefficients to lie within a spherical region centered at zero. In the literature, this model belongs to the shrinkage family of estimators.
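Standard ridge regression illustrates the shrinkage idea: adding a penalty lam to the diagonal of X^T X is the Lagrangian form of constraining the coefficients to a sphere centered at zero, and a larger lam gives smaller coefficients. This is plain ridge regression, not the engine's modified variant, whose details the document does not give; the penalty values and the function name are hypothetical, and there is no intercept term.

```python
from typing import List, Sequence


def ridge(rows: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Solve (X^T X + lam*I) beta = X^T y; lam = 0 gives ordinary least squares."""
    p = len(rows[0])
    # Augmented system [X^T X + lam*I | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            for c in range(col, p + 1):
                a[k][c] -= f * a[col][c]
    beta = [0.0] * p
    for k in range(p - 1, -1, -1):  # back substitution
        beta[k] = (a[k][p] - sum(a[k][c] * beta[c] for c in range(k + 1, p))) / a[k][k]
    return beta


rows, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
print(ridge(rows, y, 0.0))   # → [2.0]  (ordinary least squares)
print(ridge(rows, y, 14.0))  # → [1.0]  (shrunk toward zero)
```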
Availability
MRIDGE can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
MRIDGE uses the long causal factors; see "Flags on Causal Factors".
Parameters Used by This Model
The NAIVE model is used only at the highest forecast level, and is used only if all other models (including HOLT) have failed. See "Forecast Failure", and also see "Moving Average".
It uses a simple averaging procedure.
Availability
NAIVE can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes (no lift is generated, however) |
DP mode | Yes |
Causal Factors Used by This Model
None.
Parameters Used by This Model
Note: When a naive forecast is generated at the highest forecast level, one of two methods is used. If Holt was not attempted for this node, a simplified version of the Holt model is used, and the combination is marked with the letter T. If Holt was previously attempted, a model based on a moving average is used instead, and the node is marked with N for Naive.
REGR (the Multiple Regression model) fits to data a linear function of the form:
Series = Causals * Coeff + Resid
Where:
Causals is a matrix with the independent variables (causal factors) as its columns.
Coeff is a column vector of regression coefficients.
Resid is the vector of (additive) residuals (errors).
This additive model assumes a linear relationship: the dependent variable is linearly related to each of the independent variables.
The regression parameter estimates are obtained by the method of least squares.
Regression coefficients that are not statistically significant are identified by special tests and assigned the value 0.
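A minimal sketch of the REGR procedure: least squares on Series = Causals * Coeff + Resid, followed by setting insignificant coefficients to 0. The document does not describe its "special tests"; this sketch assumes a t-statistic test with a hypothetical cutoff of |t| < 2, and it does not refit the remaining coefficients after zeroing, which a fuller implementation might do. The names regr, invert, and t_crit are illustrative.

```python
import math
from typing import List, Sequence


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse of a small non-singular matrix."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        d = a[col][col]
        a[col] = [v / d for v in a[col]]
        for k in range(n):
            if k != col:
                f = a[k][col]
                a[k] = [v - f * w for v, w in zip(a[k], a[col])]
    return [row[n:] for row in a]


def regr(causals: Sequence[Sequence[float]], series: Sequence[float],
         t_crit: float = 2.0) -> List[float]:
    """Least squares fit of Series = Causals*Coeff + Resid; zero any coefficient with |t| < t_crit."""
    n, p = len(series), len(causals[0])  # requires n > p
    xtx = [[sum(r[i] * r[j] for r in causals) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(causals, series)) for i in range(p)]
    inv = invert(xtx)
    coeff = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y - sum(c * v for c, v in zip(coeff, r)) for r, y in zip(causals, series)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    result = []
    for i in range(p):
        se = math.sqrt(sigma2 * inv[i][i])  # standard error of coefficient i
        t = coeff[i] / se if se > 0 else math.inf
        result.append(coeff[i] if abs(t) >= t_crit else 0.0)
    return result


# Columns: constant, a trend-like factor, and a factor with only a weak effect.
c1 = [-7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0]
z = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
causals = [[1.0, a, b] for a, b in zip(c1, z)]
series = [10.0 + 0.5 * a + 0.2 * b + e for a, b, e in zip(c1, z, noise)]
coeff = regr(causals, series)
print(coeff)  # the constant and trend coefficients survive; the weak factor is set to 0.0
```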
Note: All regression-based models use REGR implicitly.
Availability
REGR can be used with the following engine modes:
Engine Mode | Supported? |
---|---|
PE mode | Yes |
DP mode | Yes |
Causal Factors Used by This Model
REGR uses the short causal factors; see "Flags on Causal Factors".
Parameters Used by This Model