Oracle® Retail Demand Forecasting Cloud Service User Guide Release 19.0 F24922-17
This chapter describes the functionality of the Data Cleansing for Seasonality Estimation task, which includes:
Preprocessing Administration, see About Preprocessing Administration
Source Measure Maintenance, see About Source Measure Maintenance
The following table lists the workspaces, steps, and views for the Data Cleansing for Seasonality Estimation task.
This section describes how the preprocessing functionality is implemented in RDF using the Preprocess Administration Workspace.
Use the Preprocess Administration Workspace to perform this step:
Note: The Source Measure Maintenance Workspace is similar to the Preprocess Administration Workspace. For functionality differences, see Source Measure Maintenance Functionality.
About Preprocessing
Preprocessing is a module used to correct historical data prior to forecast generation, when history does not represent general demand patterns. It automatically adjusts the raw point-of-sale (POS) data so that subsequent demand forecasts do not replicate undesired patterns.
Data Preprocessing is commonly used to:
Correct for lost sales due to stock-outs
Cleanse data for effects of promotions and short-term price changes (optional)
Correct for outliers – unusually high or low values introduced by human error or special events (for example, a hurricane that left a store closed for a week)
Manually scrub data to simulate history and override loaded history
Adjust demand for the occasional 53rd calendar week
Manage demand created during events and holidays that do not occur in the same period every year, for example, Back to School.
Preprocessing runs after the data has been loaded from the host system and prior to forecast generation. Use the Preprocess Administration Workspace to select the techniques used to transform sales to unconstrained demand. It is common for an environment to require multiple preprocessing runs to properly smooth the history; typically, three or four runs are needed to go from raw sales to the data source that is used to generate the forecast. For RDF, a maximum of six runs is allowed for one data source. For example, if there is one baseline and one causal level, up to six preprocessing runs are allowed to create the data source for the baseline forecast, and up to six runs to create the data source for the causal forecast.
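The multi-run flow described above can be sketched as follows. This is a minimal illustration, not actual RDF syntax: the correction functions and measure values are hypothetical stand-ins for the configured preprocessing methods.

```python
# Minimal sketch of chained preprocessing runs: each run's output feeds the
# next run, and at most six runs are applied per data source.
# The correction functions below are illustrative, not real RDF methods.
def run_preprocessing(history, runs):
    """Apply each configured run in order to the sales history."""
    data = list(history)
    for correct in runs[:6]:  # RDF allows at most six runs per data source
        data = correct(data)
    return data

# Toy runs: clip outlier spikes, then floor zero-sales weeks.
clip_outliers = lambda s: [min(v, 100) for v in s]
fill_zeros = lambda s: [v if v > 0 else 1 for v in s]

cleansed = run_preprocessing([0, 250, 40, 0, 60], [clip_outliers, fill_zeros])
# cleansed is now [1, 100, 40, 1, 60]
```

In practice each run would be one of the configured methods (stockout correction, outlier correction, de-promotion, smoothing), and the output of the final run becomes the forecasting data source.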
Preprocessing Data in the RDF Workflow
Preprocessing offers a variety of algorithm methods to support the business requirements. The main reason for preprocessing is to transform the raw sales data into a measure that gets as close as possible to unconstrained demand.
The preprocessing step is most often implemented in batch, by invoking a preprocessing special expression. The special expression takes several measures as input: for instance, the measure to be corrected, the desired algorithm, and the number of periods to be considered. You can also go into more detail and specify several filter window lengths or exponential smoothing parameters.
To build the Preprocess Administration workspace, perform these steps:
From the left sidebar menu, click the Task Module to view the available tasks.
Click the Estimate Historical Demand activity and then click Seasonality to access the available workspaces.
Click Preprocess Administration. The Preprocess Administration wizard opens.
You can open an existing workspace, but to create a new workspace, click Create New Workspace.
Enter a name for your new workspace in the label text box and click OK.
The Workspace wizard opens. Select the locations you want to work with and click Next.
Select the products you want to work with and click Finish.
The wizard notifies you that your workspace is being prepared. Successful workspaces are available from the Dashboard.
The available views are:
These views make available the preprocessing parameters for four rounds of preprocessing runs, necessary to calculate the data source for baseline forecasting and promotional forecasting.
In RDF, preprocessing is configured to create the data sources for both baseline forecasting and causal forecasting. The creation of each source can go through at most six runs of preprocessing. For example, the Causal Data Source is configured for three runs, and the baseline data source is configured for four runs.
The Preprocess Admin Promo view allows you to define the scope of the preprocessing run, as well as filter out item/locations where preprocessing does not make sense due to insufficient historical sales.
The Preprocess Admin Seasonality view contains the following measures:
Min Number of Weeks in System
This parameter defines the minimum number of periods since an item was introduced in the system. Usually the introduction time is considered to be the date when the item first sold. This check prevents data corrections for items that are very new, where cleansing would be unreliable.
Min Number of Weeks with Sales
This parameter defines the number of weeks with sales that an item/store combination needs to have to qualify for data cleansing. The reasoning behind this check is that for items without enough data, corrections may not be reliable. Once there is enough data, and trends become clearer, corrections can be made.
There are two views with the same set of measures, where you can enter values for parameters specific to some of the preprocessing methods available in the special expression to create the data sources for the baseline and causal forecasts. There is a set of parameters for each of the maximum six runs allowed; however, it is likely that not all six are configured. For example, when generating the Causal Data Source, three runs are configured. The parameters are entered at the class/store intersection.
The Preprocess Method Parameters for Seasonality view contains the following measures:
Alpha
Exponential smoothing coefficient used to calculate past and future velocities.
Future Weeks
The maximum number of data points used to calculate the future velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Last Date
This represents the end date of the preprocessing window; it is typically today's date, but can be any date in the past.
Past Weeks
The maximum number of data points used to calculate the past velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Preprocessing Window
Number of historical data points that are preprocessed.
Short Event Max Length
This measure is used by a newer method of de-promoting sales; the legacy approach is the Standard ES method. If the promo lift is non-zero and the promotion length is less than or equal to Short Event Max Length, Standard Exponential Smoothing is performed on the input.
If the new method is specified and the promotion window is longer than the value stored in this measure, the input data is divided by the promo lift value to remove the promotional lift.
Standard Median Window
Filter window length for the Standard Median preprocessing method.
Partial Outage
A scalar parameter indicating if the period immediately following an out-of-stock period should be adjusted. The default behavior is for the flag to be True.
Stop at Event
This parameter determines which periods are included in the calculation of past/future velocities.
If the flag is set to True, then the algorithm only includes periods before the first event flag or event indicator.
If the flag is False, then all available, non-flagged periods, within the windows defined by Past Weeks and Future Weeks, are used in the calculation of the past and future velocities.
The default setting for the flag is False.
Window#
Additionally, there are five measures, Window1 through Window5. These measures define the lengths of the five Standard Median filter windows that are run as part of the Retail Median preprocessing method.
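To make the Alpha, Past Weeks, and Future Weeks parameters concrete, here is a hedged sketch of how past and future velocities around an outage window might be computed with exponential smoothing. The function names and the exact weighting are illustrative assumptions; the actual RDF algorithm is not reproduced here.

```python
def velocity(points, alpha):
    """Exponentially smoothed average of a run of sales values (Alpha).
    Assumes at least one data point is supplied."""
    level = points[0]
    for p in points[1:]:
        level = alpha * p + (1 - alpha) * level
    return level

def past_future_velocity(sales, outage_start, outage_end, alpha,
                         past_weeks, future_weeks):
    """Estimate demand rates just before and just after an outage window,
    using at most past_weeks / future_weeks data points on each side."""
    past = sales[max(0, outage_start - past_weeks):outage_start]
    future = sales[outage_end + 1:outage_end + 1 + future_weeks]
    return velocity(past, alpha), velocity(future, alpha)

# Sales of ~10/week before a two-week outage, ~20/week after it.
sales = [10, 10, 10, 0, 0, 20, 20, 20]
past_v, future_v = past_future_velocity(sales, 3, 4, 0.5, 3, 3)
```

A lost-sales correction would then replace the outage weeks with values interpolated from these two velocities; that step is omitted here.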
In this view, at the item/store level, you can override values for parameters specific to some of the preprocessing methods available in the special expression. There are two views, corresponding to the preprocessing runs necessary to create the data sources for the baseline and causal forecasts.
After all parameters are set and committed back to the domain, a batch job usually runs the preprocessing steps and prepares the data sources for forecast generation.
The Preprocess Method Parameters Override for Seasonality view contains the following measures:
Alpha Override
Exponential smoothing coefficient used to calculate past and future velocities.
Future Weeks Override
The maximum number of data points used to calculate the future velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Last Date Override
This represents the end date of the preprocessing window; it is typically today's date, but can be any date in the past.
Past Weeks Override
The maximum number of data points used to calculate the past velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Preprocessing Window Override
Number of historical data points that are preprocessed.
Short Event Max Length
This measure is used by a newer method of de-promoting sales; the legacy approach is the Standard ES method. If the promo lift is non-zero and the promotion length is less than or equal to Short Event Max Length, Standard Exponential Smoothing is performed on the input.
If the new method is specified and the promotion window is longer than the value stored in this measure, the input data is divided by the promo lift value to remove the promotional lift.
Standard Median Window Override
Filter window length for the Standard Median preprocessing method.
Partial Outage Override
A scalar parameter indicating if the period immediately following an out-of-stock period should be adjusted. The default behavior is for the flag to be True.
Stop at Event Override
This parameter determines which periods are included in the calculation of past/future velocities.
If the flag is set to True, then the algorithm only includes periods before the first event flag or event indicator.
If the flag is False, then all available, non-flagged periods, within the windows defined by Past Weeks and Future Weeks, are used in the calculation of the past and future velocities.
The default setting for the flag is False.
Window#
Additionally, there are five measures, Window1 through Window5. These measures define the lengths of the five Standard Median filter windows that are run as part of the Retail Median preprocessing method.
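As an illustration of the Window1 through Window5 measures, the sketch below applies Standard Median filters of the configured lengths in sequence. How RDF actually combines the five filtered series within the Retail Median method is not shown here; chaining them is an assumption made purely for illustration.

```python
from statistics import median

def median_filter(series, window):
    """Standard Median: replace each point with the median of the values
    inside a centered window (odd window lengths assumed)."""
    half = window // 2
    return [median(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]

def retail_median(series, windows):
    """Illustrative Retail Median: run the configured Standard Median
    filters (the Window1..Window5 lengths) one after another."""
    out = list(series)
    for w in windows:
        out = median_filter(out, w)
    return out

# A one-week spike of 100 is smoothed away by windows of length 3.
smoothed = retail_median([1, 1, 100, 1, 1], [3, 3])
```

Median filters are robust to isolated spikes, which is why they suit outlier correction: a single anomalous week cannot dominate the window's median.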
The view displays the measures necessary to create the data source for baseline forecasting. This involves four rounds of preprocessing that run in batch or online, in this order:
Correcting for stockouts
Correcting for outliers
Depromoting sales
Smoothing sales
Note: Measures are replicated for each round of preprocessing.
The Preprocess Panel for Seasonality view contains the following measures:
Input Data Source
Indicates the measure that will be corrected. This is the input to the first preprocessing run; no input can be specified for the other runs, which use the output of the previous run.
First Time-Phased Parameter
This measure stores the first time-phased measure that is required for some preprocessing methods. For instance, for the Std ES LS method, this measure would store the measure name of the outage flag. Or for the Std ES method, it could store the name of the outlier flag.
Preprocess Method
Name of the preprocessing method to be used for each run. This method is selected in the Configuration Tools.
Output Data Measure
Indicates the measure that stores the result of the last configured preprocessing run. For instance, for the Preprocess Panel for Baseline, the output comes from run 4.
Run Label
A label denoting the purpose of the preprocessing run, for example, Correct Outliers, or Depromote Sales.
Run Preprocess Flag
Boolean measure indicating if this run should be enabled or skipped.
Second Time-Phased Parameter
This measure stores the second time-phased measure that is required for some preprocessing methods. For instance, for the Forecast Sigma method, this measure would store the measure name of the confidence intervals. For the Override method, it could store the measure name of the outage flag.
This section describes how the preprocessing functionality is implemented in RDF using the Source Measure Maintenance Workspace.
The functionality in the Source Measure Maintenance Workspace is a superset of the functionality in the Preprocess Administration Workspace. The purpose and functionality between the two is described in Source Measure Maintenance Functionality.
Use the Source Measure Maintenance Workspace to perform these steps:
The Preprocess Administration Workspace and the Source Measure Maintenance Workspace have a large set of common content. The main difference is that the Source Measure Maintenance Workspace has the calendar hierarchy, on top of the product and location hierarchies, while the Preprocess Administration Workspace has only the product and location hierarchies. The additional hierarchy allows the review of the time-phased preprocessing measures, as well as the calculated forecasting data sources.
Due to their additional dimension of week, these measures add to the size of the workspace, and also make workspace operations slower. For instance, workspace build, refresh, commit, and so on, take longer than in the otherwise similar Preprocess Administration Workspace.
Preprocess Administration Workspace
The Preprocess Administration Workspace is at the product/location intersection, so it can be built with many positions without experiencing poor performance. The purpose is to set preprocessing parameters, which are inputs to the special expression that is run in batch.
Source Measure Maintenance Workspace
The Source Measure Maintenance Workspace, described in this chapter, is at the product/location/calendar intersection, and is much more data intensive. The purpose is to set preprocessing parameters and run the data filtering online, with the ability to review the results without having to wait for an overnight batch. If the results are not as expected, or you want to experiment with different settings, you can make changes to the parameters and rerun the custom menus. To achieve this, it is expected that only a small subset of the available product/locations is included in the workspace.
To build the Source Measure Maintenance workspace, perform these steps:
From the left sidebar menu, click the Task Module to view the available tasks.
Click the Estimate Historical Demand activity and then click Seasonality to access the available workspaces.
Click Source Measure Maintenance. The Source Measure Maintenance wizard opens.
You can open an existing workspace, but to create a new workspace, click Create New Workspace.
Enter a name for your new workspace in the label text box and click OK.
The Workspace wizard opens. Select the products you want to work with and click Next.
Note: It is important to include all products that are members of the Merchandise dimensions in the forecast levels to be analyzed. For example, if you select to view a forecast level that is defined at subclass/store/week, you must include all items that are members of the particular subclass to be analyzed. It is recommended that Position Query functionality or selection from aggregate levels in the Merchandise hierarchy is employed if the task supports an AutoTask build.
Select the locations you want to work with and click Next.
Note: It is important to include all locations that are members of the location dimensions in the forecast levels to be analyzed. For example, if you select to view a forecast level that is defined at item/chain/week, you should include all locations that are members of the particular chain to be analyzed. It is recommended that Position Query functionality or selection from aggregate levels in the location hierarchy is employed if the task supports an AutoTask build.
Select the weeks of the forecast you wish to review and click Finish.
The wizard notifies you that your workspace is being prepared. Successful workspaces are available from the Dashboard.
The Source Measure Maintenance workspace is built and includes these steps:
The available views are:
These views make available the preprocessing parameters for four rounds of preprocessing runs, necessary to calculate the data source for baseline forecasting and promotional forecasting.
In RDF, preprocessing is configured to create the data sources for both baseline forecasting and causal forecasting. The creation of each source can go through at most six runs of preprocessing. For example, the Causal Data Source is configured for three runs, and the baseline data source is configured for four runs.
This step contains the Preprocess Admin view, which allows you to define the scope of the preprocessing run, as well as filter out item/locations where preprocessing does not make sense due to insufficient historical sales.
The Preprocessing Panel for Baseline view contains the following measures:
Min Number of Weeks in System
This parameter defines the minimum number of periods since an item was introduced in the system. Usually the introduction time is considered to be the date when the item first sold. This check prevents data corrections for items that are very new, where cleansing would be unreliable.
Min Number of Weeks with Sales
This parameter defines the number of weeks with sales that an item/store combination needs to have to qualify for data cleansing. The reasoning behind this check is that for items without enough data, corrections may not be reliable. Once there is enough data, and trends become clearer, corrections can be made.
There are two views with the same set of measures, where you can enter values for parameters specific to some of the preprocessing methods available in the special expression to create the data sources for the baseline and causal forecasts. There is a set of parameters for each of the maximum six runs allowed; however, it is likely that not all six are configured. For example, when generating the Causal Data Source, three runs are configured. The parameters are entered at the class/store intersection.
The Preprocess Method Parameters for Seasonality view contains the following measures:
Alpha
Exponential smoothing coefficient used to calculate past and future velocities.
Future Weeks
The maximum number of data points used to calculate the future velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Last Date
This represents the end date of the preprocessing window; it is typically today's date, but can be any date in the past.
Past Weeks
The maximum number of data points used to calculate the past velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Preprocessing Window
Number of historical data points that are preprocessed.
Short Event Max Length
This measure is used by a newer method of de-promoting sales; the legacy approach is the Standard ES method. If the promo lift is non-zero and the promotion length is less than or equal to Short Event Max Length, Standard Exponential Smoothing is performed on the input.
If the new method is specified and the promotion window is longer than the value stored in this measure, the input data is divided by the promo lift value to remove the promotional lift.
Standard Median Window
Filter window length for the Standard Median preprocessing method.
Partial Outage
A scalar parameter indicating if the period immediately following an out-of-stock period should be adjusted. The default behavior is for the flag to be True.
Stop at Event
This parameter determines which periods are included in the calculation of past/future velocities.
If the flag is set to True, then the algorithm only includes periods before the first event flag or event indicator.
If the flag is False, then all available, non-flagged periods, within the windows defined by Past Weeks and Future Weeks, are used in the calculation of the past and future velocities.
The default setting for the flag is False.
Window#
Additionally, there are five measures, Window1 through Window5. These measures define the lengths of the five Standard Median filter windows that are run as part of the Retail Median preprocessing method.
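The Short Event Max Length rule described above can be summarized in a short sketch. The `standard_es` argument stands in for the Standard Exponential Smoothing step, whose internals are not reproduced here; all names are illustrative assumptions, not RDF identifiers.

```python
def depromote(sales, promo_lift, promo_length, short_event_max_length,
              standard_es):
    """De-promote sales per the Short Event Max Length rule:
    short promotions are smoothed; long ones are divided by the lift."""
    if promo_lift == 0:
        return list(sales)                  # no lift: nothing to de-promote
    if promo_length <= short_event_max_length:
        return standard_es(sales)           # short event: smooth it away
    return [s / promo_lift for s in sales]  # long event: divide out the lift
```

For example, with a lift of 2 over a five-week promotion and a Short Event Max Length of 3, the sketch divides the input by 2 rather than smoothing it.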
In this view, at the item/store level, you can override values for parameters specific to some of the preprocessing methods available in the special expression. There are two views, corresponding to the preprocessing runs necessary to create the data sources for the baseline and causal forecasts.
After all parameters are set and committed back to the domain, a batch job usually runs the preprocessing steps and prepares the data sources for forecast generation.
The Preprocess Method Parameters Override for Seasonality view contains the following measures:
Alpha Override
Exponential smoothing coefficient used to calculate past and future velocities.
Future Weeks Override
The maximum number of data points used to calculate the future velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Last Date Override
This represents the end date of the preprocessing window; it is typically today's date, but can be any date in the past.
Past Weeks Override
The maximum number of data points used to calculate the past velocity when using the Standard Exponential Smoothing or Lost Sales Standard Exponential Smoothing preprocessing methods.
Preprocessing Window Override
Number of historical data points that are preprocessed.
Short Event Max Length
This measure is used by a newer method of de-promoting sales; the legacy approach is the Standard ES method. If the promo lift is non-zero and the promotion length is less than or equal to Short Event Max Length, Standard Exponential Smoothing is performed on the input.
If the new method is specified and the promotion window is longer than the value stored in this measure, the input data is divided by the promo lift value to remove the promotional lift.
Standard Median Window Override
Filter window length for the Standard Median preprocessing method.
Partial Outage Override
A scalar parameter indicating if the period immediately following an out-of-stock period should be adjusted. The default behavior is for the flag to be True.
Stop at Event Override
This parameter determines which periods are included in the calculation of past/future velocities.
If the flag is set to True, then the algorithm only includes periods before the first event flag or event indicator.
If the flag is False, then all available, non-flagged periods, within the windows defined by Past Weeks and Future Weeks, are used in the calculation of the past and future velocities.
The default setting for the flag is False.
Window#
Additionally, there are five measures, Window1 through Window5. These measures define the lengths of the five Standard Median filter windows that are run as part of the Retail Median preprocessing method.
The view displays the measures necessary to create the data source for baseline forecasting. This involves four rounds of preprocessing that run in batch or online, in this order:
Correcting for stockouts
Correcting for outliers
Depromoting sales
Smoothing sales
Note: Measures are replicated for each round of preprocessing.
The Preprocess Panel for Seasonality view contains the following measures:
Input Data Source
Indicates the measure that will be corrected. This is the input to the first preprocessing run; no input can be specified for the other runs, which use the output of the previous run.
First Time-Phased Parameter
This measure stores the first time-phased measure that is required for some preprocessing methods. For instance, for the Std ES LS method, this measure would store the measure name of the outage flag. Or for the Std ES method, it could store the name of the outlier flag.
Preprocess Method
Name of the preprocessing method to be used for each run. This method is selected in the Configuration Tools.
Output Data Measure
Indicates the measure that stores the result of the last configured preprocessing run. For instance, for the Preprocess Panel for Baseline, the output comes from run 4.
Run Label
A label denoting the purpose of the preprocessing run, for example, Correct Outliers, or Depromote Sales.
Run Preprocess Flag
Boolean measure indicating if this run should be enabled or skipped.
Second Time-Phased Parameter
This measure stores the second time-phased measure that is required for some preprocessing methods. For instance, for the Forecast Sigma method, this measure would store the measure name of the confidence intervals. For the Override method, it could store the measure name of the outage flag.
The main purpose of this step is to display time-phased measures that represent input and output to the preprocessing stages, run in batch based on the settings selected in Preprocessing Admin.
This step contains the Source Measure Maintenance for Seasonality View.
This view can show either the Baseline View or the Causal View.
Baseline View
The baseline view displays the measures necessary to create the data source for baseline forecasting. This involves four rounds of preprocessing that run in batch or online in this order:
Correcting for stockouts
Correcting for outliers
Depromoting sales
Smoothing sales
Causal View
The causal view displays the measures necessary to create the data source for causal forecasting. This involves three rounds of preprocessing that are run in batch:
Correcting for stockouts
Correcting for outliers
Deseasonalizing the measure to create the Causal Data Source
This view displays measures that represent input and output of the preprocessing runs, in table format.
The Source Measure Maintenance for Seasonality view contains the following measures:
User Adjustment
In this measure, you can enter values that are added to the preprocessing adjustments to create the data sources.
The logic is: data source = weekly sales + preprocessing adjustments + user adjustment
This measure is read/write.
Weekly Sales
This measure stores the raw sales loaded in RDF. This is the input to the first run of preprocessing. This measure is read only.
Data Source
This measure represents the output of the preprocessed raw sales, as well as incorporates the user adjustments according to the formula:
data source = weekly sales + preprocessing adjustments + user adjustments.
Out of Stock Indicator
This measure is either loaded or calculated by the rules in the custom menu. It is used during the preprocessing run that corrects sales for lost sales.
Outliers Indicator
This measure is either loaded or calculated by the rules in the custom menu. It is used during the preprocessing run that corrects the sales for outliers.
Promotion Indicator
This measure is usually calculated as the logical OR of all available Boolean promotional variables. It is used during the preprocessing run that removes promotional sales.
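The relationships among these measures can be expressed as a small sketch of the documented formula (data source = weekly sales + preprocessing adjustments + user adjustment) and of the promotion indicator's logical OR. Measure handling is simplified here to plain Python lists indexed by week; this is an illustration, not RDF rule syntax.

```python
def data_source(weekly_sales, preprocessing_adjustments, user_adjustments):
    """data source = weekly sales + preprocessing adjustments
    + user adjustment, evaluated week by week."""
    return [w + p + u for w, p, u in
            zip(weekly_sales, preprocessing_adjustments, user_adjustments)]

def promotion_indicator(*promo_variables):
    """Logical OR of all available Boolean promotional variables, per week."""
    return [any(flags) for flags in zip(*promo_variables)]

# Two weeks of history: preprocessing raises week 1 by 2 and lowers week 2
# by 3; the user adds a manual adjustment of 1 in week 2.
source = data_source([10, 20], [2, -3], [0, 1])             # -> [12, 18]
promo = promotion_indicator([True, False], [False, False])  # -> [True, False]
```

Because the user adjustment is a separate additive term, manual corrections survive a rerun of the preprocessing steps instead of being overwritten by them.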