Time Series Use Case Scenario

You work in an electronic store, and sales of laptops and tablets have increased over the last two quarters. You want to forecast your product sales for the next four quarters using historical timestamped data. You forecast sales using the Exponential Smoothing algorithm, predicting changes over evenly spaced intervals of time using historical data.

Topic	Link
OML4SQL GitHub Example	Time Series - Exponential Smoothing
`CREATE_MODEL2` Procedure	CREATE_MODEL2 Procedure
Generic Model Settings	DBMS_DATA_MINING - Model Settings
Exponential Smoothing Model (ESM) Settings	DBMS_DATA_MINING — Algorithm Settings:Exponential Smoothing
Data Dictionary Settings	Oracle Machine Learning Data Dictionary Views
Exponential Smoothing Model - Model Detail Views	Model Detail Views for Exponential Smoothing
About Time Series	About Time Series

Load Data

Access the data set from the SH schema and explore the data to understand the attributes.

Remember:

The data set used for this use case is from the SH schema. The SH schema can be readily accessed in Oracle Autonomous Database. For on-premises databases, the schema is installed during the installation or can be manually installed by downloading the scripts. See Installing the Sample Schemas.

To understand the data, you will perform the following:

Access the data.
Examine the various attributes or columns of the data set.
Assess data quality (by exploring the data).

Access Data

You will use SALES table data from the SH schema.

Examine Data

The following table displays information about the attributes from SALES:

Attribute Name	Information
`PROD_ID`	The ID of the product
`CUST_ID`	The ID of the customer
`TIME_ID`	The timestamp of the purchase of the product in yyy-mm-dd hh:mm:ss format
`CHANNEL_ID`	The channel ID of the channel sales data
`PROMO_ID`	The product promotion ID
`QUANTITY_SOLD`	The number of items sold
`AMOUNT_SOLD`	The amount or sales data

Identify Target Variable

In this use case, the task is to train a model that predicts the amount sold. Therefore, the target variable is the attribute AMOUNT_SOLD.

Explore Data

Explore the data to understand and assess the quality of the data. At this stage assess the data to identify data types and noise in the data. Look for missing values and numeric outlier values.

If you are working with Oracle Autonomous Database, you can use the Oracle Machine Learning (OML) Notebooks for your data science project, including assessing data quality. If you are using an on-premise Oracle Database, you can use the Oracle SQL Developer to assess data quality. Query the SH schema as described.

Note:

Each record in the database is called a case and each case is identified by a case_id. In this use case TIME_ID is the case_id as it is an independent variable and you are forecasting the sales for evenly spaced time.

The following steps help you with exploratory analysis of the data.

View the data in the SH.SALES table by running the following statement:
```
SELECT * FROM SH.SALES;
```

To find the number of rows in SH.SALES table, run the following statement:

%script
SELECT COUNT(*) from SH.SALES;


COUNT(*)   
    918843 
---------------------------

Find the distinct users in the table, run the following query:

%sql SELECT COUNT (DISTINCT CUST_ID) FROM SH.SALES;


COUNT(DISTINCTCUST_ID)   
                    7059 
---------------------------

To view the datatype of the sales table, run the following query:

%script 
DESCRIBE SH.SALES;


Name          	Null?    	Type         
------------- 	-------- 	------------ 
PROD_ID 	NOT NULL 	NUMBER 
CUST_ID 	NOT NULL 	NUMBER 
TIME_ID  	NOT NULL 	DATE     
CHANNEL_ID 	NOT NULL 	NUMBER 
PROMO_ID 	NOT NULL 	NUMBER 
QUANTITY_SOLD 	NOT NULL 	NUMBER(10,2) 
AMOUNT_SOLD 	NOT NULL 	NUMBER(10,2) 

---------------------------

To view all the NULLs and missing values, run the following query:

%sql SELECT COUNT(*) FROM SH.SALES WHERE PROD_ID=NULL OR CUST_ID=NULL OR
    TIME_ID=NULL OR CHANNEL_ID=NULL OR PROMO_ID=NULL OR QUANTITY_SOLD=NULL OR
    AMOUNT_SOLD=NULL;


COUNT(*)   
         0 
---------------------------

NULLs, if found, are automatically handled by the OML algorithms.

Now, prepare a view called ESM_SH_DATA by selecting the necessary columns from SH.SALES table. For this use case, select TIME_ID and AMOUNT_SOLD.

%script
CREATE OR REPLACE VIEW ESM_SH_DATA AS 
  SELECT TIME_ID, AMOUNT_SOLD FROM SH.SALES;


View ESM_SH_DATA created.
---------------------------

Count the number of rows to ensure that we have the same amount of data. Run the following query:
```
%script
SELECT count(*) from ESM_SH_DATA;
```
```
COUNT(*)   
    918843 
---------------------------
```

This completes the data understanding and data exploration stage. Time series data can contain missing values. The setting EXSM_SETMISSING can be used to specify how to handle missing values. The special value EXSM_MISS_AUTO indicates that, if the series contains missing values it is to be treated as an irregular time series. The Automatic Data Preparation (ADP) setting does not impact this data for time series. See How ADP Transforms the Data to understand how ADP prepares the data for some algorithms.

Build Model

To build a model using the time series data, you will use Exponential Smoothing algorithm on the ESM_SH_DATA view that is generated during the exploratory stage.

Oracle offers the Exponential Smoothing algorithm for time series.

Exponential smoothing is a forecasting method for time series data. It is a moving average method where exponentially decreasing weights are assigned to past observations. Components of Exponential Smoothing Model (ESM) such as trend and seasonality extensions, can have an additive or multiplicative form. For additive forms, the amplitude of the variation is independent of the level, whereas for multiplicative forms, the variation is connected to the level. The simpler additive models assume that error or noise, trend, and seasonality are linear effects within the recursive formulation.

To build a model using a supervised learning algorithm you may use a subset of the data into training and test data. Time series models usually use historical data to predict the future. This is different from model validation for classification and regression, which normally involves splitting data randomly into training and test sets. In this use case, there is no need to split the data set because the model is always predicting the current value based on information from the past. This means that although it seems that you train and test on the same data set, but when the model is applied, the forecast is always based on the previous date. In this use case, you will use the ESM_SH_DATA view.

To see the data in the ESM_SH_DATA view, run the following statement:

%sql
SELECT * from ESM_SH_DATA;

TIME_ID     AMOUNT_SOLD   
20-JAN-98         1205.99 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-APR-98         1250.25 
05-JUL-98         1210.21 
05-JUL-98         1210.21 
05-JUL-98         1210.21 
05-JUL-98         1210.21 
05-JUL-98         1210.21 ...

Build a model with the ESM_SH_DATA table, run the following script:
```
 %script

    BEGIN DBMS_DATA_MINING.DROP_MODEL('ESM_SALES_FORECAST_2');
    EXCEPTION WHEN OTHERS THEN NULL; END;
    /
    DECLARE
        v_setlist DBMS_DATA_MINING.SETTING_LIST;
    BEGIN
        v_setlist('ALGO_NAME')            := 'ALGO_EXPONENTIAL_SMOOTHING';
        V_setlist('EXSM_INTERVAL')        := 'EXSM_INTERVAL_QTR';
        V_setlist('EXSM_PREDICTION_STEP') := '4';
        V_setlist('EXSM_MODEL')           := 'EXSM_WINTERS';
        V_setlist('EXSM_SEASONALITY')     := '4';
	 V_setlist('EXSM_SETMISSING')	:= 'EXSM_MISS_AUTO');

        DBMS_DATA_MINING.CREATE_MODEL2(
            MODEL_NAME          => 'ESM_SALES_FORECAST_1',
            MINING_FUNCTION     => 'TIME_SERIES',
            DATA_QUERY          => 'select * from ESM_SH_DATA',
            SET_LIST            => v_setlst,
            CASE_ID_COLUMN_NAME => 'TIME_ID',
            TARGET_COLUMN_NAME  =>'AMOUNT_SOLD');
    END; 
```
```
PL/SQL procedure successfully completed.
    ---------------------------
    PL/SQL procedure successfully completed.
```
Examine the script:
- v_setlist is a variable to store SETTING_LIST.
- SETTING_LIST specifies model settings or hyperparameters for the model.
- DBMS_DATA_MINING is the PL/SQL package used for machine learning. These settings are described in DBMS_DATA_MINING - Model Settings.
- ALGO_NAME specifies the algorithm name. Since you are using Exponential Smoothing as the algorithm, the value of the setting is ALGO_EXPONENTIAL_SMOOTHING.
- EXSM_INTERVAL indicates the interval of the data set or a unit of interval size. For example, day, week, month, and so on. You want to predict for quarterly sales. Hence, the setting is EXSM_INTERVAL_QTR. This setting applies only to the time column with datetime type.
- EXSM_PREDICTION_STEP specifies how many predictions to make. You want to display each value representing a quarter. Hence, a value of 4 gives four values ahead prediction.
- EXSM_MODEL specifies the type of exponential smoothing model to be used. Here the value is EXSM_HW. The Holt-Winters triple exponential smoothing model with additive trend and multiplicative seasonality is applied. This type of model considers various combinations of additive and multiplicative trend, seasonality and error, with and without trend damping. Other options are EXSM_SIMPLE, EXSM_SIMPLE_MULT, EXSM_HOLT, EXSM_HOLT_DMP, EXSM_MUL_TRND, EXSM_MULTRD_DMP, EXSM_SEAS_ADD, EXSM_SEAS_MUL, EXSM_HW, EXSM_HW_DMP, EXSM_HW_ADDSEA, EXSM_DHW_ADDSEA, EXSM_HWMT, EXSM_HWMT_DMP.
- EXSM_SEASONALITY indicates how long a season lasts. The parameter specifies a positive integer value as the length of seasonal cycle. The value it takes must be larger than 1. For example, 4 means that every group of four values forms a seasonal cycle.
- EXSM_SETMISSING specifies how to handle missing values. In time series, the special value EXSM_MISS_AUTO indicates that, if the series contains missing values it is to be treated as an irregular time series.
The CREATE_MODEL2 procedure has the following settings:
- MODEL_NAME: A unique name that you will give to the model. Name of the model in the form [schema_name.]model_name. If you do not specify a schema, then your own schema is used. Here, the model name is ESM_SALES_FORECAST_1.
- MINING_FUNCTION: Specifies the machine learning function. Since it is a time series problem, select TIME_SERIES.
- DATA_QUERY: A query that provides training data for building the model. Here, the query is SELECT * FROM ESM_SH_DATA.
- SET_LIST: Specifies SETTING_LIST.
- CASE_ID_COLUMN_NAME: A unique case identifier column in the training data. In this use case, case_id is TIME_ID. If there is a composite key, you must create a new attribute before creating the model.
- TARGET_COLUMN_NAME: Specifies the column that is to be predicted. Also referred to as the target variable of the model. In other words, the value the model predicts. In this use case, you are predicting the sale of products in terms of their dollar price. Therefore, in this use case, the TARGET_COLUMN_NAME is AMOUNT_SOLD.
  
  Note:
  Any parameters or settings not specified are either system-determined or default values are used.

Evaluate

Evaluate your model by viewing diagnostic metrics and performing quality checks.

Sometimes querying dictionary views and model detail views is sufficient to measure your model's performance. However, you can evaluate your model by computing test metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), confusion matrix, lift statistics, cost matrix, and so on. For Association Rules, you can inspect various rules to see if they reveal new insights for item dependencies (antecedent itemset implying consequent) or for unexpected relationships among items.

Dictionary and Model Views

To obtain information about the model and view model settings, you can query data dictionary views and model detail views. Specific views in model detail views display model statistics which can help you evaluate the model.

By examining various statistics in the model detail views, you can compare models to arrive at one model that satisfies your evaluation criteria.

The data dictionary views for Oracle Machine Learning are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.

View Name	Description
ALL_MINING_MODELS	Provides information about all accessible machine learning models
ALL_MINING_MODEL_ATTRIBUTES	Provides information about the attributes of all accessible machine learning models
ALL_MINING_MODEL_SETTINGS	Provides information about the configuration settings for all accessible machine learning models
ALL_MINING_MODEL_VIEWS	Provides information about the model views for all accessible machine learning models
ALL_MINING_MODEL_XFORMS	Provides the user-specified transformations embedded in all accessible machine learning models.

Model detail views are specific to the algorithm. You can obtain more insights about the model you created by viewing the model detail views. The names of model detail views begin with DM$xx where xx corresponds to the view prefix. See Model Detail Views.

You can review the model settings by running the following query:

 %sql
 
SELECT SETTING_NAME, SETTING_VALUE
  FROM USER_MINING_MODEL_SETTINGS
  WHERE MODEL_NAME = UPPER('ESM_SALES_FORECAST_1')
  ORDER BY SETTING_NAME;


SETTING_NAME                   SETTING_VALUE               
ALGO_NAME                      ALGO_EXPONENTIAL_SMOOTHING  
EXSM_ACCUMULATE                EXSM_ACCU_TOTAL             
EXSM_CONFIDENCE_LEVEL          .95                         
EXSM_INTERVAL                  EXSM_INTERVAL_QTR           
EXSM_MODEL                     EXSM_WINTERS                
EXSM_NMSE                      3                           
EXSM_OPTIMIZATION_CRIT         EXSM_OPT_CRIT_LIK           
EXSM_PREDICTION_STEP           4                           
EXSM_SEASONALITY               4                           
EXSM_SETMISSING                EXSM_MISS_AUTO              
ODMS_DETAILS                   ODMS_ENABLE                 
ODMS_MISSING_VALUE_TREATMENT   ODMS_MISSING_VALUE_AUTO     
ODMS_SAMPLING                  ODMS_SAMPLING_DISABLE       
PREP_AUTO                      ON                          
 
14 rows selected.
---------------------------

To view the DM$VP model view, run the following statement:

The DM$VP view for time series contains the result of an ESM model. The output has a set of records such as partition, CASE_ID, value, prediction, lower, upper, and so on and ordered by partition and CASE_ID (time).

%script
SELECT CASE_ID, VALUE, PREDICTION, LOWER, UPPER FROM DM$VPESM_SALES_FORECAST_1
ORDER BY CASE_ID;


CASE_ID     VALUE                PREDICTION           LOWER   UPPER  
01-JAN-98     6480684.0000011446   6452375.7547333492                
01-APR-98     5593994.1400007578   5848724.7899219571                
01-JUL-98     6071823.1000010688   6214546.3092128271                
01-OCT-98     5937413.7100012964   5869219.4189072186                
01-JAN-99      6093747.209999715    6132016.410793812                
01-APR-99     4925471.6299999086   5385954.0785653945                
01-JUL-99     5827050.1500000218   5350240.2540956484                
01-OCT-99     5373678.6700002998   5304626.0456054937                
01-JAN-00     5984889.4899995513   5541123.2442497462                
01-APR-00     5371730.9200002486     5236126.09628068                
01-JUL-00     6121239.2899996703   5955258.7436284116                
01-OCT-00     6287646.9199997969   6089446.4024073323                
01-JAN-01     6547097.4400001625   6837567.1739504253                
01-APR-01     6922468.3900004178   6188944.0536819538                 ...
 
--------------------------------------------------------------------------------------

Examine the statement:

CASE_ID: Specifies the timestamp.
VALUE: Specifies the AMOUNT_SOLD.
PREDICTION: Indicates the predicted value for the model.
LOWER and UPPER: Indicate the confidence bounds.

To view the model diagonistic view, DM$VG, and evaluate the model, run the following query:

The DM$VG view for time series contains the global information of the model along with the estimated smoothing constants, the estimated initial state, and global diagnostic measures.

%sql
SELECT NAME, round(NUMERIC_VALUE,4), STRING_VALUE
  FROM DM$VGESM_SALES_FORECAST_1
  ORDER BY NAME;


NAME                ROUND(NUMERIC_VALUE,4)   STRING_VALUE  
-2 LOG-LIKELIHOOD                   450.7508               
AIC                                 466.7508               
AICC                                487.3223               
ALPHA                                 0.4525               
AMSE                       157764777942.4555               
BETA                                  0.4195               
BIC                                 472.9315               
CONVERGED                                    YES           
GAMMA                                 0.0001               
INITIAL LEVEL                   6110212.8741               
INITIAL SEASON 1                      0.9939               
INITIAL SEASON 2                      1.0231               
INITIAL SEASON 3                      0.9366               
INITIAL SEASON 4                      1.0465               
 
NAME            ROUND(NUMERIC_VALUE,4)   STRING_VALUE  
INITIAL TREND                 55478.0794               
MAE                               0.0424               
MSE                    104400146583.6485               
NUM_ROWS                          918843               
SIGMA                              0.054               
STD                          323110.1153               
 
 
20 rows selected.
 
---------------------------

NAME: Indicates the diagnostic attribute name.
NUMERIC_VALUE: Indicates the calculated statistical value for the model.
STRING_VALUE: Indicates alphanumeric values for the diagnostic parameter.
A few parameters to note for an exponential smoothing algorithm are:
- ALPHA: Indicates the smoothing constant.
- BETA: Indicates the trend smoothing constant.
- GAMMA: Indicates the seasonal smoothing constant.
- MAE: Indicates Mean Absolute Error.
- MSE: Indicates Mean Square Error.

Exponential smoothing assumes that a series extends infinitely into the past, but that influence of past on future, decays smoothly and exponentially fast. The smooth rate of decay is expressed by one or more smoothing constants. The smoothing constants are parameters that the model estimates. These smoothing constants are represented as α, β, and γ. Values of a smoothing constant near one put almost all weight on the most recent observations. Values of a smoothing constant near zero allow the distant past observations to have a large influence.

Note that α is associated with the error or noise of the series, β is associated with the trend, and γ is associated with the seasonality factors. The γ value is closest to zero which means seasonality has an influence on the data set.

The MAE and MSE values are low which means that the model is good. The MSE magnitude depends on the actual scale of your original data. In this case, the STD is around 10⁵. The square of it is roughly in the scale of 10¹⁰. The error percentage is low and hence, the model is good.

Score

You are ready to forecast sales for the next four quarters.

For a time series model, you can use the DM$VP view to perform scoring or prediction.

Query the DM$VP model detail view to see the forecast (sales for four quarters). Run the following statement:

%sql
SELECT TO_CHAR(CASE_ID,'YYYY-MON') DATE_ID,
       round(VALUE,2) ACTUAL_SOLD,
       round(PREDICTION,2) FORECAST_SOLD,
       round(LOWER,2) LOWER_BOUND, round(UPPER,2) UPPER_BOUND
  FROM DM$VPESM_SALES_FORECAST_1
  ORDER BY CASE_ID;

In this step, the prediction shows amount sold along with the case_id. The predictions display upper and lower confidence bounds showing that the estimates can vary between those values.

Examine the statement:

TO_CHAR(CASE_ID,'YYYY-MON') DATE_ID: The DATE_ID column has timestamp or case_id extracted in year-month (yyyy-mon) format.
round(VALUE,2) ACTUAL_SOLD: Specifies the AMOUNT_SOLD value as ACTUAL_SOLD rounded to two numericals after the decimal.
round(PREDICTION,2) FORECAST_SOLD: Specifies the predicted value as FORECAST_SOLD rounded to two numericals after the decimal.
round(LOWER,2) LOWER_BOUND, round(UPPER,2) UPPER_BOUND: Specifies the lower and upper confidence levels rounded to two numericals after the decimal.


DATE_ID    ACTUAL_SOLD   FORECAST_SOLD   LOWER_BOUND   UPPER_BOUND   
1998-JAN         6480684      6452375.75                             
1998-APR      5593994.14      5848724.79                             
1998-JUL       6071823.1      6214546.31                             
1998-OCT      5937413.71      5869219.42                             
1999-JAN      6093747.21      6132016.41                             
1999-APR      4925471.63      5385954.08                             
1999-JUL      5827050.15      5350240.25                             
1999-OCT      5373678.67      5304626.05                             
2000-JAN      5984889.49      5541123.24                             
2000-APR      5371730.92       5236126.1                             
2000-JUL      6121239.29      5955258.74                             
2000-OCT      6287646.92       6089446.4                             
2001-JAN      6547097.44      6837567.17                             
2001-APR      6922468.39      6188944.05                             

DATE_ID    ACTUAL_SOLD   FORECAST_SOLD   LOWER_BOUND   UPPER_BOUND   
2001-JUL      7195998.63      7663836.77                             
2001-OCT      7470897.52      7573926.96                             
2002-JAN                      8232820.51    7360847.49    9104793.54 
2002-APR                      7642694.94    6584565.24    8700824.63 
2002-JUL                      8648402.54    7019914.28   10276890.81 
2002-OCT                      8692842.46    6523676.33    10862008.6 


20 rows selected. 


---------------------------

To see a visual representation of the predictions in OML Notebooks, run the above same query with the following settings:
Click settings and drag DATE_ID to keys and FORECASTED_SOLD (avg), ACTUAL_SOLD (avge), LOWER_BOUND (avg), and UPPER_BOUND(avg) to values.
```
%sql
SELECT TO_CHAR(CASE_ID,'YYYY-MON') DATE_ID, VALUE ACTUAL_SOLD,
       round(PREDICTION,2) FORECAST_SOLD,
       round(LOWER,2) LOWER_BOUND, round(UPPER,2) UPPER_BOUND
  FROM DM$VPESM_SALES_FORECAST_1
  ORDER BY CASE_ID;
```
Description of the illustration timeseries_graph.png

This completes the prediction step. The model has successfully forecast sales for the next four quarters. This helps in tracking the sales and also gives us an idea on stocking our products.