Time Series Use Case Scenario
You work in an electronic store, and sales of laptops and tablets have increased over the last two quarters. You want to forecast your product sales for the next four quarters using historical timestamped data. You forecast sales using the Exponential Smoothing algorithm, predicting changes over evenly spaced intervals of time using historical data.
Related Content
Topic | Link |
---|---|
OML4SQL GitHub Example | Time Series - Exponential Smoothing |
CREATE_MODEL2 Procedure
|
CREATE_MODEL2 Procedure |
Generic Model Settings | DBMS_DATA_MINING - Model Settings |
Exponential Smoothing Model (ESM) Settings | DBMS_DATA_MINING — Algorithm Settings:Exponential Smoothing |
Data Dictionary Settings | Oracle Machine Learning Data Dictionary Views |
Exponential Smoothing Model - Model Detail Views | Model Detail Views for Exponential Smoothing |
About Time Series | About Time Series |
Before you start your OML4SQL use case journey, ensure that you have the following:
- Data Set
The data set used for this use case is from the SH schema. The SH schema can be readily accessed in Oracle Autonomous Database. For on-premises databases, the schema is installed during the installation or can be manually installed by downloading the scripts. See Installing the Sample Schemas.
You will use the
SALES
table from theSH
schema. You can access the table by running theSELECT
statements in OML Notebooks. - Database
Select or create database out of the following options:
- Get your FREE cloud account. Go to https://cloud.oracle.com/database and select Oracle Database Cloud Service (DBCS), or Oracle Autonomous Database. Create an account and create an instance. See Autonomous Database Quick Start Workshop.
- Download the latest version of Oracle Database (on premises).
- Machine Learning Tools
Depending on your database selection,
- Use OML Notebooks for Oracle Autonomous Database.
- Install and use Oracle SQL Developer connected to an on-premises database or DBCS. See Installing and Getting Started with SQL Developer.
- Other Requirements
Data Mining Privileges (this is automatically set for ADW). See System Privileges for Oracle Machine Learning for SQL.
Related Topics
Load Data
Access the data set from the SH
schema and explore the data to understand the attributes.
Remember:
The data set used for this use case is from the SH schema. The SH schema can be readily accessed in Oracle Autonomous Database. For on-premises databases, the schema is installed during the installation or can be manually installed by downloading the scripts. See Installing the Sample Schemas.
- Access the data.
- Examine the various attributes or columns of the data set.
- Assess data quality (by exploring the data).
Access Data
You will use SALES
table data from the SH
schema.
Examine Data
The following table displays information about the attributes from SALES
:
Attribute Name | Information |
---|---|
PROD_ID |
The ID of the product |
CUST_ID |
The ID of the customer |
TIME_ID |
The timestamp of the purchase of the product in yyy-mm-dd hh:mm:ss format |
CHANNEL_ID |
The channel ID of the channel sales data |
PROMO_ID |
The product promotion ID |
QUANTITY_SOLD |
The number of items sold |
AMOUNT_SOLD |
The amount or sales data |
Identify Target Variable
In this use case, the task is to train a model that predicts the amount sold. Therefore, the target variable is the attribute AMOUNT_SOLD
.
Explore Data
Explore the data to understand and assess the quality of the data. At this stage assess the data to identify data types and noise in the data. Look for missing values and numeric outlier values.
SH
schema as described.
Note:
Each record in the database is called a case and each case is identified by acase_id
. In this use case TIME_ID
is the case_id
as it is an independent variable and you are forecasting the sales for evenly spaced time.
The following steps help you with exploratory analysis of the data.
This completes the data understanding and data exploration stage. Time series data can contain missing values. The setting EXSM_SETMISSING
can be used to specify how to handle missing values. The special value EXSM_MISS_AUTO
indicates that, if the series contains missing values it is to be treated as an irregular time series. The Automatic Data Preparation (ADP) setting does not impact this data for time series. See How ADP Transforms the Data to understand how ADP prepares the data for some algorithms.
Build Model
To build a model using the time series data, you will use Exponential Smoothing algorithm on the ESM_SH_DATA
view that is generated during the exploratory stage.
Exponential smoothing is a forecasting method for time series data. It is a moving average method where exponentially decreasing weights are assigned to past observations. Components of Exponential Smoothing Model (ESM) such as trend and seasonality extensions, can have an additive or multiplicative form. For additive forms, the amplitude of the variation is independent of the level, whereas for multiplicative forms, the variation is connected to the level. The simpler additive models assume that error or noise, trend, and seasonality are linear effects within the recursive formulation.
To build a model using a supervised learning algorithm you may use a subset of the data into training and test data. Time series models usually use historical data to predict the future. This is different from model validation for classification and regression, which normally involves splitting data randomly into training and test sets. In this use case, there is no need to split the data set because the model is always predicting the current value based on information from the past. This means that although it seems that you train and test on the same data set, but when the model is applied, the forecast is always based on the previous date. In this use case, you will use the ESM_SH_DATA
view.
Evaluate
Evaluate your model by viewing diagnostic metrics and performing quality checks.
Sometimes querying dictionary views and model detail views is sufficient to measure your model's performance. However, you can evaluate your model by computing test metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), confusion matrix, lift statistics, cost matrix, and so on. For Association Rules, you can inspect various rules to see if they reveal new insights for item dependencies (antecedent itemset implying consequent) or for unexpected relationships among items.
Dictionary and Model Views
To obtain information about the model and view model settings, you can query data dictionary views and model detail views. Specific views in model detail views display model statistics which can help you evaluate the model.
The data dictionary views for Oracle Machine Learning are listed in the following table. A database administrator (DBA) and USER versions of the views are also available.
View Name | Description |
---|---|
ALL_MINING_MODELS | Provides information about all accessible machine learning models |
ALL_MINING_MODEL_ATTRIBUTES | Provides information about the attributes of all accessible machine learning models |
ALL_MINING_MODEL_SETTINGS | Provides information about the configuration settings for all accessible machine learning models |
ALL_MINING_MODEL_VIEWS | Provides information about the model views for all accessible machine learning models |
ALL_MINING_MODEL_XFORMS | Provides the user-specified transformations embedded in all accessible machine learning models. |
Model detail views are specific to the algorithm. You can obtain more insights about the model you created by viewing the model detail views. The names of model detail views begin with DM$xx where xx corresponds to the view prefix. See Model Detail Views.
Exponential smoothing assumes that a series extends infinitely into the past, but that influence of past on future, decays smoothly and exponentially fast. The smooth rate of decay is expressed by one or more smoothing constants. The smoothing constants are parameters that the model estimates. These smoothing constants are represented as α, β, and γ. Values of a smoothing constant near one put almost all weight on the most recent observations. Values of a smoothing constant near zero allow the distant past observations to have a large influence.
Note that α is associated with the error or noise of the series, β is associated with the trend, and γ is associated with the seasonality factors. The γ value is closest to zero which means seasonality has an influence on the data set.
The MAE and MSE values are low which means that the model is good. The MSE magnitude depends on the actual scale of your original data. In this case, the STD is around 105. The square of it is roughly in the scale of 1010. The error percentage is low and hence, the model is good.
Score
You are ready to forecast sales for the next four quarters.
For a time series model, you can use the DM$VP
view to perform scoring or prediction.
This completes the prediction step. The model has successfully forecast sales for the next four quarters. This helps in tracking the sales and also gives us an idea on stocking our products.