Skip Headers
Oracle® Retail Data Model Reference
Release 11.3.2

Part Number E20361-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
PDF · Mobi · ePub

10 Data Mining Models in Oracle Retail Data Model

This chapter provides reference information about the data mining models in Oracle Retail Data Model.

This chapter includes the following sections:

Note:

For instructions on setting up and loading the data mining source, and executing the data mining models, see Oracle Retail Data Model Implementation and Operations Guide.

About Data Mining in Oracle Retail Data Model

Oracle Retail Data Model includes data mining packages. The data mining portion of Oracle Retail Data Model consists of source tables that are populated by detail data for use by the data mining packages. This data is organized to be compatible with the data mining modules so they can properly analyze and mine the data. Data mining packages pull in the source data and feed it into the data mining packages, and populate the target tables with the results. The data in the target tables can be presented in Oracle Business Intelligence Suite Enterprise Edition reports.

Tip:

Changed or new data models are not supported by Oracle Retail Data Model. Consequently, do not change the data models that are defined and delivered with Oracle Retail Data Model, but, instead, copy a delivered data model to create a new one.

Oracle Retail Data Model creates mining models using the following Oracle Data Mining algorithms:

For more information about these algorithms, see Oracle Data Mining Concepts.

Each package (analysis) builds models using one or two of these algorithms. The models built depend on the analysis being performed. The output of the model build is a view containing rules generated by the model.

As shown in Table 10-1, the Oracle Retail Data Model mining models use the specified algorithms for the specific problem.

Table 10-1 Oracle Retail Data Model Mining Algorithm Types Used by Model

Model Algorithms Used by Data Mining Model

Model 1: Employee Basket Analysis

Classification and Regression: Support Vector Machine (SVM) and Decision Tree (DT)

Model 2: Employee Sales Analysis

Classification and Regression: SVM and DT

Model 3: Customer Loyalty Analysis

Classification: SVM and DT

Model 4: Store Loss Analysis

Classification and Regression: SVM and DT

Model 5: Item POS Loss Analysis

Classification and Regression: SVM and DT

Model 6: Product Category Mix Analysis

Association Rules: Apriori

Model 7: Product Price Elasticity Analysis

Regression: SVM

Model 8: Employee Combination Analysis

Classification: SVM and DT

Model 9: Customer Segmentation Analysis

Clustering: K-Means

Model 10: Customer Life Time Value Analysis

Regression: SVM

Model 11: Customer Churn Analysis

Classification: SVM and DT

Model 12: Customer Sentiment Analysis

Classification: SVM


Mining Model Overview

For each data mining model, there is:

  • A Mining model procedure in the mining package: Given source data, to generate mined rules, predict results, and so on.

  • Mining model source MVs (Materialized Views in the schema ordm_sys)

  • Mining model Target tables (tables in schema ordm_sys)

  • Mining model Support tables (Algorithm settings tables)

Figure X shows the mining model components:

Figure 10-1 Mining Model Components

Description of Figure 10-1 follows
Description of "Figure 10-1 Mining Model Components"

Steps to Build Mining Models

Use the following steps to build the mining models:

  1. Create mining source and apply MVs over ordm_sys base, reference, derived and lookup tables. Each source MV will have data till last month. Mining MVs will have data for current month.

  2. Create the mining package.

  3. Call pkg_ordm_mining.refresh_mining_source procedure to refresh all mining source and apply MVs.

  4. Call pkg_ordm_mining.refresh_model to build all the mining models.

Using the Mining Model Refresh Procedure

Over time, the customer information and the customer behavior may change. Therefore, you may want to refresh the trained mining models based on the latest customer and usage data. The mining model refresh process is generally divided into three tasks:

  1. Data Preparation: Load and transform the data into a format that the mining algorithms can understand. Also a customer must prepare two sets of data corresponding to next two tasks:

    • Training Data

    • Scoring data

  2. Training: Based on part of customer data, user can run certain algorithms and then a mining model is generated.

  3. Scoring (applying): The trained model can be applied onto other customer data. This applies the model to do the prediction or other missions the model is designed to perform.

For more information about the Oracle Mining training and Scoring (applying) process, see Oracle Data Mining Concepts.

To refresh all six mining models based on latest customer data, call the procedure named pkg_ordm_mining.refresh_model. This procedure performs the following tasks for each model:

  1. Refreshes the mining source materialized views based on latest data from ordm_sys.

  2. Trains each mode again using the new training data.

  3. Applies each model onto the new apply data set.

The errors that occur during mining model refresh are saved into the table named: DWC_INTRA_ETL_ACTIVITY as is other standard Oracle Retail Data Model Intra-ETL package errors and information.

Data Mining Target Tables

Table 10-2 lists the Oracle Retail Data Model data mining target tables.

Table 10-2 Oracle Retail Data Model Data Mining Target Tables

Table Name Associated Data Mining Model

DWD_CLASS_PRC_ELSTY_SVM_FACTOR

Model 7: Product Price Elasticity Analysis

DWD_CUST_CHRN_DT_RULES

Model 11: Customer Churn Analysis

DWD_CUST_CHRN_SVM_FACTOR

Model 11: Customer Churn Analysis

DWD_CUST_LTV_DT_RULES

Model 10: Customer Life Time Value Analysis

DWD_CUST_LTV_SVM_FACTOR

Model 10: Customer Life Time Value Analysis

DWD_CUST_LYLTY_RULES

Model 3: Customer Loyalty Analysis

DWD_CUST_LYLTY_SVM_FACTOR

Model 3: Customer Loyalty Analysis

DWD_CUST_MNNG

Model 12: Customer Sentiment Analysis

DWD_EMP_BSKT_RULES

Model 1: Employee Basket Analysis

DWD_EMP_BSKT_SVM_FACTOR

Model 1: Employee Basket Analysis

DWD_EMP_CMBNTN_DT_RULES

Model 8: Employee Combination Analysis

DWD_EMP_CMBNTN_MNNG

Model 8: Employee Combination Analysis

DWD_EMP_SLS_RULES

Model 2: Employee Sales Analysis

DWD_EMP_SLS_SVM_FACTOR

Model 2: Employee Sales Analysis

DWD_ITEM_POS_LOSS_RULES

Model 5: Item POS Loss Analysis

DWD_ITEM_POS_LOSS_SVM_FACTOR

Model 5: Item POS Loss Analysis

DWD_PROD_DEPTMIX_ASSOC_RULES

Model 6: Product Category Mix Analysis

DWD_SKU_PRC_ELSTY_SVM_FACTOR

Model 7: Product Price Elasticity Analysis

DWD_STORE_LOSS_RULES

Model 4: Store Loss Analysis

DWD_STORE_LOSS_SVM_FACTOR

Model 4: Store Loss Analysis

DWR_CUST_SGMNT

Model 9: Customer Segmentation Analysis

DWR_CUST_SGMNT_DTL

Model 9: Customer Segmentation Analysis


Oracle Retail Data Model Data Mining Models

Each description of the Oracle Retail Data Model data mining models provides the following information:

Model 1: Employee Basket Analysis

The business problem is to build a profile of employees to explain their basket KPIs, such as Total baskets, Average Basket Value, and other statistics and predict KPIs of employees.

The KPIs are converted into categorical variables using standard database binning operations for Decision Tree algorithm. The categorical variables are modeled as a classification model to identify or predict the impact of various independent variables (attributes) on the dependent target variable (KPI - categorical). Using Oracle Data Mining, the binned KPIs are modeled using classification algorithm Decision Tree (DT) and the actual KPIs are modeled using classification algorithm Support Vector Machines (SVM).

This analysis identifies which key attributes of an employee influence his or her number of baskets sold, average basket value, and basket size. This model mines the various attributes of employees. The model takes the binned variables one at a time for the Total Basket Count, Average Basket Value, and Average Basket Size as the target variable of a Decision Tree (DT) with a single feature and discovers rules described in terms of employee attributes. The SVM model takes one of three KPIs (Total Basket Count, Average Basket Value, and Average Basket Size) at a time as a target variable.

The output from the model is two fold:

  1. The discovered rules provide correlation between the basket KPIs and employee attributes.

  2. The prediction can be made on new employees data using the model built on past data.

Employee Basket Analysis Target Variables

The rules are designed to be generated monthly. Therefore, three SVM and nine DT models are created every month across all the employees using the following variables as targets:

Target variables for Decision Tree (DT) are:

  1. Total Basket Count Quartile (TBCQR)

  2. Total Basket Count Quintile (TBCQN)

  3. Total Basket Count Decile (TBCDE)

  4. Average Basket Value Quartile (ABVQR)

  5. Average Basket Value Quintile (ABVQN)

  6. Average Basket Value Decile (ABVDE)

  7. Average Basket Size Quartile (ABSQR)

  8. Average Basket Size Quintile (ABSQN)

  9. Average Basket Size Decile (ABSDE)

Note: Employees are grouped into N-Tiles according to their sales performance figures.

Target variables for Support Vector Machines (SVM) are:

  1. Total Basket Count (TBC)

  2. Average Basket Value (ABV)

  3. Average Basket Size (ABS)

Employee Basket Analysis Source Table

The attributes shown in Table 10-3 for employees are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-3 Employee Basket Analysis Source Attributes

Attribute Name

Case Id Alt (PK)

Month Code

Employee Id

Designation Name

Designation Title

Designation Level

Nationality

Gender

Marital Status

Age

Net Income

Demographics Code

Title

Total Months of Job

Employee Type

Correspondence Language

Disability Indicator

Rehire Recommendation Indicator

HR Based Salary Eligibility Indicator

Overtime Hours Salary Eligibility Indicator

Commission Eligibility Indicator

Spiff Allowed Flag

Total Hours Worked

Total Overtime Hours


Employee Basket Analysis Target Tables

The DT rules are stored in target table, dwd_emp_bskt_rules.

The SVM factor details of source attributes are stored in target table, dwd_emp_bskt_svm_factor.

Employee Basket Analysis Example of Desired DT Rules

Desired Rules Examples:

  1. IF SALARY ELIGIBILITY IS (N) AND EMPLOYEE_TYPE IS (TEMPORARY)

    THEN

    NUMBER OF BASKETS IS THE HIGHEST

  2. IF SALARY ELIGIBLITY IS (N) AND EMPLOYEE_TYPE IS (TEMPORARY)

    THEN

    NUMBER OF BASKETS IS THE LOWEST

Employee Basket Analysis Attribute Ranking with SVM Factors

Table 10-4 shows sample Employee Basket Analysis Attribute Ranking with SVM Factors.

Table 10-4 Employee Basket Analysis Attribute Ranking with SVM Factors

Attribute Name Attribute Value Coefficient Rank

DESIGNATION_NAME

TEMPORARY CASHIER

0.89

1

COMMISSION_ELIGIBILITY_IND

N

0.45

2

HR_BASED_SALARY_ELGBLTY_IND

Y

0.37

3

SPIFF_ALLOWED_FLAG

N

0.21

4

DESIGNATION_NAME

JUNIOR SUPERVISOR

0.19

5


Employee Basket Analysis Employee KPIs Prediction using DT

Table 10-5 shows sample Employee Basket Analysis Employee KPIs Prediction using DT.

Table 10-5 Employee Basket Analysis Employee KPIs Prediction using DT Sample 1

Employee ID Employee Name Target Variable DT Prediction DT Probability

10001

Chloe Waite

AVG_BASKET_SIZE_DECILE

7

0.65

10002

Delora Walker

AVG_BASKET_SIZE_DECILE

4

0.87

10003

Max Gerber

AVG_BASKET_SIZE_DECILE

8

0.94

10004

Glen Christian

AVG_BASKET_SIZE_DECILE

3

0.82

10005

Mason Murray

AVG_BASKET_SIZE_DECILE

10

0.96


Table 10-6 shows sample Employee Basket Analysis Employee KPIs Prediction using DT.

Table 10-6 Employee Basket Analysis Employee KPIs Prediction using DT Sample 2

Employee ID Employee Name Target Variable DT Prediction DT Probability

10001

Chloe Waite

TOTAL_BASKET_COUNT_QUINTILE

5

0.65

10002

Delora Walker

TOTAL_BASKET_COUNT_QUINTILE

3

0.87

10003

Max Gerber

TOTAL_BASKET_COUNT_QUINTILE

1

0.94

10004

Glen Christian

TOTAL_BASKET_COUNT_QUINTILE

3

0.82

10005

Mason Murray

TOTAL_BASKET_COUNT_QUINTILE

4

0.96


Employee Basket Analysis Employee KPIs Prediction using SVM

Table 10-7 shows sample Employee Basket Analysis Employee KPIs Prediction using SVM.

Table 10-7 Employee Basket Analysis Employee KPIs Prediction using SVM Sample 1

Employee ID Employee Name Target Variable SVM Prediction

10001

Chloe Waite

AVG_BASKET_SIZE

35

10002

Delora Walker

AVG_BASKET_SIZE

45

10003

Max Gerber

AVG_BASKET_SIZE

60

10004

Glen Christian

AVG_BASKET_SIZE

30

10005

Mason Murray

AVG_BASKET_SIZE

50


Table 10-8 shows sample Employee Basket Analysis Employee KPIs Prediction using SVM.

Table 10-8 Employee Basket Analysis Employee KPIs Prediction using SVM Sample 2

Employee ID Employee Name Target Variable SVM Prediction

10001

Chloe Waite

TOTAL_BASKET_COUNT

125

10002

Delora Walker

TOTAL_BASKET_COUNT

110

10003

Max Gerber

TOTAL_BASKET_COUNT

95

10004

Glen Christian

TOTAL_BASKET_COUNT

115

10005

Mason Murray

TOTAL_BASKET_COUNT

100


Model 2: Employee Sales Analysis

The business problem is to build a profile of employees to explain their sales, cost, and profit KPIs, such as Sales Amount, Cost Amount, Profit Amount, and other statistics and predict employee KPIs.

The KPIs are converted into categorical variables using standard database binning operations for Decision Tree algorithm. The categorical variables are modeled as a classification model to identify or predict the impact of various independent variables (attributes) on the dependent target variable (KPI - categorical). Using Oracle Data Mining, the binned KPIs are modeled using classification Algorithm Decision Tree (DT) and the actual KPIs are modeled using classification algorithm Support Vector Machines (SVM).

This analysis identifies which key attributes of an employee influence sales amount, cost amount, and profit amount. This model mines the various attributes of employees. The model takes the binned variables one at a time for the Sales Amount, Cost Amount, and Profit Amount as the target variable of a Decision Tree (DT) model with a single feature and discovers rules described in terms of employee attributes. The SVM model takes one of three KPIs (Sales Amount, Cost Amount, and Profit Amount) at a time as a target variable.

The output from the model is twofold:

  1. The discovered rules provide correlation between the Sales, Cost, and Profit KPIs and employee attributes.

  2. A prediction can be made on new employees' data using the model built on past data.

Employee Sales Analysis Target Variables

The rules are designed to be generated monthly. Therefore, three SVM and nine DT models are created every month across all the employees using the following variables as targets:

Target variables for Decision Tree (DT) are:

  1. Sales Amount Quartile (SAQR)

  2. Sales Amount Quintile (SAQN)

  3. Sales Amount Decile (SADE)

  4. Cost Amount Quartile (CAQR)

  5. Cost Amount Quintile (CAQN)

  6. Cost Amount Decile (CADE)

  7. Profit Amount Quartile (PAQR)

  8. Profit Amount Quintile (PAQN)

  9. Profit Amount Decile (PADE)

Target variables for Support Vector Machines (SVM) are:

  1. Sales Amount (SA)

  2. Cost Amount (CA)

  3. Profit Amount (PA)

Employee Sales Analysis Source Table

Table 10-9 shows the attributes for employees that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-9 Employee Sales Analysis Source Table

Attribute Name

Case Id Alt (PK)

Month Code

Employee Id

Designation Name

Designation Title

Designation Level

Nationality

Gender

Marital Status

Age

Net Income

Demographics Code

Title

Total Months of Job

Employee Type

Correspondence Language

Disability Indicator

Rehire Recommendation Indicator

HR Based Salary Eligibility Indicator

Overtime Hours Salary Eligibility Indicator

Commission Eligibility Indicator

Spiff Allowed Flag

Total Hours Worked

Total Overtime Hours


Employee Sales Analysis Target Tables

The DT rules are stored in target table, dwd_emp_sls_rules.

The SVM factor details of source attributes are stored in target table, dwd_emp_sls_svm_factor.

Employee Sales Analysis Example of Desired DT Rules

Desired Rules Examples:

  1. IF EMPLOYEE IS NOT ELIGIBLE FOR SPIFF AND EMPLOYEE IS ELIGIBLE FOR SALARY

    AND EMPLOYEE IS NOT ELIGIBLE FOR COMMISSION

    THEN

    EMPLOYEE PROFIT IS THE LOWEST

  2. IF EMPLOYEE IS ELIGIBLE FOR SPIFF AND EMPLOYEE IS NOT ELIGIBLE FOR SALARY

    AND EMPLOYEE IS NOT ELIGIBLE FOR COMMISSION

    THEN

    EMPLOYEE PROFIT IS THE HIGHEST

Employee Sales Analysis Attribute Ranking with SVM Factors

Table 10-10 shows sample Employee Sales Analysis Attribute Ranking with SVM Factors.

Table 10-10 Employee Sales Analysis Attribute Ranking with SVM Factors

Attribute Name Attribute Value Coefficient Rank

DESIGNATION_NAME

TEMPORARY CASHIER

0.89

1

COMMISSION_ELIGIBILITY_IND

N

0.45

2

HR_BASED_SALARY_ELGBLTY_IND

Y

0.37

3

SPIFF_ALLOWED_FLAG

N

0.21

4

DESIGNATION_NAME

JUNIOR SUPERVISOR

0.19

5


Employee Sales Analysis Employee KPIs Prediction using DT

Table 10-11 shows sample Employee Sales Analysis Employee KPIs Prediction using DT.

Table 10-11 Employee Sales Analysis Employee KPIs Prediction using DT Sample 1

Employee ID Employee Name Target Variable DT Prediction DT Probability

10001

Chloe Waite

SALES_AMOUNT_DECILE

7

0.65

10002

Delora Walker

SALES_AMOUNT_DECILE

4

0.87

10003

Max Gerber

SALES_AMOUNT_DECILE

8

0.94

10004

Glen Christian

SALES_AMOUNT_DECILE

3

0.82

10005

Mason Murray

SALES_AMOUNT_DECILE

10

0.96


Table 10-12 shows sample Employee Sales Analysis Employee KPIs Prediction using DT.

Table 10-12 Employee Sales Analysis Employee KPIs Prediction using DT Sample 2

Employee ID Employee Name Target Variable DT Prediction DT Probability

10001

Chloe Waite

COST_AMOUNT_QUINTILE

5

0.65

10002

Delora Walker

COST_AMOUNT_QUINTILE

3

0.87

10003

Max Gerber

COST_AMOUNT_QUINTILE

1

0.94

10004

Glen Christian

COST_AMOUNT_QUINTILE

3

0.82

10005

Mason Murray

COST_AMOUNT_QUINTILE

4

0.96


Employee Sales Analysis Employee KPIs Prediction using SVM

Table 10-13 shows sample Employee Sales Analysis Employee KPIs Prediction using SVM.

Table 10-13 Employee Sales Analysis Employee KPIs Prediction using SVM Sample 1

Employee ID Employee Name Target Variable SVM Prediction

10001

Chloe Waite

SALES_AMOUNT

3500

10002

Delora Walker

SALES_AMOUNT

4500

10003

Max Gerber

SALES_AMOUNT

6000

10004

Glen Christian

SALES_AMOUNT

3000

10005

Mason Murray

SALES_AMOUNT

5000


Table 10-14 shows sample Employee Sales Analysis Employee KPIs Prediction using SVM.

Table 10-14 Employee Sales Analysis Employee KPIs Prediction using SVM Sample 2

Employee ID Employee Name Target Variable SVM Prediction

10001

Chloe Waite

COST_AMOUNT

1250

10002

Delora Walker

COST_AMOUNT

1100

10003

Max Gerber

COST_AMOUNT

950

10004

Glen Christian

COST_AMOUNT

1150

10005

Mason Murray

COST_AMOUNT

1000


Model 3: Customer Loyalty Analysis

The business problem is to build a profile of customers to explain impact of customers' characteristics on their loyalty to a store. Using Oracle Data Mining, the KPIs are modeled using two popular Classification Algorithms - Decision Tree (DT) and Support Vector Machines (SVM). This analysis identifies which key attributes of a customer influence his loyalty to a store. This model mines the various attributes of customers.

The output from the model is twofold:

  1. The discovered rules provide correlation between the customer loyalty to a store and customer attributes.

  2. A prediction can be made on new customers' data using the model built on historical data.

Customer Loyalty Analysis Target Variable

The rules are designed to be generated monthly. Therefore, one SVM and one DT models are created every month across all customers using the following variables as targets:

Target variable for Decision Tree (DT) is:

Customer Loyalty Code

Target variable for Support Vector Machines (SVM) is:

Customer Loyalty Code

Customer Loyalty Analysis Source Table

Table 10-15 shows the attributes for customers that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-15 Customer Loyalty Analysis Source Table

Attribute Name Attribute Description

Case Id Alt (PK)

 

Month Code

 

Customer Number

 

Customer Type

 

Frequent Shopper Indicator

 

Mail Allowed Indicator

 

Customer District

 

Customer City

 

Customer State

 

Customer Country

 

Primary Occasion

 

Primary Preference

 

Primary Status Code

 

Primary Status Reason Code

 

Primary Effective Date

 

No. Of Survival months

 

Primary Effective Date

 

Living at Current Address Since

 

INDIVIDUAL CUSTOMER ATTRIBUTES

 

Registered as Gift Receiver

 

Registered as Gift Giver

 

Customer Identity Required Indicator

 

Customer Identity Type Name

 

Age

 

Marital Status

 

Gender

 

Income

 

Race

 

Education

 

Profession

 

Household Size

 

Dwelling Size

 

Years of Residence

 

Demography Group Name

 

Ethnic Background

 

Income Group

 

ORGANIZATION CUSTOMER ATTRIBUTES

 

Organization Type

 

Year of Establishment

 

Total Employee Strength

 

SIC Code

 

Industry Code

 

Pubic Indicator

 

OTHER ATTRIBUTES

 

Customer Occasion Type This Month

 

Campaign This Month

 

Membership Account Type Code

None if the customer does not have any account; the last used account if the customer has multiple accounts

Life-To-Date Points

 

Available Points

 

Customer Account Type

None if the customer does not have any account; the last used account if the customer has multiple accounts

Customer Group Code

None if the customer does not belong to any group

Number of Return items

 

Percentage of Return items

 

Total Revenue

 

Total Orders Placed

 

Total Orders Cancelled

 

Revenue this month

 

Orders this month

 

Channel Type Code

 

Customer Loyalty Analysis Target Tables

The DT rules are stored in target table, dwd_cust_lylty_rules.

The SVM factor details of source attributes are stored in target table, dwd_cust_lylty_svm_factor.

Customer Loyalty Analysis Example of Desired DT Rules

Desired Rules Examples:

  1. IF YEARS OF RESIDENCE IS (8 - 10) AND HOUSEHOLD_SIZE IS (3+)

    THEN

    CUSTOMER IS GROUP A

  2. IF YEARS OF RESIDENCE IS (1 - 3) AND HOUSEHOLD_SIZE IS LESS THAN 3

    THEN

    CUSTOMER IS GROUP E

Customer Loyalty Analysis Attribute Ranking with SVM Factors

Table 10-16 shows sample Customer Loyalty Analysis Attribute Ranking with SVM Factors.

Table 10-16 Customer Loyalty Analysis Attribute Ranking with SVM Factors

Attribute Name Attribute Value Coefficient Rank

HOUSEHOLD_SIZE

3

1.5

1

MARITAL_STATUS

SINGLE

0.9

2

MARITAL_STATUS

MARRIES

0.8

3

EDUCATION

12th

0.65

4

EDUCATION

PhD

0.34

5


Customer Loyalty Analysis Customer Loyalty Prediction (By SVM & DT)

Table 10-17 shows sample Customer Loyalty Analysis Customer Loyalty Prediction (By SVM & DT).

Table 10-17 Customer Loyalty Analysis Customer Loyalty Prediction (By SVM & DT)

Customer ID Customer Name SVM Prediction SVM Probability DT Prediction DT Probability

20001

Sunil Milenova

MOST LOYAL

0.65

MOST LOYAL

0.65

20002

Buzz Krishnan

PRETTY LOYAL

0.87

PRETTY LOYAL

0.87

20003

Helena Lamar

MARGINALLY LOYAL

0.94

MARGINALLY LOYAL

0.94

20004

Uraih Konur

PRETTY LOYAL

0.82

PRETTY LOYAL

0.82

20005

Bonnibelle Goode

MOST LOYAL

0.96

MOST LOYAL

0.96


Model 4: Store Loss Analysis

The business problem is to build a profile of a store for Shrinkage, Theft, and to predict store KPIs. The KPIs are converted into categorical variables using standard database binning operations for Decision Tree (DT) algorithm. The categorical variables are modeled as a classification model to identify or predict the impact of various independent variables (attributes) on the dependent target variable (KPI - categorical). Using Oracle Data Mining, the binned KPIs are modeled using the classification algorithm Decision Tree (DT) and actual KPIs are modeled using the classification algorithm Support Vector Machines (SVM).

This analysis identifies which key attributes of a store influence Shrinkage and Theft at that store. This model mines the various attributes of store. The model takes the binned variables one at a time for the Total Shrink Count, Total Shrink Amount, Shrink as a percentage, Total Theft Count, Total Theft Amount, Theft as a percentage of Sales as the target variable of a Decision Tree (DT) model with a single feature and discovers rules described in terms of store attributes. The SVM model takes one of six KPIs (Total Shrink Count, Total Shrink Amount, Shrink as a percentage of Sales, Total Theft Count, Total Theft Amount, Theft as a percentage of Sales) at a time as a target variable.

The output from the model is twofold:

  1. The discovered rules provide correlation between the Shrinkage, Theft KPIs and store attributes.

  2. A prediction can be made on new store data using the model built with past data.

Target Variables

The rules are designed to be generated monthly. Therefore the SVM and DT models are created every month across all the employees using the following variables as targets:

Target variables for Decision Tree (DT) are:

  1. Total Shrink Count Quartile (TSCQR)

  2. Total Shrink Count Quintile (TSCQN)

  3. Total Shrink Count Decile (TSCDE)

  4. Total Shrink Amount Quartile (TSAQR)

  5. Total Shrink Amount Quintile (TSAQN)

  6. Total Shrink Amount Decile (TSADE)

  7. Shrink as a percentage of Sales Quartile (STSQR)

  8. Shrink as a percentage of Sales Quintile (STSQN)

  9. Shrink as a percentage of Sales Decile (STSDE)

  10. Total Theft Count Quartile (TTCQR)

  11. Total Theft Count Quintile (TTCQN)

  12. Total Theft Count Decile (TTCDE)

  13. Total Theft Amount Quartile (TTAQR)

  14. Total Theft Amount Quintile (TTAQN)

  15. Total Theft Amount Decile (TTADE)

  16. Theft as a percentage of Sales Quartile (TTSQR)

  17. Theft as a percentage of Sales Quintile (TTSQN)

  18. Theft as a percentage of Sales Decile (TTSDE)

Target variables for Support Vector Machines (SVM) are.

  1. Total Shrink Count (TSC)

  2. Total Shrink Amount (TSA)

  3. Shrink as a percentage of Sales (STS)

  4. Total Theft Count Quartile (TTC)

  5. Total Theft Amount Quintile (TTA)

  6. Theft as a percentage of Sales (TTS)

Store Loss Analysis Source Table

Table 10-18 shows attributes for stores that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-18 Store Loss Analysis Source Table

Attribute Name

Case Id Alt (PK)

Month Code

Store ID

Store Name

Store Manager Name

Store Usage (Store, Store within a store, Department, Kiosk, and others)

Store Status (Under construction, New, and others)

Total Open Hours

Store Location Type (Free standing, Shopping Center, CBD, SBD, NBD, and others)

Primary Trade Area Code

Trade Area Coverage

Market Area Code

Market Area Type (Urban, Suburban, Rural, and others)

Market Area Population

Pull Factor

Total Commuter Population

Peak Season Population

Tourist Population

Average Drive Time

Number of Households

Average Household Size

Average Family Size

Per Capita Income

Average Number of Vehicles per Household

Shopping Center Type (Strip Center, Mall, and others)

Store Concept (Convenience, General Merchandise, Fashion oriented, and others)

Terrain (Mountain, Inland, Desert)

Total Built-up Area

Total Super Built-up Area

Number of Functional Months

Usable Area

Inventory Area

Selling Area

New Store Indicator

Store Price Index

Number of Levels of Floors

Number of Window Displays

Area of Window Displays

Fitting Rooms Available

Number of External Signs

Rest Rooms Available

Type Of Parking

Distance to Nearest Cross

Distance from Market Area Center

Store County or District

Store City

Store State or Province

State Population

State Sales

Store Country

Store World Region


Store Loss Analysis Target Tables

The DT rules are stored in target table, dwd_store_loss_rules.

The SVM factor details of source attributes are stored in target table, dwd_store_loss_svm_factor.

Store Loss Analysis Examples of Desired Rules

Desired Rules Examples:

  1. IF STORE IS NEW and NUMBER OF WINDOW DISPLAYS IS (4 - 5)

    and STORE DEPARTMENT IS (RETURN)

    THEN

    STORE THEFT AMOUNT IS THE HIGHEST

  2. IF STORE IS NEW and NUMBER OF WINDOW DISPLAYS IS (4 - 5)

    and STORE DEPARTMENT IS (GIFT)

    THEN

    STORE THEFT AMOUNT IS THE LOWEST

Store Loss Analysis Attribute Ranking with SVM Factor

Table 10-19 shows sample Store Loss Analysis Attribute Ranking with SVM Factor.

Table 10-19 Store Loss Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

STATE_POPULATION

MN

1.6

1

STATE_POPULATION

VA

1.2

2

STORE_MANAGER_NAME

FRANK KRATKY

0.95

3

STORE_USAGE

PICKUP COUNTER

0.9

4

NEW_STORE_INDICATOR

N

0.55

5


Store Loss Analysis Store KPIs Prediction using DT

Table 10-20 shows sample Store Loss Analysis Store KPIs Prediction using DT.

Table 10-20 Store Loss Analysis Store KPIs Prediction using DT Sample 1

Store ID Store Name Target Variable DT Prediction DT Probability

601

Chloe Waite

TOTAL_SHRINK_COUNT_DECILE

7

0.65

602

Delora Walker

TOTAL_SHRINK_COUNT_DECILE

4

0.87

603

Max Gerber

TOTAL_SHRINK_COUNT_DECILE

8

0.94

604

Glen Christian

TOTAL_SHRINK_COUNT_DECILE

3

0.82

605

Mason Murray

TOTAL_SHRINK_COUNT_DECILE

10

0.96


Table 10-21 shows sample Store Loss Analysis Store KPIs Prediction using DT.

Table 10-21 Store Loss Analysis Store KPIs Prediction using DT Sample 2

Store ID Store Name Target Variable DT Prediction DT Probability

601

STORE_601

TOTAL_THEFT_AMOUNT_QUINTILE

5

0.65

602

STORE_602

TOTAL_THEFT_AMOUNT_QUINTILE

3

0.87

603

STORE_603

TOTAL_THEFT_AMOUNT_QUINTILE

1

0.94

604

STORE_604

TOTAL_THEFT_AMOUNT_QUINTILE

3

0.82

605

STORE_605

TOTAL_THEFT_AMOUNT_QUINTILE

4

0.96


Store Loss Analysis Store KPIs Prediction Using SVM

Table 10-22 shows sample Store Loss Analysis Store KPIs Prediction Using SVM.

Table 10-22 Store Loss Analysis Store KPIs Prediction Using SVM

Store ID Store Name Target Variable SVM Prediction

601

STORE_601

TOTAL_SHRINK_COUNT

35

602

STORE_602

TOTAL_SHRINK_COUNT

45

603

STORE_603

TOTAL_SHRINK_COUNT

60

604

STORE_604

TOTAL_SHRINK_COUNT

30

605

STORE_605

TOTAL_SHRINK_COUNT

50


Table 10-23 shows sample Store Loss Analysis Store KPIs Prediction Using SVM.

Table 10-23 Store Loss Analysis Store KPIs Prediction Using SVM

Store ID Store Name Target Variable SVM Prediction

601

STORE_601

TOTAL_THEFT_AMOUNT

1250

602

STORE_602

TOTAL_THEFT_AMOUNT

1100

603

STORE_603

TOTAL_THEFT_AMOUNT

950

604

STORE_604

TOTAL_THEFT_AMOUNT

1150

605

STORE_605

TOTAL_THEFT_AMOUNT

1000


Model 5: Item POS Loss Analysis

The business problem is to build a profile of an item (product) regarding POS losses and predict item KPIs. The KPIs are converted into categorical variables using standard database binning operations for Decision Tree (DT) algorithm. The categorical variables are modeled as a classification model to identify or predict the impact of various independent variables (attributes) on the dependent target variable (KPI - categorical). Using Oracle Data Mining, the binned KPIs are modeled using the classification algorithm Decision Tree (DT) and the actual KPIs are modeled using the classification algorithm Support Vector Machines (SVM).

This analysis identifies which key attributes of an item influence its Shrink and Theft. This model mines the various attributes of items. It takes the binned variables one at a time for the Total Shrink Count, Total Shrink Amount, Shrink as a percentage of Sales, Total Theft Count, Total Theft Amount, Theft as a percentage of Sales as the target variable of a Decision Tree (DT) model with a single feature and discovers rules described in terms of employee attributes. The SVM model takes one of six KPIs (Total Shrink Count, Total Shrink Amount, Shrink as a percentage of Sales, Total Theft Count, Total Theft Amount, Theft as a percentage of Sales) at a time as a target variable.

The output from the model is twofold:

  1. The discovered rules provide correlation between POS loss and item attributes.

  2. The prediction can be made on new item data after the model is trained.

Item POS Loss Analysis Target Variables

The rules are designed to be generated monthly. Therefore, the SVM and DT models are created every month across all the stores using the following variables as targets:

Target variables for Decision Tree (DT) are:

  1. Total Shrink Count Quartile (TSCQR)

  2. Total Shrink Count Quintile (TSCQN)

  3. Total Shrink Count Decile (TSCDE)

  4. Total Shrink Amount Quartile (TSAQR)

  5. Total Shrink Amount Quintile (TSAQN)

  6. Total Shrink Amount Decile (TSADE)

  7. Shrink as a percentage of Sales Quartile (STSQR)

  8. Shrink as a percentage of Sales Quintile (STSQN)

  9. Shrink as a percentage of Sales Decile (STSDE)

  10. Total Theft Count Quartile (TTCQR)

  11. Total Theft Count Quintile (TTCQN)

  12. Total Theft Count Decile (TTCDE)

  13. Total Theft Amount Quartile (TTAQR)

  14. Total Theft Amount Quintile (TTAQN)

  15. Total Theft Amount Decile (TTADE)

  16. Theft as a percentage of Sales Quartile (TTSQR)

  17. Theft as a percentage of Sales Quintile (TTSQN)

  18. Theft as a percentage of Sales Decile (TTSDE)

Target variables for Support Vector Machines (SVM) are:

  1. Total Shrink Count (TSC)

  2. Total Shrink Amount (TSA)

  3. Shrink as a percentage of Sales (STS)

  4. Total Theft Count Quartile (TTC)

  5. Total Theft Amount Quintile (TTA)

  6. Theft as a percentage of Sales (TTS)

Item POS Loss Analysis Source Table

Table 10-24 shows attributes for POS and Item that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-24 Item POS Loss Analysis Source Table

Attribute Name

Case Id Alt (PK)

Store Id

Month Code

Item ID

Brand Name

Category Name

Department Name

Customer Pickup Type Code

Discount Indicator

Hazardous Material Type Code

Perishable Indicator

Kit Set Code

Order Collection Code

Price Audit Flag

Sale Weight or Unit Count Code

Security Required Type Code

Sell Unit Landed Cost Amount

Sell Unit Last Received Base Cost Amount

Sell Unit Last Received Net Cost Amount

Item Sale Unit Price Amount

Shrink Flag

Substitute Identified Indicator

Swell Flag

Item Usage Code

Vendor Item Number

Max Shipping Capability

Min Order Quantity

Sale Unit per Packet Unit Count

Shipping Capability Units

Store Order Allowed Flag

Store Receipt Allowed Flag

Style Description

Terms Code

Vendor Number

Vendor Class Code

Buy Status Indicator

Credit Limit Offered

Inform Government Indicator

Vendor Number of Years in Business

Pay Status Indicator

Competitor Retail Item Name

Competitor Name

Competitor Item Local Advertising Flag

Competitor Item On Promotion Flag

Competitor Item Promotion Store Coupon Indicator

Competitor Sale Unit Price Amount

Allow Coupon Multiply Indicator

Allow Food Stamp Indicator

Coupon Restricted Indicator

Electronic Coupon Flag

Employee Discount Allowed Flag

Frequent Shopper Points

Frequent Shopper Points Eligibility Indicator

Give Away Flag

Item Tender Restriction Group Code

Manufacturer

Manufacturer Family Code

Maximum Sale Unit Count

Price Entry Required Flag

Prohibit Repeat Key Flag

Prohibit Return Flag

Selling Status Code

Visual Verify Price Flag

Weight Entry Required Flag

Retail Transaction Measures

Total Number of Retail Transactions For Item

Total Amount of Retail Transactions For Item

Average Amount Per Retail Transaction For Item

Number of Distinct Currency Used For Item

Total Units Sold For Item

Average Units Sold Per Retail Transaction For Item

Total Idle Interval For Item (This is the sum of idle intervals of all transactions that contain this Item)

Average Idle Interval Per Retail Transaction For Item

Total Ring Interval For Item (This is the sum of ring intervals of all transactions that contain this item)

Average Ring Interval Per Retail Transaction For Item

Total Tender Interval For Item (This is the sum of tender intervals of all transactions that contain this item)

Average Tender Interval Per Retail Transaction For Item

Total Lock Interval For Item (This is the sum of lock intervals before or after all transactions that contain this item)

Average Lock Interval Per Retail Transaction For Item

Total Line Items Scanned For Item (This is the total number of times this item is scanned)

Average Line Items Scanned Per Units Sold For Item

Total Line Items Keyed For Item (This is the total number of times this item is keyed)

Average Line Items Keyed Per Units Sold

Total Key Department Count For Item (This is the total number of times this item is keyed by the department)

Average Key Department Count Per Units Sold

Total Service Charge For Item

Average Service Charge Per Retail Transaction For Item

Total Tax Amount For Item

Average Tax Amount Per Retail Transaction For Item

Total Number of Voided Transactions For Item

Average Number of Voided Transactions Per Retail Transaction For Item

Total Amount of Voided Transactions For Item

Average Amount of Voided Transactions Per Retail Transaction For Item

Average Amount of Voided Transaction as Percentage of Total Retail Transaction Amount For Item

Total Number of Discount Line Items For Item

Average Number of Discount Line Items Per Retail Transaction For Item

Total Amount of Discount Line Items For Item

Average Amount of Discount Line Items Per Retail Transaction For Item

Average Amount of Discount Line Items as Percentage of Total Retail Transaction Amount For Item

Total Number of Return Line Items For Item

Average Number of Return Line Items Per Retail Transaction For Item

Total Amount of Return Line Items For Item

Average Amount of Return Line Items Per Retail Transaction For Item

Average Amount of Return Line Items as Percentage of Total Retail Transaction Amount For Item

Total Number of Miscellaneous Fee Line Items For Item

Average Number of Miscellaneous Fee Line Items Per Retail Transaction For Item

Total Amount of Miscellaneous Fee Line Items For Item

Average Amount of Miscellaneous Fee Line Items Per Retail Transaction For Item

Average Amount of Miscellaneous Fee Line Items as Percentage of Total Retail Transaction Amount For Item

Total Number of Promotional Line Items For Item

Average Number of Promotional Line Items Per Retail Transaction For Item

Total Amount of Promotional Line Items For Item

Average Amount of Promotional Line Items Per Retail Transaction For Item

Average Amount of Promotional Line Items as Percentage of Total Retail Transaction Amount For Item

Total Number of Deposit Redemption Line Items For Item

Average Number of Deposit Redemption Line Items Per Retail Transaction For Item

Total Amount of Deposit Redemption Line Items For Item

Average Amount of Deposit Redemption Line Items Per Retail Transaction For Item

Average Amount of Deposit Redemption Line Items as Percentage of Total Retail Transaction Amount For Item

Control Transaction Measures

Total Tax Exempt Transaction Count For Item

Average Tax Exempt Transaction Count Per Retail Transaction For Item

Tax Exempt Total Amount For Item

Average Tax Exempt Amount Per Retail Transaction For Item

Tax Exempt Total Amount as a Percentage of Total Retail Transaction Amount For Item

Total Number of Store Coupons For Item

Average Number of Store Coupons Per Retail Transaction For Item

Average Number of Store Coupons Per Retail Transaction For Item

Total Amount of Store Coupons For Item

Average Amount of Store Coupons Per Retail Transaction For Item

Average Amount of Store Coupons as Percentage of Total Retail Transaction Amount For Item

Total Markdown Count For Item

Average Markdown Count per Retail Transaction For Item

Markdown Total Amount For Item

Average Markdown Amount Per Retail Transaction For Item

Average Markdown Amount as a Percentage of Total Retail Transaction Amount For Item

Total Employee Discount Count For Item

Average Employee Discount Per Retail Transaction For Item

Total Employee Discount Amount For Item

Average Employee Discount Amount Per Retail Transaction For Item

Average Employee Discount Amount as a Percentage of Retail Transaction

Amount For Item

Total Weighed Line Item Count For Item

Average Weighed Line Item Count Per Retail Transaction For Item

Total Weighed Line Item Amount For Item

Average Weighed Line Item Amount Per Retail Transaction For Item

Average Weighed Line Item Amount as a Percentage of Total Retail Transaction

Amount For Item

Total Layaway Payments Collected Count For Item

Average Layaway Payments Collected Count Per Retail Transaction For Item

Total Layaway Payments Collected Amount For Item

Average Layaway Payments Collected Amount Per Retail Transaction For Item

Average Layaway Payments Collected Amount as a Percentage of Total Retail

Transaction Amount For Item

Total Container Deposit Count For Item

Average Container Deposit Count Per Retail Transaction For Item

Total Container Deposit Amount For Item

Average Container Deposit Amount Per Retail Transaction For Item

Average Container Deposit Amount as a Percentage of Total Retail Transaction

Amount For Item

Total Redeemed Container Deposit Count For Item

Average Redeemed Container Deposit Count Per Retail Transaction For Item

Total Redeemed Container Deposit Amount For Item

Average Redeemed Container Deposit Amount Per Retail Transaction For Item

Average Redeemed Container Deposit Amount as a Percentage of Total Retail

Transaction Amount For Item

Total Cash Tender Count For Item

Average Cash Tender Count Per Retail Transaction For Item

Total Cash Tender Amount For Item

Average Cash Tender Amount Per Retail Transaction For Item

Average Cash Tender Amount as a Percentage of Total Retail Transaction Amount

For Item

Total Check Tender Count For Item

Average Check Tender Count Per Retail Transaction For Item

Total Check Tender Amount For Item

Average Check Tender Amount Per Retail Transaction For Item

Average Check Tender Amount as a Percentage of Total Retail Transaction

Amount For Item

Total Credit Card Tender Count For Item

Average Credit Card Tender Count Per Retail Transaction For Item

Total Credit Card Tender Amount For Item

Average Credit Card Tender Amount Per Retail Transaction For Item

Average Credit Card Tender Amount as a Percentage of Total Retail Transaction

Amount For Item

Total Debit Card Tender Count For Item

Average Debit Card Tender Count Per Retail Transaction For Item

Total Debit Card Tender Amount For Item

Average Debit Card Tender Amount Per Retail Transaction For Item

Average Debit Card Tender Amount as a Percentage of Total Retail Transaction

Amount For Item

Total Customer Account Tender Count For Item

Average Customer Account Tender Count Per Retail Transaction For Item

Total Customer Account Tender Amount For Item

Average Customer Account Tender Amount Per Retail Transaction

Average Customer Account Tender Amount as a Percentage of Total Retail

Transaction Amount

Total Gift Certificate Tender Count For Item

Average Gift Certificate Tender Count Per Retail Transaction For Item

Total Gift Certificate Tender Amount For Item

Average Gift Certificate Tender Amount Per Retail Transaction For Item

Amount For Item

Total Coupon Tender Count For Item

Average Coupon Tender Count Per Retail Transaction For Item

Total Coupon Tender Amount For Item

Average Coupon Tender Amount Per Retail Transaction For Item

Average Coupon Tender Amount as a Percentage of Total Retail Transaction

Amount For Item


Item POS Loss Analysis Target Tables

The DT rules are stored in target table, dwd_item_pos_loss_rules.

The SVM factor details of source attributes are stored in target table, dwd_item_pos_loss_svm_factor.

Item POS Loss Analysis Examples of Desired Rules

Desired Rules Examples:

  1. IF EMP_DISCOUNT_ALLOWED_IND IS 'Y' and SCANNED_PER_UNITS_SOLD IS (0.8-1)

    THEN

    THEFT_AMOUNT_TO_SALES_AMOUNT_QUARTILE IS THE HIGHEST

  2. IF SCANNED_PER_UNITS_SOLD IS (0.8-1) and DEPARTMENT_NAME IN ('BEVERAGE', 'CIGARETTES')

    THEN

    SHRINK_AMOUNT_TO_SALES_AMOUNT_QUARTILE IS THE HIGHEST

Item POS Loss Analysis Attribute Ranking with SVM Factor

Table 10-25 shows sample Item POS Loss Analysis Attribute Ranking with SVM Factor.

Table 10-25 Item POS Loss Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

ENVIRONMENT_TYPE_CODE

EXTRA LIGHTING

1.63

1

STORE_ORDER_ALLOWED_IND

N

1.26

2

HAZARDOUS_MATERIAL_TYPE_CODE

201

0.93

3

COUPON_RESTRICTED_IND

N

0.87

4

ALLOW_FOOD_STAMP_IND

Y

0.76

5


Item POS Loss Analysis Item KPIs Prediction using DT

Table 10-26 shows sample Item POS Loss Analysis Item KPIs Prediction using DT.

Table 10-26 Item POS Loss Analysis Item KPIs Prediction using DT Sample 1

Item ID Item Name Target Variable DT Prediction DT Probability

2000001

Item_2000001

TOTAL_SHRINK_COUNT_DECILE

7

0.65

2000002

Item_2000002

TOTAL_SHRINK_COUNT_DECILE

4

0.87

2000003

Item_2000003

TOTAL_SHRINK_COUNT_DECILE

8

0.94

2000004

Item_2000004

TOTAL_SHRINK_COUNT_DECILE

3

0.82

2000005

Item_2000005

TOTAL_SHRINK_COUNT_DECILE

10

0.96


Table 10-27 shows sample Item POS Loss Analysis Item KPIs Prediction using DT.

Table 10-27 Item POS Loss Analysis Item KPIs Prediction using DT Sample 2

Item ID Item Name Target Variable DT Prediction DT Probability

2000001

Item_2000001

TOTAL_THEFT_AMOUNT_QUINTILE

5

0.65

2000002

Item_2000002

TOTAL_THEFT_AMOUNT_QUINTILE

3

0.87

2000003

Item_2000003

TOTAL_THEFT_AMOUNT_QUINTILE

1

0.94

2000004

Item_2000004

TOTAL_THEFT_AMOUNT_QUINTILE

3

0.82

2000005

Item_2000005

TOTAL_THEFT_AMOUNT_QUINTILE

4

0.96


Item POS Loss Analysis Item KPIs Prediction using SVM

Table 10-28 shows sample Item POS Loss Analysis Item KPIs Prediction using SVM.

Table 10-28 Item POS Loss Analysis Item KPIs Prediction using SVM Sample 1

Item ID Item Name Target Variable SVM Prediction SVM Probability

2000001

Item_2000001

TOTAL_SHRINK_COUNT

35

0.65

2000002

Item_2000002

TOTAL_SHRINK_COUNT

45

0.87

2000003

Item_2000003

TOTAL_SHRINK_COUNT

60

0.94

2000004

Item_2000004

TOTAL_SHRINK_COUNT

30

0.82

2000005

Item_2000005

TOTAL_SHRINK_COUNT

50

0.96


Table 10-29 shows sample Item POS Loss Analysis Item KPIs Prediction using SVM.

Table 10-29 Item POS Loss Analysis Item KPIs Prediction using SVM Sample 2

Item ID Item Name Target Variable SVM Prediction SVM Probability

2000001

Item_2000001

TOTAL_THEFT_AMOUNT

1250

0.65

2000002

Item_2000002

TOTAL_THEFT_AMOUNT

1100

0.87

2000003

Item_2000003

TOTAL_THEFT_AMOUNT

950

0.94

2000004

Item_2000004

TOTAL_THEFT_AMOUNT

1150

0.82

2000005

Item_2000005

TOTAL_THEFT_AMOUNT

1000

0.96


Model 6: Product Category Mix Analysis

This model addresses the business problem of discovering product categories that are frequently bought together by customers. The model is used to understand the Categories purchased by a Customer in a typical transaction in terms of the components like the Categories in the Basket, Target Category in a Basket and additional information like Basket Significance (Sales Value), Target Category Significance (Sales Value) which are generated from regular Customer Transactional data.

Using Oracle Data Mining, the KPIs are modeled with the APRIORI algorithm utilized by the Association Rules model. The model type used for Association Rules with Apriori Algorithm is APASS. This is an example of Unclassified Learning since the Categories (or Target Category) which make up the Category Basket are not inferred or guided (as part of data preparation) but are generated by the model itself.

The output from the model is, the purchase patterns are designed to be generated monthly. Therefore, the APASS models are created every month.

Product Category Mix Analysis Source Table

Table 10-30 shows the item attributes that are identified as source variables for this Association Rules model.

Table 10-30 Product Category Mix Analysis Source Table

Attribute Name

Case Id Alt (PK)

Store ID

Month Code

ID

Name

Value


Note: If mining has to performed at multiple levels, such as category, subcategory, item, there may be multiple source tables.

Product Category Mix Analysis Target Table

The mined patterns/rules are stored in target table, dwd_prod_deptmix_assoc_rules.

Product Category Mix Analysis Examples of Desired Rules

Desired Rules Example1

IF CUSTOMER HAS BOUGHT 'BABY', 'GRAB AND GO'

THEN

CUSTOMER IS LIKELY TO BUY ' PACKAGED BEVERAGES' ( Support: 36%, Probability: 56%)

Category Basket Significance of ('BABY', 'GRAB AND GO', 'PACKAGED BEVERAGES') is 45% of Sales Value => The Sales from the 3 categories in Category Basket ('BABY', 'GRAB AND GO', 'PACKAGED BEVERAGES') account for 45% of the Total Sales across all categories in that particular store.

The Category Basket Significance (Sales Value) KPI allows us to filter out Rules which may be insignificant from a Basket Sales Value perspective.

Target Category Significance of ('PACKAGED BEVERAGES') is 60% of the Basket Sales Value => The Sales from the Target Category ('PACKAGED BEVERAGES') account for 60% of the Total Sales from the Category Basket ('BABY', 'GRAB AND GO', 'PACKAGED BEVERAGES') in that particular store.

The Target Category Significance (Sales Value) KPI allows us to filter out Rules determining insignificant Customer Purchases (insignificant Target Category). In other words, it helps us to extract Rules which relate to significant Customer Purchases, where the Target Category is significant within the Basket of Categories (from a Sales Value perspective). This could be useful from a campaign/promotion/upsell perspective.

Desired Rules Example2

IF

CUSTOMER HAS BOUGHT 'FLORAL', 'PHARMACY', 'HOT FOODS'

THEN

CUSTOMER IS LIKELY TO BUY 'BABY' (Support: 36%, Probability: 62%)

Model 7: Product Price Elasticity Analysis

This model addresses the business problem of predicting price elasticity of a product or a group of products. Here, product is SKU item and a group of products can be either sub-class, class, and so on. Predicting price elasticity of a product can help calculate sales of a product across different stores.

A model is built for each product using promotion sales data till date across all stores as training data. The built model of each product is used to predict what would be the price elasticity of each product. Sales of a product in coming time period are calculated using price elasticity of the product, which will eventually help retailer to take a decision on promotion of the product.

Using Oracle Data Mining, the target variable (KPI) is modeled using Regression algorithm - Support Vector Machines (SVM).

The output from the model is, the price elasticity of each product to be predicted every month. Therefore, the SVM Regression models are created one time each month.

Product Price Elasticity Analysis Source Table

Table 10-31 shows the sample Product Price Elasticity Analysis Source Table.

Table 10-31 Product Price Elasticity Analysis Source Table

Attribute Name Attribute Description

case_id (PK)

 

mo_cd

 

bsns_unit_key

Store Key

bsns_unit_typ_cd

Store Type Code

sku_item_key

 

sku_item_nbr

 

sku_item_name

 

sku_item_desc

 

item_class_key

 

class_cd

 

class_name

 

wk_key

 

wk_cd

 

rtl_typ_cd

 

chnl_typ_cd

 

phs_key

 

prmtn_key

 

cmpgn_media_key

 

media_slng_item_key

 

pos_dept_key

 

sls_unit_cnt_last_wk

 

sls_trx_cnt_last_wk

 

sls_amt_last_wk

 

sls_unit_price_last_wk

 

sls_unit_cnt_curr_wk

 

sls_trx_cnt_curr_wk

 

sls_amt_curr_wk

 

sls_unit_price_curr_wk

 

prcntg_chg_in_sls_amt

 

price_mrkdwn_rate

 

sku_item_price_elstcty_dmnd (Target variable)

Price elasticity of demand


Product Price Elasticity Analysis Target Table

The SVM factor details of source attributes are stored in target tables, dwd_sku_prc_elsty_svm_factor and dwd_class_prc_elsty_svm_factor.

Product Price Elasticity Analysis Attribute Ranking with SVM Factor

Table 10-32 shows sample Product Price Elasticity Analysis Attribute Ranking with SVM Factor.

Table 10-32 Product Price Elasticity Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

SLS_UNIT_CNT_CURR_WK

null

0.89

1

BSNS_UNIT_TYP_CD

WareHouse

0.45

2

PRICE_MRKDWN_RATE

null

0.37

3

MO_CD

20040202

0.21

4

BSNS_UNIT_TYP_CD

RtlStore

0.19

5


Model 8: Employee Combination Analysis

The business problem is to identify combination of employees, which is likely to perform better at a store on a shift. The training data would include following attributes of combination, employee average, minimum, and maximum age, employee average, minimum, and maximum distance from store, employee average, minimum, and maximum total baskets, and so on. Those combination attributes are extracted from employee attributes. The performance of each combination, which is target attribute of classification model, is measured by total store sales amount for the shift combination of employees work.

The performance metric, total store sales amount, is converted into categorical variable using standard binning operations. The categorical variable is modeled as a classification model to identify the impact of combination attributes on the target variable. Using Oracle Data Mining, the categorical target attribute is modeled using one or both classification algorithms - Decision Tree (DT) and Support Vector Machines (SVM).

The retailer has to generate different combinations of employees, based on the availability of employees for a shift and also need to extract same attributes (used in training data) for each combination. The trained model is used to predict the categorical performance attribute value for all the combinations of employees generated. The combination which is predicted to perform better can be assigned to targeted store on a shift.

This classification analysis identifies which attributes of a combination influence the overall performance of a store on a shift. The output of this analysis is twofold:

  1. The rules discovered by DT model provide correlation between store performance and employees combination attributes.

  2. For future employee allocation, the trained model is applied on different combinations to predict what would be the performance of store.

Employee Combination Analysis Target Variables

The rules are designed to be generated monthly. Therefore, one SVM and one DT models are created every month across all the employees at each store using the following variable as target:

Target variable for Decision Tree (DT) and Support Vector Machines (SVM) is: PERFORMANCE INDICATOR

The values for target variable could be: VERY_HIGH, HIGH, MEDIUM, LOW, VERY_LOW

Employee Combination Analysis Source Table

Table 10-33 shows the attributes that are extracted for each combination from employees attributes. (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-33 Employee Combination Analysis Source Table

Attribute Name Attribute Description

Case id

need unique identified for mining analysis

Store key

 

Store name

 

Shift mode

 

Month mode

 

Derived Attributes

 

Emp count

 

Emp avg age

 

Emp min age

 

Emp max age

 

Avg household size

 

Min household size

 

Max household size

 

No. of emp with 10th education

 

No. of emp with 12th education

 

No. of emp with UG education

 

Emp house avg distance from store

 

Emp house min distance from store

 

Emp house max distance from store

 

Married count

 

Single count

 

Divorced count

 

Male emp count

 

Female emp count

 

Emp avg income

 

Emp min income

 

Emp max income

 

No. of emp with disability

 

No. of emp eligible for HR based salary

 

No. of emp eligible for overtime hours salary

 

No. of emp eligible for commission

 

No. of emp eligible for SPIFF

 

Emp avg total hours worked

 

Emp min total hours worked

 

Emp max total hours worked

 

Emp avg total overtime hours worked

 

Emp min total overtime hours worked

 

Emp max total overtime hours worked

 

Emp avg total sales amount

 

Emp min total sales amount

 

Emp max total sales amount

 

Emp avg basket size

 

Emp min basket size

 

Emp max basket size

 

Emp avg basket value

 

Emp min basket value

 

Emp max basket value

 

Employee Combination Analysis Target Tables

The DT rules are stored in target table, dwd_emp_cmbntn_dt_rules.

The SVM factor details of source attributes are stored in target table, dwd_emp_cmbntn_mnng.

Employee Combination Analysis Examples of Desired DT Rules

Desired Rules Example 1

IF emp avg age is 25 AND min household size is 2

THEN

PERFORMANCE INDICATOR IS HIGH

Desired Rules Example 2

IF number of emp eligible for HR based salaryis 10 AND married count is 8

THEN

PERFORMANCE INDICATOR IS LOW

Employee Combination Analysis Attribute Ranking with SVM Factor

Table 10-34 shows sample Employee Combination Analysis Attribute Ranking with SVM Factor.

Table 10-34 Employee Combination Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

Emp min age

21

1

1

Male emp count

10

0.87

2

No. of emp eligible for HR based salary

3

0.39

3

Emp avg total sales amount

$25000

0.39

4

Emp avg income

$3200

0.22

5

No. of emp with 12th education

15

0.17

6


Employee Combination Analysis Employee Combination Performance Prediction (By DT & SVM)

Table 10-35 shows sample Employee Combination Analysis Employee Combination Performance Prediction (By DT & SVM).

Table 10-35 Employee Combination Analysis Employee Combination Performance Prediction (By DT & SVM)

Combination ID SVM Prediction SVM Probability DT Prediction DT Probability

10001

HIGH

0.65

HIGH

0.56

10002

VERY HIGH

0.87

VERY HIGH

0.78

10003

LOW

0.94

LOW

0.94

10004

MEDIUM

0.82

MEDIUM

0.82

10005

HIGH

0.96

HIGH

0.69


Model 9: Customer Segmentation Analysis

The business problem is to group customers into generally homogeneous groups based on customer demographics, usage pattern and products they purchased (customer purchase history). Business Analysts can look into each segments to further understand the customer group discovered by the model and name each segment.

The customers are clustered using Clustering algorithms - O-cluster and K-Means. The discovered clustering rules draw the profile of customers.

Customer Segmentation Analysis Source Table

Table 10-36 shows the attributes for customers that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-36 Customer Segmentation Analysis Source Table

Attribute Name Attribute Description

Customer Key (PK)

 

Month Code

 

Customer Number

 

Customer Type

 

Frequent Shopper Indicator

 

Mail Allowed Indicator

 

Customer District

 

Customer City

 

Customer State

 

Customer Country

 

Primary Occasion

 

Primary Preference

 

Primary Status Code

 

Primary Status Reason Code

 

Primary Effective From Date

 

No. Of Survival months

 

Primary Effective To Date

 

Living at Current Address Since

 

INDIVIDUAL CUSTOMER ATTRIBUTES

 

Registered as Gift Receiver

 

Registered as Gift Giver

 

Customer Identity Required Indicator

 

Customer Identity Type Name

 

Age

 

Marital Status

 

Gender

 

Income

 

Race

 

Education

 

Occupation

 

Household Size

 

Dwelling Size

 

Years of Residence

 

Demography Group Name

 

Ethnic Background

 

Income Group

 

ORGANIZATION CUSTOMER ATTRIBUTES

 

Organization Type

 

Year of Establishment

 

Total Employee Strength

 

SIC Code

 

Industry Code

 

Pubic Indicator

 

OTHER ATTRIBUTES

 

Customer Occasion Type This Month

 

Customer Preference Value This Month

 

Campaign This Month

 

Membership Account Type Code

None if the customer does not have any account; the last used account if the customer has multiple accounts

Life-To-Date Points

 

Available Points

 

Customer Account Type

None if the customer does not have any account; the last used account if the customer has multiple accounts

Customer Group Code

None if the customer does not belong to any group

Number of Return items

 

Percentage of Return items

 

Total Revenue

 

Total Orders Placed

 

Total Orders Cancelled

 

Revenue this month

 

Orders this month

 

Channel Type Code

 

Customer Segmentation Analysis Target Table

The clustering rules are stored in target table, dwr_cust_sgmnt.

Customer Segmentation Analysis Examples of desired Segment profile

Desired Rules Example 1

IF YEARS OF RESIDENCE IS (8 - 10) and HOUSEHOLD_SIZEIS (3+) and MARITAL_STATUS IS SINGLE

and EDUCATION 12th and INCOME LEVEL IS BETWEEN 1000 AND 2000

THEN

SEGMENT_ID IS 3

Desired Rules Example 2

IF YEARS OF RESIDENCE IS (1 - 3) and HOUSEHOLD_SIZEIS (3+) and MARITAL_STATUS IS SINGLE

and EDUCATION PHD and INCOME LEVEL IS (5000 - 10000)

THEN

SEGMENT_ID IS 8

Model 10: Customer Life Time Value Analysis

The business problem is to identify/predict the customers who are likely to represent the highest value of revenue over their life time based on criteria such as customer demographic information, purchase behavior, and service quality, and so on.

This analysis identifies which key attributes of a customer influence his or her Life TimeValue. Life Time Value is continuous value (total revenue contributed by the customer). The Life Time Value is converted into categorical values using standard binning operations. The categorical variables are modeled as a classification model to identify or predict the impact of various independent variables (attributes) on the dependent target variable (KPI - categorical). Using Oracle Data Mining, the target variable, Categorical Life Time Value, is modeled using classification algorithm, Decision Tree (DT).

The continuous Life Time Value is modeled as a regression model using regression algorithm, Support Vector Machines (SVM).

The mining models are built every month using the customer latest data and the mining models are applied on customers data to predict which customer is likely to represent the highest value of revenue over their life time.

The output from the model is two-fold:

  1. The discovered rules to outline the profile of customers who are most likely to represent the highest value of revenue over their life time.

  2. A prediction can be made on customer data after the model is trained.

Customer Life Time Value Analysis Target Variables

The rules are designed to be generated monthly. Therefore, two SVM and two DT models are created every month across all the customers using the following variables as targets:

Target variables for Decision Tree (DT) is:

  1. Life Time Value Code

  2. Life Time Survival Value Code

Target variables for Support Vector Machines (SVM) are:

  1. Life Time Value

  2. Life Time Survival Value

Customer Life Time Value Analysis Source Table

Table 10-37 shows the attributes for customers that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-37 Customer Life Time Value Analysis Source Table

Attribute Name Attribute Description

Case Id Alt (PK)

 

Month Code

 

Customer Number

 

Customer Type

 

Frequent Shopper Indicator

 

Mail Allowed Indicator

 

Customer District

 

Customer City

 

Customer State

 

Customer Country

 

Primary Occasion

 

Primary Preference

 

Primary Status Code

 

Primary Status Reason Code

 

Primary Effective Date

 

No. Of Survival months

 

Primary Effective Date

 

Living at Current Address Since

 

INDIVIDUAL CUSTOMER ATTRIBUTES

 

Registered as Gift Receiver

 

Registered as Gift Giver

 

Customer Identity Required Indicator

 

Customer Identity Type Name

 

Age

 

Marital Status

 

Gender

 

Income

 

Race

 

Education

 

Profession

 

Household Size

 

Dwelling Size

 

Years of Residence

 

Demography Group Name

 

Ethnic Background

 

Income Group

 

ORGANIZATION CUSTOMER ATTRIBUTES

 

Organization Type

 

Year of Establishment

 

Total Employee Strength

 

SIC Code

 

Industry Code

 

Pubic Indicator

 

OTHER ATTRIBUTES

 

Customer Occasion Type This Month

 

Campaign This Month

 

Membership Account Type Code

None if the customer does not have any account; the last used account if the customer has multiple accounts

Life-To-Date Points

 

Available Points

 

Customer Account Type

None if the customer does not have any account; the last used account if the customer has multiple accounts

Customer Group Code

None if the customer does not belong to any group

Number of Return items

 

Percentage of Return items

 

Total Revenue

 

Total Orders Placed

 

Total Orders Cancelled

 

Revenue this month

 

Orders this month

 

Channel Type Code

 

Customer Life Time Value Analysis Target Tables

The DT rules are stored in target table, dwd_cust_ltv_dt_rules.

The SVM factor details of source attributes are stored in target table, dwd_cust_ltv_svm_factor.

Customer Life Time Value Analysis Examples of Desired DT Rules

Desired Rules Example 1

IF YEARS OF RESIDENCE IS (8 - 10) ANDHOUSEHOLD_SIZE IS (3+)

THEN

CUST_LTV IS HIGH

Desired Rules Example 2

IF YEARS OF RESIDENCE IS (1 - 3) AND HOUSEHOLD_SIZE IS (LESS THAN 3)

THEN

CUST_LTV IS VERY_LOW

Customer Life Time Value Analysis Attribute Ranking with SVM Factor

Table 10-38 shows sample Customer Life Time Value Analysis Attribute Ranking with SVM Factor.

Table 10-38 Customer Life Time Value Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

YEARS_OF_RESIDENCE

10

1.51

1

HOUSEHOLD_SIZE

3

1.32

2

MARITAL_STATUS

SINGLE

1.2

3

MARITAL_STATUS

MARRIED

0.98

4

EDUCATION

12th

0.84

5

EDUCATION

PhD

0.78

6

EDUCATION

Masters

0.66

7

MARITAL_STATUS

DIVORCED

0.25

8

INCOME_LEVEL

1500

0.24

9

INCOME_LEVEL

10000

0.22

10

RACE

White

0.16

11

CUST_OCCASIONTYPE_THIS_MONTH

ANNIVERSARY

0.15

12


Customer LTV Code Prediction (Using DT) - Classification

Table 10-39 shows sample Customer LTV Code Prediction (Using DT) - Classification.

Table 10-39 Customer LTV Code Prediction (Using DT) - Classification

Customer ID Customer Name DT Prediction DT Probability

20001

Sunil Milenova

LOW

0.56

20002

Buzz Krishnan

MEDIUM

0.78

20003

Helena Lamar

VERY LOW

0.94

20004

Uraih Konur

HIGH

0.82


Customer LTV Prediction (Using SVM)- Regression

Table 10-40 shows sample Customer LTV Prediction (Using SVM)- Regression.

Table 10-40 Customer LTV Prediction (Using SVM)- Regression

Customer ID Customer Name LTV Predicted Value - SVM

20001

Sunil Milenova

1400

20002

Buzz Krishnan

3500

20003

Helena Lamar

950

20004

Uraih Konur

7500


Customer LT Survival Value Code Prediction (Using) - Classification

Table 10-41 shows sample Customer LT Survival Value Code Prediction (Using) - Classification.

Table 10-41 Customer LT Survival Value Code Prediction (Using) - Classification

Customer ID Customer Name DT Prediction DT Probability

20001

Sunil Milenova

LOW

0.56

20002

Buzz Krishnan

MEDIUM

0.78

20003

Helena Lamar

VERY LOW

0.94

20004

Uraih Konur

HIGH

0.82


Customer LT Survival Value Prediction (Using SVM)- Regression

Table 10-42 shows sample Customer LT Survival Value Prediction (Using SVM) Regression.

Table 10-42 Customer LT Survival Value Prediction (Using SVM)- Regression

Customer ID Customer Name LT Survival Value Predicted - SVM (in Months)

20001

Sunil Milenova

23

20002

Buzz Krishnan

37

20003

Helena Lamar

11

20004

Uraih Konur

54


Model 11: Customer Churn Analysis

The business problem is to identify/predict the customers who are likely to leave the current retailer for a competitor based on criteria such as customer demographic information, purchase behavior, and service quality, and so on.

This analysis identifies which key attributes of a customer influence his or her churn indicator value. Using Oracle Data Mining, the target variable, churn indicator, is modeled using classification algorithms - Decision Tree (DT) and Support Vector Machines (SVM).

The mining models (DT & SVM) are built every month using the historical customer data and the mining models are applied on current base customers data to predict which customer is likely to leave (churn) the current retailer.

The output from the model is two fold:

  1. The discovered rules to outline the profile of customers who are most likely to churn. What type of customer is more likely to churn.

  2. The prediction can be made on customer data after the Churn model is trained.

Customer Churn Analysis Source Table

Table 10-43 shows the attributes for customers that are identified from the Oracle Retail Data Model data warehouse tables as source variables for the models (a few of these variables are unique identifiers and are treated as supplementary variables).

Table 10-43 Customer Churn Analysis Source Table

Attribute Name Attribute Description

Case Id Alt (PK)

 

Month Code

 

Customer Number

 

Customer Type

 

Frequent Shopper Indicator

 

Mail Allowed Indicator

 

Customer District

 

Customer City

 

Customer State

 

Customer Country

 

Primary Occasion

 

Primary Preference

 

Primary Status Code

 

Primary Status Reason Code

 

Primary Effective Date

 

No. Of Survival months

 

Primary Effective Date

 

Living at Current Address Since

 

INDIVIDUAL CUSTOMER ATTRIBUTES

 

Registered as Gift Receiver

 

Registered as Gift Giver

 

Customer Identity Required Indicator

 

Customer Identity Type Name

 

Age

 

Marital Status

 

Gender

 

Income

 

Race

 

Education

 

Profession

 

Household Size

 

Dwelling Size

 

Years of Residence

 

Demography Group Name

 

Ethnic Background

 

Income Group

 

ORGANIZATION CUSTOMER ATTRIBUTES

 

Organization Type

 

Year of Establishment

 

Total Employee Strength

 

SIC Code

 

Industry Code

 

Pubic Indicator

 

OTHER ATTRIBUTES

 

Customer Occasion Type This Month

 

Campaign This Month

 

Membership Account Type Code

None if the customer does not have any account; the last used account if the customer has multiple accounts

Life-To-Date Points

 

Available Points

 

Customer Account Type

None if the customer does not have any account; the last used account if the customer has multiple accounts

Customer Group Code

None if the customer does not belong to any group

Number of Return items

 

Percentage of Return items

 

Total Revenue

 

Total Orders Placed

 

Total Orders Cancelled

 

Revenue this month

 

Orders this month

 

Channel Type Code

 

Customer Churn Analysis Target Tables

The DT rules are stored in target table, dwd_cust_chrn_dt_rules.

The SVM factor details of source attributes are stored in target table, dwd_cust_chrn_svm_factor.

Customer Churn Analysis Examples of Desired DT Rules

Desired Rules Example 1

IF YEARS OF RESIDENCEIS (8 - 10) AND HOUSEHOLD_SIZE IS (3+)

THEN

CHURN_IND IS 1

Desired Rules Example 2

IF YEARS OF RESIDENCE IS (1 - 3) AND HOUSEHOLD_SIZE IS (LESS THAN 3)

THEN

CHURN_IND IS 0

Customer Churn Analysis Attribute Ranking with SVM Factor

Table 10-44 shows sample Customer Churn Analysis Attribute Ranking with SVM Factor.

Table 10-44 Customer Churn Analysis Attribute Ranking with SVM Factor

Attribute Name Attribute Value Coefficient Rank

YEARS_OF_RESIDENCE

10

1.51

1

HOUSEHOLD_SIZE

3

1.32

2

MARITAL_STATUS

SINGLE

1.2

3

MARITAL_STATUS

MARRIED

0.98

4

EDUCATION

12th

0.84

5

EDUCATION

PhD

0.78

6

EDUCATION

Masters

0.66

7

MARITAL_STATUS

DIVORCED

0.25

8

INCOME_LEVEL

1500

0.24

9

INCOME_LEVEL

10000

0.22

10

RACE

White

0.16

11

CUST_OCCASIONTYPE_THIS_MONTH

ANNIVERSARY

0.15

12


Customer Churn Prediction (By DT & SVM)

Table 10-45 shows sample Customer Churn Prediction (By DT & SVM).

Table 10-45 Customer Churn Prediction (By DT & SVM)

Customer ID Customer Name SVM Prediction SVM Probability DT Prediction DT Probability

20001

Sunil Milenova

1

0.65

0

0.56

20002

Buzz Krishnan

0

0.87

0

0.78

20003

Helena Lamar

0

0.94

0

0.94

20004

Uraih Konur

0

0.82

0

0.82

20005

Bonnibelle Goode

1

0.96

1

0.69


Model 12: Customer Sentiment Analysis

The business problem is to measure customer sentiment regarding the products and service quality according to any text message received from the customer. Those text messages may be emails from a customer, or written by call center agents during call center calls, and so on. Some companies use voice recognition technology and others have call center service agents write what the customer said.

This model leverages Text mining capability provided by Oracle database. The text messages are transformed using Oracle text. A classification model is built over the training data using the classification algorithm, Support Vector Machines (SVM). The built model is applied on new customers text messages and also on existing customers new text messages to predict the sentiment.

Customer Sentiment Analysis Target Variable

The sentiment mining model is trained once a month. Therefore one SVM model is created every month across all the customers with the following variable as target:

  1. Customer Sentiment

Customer Sentiment Analysis Source Table

Table 10-46 shows the attributes for the source table.

Table 10-46 Customer Sentiment Analysis Source Table

Attribute Name Attribute Description

Case Id

Primary Key

Customer Key

 

Month Code

 

Sentiment

Target Variable

Positive Sentiment Probability

 

Negative Sentiment Probability

 

Customer Comment

Text message from customer


Customer Sentiment Analysis Target Table

The following columns in the customer mining target table, DWD_CUST_MNNG, are populated with prediction results.

  • sntmnt_ctgry_cd

  • manual_sntmnt_ctgry_cd

  • sntmnt_prob