Skip Headers
Oracle® Communications Data Model Reference
11g Release 2 (11.2)

Part Number E15886-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

10 Oracle Communications Data Model Data Mining Models

This chapter provides reference information about the data mining models provided with Oracle Communications Data Model.

This chapter includes the following sections:

About Data Mining in Oracle Communications Data Model

Oracle Communications Data Model mining models include data mining packages, source tables (MV) and target tables. The source tables are extracted from Oracle Communications Data Model main schema and are used to train the models. The target tables contain the mining result data, for example, mined rules. Data mining packages pull in the source data, feed it into the data mining packages, and populate the target tables with the results. The data in the target tables can be presented in reports.

Note:

Oracle Communications Data Model does not support modified or new data models. Consequently, do not change the data models that are defined and delivered with Oracle Communications Data Model, but, instead, to create a data model copy a delivered data model.

As shown in Table 10-1, the Oracle Communications Data Model mining models use the specified algorithms for the specific problem.

Table 10-1 Oracle Communications Data Model Algorithm Types Used by Model

Model Algorithms Used by Data Mining Model

Model 1: Churn Prediction

Decision Tree (DT), Support Vector Machine (SVM)

Model 2: Customer Profiling

k-Means (KM)

Model 3: Customer Churn Factor

Support Vector Machine (SVM)

Model 4: Cross-Sell Opportunity

Support Vector Machine (SVM)

Model 5: Customer Sentiment Detection

Support Vector Machine (SVM)

Model 6: Life Time Value (LTV) Prediction

Generalized Linear Models (GLM)


Understanding the Mining Model Schema and Architecture

The Oracle Communications Data Model mining consists of two schemas: ocdm_mining and ocdm_sys. Figure 10-1 shows how these schemas function in Oracle Communications Data Model mining.

The ocdm_mining schema includes the following:

  • Mining Model Package (pkg_ocdm_mining): Given source data in the views, the mining package generates Mined Rules, Predicted Results, and additional information.

  • Mining Model Source Views: Materialized views transform the data from ocdm_sys schema and present them to Oracle Mining algorithms as multiple materialized views. All tables are implemented as MVs, not physical tables.

  • Mining Model Support Tables: The mining model support tables are primarily intermediate tables used during the mining model creation or testing process. Most of the mining model support tables have names that start with "DM$".

Note:

Do not delete the mining model support tables; the DM$ tables can be very difficult to reconstruct if they are deleted.

The ocdm_sys schema includes the following:

  • OCDM Main Model, which are all the base, reference, lookup, derived and aggregate tables.

  • Mining Result Tables: Mining Result Tables save the output from Mining models. This output is normally produced from mining apply process. The tables are created in ocdm_sys schema.

Figure 10-1 Oracle Communications Data Model Mining Schemas

Description of Figure 10-1 follows
Description of "Figure 10-1 Oracle Communications Data Model Mining Schemas"

Using the Mining Model Refresh Procedure

Over time, the customer information and the customer behavior may change. Therefore, you may want to refresh the trained mining models based on the latest customer and usage data. The mining model refresh process is generally divided into three tasks:

  1. Data Preparation: Load and transform the data into a format that the mining algorithms can understand. Also a customer needs to prepare two sets of data corresponding to next two tasks:

    • Training Data

    • Scoring data

  2. Training: Based on part of customer data, user can run certain algorithms and then a mining model is generated.

  3. Scoring (applying): The trained model can be applied onto other customer data. This applies the model to do the prediction or other missions the model is designed to perform.

For more information about the Oracle Mining training and Scoring (applying) process, see Oracle Data Mining Concepts.

To refresh all six mining models based on latest customer data, call the procedure named pkg_ocdm_mining.refresh_model. This procedure performs the following tasks for each model:

  1. Refreshes the mining source materialized views based on latest data from OCDM_SY.

  2. Trains each mode again using the new training data.

  3. Applies each model onto the new apply data set.

This procedure has been integrated into Oracle Communications Data Model Intra-ETL workflow.

The errors that occur during mining model refresh are saved into the table named: DWC_INTRA_ETL_ACTIVITY as is other standard Oracle Communications Data Model Intra-ETL package errors and information.

Oracle Communications Data Model Mining Result Tables

Table 10-2 shows the dwd_cust_mnng result table.

Table 10-2 dwd_cust_mnng Data Mining Result Table

Name Type Description

MO_CD

VARCHAR2(50),

Month code when the month was trained and applied. In current version it was set as Null.

CUST_KEY

VARCHAR2(50),

Customer key to uniquely identify the customer.

PRDCT_CHURN_SVM_IND

VARCHAR2(1),

Boolean value whether customer will churn in next three months according to SVM model.

PRDCT_CHURN_SVM_PROB

NUMBER(16,12),

The probability value of how likely customer will churn in next 3 months.

This is the probability that the SVM prediction is correct.

PRDCT_CHURN_DT_IND

VARCHAR2(1),

Boolean value whether customer will churn in next three months according to DT model.

PRDCT_CHURN_DT_ND_NBR

VARCHAR2(50),

The ID of the node in the decision tree where the customer is assigned.

clstr_sgmnt_code

VARCHAR2(50),

The k-Means algorithm divides the set of all customers into segments. This value identifies the segment that the customer belongs to.

LTV_band_cd

VARCHAR2(50),

The band code of customer lifetime value, predicted by LTV Generalized Linear Models Regression. For more information, see Oracle Data Mining Concepts.

LTV_value

NUMBER(16,2),

The real value of Customer Lifetime value, predicted by LTV (GLMR) Mode.

LT_srvvl_cd

VARCHAR2(50),

The band code of Customer Survival period (Life Expectancy), predicted by Life_Exp (GLMR) Model.

LT_srvvl_val

NUMBER(16,2),

The value of Customer Survival period (Life Expectancy), predicted by Life_Exp (GLMR) Mode.

sntmnt_ctgry_cd

VARCHAR2(50),

The customer sentiment category detected by Customer sentiment model (SVM + Text). This is an SVM model on transformed TEXTs (transformed into a words matrix).

MANUAL_sntmnt_ctgry

VARCHAR2(50),

The manual score applied by end user. The end user generates this model. For example, an employee from the operator might generate this model. Usually this is the call center agent. For example, when the message is recorded, there could be a manual tag associated with the message indicating that the customer is happy or upset.

sntmnt_prob

NUMBER(16,12)

The probability of which customer is in possible model (Happy). This is the probability that customer is happy with their service. For example, a value of 60% means there is 60% chance that customer is happy with the service and a 40% chance that customer is not happy.


Table 10-3 shows the dwd_cust_prod_affltn result table.

Table 10-3 dwd_cust_prod_affltn Data Mining Result Table

Name Type Description

mo_cd

VARCHAR2(50),

Month code when the month was trained and applied. In current version it was set as Null.

CUST_KEY

VARCHAR2(50),

Customer key to uniquely identify the customer.

PROD_CD

VARCHAR2(50),

The product code which was predicted against. This is target product for promotion.

AFFLTN_PROB

NUMBER(20,18),

The likelihood, predicted by the SVM model, that the customer will purchase the product.

BUY_IND

VARCHAR2(1)

Boolean value to indicate whether customer may purchase the product. This indicates that a value 1 is BUY and a value of 0 is "NOT to BUY".


Table 10-4 shows the dwd_chrn_svm_factor result table.

Table 10-4 dwd_chrn_svm_factor Data Mining Result Table

Name Type Description

ATTRIBUTE_NAME

VARCHAR2(4000)

Name of the factor.

ATTRIBUTE_SUBNAME

VARCHAR2(4000)

Subname of the factor if there is any. For example, if the ATTRIBUTE_NAME has the value, "Payment_Method", then the ATTRIBUTE_SUBNAME could be and of the following:

  • Debit_Card

  • Cash

Each ATTRIBUTE_SUBNAME has a different weight, coefficient, in the model.

ATTRIBUTE_VALUE

VARCHAR2(4000)

Value of the factor, if there is any.

For example, for payment method, value of "cash" and "direct debit" might have different influence and ranking.

COEFFICIENT

NUMBER

Importance of the factor. The factors are ranked according to this value.


Model 1: Churn Prediction

The churn prediction model identifies the characteristics of a customer likely to churn. When you apply the model you get a prediction of how likely a particular customer is to churn. This is based on customer information such as customer demographic information, service quality, last tariff plan, calling usage, and other factors. Using the patterns learned, the model can also perform the calculation over current customer base (called 'Apply') to predict who are the customers mostly like to churn in next few months. With this knowledge, operators can initiate certain retention programs to reduce the customer churn rate. However, the churn prediction produces a likely to churn value. Further processing may be required to determine if it is desirable to retain a customer that is likely to churn. For example, you may only want to initiate retention programs for high value customers.

Churn Prediction Churner Definition

There are several levels to define churn, namely Customer, Account, and subscription. For some operators with only limited business line, customer and account churn at same time, while subscription is at a lower level. Customer can stop using some products (termination of subscription) while continue to use the other products. In later case, operator still have the customer and may promote other products in the future. However, if customer completely stopped using any products from the operator, it is very difficult for operator to bring customer back.

In Oracle Communications Data Model, the churn was defined at Customer Level, which is, a customer is recognized as a churner only when he stop using any product from the operator.

If customers churn at a given month, we may receive the data only 3 months after the actual Churn. So time window should be adjusted.

Churn Prediction Source

Table 10-5 shows the attributes identified from the Foundation Data Warehouse as input source variables for the DT model.

Table 10-5 DMV_CUST_CHRN_SRC_ALL

Attribute Description

CUST_KEY

Primary Key for customer

CHRN_IND

Target column of churn model

FUTRE_CNRT_CNT_3MO

Number of future contract count in last 3 months

SBRP_CNT_3MO

Subscription count in last 3 months

SSPNSN_CNT_3MO

Suspension count in last 3 months

CNRT_CNT_3MO

Contract count in last 3 months

CMPLNT_MO_CNT_3MO

Complaint count in last 3 months

CMPLNT_CALL_MO_CNT_3MO

Complaint call count to call center in last 3 months

CMPLNT_CALL_LFTM_CNT_3MO

Complaint call count to call center in the life time in last 3 months

CNRT_LFT_DAYS_3MO

Contract left days in last 3 months

ACCT_LFT_VAL_3MO

Account left value in last 3 months

RMNG_CNRT_SUM_3MO

Remaining contract sum in last 3 months

DEBT_VAL_3MO

Debt total in last 3 months

LYLTY_PROG_BAL_3MO

Loyalty program balance in last 3 months

TOT_PYMT_RVN_3MO

Total payment revenue in last 3 months

MO_RVN_3MO

Monthly revenue (arpu) in last 3 months

CNRT_ARPU_3MO

Contract arpu amount in last 3 months

PRTY_TYP_CD

Party type code, individual or organizational in last 3 months

BSNS_LEGAL_STAT_CD

Business legal status

MRTL_STAT_CD

Marital status for individual user

HH_SZ

Household size

JB_CD

Job Code

NTNLTY_CD

Nationality code

EDU_CD

Education level

GNDR_CD

Gender

DRVR_LICNS_IND

Driver license indicator

JB_CNRT_TYP

Job contract type, it is permanent employee or contracted.

BARNG_RSN_CD

Barring reason code if in barring status

POST_CD

Post code

CITY

CITY

STATE

STATE

CNTRY

Country

NAME_PRFX

Name prefix such as, Dr, Ms, and so on.

NAME_OF_WKPLC

Name of workplace

PLC_OF_BRTH

Place of birth

JB_POSN

Job position

LEGAL_TTL_TO_HSNG

The customer's legal title to home (rents, owns, and so on)

ETHNIC_BCKGRND

ETHNIC BACKGROUND

PREV_EMPLYR_TAX_NBR

Previous employer tax number

NBR_OF_CHLDRN

Number of children

NBR_OF_DPNDNT

Number of dependents

DWLNG_TENR

Tenure of dwelling in month

DWLNG_SZ

Dwelling size

ETHNCTY

Ethnicity

FROM_OF_EMPMNT

Classifies the individual for minority reporting purposes.

DWLNG_TYP

Dwelling type

DWLNG_STAT

Dwelling Status

SRC_OF_INCM

Source of income

CUST_TYP_CD

Customer type code

CUST_SGMNT_KEY

Customer segment code

ADDR_LOC_KEY

Address loc key

CUST_SCR_KEY

Customer score key

PRMRY_STAT_CD

Primary status code

PRMRY_STAT_RSN_CD

Primary status reason code

SOC_JB_KEY

Job code in SOC classification

EXTRNL_ORG_TYP_CD

Organization type

LANG_CD

Language code

CNCT_ADDR_EFF_DT

For how long contact address is in effective, in days

BNKRPT_STRT_DT

Bankrupt status start date in days

BNKRPT_STAT

Bankrupt status

BLLG_ADDR_EFF_DT

For how long billing address is in effective, in days

PYMT_ACCT_OPEN_DT

For how long payment account is in effective, in days

MAIL_ALWD_IND

Mail allowed indicator

CUST_PYMT_RESPBL_IND

Whether the customer is responsible for payment

LVNG_AT_CURR_ADDR_SINCE

For how long customer lives in current location, in days

END_OF_JB_CNRT

End of job contract date

STRT_OF_EMPMNT

Start of job date

ECNMCLY_ACTV_IND

Economical active indicator

AGE_ON_NET_BND_CD

Age on net band code

AGE_ON_NET_NBR

Age on net number

CRDT_CTGRY_KEY

Credit category

AGE_BND_CD

Age band

DEBT_AGNG_BND_CD

Debt aging band

PYMT_MTHD_TYP_CD

Payment method type

ARPU_BND_CD

Arpu band code

SL_CHNL_KEY

Sales channel key

SL_CHNL_RPRSTV_KEY

Sales channel representative key

ORG_BSNS_UNIT_KEY

Organization business unit key

CUST_RVN_BND_CD

Customer revenue band code

FUTRE_CNRT_CNT

Number of future contract count in last 3 months

SBRP_CNT

Subscription count in last 3 months

SSPNSN_CNT

Suspension count

CNRT_CNT

Contract count

CMPLNT_LFTM_CNT

Complaint count in the life time

CMPLNT_MO_CNT

Complaint count

CMPLNT_CALL_MO_CNT

Complaint call count to call center

CMPLNT_CALL_LFTM_CNT

Complaint call count to call center in the life time

LIFE_SPN

Life Span in days

CNRT_LFT_DAYS

Contract left days

ACCT_LFT_VAL

Account left value

RMNG_CNRT_SUM

Remaining contract sum

DEBT_VAL

Debt total

LYLTY_PROG_BAL

Loyalty program balance

TOT_PYMT_RVN

Total payment revenue

TOT_RVN

Total Revenue as of current month

MO_RVN

Monthly revenue (arpu)

LFTM_ARPU

Life time revenue

CNRT_ARPU

Contract arpu amount

ESTMD_ACQSTN_COST

Estimated acquisition cost (optional attribute)

BRDBND_IND

Whether the customer uses the Broadband product

PAY_TV_IND

Whether the customer uses PayTV product

IDD_IND

Whether the customer uses IDD product

HOMTEL_IND

Whether the customer uses fixed line telephone product

WRLS_IND

Whether the customer uses wireless telephone product

NEW_ACCT_IND

Whether the customer is a new customer

NBR_4_CNT

Number of digit 4 in the customer phone number

NBR_13_CNT

Number of digit 13 in the customer phone number

NBR_6_CNT

Number of digit 6 in the customer phone number

NBR_9_CNT

Number of digit 9 in the customer phone number

NBR_RTNG

Score of customer number for customer specific rating program

PORT_IN_FROM

From which operator customer ported in from

PORT_IN_CNT

How many times customer ported in

PORT_OUT_CNT

How many times customer ported out


All the data from dmv_cust_chrn_src_all contains non-null value in the CHRN_IND column. This table is then divided into two tables: dmv_cust_chrn_src_prd and dmv_cust_chrn_src_tst. The table dmv_cust_chrn_src_prd has about 60% of the customers and dmv_cust_chrn_src_tst has the rest of the customers. The churn prediction model was trained on table dmv_cust_chrn_src_prd and then tested on dmv_cust_chrn_src_tst for its accuracy.

During the training process, a temporary prediction model OCDM_CHURN_DT_NEW is built and compared with the existing prediction model OCDM_CHURN_DT. If the new temporary model OCDM_CHURN_DT_NEW outperforms the existing model in accuracy, it replaces the existing model, otherwise, it is dropped.

The table dmv_cuts_chrn_src_all is derived from the tables:

  • ocdm_sys.DWR_CUST

  • ocdm_sys.DWD_ACCT_STTSTC

  • ocdm_sys.DWR_BSNS_MO

  • ocdm_sys.DWR_HH

  • ocdm_sys.DWR_JB

Churn Prediction Output

The mined results are saved into the target table with the following columns:

  • dwd_cust_mnng.PRDCT_CHURN_SVM_IND

  • dwd_cust_mnng.PRDCT_CHURN_SVM_PROB

  • dwd_cust_mnng.PRDCT_CHURN_DT_IND

  • dwd_cust_mnng.PRDCT_CHURN_DT_ND_NBR

For more information on these four columns, refer to the Mining target data dictionary.

Churn Prediction Algorithm

The two mining algorithms are used separately and the two mining models for churn prediction problem:

  • Decision Trees Classification

  • SVM Classification

For more information on mining algorithms, see Oracle Data Mining Concepts and Oracle Data Mining Application Developer's Guide.

Model 2: Customer Profiling

The business problem is to group customers into generally homogeneous groups (Segments) based on customer demographic value, usage pattern and list of telecom products they subscribe to (customer subscriber history).Business Analysts can look into each segment to further understand the customer group discovered by the model and name each segments.

The discovered clustering rules draw a profile of the customers along with their product subscription. Thus, the clustering rules generated for each profile group will show the most important similar characteristics in each group. For example, an operator may have a group having significantly more short message (SMS) usage than any other groups. Alternatively, there may be a group with extremely higher profit than any other group (covering high end customers).

Customer Profiling Source

Customer profiling model use source view DMV_CUST_PROFILE_SRC, which is a subset of table dmv_cust_chrn_apply_all. It contains those information:

Customer Profiling Output

The mined results are saved into target table with the following columns:

  • dwd_cust_mnng.clstr_sgmnt_code

Customer Profiling: Algorithm

K-means Clustering

Model 3: Customer Churn Factor

The business problem is to identify which factor may have with more influence on customer churn problem or Customer Revenue. The marketing department should leverage those information to better understand customer behavior. The major factors, namely, Geography Demography, Customer Segment/Group, VAS usage should be included. The attributes are mostly categorical for business user to understand customer profile.

This model is derived from the Churn Prediction model through the SVM algorithm, but due to it's usefulness, we still present this result as a separated model.

Customer Churn Factor Source

The source attributes for this model is same as churn model.

Customer Churn Factor Output

Table 10-6 shows the columns where the customer churn factor model saves results in the table dwd_chrn_svm_factor.

Table 10-6 Customer Churn Factor Output Columns in dwd_chrn_svm_factor Table

Attribute Datatype Description

ATTRIBUTE_NAME

VARCHAR2(4000)

Name of the factor.

ATTRIBUTE_SUBNAME

VARCHAR2(4000)

Subname of the factor if there is any. Each ATTRIBUTE_SUBNAME has a different weight, coefficient, in the model.

ATTRIBUTE_VALUE

VARCHAR2(4000)

Value of the factor, if there is any. For example, for payment method, value of "cash" and "direct debit" might have different influence and ranking.

COEFFICIENT

NUMBER

Importance of the factor. The factors are ranked according to this value.


Customer Churn Factor Algorithm

SVM

Model 4: Cross-Sell Opportunity

The business problem is to identify the patterns of which products are typically purchased together or one after another over the lifetime of a customer. This helps in providing recommendations about which products should be presented to customers according to their potential acceptance score. A typical scenario is call center can call certain customers with some specific purpose to cross-sell some products. Operators need the list of customers to save promotion cost and improve efficiency.

The trained model generates recommendations about promotion target products. This is done based on what products the customer has subscribed to taking into account other factors such as customers credit history and the risk involved in offering the particular product to the customer.

Cross-Sell Opportunity Source

Table dmv_prod_mix_src is the input into mining algorithm for model training. This table is derived from the following tables:

  • ocdm_mining.dmv_cust_chrn_src_all

  • ocdm_sys.dwd_vas_sbrp_qck_summ

Cross-Sell Opportunity Output

For a given product to do promotion, the model generates list of customer most likely to buy. The prediction was done by SVM algorithm. The result is saved into table dwd_cust_prod_affltn in the following columns.

Table 10-7 Cross-Sell Opportunity Output Columns in dwd_cust_prod_affltn Table

Attribute Datatype Description

mo_cd

VARCHAR2(50),

Month code when the month was trained and applied. In current version it was set as Null.

CUST_KEY

VARCHAR2(50),

Customer key to uniquely identify the customer.

PROD_CD

VARCHAR2(50),

The product code which was predicted against. This is target product for promotion.

AFFLTN_PROB

NUMBER

The probability output from SVM algorithm, serve as likelihood customer may purchase the product.

BUY_IND

VARCHAR2(1)

Boolean value to indicate whether the customer may purchase the product. This indicates that a value 1 is BUY and a value of 0 is "NOT to BUY"


Cross-Sell Model Algorithm

SVM.

Model 5: Customer Sentiment Detection

The business problem is to measure customer sentiment regarding the service quality according to any text message received from the customer. Those text messages may be emails from customer, or written down by call center agents during call center calls, and so on.

This model leverages Text mining capability provided by Oracle database. For more information, see Oracle Data Mining Concepts.

Customer Sentiment Detection Source

The source table into mining algorithm is: dm_cust_cmmnt, which has columns of:

Table 10-8 Data Mining Source Columns in dm_cust_cmmnt Table

Attribute Datatype Description

CUST_KEY

NUMBER(30)

Customer Key

manual_score

VARCHAR2(40)

Manual scores or manually adjusted after reading

sentiment

VARCHAR2(40)

Sentiment scored by Mining Model

plusprob

NUMBER(20,16)

The probability of customer belonging to happy group

minusprob

NUMBER(20,16)

The probability of customer belonging to un-happy group

CUST_CMMNT

VARCHAR2(4000)

The text messages all together from the customer.


The procedure pkg_ocdm_mining.create_sentiment_svm_model(month_code) refreshes data in the table dm_cust_cmmnt and then refreshes the sentiment mining model. This procedure also populates the sentiment mining result table DWD_CUST_MNNG.

For more information, see "Oracle Communications Data Model Mining Result Tables".

Customer Sentiment Detection Output

The mined results are saved into target table with the following columns:

  • dwd_cust_mnng.SNTMNT_CTGRY_CD

  • dwd_cust_mnng.MANUAL_SNTMNT_CTGRY

  • dwd_cust_mnng.SNTMNT_PROB

Customer Sentiment Detection Algorithms

Oracle Text option to transform the text

SVM algorithm to train the model

Model 6: Life Time Value (LTV) Prediction

We want to tell how long customer will likely to continue to use the service (Survival), rather than leaving. And also we want to know how much value customer is likely to bring into the operator along their lifetime. This is a regression model. The source data are those customers on net at least 5 years ago; the model target is the age of customer. For those customers churned in less than 5 years, we know the exact age, but for those still on net, the age will be total lifetime.

The difference of this model to the Model 1 (Churn Prediction) is that this is a regression model rather than classification. The target Lifetime is a continuous real value.

Life Time Value (LTV) Prediction Source

The dmv_cust_ltv_prdct_src is the source table for LTV prediction model. This table is subset of the churn model source table dmv_cust_chrm_src_all. The customer joined in less than three years are filtered out from the training data set to provide a valid input into the model.

Life Time Value (LTV) Prediction Output

Life Time Span and Life Time Value (LTV) are the two target measures to predict. The results are saved into table dwd_cust_mining:

  • dwd_cust_mnng.LTV_BAND_CD

  • dwd_cust_mnng.LTV_VALUE

  • dwd_cust_mnng.LT_SRVVL_CD

  • dwd_cust_mnng.LT_SRVVL_VAL

The LTV_value and LT_SRVVL_VAL are the predicted real value from the model, and then binned into ten categories and form the other two columns: LTV_BAND_CD and LT_SRVVL_CD.

Life Time Value (LTV) Prediction Algorithm

Generalized Linear Model (GLM)