Data Mining Deployment Guide > Setting Up a Modeling Environment with Siebel Data Mining >

Preparing Data for Data Mining Modeling


This process uses Siebel Analytics to prepare data for modeling. Most data mining techniques require a single customer mining record (CMR) table. This table contains all model input variables: the independent variables and, in the case of predictive modeling, a dependent (target) variable.
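As a rough illustration of the CMR shape, the following Python sketch flattens two hypothetical sources into one record per customer. It is not how Siebel Analytics builds the table. All names here (customers, usage, tenure_months, total_minutes, churned, build_cmr) are assumptions made for the example and do not come from the Siebel Data Warehouse schema.

```python
from collections import defaultdict

# Hypothetical source rows; these names are illustrative assumptions only.
customers = [
    {"customer_id": 1, "tenure_months": 24, "churned": 0},
    {"customer_id": 2, "tenure_months": 3, "churned": 1},
]
usage = [
    {"customer_id": 1, "minutes": 120},
    {"customer_id": 1, "minutes": 80},
    {"customer_id": 2, "minutes": 15},
]


def build_cmr(customers, usage):
    """Return one record per customer: input variables plus the target."""
    # Aggregate the detail rows so each customer contributes a single value.
    total_minutes = defaultdict(int)
    for row in usage:
        total_minutes[row["customer_id"]] += row["minutes"]

    cmr = []
    for c in customers:
        cmr.append({
            "customer_id": c["customer_id"],
            "tenure_months": c["tenure_months"],                # independent variable
            "total_minutes": total_minutes[c["customer_id"]],  # independent variable
            "churned": c["churned"],                            # dependent (target) variable
        })
    return cmr


cmr = build_cmr(customers, usage)
for record in cmr:
    print(record)
```

The key property is that every customer appears exactly once, with detail-level data aggregated into columns before modeling.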

Data Preparation for Siebel Analytics Applications and Siebel Miner Users

Siebel Analytics and the Siebel Data Warehouse provide prebuilt metadata for data mining. This metadata includes a data mining subject area that exposes various Siebel Miner repository data entities. It also links batch scores generated with Siebel Miner to Siebel entities such as Customer, Product, and Asset, so that you can analyze batch scores in depth with Siebel Analytics and use them as criteria for marketing segmentation.

Define your problem-specific CMR table with Siebel Analytics. The Siebel Data Warehouse and Analytics applications include a large number of predefined measures that are useful for building your CMR. Siebel Analytics ships with a sample CMR named Customer Churn in the data mining subject area. See the Siebel Analytics Server Administration Guide for details on configuring Analytics metadata.

Siebel Answers is a Web-based, ad-hoc query tool that can analyze, format, and display data visible to the Analytics server. You can use Siebel Answers to gain insight into variable distributions, and to identify outliers and correlations. See the Siebel Analytics User Guide for details on using Siebel Answers.
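To make the terms concrete, the sketch below shows one common way to flag outliers (the interquartile range rule) and to measure linear correlation (the Pearson coefficient) in plain Python. Siebel Answers is a query and display tool and this is not its implementation. The function names, the multiplier k=1.5, and the sample values are assumptions chosen for illustration.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values.
monthly_minutes = [100, 110, 120, 130, 900]
outliers = iqr_outliers(monthly_minutes)
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(outliers, r)
```

Two input variables with a coefficient close to 1 or -1 carry largely the same information, which is worth knowing before both are included in a CMR.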

Data Preparation for Siebel Data Mining Workbench Users

In addition to examining data quality with Siebel Answers, you can use Siebel Data Mining Workbench to reveal statistical characteristics of model input variables. Use the profiling and visual exploration features of Siebel Data Mining Workbench to evaluate variable characteristics, including the number of unique values, the number of missing values, average, standard deviation, minimum, maximum, median, and quartiles. See the KnowledgeStudio User Guide in the Angoss folder on the Siebel eBusiness Third-Party Bookshelf for details.
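The characteristics listed above can be sketched for a single variable using only the Python standard library. This is an illustration of what such a profile contains, not a description of how Siebel Data Mining Workbench computes it. The function name profile, the use of None to mark a missing value, and the "inclusive" quartile method are assumptions.

```python
import statistics


def profile(values):
    """Summarize one variable; None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "unique": len(set(present)),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
        "median": median,
        "q1": q1,
        "q3": q3,
    }


# Hypothetical sample values for one input variable.
summary = profile([10, 20, 20, None, 30, 40])
for name, value in summary.items():
    print(name, value)
```

Quartile definitions differ between tools, so values computed this way may not match the Workbench output exactly.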

Example of Data Mining Modeling Data Preparation

In the churn management example, the wireless service provider uses basic customer and account profile information already present in the Siebel Data Warehouse, and enriches the CMR by adding cell phone usage information and third-party demographic data from its existing data mart. The provider uses both Siebel Answers and Siebel Data Mining Workbench to identify variables with a high count of missing values or unexpected distributions, and decides to exclude these variables from modeling.
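The screening step in this example could look roughly like the following sketch, which drops any variable whose share of missing values exceeds a threshold. The guide does not specify a threshold; the 0.3 value, the function name drop_sparse_variables, and the column names are assumptions made for the example. Checking for unexpected distributions is left out because it depends on domain knowledge.

```python
def drop_sparse_variables(records, max_missing_ratio=0.3):
    """Drop variables whose share of missing (None) values exceeds the threshold.

    Returns the filtered records and the list of dropped variable names.
    """
    if not records:
        return [], []
    names = list(records[0])
    dropped = [
        name for name in names
        if sum(r.get(name) is None for r in records) / len(records) > max_missing_ratio
    ]
    kept = [{k: v for k, v in r.items() if k not in dropped} for r in records]
    return kept, dropped


# Hypothetical CMR rows; household_income stands in for third-party demographic data.
cmr_rows = [
    {"customer_id": 1, "monthly_minutes": 200, "household_income": None},
    {"customer_id": 2, "monthly_minutes": None, "household_income": None},
    {"customer_id": 3, "monthly_minutes": 90, "household_income": 52000},
    {"customer_id": 4, "monthly_minutes": 310, "household_income": None},
]
filtered, dropped = drop_sparse_variables(cmr_rows)
print(dropped)
```

Here household_income is missing in three of four rows and is removed, while monthly_minutes is missing in only one row and is retained.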
