Data Mining Deployment Guide > Setting Up a Modeling Environment with Siebel Data Mining >

Identifying Data Mining Data Sources


Before your organization builds descriptive or predictive models, it must clearly articulate its business objectives and requirements for using data mining. These business requirements determine what data is needed for modeling and include the constraints that drive a closed-loop data mining process: from modeling through deployment and performance measurement, and back to modeling.

Start by identifying all data sources that contain information relevant to your immediate data mining task. Choosing the right data sources is a crucial step in the entire data mining process. Look for data that captures information that might be helpful in building a meaningful model. Initially, more data is better, even data with uncertain predictive power. Most data mining techniques, such as decision trees and neural networks, are quite good at handling irrelevant or redundant data. Be aware that modeling is an iterative process; you can always ignore data in later stages of the data mining process.

Example of Data Mining Data Source Identification

In the churn management example, the wireless service provider has identified reducing its customer churn rate as its business objective for using data mining. The provider wants to build predictive models of customer behavior that identify customers in danger of ending their relationship with the provider by taking their business to competitors. The company plans to use data from past churners and non-churners to analyze the root causes of customer churn and to train a predictive churn model. It plans to generate churn propensity scores to drive marketing segmentation, devise retention campaigns, and deploy a predictive churn model in its Siebel Call Center for real-time churn propensity scoring. To model customer churn behavior, the provider has identified customer profile data from the Siebel Data Warehouse and transactional data plus third-party demographic data from an existing data mart as relevant data sources.
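
The following Python sketch illustrates, in miniature, what it means to combine these sources into labeled training data for a churn model. It is purely illustrative and is not part of Siebel Data Mining: the function name, field names, customer IDs, and values are all hypothetical, and small in-memory dictionaries stand in for the customer profile, transactional, and demographic sources.

```python
def build_training_rows(profiles, usage, demographics, churned_ids):
    """Combine three hypothetical sources on customer ID and attach a churn label."""
    rows = []
    for customer_id, profile in profiles.items():
        row = {"customer_id": customer_id}
        row.update(profile)                            # customer profile attributes
        row.update(usage.get(customer_id, {}))         # transactional attributes, if any
        row.update(demographics.get(customer_id, {}))  # demographic attributes, if any
        row["churned"] = customer_id in churned_ids    # label: past churner or not
        rows.append(row)
    return rows


# Hypothetical sample data standing in for the three data sources.
profiles = {
    "C001": {"plan": "basic", "tenure_months": 26},
    "C002": {"plan": "premium", "tenure_months": 4},
}
usage = {
    "C001": {"avg_monthly_minutes": 310},
    "C002": {"avg_monthly_minutes": 95},
}
demographics = {"C001": {"region": "west"}}  # no demographic record for C002
churned_ids = {"C002"}

training_rows = build_training_rows(profiles, usage, demographics, churned_ids)
for row in training_rows:
    print(row)
```

Each resulting row carries attributes from every source that has a record for the customer, plus the churn label that a predictive model would be trained on. Customers missing from a source simply lack those attributes, which is the kind of gap worth noting when you inventory data sources.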

Siebel Analytics serves as a single logical data source. It lets you connect to several data sources without the need for data transformation and merging. Through Siebel Analytics, the wireless service provider accesses not only the Siebel Data Warehouse but also its customer ledger database and a demographic data mart.

To use Siebel Analytics as your single data source for data mining, set up an ODBC DSN that points to Siebel Analytics on the client machine running Siebel Data Mining Workbench or Siebel Miner. See the Siebel Analytics Server Administration Guide for details on installing the Siebel Analytics ODBC client software and configuring a Siebel Analytics ODBC DSN.
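
Once a DSN exists, any ODBC-capable client can refer to it by name. As a minimal sketch, the Python helper below assembles a standard DSN-based ODBC connection string. The helper name, the DSN name `SiebelAnalytics`, and the credentials are assumptions made for illustration; use the DSN name you actually configured.

```python
def build_odbc_connection_string(dsn, user=None, password=None):
    """Build a DSN-based ODBC connection string, e.g. 'DSN=name;UID=u;PWD=p'."""
    if not dsn:
        raise ValueError("a DSN name is required")
    parts = [f"DSN={dsn}"]
    if user is not None:
        parts.append(f"UID={user}")
    if password is not None:
        parts.append(f"PWD={password}")
    return ";".join(parts)


# Hypothetical DSN name and credentials, for illustration only.
conn_str = build_odbc_connection_string(
    "SiebelAnalytics", user="analyst", password="secret"
)
print(conn_str)  # DSN=SiebelAnalytics;UID=analyst;PWD=secret
```

A string of this form can be passed to a generic ODBC client library to verify that the DSN resolves, for example `pyodbc.connect(conn_str)` if the third-party pyodbc package happens to be installed. Siebel Data Mining Workbench and Siebel Miner pick up the DSN through their own configuration rather than through code like this.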
