Bookshelf v7.7: Choosing a Representative Data Sample for Modeling

For model training purposes, always use a representative data sample. A representative sample is sufficient to capture the inherent patterns and rules in the data. Depending on your objectives and the immediate modeling task, try to limit your modeling data (the training, test, and validation sets combined) to a few thousand records. Because data mining is an iterative and interactive discovery process, you will often need to train and validate several models. An excessively large training data set increases the time needed to build each model and unnecessarily slows down the overall data mining process. The size of your training data set also determines your hardware requirements. For details and guidance on choosing a suitable hardware configuration for modeling, see the KnowledgeStudio User Guide and the Siebel Miner documentation in the Angoss folder on the Siebel eBusiness Third-Party Bookshelf, and the Release Notes on Siebel SupportWeb.
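This guide does not prescribe a particular sampling method. As one illustration, the following minimal Python sketch assumes that the source data is already available as a list of records and that a single target field (hypothetically named `responded`) should keep the same class proportions in the sample as in the full data set. It draws a stratified random sample of about 3,000 records and splits that sample into training, test, and validation sets. The sample size, the 60/20/20 split, and the function and field names are assumptions for illustration only. They are not part of Siebel Data Mining or KnowledgeStudio.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, n, seed=0):
    """Draw about n records, keeping each class's share of the full data set.

    Illustrative sketch only: `key` names a hypothetical target field.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    total = len(records)
    sample = []
    for members in groups.values():
        # Allocate slots in proportion to the class's frequency,
        # keeping at least one record from every class.
        k = max(1, round(n * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    rng.shuffle(sample)  # mix the classes before splitting
    return sample


def split(sample, train=0.6, test=0.2):
    """Split a shuffled sample into training, test, and validation sets.

    The 60/20/20 default is an assumption; the remainder goes to validation.
    """
    n_train = int(len(sample) * train)
    n_test = int(len(sample) * test)
    return (sample[:n_train],
            sample[n_train:n_train + n_test],
            sample[n_train + n_test:])


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical customer table: 100,000 rows, roughly 5% responders.
    customers = [{"id": i, "responded": rng.random() < 0.05}
                 for i in range(100_000)]
    sample = stratified_sample(customers, "responded", 3000)
    train, test, validation = split(sample)
    print(len(sample), len(train), len(test), len(validation))
```

Stratifying on the target field keeps a rare outcome, such as a low response rate, represented in roughly the same proportion as in the full data. A plain random sample of a few thousand records could under-represent that outcome by chance.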

Data Mining Deployment Guide