Data Mining Wizard—Select an Algorithm Page

When creating a build task, you use this page to select the algorithm that is used in the data mining model.

The contents of the algorithm list may vary, depending upon whether third-party algorithms are registered. However, the following Data Mining Framework algorithms are always included in the list:

Association rules—Provides market basket analysis. Results are reported in the following form: nn% of users who buy Product X buy Product Y. For example, 80% of the people who buy root beer buy ice cream.
Clustering—Recognizes patterns and arranges items into groups. Clusters can provide information about market segmentation and can be used with various predictive tools. For example, clusters can predict the kinds of users that are most likely to respond to root beer ads.
Decision tree—Creates decision trees that can be used for classification and prediction. The algorithm's results are the answers to a series of yes-or-no questions: a yes answer leads to one part of the tree, and a no answer leads to another part of the tree. For example, a decision tree can tell you to suggest ice cream to a customer who buys root beer.
Naive Bayes—Calculates the conditional probabilities of outcomes by examining the number of elements that occur with the target values. Naive Bayes algorithms assume that all elements are statistically independent. Because Naive Bayes generates results quickly, you may want to use it before you use the clustering or decision-tree algorithms.
Neural net—Constructs a multi-layered set of nodes. The nodes of one layer connect to the nodes of the next layer, creating a net that resembles human neurons and models numerical dependencies between input and output data. Neural network algorithms generalize and learn from data. For example, neural networks can be used to predict financial results.
Regression—Identifies dependencies between one value and other values. For example, multilinear regression can determine how the amount of money spent on advertising and payroll affects sales values.
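
To make the association-rules result format concrete, the following minimal sketch shows how a figure such as "80% of the people who buy root beer buy ice cream" can be derived from a set of purchases. It is an illustration only, not the Data Mining Framework implementation: the function name rule_confidence and the sample baskets are hypothetical, and a real association-rules algorithm also searches for candidate rules rather than scoring a single one.

```python
from typing import Iterable, Set


def rule_confidence(baskets: Iterable[Set[str]], antecedent: str, consequent: str) -> float:
    """Return the share of baskets containing `antecedent` that also contain `consequent`.

    Hypothetical helper for illustration; not part of any product API.
    """
    # Keep only the baskets in which the antecedent item (Product X) was bought.
    with_antecedent = [basket for basket in baskets if antecedent in basket]
    if not with_antecedent:
        raise ValueError(f"no basket contains {antecedent!r}")
    # Of those, count the baskets that also contain the consequent item (Product Y).
    with_both = sum(1 for basket in with_antecedent if consequent in basket)
    return with_both / len(with_antecedent)


# Made-up sample data: five baskets contain root beer, four of them also contain ice cream.
baskets = [
    {"root beer", "ice cream"},
    {"root beer", "ice cream", "chips"},
    {"root beer", "ice cream"},
    {"root beer", "ice cream", "straws"},
    {"root beer", "chips"},
    {"ice cream"},
]

confidence = rule_confidence(baskets, "root beer", "ice cream")
print(f"{confidence:.0%} of the people who buy root beer buy ice cream")
```

With this sample data, the reported percentage is the number of baskets that contain both products divided by the number of baskets that contain root beer, which is 4 out of 5.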

The following columns describe the listed algorithms, one row per algorithm:

Algorithm Name
Function Name—Functions performed by the algorithms
Java Class—Java classes that implement the algorithms
Vendor—Vendors that supplied the algorithms
Information—Short descriptions of the algorithms
Owner—IDs of the people who added the algorithms to the system. !DM_Dispatcher indicates that an algorithm is part of Data Mining Framework.
Time—Dates and times that the algorithms were created

If you are working with a scoring task, you select the “Score on cube” option (if the model is applied to data within a database) or the “Score on data” option (if the model is applied to data that you provide).