You use the clustering algorithm for unsupervised classification.
This algorithm examines data and determines how to split the data into groups, or clusters, based on specific properties of the data. The input required to build the model consists of a collection of vectors with numeric coordinates. The algorithm organizes the vectors into clusters based on their proximity to each other. The basic assumption is that the clusters are small relative to the distances between them and can therefore be represented effectively by their respective centers. Hence, the model consists of the coordinates of the center vectors.
You specify the maximum number of clusters to generate. The algorithm assigns sequential IDs to the clusters. In the apply phase, for each input vector, the algorithm assigns the most appropriate cluster ID.
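The apply phase described above can be illustrated with a minimal sketch. It assumes that "most appropriate" means the cluster whose center is nearest by Euclidean distance and that IDs start at 1; the documentation states neither, and the function name `assign_cluster` is hypothetical rather than part of the product.

```python
import math

def assign_cluster(vector, centers):
    """Return the sequential ID of the center closest to `vector`.

    Assumptions (not stated in the documentation): Euclidean distance
    is the proximity measure, and cluster IDs are numbered from 1.
    """
    best_id, best_dist = None, math.inf
    for cluster_id, center in enumerate(centers, start=1):
        dist = math.dist(vector, center)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id

# A model is just the coordinates of the center vectors.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Apply phase: each input vector receives one cluster ID.
inputs = [(1.0, 0.5), (9.0, 11.0), (0.5, 8.0)]
labels = [assign_cluster(v, centers) for v in inputs]
```

Because the model stores only the centers, applying it to new data needs one distance computation per center and no access to the original training set.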
Sequential runs on the same training set may produce slightly different results due to the stochastic nature of the method.
The following table shows the parameters for the clustering algorithm.
Parameter Name | Description |
---|---|
ClusterCount defines the maximum number of clusters to generate. | The actual number of clusters generated might be under the maximum, depending on the data. Type: integer Default: 5 |
Aggregation defines the propensity of the algorithm to aggregate clusters. | This parameter determines how likely the algorithm is to group data into clusters. For example, for data that could reasonably be read as either three or four clusters, a relatively high aggregation setting makes the algorithm more likely to find three clusters, and a relatively low setting makes it more likely to find four. Type: double Range: 0 - 1. The higher the number, the more likely the algorithm is to group data into a cluster, so a higher setting produces fewer clusters and a lower setting produces more. Example: Setting 0 turns each data point into a cluster (up to the maximum number of allowable clusters), and setting 1 returns a single cluster. Default: 1. Because the default returns a single cluster, change it to a realistic value. |
RandomSeed defines the seed for the random number generator. | Because the algorithm is stochastic, sequential runs generate different results by default. If you set this parameter to any value other than 0 (the default), the output is still randomized but is the same each time you run the algorithm with the same value. Type: integer Default: 0 |
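The RandomSeed and ClusterCount semantics in the table can be sketched as follows. This is an illustration only: the documentation does not say how the algorithm uses its random numbers, so picking starting centers at random is an assumption, and the names `make_rng` and `pick_initial_centers` are hypothetical.

```python
import random

def make_rng(random_seed=0):
    """Build a random generator that follows the RandomSeed convention.

    0 (the default) means no fixed seed, so each run differs;
    any other value makes the random sequence repeatable.
    """
    if random_seed == 0:
        return random.Random()        # seeded from system entropy
    return random.Random(random_seed)

def pick_initial_centers(vectors, cluster_count=5, random_seed=0):
    """Randomly choose up to `cluster_count` vectors as starting centers.

    Assumption: random initialization is used here purely to show
    the effect of the seed; the real algorithm may differ.
    """
    rng = make_rng(random_seed)
    k = min(cluster_count, len(vectors))  # never more clusters than data points
    return rng.sample(vectors, k)

data = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0), (5.0, 0.0), (0.0, 5.0)]

# The same nonzero seed reproduces the same choice on every run.
first = pick_initial_centers(data, cluster_count=3, random_seed=42)
second = pick_initial_centers(data, cluster_count=3, random_seed=42)
```

Setting a nonzero seed is useful when results need to be compared across runs, for example while tuning Aggregation.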
The following table shows the required and optional predictors for the clustering algorithm. This algorithm uses unsupervised learning and does not require target values.
Member Expression | Sample Expression |
---|---|
Predictor.Predictor specifies the member or member set to use for the predictor domain. | {[Fruit].Children} |
Predictor.Sequence defines the sequence to be traversed for the predictor, generally a time dimension range. | {[Jan 1].Level.Members} |
Predictor.External (optional) defines the scope of the predictor. | {[East].Children} |
Predictor.Anchor (optional) specifies additional restrictions from other dimensions. | {([2001], [Actual], [Sales])} |
This information will be made available at a later release.