Clustering Algorithm

You use the clustering algorithm for unsupervised classification.

About the Algorithm

This algorithm examines data and determines how to split it into groups, or clusters, based on specific properties of the data. The input required to build the model is a collection of vectors with numeric coordinates. The algorithm organizes the vectors into clusters based on their proximity to each other. The basic assumption is that each cluster is small relative to the distances between clusters and can therefore be represented effectively by its center. Hence, the model consists of the coordinates of the center vectors.

You specify the maximum number of clusters to generate. The algorithm assigns sequential IDs to the clusters. In the apply phase, for each input vector, the algorithm assigns the most appropriate cluster ID.
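The apply phase can be pictured as a nearest-center lookup. The following is only a minimal sketch, not the product's implementation: it assumes Euclidean distance as the measure of proximity and 1-based sequential cluster IDs, and the names `assign_cluster`, `centers`, and `inputs` are hypothetical.

```python
import math


def assign_cluster(vector, centers):
    """Return the 1-based ID of the center closest to `vector`.

    Assumption: proximity is measured with Euclidean distance.
    """
    distances = [math.dist(vector, center) for center in centers]
    return distances.index(min(distances)) + 1


# Hypothetical model: the coordinates of three center vectors in 2-D.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Hypothetical input vectors to score in the apply phase.
inputs = [(1.0, 0.5), (9.0, 1.0), (0.5, 8.0)]

cluster_ids = [assign_cluster(v, centers) for v in inputs]
print(cluster_ids)  # [1, 2, 3]
```

Because the model stores only the center coordinates, scoring a new vector requires one distance computation per cluster.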

Sequential runs on the same training set may produce slightly different results due to the stochastic nature of the method.

Parameter Values

The following table shows the parameters for the clustering algorithm.

Parameter Name	Description
ClusterCount defines the maximum number of clusters to generate.	The actual number of clusters generated might be fewer than the maximum, depending on the data. Type: integer Default: 5
Aggregation defines the propensity of the algorithm to aggregate clusters.	This parameter determines how readily the algorithm groups data points into the same cluster. For example, for data that could reasonably be interpreted as either three or four clusters, a relatively high aggregation setting makes the three-cluster interpretation more likely, and a relatively low setting makes the four-cluster interpretation more likely. Type: double Range: 0 to 1. A higher setting produces fewer clusters and a lower setting produces more clusters. Example: A setting of 0 turns each data point into its own cluster (up to the maximum number of allowable clusters), and a setting of 1 returns a single cluster. Default: 1. Because the default returns a single cluster, change it to a realistic value.
RandomSeed defines the seed for the random number generator.	Because the algorithm is stochastic, successive runs generate different results by default. If you set this parameter to any value other than 0 (the default), the output is still randomized, but it is the same each time you run the algorithm with that same value. Type: integer Default: 0
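
The parameter rules in the table can be captured in a small validation sketch. This is an illustration only: the class `ClusteringParams`, its field names, and the method `make_rng` are hypothetical and not part of any product API, and the lower bound of 1 for ClusterCount is an assumption that the table does not state. The seed handling mirrors the RandomSeed description: 0 leaves the generator unseeded, and any other value makes runs repeatable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringParams:
    """Hypothetical container for the three documented parameters."""

    cluster_count: int = 5    # ClusterCount: maximum number of clusters
    aggregation: float = 1.0  # Aggregation: 0 to 1, higher means fewer clusters
    random_seed: int = 0      # RandomSeed: 0 means results vary between runs

    def __post_init__(self):
        # Assumption: at least one cluster must be allowed.
        if self.cluster_count < 1:
            raise ValueError("ClusterCount must be at least 1")
        if not 0.0 <= self.aggregation <= 1.0:
            raise ValueError("Aggregation must be between 0 and 1")

    def make_rng(self):
        """Return an unseeded generator for seed 0, else a seeded one."""
        if self.random_seed == 0:
            return random.Random()
        return random.Random(self.random_seed)


params = ClusteringParams(cluster_count=4, aggregation=0.5, random_seed=42)

# With a nonzero seed, two fresh generators produce the same random choices.
first = params.make_rng().sample(range(100), 3)
second = params.make_rng().sample(range(100), 3)
print(first == second)  # True
```

Fixing the seed is useful when you compare models built with different Aggregation values, because any difference in the clusters then comes from the parameter change rather than from random variation.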

Accessor Values

The following table shows the required and optional predictors for the clustering algorithm. This algorithm uses unsupervised learning and does not require target values.

Member Expression	Sample Expression
Predictor.Predictor specifies the member or member set to use for the predictor domain.	{[Fruit].Children}
Predictor.Sequence defines the sequence to be traversed for the predictor, generally a time dimension range.	{[Jan 1].Level.Members}
Predictor.External (optional) defines the scope of the predictor.	{[East].Children}
Predictor.Anchor (optional) specifies additional restrictions from other dimensions.	{([2001], [Actual], [Sales])}

Model Data

This information will be made available in a later release.

Result Data

This information will be made available in a later release.

Related Information