Description: If two clusters overlap (that is, the record sets they map to overlap), the smaller one, as measured by the estimated size of its record set, can be removed, depending on how large the overlap is. This parameter sets the overlap above which the smaller cluster is removed.

Clusters that overlap by more than this value are removed. Thus, the default setting of 10 means that clusters that overlap by more than 10 out of 10 records are removed. Since this is impossible, a setting of 10 disables cluster overlap filtering, which is the coarsest level of this filter. Lowering the value makes overlap filtering progressively more fine-grained. A value of 9 removes only clusters that overlap almost entirely; the recommended value of 5 removes clusters that overlap by more than roughly half (remember that the overlap is only estimated). Values lower than 5 make overlap filtering quite sensitive and remove clusters that overlap even by a small amount. Note that clusters that do not overlap are never filtered, whatever the setting.
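The threshold behavior described above can be illustrated with a minimal sketch. It is not the product's implementation: the function name `filter_overlapping_clusters`, the use of exact record sets instead of estimated sizes, and the choice to measure overlap as the share of the smaller cluster's records that also appear in a larger, retained cluster are all assumptions made for illustration.

```python
def filter_overlapping_clusters(clusters, threshold=10):
    """Return the names of clusters that survive overlap filtering.

    clusters:  dict mapping a cluster name to the set of record ids it maps to
               (hypothetical representation; the real system only estimates sizes).
    threshold: integer 0-10; a smaller cluster is removed when more than
               `threshold` out of every 10 of its records are shared with a
               larger cluster that has already been kept (assumed definition).
    """
    if not 0 <= threshold <= 10:
        raise ValueError("threshold must be an integer between 0 and 10")

    # Visit larger clusters first so that the smaller one of a pair is removed.
    ordered = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))

    kept = []
    for name, records in ordered:
        removed = False
        for _, larger_records in kept:
            shared = len(records & larger_records)
            # Integer form of: shared / len(records) > threshold / 10.
            # With no shared records this is never true, so clusters that
            # do not overlap are never filtered.
            if shared * 10 > threshold * len(records):
                removed = True
                break
        if not removed:
            kept.append((name, records))
    return [name for name, _ in kept]


example = {
    "a": set(range(1, 11)),    # 10 records
    "b": {8, 9, 10, 11, 12},   # 5 records, 3 of them shared with "a"
    "c": {20, 21},             # no overlap with the others
}
print(filter_overlapping_clusters(example, threshold=10))  # filtering disabled
print(filter_overlapping_clusters(example, threshold=5))   # "b" shares 6 in 10 records
```

In this sketch, a threshold of 10 can never be exceeded, so every cluster is kept, while a threshold of 5 removes "b" because 6 out of every 10 of its records are shared with the larger cluster "a".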

Range: Integer, 0-10 (default: 10)

Recommended value: 5

