oraclesai.clustering.AgglomerativeClustering

class AgglomerativeClustering(n_clusters=2, metric='euclidean', linkage='ward', distance_threshold=None, n_jobs=None, spatial_weights_definition=None)

Agglomerative Clustering Algorithm. Each observation starts in its own cluster; the two closest clusters are then merged into one, and the process repeats until a stopping condition is met or only one cluster remains. When spatial weights are defined, the algorithm performs regionalization: it adds a spatial constraint that forces the members of each cluster to be geographically connected.
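The merge loop described above, including the spatial constraint, can be sketched in plain Python. This is a minimal illustration, not the oraclesai implementation. It assumes average linkage, Euclidean distance and a stop at n_clusters, and it represents the spatial weights as a hypothetical dict that maps each observation index to the set of its neighbor indices. When that dict is supplied, two clusters may merge only if some member of one is a neighbor of some member of the other, so every cluster stays geographically connected.

```python
import math

def agglomerate(X, n_clusters, neighbors=None):
    """Average-linkage agglomerative clustering with an optional spatial constraint.

    X          -- list of feature tuples, one per observation
    n_clusters -- stop merging when this many clusters remain
    neighbors  -- hypothetical spatial weights: {index: set of neighbor indices}
    Returns one integer label per observation.
    """
    clusters = [{i} for i in range(len(X))]  # every observation starts alone

    def distance(a, b):
        # average linkage: mean Euclidean distance over all cross-cluster pairs
        return sum(math.dist(X[i], X[j]) for i in a for j in b) / (len(a) * len(b))

    def connected(a, b):
        # without spatial weights every pair of clusters may merge
        return neighbors is None or any(j in neighbors[i] for i in a for j in b)

    while len(clusters) > n_clusters:
        candidates = [
            (distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
            if connected(clusters[p], clusters[q])
        ]
        if not candidates:  # no spatially connected pair is left to merge
            break
        _, p, q = min(candidates)  # closest allowed pair
        merged = clusters.pop(q)   # q > p, so index p is unchanged by the pop
        clusters[p] |= merged

    labels = [0] * len(X)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Six regions in a row; regions 4 and 5 resemble 0 and 1 but are not adjacent to them.
X = [(1.0,), (1.1,), (5.0,), (5.1,), (1.2,), (1.05,)]
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < len(X)} for i in range(len(X))}
print(agglomerate(X, 2))         # [0, 0, 1, 1, 0, 0]
print(agglomerate(X, 2, chain))  # [0, 0, 1, 1, 1, 1]
```

Without the constraint, the four low-valued regions form one cluster. With it, regions 4 and 5 cannot reach regions 0 and 1 across the high-valued regions 2 and 3, so they join their neighbors instead.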

Parameters:
  • n_clusters – int, default=2. The number of clusters to form.

  • metric – str or callable, default=‘euclidean’. The metric used to calculate the distance between observations.

  • linkage – {‘ward’, ‘complete’, ‘average’, ‘single’}, default=‘ward’. Determines which distance between clusters to use; the algorithm merges the pair of clusters that minimizes this criterion. ‘ward’ minimizes the variance of the clusters being merged. ‘average’ uses the average of the distances between all observations of the two clusters. ‘complete’ uses the maximum distance between all observations of the two clusters. ‘single’ uses the minimum distance between all observations of the two clusters.

  • distance_threshold – float, default=None. The linkage distance threshold. If not None, n_clusters must be None.

  • n_jobs – int, default=None. The number of parallel jobs to run.

  • spatial_weights_definition – SpatialWeightsDefinition, default=None. Spatial relationship specification that defines the criteria used to identify neighbors, for example KNNWeightsDefinition or DistanceBandWeightsDefinition.
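The four linkage options described for the linkage parameter can be written as cluster-to-cluster distances. This sketch assumes Euclidean distance and expresses ‘ward’ as the increase in total within-cluster sum of squares that a merge would cause; the library may report Ward distances on a different scale. The function name linkage_distance is hypothetical.

```python
import math

def linkage_distance(A, B, method):
    """Distance between clusters A and B (lists of coordinate tuples) under a linkage rule."""
    pairs = [math.dist(a, b) for a in A for b in B]  # all cross-cluster distances
    if method == "single":
        return min(pairs)                 # closest cross-cluster pair
    if method == "complete":
        return max(pairs)                 # farthest cross-cluster pair
    if method == "average":
        return sum(pairs) / len(pairs)    # mean over all cross-cluster pairs
    if method == "ward":
        def centroid(P):
            return [sum(c) / len(P) for c in zip(*P)]
        n_a, n_b = len(A), len(B)
        # increase in within-cluster sum of squares caused by merging A and B
        return n_a * n_b / (n_a + n_b) * math.dist(centroid(A), centroid(B)) ** 2
    raise ValueError(f"unknown linkage: {method}")

A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 2.0)]
for method in ("single", "complete", "average", "ward"):
    print(method, round(linkage_distance(A, B, method), 3))
# single 3.0, complete 3.606, average 3.303, ward 9.0
```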
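The neighbor criteria named under spatial_weights_definition can be illustrated with two hypothetical helpers, knn_neighbors and distance_band_neighbors. They are not the KNNWeightsDefinition or DistanceBandWeightsDefinition classes, and their parameters (k and threshold) are assumptions for illustration. Each returns a dict mapping an observation index to the set of its neighbors.

```python
import math

def knn_neighbors(points, k):
    """k-nearest-neighbor relation: each point is linked to its k closest other points."""
    result = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result[i] = {j for _, j in others[:k]}
    return result

def distance_band_neighbors(points, threshold):
    """Distance-band relation: points within `threshold` of each other are neighbors."""
    return {
        i: {j for j, q in enumerate(points) if j != i and math.dist(p, q) <= threshold}
        for i, p in enumerate(points)
    }

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(knn_neighbors(pts, 1))              # {0: {1}, 1: {0}, 2: {1}, 3: {2}}
print(distance_band_neighbors(pts, 2.0))  # {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
```

The k-nearest-neighbor relation need not be symmetric (point 2 lists point 1, but point 1 lists only point 0), and a distance band can leave a point with no neighbors at all; how the library handles these cases is not described here.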

Methods

__init__([n_clusters, metric, linkage, ...])

fit(X[, y, geometries, spatial_weights, crs])

Initially, each observation is assigned to its own cluster; the algorithm then repeatedly merges the two closest clusters according to the linkage parameter until the number of clusters equals n_clusters or the distance between the two closest clusters is greater than distance_threshold.
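The two stopping rules can be separated from the merging itself. The sketch below takes a hypothetical input, the linkage distances of successive merges in the order they would be performed, and counts how many clusters remain. Following the wording above, it stops once the next merge distance is greater than distance_threshold, and it rejects the combination the parameter list forbids (distance_threshold set while n_clusters is not None). With both left as None it merges down to a single cluster, matching the overview. The function name clusters_after_merging is an assumption for illustration.

```python
def clusters_after_merging(merge_distances, n_samples, n_clusters=None, distance_threshold=None):
    """Count the clusters left when merging stops.

    merge_distances -- linkage distance of each successive merge (hypothetical input)
    n_samples       -- number of observations, i.e. the initial number of clusters
    """
    if distance_threshold is not None and n_clusters is not None:
        raise ValueError("n_clusters must be None when distance_threshold is set")
    remaining = n_samples
    for d in merge_distances:
        if n_clusters is not None and remaining <= n_clusters:
            break  # reached the requested number of clusters
        if distance_threshold is not None and d > distance_threshold:
            break  # the closest pair is farther apart than the threshold
        remaining -= 1  # perform the merge
    return remaining

merges = [0.5, 0.8, 1.2, 4.0, 6.0]  # five merges take six observations to one cluster
print(clusters_after_merging(merges, 6, n_clusters=2))            # 2
print(clusters_after_merging(merges, 6, distance_threshold=1.0))  # 4
print(clusters_after_merging(merges, 6))                          # 1
```

Because n_clusters defaults to 2 in the constructor signature, a threshold-based run needs n_clusters=None passed explicitly, for example AgglomerativeClustering(n_clusters=None, distance_threshold=1.0).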

fit_predict(X[, y, geometries, ...])

Trains the clustering model and returns the labels assigned to each observation.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

Attributes

METRIC_PRECOMPUTED

NON_NEIGHBOR_DISTANCE

isoperimetric_quotient_

The Isoperimetric quotient (IPQ) for the resulting clusters.
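The isoperimetric quotient compares a shape's area A with its perimeter P through 4πA/P², which equals 1 for a circle and decreases as the shape becomes elongated or irregular. The sketch below computes it for a single simple polygon given as a list of (x, y) vertices, using the shoelace formula for the area. How the library builds each cluster's geometry and aggregates the per-cluster values into isoperimetric_quotient_ is not stated here, so the function isoperimetric_quotient is only an illustration.

```python
import math

def isoperimetric_quotient(vertices):
    """IPQ = 4*pi*area / perimeter**2 for a simple polygon given as (x, y) vertices."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += math.dist((x1, y1), (x2, y2))
    area = abs(twice_area) / 2
    return 4 * math.pi * area / perimeter ** 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(round(isoperimetric_quotient(square), 3))  # 0.785
print(round(isoperimetric_quotient(strip), 3))   # 0.503
```

The elongated 4 x 1 rectangle scores lower than the square, which is why a higher IPQ is usually read as a more compact region.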

labels_

Array indicating the cluster associated with each sample.

n_clusters_

The number of clusters.

silhouette_score_

The Silhouette score for the resulting clusters.
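The silhouette coefficient of an observation compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the members of the nearest other cluster: s = (b - a) / max(a, b). The score averages s over all observations and ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. This sketch assumes Euclidean distance on the feature vectors and the common convention that an observation alone in its cluster scores 0; which features and distance the library uses for silhouette_score_ is not stated here.

```python
import math

def silhouette_score(X, labels):
    """Mean silhouette coefficient over all observations (Euclidean distance)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: coefficient taken as 0
            continue
        a = sum(math.dist(X[i], X[j]) for j in own) / len(own)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

X = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(round(silhouette_score(X, [0, 0, 1, 1]), 4))  # 0.8997
print(round(silhouette_score(X, [0, 1, 0, 1]), 4))  # -0.45
```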