Oracle Machine Learning for SQL Enhanced k-Means

Oracle Machine Learning for SQL offers an enhanced k-Means algorithm with efficient initialization, scalable parallel model build, and detailed cluster properties.

Oracle Machine Learning for SQL implements an enhanced version of the k-Means algorithm with the following features:

  • Distance function: The algorithm supports the Euclidean and cosine distance functions. The default is Euclidean.

  • Scalable parallel model build: The algorithm uses an efficient, parallelizable initialization method based on Bahmani, Bahman, et al. "Scalable k-means++." Proceedings of the VLDB Endowment 5.7 (2012): 622-633.

  • Cluster properties: For each cluster, the algorithm returns the centroid, a histogram for each attribute, and a rule describing the hyperbox that encloses the majority of the data assigned to the cluster. The centroid reports the mode for categorical attributes and the mean and variance for numerical attributes.
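
The distance functions and the centroid summary listed above can be illustrated with a minimal Python sketch. It is not Oracle's implementation. The function names, the dictionary-based row format, the definition of cosine distance as 1 minus cosine similarity, and the use of population variance are all assumptions made for illustration. The hyperbox rule is not shown.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition); neither vector may be all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def summarize_cluster(rows, numeric_cols, categorical_cols):
    """Centroid-style summary of the rows (dicts) assigned to one cluster.

    Numerical attributes report mean and variance (population variance is an
    assumption); categorical attributes report the mode plus a value histogram.
    """
    summary = {}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        summary[col] = {"mean": mean(values), "variance": pvariance(values)}
    for col in categorical_cols:
        counts = Counter(row[col] for row in rows)
        summary[col] = {
            "mode": counts.most_common(1)[0][0],
            "histogram": dict(counts),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical rows assigned to a single cluster.
    cluster_rows = [
        {"age": 20, "color": "red"},
        {"age": 40, "color": "red"},
        {"age": 30, "color": "blue"},
    ]
    print(euclidean_distance((0, 0), (3, 4)))
    print(cosine_distance((1, 0), (0, 1)))
    print(summarize_cluster(cluster_rows, ["age"], ["color"]))
```

Euclidean distance is sensitive to the magnitude of the attributes, while cosine distance compares only the direction of the vectors, so the choice between them depends on whether scale carries meaning in the data.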

This approach to k-Means avoids the need to build multiple k-Means models and provides clustering results that are consistently superior to those of traditional k-Means.
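
The initialization method cited in the feature list (Bahmani et al., "Scalable k-means++") can be sketched in Python as follows. This is a simplified, single-process illustration of the published idea, not Oracle's code. The function names, the default oversampling factor of 2k, and the fixed number of rounds are assumptions chosen for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_sq_dist(point, centers):
    """Squared distance from a point to its closest center."""
    return min(sq_dist(point, c) for c in centers)


def weighted_kmeans_pp(candidates, weights, k, rng):
    """Reduce the weighted candidate set to at most k centers (k-means++ style)."""
    centers = [rng.choices(candidates, weights=weights, k=1)[0]]
    while len(centers) < k:
        scores = [w * nearest_sq_dist(c, centers)
                  for c, w in zip(candidates, weights)]
        if sum(scores) == 0:  # fewer than k distinct candidates remain
            break
        centers.append(rng.choices(candidates, weights=scores, k=1)[0])
    return centers


def scalable_kmeans_pp_init(points, k, oversampling=None, rounds=5, seed=0):
    """Sketch of scalable k-means++ seeding over a list of tuples."""
    rng = random.Random(seed)
    ell = oversampling if oversampling is not None else 2 * k  # assumed default
    candidates = [rng.choice(points)]
    for _ in range(rounds):
        d2 = [nearest_sq_dist(p, candidates) for p in points]
        cost = sum(d2)
        if cost == 0:  # every point already coincides with a candidate
            break
        # Each point is sampled independently of the others, which is why
        # this pass can be split across parallel workers.
        candidates.extend(
            p for p, d in zip(points, d2)
            if rng.random() < min(1.0, ell * d / cost)
        )
    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(candidates)
    for p in points:
        idx = min(range(len(candidates)),
                  key=lambda i: sq_dist(p, candidates[i]))
        weights[idx] += 1
    return weighted_kmeans_pp(candidates, weights, k, rng)


if __name__ == "__main__":
    # Hypothetical data: a small grid-like set of 2-D points.
    data = [(float(i), float(i % 7)) for i in range(60)]
    print(scalable_kmeans_pp_init(data, k=4, seed=1))
```

Unlike sequential k-means++, which adds one center per pass over the data, this scheme oversamples many candidates in a few passes and then reduces the small candidate set to k seeds, which is what makes the seeding step scale.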