Process Flow

At a high level, the O-Cluster algorithm evaluates possible splitting points, splits the data into new partitions, and searches for cutting planes inside the new partitions.

The O-Cluster algorithm evaluates possible splitting points for all projections in a partition, selects the best one, and splits the data into two new partitions. The algorithm proceeds by searching for good cutting planes inside the newly created partitions. Thus, O-Cluster creates a binary tree structure that divides the input space into rectangular regions with no overlaps or gaps.
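The recursive splitting described above can be illustrated with a minimal Python sketch. It is not the O-Cluster implementation: the names `Node`, `histogram`, `best_valley`, `find_split`, `build`, and `leaves`, the equal-width binning, and the fixed `min_score` threshold are all assumptions made for illustration, standing in for whatever criteria the algorithm actually uses to decide that a separator is valid. The sketch only shows how choosing the best axis-parallel cut and recursing on both sides yields a binary tree of rectangular regions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    points: List[Point]
    dim: Optional[int] = None      # splitting dimension (None for a leaf)
    cut: Optional[float] = None    # position of the cutting plane along `dim`
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def histogram(values: List[float], bins: int):
    """Equal-width histogram of one projection (an assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # avoid zero width when all values match
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, lo, width

def best_valley(counts: List[int]):
    """Return (score, bin_index) of the deepest bin flanked by higher peaks."""
    best = None
    for i in range(1, len(counts) - 1):
        score = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if score > 0 and (best is None or score > best[0]):
            best = (score, i)
    return best

def find_split(points: List[Point], bins: int):
    """Evaluate every projection and return the best (score, dim, cut)."""
    best = None
    for dim in range(len(points[0])):
        counts, lo, width = histogram([p[dim] for p in points], bins)
        valley = best_valley(counts)
        if valley and (best is None or valley[0] > best[0]):
            score, i = valley
            best = (score, dim, lo + (i + 0.5) * width)  # cut at bin centre
    return best

def build(points: List[Point], bins: int = 8, min_score: int = 5) -> Node:
    """Split recursively; `min_score` is a hypothetical validity threshold."""
    node = Node(points)
    if len(points) < 2:
        return node
    split = find_split(points, bins)
    if split is None or split[0] < min_score:
        return node                    # no valid separator: keep as a leaf
    _, node.dim, node.cut = split
    node.left = build([p for p in points if p[node.dim] <= node.cut], bins, min_score)
    node.right = build([p for p in points if p[node.dim] > node.cut], bins, min_score)
    return node

def leaves(node: Node) -> List[Node]:
    """Collect the leaf partitions of the tree."""
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)

# Two well-separated groups of nine points each, differing along dimension 0.
grid = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
data = grid + [(x + 10.0, y) for x, y in grid]
tree = build(data)
```

In this example the root is split once along dimension 0 and both children remain leaves, so `leaves(tree)` returns two partitions. Because every cut is a single threshold on one dimension and each point goes to exactly one side, the leaf regions cannot overlap or leave gaps.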

The main processing stages are:

  1. Load the buffer. Assign all cases from the initial buffer to a single active root partition.

  2. Compute histograms along the orthogonal uni-dimensional projections for each active partition.

  3. Find the best splitting points for active partitions.

  4. Flag ambiguous and frozen partitions.

  5. When a valid separator exists, split the active partition into two new active partitions and start over at step 2.

  6. Reload the buffer after all recursive partitioning on the current buffer is completed. Continue loading cases until the buffer is full again (that is, the number of cases equals the data buffer size) or the end of the data set is reached.

    Note:

    O-Cluster requires at most one pass through the data.
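The buffer handling in steps 1 and 6 can be sketched as a loop that reads the data source exactly once. This is a simplified illustration, not the product's implementation: `single_pass`, `buffer_size`, and the `partition` callback are hypothetical names, and the sketch assumes that the partitioning stage (steps 2 through 5) returns the subset of cases that should stay in the buffer, so that the reload only tops up the freed space.

```python
from itertools import islice
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def single_pass(data: Iterable[T], buffer_size: int,
                partition: Callable[[List[T]], List[T]]) -> int:
    """Feed `data` through a fixed-size buffer; return how many cases were read.

    `partition` is a hypothetical stand-in for steps 2-5. It receives the
    current buffer and returns the cases that remain buffered afterwards.
    """
    stream = iter(data)          # a single iterator: each case is read once
    buffer: List[T] = []
    cases_read = 0
    while True:
        # Reload: fill the free space until the buffer is full or data ends.
        room = max(0, buffer_size - len(buffer))
        fresh = list(islice(stream, room))
        cases_read += len(fresh)
        buffer.extend(fresh)
        if not buffer:
            break                # no buffered cases and no more data
        retained = partition(buffer)
        if not fresh and len(retained) == len(buffer):
            break                # nothing was loaded and nothing was released
        buffer = retained
    return cases_read

# Usage: a partitioning stage that releases every case after processing.
batches: List[List[int]] = []

def release_all(buf: List[int]) -> List[int]:
    batches.append(list(buf))
    return []

total = single_pass(range(10), buffer_size=4, partition=release_all)
```

With ten cases and a buffer of four, the callback sees batches of four, four, and two cases, and `total` is 10: the source is never rewound, which is the sense in which the data is traversed at most once.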