About Distributed HNSW Index Availability and Performance During Cluster Instance Failure

Distributed HNSW (Hierarchical Navigable Small World) indexes in a Real Application Clusters (RAC) environment are designed to remain reliable when the cluster configuration changes. Oracle AI Database continuously monitors cluster events such as instance failures, additions, and removals. It tracks the status, assignment, and ownership of each HNSW slice graph, along with any pending recovery tasks, and keeps this information consistent across all instances.
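To make this bookkeeping concrete, the following Python sketch models a cluster-wide registry of slice graph ownership. It is a minimal illustration under stated assumptions, not Oracle's implementation: the names `SliceStatus`, `SliceRegistry`, the `ORPHANED` state, and the `access_path` helper are all hypothetical. It shows how recording the owner of each slice graph lets a failed instance be mapped to the indexes it affects, and how an index with any unloaded slice graph can be treated as disabled so that queries fall back to a no-index plan.

```python
from dataclasses import dataclass, field
from enum import Enum


class SliceStatus(Enum):
    """Hypothetical lifecycle states for one HNSW slice graph."""
    LOADED = "loaded"
    ORPHANED = "orphaned"  # owning instance failed; graph must be reloaded elsewhere


@dataclass
class SliceRegistry:
    """Hypothetical cluster-wide record of slice graph ownership and status."""
    # (index_name, slice_id) -> id of the instance that hosts the slice graph
    owners: dict[tuple[str, int], int] = field(default_factory=dict)
    # (index_name, slice_id) -> current status of that slice graph
    status: dict[tuple[str, int], SliceStatus] = field(default_factory=dict)

    def assign(self, index: str, slice_id: int, instance: int) -> None:
        # Record (or change) ownership and mark the slice graph as loaded.
        key = (index, slice_id)
        self.owners[key] = instance
        self.status[key] = SliceStatus.LOADED

    def on_instance_failure(self, failed: int) -> set[str]:
        # Mark every slice graph hosted on the failed instance as orphaned and
        # return the indexes that now have at least one orphaned slice graph.
        impacted: set[str] = set()
        for key, owner in self.owners.items():
            if owner == failed:
                self.status[key] = SliceStatus.ORPHANED
                impacted.add(key[0])
        return impacted

    def is_usable(self, index: str) -> bool:
        # Treat an index as enabled only while all of its slice graphs are loaded.
        return all(
            st is SliceStatus.LOADED
            for (idx, _), st in self.status.items()
            if idx == index
        )


def access_path(registry: SliceRegistry, index: str) -> str:
    # Hypothetical planner hook: use the HNSW index when it is enabled,
    # otherwise fall back to a no-index plan such as a full scan.
    return "hnsw_index" if registry.is_usable(index) else "no_index_plan"


if __name__ == "__main__":
    reg = SliceRegistry()
    reg.assign("docs_hnsw", 0, instance=1)
    reg.assign("docs_hnsw", 1, instance=2)
    reg.assign("img_hnsw", 0, instance=1)
    print(reg.on_instance_failure(2))     # {'docs_hnsw'}
    print(access_path(reg, "docs_hnsw"))  # no_index_plan
    print(access_path(reg, "img_hnsw"))   # hnsw_index
```

In this sketch, reassigning the orphaned slice graph with another `assign` call restores its `LOADED` status, which re-enables the index.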

Oracle AI Database automatically performs the following steps to handle cluster reconfiguration and recovery:
  • Disable affected indexes

    If an instance hosting an HNSW slice graph fails or becomes unreachable, Oracle AI Database quickly detects which HNSW indexes are affected. These indexes are temporarily disabled to prevent incomplete or incorrect query results.

  • Temporarily switch to a no-index plan

    Oracle AI Database invalidates the cursors of any open queries or processes that use a disabled index, and the affected queries automatically switch to no-index plans, such as full scans or alternative indexes. Until the HNSW slice graphs from the failed instances are reloaded on available instances, these queries run without the distributed HNSW index. Once the HNSW slice graphs are available again, Oracle AI Database automatically recompiles the invalidated cursors.

  • Redistribute HNSW slice graphs and vector distribution units

    Once the cluster stabilizes, HNSW slice graphs from the failed instances are reassigned to the remaining healthy instances based on their available memory.

    Vector distribution units are recomputed only when required. Because HNSW slice graphs are checkpointed, Oracle AI Database can quickly reload the affected graphs on healthy instances without reprocessing the entire vector dataset.

    Note:

    If multiple instances fail concurrently or the cluster size is permanently reduced, Oracle AI Database may not be able to reassign the HNSW slice graphs directly. If the remaining healthy instances cannot accommodate all HNSW slice graphs for an index, Oracle AI Database falls back to rebuilding the index.
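The redistribution step and its rebuild fallback can be pictured as a placement problem. The Python below is a hypothetical illustration, not the placement algorithm Oracle AI Database actually uses, which this section does not specify: `plan_reassignment` and `recover` are invented names, slice graph sizes and free memory are assumed to be plain megabyte counts, and the greedy largest-first heuristic is a stand-in chosen for simplicity. If any orphaned slice graph cannot fit on a healthy instance, the sketch reports that the index must be rebuilt.

```python
from typing import Dict, Optional, Tuple


def plan_reassignment(
    orphaned: Dict[int, int],     # slice_id -> slice graph size in MB (hypothetical units)
    free_memory: Dict[int, int],  # healthy instance id -> free memory in MB
) -> Optional[Dict[int, int]]:
    """Return slice_id -> target instance, or None if the slices cannot all be placed."""
    remaining = dict(free_memory)  # copy so the caller's dict is not mutated
    plan: Dict[int, int] = {}
    # Place the largest slice graphs first; they are the hardest to fit.
    for slice_id, size in sorted(orphaned.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [inst for inst, free in remaining.items() if free >= size]
        if not candidates:
            return None  # no healthy instance has room for this slice graph
        # Prefer the healthy instance with the most free memory left.
        target = max(candidates, key=lambda inst: remaining[inst])
        plan[slice_id] = target
        remaining[target] -= size
    return plan


def recover(
    orphaned: Dict[int, int], free_memory: Dict[int, int]
) -> Tuple[str, Optional[Dict[int, int]]]:
    # Reload checkpointed slice graphs when they all fit; otherwise rebuild the index.
    plan = plan_reassignment(orphaned, free_memory)
    if plan is None:
        return ("rebuild", None)
    return ("reload", plan)


if __name__ == "__main__":
    print(recover({3: 400, 4: 300}, {1: 500, 2: 350}))  # ('reload', {3: 1, 4: 2})
    print(recover({3: 400, 4: 300}, {1: 500}))          # ('rebuild', None)
```

In the first call, slice graph 3 goes to instance 1 and slice graph 4 to instance 2. In the second call only one healthy instance remains, so the sketch falls back to a rebuild. A greedy pass like this can report that no placement exists even when a tighter packing would succeed; it trades optimality for simplicity.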