About Distributed HNSW Index Availability and Performance During Cluster Instance Failure
Distributed HNSW (Hierarchical Navigable Small World) indexes in a Real Application Clusters (RAC) environment are designed to stay reliable when cluster membership changes. The system continuously monitors instance failures, additions, and removals. It tracks instance local graph status, assignments, recovery tasks, and ownership changes, and keeps this information consistently visible across all instances.
- Disable index
If an instance hosting an instance local graph fails or becomes unreachable, the system quickly detects which HNSW indexes are impacted. The affected HNSW indexes are temporarily disabled to prevent incomplete or incorrect query results.
- Temporarily switch to a no-index plan
Any open queries or processes that use the disabled indexes are invalidated. The affected queries automatically switch to no-index plans, such as full scans or alternative indexes. Until the instance local graphs from the failed instances are reloaded on available instances, queries run without the distributed HNSW index.
- Redistribution of instance local graphs and vector distribution units
Once the cluster stabilizes, instance local graphs from the failed instances are reassigned to available healthy instances based on memory availability.
Vector distribution units are recomputed only when required. Because instance local graphs are checkpointed, the system can quickly reload the affected graphs on healthy instances without reprocessing the entire vector dataset.
Note:
If multiple instances fail concurrently or the cluster size is permanently reduced, the system might not be able to reassign the graphs directly. If the available healthy instances cannot accommodate all instance local graphs for an index, the system falls back to rebuilding the index.
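The decision flow described above (find the affected indexes, try to place the orphaned graphs on healthy instances by free memory, otherwise rebuild) can be illustrated with a small model. This is only a sketch under stated assumptions: the `Graph` class, the `affected_indexes` and `plan_reassignment` functions, the memory figures, and the greedy "most free memory first" placement rule are all hypothetical and are not the database's actual algorithm or API.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Hypothetical model of one instance local graph of a distributed HNSW index."""
    index: str      # name of the HNSW index the graph belongs to
    graph_id: int   # identifier of the graph within that index
    size_mb: int    # memory needed to host the graph
    owner: str      # instance currently hosting the graph


def affected_indexes(graphs, failed):
    """Names of indexes with at least one graph on a failed instance.

    In the text above, these are the indexes that get temporarily disabled.
    """
    return {g.index for g in graphs if g.owner in failed}


def plan_reassignment(graphs, failed, free_mb):
    """Plan where orphaned graphs go, one index at a time.

    free_mb maps each healthy instance to its free memory in MB.
    Returns (moves, rebuild): moves maps (index, graph_id) to the new
    owner; rebuild is the set of indexes whose orphaned graphs do not
    all fit and that therefore fall back to a rebuild.
    """
    free = dict(free_mb)  # copy so the caller's mapping is not modified
    moves, rebuild = {}, set()
    for index in sorted(affected_indexes(graphs, failed)):
        trial = dict(free)  # tentative memory state for this index only
        placed = {}
        orphans = sorted(
            (g for g in graphs if g.index == index and g.owner in failed),
            key=lambda g: g.size_mb,
            reverse=True,  # place the largest graphs first
        )
        for g in orphans:
            # Assumed rule: pick the healthy instance with the most free memory.
            target = max(trial, key=trial.get, default=None)
            if target is None or trial[target] < g.size_mb:
                placed = None  # this index cannot be fully reassigned
                break
            trial[target] -= g.size_mb
            placed[(g.index, g.graph_id)] = target
        if placed is None:
            rebuild.add(index)
        else:
            free = trial  # commit the memory consumed by this index
            moves.update(placed)
    return moves, rebuild


# Example cluster: inst3 fails while hosting one graph of each index.
example_graphs = [
    Graph("docs_idx", 0, 400, "inst1"),
    Graph("docs_idx", 1, 400, "inst2"),
    Graph("docs_idx", 2, 400, "inst3"),
    Graph("img_idx", 0, 900, "inst3"),
    Graph("img_idx", 1, 900, "inst1"),
]
example_moves, example_rebuild = plan_reassignment(
    example_graphs, failed={"inst3"}, free_mb={"inst1": 500, "inst2": 300}
)
```

In this example, both indexes are affected by the loss of `inst3`. The 400 MB graph of `docs_idx` fits on `inst1`, so it is reassigned there, while the 900 MB graph of `img_idx` fits on neither healthy instance, so `img_idx` is marked for rebuild. Planning per index on a tentative copy of the free-memory map keeps a partially placeable index from consuming memory that another index could use.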