HNSW Index Architecture: Transaction Support and Persistence
Hierarchical Navigable Small World (HNSW) indexes are specialized, memory-only structures designed for efficient vector search. The in-memory graph itself is static: once an HNSW index is created, subsequent inserts, updates, or deletes (DML operations) performed on the base table are not reflected in the graph.
To maintain accuracy, performance, and consistency with transactional data operations, Oracle employs several supporting data structures. This page outlines the key internal components used in maintaining transactional vector search and persistence for HNSW indexes.
Transaction Support Structures:
- A private journal is an in-memory, per-transaction structure that resides in the vector memory pool. It captures all vectors inserted or deleted by an active transaction. This is comparable to the transaction journals that are used to maintain the in-memory column store data (explained in the Oracle AI Database In-Memory Guide).
- The shared journal is an on-disk, table-backed component created alongside each HNSW index. It holds the history of committed transactions that modified the indexed vector columns. Each journal entry is associated with a commit SCN, and includes metadata such as inserted and deleted vector identifiers.
- Every HNSW index includes a dedicated ROWID-to-VID mapping table that links each base table row ID (ROWID) to the corresponding vector ID (VID) used in the HNSW graph.
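The interplay of the three structures above can be sketched with a small toy model. This is a conceptual illustration only, not Oracle's implementation: the class names (ToyIndexState, ToyTransaction, JournalEntry), the starting SCN of 100, and the sequential VID assignment are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    """One committed change set in the (toy) shared journal."""
    commit_scn: int
    inserted_vids: list
    deleted_vids: list


@dataclass
class ToyIndexState:
    """Hypothetical stand-in for the per-index structures."""
    rowid_to_vid: dict = field(default_factory=dict)    # ROWID -> VID mapping
    shared_journal: list = field(default_factory=list)  # committed history
    next_vid: int = 0       # assumption: VIDs are handed out sequentially
    current_scn: int = 100  # assumption: arbitrary starting SCN


class ToyTransaction:
    """Buffers changes in a private journal until commit."""

    def __init__(self, index):
        self.index = index
        self.inserted = {}    # private journal: rowid -> vector
        self.deleted = set()  # private journal: rowids deleted

    def insert(self, rowid, vector):
        self.inserted[rowid] = vector

    def delete(self, rowid):
        if rowid in self.inserted:
            del self.inserted[rowid]  # insert + delete in one transaction cancel out
        else:
            self.deleted.add(rowid)

    def rollback(self):
        # Nothing was published, so discarding the private journal is enough.
        self.inserted, self.deleted = {}, set()

    def commit(self):
        idx = self.index
        idx.current_scn += 1
        new_vids = []
        for rowid in self.inserted:
            idx.rowid_to_vid[rowid] = idx.next_vid
            new_vids.append(idx.next_vid)
            idx.next_vid += 1
        # Translate deleted ROWIDs to VIDs through the mapping table.
        gone = [idx.rowid_to_vid[r] for r in sorted(self.deleted)
                if r in idx.rowid_to_vid]
        entry = JournalEntry(idx.current_scn, new_vids, gone)
        idx.shared_journal.append(entry)  # now visible to other sessions
        self.inserted, self.deleted = {}, set()
        return entry
```

In this sketch, uncommitted work never touches the shared journal, and every shared journal entry carries the commit SCN together with the inserted and deleted vector identifiers.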
Graph Refresh and Persistence Structures:
Queries issued after an index is created must consult both the index and the DML changes made since its creation to produce the correct top-K result. As DML changes accumulate, queries become slower, because the exact search over the shared journal vectors becomes more expensive than the approximate search on the currently indexed HNSW graph. To maintain performance and accuracy, Oracle provides automated graph refresh and persistence mechanisms.
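As a rough illustration of why accumulated DML slows queries down, the following toy function merges graph candidates with journal vectors visible at the query SCN. It is a sketch under stated assumptions, not Oracle's algorithm: the function name top_k_with_journal and the tuple layout of the journal are hypothetical, and the graph search is replaced by a brute-force scan purely to keep the example short.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_with_journal(query, k, graph_vectors, journal, query_scn):
    """Return the VIDs of the k nearest vectors visible at query_scn.

    graph_vectors: {vid: vector} already linked into the graph
                   (scanned exhaustively here; a real HNSW search is approximate).
    journal: list of (commit_scn, {vid: vector} inserted, set of deleted vids).
    """
    pending = {}     # journal inserts the query is allowed to see
    deleted = set()  # journal deletes the query is allowed to see
    for commit_scn, inserted, removed in journal:
        if commit_scn <= query_scn:
            pending.update(inserted)
            deleted |= removed

    # Every pending vector is compared exactly, so this part of the work
    # grows linearly with the number of journal entries.
    candidates = {**graph_vectors, **pending}
    scored = [(euclidean(query, vec), vid)
              for vid, vec in candidates.items() if vid not in deleted]
    return [vid for _, vid in heapq.nsmallest(k, scored)]
```

The longer the journal, the more vectors land in the exact-comparison path, which is the cost that a graph refresh is meant to remove.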
- An incremental snapshot represents an updated in-memory version of the HNSW graph associated with a specific SCN. It adds newly inserted vectors into the graph and tracks deletions using compact bitmaps. Snapshots reduce shared journal overhead, improve query performance, and minimize memory usage. They are especially effective when frequent small DML operations occur.
Note:
Any query whose SCN is below the build SCN of the current latest snapshot fails with error
ORA-51815 "INMEMORY NEIGHBOR GRAPH HNSW vector index snapshot is too old."
- When incremental refresh becomes inefficient, usually due to the accumulation of deleted vectors, a full repopulation is triggered to rebuild the HNSW graph from scratch using the current base table. Unlike incremental snapshots, which update an existing graph, full repopulation builds a fresh graph while keeping the old one active for ongoing queries.
Note:
At the time of full repopulation (until a new HNSW graph becomes available), if a query tries to access an older version of the HNSW graph that no longer exists, then the read consistency error
ORA-51815 "INMEMORY NEIGHBOR GRAPH HNSW vector index snapshot is too old." is raised.
- A checkpoint is a disk-based, serialized copy of the HNSW graph's topology and metadata (not the actual vectors). It is created automatically during key events like index creation or repopulation.
A distributed HNSW index creates multiple checkpoints, one for each localized HNSW graph hosted by a RAC node. Checkpointed graphs are not tied to specific physical nodes. Instead, each is associated with whichever node owns the corresponding localized HNSW graph. The system tracks which localized HNSW graph belongs to which nodes, so that the index can be redistributed easily when nodes fail or are removed. On node failure, any RAC node mapped to the same localized HNSW graph as the failed node directly reuses the existing checkpointed graph, leading to faster recovery and less downtime.
You can disable or re-enable full HNSW checkpoints by using the DISABLE_CHECKPOINT and ENABLE_CHECKPOINT procedures of the DBMS_VECTOR PL/SQL package.
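To make the incremental snapshot behavior from the list above more concrete, here is a minimal toy model of a snapshot that folds in journal entries, tracks deletions in a bitmap, and rejects queries older than its build SCN. It is an illustrative sketch only: ToySnapshot and SnapshotTooOld are hypothetical names, the journal is a plain list of tuples, and a Python integer stands in for the compact bitmap.

```python
class SnapshotTooOld(Exception):
    """Toy analogue of the ORA-51815 read consistency error."""


class ToySnapshot:
    def __init__(self, build_scn, vids, deleted_bitmap=0):
        self.build_scn = build_scn
        self.vids = set(vids)                 # vectors linked into the graph
        self.deleted_bitmap = deleted_bitmap  # bit i set => VID i is deleted

    def refresh(self, journal, new_scn):
        """Build a newer snapshot from journal entries in (build_scn, new_scn].

        journal: list of (commit_scn, inserted vids, deleted vids).
        """
        vids = set(self.vids)
        bitmap = self.deleted_bitmap
        for commit_scn, inserted, deleted in journal:
            if self.build_scn < commit_scn <= new_scn:
                vids.update(inserted)      # new vectors join the graph
                for vid in deleted:
                    bitmap |= 1 << vid     # deletions are only marked
        return ToySnapshot(new_scn, vids, bitmap)

    def visible_vids(self, query_scn):
        if query_scn < self.build_scn:
            raise SnapshotTooOld(
                f"query SCN {query_scn} is below build SCN {self.build_scn}")
        return {v for v in self.vids if not (self.deleted_bitmap >> v) & 1}
```

Because deleted vectors are only marked and never unlinked in this model, the bitmap keeps growing with every delete, which mirrors why an accumulation of deleted vectors eventually makes a full repopulation the better choice.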
Explicitly Specifying a Graph Refresh
You can use the idx_rebuild_mode parameter of the DBMS_VECTOR.REBUILD_INDEX procedure to specify how the HNSW graph should be refreshed. This parameter accepts the values FULL or INCREMENTAL, which request a full graph rebuild or an incremental refresh, respectively. By default, idx_rebuild_mode is set to NULL. In this case, the system follows the existing behavior: it drops and recreates the index. The idx_rebuild_mode parameter allows fine-grained control when managing index refresh operations.
An example which triggers a full repopulation:
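BEGIN
  DBMS_VECTOR.REBUILD_INDEX(
    'galaxies_hnsw_idx',        -- index name
    'galaxies',                 -- base table
    'embedding',                -- indexed vector column
    NULL,
    NULL,
    'INMEMORY NEIGHBOR GRAPH',  -- index organization
    'EUCLIDEAN',                -- distance metric
    95,                         -- target accuracy
    'FULL',                     -- idx_rebuild_mode: request a full repopulation
    '{"type" : "HNSW", "neighbors" : 3, "efConstruction" : 4 }');  -- index parameters
END;
/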
Note:
This code assumes that the index galaxies_hnsw_idx is already
created on the galaxies table. See the Oracle documentation on
Hierarchical Navigable Small World (HNSW) index syntax and parameters
for guidance on creating an HNSW index.
HNSW indexes rely on these structures to balance performance and accuracy while retrieving transactionally consistent top-K results. Together, they ensure scalable, low-latency vector search even as tables evolve through frequent updates and deletes.