HNSW Graph Persistence

In addition to the previously defined structures used mostly for transaction consistency, a full checkpoint on-disk structure can also be maintained, if enabled, for faster reload of HNSW indexes after instance restart.

HNSW Full Checkpoints

A full checkpoint is a serialized version of the HNSW graph, stored on disk and containing all the vertices and edges of the HNSW multi-layered graph. A full checkpoint is self-contained and has roughly the same footprint as the corresponding HNSW in-memory graph. As explained in Transaction Support for HNSW Indexes, a full checkpoint can be created at index creation time, at snapshot creation time, and during a repopulation operation.

Note:

  • Full checkpoints are not created with every new snapshot. Instead, a full checkpoint is generated only after a certain number of snapshots have been created.
  • The vectors corresponding to the vertices are not stored in the checkpoint. Because vectors are the primary consumers of space, leaving them out of the checkpoint ensures that storage is not doubled; only the space needed for graph edges and other metadata is used.

HNSW full checkpoints are used to reduce the HNSW graph creation time when a new instance joins an Oracle RAC cluster or when an instance is restarted. The main advantage of using the full checkpoint over using the ROWID-to-VID mapping table and creating a new graph is that the neighbors for a particular vector have already been computed and persisted in the full checkpoint.

Note:

Although an HNSW full checkpoint might not reflect the very latest transactions, because it is not maintained for every DML, it eventually becomes up to date. Checkpoints are created after a certain number of graph refreshes, and each graph refresh happens after a defined number of DMLs.

When you create an HNSW index, full checkpoint creation and maintenance are enabled by default.

Note:

The HNSW full checkpoint can be maintained only if there is adequate space in the user's tablespace.

You can disable or re-enable full HNSW checkpoints by using the DBMS_VECTOR package:

  • Disable drops the existing full checkpoint for a particular index and prevents new full checkpoints from being created:

    DBMS_VECTOR.DISABLE_CHECKPOINT(<schema owning indexes> [, <index name>])

  • Enable (default) means that the next HNSW graph repopulation creates a full checkpoint for a particular HNSW index:

    DBMS_VECTOR.ENABLE_CHECKPOINT(<schema owning indexes> [, <index name>] [, <tablespace name>])

For more information, see the ENABLE_CHECKPOINT and DISABLE_CHECKPOINT procedures in Vector Index Status, Checkpoint, and Advisor Procedures.
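As a minimal sketch of the two calls above, the following PL/SQL blocks disable and then re-enable full checkpoints for a single index. The schema name VECTOR_USER, the index name DOCS_HNSW_IDX, and the tablespace name VECTOR_TS are hypothetical placeholders, and the arguments are passed positionally in the order shown in the syntax above.

```sql
-- Hypothetical names: schema VECTOR_USER, index DOCS_HNSW_IDX, tablespace VECTOR_TS.

BEGIN
  -- Drop the existing full checkpoint for this index and stop creating new ones.
  DBMS_VECTOR.DISABLE_CHECKPOINT('VECTOR_USER', 'DOCS_HNSW_IDX');
END;
/

BEGIN
  -- Re-enable checkpointing; the next graph repopulation creates a full checkpoint.
  -- The third argument (tablespace name) is optional per the syntax above.
  DBMS_VECTOR.ENABLE_CHECKPOINT('VECTOR_USER', 'DOCS_HNSW_IDX', 'VECTOR_TS');
END;
/
```

Because enabling takes effect only at the next graph repopulation, a checkpoint is not expected to appear immediately after the second call.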

You can query the V$VECTOR_GRAPH_INDEX_CHKPT view to track information about full checkpoints at the database level. See V$VECTOR_GRAPH_INDEX_CHKPT for more details.
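For example, a simple query such as the following could be used to inspect the checkpoints that currently exist. It uses the view name given in the reference above and selects all columns, because this section does not list the view's column names.

```sql
-- List the full checkpoints tracked at the database level.
-- SELECT * avoids assuming any column names, which are not given in this section.
SELECT *
FROM   V$VECTOR_GRAPH_INDEX_CHKPT;
```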

A full checkpoint is used to reload the HNSW graph into memory for an instance only if the checkpoint's creation SCN is not too old compared with the instance's current SCN. If the SCN is too old, the instance instead performs a full repopulation of the index using the duplication mechanism.

Consider an index that was created at SCN 100. As DMLs accumulate, incremental snapshots are created at SCN 200 and SCN 400. Because creating a full checkpoint is costly, one is not created for every snapshot. In this case, assume that a full checkpoint is created during index creation at SCN 100 and then again only at SCN 400. A graph reload at SCN 500 would load the graph from the latest full checkpoint, at SCN 400, and then recreate the remaining part of the graph by reading the delta content from the shared journal table.
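The selection step in this example can be reproduced with a small standalone query. This is an illustration of the arithmetic only, not how the database implements reload: the inline full_checkpoints row set and its checkpoint_scn column are hypothetical, and the query ignores the "not too old" check, whose threshold is not stated here.

```sql
-- Hypothetical inline data: the two full checkpoints from the example (SCN 100 and SCN 400).
WITH full_checkpoints (checkpoint_scn) AS (
  SELECT 100 FROM dual UNION ALL
  SELECT 400 FROM dual
)
-- Pick the latest full checkpoint created at or before the reload SCN (500).
SELECT MAX(checkpoint_scn) AS load_from_scn
FROM   full_checkpoints
WHERE  checkpoint_scn <= 500;
```

The query returns 400, so only the changes made between SCN 400 and SCN 500 remain to be applied from the shared journal table.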