Vector Indexes
Hybrid Vector Index
The Hybrid Vector Index (HVI) is a new index that allows users to easily index and query their documents using a combination of full text search and semantic vector search to achieve higher quality search results.
With a single index DDL statement, the HVI simplifies the process of transforming documents into forms that are amenable to both vector similarity search and textual search.
The HVI provides a unified query API that allows users to run textual queries, vector similarity queries, or hybrid queries that leverage both of these approaches. This lets users easily customize the search experience and enhances search results.
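To illustrate what a hybrid query does conceptually, the sketch below merges a text-search ranking and a vector-search ranking into one result list. The source does not say how the HVI combines the two scores; reciprocal rank fusion (RRF) is used here only as one common fusion technique, and the function name `rrf_fuse`, the constant `k=60`, and the document ids are all hypothetical.

```python
def rrf_fuse(text_ranking, vector_ranking, k=60):
    """Fuse two ranked lists of document ids with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in.
    This is an illustrative assumption, not the HVI's documented scoring.
    """
    scores = {}
    for ranking in (text_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


text_hits = ["doc3", "doc1", "doc7"]     # hypothetical keyword-search ranking
vector_hits = ["doc1", "doc4", "doc3"]   # hypothetical similarity-search ranking
hybrid = rrf_fuse(text_hits, vector_hits)
print(hybrid)
```

Documents that rank well in both lists (here `doc1` and `doc3`) rise above documents found by only one of the two searches, which is the intuition behind combining the approaches.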
Vector Indexes
Vector indexes are a class of specialized indexing data structures used to efficiently store and search high-dimensional vector data. A vector index organizes vector data so that similar items (where similarity is defined by the distance between two vectors) are grouped together, which makes the search process highly efficient. Unlike traditional database indexes, vector indexes are commonly used on large datasets to perform approximate similarity searches that trade off query accuracy against query performance, depending on the application's requirements.
This functionality enables efficient similarity searches and faster query performance for AI-driven applications. In addition, the scalability of vector indexes and their support for high-dimensional data improve analytical insights and can lead to informed decision-making and a competitive business advantage.
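The accuracy-versus-performance trade-off can be sketched with a toy neighbor-partition index: vectors are grouped by their nearest centroid, and a query scans only the closest `nprobe` partitions. This is a conceptual illustration in plain Python, not Oracle's implementation; the fixed centroids, the `nprobe` parameter name, and the random data are assumptions made for the example.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_partitions(vectors, centroids):
    """Group vector ids by the centroid each vector is closest to."""
    parts = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        parts[nearest].append(vid)
    return parts


def approx_search(query, vectors, centroids, parts, k, nprobe):
    """Scan only the nprobe partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in parts[c]]
    return sorted(candidates, key=lambda vid: dist(query, vectors[vid]))[:k]


def exact_search(query, vectors, k):
    """Brute-force baseline that compares the query with every vector."""
    return sorted(range(len(vectors)), key=lambda vid: dist(query, vectors[vid]))[:k]


rng = random.Random(0)
vectors = [[rng.random(), rng.random()] for _ in range(200)]
centroids = [[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]]
parts = build_partitions(vectors, centroids)

query = [0.3, 0.3]
fast = approx_search(query, vectors, centroids, parts, k=5, nprobe=1)  # fewer comparisons
full = approx_search(query, vectors, centroids, parts, k=5, nprobe=4)  # scans everything
```

Raising `nprobe` increases the number of vectors compared and moves the result toward the exact answer; lowering it reduces work at the risk of missing true neighbors that fall in unscanned partitions.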
Partition-Local Neighbor Partition Vector Index
This feature enables LOCAL indexing for Neighbor Partition Vector Indexes, optimizing search performance for partitioned tables. Conceptually, it creates a dedicated vector index for each table partition, so that queries with partition key filters search only the relevant index partitions. As a result, vector searches are more efficient, leading to significantly lower response times when querying large partitioned datasets.
Large enterprise datasets are frequently partitioned by relational attributes to optimize performance. By enabling LOCAL Neighbor Partition Vector Indexes, users benefit from enhanced scalability and accelerated query performance through partition pruning. This approach also ensures more efficient data lifecycle management, making it ideal for handling large-scale enterprise workloads.
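The idea of partition pruning with a local index can be sketched as a dictionary holding one small index per partition key. The class below is a hypothetical stand-in (a brute-force scan replaces the real index structure, and `LocalVectorIndex` and its methods are invented for illustration); it only shows that a query carrying a partition key filter touches a single per-partition index.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class LocalVectorIndex:
    """Toy model: one independent vector index per table partition."""

    def __init__(self):
        self.by_partition = {}

    def add(self, partition_key, row_id, vector):
        self.by_partition.setdefault(partition_key, []).append((row_id, vector))

    def search(self, query, k, partition_key=None):
        """Return (top-k row ids, list of partitions that were scanned)."""
        if partition_key is not None:
            # Partition pruning: only the matching index partition is searched.
            scanned = [partition_key] if partition_key in self.by_partition else []
        else:
            scanned = list(self.by_partition)
        rows = [r for p in scanned for r in self.by_partition[p]]
        rows.sort(key=lambda r: dist(query, r[1]))
        return [row_id for row_id, _ in rows[:k]], scanned


idx = LocalVectorIndex()
idx.add("2024", 1, [0.0, 0.0])
idx.add("2024", 2, [1.0, 1.0])
idx.add("2025", 3, [0.1, 0.1])
idx.add("2025", 4, [0.9, 0.9])

pruned_ids, pruned_scan = idx.search([0.0, 0.0], k=1, partition_key="2025")
global_ids, global_scan = idx.search([0.0, 0.0], k=1)
```

With the filter, only the `"2025"` partition is scanned; without it, every partition contributes candidates.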
Persistent Neighbor Graph Vector Indexes
The HNSW vector index is an in-memory, multi-layered graph index. The time taken to recreate the in-memory graph on a restart can be reduced by keeping a checkpoint image of the graph on disk. This feature adds the checkpoint format as well as the framework to take a disk checkpoint and then use it to recreate the in-memory graph structure.
Getting an index access plan after a restart can take a long time. A higher-priority reload based on the disk checkpoint shortens the time taken to get an index access plan after a restart.
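The benefit of a checkpoint can be sketched as follows: building the neighbor graph requires many distance computations, whereas reloading a saved image only requires reading a file. The example uses a single-layer nearest-neighbor graph and a JSON file purely for illustration; the real checkpoint format is not described in the source, and every function name here is hypothetical.

```python
import json
import os
import tempfile


def build_graph(vectors, m=2):
    """Expensive step: link every vector to its m nearest neighbors."""
    graph = {}
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])),
        )
        graph[i] = others[:m]
    return graph


def save_checkpoint(graph, path):
    """Write the graph image to disk (JSON object keys must be strings)."""
    with open(path, "w") as f:
        json.dump({str(node): nbrs for node, nbrs in graph.items()}, f)


def load_checkpoint(path):
    """Read the graph image back and restore integer node ids."""
    with open(path) as f:
        return {int(node): nbrs for node, nbrs in json.load(f).items()}


def load_or_rebuild(vectors, path):
    """On startup, prefer the checkpoint; fall back to a full rebuild."""
    if os.path.exists(path):
        return load_checkpoint(path), "checkpoint"
    return build_graph(vectors), "rebuilt"


vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
path = os.path.join(tempfile.mkdtemp(), "graph.ckpt.json")

first, how_first = load_or_rebuild(vectors, path)    # no checkpoint yet
save_checkpoint(first, path)
second, how_second = load_or_rebuild(vectors, path)  # simulated restart
```

After the simulated restart, the graph comes back from disk without any distance computations, and it is identical to the graph that was built originally.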
Transactional Support for Neighbor Graph Vector Indexes
The HNSW index is an in-memory hierarchical graph index for vector data. In Oracle AI Database 26ai, Release Update 23.4 and 23.5, DML operations were not allowed on tables that have an HNSW index built on their vector columns. This feature enables transactions to be executed on such tables. Moreover, vector search queries that use the HNSW index see transactionally consistent results, based on their read snapshot. Transactional consistency is guaranteed even on Oracle RAC, where the HNSW index is duplicated on all instances in the cluster, DML operations occur on one or more instances in the cluster, and search queries can be executed on any instance in the cluster.
The HNSW index is the fastest vector search index that Oracle offers in Oracle AI Database 26ai. Customers therefore want to use the HNSW index for search queries while also issuing DML modifications on relational or vector columns in the underlying table. Because DML operations may render the in-memory HNSW index structures stale, this feature adds special protocols to guarantee transactionally consistent results.
Additional Predicate Support with Hybrid Vector Search
Hybrid vector search queries can now have additional WHERE clause predicates on columns other than the indexed columns.
Hybrid vector search combines vector-distance and text-based search in a single query. There are situations where it is beneficial to add an additional filter predicate on columns that are not covered by the vector or text-based indexes. The FILTER_BY field provides a method to supply additional filter predicates using standard SQL operators.
Duplicated HNSW Vector Indexes on RAC
HNSW Vector Indexes are now supported on RAC environments through full duplication on all instances of the cluster that have sufficient memory in the Vector Pool. On Oracle Autonomous AI Database Serverless deployments, the Vector Pool is autonomously managed.
All copies of the HNSW index across different RAC instances share the same ROWID-to-VID mapping table on disk. However, each instance builds its in-memory neighbor graph independently, so it is possible to get different results for approximate searches depending on which RAC instance serves the query.
Enterprise customers often deploy Oracle AI Database in RAC environments. This feature enables creation of HNSW vector indexes for RAC through full duplication across all instances of the cluster. Queries directed at any instance of the RAC cluster can take advantage of HNSW vector index plans resulting in ultra-fast similarity searches.
Hybrid Vector Index for JSON
Hybrid Vector Indexes allow document retrieval by integrating full-text search capabilities with semantic vector search techniques, resulting in higher-quality search results. This powerful feature has now been extended to support JSON columns, offering greater flexibility in data indexing and querying.
Creating a Hybrid Vector Index on a JSON column offers a unified query API that enables users to execute various types of searches:
- Textual queries
- Vector similarity queries
- Hybrid queries that leverage both approaches
This versatile functionality allows users to:
- Easily customize the search experience
- Significantly enhance the quality and relevance of search results
IVF Index Online Reorganization
The quality of an IVF index may degrade over time if updates to the base table alter the general vector distribution. It is now possible to reorganize an IVF index while it remains available for DML operations and queries.
IVF indexes can become unbalanced if the source table changes significantly from when the index was originally created. This can potentially impact the performance and quality of the index. With IVF Index online reorganization, it is possible to reorganize the structure of the index while the index remains online and available for DMLs and queries.
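Why an IVF index drifts out of balance, and what a reorganization restores, can be sketched with a few k-means refinement rounds. The sketch is offline and conceptual only: it does not model how the index stays available during reorganization, and the imbalance metric, function names, and data are assumptions made for the example rather than Oracle's algorithm.

```python
def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
    )


def assign(vectors, centroids):
    """Place every vector in the partition of its nearest centroid."""
    parts = [[] for _ in centroids]
    for v in vectors:
        parts[nearest(v, centroids)].append(v)
    return parts


def imbalance(parts):
    """Largest partition size divided by the mean size (1.0 is perfectly even)."""
    sizes = [len(p) for p in parts]
    return max(sizes) / (sum(sizes) / len(sizes))


def reorganize(vectors, centroids, rounds=5):
    """Recompute each centroid as the mean of its partition, a few times over."""
    for _ in range(rounds):
        parts = assign(vectors, centroids)
        centroids = [
            [sum(col) / len(p) for col in zip(*p)] if p else c  # keep empty ones as-is
            for p, c in zip(parts, centroids)
        ]
    return centroids


# Centroids chosen when the index was created.
old_centroids = [[0.0, 0.0], [10.0, 10.0]]
# The table has changed since then: the data now clusters elsewhere.
vectors = [[4.0, 4.0], [6.0, 6.0], [7.0, 7.0],
           [14.0, 14.0], [15.0, 15.0], [16.0, 16.0]]

before = imbalance(assign(vectors, old_centroids))
new_centroids = reorganize(vectors, old_centroids)
after = imbalance(assign(vectors, new_centroids))
```

With the stale centroids, five of the six vectors land in one partition, so a probe of that partition scans most of the data. After reorganization the two partitions hold three vectors each.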
Included Columns Support for JSON, BLOB and CLOB Data Types
Included Columns in IVF (Neighbor Partition) vector indexes can now be of type JSON, BLOB, or CLOB.
Included Columns permit additional non-vector columns in a base table to be stored in an IVF (Neighbor Partition) vector index. By storing the extra columns in the index, the query execution no longer needs an additional table access to retrieve the underlying columns from the base table.
Included Columns in Neighbor Partition Vector Indexes
Included columns in vector indexes facilitate faster searches with attribute filters by incorporating non-vector columns within a Neighbor Partition Vector Index. This feature optimizes query execution by removing the need to access these columns from the base table.
Sophisticated workloads often combine business data search on relational columns with vector similarity search. Having included columns in Neighbor Partition Vector Indexes significantly enhances enterprise search capabilities by integrating attribute filters with vector-based similarity searches.
This integration facilitates efficient execution of complex queries by:
- Directly evaluating attribute filters in tandem with vector searches
- Eliminating the need for base table access through expensive join operations
Moreover, when the index includes all columns required for a query as covering columns, data can be retrieved directly from the index, thereby accelerating query performance.
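The effect of included columns on a filtered search can be sketched by counting base-table lookups. In the toy model below, an index entry optionally carries a copy of selected non-vector columns; when the filter column is present in the entry, the predicate is evaluated directly from the index. The `Table` class, `build_index`, `filtered_search`, and the sample rows are hypothetical and do not reflect Oracle's internal structures.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Table:
    """Base table that counts how often a row has to be fetched."""

    def __init__(self, rows):
        self.rows = rows
        self.lookups = 0

    def fetch(self, row_id):
        self.lookups += 1
        return self.rows[row_id]


def build_index(table, vector_col, included=()):
    """Index entries: (row id, vector, copies of the included columns)."""
    return [
        (rid, row[vector_col], {c: row[c] for c in included})
        for rid, row in table.rows.items()
    ]


def filtered_search(index, table, query, k, column, value):
    """Nearest rows whose `column` equals `value`, visiting entries by distance."""
    hits = []
    for rid, vec, extra in sorted(index, key=lambda e: dist(query, e[1])):
        # Use the included copy when present; otherwise go to the base table.
        attr = extra[column] if column in extra else table.fetch(rid)[column]
        if attr == value:
            hits.append(rid)
        if len(hits) == k:
            break
    return hits


rows = {
    1: {"emb": [0.0, 0.0], "category": "shoes"},
    2: {"emb": [0.1, 0.0], "category": "hats"},
    3: {"emb": [0.2, 0.0], "category": "shoes"},
    4: {"emb": [5.0, 5.0], "category": "shoes"},
}

plain_table = Table(rows)
plain_hits = filtered_search(build_index(plain_table, "emb"),
                             plain_table, [0.0, 0.0], 2, "category", "shoes")

covered_table = Table(rows)
covered_hits = filtered_search(build_index(covered_table, "emb", included=("category",)),
                               covered_table, [0.0, 0.0], 2, "category", "shoes")
```

Both searches return the same rows, but only the index without included columns has to visit the base table to evaluate the filter.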
Partition Maintenance Operations and Direct Load with Global IVF and HNSW Indexes
Partition Maintenance Operations can now be performed on partitioned tables that have global IVF and HNSW Indexes. These operations can be applied to tables that have been partitioned using various methods, including RANGE, LIST, HASH, and COMPOSITE.
Partition maintenance operations such as adding, dropping, merging, and splitting partitions can be performed on tables with global IVF and HNSW Indexes. One of the key benefits of partitioning is the added flexibility of being able to perform maintenance operations on a subset (or partition) of a table in isolation, without impacting the rows in neighboring partitions. This flexibility now extends to tables with vector columns.