Perform Multi-Vector Similarity Search

Another major use case of vector search is multi-vector search. It is typically associated with multi-document search, where documents are split into chunks that are individually embedded into vectors.

A multi-vector search retrieves the top-K vector matches using grouping criteria, known as partitions, that are based on the documents' characteristics. The ability to score documents by the similarity of their chunks to a query vector is provided in SQL by the partitioned row-limiting clause.

With multi-vector search, it is easier to write SQL statements to answer the following type of question:

If they exist, what are the four best matching sentences found in the three best matching paragraphs of the two best matching books?

For example, if each book in your database is organized into paragraphs containing sentences that have vector embedding representations, you can answer the previous question using a single SQL statement such as:

SELECT bookId, paragraphId, sentence
FROM books
ORDER BY vector_distance(sentence_embedding, :sentence_query_vector)
FETCH EXACT FIRST 2 PARTITIONS BY bookId, 3 PARTITIONS BY paragraphId, 4 ROWS ONLY;

You can also use an approximate similarity search instead of an exact similarity search as shown in the following example:

SELECT bookId, paragraphId, sentence
FROM books
ORDER BY vector_distance(sentence_embedding, :sentence_query_vector)
FETCH FIRST 2 PARTITIONS BY bookId, 3 PARTITIONS BY paragraphId, 4 ROWS ONLY
WITH TARGET ACCURACY 90;

Note:

All the rows returned are ordered by VECTOR_DISTANCE() and not grouped by the partition clause.

Note:

The APPROX and APPROXIMATE keywords are optional. If omitted while connected to an ADB-S instance, an approximate search using a vector index is attempted if one exists.

Semantically, the previous SQL statement is interpreted as:

Sort all records in the books table in ascending order of the vector distance between the sentences and the query vector, so that the closest matches come first.
For each record in this order, check its bookId and paragraphId. The record is produced if the following three conditions are met:
1. Its bookId is one of the first two distinct bookId values in the sorted order.
2. Its paragraphId is one of the first three distinct paragraphId values in the sorted order within the same bookId.
3. It is one of the first four records within the same bookId and paragraphId combination.
Otherwise, this record is filtered out.
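The filtering rules above can be modeled outside the database to make them concrete. The following Python sketch is only an illustration of the described semantics, not how the database evaluates the clause. The function name partitioned_fetch, the row layout, the sample data, and the use of Euclidean distance are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, used here as a stand-in metric

def partitioned_fetch(rows, query, n_books=2, n_paragraphs=3, n_rows=4):
    """Toy model of: FETCH FIRST n_books PARTITIONS BY bookId,
    n_paragraphs PARTITIONS BY paragraphId, n_rows ROWS ONLY."""
    # Sort by distance to the query vector, closest first.
    ordered = sorted(rows, key=lambda r: dist(r["embedding"], query))

    books = []       # first distinct bookId values, in sorted order
    paragraphs = {}  # bookId -> first distinct paragraphId values
    counts = {}      # (bookId, paragraphId) -> rows produced so far
    result = []
    for r in ordered:
        b, p = r["bookId"], r["paragraphId"]
        # Condition 1: bookId is among the first n_books distinct values.
        if b not in books:
            if len(books) >= n_books:
                continue
            books.append(b)
        # Condition 2: paragraphId is among the first n_paragraphs
        # distinct values within this bookId.
        paras = paragraphs.setdefault(b, [])
        if p not in paras:
            if len(paras) >= n_paragraphs:
                continue
            paras.append(p)
        # Condition 3: at most n_rows records per (bookId, paragraphId).
        if counts.get((b, p), 0) >= n_rows:
            continue
        counts[(b, p)] = counts.get((b, p), 0) + 1
        result.append(r)
    return result

# Hypothetical sample data: 2-D embeddings at increasing distance from (0, 0).
rows = [
    {"bookId": 1, "paragraphId": 1, "sentence": "a", "embedding": (0.0, 0.1)},
    {"bookId": 1, "paragraphId": 1, "sentence": "b", "embedding": (0.0, 0.2)},
    {"bookId": 2, "paragraphId": 1, "sentence": "c", "embedding": (0.0, 0.3)},
    {"bookId": 1, "paragraphId": 2, "sentence": "d", "embedding": (0.0, 0.4)},
    {"bookId": 3, "paragraphId": 1, "sentence": "e", "embedding": (0.0, 0.5)},
    {"bookId": 2, "paragraphId": 1, "sentence": "f", "embedding": (0.0, 0.6)},
]

print([r["sentence"] for r in partitioned_fetch(rows, (0.0, 0.0))])
# ['a', 'b', 'c', 'd', 'f']
```

In this sample, sentence "e" is dropped because book 3 is the third distinct book in the sorted order, while "f" is kept because book 2 was already admitted. The output also stays in distance order rather than being grouped by book.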

Multi-vector similarity search is not limited to documents. It can also be used to answer requests such as the following:

Return the top K closest matching photos but ensure that they are photos of different people.
Find the top K songs with two or more audio segments that best match this sound snippet.
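The first request amounts to keeping at most one row per person for the K closest people. As a rough illustration of that behavior, the Python sketch below models it in memory. The function name top_k_distinct_people, the personId and photoId fields, and the Euclidean distance metric are hypothetical choices for this example, not part of any documented API.

```python
from math import dist  # Euclidean distance, used here as a stand-in metric

def top_k_distinct_people(photos, query, k):
    """Return the closest photo for each of the k nearest distinct people."""
    result, seen = [], set()
    # Walk the photos from closest to farthest.
    for photo in sorted(photos, key=lambda p: dist(p["embedding"], query)):
        if len(result) == k:
            break  # k distinct people already found
        if photo["personId"] in seen:
            continue  # a closer photo of this person was already kept
        seen.add(photo["personId"])
        result.append(photo)
    return result

# Hypothetical sample data: two photos of "ann", one each of "bob" and "cy".
photos = [
    {"photoId": 10, "personId": "ann", "embedding": (1.0, 0.0)},
    {"photoId": 11, "personId": "ann", "embedding": (2.0, 0.0)},
    {"photoId": 12, "personId": "bob", "embedding": (3.0, 0.0)},
    {"photoId": 13, "personId": "cy", "embedding": (4.0, 0.0)},
]

print([p["photoId"] for p in top_k_distinct_people(photos, (0.0, 0.0), k=2)])
# [10, 12]
```

Photo 11 is closer to the query than photo 12, but it is skipped because a photo of the same person has already been returned.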

Note:

The partitioned row-limiting clause is a generic extension of the SQL language. It is not limited to vector searches.
Multi-vector search is currently supported with the IVF index.

Multi-Vector Search Using IVF Indexes

In many real-world use cases, multi-vector similarity searches must be performed on large datasets, requiring efficient indexing to improve performance. Oracle AI Database supports accelerated similarity searches using IVF vector indexes, which can significantly reduce latency and resource utilization compared to scanning entire tables.

Set Up Schema and IVF Index for Multi-Vector Search

This section describes how to prepare your data model for multi-vector queries by defining a base table that stores vector embeddings alongside the business identifiers used for multi-vector grouping and optional filter columns. It also explains how to create an IVF vector index.