Set Up Schema and IVF Index for Multi-Vector Search
This section describes how to prepare your data model for multi-vector queries by defining a base table that stores vector embeddings alongside the business identifiers used for multi-vector grouping and optional filter columns. It also explains how to create an IVF vector index.
Define the Table Schema
Create a table to store your vector-embedded data along with the attributes
that you will use for partitioning (for example, book_id and
paragraph_id) and query filtering (for example,
create_date). In this example, each book contains multiple paragraphs,
and each row stores an individually embedded sentence vector.
CREATE TABLE books (
record_id NUMBER,
book_id NUMBER,
paragraph_id NUMBER,
sentence VECTOR,
create_date DATE
);
Create IVF Indexes
An IVF vector index speeds up vector similarity search by clustering vectors into centroids and searching only the most relevant centroid partitions at query time. For multi-vector queries, you can further improve performance by defining included columns in the index. Included columns are non-vector attributes that are stored with the IVF index structures, enabling the optimizer to satisfy eligible queries without reading the base table.
- project non-vector attributes (for example,
book_id,paragraph_id) - filter on those attributes (for example,
create_date) - use them in
PARTITIONS BYclauses
sentence
column to speed up the multi-vector similarity
search:CREATE VECTOR INDEX ivf_idx ON books (sentence)
ORGANIZATION NEIGHBOR PARTITIONS
WITH TARGET ACCURACY 95
DISTANCE COSINE
PARAMETERS (type IVF, neighbor partitions 100);The IVF index is created on the sentence column of the
books table, clustering vectors into 100 partitions for efficient
similarity search with 95% accuracy using squared Euclidean distance.
Note:
COSINE is used here as an example distance function. Multi-vector
similarity search works with any supported distance function.
Example: IVF Index With Included Columns
The following example creates an IVF vector index on the
sentence vector column and includes book_id,
paragraph_id, and create_date to support common
multi-vector grouping and filtering patterns:
CREATE VECTOR INDEX ivf_idx ON books (sentence)
INCLUDE (book_id, paragraph_id, create_date)
ORGANIZATION NEIGHBOR PARTITIONS
WITH TARGET ACCURACY 95
DISTANCE COSINE
PARAMETERS (type IVF, neighbor partitions 100);
Example: No-Filter Multi-Vector Query
SELECT book_id, paragraph_id
FROM books
ORDER BY vector_distance(sentence, :query_vector, COSINE)
FETCH APPROX FIRST 2 PARTITIONS BY book_id,
3 PARTITIONS BY paragraph_id,
4 ROWS ONLY;The query retrieves the two most relevant books (based on vector distance), then the three best paragraphs within each selected book, and finally the four closest sentences per (book, paragraph). The database can use the IVF index to identify the nearest centroids and search only the most relevant centroid partitions, improving response time by avoiding a full table scan and large global sort.
Example: Pre-Filter Multi-Vector Query (Filter on an Included Column)
create_date) in
place, you can add a filter on an included column while still using multi-level
partitioning.SELECT book_id, paragraph_id
FROM books
WHERE EXTRACT(YEAR FROM create_date) = 2020
ORDER BY VECTOR_DISTANCE(sentence, :query_vector, COSINE)
FETCH APPROX FIRST 2 PARTITIONS BY book_id,
3 PARTITIONS BY paragraph_id,
4 ROWS ONLY;This query restricts the search to rows from year 2020 first, then applies the same
multi-vector logic that return results from the two most relevant books, then the three best
paragraphs per selected book, and finally the four closest sentences per (book, paragraph).
Because create_date is an included column in the IVF index, the database
can apply the filter while leveraging the IVF index to focus on the most relevant centroid
partitions, reducing the need for a full table scan and large global sort.
Example: Multi-Vector Query with an Expression on a Partitioned Column
PARTITIONS BY clause to
group results at a derived level (for example, by year), while still ranking by vector
distance.SELECT book_id,
paragraph_id,
TRUNC(create_date, 'YYYY') AS create_year
FROM books
ORDER BY VECTOR_DISTANCE(sentence, :query_vector, COSINE)
FETCH APPROX FIRST 2 PARTITIONS BY book_id,
3 PARTITIONS BY TRUNC(create_date, 'YYYY'),
4 ROWS ONLY;Here, the IVF index is used to prune the search space early by restricting evaluation to the most relevant centroid partitions, and then the multi-vector logic applies expression-based partitioning to enforce grouping in the final output.
Example: Multi-Vector Similarity Search as a Subquery
The following example demonstrates how a full multi-vector partitioned similarity search, leveraging the IVF vector index for efficient top-K retrieval at multiple grouping levels, can be wrapped as a subquery:
SELECT *
FROM (
SELECT /*+ vector_index_transform(CHUNKDOC_TAB) */
pub_id, doc_id, chunk_id, sell_region, rating,
vector_distance(embedding, :query_vec_1, COSINE) AS distance
FROM chunkdoc_tab
ORDER BY distance
FETCH FIRST 2 PARTITIONS BY pub_id,
3 PARTITIONS BY doc_id,
4 PARTITIONS BY chunk_id,
2 ROWS ONLY
)
WHERE distance > 0.5
ORDER BY pub_id, doc_id;
The outer query then applies additional filtering (such as distance > 0.5) and sorting. This pattern makes it easy to filter or organize your results after performing a fast, flexible vector search across groups in your data.
Example: Multi-Vector Join Query
You can also perform multi-vector similarity search in combination with a join to enrich
your results with additional attributes from another table. For example, to join the results
of the vector search on the books table with publisher region information
from a publisher_info table:
SELECT /*+ VECTOR_INDEX_TRANSFORM(b) */
b.book_id,
b.paragraph_id,
b.sentence,
p.publisher_name,
p.region,
VECTOR_DISTANCE(b.sentence, :query_vector, COSINE) AS similarity
FROM books b
JOIN publisher_info p ON b.book_id = p.book_id
ORDER BY similarity
FETCH FIRST 3 PARTITIONS BY (p.region),
2 PARTITIONS BY (b.book_id),
2 ROWS ONLY;
This query finds, for each region, the two most relevant books, and for each book, the two
closest sentences, all ranked by vector similarity to the query. By joining with
publisher_info, you can include publisher metadata in your search
results.
Parent topic: Perform Multi-Vector Similarity Search