Set Up Schema and IVF Index for Multi-Vector Search

This section describes how to prepare your data model for multi-vector queries by defining a base table that stores vector embeddings alongside the business identifiers used for multi-vector grouping and optional filter columns. It also explains how to create an IVF vector index.

Define the Table Schema

Create a table to store your vector-embedded data along with the attributes that you will use for partitioning (for example, book_id and paragraph_id) and query filtering (for example, create_date). In this example, each book contains multiple paragraphs, and each row stores an individually embedded sentence vector.

CREATE TABLE books (
  record_id     NUMBER,
  book_id       NUMBER,
  paragraph_id  NUMBER,
  sentence      VECTOR,
  create_date   DATE
);

Create IVF Indexes

An IVF vector index speeds up vector similarity search by clustering vectors into centroids and searching only the most relevant centroid partitions at query time. For multi-vector queries, you can further improve performance by defining included columns in the index. Included columns are non-vector attributes that are stored with the IVF index structures, enabling the optimizer to satisfy eligible queries without reading the base table.

Use included columns when your multi-vector queries commonly:

project non-vector attributes (for example, book_id, paragraph_id)
filter on those attributes (for example, create_date)
use them in PARTITIONS BY clauses

The following example creates an IVF index on the sentence column to speed up the multi-vector similarity search:

CREATE VECTOR INDEX ivf_idx ON books (sentence)
  ORGANIZATION NEIGHBOR PARTITIONS
  WITH TARGET ACCURACY 95
  DISTANCE COSINE
  PARAMETERS (type IVF, neighbor partitions 100);

The IVF index is created on the sentence column of the books table, clustering vectors into 100 partitions for efficient similarity search with 95% accuracy using squared Euclidean distance.

Note:

COSINE is used here as an example distance function. Multi-vector similarity search works with any supported distance function.

Example: IVF Index With Included Columns

The following example creates an IVF vector index on the sentence vector column and includes book_id, paragraph_id, and create_date to support common multi-vector grouping and filtering patterns:

CREATE VECTOR INDEX ivf_idx ON books (sentence)
  INCLUDE (book_id, paragraph_id, create_date)
  ORGANIZATION NEIGHBOR PARTITIONS
  WITH TARGET ACCURACY 95
  DISTANCE COSINE
  PARAMETERS (type IVF, neighbor partitions 100);

Example: No-Filter Multi-Vector Query

With your schema and IVF index in place, you can perform efficient top-K queries with multi-level partitioning.

SELECT book_id, paragraph_id
FROM books
ORDER BY vector_distance(sentence, :query_vector, COSINE)
FETCH FIRST 2 PARTITIONS BY book_id,
                   3 PARTITIONS BY paragraph_id,
                   4 ROWS ONLY;

The query retrieves the two most relevant books (based on vector distance), then the three best paragraphs within each selected book, and finally the four closest sentences per (book, paragraph). The database can use the IVF index to identify the nearest centroids and search only the most relevant centroid partitions, improving response time by avoiding a full table scan and large global sort.

Example: Pre-Filter Multi-Vector Query (Filter on an Included Column)

With your schema and IVF index (including create_date) in place, you can add a filter on an included column while still using multi-level partitioning.

SELECT book_id, paragraph_id
FROM books
WHERE EXTRACT(YEAR FROM create_date) = 2020
ORDER BY VECTOR_DISTANCE(sentence, :query_vector, COSINE)
FETCH FIRST 2 PARTITIONS BY book_id,
                   3 PARTITIONS BY paragraph_id,
                   4 ROWS ONLY;

This query restricts the search to rows from year 2020 first, then applies the same multi-vector logic that return results from the two most relevant books, then the three best paragraphs per selected book, and finally the four closest sentences per (book, paragraph). Because create_date is an included column in the IVF index, the database can apply the filter while leveraging the IVF index to focus on the most relevant centroid partitions, reducing the need for a full table scan and large global sort.

Example: Multi-Vector Query with an Expression on a Partitioned Column

You can also use an expression in the PARTITIONS BY clause to group results at a derived level (for example, by year), while still ranking by vector distance.

SELECT book_id,
       paragraph_id,
       TRUNC(create_date, 'YYYY') AS create_year
FROM books
ORDER BY VECTOR_DISTANCE(sentence, :query_vector, COSINE)
FETCH FIRST 2 PARTITIONS BY book_id,
                   3 PARTITIONS BY TRUNC(create_date, 'YYYY'),
                   4 ROWS ONLY;

Here, the IVF index is used to prune the search space early by restricting evaluation to the most relevant centroid partitions, and then the multi-vector logic applies expression-based partitioning to enforce grouping in the final output.

Example: Multi-Vector Similarity Search as a Subquery

The following example demonstrates how a full multi-vector partitioned similarity search, leveraging the IVF vector index for efficient top-K retrieval at multiple grouping levels, can be wrapped as a subquery:

SELECT *
 FROM (
        SELECT /*+ vector_index_transform(CHUNKDOC_TAB) */
               pub_id, doc_id, chunk_id, sell_region, rating,
               vector_distance(embedding, :query_vec_1, COSINE) AS distance
        FROM   chunkdoc_tab
        ORDER BY distance
        FETCH FIRST 2 PARTITIONS BY pub_id, 
                    3 PARTITIONS BY doc_id, 
                    4 PARTITIONS BY chunk_id, 
                    2 ROWS ONLY
    )
 WHERE distance > 0.5
 ORDER BY pub_id, doc_id;

The outer query then applies additional filtering (such as distance > 0.5) and sorting. This pattern makes it easy to filter or organize your results after performing a fast, flexible vector search across groups in your data.

Example: Multi-Vector Join Query

You can also perform multi-vector similarity search in combination with a join to enrich your results with additional attributes from another table. For example, to join the results of the vector search on the books table with publisher region information from a publisher_info table:

SELECT /*+ VECTOR_INDEX_TRANSFORM(b) */
       b.book_id,
       b.paragraph_id,
       b.sentence,
       p.publisher_name,
       p.region,
       VECTOR_DISTANCE(b.sentence, :query_vector, COSINE) AS similarity
FROM   books b
JOIN   publisher_info p ON b.book_id = p.book_id
ORDER BY similarity
FETCH FIRST 3 PARTITIONS BY (p.region),
            2 PARTITIONS BY (b.book_id),
            2 ROWS ONLY;

This query finds, for each region, the two most relevant books, and for each book, the two closest sentences, all ranked by vector similarity to the query. By joining with publisher_info, you can include publisher metadata in your search results.

Parent topic: Perform Multi-Vector Similarity Search