Included Columns with HNSW Indexes

Consider a classic example where you are analyzing the home price data. You want only the top 5 homes from your description where the price is less than 2 million dollars and/or from the specific zip code. Or to a finer granularity, you just need to retrieve only the top home that matches the description and your filter attribute. Included columns make this possible by incorporating filterable attributes, such as price and zip code, directly into the vector index. This integration allows the index to evaluate and return results based on both the vector similarity and the attribute filters, without requiring additional steps to cross-reference the base table.

By bridging the gap between traditional database filters and AI-powered vector searches, included columns deliver faster, more efficient results. This synergy enhances search performance, especially in use cases requiring fine-grained filtering alongside advanced similarity computations.

Syntax and Specifications

Create a base table, which contains house sales data with prices, descriptions, and vectors:

CREATE TABLE houses (id NUMBER, zip_code NUMBER, 
    price NUMBER, description CLOB, data_vector VECTOR);

Then, you can create an HNSW vector index with an included column on price as follows:

CREATE VECTOR INDEX vidx_hnsw
ON houses (data_vector)
INCLUDE (price)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH TARGET ACCURACY 95;

The above syntax creates a vector index, VIDX_HNSW, with ORGANIZATION as INMEMORY NEIGHBOR GRAPH on the DATA_VECTOR column where PRICE is an included column. The INCLUDE keyword allows a user to specify the attributes they wish to include in the HNSW index.

You can check the included columns by querying the V$VECTOR_GRAPH_INDEX view:

SELECT OWNER, INDEX_NAME, COVERING_COLS
FROM V$VECTOR_GRAPH_INDEX;


OWNER   INDEX_NAME   COVERING_COLS
------- ------------ --------------------------------
SYS     VIDX_HNSW    PRICE

Note:

Restrictions on INCLUDE :

The INCLUDE list can contain:

Limited to a maximum of 31 columns - Oracle indexes support 32 key columns.
Supports only the following data types : NUMBER, CHAR, VARCHAR2, DATE, TIMESTAMP, RAW, JSON, and reference-based LOB types such as BLOB and CLOB.
Does not support NLS types and LONG types.

Benefits of Using Included Columns

HNSW No-Filter with and without Included Columns:

If you consider a non-filter query that fetches the five homes nearest to a specified vector, the resulting query will result in an execution plan with multiple joins.

SELECT /*+ VECTOR_INDEX_TRANSFORM(t) */ price 
FROM houses t 
ORDER BY VECTOR_DISTANCE(data_vector, :query_vector) 
FETCH APPROX FIRST 5 ROWS ONLY;

Note:

query_vector contains the actual input vector. You can use the instructions mentioned in SQL Quick Start Using a FLOAT32 Vector Generator to generate query_vector.

The multiple joins are displayed in the following execution plan:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2484688059

--------------------------------------------------------------------------------------------
| Id | Operation                      | Name        | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |             |    5 |    20 |      3 (67)| 00:00:01 |
|* 1 |  COUNT STOPKEY                 |             |      |       |            |          |
|  2 |   VIEW                         |             |    5 |    20 |      3 (67)| 00:00:01 |
|* 3 |    SORT ORDER BY STOPKEY       |             |    5 |  7965 |      3 (67)| 00:00:01 |
|  4 |     TABLE ACCESS BY INDEX ROWID| HOUSES      |    5 |  7965 |      2 (50)| 00:00:01 |
|  5 |      VECTOR INDEX HNSW SCAN    | VIDX_HNSW   |    5 |  7965 |      2 (50)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)

You can see in the execution plan that the VIDX_HNSW index is scanned for the closest vectors and then the base table HOUSES is accessed by the INDEX ROWID returned. This is because after we have located the vectors in the HNSW Index, we return to the HOUSES table to retrieve the non-indexed columns from the base table.

The following execution plan displays the same operation when using an In-Memory Neighbor Graph vector index (HNSW) created with included columns:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 656317923

---------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |     5 |    20 |      3 (67)| 00:00:01 |
|* 1 |  COUNT STOPKEY            |            |       |       |            |          |
|  2 |   VIEW                    |            |     5 |    20 |      3 (67)| 00:00:01 |
|* 3 |    SORT ORDER BY STOPKEY  |            |     5 |  7965 |      3 (67)| 00:00:01 |
|  4 |     VECTOR INDEX HNSW SCAN| VIDX_HNSW  |     5 |  7965 |      2 (50)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)

As you can see in the execution plan above, adding included columns to a vector index ensures there is no base table pre-filter evaluation required. The query can be resolved by scanning only the HNSW vector index as the columns queried can be sourced directly from the index.

HNSW Filtering with Included Columns:

One of the more commonly used plans for neighbor graph vector indexes is the pre-filter plan. In this case you are filtering the contents of the base table to eliminate non-relevant rows. But evaluating a query without using a filter attribute can be very expensive as the operation involves several joins. The following query uses a neighbor partition vector index without included columns:

SELECT /*+ vector_index_transform(houses vidx_hnsw in_filter_with_join_back) */ 
price
FROM houses
WHERE price < 2000000
ORDER BY vector_distance(data_vector, :query_vector)
FETCH APPROX FIRST 5 ROWS ONLY;

Note:

query_vector contains the actual input vector. You can use the instructions mentioned in SQL Quick Start Using a FLOAT32 Vector Generator to generate query_vector.

On the other hand, if you use the same in-filter query on the HNSW index created with price as the included column, the execution plan shows fewer joins, as the join with the base table is avoided. This is shown in the plan below:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Plan hash value: 942384497

------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                 |     5 |    20 |  1272   (1)| 00:00:01 |
|* 1 |  COUNT STOPKEY                      |                 |       |       |            |          |
|  2 |   VIEW                              |                 |     5 |    20 |  1272   (1)| 00:00:01 |
|* 3 |    SORT ORDER BY STOPKEY            |                 |     5 |  8025 |  1272   (1)| 00:00:01 |
|  4 |     VECTOR INDEX HNSW SCAN IN-FILTER| VIDX_HNSW       |     5 |  8025 |  1271   (1)| 00:00:01 |
|  5 |      VIEW                           | VW_HIJ_3C2A15A9 |     1 |       |     2   (0)| 00:00:01 |
|* 6 |       FILTER                        |                 |       |       |            |          |
|  7 |        FAST DUAL                    |                 |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)
   6 - filter("HOUSES"."PRICE"<2000000)