Included Columns with HNSW Indexes
Included columns in vector indexes facilitate faster searches with attribute filters by incorporating non-vector columns within an In-Memory Neighbor Graph Vector Index.
Consider a classic example where you are analyzing the home price data. You want only the top 5 homes from your description where the price is less than 2 million dollars and/or from the specific zip code. Or to a finer granularity, you just need to retrieve only the top home that matches the description and your filter attribute. Included columns make this possible by incorporating filterable attributes, such as price and zip code, directly into the vector index. This integration allows the index to evaluate and return results based on both the vector similarity and the attribute filters, without requiring additional steps to cross-reference the base table.
By bridging the gap between traditional database filters and AI-powered vector searches, included columns deliver faster, more efficient results. This synergy enhances search performance, especially in use cases requiring fine-grained filtering alongside advanced similarity computations.
Syntax and Specifications
CREATE TABLE houses (id NUMBER, zip_code NUMBER,
price NUMBER, description CLOB, data_vector VECTOR);price as
follows:CREATE VECTOR INDEX vidx_hnsw
ON houses (data_vector)
INCLUDE (price)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH TARGET ACCURACY 95;The above syntax creates a vector index, VIDX_HNSW, with
ORGANIZATION as INMEMORY NEIGHBOR GRAPH on the
DATA_VECTOR column where PRICE is an included
column. The INCLUDE keyword allows a user to specify the attributes
they wish to include in the HNSW index.
You can check the included columns by querying the
V$VECTOR_GRAPH_INDEX view:
SELECT OWNER, INDEX_NAME, COVERING_COLS
FROM V$VECTOR_GRAPH_INDEX;
OWNER INDEX_NAME COVERING_COLS
------- ------------ --------------------------------
SYS VIDX_HNSW PRICE
Note:
Restrictions onINCLUDE :
The INCLUDE list can contain:
- Limited to a maximum of 31 columns - Oracle indexes support 32 key columns.
- Supports only the following data types :
NUMBER,CHAR,VARCHAR2,DATE,TIMESTAMP,RAW,JSON, and reference-based LOB types such asBLOBandCLOB. - Does not support NLS types and
LONGtypes.
Benefits of Using Included Columns
- HNSW No-Filter with and without Included Columns:
If you consider a non-filter query that fetches the five homes nearest to a specified vector, the resulting query will result in an execution plan with multiple joins.
SELECT /*+ VECTOR_INDEX_TRANSFORM(t) */ price FROM houses t ORDER BY VECTOR_DISTANCE(data_vector, :query_vector) FETCH APPROX FIRST 5 ROWS ONLY;Note:
query_vectorcontains the actual input vector. You can use the instructions mentioned in SQL Quick Start Using a FLOAT32 Vector Generator to generate query_vector.The multiple joins are displayed in the following execution plan:
PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------- Plan hash value: 2484688059 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 20 | 3 (67)| 00:00:01 | |* 1 | COUNT STOPKEY | | | | | | | 2 | VIEW | | 5 | 20 | 3 (67)| 00:00:01 | |* 3 | SORT ORDER BY STOPKEY | | 5 | 7965 | 3 (67)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID| HOUSES | 5 | 7965 | 2 (50)| 00:00:01 | | 5 | VECTOR INDEX HNSW SCAN | VIDX_HNSW | 5 | 7965 | 2 (50)| 00:00:01 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5)You can see in the execution plan that the
VIDX_HNSWindex is scanned for the closest vectors and then the base tableHOUSESis accessed by theINDEX ROWIDreturned. This is because after we have located the vectors in the HNSW Index, we return to theHOUSEStable to retrieve the non-indexed columns from the base table.The following execution plan displays the same operation when using an In-Memory Neighbor Graph vector index (HNSW) created with included columns:
PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------------- Plan hash value: 656317923 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 20 | 3 (67)| 00:00:01 | |* 1 | COUNT STOPKEY | | | | | | | 2 | VIEW | | 5 | 20 | 3 (67)| 00:00:01 | |* 3 | SORT ORDER BY STOPKEY | | 5 | 7965 | 3 (67)| 00:00:01 | | 4 | VECTOR INDEX HNSW SCAN| VIDX_HNSW | 5 | 7965 | 2 (50)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5)As you can see in the execution plan above, adding included columns to a vector index ensures there is no base table pre-filter evaluation required. The query can be resolved by scanning only the HNSW vector index as the columns queried can be sourced directly from the index.
- HNSW Filtering with Included Columns:
One of the more commonly used plans for neighbor graph vector indexes is the pre-filter plan. In this case you are filtering the contents of the base table to eliminate non-relevant rows. But evaluating a query without using a filter attribute can be very expensive as the operation involves several joins. The following query uses a neighbor partition vector index without included columns:
SELECT /*+ vector_index_transform(houses vidx_hnsw in_filter_with_join_back) */ price FROM houses WHERE price < 2000000 ORDER BY vector_distance(data_vector, :query_vector) FETCH APPROX FIRST 5 ROWS ONLY;Note:
query_vectorcontains the actual input vector. You can use the instructions mentioned in SQL Quick Start Using a FLOAT32 Vector Generator to generate query_vector.On the other hand, if you use the same in-filter query on the HNSW index created withpriceas theincludedcolumn, the execution plan shows fewer joins, as the join with the base table is avoided. This is shown in the plan below:PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------ Plan hash value: 942384497 ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 5 | 20 | 1272 (1)| 00:00:01 | |* 1 | COUNT STOPKEY | | | | | | | 2 | VIEW | | 5 | 20 | 1272 (1)| 00:00:01 | |* 3 | SORT ORDER BY STOPKEY | | 5 | 8025 | 1272 (1)| 00:00:01 | | 4 | VECTOR INDEX HNSW SCAN IN-FILTER| VIDX_HNSW | 5 | 8025 | 1271 (1)| 00:00:01 | | 5 | VIEW | VW_HIJ_3C2A15A9 | 1 | | 2 (0)| 00:00:01 | |* 6 | FILTER | | | | | | | 7 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=5) 3 - filter(ROWNUM<=5) 6 - filter("HOUSES"."PRICE"<2000000)
Parent topic: In-Memory Neighbor Graph Vector Index