Hierarchical Navigable Small World Index Syntax and Parameters
Syntax and examples for creating Hierarchical Navigable Small World vector indexes.
Syntax
CREATE VECTOR INDEX vector_index_name
ON table_name (vector_column)
[GLOBAL] ORGANIZATION INMEMORY [NEIGHBOR] GRAPH
[WITH] [DISTANCE [CUSTOM [<schema>.][<package>.]] metric name]
[WITH TARGET ACCURACY percentage_value]
[QUANTIZATION SCALAR COMPRESSION RATIO {2|4|8}]
[PARAMETERS (TYPE HNSW ,
{NEIGHBORS max_closest_vectors_connected | M max_closest_vectors_connected},
EFCONSTRUCTION max_candidates_to_consider,
[OFFLOAD_CREDENTIAL_NAME credential_name_containing_gpu_access_token,
OFFLOAD_URL https_endpoint_for_gpu_index_construction]
RESCORE FACTOR rescore_factor,
ALGORITHM UNIFORM_QUANTIZATION
)]
[PARALLEL degree_of_parallelism]HNSW Parameters
NEIGHBORS and M are equivalent and represent the maximum number of neighbors a vector can have on any layer. The last vertex has one additional flexibility that it can have up to 2M neighbors.
EFCONSTRUCTION represents the maximum number of closest vector candidates considered at each step of the search during insertion.
OFFLOAD_CREDENTIAL_NAME is the
identifier of the credential that should be used when authenticating with the
Private AI Services Container GPU offloading service. The referenced credential name
should be provided as a JSON object with an access_token field
containing the API key to access the container. If
OFFLOAD_CREDENTIAL_NAME is provided,
OFFLOAD_URL must also be provided (and vice versa).
https:// URL with the hostname or IP address of the remote
server and the /v1/index path. For example,
https://<my_privateai_service_hostname>/v1/index
where <my_privateai_service_hostname> is the URL that the
Oracle AI Database can use to connect to the Private AI Services Container. For more
information about the Private AI Services Container, see Oracle
Private AI Services Container User's GuideRESCORE FACTOR is an optional integer
parameter that defines the rescoring multiplier for semantic search queries. It has
a default value of 1 and is applied only when the index is quantized (when
QUANTIZATION is specified).
The valid range for HNSW vector index parameters are:
ACCURACY |
> 0 and <= 100 |
DISTANCE |
|
QUANTIZATION |
SCALAR |
TYPE |
HNSW |
NEIGHBORS |
> 0 and <= 2048 |
EFCONSTRUCTION |
> 0 and <= 65535 |
RESCORE FACTOR |
>=1 and <= 100 (default 1) |
ALGORITHM |
UNIFORM_QUANTIZATION |
Usage Notes
You can optionally use the QUANTIZATION keyword together with the
COMPRESSION RATIO parameter in order to apply scalar
quantization to a new vector index.
The COMPRESSION RATIO specifies the level of compression for a
vector column when creating a scalar-quantized index. Higher compression ratios
yield smaller storage requirements but increased quantization. The possible values
for COMPRESSION RATIO are 2, 4, and 8.
The following table shows the resulting data type stored in the index for each compression ratio, depending on the original vector column format.
| Compression Ratio | FLOAT64 Input | FLOAT32 Input | FLOAT16 Input |
|---|---|---|---|
| 2 | Float32 | Float16 | Int8 |
| 4 | Float16 | Int8 | Not Supported |
| 6 | Int8 | Not Supported | Not Supported |
Note that "not supported" means that scalar quantization is not available for the given combination of compression ratio and vector column format.
Note:
Any parallel DML executed against a table with an HNSW index is automatically converted to a serial DML. HNSW indexes are not currently supported with parallel DML operations.Examples
CREATE VECTOR INDEX galaxies_hnsw_idx ON galaxies (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;
CREATE VECTOR INDEX galaxies_hnsw_idx ON galaxies (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 90
PARAMETERS (
TYPE HNSW,
NEIGHBORS 40,
EFCONSTRUCTION 500
) PARALLEL 8;
CREATE VECTOR INDEX galaxies_quantized_idx ON galaxies (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE EUCLIDEAN
WITH TARGET ACCURACY 90
QUANTIZATION SCALAR COMPRESSION RATIO 2
PARAMETERS (
TYPE HNSW,
NEIGHBORS 32,
EFCONSTRUCTION 200,
RESCORE FACTOR 2,
ALGORITHM UNIFORM_QUANTIZATION
);
CREATE VECTOR INDEX gist_idx ON gist(embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE EUCLIDEAN
PARAMETERS (
TYPE HNSW,
NEIGHBORS 32,
EFCONSTRUCTION 200,
OFFLOAD_CREDENTIAL_NAME mycredname,
OFFLOAD_URL 'https://<container_url>:<port>/v1/index'
) PARALLEL 4;
For detailed information, see CREATE VECTOR INDEX in Oracle AI Database SQL Language Reference.
- Online Hierarchical Navigable Small World Index
In situations where a base table needs to be available for ongoing updates and cannot be locked during index creation, you can create your HNSW (Hierarchical Navigable Small World) index online with theONLINEclause ofCREATE VECTOR INDEX. This way an application with heavy DML need not stop updating the base table for indexing. - Function-Based Hierarchical Navigable Small World Index
Oracle AI Database has long supported function-based indexes, allowing indexing expressions and computed values. The function-based HNSW index is an extension of this capability which applies the function-based indexing paradigm to the vector data type.
Parent topic: In-Memory Neighbor Graph Vector Index
Online Hierarchical Navigable Small World Index
In situations where a base table needs to be available for ongoing updates and cannot be locked during index creation, you can create your HNSW (Hierarchical Navigable Small World) index online with the ONLINE clause of CREATE VECTOR INDEX. This way an application with heavy DML need not stop updating the base table for indexing.
Syntax
CREATE VECTOR INDEX index_name ON table_name(vector_column)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
[WITH TARGET ACCURACY 95]
[DISTANCE EUCLIDEAN]
PARAMETERS (type HNSW, neighbors 32, efConstruction 500)
ONLINE;Usage Notes
- All vectors indexed need to have the same dimension and storage type throughout the index creation. Any changes or inconsistencies during index creation result in errors (for example, ORA-51902, ORA-51934).
- If the base table experiences a very high rate of DML during index build, then index creation time may increase.
- Online index creation with included columns (columns in addition to the vector column) is not currently supported.
- Online distributed HNSW index creation is not currently supported.
Example
CREATE VECTOR INDEX galaxies_hnsw_idx ON galaxies(embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE EUCLIDEAN
WITH TARGET ACCURACY 95
PARAMETERS (type HNSW, neighbors 32, efConstruction 500)
ONLINE;Function-Based Hierarchical Navigable Small World Index
Oracle AI Database has long supported function-based indexes, allowing indexing expressions and computed values. The function-based HNSW index is an extension of this capability which applies the function-based indexing paradigm to the vector data type.
This index is created on an expression instead of a vector column. The index expression can be an arithmetic, conditional expression or any other expression that produces a vector. Oracle AI Database computes the value of the expression and stores the resulting vectors in the index. The base table does not get an extra vector column; the index holds the vector values resulting from the expression.
The index is automatically maintained and remains consistent whenever a DML operation is performed.
Function-based HNSW indexes are especially useful to extract vector values from JSON documents for indexing. When creating the index, you provide an expression to extract vectors from the JSON document. At the time of indexing, the vector data is extracted from JSON documents by evaluating the specified expression.
When a query includes an expression that matches a function-based vector index expression, the optimizer automatically rewrites and routes queries to leverage the function-based index. This boosts query performance for similarity searches.
Usage Notes
- Online index creation is required; offline mode is not permitted.
- The expression in the index must return a single vector value per row.
- Only one expression per function-based vector index; multi- vector expression or hybrid vector indexes are not supported.
- The expression length is limited (currently 1000 characters).
- If a column that is used within the expression of the index is dropped, the associated index is also automatically dropped.
- Scalar quantization is not currently supported.
- RAC Distributed HNSW Indexes are not supported.
- There are currently the following two syntax types supported for extracting the vectors from JSON documents:
Dot-pathnotation:Example:
t.doc.v.vector()json_valueapplication:Example:
json_value(doc, '$.v' returning vector(512, float32))It is often beneficial to specify the dimension and element type of the vectors for best performance and correctness. However, with the dot-path notation, you cannot currently specify these attributes. The only way to do so is using the
json_valuesyntax.
Syntax
CREATE VECTOR INDEX index_name ON table_name (expression)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
ONLINE;Examples
Example 1
Indexing a vector extracted from JSON document. Indexes the vector in the field v inside the JSON column doc.
Example JSON document:
v is a JSON array representing the vector you want to index.{
"name": "Paul",
"age": 60,
"v": [1, 2, 3],
"salary": 1000000
}Index creation statement:
CREATE VECTOR INDEX vecidx ON mytable t (t.doc.v.vector())
ORGANIZATION INMEMORY NEIGHBOR GRAPH
ONLINE;Example 2
Indexing a vector extracted from JSON document. Indexes the vector in the field street_embedding from the address object inside the JSON column doc. In this example, thejson_value SQL function instead of dot-path notation to specify the vector dimension and the vector element types. This is enables to improve performance when building the index.
Example JSON document:
street_embedding is an array of numbers (a vector) inside the nested address object of the JSON column doc. This example JSON document is in the extended JSON format. The extended JSON format enables Oracle Database to recognize vectors within textual JSON data. When this text is ingested as extended JSON, Oracle Database can store the resulting vectors in an optimized binary format, enabling faster extraction and processing. For more information about how Oracle Database handles extended JSON data, see JSON Data Type.
{
"id": 1,
"doc": {
"address": {
"street": "500 Oracle Pkwy",
"street_embedding": {
"$vector": [0.12, 0.19, 0.75, 0.44, 0.08, 0.63, 0.27, 0.91],
"$vectorElementType": "float32"
},
"zip_code": "94065"
}
}
}CREATE VECTOR INDEX addr_street_ejson_jv_ix
ON addr_docs t (JSON_VALUE(t.doc, '$.address.street_embedding' RETURNING VECTOR(8, float32)))
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH TARGET ACCURACY 95
DISTANCE EUCLIDEAN
ONLINE;Example 3
Indexing a Computed Vector Expression. Indexes the sum of two vector columns, (v + u)
CREATE VECTOR INDEX vecidx ON mytable (v + u)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
ONLINE;Example 4
country_code = 'US') is satisfied; otherwise, stores NULL in the index.CREATE VECTOR INDEX emp_resume_us_ix
ON employees_resume e (DECODE(e.country_code, 'US', e.resume_embedding, NULL))
ORGANIZATION INMEMORY NEIGHBOR GRAPH
WITH TARGET ACCURACY 95
DISTANCE EUCLIDEAN
ONLINE;