CREATE_INDEX

Use the DBMS_VECTOR.CREATE_INDEX procedure to create a vector index.

Purpose

To create a vector index such as Hierarchical Navigable Small World (HNSW) vector index or Inverted File Flat (IVF) vector index.

Syntax

DBMS_VECTOR.CREATE_INDEX (
    idx_name                    IN VARCHAR2,
    table_name                  IN VARCHAR2,
    idx_vector_col              IN VARCHAR2,
    idx_include_cols            IN VARCHAR2 DEFAULT NULL,
    idx_partitioning_scheme     IN VARCHAR2 default 'LOCAL',
    idx_organization            IN VARCHAR2,
    idx_distance_metric         IN VARCHAR2 DEFAULT COSINE,
    idx_accuracy                IN NUMBER DEFAULT 90,
    idx_parameters              IN CLOB,
    idx_parallel_creation       IN NUMBER DEFAULT 1,
    idx_quantization_type       IN VARCHAR2 DEFAULT 'NONE',
    idx_compression_ratio       IN NUMBER DEFAULT NULL,
    idx_distribute_parameters   IN CLOB DEFAULT NULL,
    idx_duplicate_parameters    IN CLOB DEFAULT NULL,
    idx_online_build            IN BOOLEAN DEFAULT FALSE
); 

Parameters

Parameter Description

idx_name

Name of the index to create.

table_name

Table on which to create the index.

idx_vector_col

Vector column on which to create the index.

idx_include_cols

A comma-separated list of column names to be covered by the index.

idx_partitioning_scheme

Partitioning scheme for IVF indexes:

  • GLOBAL

  • LOCAL

All IVF indexes are partitioned by centroid. Additionally, IVF indexes support both global and local indexes on partitioned tables. You can choose to create a local IVF index, which provides a one-to-one relationship between the base table partitions or subpartitions and the index partitions.

For detailed information on these partitioning schemes, see Partition Maintenance Operations and Vector indexes.

idx_organization

Index organization:

  • NEIGHBOR PARTITIONS

  • INMEMORY NEIGHBOR GRAPH

For detailed information on these organization types, see Manage the Different Categories of Vector Indexes.

idx_distance_metric

Distance metric or mathematical function used to compute the distance between vectors:

  • COSINE (default)
  • MANHATTAN
  • HAMMING
  • JACCARD
  • DOT
  • EUCLIDEAN
  • L2_SQUARED
  • EUCLIDEAN_SQUARED

For detailed information on each of these metrics, see Vector Distance Functions and Operators.

idx_accuracy

Target accuracy at which the approximate search should be performed when running an approximate search query.

As explained in Understand Approximate Similarity Search Using Vector Indexes, you can specify non-default target accuracy values either by specifying a percentage value or by specifying internal parameters values, depending on the index type you are using.

  • For an HNSW approximate search:

    In the case of an HNSW approximate search, you can specify a target accuracy percentage value to influence the number of  candidates considered to probe the search. This is automatically calculated by the algorithm. A value of 100 will tend to impose a similar result as an exact search, although the system may still use the index and will not perform an exact search. The optimizer may choose to still use an index as it may be faster to do so given the predicates in the query. Instead of specifying a target accuracy percentage value, you can specify the EFSEARCH parameter to impose a certain maximum number of candidates to be considered while probing the index. The higher that number, the higher the accuracy.

    For detailed information, see Understand Hierarchical Navigable Small World Indexes.

  • For an IVF approximate search:

    In the case of an IVF approximate search, you can specify a target accuracy percentage value to influence the number of partitions used to probe the search. This is automatically calculated by the algorithm. A value of 100 will tend to impose an exact search, although the system may still use the index and will not perform an exact search. The optimizer may choose to still use an index as it may be faster to do so given the predicates in the query. Instead of specifying a target accuracy percentage value, you can specify the NEIGHBOR PARTITION PROBES parameter to impose a certain maximum number of partitions to be probed by the search. The higher that number, the higher the accuracy.

    For detailed information, see Understand Inverted File Flat Vector Indexes.

idx_parameters

Type of vector index and associated parameters.

Specify the indexing parameters in JSON format:

  • For HNSW indexes:

    • type: Type of vector index to create, that is, HNSW

    • neighbors: Maximum number of connections permitted per vector in the HNSW graph

    • efConstruction: Maximum number of closest vector candidates considered at each step of the search during insertion

    • offload_credential_name: The identifier of the credential that should be used when authenticating with the offload service of the Private AI Services Container.

    • offload_url: The Private AI Services Container service endpoint for index creation.

    For example:

    {
        "type"                    : "HNSW", 
        "neighbors"               : 3, 
        "efConstruction"          : 4,
        "offload_credential_name" : "privateai",
        "offload_url"             : "https://<myprivateaiservicehostname>/v1/index"
    }

    For detailed information on these parameters, see Hierarchical Navigable Small World Index Syntax and Parameters.

    For information about the Private AI Services Container vector index service, see Oracle Private AI Services Container User's Guide.

  • For IVF indexes:

    • type: Type of vector index to create, that is, IVF

    • partitions: Neighbor partition or cluster in which you want to divide your vector space

    • neighbor partition grouping: Enables, in addition to partitioning, further data grouping in the storage layer for centroids in the IVF index. This can result in query performance benefits in serial, parallel, and multi-user (concurrent) queries due to additional pruning during the scan. This storage optimization is only supported for global IVF indexes. The default value is OFF.

    For example:

    {
        "type"                        : "IVF",
        "partitions"                  : 5,
        "neighbor partition grouping" : "ON"
    }

    For detailed information on these parameters, see Inverted File Flat Index Syntax and Parameters.

idx_parallel_creation

Number of parallel threads used for index construction.

idx_quantization_type

Optional quantization algorithm to apply to the indexed vectors.

The supported values are 'NONE' and 'SCALAR'.

idx_compression_ratio

Optional compression ratio to use with vector quantization.

This is provided either as a numeric value or NULL and is only valid when quantization is enabled.

idx_distribute_parameters

Optional JSON object that is used to control distribution for distributed HNSW vector indexes.

The value is provided as NULL or as a JSON CLOB that includes distribute_method and optionally a service_name. The following are supported as the value for distribute_method:

  • 'ROWID RANGE'
  • 'SIMILARITY'
  • 'PARTITION'
  • 'SUBPARTITION'
  • 'DISTRIBUTE'
  • 'AUTO'

This parameter is only supported for HNSW indexes.

idx_duplicate_parameters

Optional JSON object that controls duplication for distributed HNSW vector indexes.

The value is expected to be either NULL or a JSON CLOB that includes duplicate_method and optionally a service_name. The value provided for duplicate_method must be 'ALL'.

This parameter is only supported for HNSW indexes.

idx_online_build

Specifies whether the vector index should be built online.

The values accepted are TRUE and FALSE. The default value is FALSE.

Examples

  • Specify neighbors and efConstruction for HNSW indexes:

    BEDIN
      DBMS_VECTOR.CREATE_INDEX(
        idx_name                => 'v_hnsw_01', 
        table_name              => 'vpt01', 
        idx_vector_col          => 'EMBEDDING', 
        idx_include_cols        => NULL, 
        idx_partitioning_scheme => NULL, 
        idx_organization        => 'INMEMORY NEIGHBOR GRAPH', 
        idx_distance_metric     => 'EUCLIDEAN', 
        idx_accuracy            => 95, 
        idx_parameters          => '{"type" : "HNSW", 
                                     "neighbors" : 3, 
                                     "efConstruction" : 4}'
      );
    END;
    /
  • Specify the number of partitions for IVF indexes:

    BEGIN
      DBMS_VECTOR.CREATE_INDEX(
        idx_name                => 'V_IVF_01', 
        table_name              => 'vpt01', 
        idx_vector_column       => 'EMBEDDING', 
        idx_include_cols        => NULL,
        idx_partitioning_scheme => 'GLOBAL', 
        idx_organization        => 'NEIGHBOR PARTITIONS', 
        idx_distance_metric     => 'EUCLIDEAN', 
        idx_accuracy            => 95, 
        idx_parameters          => '{"type" : "IVF", 
                                     "partitions" : 5,
                                     "neighbor partition grouping" : "ON"}',
        idx_online_build        => TRUE
      );
    END;
    /