CREATE_INDEX

Use the DBMS_VECTOR.CREATE_INDEX procedure to create a vector index.

Purpose

To create a vector index such as Hierarchical Navigable Small World (HNSW) vector index or Inverted File Flat (IVF) vector index.

Syntax

DBMS_VECTOR.CREATE_INDEX (
    idx_name                    IN VARCHAR2,
    table_name                  IN VARCHAR2,
    idx_vector_col              IN VARCHAR2,
    idx_include_cols            IN VARCHAR2 DEFAULT NULL,
    idx_partitioning_scheme     IN VARCHAR2 default 'LOCAL',
    idx_organization            IN VARCHAR2,
    idx_distance_metric         IN VARCHAR2 DEFAULT COSINE,
    idx_accuracy                IN NUMBER DEFAULT 90,
    idx_parameters              IN CLOB,
    idx_parallel_creation       IN NUMBER DEFAULT 1,
    idx_quantization_type       IN VARCHAR2 DEFAULT 'NONE',
    idx_compression_ratio       IN NUMBER DEFAULT NULL,
    idx_distribute_parameters   IN CLOB DEFAULT NULL,
    idx_duplicate_parameters    IN CLOB DEFAULT NULL,
    idx_online_build            IN BOOLEAN DEFAULT FALSE
);

Parameters

Parameter	Description
`idx_name`	Name of the index to create.
`table_name`	Table on which to create the index.
`idx_vector_col`	Vector column on which to create the index.
`idx_include_cols`	A comma-separated list of column names to be covered by the index.
`idx_partitioning_scheme`	Partitioning scheme for IVF indexes: `GLOBAL` `LOCAL` All IVF indexes are partitioned by centroid. Additionally, IVF indexes support both global and local indexes on partitioned tables. You can choose to create a local IVF index, which provides a one-to-one relationship between the base table partitions or subpartitions and the index partitions. For detailed information on these partitioning schemes, see Partition Maintenance Operations and Vector indexes.
`idx_organization`	Index organization: `NEIGHBOR PARTITIONS` `INMEMORY NEIGHBOR GRAPH` For detailed information on these organization types, see Manage the Different Categories of Vector Indexes.
`idx_distance_metric`	Distance metric or mathematical function used to compute the distance between vectors: `COSINE` (default) `MANHATTAN` `HAMMING` `JACCARD` `DOT` `EUCLIDEAN` `L2_SQUARED` `EUCLIDEAN_SQUARED` For detailed information on each of these metrics, see Vector Distance Functions and Operators.
`idx_accuracy`	Target accuracy at which the approximate search should be performed when running an approximate search query. As explained in Understand Approximate Similarity Search Using Vector Indexes, you can specify non-default target accuracy values either by specifying a percentage value or by specifying internal parameters values, depending on the index type you are using. For an HNSW approximate search: In the case of an HNSW approximate search, you can specify a target accuracy percentage value to influence the number of candidates considered to probe the search. This is automatically calculated by the algorithm. A value of 100 will tend to impose a similar result as an exact search, although the system may still use the index and will not perform an exact search. The optimizer may choose to still use an index as it may be faster to do so given the predicates in the query. Instead of specifying a target accuracy percentage value, you can specify the `EFSEARCH` parameter to impose a certain maximum number of candidates to be considered while probing the index. The higher that number, the higher the accuracy. For detailed information, see Understand Hierarchical Navigable Small World Indexes. For an IVF approximate search: In the case of an IVF approximate search, you can specify a target accuracy percentage value to influence the number of partitions used to probe the search. This is automatically calculated by the algorithm. A value of 100 will tend to impose an exact search, although the system may still use the index and will not perform an exact search. The optimizer may choose to still use an index as it may be faster to do so given the predicates in the query. Instead of specifying a target accuracy percentage value, you can specify the `NEIGHBOR PARTITION PROBES` parameter to impose a certain maximum number of partitions to be probed by the search. The higher that number, the higher the accuracy. For detailed information, see Understand Inverted File Flat Vector Indexes.
`idx_parameters`	Type of vector index and associated parameters. Specify the indexing parameters in JSON format: For HNSW indexes: `type`: Type of vector index to create, that is, `HNSW` `neighbors`: Maximum number of connections permitted per vector in the HNSW graph `efConstruction`: Maximum number of closest vector candidates considered at each step of the search during insertion `offload_credential_name`: The identifier of the credential that should be used when authenticating with the offload service of the Private AI Services Container. `offload_url`: The Private AI Services Container service endpoint for index creation. For example: `{ "type" : "HNSW", "neighbors" : 3, "efConstruction" : 4, "offload_credential_name" : "privateai", "offload_url" : "https://<myprivateaiservicehostname>/v1/index" }` For detailed information on these parameters, see Hierarchical Navigable Small World Index Syntax and Parameters. For information about the Private AI Services Container vector index service, see Oracle Private AI Services Container User's Guide. For IVF indexes: `type`: Type of vector index to create, that is, `IVF` `partitions`: Neighbor partition or cluster in which you want to divide your vector space `neighbor partition grouping`: Enables, in addition to partitioning, further data grouping in the storage layer for centroids in the IVF index. This can result in query performance benefits in serial, parallel, and multi-user (concurrent) queries due to additional pruning during the scan. This storage optimization is only supported for global IVF indexes. The default value is `OFF`. For example: `{ "type" : "IVF", "partitions" : 5, "neighbor partition grouping" : "ON" }` For detailed information on these parameters, see Inverted File Flat Index Syntax and Parameters.
`idx_parallel_creation`	Number of parallel threads used for index construction.
`idx_quantization_type`	Optional quantization algorithm to apply to the indexed vectors. The supported values are `'NONE'` and `'SCALAR'`.
`idx_compression_ratio`	Optional compression ratio to use with vector quantization. This is provided either as a numeric value or `NULL` and is only valid when quantization is enabled.
`idx_distribute_parameters`	Optional JSON object that is used to control distribution for distributed HNSW vector indexes. The value is provided as `NULL` or as a JSON `CLOB` that includes `distribute_method` and optionally a `service_name`. The following are supported as the value for `distribute_method`: `'ROWID RANGE'` `'SIMILARITY'` `'PARTITION'` `'SUBPARTITION'` `'DISTRIBUTE'` `'AUTO'` This parameter is only supported for HNSW indexes.
`idx_duplicate_parameters`	Optional JSON object that controls duplication for distributed HNSW vector indexes. The value is expected to be either `NULL` or a JSON `CLOB` that includes `duplicate_method` and optionally a `service_name`. The value provided for `duplicate_method` must be `'ALL'`. This parameter is only supported for HNSW indexes.
`idx_online_build`	Specifies whether the vector index should be built online. The values accepted are `TRUE` and `FALSE`. The default value is `FALSE`.

Examples

Specify neighbors and efConstruction for HNSW indexes:

BEDIN
  DBMS_VECTOR.CREATE_INDEX(
    idx_name                => 'v_hnsw_01', 
    table_name              => 'vpt01', 
    idx_vector_col          => 'EMBEDDING', 
    idx_include_cols        => NULL, 
    idx_partitioning_scheme => NULL, 
    idx_organization        => 'INMEMORY NEIGHBOR GRAPH', 
    idx_distance_metric     => 'EUCLIDEAN', 
    idx_accuracy            => 95, 
    idx_parameters          => '{"type" : "HNSW", 
                                 "neighbors" : 3, 
                                 "efConstruction" : 4}'
  );
END;
/

Specify the number of partitions for IVF indexes:

BEGIN
  DBMS_VECTOR.CREATE_INDEX(
    idx_name                => 'V_IVF_01', 
    table_name              => 'vpt01', 
    idx_vector_column       => 'EMBEDDING', 
    idx_include_cols        => NULL,
    idx_partitioning_scheme => 'GLOBAL', 
    idx_organization        => 'NEIGHBOR PARTITIONS', 
    idx_distance_metric     => 'EUCLIDEAN', 
    idx_accuracy            => 95, 
    idx_parameters          => '{"type" : "IVF", 
                                 "partitions" : 5,
                                 "neighbor partition grouping" : "ON"}',
    idx_online_build        => TRUE
  );
END;
/

Parent topic: DBMS_VECTOR