Creating Hybrid Vector Indexes using DBMS_SEARCH API

The DBMS_SEARCH package provides procedures and functions for creating, managing, and querying hybrid vector indexes when multiple sources are used for the search. Sources includes tables and views as described in the DBMS_SEARCH documentation.

The CREATE_INDEX procedure can be used to create a hybrid vector index.

Syntax:
DBMS_SEARCH.CREATE_INDEX(
    index_name       VARCHAR2,
    tablespace       VARCHAR2 DEFAULT NULL,
    datatype         VARCHAR2 DEFAULT NULL,
    lexer            VARCHAR2 DEFAULT NULL,
    stoplist         VARCHAR2 DEFAULT NULL,
    wordlist         VARCHAR2 DEFAULT NULL,
    vectorizer       VARCHAR2 DEFAULT NULL
);

The syntax contains parameters that can define both text as well as vector search preferences. The parameters lexer, stoplist, and wordlist allow you to specify the text preferences for the hybrid vector index and the parameter vectorizer can be used to specify the vector preference for the hybrid search. When using the vectorizer parameter, you must first use the DBMS_VECTOR_CHAIN.CREATE_PREFERENCE PL/SQL function to create a vectorizer preference. For more information, see CREATE_PREFERENCE. After creating a vectorizer preference, you can use the vectorizer parameter to pass the preference name. You can define the lexer, stoplist and wordlist using CREATE_PREFERENCE and CREATE_STOPLIST functions.

Example:

The following example also displays the sample code to define lexer, stoplist, wordlist and vectorizer.

 exec CTX_DDL.CREATE_PREFERENCE('my_lexer','BASIC_LEXER');
 exec CTX_DDL.CREATE_STOPLIST('my_stoplist','BASIC_STOPLIST');
 exec CTX_DDL.CREATE_PREFERENCE('my_wordlist','BASIC_WORDLIST');

begin
    DBMS_VECTOR_CHAIN.CREATE_PREFERENCE(
       PREF_NAME => 'my_vectorizer_spec',
       PREF_TYPE =>  DBMS_VECTOR_CHAIN.VECTORIZER,
       PARAMS    => json(
                         '{"vector_idxtype" : "ivf",
                           "model"          : "ALL_MINILM_L12",
                           "by"             : "words",
                           "max"            : "100",
                           "overlap"        : "10",
                           "split"          : "recursively",
                           "language"       : "english"
                          }'));
 end;
 /

 begin
    DBMS_SEARCH.CREATE_INDEX(
       index_name => 'my_hybrid_idx',
       datatype   => 'JSON',
       lexer      => 'my_lexer',
       stoplist   => 'my_stoplist',
       wordlist   => 'my_wordlist',
       vectorizer => 'my_vectorizer_spec');
 end;
/

The DBMS_SEARCH package also contains a procedure, ADD_SOURCE, that allows you to add one or more data sources (tables, views, or duality views) from different schemas to the DBMS_SEARCH index.

Syntax:
DBMS_SEARCH.ADD_SOURCE (
index_name      VARCHAR2,
source_name     VARCHAR2,
memory          VARCHAR2 DEFAULT NULL,
parallel_degree NUMBER DEFAULT NULL);
Example:
exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW');
You can also specify more than one source. Oracle recommends that the sources are added rather serially than concurrently. Here is an example where you can add two sources to the same index (my_hybrid_idx) created above:
exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW1');
exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW2');
You can query the ctx_user_index_partitions view to check if the source is added to the index and is ready to be queried. The query
select ixp_index_partition_name, ixp_status from ctx_user_index_partitions 
where IXP_INDEX_NAME = 'my_hybrid_idx' and IXP_PARTITION_NAME LIKE ‘DSR$’;
would display the status of the sources as "INDEXED" indicating that they are indexed and ready to be queried. Sample output is shown below:
IXP_INDEX_PARTITION_NAME                               IXP_STATUS
------------------------------------------------------------------- 
DSR$1                                                  INDEXED
DSR$2                                                  INDEXED
When you use IVF index to create the hybrid vector index, Oracle recommends to run a rebuild of the index after the sources are added. Since our example uses IVF indexes, you would run the following command once all sources have been indexed:
alter index DR$ARTICLES_IDX$VI rebuild online;

The DBMS_SEARCH package also contains procedures that allow you to remove a data source and all its associated data from the index, remove the index and all its associated data from the database, return a virtual indexed JSON document for the specified source metadata, retrieve a hitlist, and produce an aggregation of JSON documents based on the specified filter conditions. Refer to DBMS_SEARCH Package for more details about the procedure, syntax, usage, restrictions, and so on.