Creating Hybrid Vector Indexes using DBMS_SEARCH API
The DBMS_SEARCH
package provides procedures and functions
for creating, managing, and querying hybrid vector indexes when multiple sources are used
for the search. Sources includes tables and views as described in the DBMS_SEARCH
documentation.
The CREATE_INDEX procedure can be used to create a hybrid vector index.
DBMS_SEARCH.CREATE_INDEX(
index_name VARCHAR2,
tablespace VARCHAR2 DEFAULT NULL,
datatype VARCHAR2 DEFAULT NULL,
lexer VARCHAR2 DEFAULT NULL,
stoplist VARCHAR2 DEFAULT NULL,
wordlist VARCHAR2 DEFAULT NULL,
vectorizer VARCHAR2 DEFAULT NULL
);
The syntax contains parameters that can define both text as well as vector
search preferences. The parameters lexer
, stoplist
,
and wordlist
allow you to specify the text preferences for the hybrid
vector index and the parameter vectorizer
can be used to specify the
vector preference for the hybrid search. When using the vectorizer
parameter, you must first use the DBMS_VECTOR_CHAIN.CREATE_PREFERENCE
PL/SQL function to create a vectorizer preference. For more information, see CREATE_PREFERENCE. After creating a vectorizer
preference, you can use the vectorizer parameter to pass the preference name. You can
define the lexer
, stoplist
and
wordlist
using CREATE_PREFERENCE and CREATE_STOPLIST functions.
Example:
exec CTX_DDL.CREATE_PREFERENCE('my_lexer','BASIC_LEXER');
exec CTX_DDL.CREATE_STOPLIST('my_stoplist','BASIC_STOPLIST');
exec CTX_DDL.CREATE_PREFERENCE('my_wordlist','BASIC_WORDLIST');
begin
DBMS_VECTOR_CHAIN.CREATE_PREFERENCE(
PREF_NAME => 'my_vectorizer_spec',
PREF_TYPE => DBMS_VECTOR_CHAIN.VECTORIZER,
PARAMS => json(
'{"vector_idxtype" : "ivf",
"model" : "ALL_MINILM_L12",
"by" : "words",
"max" : "100",
"overlap" : "10",
"split" : "recursively",
"language" : "english"
}'));
end;
/
begin
DBMS_SEARCH.CREATE_INDEX(
index_name => 'my_hybrid_idx',
datatype => 'JSON',
lexer => 'my_lexer',
stoplist => 'my_stoplist',
wordlist => 'my_wordlist',
vectorizer => 'my_vectorizer_spec');
end;
/
The DBMS_SEARCH
package also contains a procedure, ADD_SOURCE, that allows you to add one or more data sources
(tables, views, or duality views) from different schemas to the
DBMS_SEARCH
index.
DBMS_SEARCH.ADD_SOURCE (
index_name VARCHAR2,
source_name VARCHAR2,
memory VARCHAR2 DEFAULT NULL,
parallel_degree NUMBER DEFAULT NULL);
exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW');
my_hybrid_idx
) created
above:exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW1');
exec DBMS_SEARCH.ADD_SOURCE('my_hybrid_idx','MYVIEW2');
ctx_user_index_partitions
view to check if the source
is added to the index and is ready to be queried. The query
select ixp_index_partition_name, ixp_status from ctx_user_index_partitions
where IXP_INDEX_NAME = 'my_hybrid_idx' and IXP_PARTITION_NAME LIKE ‘DSR$’;
would
display the status of the sources as "INDEXED" indicating that they are indexed and
ready to be queried. Sample output is shown
below:IXP_INDEX_PARTITION_NAME IXP_STATUS
-------------------------------------------------------------------
DSR$1 INDEXED
DSR$2 INDEXED
rebuild
of the index after the sources are added. Since our example
uses IVF indexes, you would run the following command once all sources have been
indexed:alter index DR$ARTICLES_IDX$VI rebuild online;
The DBMS_SEARCH
package also contains procedures that allow
you to remove a data source and all its associated data from the index, remove the index
and all its associated data from the database, return a virtual indexed JSON document
for the specified source metadata, retrieve a hitlist, and produce an aggregation of
JSON documents based on the specified filter conditions. Refer to DBMS_SEARCH Package for more details about the procedure,
syntax, usage, restrictions, and so on.
Parent topic: Manage Hybrid Vector Indexes