Create and Use Custom Vocabulary
Create and use your own vocabulary of tokens when chunking data.
Here, you use the chunker helper function CREATE_VOCABULARY from the DBMS_VECTOR_CHAIN package to load a custom vocabulary. The vocabulary file contains a list of tokens recognized by your vector embedding model's tokenizer.
After loading the token vocabulary, you can use the BY VOCABULARY chunking mode (with VECTOR_CHUNKS or UTL_TO_CHUNKS) to split data into chunks whose size is measured by counting tokens.
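To make the idea of vocabulary-based chunking concrete, the following Python sketch shows how a chunk boundary can be chosen by counting tokens from a custom vocabulary. It is a conceptual illustration only and not how DBMS_VECTOR_CHAIN is implemented. The function names (load_vocabulary, wordpiece_tokenize, chunk_by_vocabulary), the "##" continuation prefix, and the "[UNK]" placeholder are assumptions borrowed from BERT-style WordPiece vocabularies, and your model's tokenizer may behave differently.

```python
def load_vocabulary(lines):
    """Build a token set from an iterable of lines, one token per line.

    Hypothetical helper: stands in for reading a vocabulary file.
    """
    return {line.strip() for line in lines if line.strip()}


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary tokens.

    Assumes WordPiece conventions: pieces after the first carry a "##" prefix,
    and a word that cannot be fully matched becomes a single unknown token.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def chunk_by_vocabulary(text, vocab, max_tokens):
    """Group whitespace-separated words into chunks of at most max_tokens tokens.

    A single word that alone exceeds max_tokens is still emitted as its own chunk.
    """
    chunks, current, count = [], [], 0
    for word in text.lower().split():
        n = len(wordpiece_tokenize(word, vocab))
        # Close the current chunk if adding this word would exceed the limit.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(word)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    vocab = load_vocabulary(["chunk", "##ing", "data", "by", "token", "##s", "[UNK]"])
    print(chunk_by_vocabulary("Chunking data by tokens", vocab, max_tokens=3))
```

The sketch shows why the vocabulary should match your embedding model: the same text can produce a different token count, and therefore different chunk boundaries, under a different vocabulary.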