Convert File to Embeddings

You can generate vector embeddings directly from a PDF document in a single statement.

Perform a file-to-text-to-chunks-to-embeddings transformation (using a declared embedding model) by chaining the DBMS_VECTOR_CHAIN functions UTL_TO_TEXT, UTL_TO_CHUNKS, and UTL_TO_EMBEDDINGS in a single CREATE TABLE statement.

This statement creates a relational table (doc_chunks) from unstructured text chunks and the corresponding vector embeddings:

CREATE TABLE doc_chunks as
(select dt.id doc_id, et.embed_id, et.embed_data, to_vector(et.embed_vector) embed_vector
 from
   documentation_tab dt,
   -- file -> text -> chunks -> embeddings, evaluated from the innermost call outward
   dbms_vector_chain.utl_to_embeddings(
       dbms_vector_chain.utl_to_chunks(dbms_vector_chain.utl_to_text(dt.data), json('{"normalize":"all"}')),
       json('{"provider":"database", "model":"doc_model"}')) t,
   -- unpack each JSON result into relational columns
   JSON_TABLE(t.column_value, '$[*]' COLUMNS (
       embed_id     NUMBER         PATH '$.embed_id',
       embed_data   VARCHAR2(4000) PATH '$.embed_data',
       embed_vector CLOB           PATH '$.embed_vector')) et
);

Note that each function consumes the output of the previous one, so the order of the chain is important here. First, utl_to_text converts the dt.data column to plain text. That text is passed as input to utl_to_chunks, and the resulting chunks are then passed as input to utl_to_embeddings.
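To see what the middle of the chain produces, it can help to stop after the chunking step. The following is a minimal sketch, not part of the original example. It assumes the same documentation_tab table and assumes that each element returned by utl_to_chunks is a JSON object with chunk_id and chunk_data fields. The row limit is an arbitrary choice for illustration.

```sql
-- Sketch: inspect the text chunks before any embedding is generated.
-- Assumes documentation_tab(id, data) as used in the CREATE TABLE statement.
select dt.id doc_id, ct.chunk_id, ct.chunk_data
from
   documentation_tab dt,
   -- stop the chain at the chunking step (no utl_to_embeddings call)
   dbms_vector_chain.utl_to_chunks(
       dbms_vector_chain.utl_to_text(dt.data),
       json('{"normalize":"all"}')) t,
   -- field names chunk_id and chunk_data are assumed from the chunk JSON format
   JSON_TABLE(t.column_value, '$[*]' COLUMNS (
       chunk_id   NUMBER         PATH '$.chunk_id',
       chunk_data VARCHAR2(4000) PATH '$.chunk_data')) ct
fetch first 5 rows only;
```

If the chunks look wrong at this stage (for example, too long or badly normalized), adjust the utl_to_chunks parameters before running the full chain, because the embeddings are computed from exactly this text.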

For a complete example, run SQL Quick Start Using a Vector Embedding Model Uploaded into the Database, in which you embed two Oracle Database Documentation books in the doc_chunks table and perform similarity searches using vector indexes.
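As a rough illustration of what a similarity search against doc_chunks might look like, the sketch below embeds a query string with the same in-database model and orders chunks by vector distance. The query text, the COSINE metric, and the row limit are assumptions chosen for illustration. The quick start may use different values and an approximate (index-backed) search.

```sql
-- Sketch: exact similarity search over the chunks created above.
-- Assumes doc_model is loaded in the database, as in the CREATE TABLE statement.
select doc_id, embed_id, embed_data
from doc_chunks
order by vector_distance(
            embed_vector,
            -- hypothetical search text, embedded with the same model
            vector_embedding(doc_model using 'how do I create a vector index' as data),
            COSINE)  -- metric is an assumption; use the one suited to your model
fetch first 4 rows only;
```

The query string must be embedded with the same model that produced embed_vector. Vectors from different models are not comparable.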
