9.5 Use Text Embedding Vector in a Data Flow
Data Transforms supports the vector data type and embedding vectors in a data flow. Currently, Data Transforms integrates with the OCI Generative AI service to convert input text into vector embeddings that you can use for data analysis and searches.
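To illustrate what "searches" over vector embeddings can look like, the following is a minimal Python sketch of a cosine-similarity ranking. It is not part of Data Transforms or OCI Generative AI: the function names, the document labels, and the two-dimensional vectors are all made up for illustration, and real embedding vectors have many more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate vectors from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical, hand-written vectors; a real model would produce these.
query_vector = [1.0, 0.0]
stored_vectors = {
    "doc_a": [0.9, 0.1],
    "doc_b": [0.0, 1.0],
}

ranked = rank_by_similarity(query_vector, stored_vectors)
for name, score in ranked:
    print(name, round(score, 3))
```

Rows whose vectors point in a similar direction to the query vector rank first, which is the basic idea behind a similarity search on embedded text.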
For text embedding, Data Transforms supports both text stored directly in a column and text referenced by HTTP links stored in a column. Before you use embedding vectors in a data flow, you need to do the following:
- Create an Oracle Database 23ai connection. See Work with Connections for generic instructions on how to create a connection in Data Transforms.
- Create an Oracle Cloud Infrastructure (OCI) Generative AI connection. See Create and use an Oracle Cloud Infrastructure Generative AI Connection.
To use vector embeddings in a data flow:
- Follow the instructions in Create a Data Flow to create a new data flow.
- In the Data Flow Editor, click Add a Schema to define your source connection. From the Connection drop-down, select the Oracle Database 23ai connection, and then select the schema that you want to use. Click OK.
- Drag the tables that you want to use as a source in the data flow and drop them on the design canvas.
- From the Database Functions toolbar, click Machine Learning, then drag the Text Embedding Vector transformation component and drop it on the design canvas.
- Click the Text Embedding Vector transformation component to view its properties.
- In the General tab, specify the following:
- AI Service - Select OCI Generative AI from the drop-down.
- Connection - The drop-down lists all the available connections for the selected AI Service. Select the connection that you want to use.
- AI Model - The drop-down lists all the available models for the selected AI Service and Connection. The following models are listed:
- "cohere.embed-english-light-v2.0"
- "cohere.embed-english-light-v3.0"
- "cohere.embed-english-v3.0"
- "cohere.embed-multilingual-light-v3.0"
- "cohere.embed-multilingual-v3.0"
- In the Column Mapping tab, map the source column that you want to embed to the INPUT attribute of the operator. The only column available in the column mappings is input_text. Drag a text column from the available columns to the Expression column. This is the data that the vectors will be built on.
- Drag the table that you want to use as a target in the data flow and drop it on the design canvas.
- Save and execute the data flow.
Data Transforms builds a vector for each row in the source table and writes the vectors to the target table.
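Conceptually, the data flow applies an embedding model to the mapped text column of every source row and stores the result alongside the row in the target. The Python sketch below mimics only that row-by-row shape. It makes several assumptions: `toy_embed` is a hypothetical stand-in that hashes the text into a few numbers and carries no semantic meaning (in Data Transforms the vectors come from the selected OCI Generative AI model), and the `review` and `vector` column names are invented for the example.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]
Row = Dict[str, object]


def toy_embed(text: str, dims: int = 8) -> Vector:
    """Hypothetical stand-in for an embedding model.

    Deterministically maps text to `dims` floats in [0, 1] using a hash.
    It is NOT semantic; it only gives the example something to store.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def embed_rows(
    rows: List[Row],
    text_column: str,
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Row]:
    """Return a copy of each source row with an added 'vector' column."""
    target_rows = []
    for row in rows:
        out = dict(row)  # leave the source row untouched
        out["vector"] = embed(str(row[text_column]))
        target_rows.append(out)
    return target_rows


# Assumed sample data; the mapped text column plays the role of input_text.
source_rows = [
    {"id": 1, "review": "Great battery life"},
    {"id": 2, "review": "Screen cracked after a week"},
]

target_rows = embed_rows(source_rows, text_column="review")
for row in target_rows:
    print(row["id"], len(row["vector"]))
```

Each target row keeps the source columns and gains one vector, which mirrors the one-vector-per-row behavior described above.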