9.6 Use Embedding Vectors in a Data Flow
Data Transforms supports the vector data type and embedding vectors in data flows. It integrates with the OCI Generative AI service to convert input text or images into vector embeddings that you can use for data analysis and searches.
Before you use embedding vectors in a data flow, you need to do the following:
- Create an Oracle Database 23ai connection. See Work with Connections for generic instructions on how to create a connection in Data Transforms.
- Create an Oracle Cloud Infrastructure (OCI) Generative AI connection. See Create and use an Oracle Cloud Infrastructure Generative AI Connection.
- Create a data flow. See Create a Data Flow to explore the different options you can use to create a data flow.
You can add the following types of embeddings to a data flow: text embedding vectors and image embedding vectors.
Use Text Embedding Vectors in a Data Flow
For text embedding, Data Transforms supports text stored directly in a column as well as text retrieved from HTTP links stored in a column.
To use text vector embeddings in a data flow:
- Select the data flow from the list displayed in the Data Flows page of your project.
- In the Data Flow Editor, click Add a Schema to define your source connection. From the Connection drop-down, select the Oracle Database 23ai connection, and then select the schema that you want to use. Click OK. Define the target connection in the same way.
- From the left panel, drag the table that you want to use as a source in the data flow and drop it on the design canvas.
- From the Database Functions toolbar, click Machine Learning, then drag the Text Embedding Vector transformation component and drop it on the design canvas.
- Select the source object on the design canvas, and drag the Connector icon next to it to connect it to the Text Embedding Vector transformation component.
- Click the Text Embedding Vector transformation component to view its properties on the right panel.
- In the General tab, specify the following:
- AI Service - Select OCI Generative AI from the drop-down.
- Connection - The drop-down lists all the available connections for the selected AI Service. Select the connection that you want to use.
- AI Model - The drop-down lists all the available models for the selected AI Service and Connection. The following models are listed:
- cohere.embed-english-light-v2.0
- cohere.embed-english-light-v3.0
- cohere.embed-english-v3.0
- cohere.embed-multilingual-light-v3.0
- cohere.embed-multilingual-v3.0
- In the Column Mapping tab, map the source column that you want to embed to the INPUT attribute of the operator. The only column available in the column mappings is input_text. Drag a text column from the available columns to the Expression column. This is the data that the vectors will be built on.
- Drag the table that you want to use as a target in the data flow and drop it on the design canvas.
- Select the Text Embedding Vector transformation component and drag the Connector icon next to it to connect it to the target object.
- Save and execute the data flow.
Data Transforms builds a vector for each row in the source table and writes the vectors to the target table.
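Conceptually, the transformation behaves like a per-row map: the value mapped to input_text in each source row produces one vector in the target. The following Python sketch illustrates only that shape of the flow. The toy_embed function is a hypothetical stand-in for the call to the OCI Generative AI model (it hashes the text into a few floats and carries no semantic meaning), and the sample rows and the id column are assumptions made for illustration.

```python
import hashlib

def toy_embed(text, dims=4):
    """Hypothetical stand-in for the embedding model: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale the first `dims` bytes of the hash into floats in [0, 1].
    return [digest[i] / 255.0 for i in range(dims)]

def embed_rows(source_rows, text_column="input_text"):
    """Return target rows: each source row plus a VECTOR_EMBEDDING entry."""
    target_rows = []
    for row in source_rows:
        target = dict(row)  # keep the original columns
        target["VECTOR_EMBEDDING"] = toy_embed(row[text_column])
        target_rows.append(target)
    return target_rows

# Illustrative source rows (assumed data, not from a real table).
source = [
    {"id": 1, "input_text": "Quarterly revenue summary"},
    {"id": 2, "input_text": "Customer support transcript"},
]
target = embed_rows(source)
for row in target:
    print(row["id"], len(row["VECTOR_EMBEDDING"]))
```

A real embedding model returns vectors with many more dimensions, and the dimension count depends on the model selected in the General tab.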
To check the status of the data flow, see the Status panel on the right below the Properties Panel. For details about the Status panel, see Monitor Status of Data Loads, Data Flows, and Workflows. This panel also shows the link to the Job ID that you can click to monitor the progress on the Jobs page.
To see the vector embedding that is generated as part of the data flow, select the target table on the design canvas and click the icon in the right panel. You should see an entry called VECTOR_EMBEDDING of type VECTOR. You can edit this name.
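Once the target table holds vectors, a typical use is similarity search: ranking rows by how close their vectors are to a query vector. The following Python sketch shows cosine similarity over small hand-written vectors. The vectors, row ids, and function names are illustrative assumptions. In Oracle Database 23ai this kind of comparison would normally run in SQL (for example with the VECTOR_DISTANCE function) rather than in client code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_matches(query, rows, k=2):
    """Return the ids of the k rows whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, r["VECTOR_EMBEDDING"]), r["id"]) for r in rows]
    scored.sort(reverse=True)  # highest similarity first
    return [row_id for _, row_id in scored[:k]]

# Assumed sample data standing in for rows read from the target table.
rows = [
    {"id": 1, "VECTOR_EMBEDDING": [1.0, 0.0, 0.0]},
    {"id": 2, "VECTOR_EMBEDDING": [0.9, 0.1, 0.0]},
    {"id": 3, "VECTOR_EMBEDDING": [0.0, 1.0, 0.0]},
]
print(top_matches([1.0, 0.0, 0.0], rows))
```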
Use Image Embedding Vectors in a Data Flow
For image embedding, Data Transforms supports images stored in the BLOB data type.
To use image vector embeddings in a data flow:
- In the Data Flow Editor, click Add a Schema to define your source connection. From the Connection drop-down, select the Oracle Database 23ai connection, and then select the schema that you want to use. Click OK. Define the target connection in the same way.
- From the left panel, drag the table that you want to use as a source in the data flow and drop it on the design canvas.
- Select the source table on the design canvas and click the icon in the right panel to preview the data within the table. Make sure that the table includes a column that lists the images you want to use for the vector embedding.
- Add a Filter join and set the desired filter condition.
- Select the source object on the design canvas, and drag the Connector icon next to it to connect it to the Filter join.
- From the Database Functions toolbar, click Machine Learning, then drag the Image Embedding Vector transformation component and drop it on the design canvas.
- Select the Filter join, and drag the Connector icon next to it to connect it to the Image Embedding Vector transformation component.
- Click the Image Embedding Vector transformation component to view its properties on the right panel.
- In the General tab, specify the following:
- AI Service - Select OCI Generative AI from the drop-down.
- AI Connection - The drop-down lists all the available connections for the selected AI Service. Select the connection that you want to use.
- AI Model - The drop-down lists all the available models for the selected AI Service and Connection. The following models are listed:
- cohere.embed-v4.0
- cohere.embed-english-image-v3.0
- cohere.embed-english-light-image-v3.0
- cohere.embed-multilingual-image-v3.0
- cohere.embed-multilingual-light-image-v3.0
- [Optional] If the source table column lists an image that is stored in a file in an OCI Object Storage bucket, select the connection from the Object Storage Connection drop-down. Data Transforms fetches the file from that location for the embedding.
- In the Column Mapping tab, map the source column that lists the images that you want to embed to the INPUT attribute of the operator. The only column available in the column mappings is input_image. Drag the column that contains the images from the available columns to the Expression column. This is the data that the vectors will be built on.
- Drag the table that you want to use as a target in the data flow and drop it on the design canvas.
- Select the Image Embedding Vector transformation component and drag the Connector icon next to it to connect it to the target object.
- Save and execute the data flow.
Data Transforms builds a vector for each row in the source table and writes the vectors to the target table.
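Embedding services generally receive image content as encoded text rather than raw bytes. As an illustration only, the following Python sketch turns the bytes of a BLOB value into a base64 data URI, one common way to pass an image to an embedding API. Whether Data Transforms uses this exact encoding internally is an assumption, and blob_to_data_uri and the sample bytes are hypothetical.

```python
import base64

def blob_to_data_uri(blob_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(blob_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# The 8-byte PNG file signature stands in for a real image read from a BLOB column.
sample_blob = b"\x89PNG\r\n\x1a\n"
uri = blob_to_data_uri(sample_blob)
print(uri)
```

The MIME type has to match the actual image format, so a table that mixes formats would need a way to tell them apart per row.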
To check the status of the data flow, see the Status panel on the right below the Properties Panel. For details about the Status panel, see Monitor Status of Data Loads, Data Flows, and Workflows. This panel also shows the link to the Job ID that you can click to monitor the progress on the Jobs page.
To see the vector embedding that is generated as part of the data flow, select the target table on the design canvas and click the icon in the right panel. You should see an entry called VECTOR_EMBEDDING of type VECTOR. You can edit this name.