UTL_TO_TEXT
Use the DBMS_VECTOR_CHAIN.UTL_TO_TEXT chainable utility
function to convert an input document (for example, PDF, DOC, JSON, XML, or HTML) to plain
text.
Purpose
To perform a file-to-text transformation by using the Oracle Text component
(CONTEXT) of Oracle AI Database.
Syntax
```sql
DBMS_VECTOR_CHAIN.UTL_TO_TEXT (
    DATA     IN CLOB | BLOB,
    PARAMS   IN JSON default NULL
) return CLOB;
```
DATA
This function accepts input data of type CLOB or BLOB. It can read documents from a remote location or from files stored locally in database tables.
It returns a plain text version of the document as CLOB.
Oracle Text supports around 150 file types. For a complete list of all the supported document formats, see Oracle Text Reference.
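As a minimal sketch, assuming a database where DBMS_VECTOR_CHAIN is available, the first query below wraps a short string in a BLOB (with UTL_RAW.CAST_TO_RAW and TO_BLOB) so that it runs without any table or file, and relies on the PARAMS defaults. The second query shows one way to read a file from the database server; DOC_DIR and sample.pdf are hypothetical placeholders for a directory object you can read and a document stored in it.

```sql
-- A short string wrapped in a BLOB stands in for a real document.
-- PARAMS is omitted, so the default parameter values apply.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
         TO_BLOB(UTL_RAW.CAST_TO_RAW('Quarterly report: revenue grew.'))
       ) AS doc_text
FROM   dual;

-- Reading a server-side file instead (hypothetical names):
-- DOC_DIR must be an existing directory object; sample.pdf is a placeholder.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
         TO_BLOB(BFILENAME('DOC_DIR', 'sample.pdf'))
       ) AS doc_text
FROM   dual;
```

Wrapping the input in a BLOB mirrors how binary formats such as PDF are usually stored; a CLOB argument is accepted as well.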
PARAMS
Specify the following input parameters in JSON format:
```json
{
  "plaintext": "true or false",
  "charset":   "UTF8 | EUCJP | <other_valid_charset>",
  "format":    "BINARY | TEXT | IGNORE"
}
```
Table 12-32 Parameter Details

| Parameter | Description |
|---|---|
| plaintext | Plain text output. The default value for this parameter is TRUE. If you do not want to return the document as plain text, then set this parameter to FALSE. |
| charset | Character set encoding. The default value for this parameter is the current database character set. That is, by default, the input is assumed to use the same character set as the database. If your input uses a different character set, specify that character set using this parameter. |
| format | Format type of the content to be processed. Valid values are: BINARY, TEXT, IGNORE. |
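For input whose encoding differs from the database character set, the charset parameter names the input encoding. The sketch below assumes a database character set that can represent Japanese text (for example, AL32UTF8). It builds EUC-JP bytes in memory with UTL_I18N.STRING_TO_RAW, where JA16EUC is Oracle's name for EUC-JP, and passes "charset": "EUCJP" as listed in the PARAMS template. The sample string is illustrative only.

```sql
-- UNISTR('\6771\4EAC') is the two-character word for Tokyo; the string is
-- re-encoded to EUC-JP (JA16EUC) bytes and wrapped in a BLOB.
SELECT DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
         TO_BLOB(UTL_I18N.STRING_TO_RAW(UNISTR('\6771\4EAC') || ' Tokyo', 'JA16EUC')),
         -- Declare the input encoding so it is not read as the database character set.
         JSON('{"plaintext": "true", "charset": "EUCJP"}')
       ) AS doc_text
FROM   dual;
```

Keys left out of PARAMS (here, format) keep their default values.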
Example
```sql
-- tab is a table with a BLOB column named blobdata that holds the documents.
select DBMS_VECTOR_CHAIN.UTL_TO_TEXT (
         t.blobdata,
         json('{
           "plaintext": "true",
           "charset"  : "UTF8",
           "format"   : "TEXT"
         }')
       ) from tab t;
```
End-to-end example:
To run an end-to-end example scenario using this function, see Convert File to Text to Chunks to Embeddings Within Oracle AI Database.
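Because UTL_TO_TEXT is chainable, its CLOB output can be passed directly to another DBMS_VECTOR_CHAIN function. The following sketch, assuming default chunking parameters, feeds the converted text to DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS and reads the chunk_id and chunk_data fields from each JSON chunk it returns. The literal input is a stand-in for a real document.

```sql
-- Convert a document to text, split the text into chunks, and unpack each chunk.
SELECT JSON_VALUE(c.column_value, '$.chunk_id' RETURNING NUMBER) AS chunk_id,
       JSON_VALUE(c.column_value, '$.chunk_data')                AS chunk_data
FROM   TABLE(
         DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS(
           DBMS_VECTOR_CHAIN.UTL_TO_TEXT(
             TO_BLOB(UTL_RAW.CAST_TO_RAW(
               'Oracle AI Database converts documents to text. '
               || 'The text is then split into chunks.'))))) c;
```

To chunk stored documents instead, replace the literal with a column reference such as t.blobdata and add the source table (tab t in the example above) before TABLE(...) in the FROM clause.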