Oracle Text Reference
Part Number A90121-01
This section describes new features of Oracle Text (formerly Oracle8i interMedia Text) and provides pointers to additional information.
The following sections outline the new features in this release.
A document classification application is one that classifies an incoming stream of documents based on their content. These applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into categories such as politics, crime, and sports.
Oracle Text enables you to build such applications with the new CTXRULE index type. This index type indexes the rules (queries) that define classifications or routing criteria. When documents arrive, the new MATCHES operator can be used to categorize and route each document.
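As a minimal sketch of this workflow, the statements below store routing rules as queries, index them with CTXRULE, and use MATCHES to look up the categories for one incoming document. The table name, column names, and sample rules are hypothetical and serve only as an illustration.

```sql
-- Hypothetical table holding one routing rule (a query) per category.
CREATE TABLE news_rules (
  rule_id   NUMBER PRIMARY KEY,
  category  VARCHAR2(30),
  rule_text VARCHAR2(2000)
);

INSERT INTO news_rules VALUES (1, 'politics', 'election OR senate');
INSERT INTO news_rules VALUES (2, 'crime',    'robbery OR arrest');
INSERT INTO news_rules VALUES (3, 'sports',   'soccer OR tennis');
COMMIT;

-- Index the rules themselves, not the documents.
CREATE INDEX news_rules_idx ON news_rules (rule_text)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: select every category whose rule matches.
SELECT category
  FROM news_rules
 WHERE MATCHES(rule_text, 'The senate scheduled a vote after the election.') > 0;
```

In a routing application, the document text would typically be passed as a bind variable instead of a literal.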
The format column in your text table allows you to specify whether binary or text data is stored in the text column.
A new format column value of IGNORE is provided. When you issue the CREATE INDEX statement and specify a format column, any row whose format column is set to IGNORE is ignored during indexing. This feature is useful for indexing text columns that contain data incompatible with text indexing such as images or raw binary data.
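The following sketch shows one way a format column might be declared and named at index creation. The table, column, and index names are hypothetical.

```sql
-- Hypothetical table: fmt holds TEXT, BINARY, or IGNORE for each row.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  fmt  VARCHAR2(10),
  body BLOB
);

-- Rows whose fmt value is IGNORE are skipped during indexing.
CREATE INDEX docs_idx ON docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('format column fmt');
```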
When you specify your user procedure for the USER_DATASTORE, you can return permanent BLOB and CLOB locators for your IN/OUT parameter.
In this release, Oracle Text continues to support the indexing and querying of Korean text with a new Korean lexer, KOREAN_MORPH_LEXER. The KOREAN_MORPH_LEXER lexer offers the following benefits over the KOREAN_LEXER:
In this release, Oracle Text continues to support the indexing and querying of Japanese text with a new Japanese lexer, JAPANESE_LEXER. This lexer offers the following benefits over the JAPANESE_VGRAM_LEXER:
Oracle Text supports the indexing of text columns of type XMLType.
See the Oracle Text Application Developer's Guide for more information about XMLType indexing.
You can create a MULTI_STOPLIST type stoplist that contains words that are to be stopped in more than one language. This new stopword type is called ALL. For example, you can use an ALL stopword when you need to index international documents that contain English fragments.
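A short sketch of an ALL stopword, assuming a hypothetical stoplist name and sample words. The language-specific entry is included only for contrast.

```sql
BEGIN
  CTX_DDL.CREATE_STOPLIST('intl_stoplist', 'MULTI_STOPLIST');

  -- Stopped only in documents whose language is German.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'und', 'german');

  -- ALL: stopped in documents of every language, which covers
  -- English fragments embedded in non-English documents.
  CTX_DDL.ADD_STOPWORD('intl_stoplist', 'the', 'ALL');
END;
/
```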
Oracle Text supports UTF-16 conversion to the database character set with the charset and Inso filters. These filters can convert documents that are UTF-16 big-endian (AL16UTF16) or little-endian (AL16UTF16LE).
Oracle Text also supports endian auto-detection when the character set column or charset filter is set to UTF16AUTO.
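As an illustration, endian auto-detection might be requested through a charset filter preference as sketched below. The preference, table, and index names are hypothetical.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('utf16_filter', 'CHARSET_FILTER');
  -- UTF16AUTO asks the filter to detect big-endian or little-endian input.
  CTX_DDL.SET_ATTRIBUTE('utf16_filter', 'CHARSET', 'UTF16AUTO');
END;
/

-- Hypothetical table of UTF-16 documents stored as binary data.
CREATE TABLE utf16_docs (
  id   NUMBER PRIMARY KEY,
  body BLOB
);

CREATE INDEX utf16_docs_idx ON utf16_docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('filter utf16_filter');
```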
The INSO_FILTER document filter has a new timeout attribute that allows you to specify the maximum time Oracle waits for a document to be filtered during indexing. You can use this mechanism to avoid hanging during the index operation.
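A minimal sketch of setting the timeout, assuming a hypothetical preference name and a limit of 60, which is taken here to be in seconds.

```sql
BEGIN
  CTX_DDL.CREATE_PREFERENCE('inso_with_timeout', 'INSO_FILTER');
  -- Stop waiting for a document if filtering takes longer than this limit.
  CTX_DDL.SET_ATTRIBUTE('inso_with_timeout', 'TIMEOUT', '60');
END;
/
```

The preference is then named in the filter clause of the PARAMETERS string when the index is created.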
XML documents can have parent-child tag structures such as the following:
In this example, tag C is a child of tag B which is a child of tag A.
Oracle Text now enables you to do path searching with the new PATH_SECTION_GROUP. This section group allows you to specify direct parentage in queries. For example, you can find all documents that contain the term dog in element C, where C is a child of element B, and so on.
The new section group also allows you to do tag attribute value searching and attribute equality testing.
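The sketch below shows how path searching might look in practice. The INPATH and HASPATH operator names come from the Oracle Text query operator documentation and are not spelled out in the text above. The section group, table, and index names are hypothetical, as is the attribute D used in the equality test.

```sql
BEGIN
  CTX_DDL.CREATE_SECTION_GROUP('xml_path_group', 'PATH_SECTION_GROUP');
END;
/

-- Hypothetical table of XML documents.
CREATE TABLE xml_docs (
  id   NUMBER PRIMARY KEY,
  body CLOB
);

CREATE INDEX xml_docs_idx ON xml_docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group xml_path_group');

-- Documents with the term dog in C, where C is a child of B and B of A.
SELECT id FROM xml_docs
 WHERE CONTAINS(body, 'dog INPATH (/A/B/C)') > 0;

-- Attribute equality test: dog inside an A element whose D attribute is "x".
SELECT id FROM xml_docs
 WHERE CONTAINS(body, 'dog INPATH (//A[@D = "x"])') > 0;

-- Documents that contain the path at all, regardless of the text in it.
SELECT id FROM xml_docs
 WHERE CONTAINS(body, 'HASPATH (/A/B/C)') > 0;
```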
The new operators associated with this feature are
The following procedures in the CTX_DDL PL/SQL package have been updated:
This procedure has two new parameters for specifying memory size and partition name.
This procedure accepts ON and OFF as Boolean attribute values in addition to TRUE, T, FALSE, F, YES, Y, NO, and N.
Use this procedure when you need your USER_DATASTORE procedure to filter binary data to text before concatenation.
The CTX_OUTPUT package has the following new procedures:
Use the first procedure to augment the index log file with rowid information, which is useful for debugging an index operation.
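As a hedged illustration, rowid logging might be switched on as follows. The procedure name ADD_EVENT and the event constant EVENT_INDEX_PRINT_ROWID are taken from the CTX_OUTPUT package documentation, not from the text above, and the log file name is hypothetical.

```sql
BEGIN
  -- Start logging to a file in the Oracle Text log directory.
  CTX_OUTPUT.START_LOG('index_debug.log');
  -- Ask the indexing engine to print the rowid of each row it processes.
  CTX_OUTPUT.ADD_EVENT(CTX_OUTPUT.EVENT_INDEX_PRINT_ROWID);
END;
/

-- Run the CREATE INDEX or synchronization operation in this session, then:
BEGIN
  CTX_OUTPUT.END_LOG;
END;
/
```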
The following views have been updated for this release:
The CTX_VERSION view has a new column, VER_CODE, which is the version number of the Oracle Text code linked in to the Oracle shadow process. Use this column to detect and verify patch releases.
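For example, a query along these lines could be used to compare the two version numbers. VER_DICT, the data dictionary version column, is not mentioned above and is included here only for comparison.

```sql
-- VER_DICT: version of the CTXSYS data dictionary.
-- VER_CODE: version of the code linked in to the shadow process.
SELECT ver_dict, ver_code
  FROM ctx_version;
```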
The following views are new. Use the first four for querying information about sub-lexers with multi-lexer preference: