Oracle Text Reference
Part Number A96518-01
This chapter describes new features of Oracle Text (formerly Oracle8i interMedia Text) and provides pointers to additional information. The following topics are covered:
The following features are new for this release:
The new CTX_CLS.TRAIN procedure enables you to generate rules for routing documents to different categories.
The user-defined lexer enables you to create lexing solutions for indexing and querying languages not supported by Oracle Text such as Arabic.
CONTAINS and CATSEARCH are no longer limited to their respective CONTEXT and CTXCAT grammars. Query templating enables you to use the CONTEXT grammar and associated operators in CATSEARCH queries and vice versa.
You can create a CONTEXT index while allowing inserts, updates, and deletes to your base table.
Parallel indexing is now supported for non-partitioned tables. You can use parallelism with CREATE INDEX and ALTER INDEX with parameters sync. You can also run CTX_DDL.OPTIMIZE_INDEX with a parallel degree.
Stem indexing enables better performance for stem ($) queries by indexing the stem form in addition to the base form.
The new CHINESE_LEXER enables you to index traditional and simplified Chinese text more efficiently.
You can create CONTEXT indexes on URIType columns.
The CTXXPATH indextype enables you to speed up ExistsNode() queries on XMLType columns.
You can call the CONTAINS function within an ExistsNode() statement without a Text index.
The following sections outline the new features in this release.
A document classification application is one that classifies an incoming stream of documents based on their content. These applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into categories such as politics, crime, and sports.
Oracle Text enables you to build such applications with the new CTXRULE index type. This index type indexes the rules (queries) that define classifications or routing criteria. When documents arrive, the new MATCHES operator can be used to categorize and route each document.
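As a minimal sketch of the routing flow described above, the following statements store two rules, index them with CTXRULE, and classify one document with MATCHES. The table name, column names, rule queries, and the bind variable :incoming_document are hypothetical and chosen only for illustration.

```sql
-- Hypothetical rule table: one row per category, holding the query that defines it.
CREATE TABLE news_rules (
  rule_id   NUMBER PRIMARY KEY,
  category  VARCHAR2(30),
  query     VARCHAR2(2000)
);

INSERT INTO news_rules VALUES (1, 'politics', 'election OR senate OR parliament');
INSERT INTO news_rules VALUES (2, 'sports',   'football OR tennis OR olympics');
COMMIT;

-- Index the rules (not the documents) with the CTXRULE index type.
CREATE INDEX news_rules_idx ON news_rules (query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: return every category whose rule it satisfies.
-- :incoming_document is a bind variable supplied by the application (assumption).
SELECT category
FROM   news_rules
WHERE  MATCHES(query, :incoming_document) > 0;
```

A document can satisfy more than one rule, so the query may return several categories for a single article.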
The format column in your text table allows you to specify whether binary or text data is stored in the text column.
A new format column value of IGNORE is provided. When you issue the CREATE INDEX statement and specify a format column, any row whose format column is set to IGNORE is ignored during indexing. This feature is useful for indexing text columns that contain data incompatible with text indexing such as images or raw binary data.
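The following sketch shows one way a format column might be declared at index creation. The table docs, its columns, and the index name are hypothetical. The example assumes the predefined CTXSYS.INSO_FILTER preference is available in your installation.

```sql
-- Hypothetical document table; fmt holds TEXT, BINARY, or IGNORE for each row.
CREATE TABLE docs (
  id    NUMBER PRIMARY KEY,
  fmt   VARCHAR2(10),
  body  BLOB
);

-- Name the format column in the PARAMETERS string.
-- Rows whose fmt value is IGNORE are skipped during indexing.
CREATE INDEX docs_idx ON docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('filter ctxsys.inso_filter format column fmt');
```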
When you specify your user procedure for the USER_DATASTORE, you can return permanent CLOB locators for your
In this release, Oracle Text continues to support the indexing and querying of Korean text with a new Korean lexer, KOREAN_MORPH_LEXER. The KOREAN_MORPH_LEXER lexer offers the following benefits over the
In this release, Oracle Text continues to support the indexing and querying of Japanese text with a new Japanese lexer, JAPANESE_LEXER. This lexer offers the following benefits over the
Oracle Text supports the indexing of text columns of type
Oracle Text Application Developer's Guide for more information about
You can create a MULTI_STOPLIST type stoplist that contains words that are to be stopped in more than one language. This new stopword type is called ALL. For example, you can use an ALL stopword when you need to index international documents that contain English fragments.
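A short sketch of an ALL stopword follows. The stoplist name and the stopwords are hypothetical. The example assumes that CTX_DDL.ADD_STOPWORD accepts the language as its third argument and that the calling user has execute privilege on CTX_DDL.

```sql
BEGIN
  -- Hypothetical multi-language stoplist.
  ctx_ddl.create_stoplist('intl_stoplist', 'MULTI_STOPLIST');

  -- A language-specific stopword: stopped only in German documents.
  ctx_ddl.add_stopword('intl_stoplist', 'Die', 'german');

  -- An ALL stopword: stopped in documents of every language.
  ctx_ddl.add_stopword('intl_stoplist', 'the', 'ALL');
END;
/
```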
Oracle Text supports UTF-16 conversion to the database character set with the charset and Inso filters. These filters can convert documents that are UTF-16 big-endian (AL16UTF16) or little-endian (AL16UTF16LE).
Oracle Text also supports endian auto-detection when the character set column or charset filter is set to
The INSO_FILTER document filter has a new timeout attribute that allows you to specify the maximum time Oracle waits for a document to be filtered during indexing. You can use this mechanism to avoid hanging during the index operation.
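As an illustration, a filter preference with a timeout might be set up as follows. The preference, table, and index names are hypothetical, and the value 120 is an arbitrary example that assumes the attribute is expressed in seconds. Check the INSO_FILTER attribute reference for the permitted range.

```sql
BEGIN
  -- Hypothetical filter preference based on INSO_FILTER.
  ctx_ddl.create_preference('my_inso_filter', 'INSO_FILTER');

  -- Give up on a document that takes longer than this to filter (assumed seconds).
  ctx_ddl.set_attribute('my_inso_filter', 'TIMEOUT', '120');
END;
/

-- Hypothetical table of binary documents.
CREATE TABLE office_docs (
  id    NUMBER PRIMARY KEY,
  body  BLOB
);

CREATE INDEX office_docs_idx ON office_docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('filter my_inso_filter');
```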
XML documents can have parent-child tag structures such as the following:
<A> <B> <C> dog </C> </B> </A>
In this example, tag C is a child of tag B, which is a child of tag A.
Oracle Text now enables you to do path searching with the new PATH_SECTION_GROUP. This section group allows you to specify direct parentage in queries, such as to find all documents that contain the term dog in element C which is a child of element B and so on.
The new section group also allows you to do tag attribute value searching and attribute equality testing.
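The sketch below shows how a path search of this kind might look. The section group, table, and index names are hypothetical. The query uses the INPATH operator as documented for path section groups in the Oracle Text query operator reference. Treat the exact path expression as an assumption to verify against that reference.

```sql
BEGIN
  -- Hypothetical section group of the new PATH_SECTION_GROUP type.
  ctx_ddl.create_section_group('xml_path_group', 'PATH_SECTION_GROUP');
END;
/

-- Hypothetical table of XML documents.
CREATE TABLE xml_docs (
  id   NUMBER PRIMARY KEY,
  doc  CLOB
);

CREATE INDEX xml_docs_idx ON xml_docs (doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group xml_path_group');

-- Find documents containing "dog" in element C, where C is a direct child
-- of B and B is a direct child of A.
SELECT id
FROM   xml_docs
WHERE  CONTAINS(doc, 'dog INPATH (A/B/C)') > 0;
```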
The new operators associated with this feature are
The following procedures in the CTX_DDL PL/SQL package have been updated:
This procedure has two new parameters for specifying memory size and partition name.
This procedure accepts ON/OFF boolean attributes in addition to
Use this procedure when you need your USER_DATASTORE procedure to filter binary data to text before concatenation.
The CTX_OUTPUT package has the following new procedures:
Use the first procedure to augment the index log file with rowid information, which is useful for debugging an index operation.
The following views have been updated for this release:
The CTX_VERSION view has a new column, VER_CODE, which is the version number of the Oracle Text code linked in to the Oracle shadow process. Use this column to detect and verify patch releases.
The following views are new. Use the first four for querying information about sub-lexers with a multi-lexer preference: