Oracle® Text Reference
10g Release 1 (10.1)
Part Number B10730-01
The following features are new for this release:
In previous versions of Oracle Text, CTXSYS had DBA privileges. To tighten security and protect the database in the case of unauthorized access, CTXSYS now has only the CONNECT and RESOURCE roles, plus limited, necessary direct grants on some system views and packages. Some applications that use Oracle Text may therefore require minor changes to work properly with this security change.
See Also: The Migration chapter in the Oracle Text Application Developer's Guide
The following features are new for classification and clustering:
Supervised Training and Document Classification
The CTX_CLS.TRAIN procedure has been enhanced to support an additional classifier type, the Support Vector Machine (SVM) method, for the supervised training of documents. The SVM method of training can produce better classification rules than the query-based method.
The CTX_CLS.CLUSTERING procedure enables you to generate document clusters. A cluster is a group of documents that are similar to each other in content.
The following features are new for indexing:
ON COMMIT Synchronization for CONTEXT Indexes
You can set the CONTEXT index to synchronize automatically either at intervals you specify or at commit time.
The TRANSACTIONAL parameter to CREATE INDEX and ALTER INDEX enables changes to a base table to be immediately queryable.
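As a minimal sketch of the synchronization and transactional options described above, the statements below create CONTEXT indexes with the corresponding PARAMETERS strings. The table and index names (docs, notes, docs_sync_idx, notes_tx_idx) are hypothetical and are assumed to refer to tables with a text column.

```sql
-- Hypothetical tables: docs(id, text) and notes(id, text).

-- Synchronize the index automatically at commit time.
CREATE INDEX docs_sync_idx ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (ON COMMIT)');

-- To synchronize at an interval instead (here, hourly), the
-- PARAMETERS string would be:
--   PARAMETERS ('SYNC (EVERY "SYSDATE+1/24")')

-- Make base table changes immediately queryable.
CREATE INDEX notes_tx_idx ON notes(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('TRANSACTIONAL');
```

Which option fits depends on the workload: commit-time synchronization adds work to each commit, while interval synchronization leaves a window in which new rows are not yet indexed.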
Automatic Multi-Language Indexing
The WORLD_LEXER lexer type includes automatic language detection in documents, enabling you to index multilingual documents without having to include a language column in a base table.
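The following sketch shows one way a WORLD_LEXER preference might be created and attached to an index. The preference name (multi_lang_lexer), table (intl_docs), and index name are hypothetical.

```sql
-- Create a lexer preference based on the WORLD_LEXER type.
BEGIN
  ctx_ddl.create_preference('multi_lang_lexer', 'WORLD_LEXER');
END;
/

-- Hypothetical table intl_docs(id, text) holding documents in
-- several languages; no language column is needed.
CREATE INDEX intl_docs_idx ON intl_docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER multi_lang_lexer');
```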
Oracle Text can filter and index RFC-822 email messages. To do so, you use the new MAIL_FILTER filter preference.
Fast Filtering of Binary Documents
New attributes for the INSO_FILTER and MAIL_FILTER filter preferences offer the option of significantly improving performance when filtering binary documents. This fast filtering preserves only a limited amount of document formatting.
Support for creating local partitioned CONTEXT indexes in parallel
You can now create local partitioned CONTEXT indexes in parallel.
MDATA section for adding metadata to documents
You can now add an MDATA section to a section group. MDATA sections define metadata that enables you to perform mixed CONTAINS queries faster.
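A minimal sketch of the MDATA feature, assuming a hypothetical table articles(id, text) whose documents carry an author tag. The section group, section, and index names are illustrative only.

```sql
-- Define a section group with an MDATA section named "author"
-- that is populated from the <author> tag.
BEGIN
  ctx_ddl.create_section_group('article_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_mdata_section('article_sg', 'author', 'author');
END;
/

CREATE INDEX articles_idx ON articles(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SECTION GROUP article_sg');

-- Mixed query: a text condition combined with a metadata condition
-- in a single CONTAINS call.
SELECT id
  FROM articles
 WHERE CONTAINS(text, 'oracle AND MDATA(author, Smith)') > 0;
```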
ALTER TABLE enhanced support for partitioned tables
ALTER TABLE supports the UPDATE GLOBAL INDEXES clause for partitioned tables.
Binary Filtering for MULTI_COLUMN_DATASTORE
MULTI_COLUMN_DATASTORE now enables you to filter binary columns into text for concatenation with other columns during indexing. This datastore has also been enhanced so that its XML-like auto-tagging can be switched on and off.
New XML Output Option for Index Reports
Several procedures and functions in the CTX_REPORT package now include a report_format parameter that enables you to obtain index report output either as plain text or XML.
See Also: Chapter 11, "CTX_REPORT"
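As an illustration of the report_format parameter, the block below requests an XML report from CTX_REPORT.DESCRIBE_INDEX, one of the functions in the package. The index name (docs_idx) is hypothetical, and only the first part of the report is printed.

```sql
SET SERVEROUTPUT ON

DECLARE
  rpt CLOB;
BEGIN
  -- Request the report as XML rather than the default plain text.
  rpt := ctx_report.describe_index(
           index_name    => 'docs_idx',
           report_format => 'XML');

  -- Print the first 2000 characters of the report.
  dbms_output.put_line(dbms_lob.substr(rpt, 2000, 1));
END;
/
```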
Replacing Index Metadata
You can replace index metadata (preference attributes) without having to rebuild the index. You do this using the new METADATA keyword.
New Columns for Oracle Text Views
Three Oracle Text views, including CTX_USER_INDEX_PARTITIONS, have new columns.
See Also: Appendix G, "Views"
New Options for Index Optimization
CTX_DDL.OPTIMIZE_INDEX has two new optlevels. TOKEN_TYPE optimizes on demand all tokens in the index that match the input token type. This is intended to help users keep critical field sections or MDATA sections optimal. REBUILD enables CTX_DDL.OPTIMIZE_INDEX to rebuild an index entirely.
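A sketch of a token-type optimization, assuming a hypothetical index docs_idx that has an MDATA section named author. The numeric token type is looked up with CTX_REPORT.TOKEN_TYPE rather than hard-coded.

```sql
BEGIN
  -- Optimize only the tokens that belong to the MDATA section
  -- "author" of the hypothetical index docs_idx.
  ctx_ddl.optimize_index(
    idx_name   => 'docs_idx',
    optlevel   => ctx_ddl.optlevel_token_type,
    token_type => ctx_report.token_type('docs_idx', 'MDATA author'));
END;
/
```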
Log Tokens During Index Optimization
The CTX_OUTPUT.EVENT_OPT_PRINT_TOKEN event prints each token as it is being optimized.
Oracle Text includes a tracing facility that enables you to identify bottlenecks in indexing and querying.
New German Spelling
Oracle Text can now index German words under both traditional and reformed spelling.
The following are new language features:
Japanese Language Enhancements
Oracle Text supports stem queries in Japanese with the stem ($) operator.
Customization of Japanese and Chinese Lexicons
A new command, ctxlc, enables you to either modify the existing system Japanese and Chinese dictionaries (lexicons) or create new dictionaries by merging the system dictionaries with user-provided word lists. ctxlc also outputs the contents of dictionaries as word files.
New character sets for the Chinese VGRAM lexer
The Chinese VGRAM lexer now supports the AL32UTF8 and ZHS32GB18030 character sets.
Query Template Enhancements
Query templating has been enhanced to provide the following features:
progressive relaxation of queries, which enables you to progressively execute less restrictive versions of a single query
query rewriting, which enables you to programmatically rewrite any single query into different versions to increase recall
query language specification
alternative scoring algorithms
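The template features listed above can be sketched in a single query. The example assumes a hypothetical table docs(id, text) with a CONTEXT index on text; the search terms are illustrative. Each seq element is a progressively less restrictive version of the query, the lang attribute specifies the query language, and the score element selects an alternative scoring algorithm.

```sql
SELECT id, SCORE(1)
  FROM docs
 WHERE CONTAINS(text,
   '<query>
      <textquery lang="ENGLISH" grammar="CONTEXT">
        <progression>
          <seq>database tuning</seq>
          <seq>NEAR((database, tuning))</seq>
          <seq>database AND tuning</seq>
          <seq>database ACCUM tuning</seq>
        </progression>
      </textquery>
      <score datatype="INTEGER" algorithm="COUNT"/>
    </query>', 1) > 0
 ORDER BY SCORE(1) DESC;
```

Ordering the seq elements from most to least restrictive means the closest matches (the exact phrase) are returned before looser ones.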
Query Log Analysis
Oracle Text now offers the capability to create a log of queries and to issue reports on its contents, indicating, for example, the most or least frequent successful queries.
XML DB Enhancements
Oracle Text has the following XML DB enhancements:
Better performance of CTXXPATH queries, with new support for attribute existence searching and positional predicates.
Support for positional predicate testing.
Overriding of Base-letter Transformations
The new OVERRIDE_BASE_LETTER attribute prevents unexpected results when base-letter transformations are combined with alternate spelling.
Oracle Text supports highlighting with
See Also: Chapter 8, "CTX_DOC Package"
CTX_DOC Enhancements for Policy-Based Document Services
With the new CTX_DOC.POLICY_* procedures, you can perform document highlighting and filtering without requiring a table or a CONTEXT index.
See Also: Chapter 8, "CTX_DOC Package"
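A minimal sketch of policy-based filtering with CTX_DOC.POLICY_FILTER, one of the CTX_DOC.POLICY_* procedures. The policy name (doc_policy) and the sample document are hypothetical, and the policy is created with default preferences; a real application would likely specify a filter, lexer, and section group suited to its documents.

```sql
SET SERVEROUTPUT ON

-- Create a policy with default preferences. No table or index is involved.
BEGIN
  ctx_ddl.create_policy('doc_policy');
END;
/

DECLARE
  filtered CLOB;
BEGIN
  -- Filter an in-memory document through the policy.
  ctx_doc.policy_filter(
    policy_name => 'doc_policy',
    document    => 'Oracle Text can filter documents without an index.',
    restab      => filtered,
    plaintext   => TRUE);

  dbms_output.put_line(dbms_lob.substr(filtered, 2000, 1));

  -- Release the temporary LOB allocated for the result.
  dbms_lob.freetemporary(filtered);
END;
/
```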