2
interMedia Text

This chapter describes the new and updated features for interMedia Text, release 8.1.7.

The following topics are covered:

Overview of New and Updated Features

The following sections outline the new and updated features for interMedia Text, Release 8.1.7. These features are:

Catalog (CTXCAT) Index

With previous releases of interMedia Text, you can create a text index of type CONTEXT. This type of index is suited for indexing document collections that contain large coherent documents.

For this release, you can create a new type of index called CTXCAT in addition to the CONTEXT indextype. This is a combined index on a text column and one or more other columns. It is also known as a catalog index.

The CTXCAT indextype is suited for indexing short text fragments, such as names, addresses and item descriptions that are stored in separate columns. This type of index offers better query performance for structured queries.

You query a CTXCAT index with the CATSEARCH operator in the WHERE clause of a SELECT statement.

Indexing Example

Consider a table called AUCTION with the following schema:

create table auction(

item_id number,
title varchar2(100),
category_id number,
price number,
bid_close date);

Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id. Results must be sorted based on either bid_close, category_id, or price.

You can create a catalog index to support the different types of structured queries a user might enter.

To create the indexes, first create the index set preference then add the required indexes to it.

The following example creates the index set preference and adds five different indexes to it:

begin

ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset','bid_close');               /* index A */
ctx_ddl.add_index('auction_iset','category_id, bid_close');  /* index B */
ctx_ddl.add_index('auction_iset','bid_close, category_id');  /* index C */
ctx_ddl.add_index('auction_iset','price, bid_close');        /* index D */
ctx_ddl.add_index('auction_iset','bid_close, price');        /* index E */

end;

Create the combined catalog index with CREATE INDEX as follows:

create index auction_titlex on AUCTION(title) indextype is CTXCAT parameters 
('index set auction_iset');

Querying Example

To query the title column for the word camera, you can issue regular and mixed queries as follows:

select from AUCTION where CATSEARCH(title, 'camera', NULL)>0;

The following query uses index A:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000')>0;

The following query uses index B:

select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by 
bid_close desc')>0;

The following query uses index C:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 
order by category_id')>0;

The following query uses index D:

select from AUCTION where CATSEARCH(title, 'camera', 'price=200 order by bid_
close')>0;

The following query uses index E:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 
order by price')>0;

Keyless Index

In previous releases, your text table required a primary key to be indexed. This is no longer required in this release. You can create text table without a primary key and and index your text column.

Multi-Column Datastore

Use the new MULTI_COLUMN_DATASTORE type when your text is stored in more than one column in your text table. During indexing, the system concatenates the text columns and indexes the text as a single document.

URL Datastore Username and Password Support

With the URL_DATASTORE, you can store URLs in your text column for indexing. This release supports embedding username and password for FTP protocol URLs, as in:

ftp://username:password@ftp.hostname.com/dir/file.doc

See Also:
Oracle8i interMedia Text Reference for more information about how to use the URL_DATASTORE.

File and URL Datastore Security

FILE and URL datastores access files on the database machine. This might be undesirable for sensitive sites, since any user can browse the file system accessible to the oracle user using these datastores.

For better security, this release introduces the FILE_ACCESS_ROLE system parameter. You or your DBA can set this system parameter to the name of a database role. If set, any user attempting to create an index using FILE or URL datastores must have this role, or the index creation will fail. For instance, if the DBA does:

begin
ctx_adm.set_parameter('FILE_ACCESS_ROLE','WHITEHAT');

end;

then when user SCOTT tries to index:

create index foox on foo(text) indextype is ctxsys.context parameters('datastore 
ctxsys.file_datastore')

Oracle checks if SCOTT has the role WHITEHAT. If he does, then the create index will proceed as normal. If not, then the create index will fail.

Note:
This check is made only at create index time. Setting this parameter or granting/revoking the named role has no effect on existing indexes using the file datastore

See Also:
Oracle8i interMedia Text Reference for more information about how to use the FILE_DATASTORE and URL_DATASTORE.

Procedure Filter

Use the new PROCEDURE_FILTER filter preference to filter your documents with a PL/SQL or Java stored procedure. The stored procedure is called each time a document needs to be filtered during indexing.

Note:
This feature is different from the existing USER_FILTER which calls an external executable to perform document filtering.

See Also:
PROCEDURE_FILTER New Filter Type

Multi-Language Stoplist

Oracle8i interMedia Text supports multi-language stoplists. A multi-language stoplist is useful when you index a multi-language column, such as a text column with English, German, and Japanese documents.

Prefix Indexing

With normal indexing, right-truncated wildcard queries such as TO% can possibly expand into a large wordlist and degrade query performance.

To improve query performance of right-truncated wildcard queries, you can create a prefix index. This type of index records token prefixes. The trade-off is a bigger index for improved wildcard searching.

Prefix indexing is different from sub-string indexing which also improves wildcard queries.

See Also:

Prefix Indexing New Feature

Single Token Index Optimization

interMedia Text supports the optimization of single tokens in the index. The optimization of specific tokens in the index saves time over optimizing the entire index.

Document Tokens Service

Document services has the following two new features:

You can identify the text tokens in a particular document using the CTX_DOC.TOKENS.
You can specify the number of themes and theme summaries returned for CTX_DOC.THEMES and CTX_DOC.GIST

See Also:

CTX_DOC.TOKENS New Procedure
CTX_DOC.THEMES Updated Syntax
CTX_DOC.GIST Updated Syntax

Thesaurus Translation Calls

The CTX_THES PL/SQL package supports adding, updating, and removing translations in your thesaurus. These new procedures are the following:

New Fuzzy Operator Functionality

The fuzzy query operator now supports similarity scoring. You can order the results so that results with high similarity to the query word are ranked higher than results with low similarity.

You can also limit the number of expanded terms.

See Also:
Fuzzy Operator New Syntax

Inso Filter Enhancements

Oracle8i interMedia Text supports most document formats for indexing with Inso filter technology.

See Also:
Oracle8i interMedia Text Reference for more information about the Inso filter and all supported document formats

Newly Supported Formats

In this release, support for the following formats has been added:

Format Version

Microsoft Word 2000

Word 2000

Microsoft Excel 2000

Excel 2000

Microsoft PowerPoint 2000

PowerPoint 2000

Corel WordPerfect for Windows

Versions through 9.0

QuattroPro for Windows

Versions through 9.0

Corel Presentations

Versions 8.0 and 9.0

Microsoft Project

Project 98

Visio graphics format

Visio 4, 5 2000

Ichitaro

Version 5, 6, 7, 8, an 9
(Text and paragraph attributes only for versions 5 and 6.)

CDR (if tiff image is embedded)

Corel Draw version 2.0 - 9.0

MSG

Microsoft Outlook mail format
(Some limitations in field support.)

Format	Version
Microsoft Word 2000	Word 2000
Microsoft Excel 2000	Excel 2000
Microsoft PowerPoint 2000	PowerPoint 2000
Corel WordPerfect for Windows	Versions through 9.0
QuattroPro for Windows	Versions through 9.0
Corel Presentations	Versions 8.0 and 9.0
Microsoft Project	Project 98
Visio graphics format	Visio 4, 5 2000
Ichitaro	Version 5, 6, 7, 8, an 9 (Text and paragraph attributes only for versions 5 and 6.)
CDR (if tiff image is embedded)	Corel Draw version 2.0 - 9.0
MSG	Microsoft Outlook mail format (Some limitations in field support.)

Supported Platforms

Inso filter technology is supported on the following platforms:

Sun Solaris Sparc (2.5.1 - 7)
IBM AIX RS 6000 (4.1.4 - 4.3)
HP/UX HP9000 (9.0 - 11.0)
DEC UNIX (4.0)
SGI IRIX (6.3 "New 32 -bit")
Microsoft Windows NT on Intel (4.0)
Microsoft Windows 2000 (release candidate 2)
DEC Alpha Windows NT (4.0)

Supplied French Knowledge Base

Oracle8i interMedia Text now provides a knowledge base for French in addition to English. The knowledge base is the language specific data used during theme analysis.

For other languages, you must provide your own thesauri. One or more thesauri in a language can be compiled to produce an interMedia Text knowledge base for that language using the ctxkbtc compiler.

User-Defined Knowledge Bases

In this release, Oracle8i interMedia Text extends theme functionality to other languages by allowing you to load your own knowledge base for any single-byte whitespace delimited language, including Spanish and French.

Theme functionality includes theme indexing, ABOUT queries, theme highlighting, and the generation of themes, gists, and theme summaries with CTX_DOC.

You extend theme functionality by adding a user-defined knowledge base. For example, you can create a Spanish knowledge base from a Spanish thesuarus.

To load your language-specific knowledge base, follow these steps:

Load your custom thesaurus using ctxload.
Set NLS_LANG so that the language portion is the target language. The charset portion must be a single-byte character set.
Compile the loaded thesaurus using ctxkbtc:

ctxkbtc -user ctxsys/ctxsys -name my_lang_thes

A knowledge base is compiled from the loaded thesaurus. To use this knowledge base in an index, specify the NLS_LANG language as the THEME_LANGUAGE attribute value for the BASIC_LEXER preference.

Limitations

The following limitations hold for adding knowledge bases:

Oracle supplies knowledge bases in English and French only. You must provide your own thesaurus for any other language.
You can only add knowledge bases for languages with single-byte character sets. You cannot create a knowledge base for languages which can be expressed only in multi-byte character sets. If the database is a multi-byte universal character set, such as UTF-8, the NLS_LANG parameter must still be set to a compatible single-byte character set when compiling the thesaurus.
Adding a knowledge base works best for whitespace delimited languages.
You can have at most one knowledge base per NLS language.

See Also:
Oracle8i interMedia Text Reference for more information about theme functionality

Knowledge Base Character Set

Knowledge bases can be in any single-byte character set. Supplied knowledge bases are in WE8ISO8859P1. You can store an extended knowledge base in another character set such as US7ASCII.

CTXKBTC Knowledge Base Compiler Supports Stopthemes

The ctxkbtc executable now takes an additional argument as follows:

ctxkbtc -user ctxsys/ctxsys -stoplist <stoplistname>

Stopwords in the stoplist are added to the knowledge base as useless words that are prevented from becoming themes or contributing to themes. You can still add stopthemes after running this command using CTX_DLL.ADD_STOPTHEME.

Hierarchical Query Feedback

Obtaining hierarchical query feedback information such as broader terms, narrower terms and related terms does not work in languages other than English and French.

In other languages, the knowledge bases are derived entirely from your thesauri. In such cases, Oracle recommends that you obtain hierarchical information from your thesauri.

CATSEARCH New SQL Operator

Use the CATSEARCH operator to search catalog indexes. Use this operator in the WHERE clause of a SELECT statement.

Syntax

CATSEARCH(

[schema.]column,
text_query       VARCHAR2,
structured_query VARCHAR2,

RETURN NUMBER;

[schema.]column

Specify the text column to be searched on. This column must have a CTXCAT index associated with it.

text_query

Specify the query expression that defines your search in column. The CATSEARCH operator supports only the following query operations:

Logical AND
Logical OR (|)
Logical NOT (-)
" " (exact phrases quoted)

These operators have the following syntax:

Operation Syntax Description of Operation

Logical AND

a b c

Returns rows that contain a, b, and c.

Logical OR

a | b | c

Returns rows that contain a, b, or c.

Logical NOT

a - b

Returns rows that contain a and not b.

hyphen with no space

a-b

Hyphen ignored. Words such as web-site treated as a single query term.

" "

"a b c"

Returns rows that contain the phrase "a b c".
For example, entering "XYZ CD Player" means return all rows that contain this phrase exactly.

Operation	Syntax	Description of Operation
Logical AND	a b c	Returns rows that contain a, b, and c.
Logical OR	a \| b \| c	Returns rows that contain a, b, or c.
Logical NOT	a - b	Returns rows that contain a and not b.
hyphen with no space	a-b	Hyphen ignored. Words such as web-site treated as a single query term.
" "	"a b c"	Returns rows that contain the phrase "a b c". For example, entering "XYZ CD Player" means return all rows that contain this phrase exactly.

structured_query

Specify the structured conditions and the ORDER BY clause. There must exist an index for any column you specify. For example, if you specify 'category_id=1 order by bid_close', you must have an index for 'category_id, bid_close' as specified with CTX_DDL.ADD_INDEX.

With structured_query, you can use standard SQL syntax with only the following operators:

=
<=
>=
>
<
IN
BETWEEN

When you use these operators, the following limitations apply:

The left-hand side (the column name) must be a column named in at least one of the indexes of the index set
The left-hand side must be a plain column name. Functions and expressions are not allowed.
The right-hand side must be composed of literal values. Functions, expressions, other columns, subselects, are not allowed.
Multiple criteria can be combined with AND. OR is not supported.

For example, these expressions are supported:

catsearch(text, 'dog', 'foo > 15')
catsearch(text, 'dog', 'bar = ''SMITH''')
catsearch(text, 'dog', 'foo between 1 and 15')
catsearch(text, 'dog', 'foo = 1 and abc = 123')

And these expression are not supported:

 catsearch(text, 'dog', 'upper(bar) = ''A''')
 catsearch(text, 'dog', 'bar LIKE ''A%''')
 catsearch(text, 'dog', 'foo = abc')
 catsearch(text, 'dog', 'foo = 1 or abc = 3')

Examples

A typical query with CATSEARCH might include a structured clause as follows to find all rows that contain the word camera with id of 99 ordered by bid_close:

select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by 
bid_close desc')> 0;

The following query finds all rows with the exact phrase XYZ CD Player:

select from AUCTION where CATSEARCH(title, '"XYZ CD Player"', 'order by bid_
close desc')> 0;

The following query finds all rows with the terms XYZ and CD and Player:

select from AUCTION where CATSEARCH(title, 'XYZ CD Player', 'order by bid_close 
desc')> 0;

The following query finds all rows with the term CD-Player:

select from AUCTION where CATSEARCH(title, 'CD-Player', 'order by bid_close 
desc')> 0;

The following query finds all rows with the term CD and not Player:

select from AUCTION where CATSEARCH(title, 'CD - Player', 'order by bid_close 
desc')> 0;

The following query finds all rows with the terms CD or DVD or Speaker:

select from AUCTION where CATSEARCH(title, 'CD | DVD | Speaker', 'order by bid_
close desc')> 0;

CREATE INDEX Updated Syntax

Note:
This section describes the CREATE INDEX statement as it pertains to creating an interMedia Text domain index. In addition, it only describes those changes that are new for this release.
For a complete description of the CREATE INDEX statement, see Oracle8i interMedia Text Reference and Oracle8i SQL Reference.

Purpose

Use CREATE INDEX to create an interMedia Text index. An interMedia Text index is an Oracle domain index of either type context or ctxcat.

You must create an appropriate interMedia Text index to issue CONTAINS or CATSEARCH queries.

You can create two types of interMedia Text indexes:

CONTEXT index. This is an index on one text column. You query this index with the CONTAINS operator in the WHERE clause of a SELECT statement.
CTXCAT index, new for this release. This is a combined index on a text column and one or more other columns.You query this index with the CATSEARCH operator in the WHERE clause of a SELECT statement.

Required Privileges

You do not need the CTXAPP role to create an interMedia Text index. If you have Oracle grants to create a B-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is the standard behavior for regular B-tree indexes.

Syntax for New CATALOG Indextype

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat 
[PARAMETERS(paramstring)] [PARALLEL n];

Parameter String Limitations for CTXCAT

To create a CTXCAT index, you use CREATE INDEX and optionally specify preferences in the parameter string like you do when you create a context index.

However, the keywords you can use in a CTXCAT parameter string are limited to:

Parameter Use

INDEX SET

Specify the index set to use to create the catalog index.

DATASTORE

Specify the datastore preference.

LEXER

Specify your lexer preference.

MEMORY

Specify memory size.

STOPLIST

Specify your stoplist.

STORAGE

Specify your storage preference.

WORDLIST

Specify your wordlist preference.

Parameter	Use
INDEX SET	Specify the index set to use to create the catalog index.
DATASTORE	Specify the datastore preference.
LEXER	Specify your lexer preference.
MEMORY	Specify memory size.
STOPLIST	Specify your stoplist.
STORAGE	Specify your storage preference.
WORDLIST	Specify your wordlist preference.

Supported Preferences for CTXCAT

When you create an index of type ctxcat, you can use only the following index preferences:

Table 2-1

Preference Class	Supported Types
Datastore	DIRECT_DATASTORE
Filter	None
Lexer	BASIC_LEXER CHINESE_VGRAM_LEXER JAPANESE_VGRAM_LEXER KOREAN_LEXER
Wordlist	PREFIX_INDEX attribute of BASIC_WORDLIST for Japanese data.
Storage	BASIC_STORAGE
Stoplist	GENERIC_STOPLIST
Section Group	None

CTXCAT Supported Column Types

The index set can take up to 99 indexes, each made up of ordered lists of columns from the base table. However, there are two important restrictions on these columns:

The allowable column types are NUMBER, DATE, and CHAR/VARCHAR2 with maximum length <= 30 bytes.
NULL values are not allowed. A NULL in a user column will cause an index error and the row will not be indexed.

CTXCAT System Parameters

The catalog index has its own set of system parameters for setting preference defaults. This allows an installation to have one set of default preferences for context indexes, and another set for ctxcat indexes. The new system parameters are:

DEFAULT_CTXCAT_LEXER
DEFAULT_CTXCAT_STOPLIST
DEFAULT_CTXCAT_WORDLIST
DEFAULT_CTXCAT_STORAGE
DEFAULT_CTXCAT_INDEX_SET

Note:
While you can specify a wordlist preference for ctxcat indexes, most of the attributes do not apply, since the catsearch query language does not support wildcarding, fuzzy, and stemming. The only attribute of the wordlist preference that is useful is PREFIX_INDEX, for Japanese data.

CTXCAT Example

Refer to the section Catalog (CTXCAT) Index.

Syntax for CONTEXT Indextype

CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS 
ctxsys.context [PARAMETERS(paramstring)] [PARALLEL n];

See Also:
Oracle8i interMedia Text Reference for more information about how to create a CONTEXT index.

ALTER INDEX Updated Syntax

Note:
This section describes the ALTER INDEX statement as it pertains to managing an interMedia Text domain index. In addition it only describes those changes that are new for this release.
For a complete description of the ALTER INDEX statement, see Oracle8i interMedia Text Reference and the Oracle8i SQL Reference.

Purpose

Use ALTER INDEX REBUILD syntax to perform the following maintenance tasks for a text index:

Rebuild the index using different preferences. See REBUILD Syntax.
Resume a failed index operation (creation or optimization). See REBUILD Syntax.
Process DML in batch (synchronize). See REBUILD Syntax.
Optimize the index, fully or by token. Token optimization is new for this release. See REBUILD Syntax.
Add stopwords to the index. See REBUILD Syntax.
Add sections and stop sections to the index. See REBUILD Syntax.

REBUILD Syntax

The following syntax is used to rebuild the index, resume a failed operation, perform batch DML, add stopwords to the index, add sections and stop sections to index, or optimize the index:

ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];

ONLINE

Optionally specify the ONLINE parameter for nonblocking operation, which allows the index to be queried during an ALTER INDEX synchronize or optimize operation. You cannot specify ONLINE for replace, resume, or when adding stopwords or stop sections.

PARAMETERS (paramstring)

Optionally specify paramstring. If you do not specify paramstring, Oracle rebuilds the index with existing preference settings.

The syntax for paramstring is as follows:

paramstring = 'replace [datastore datastore_pref] 
                       [filter filter_pref] 
                       [lexer lexer_pref] 
                       [wordlist wordlist_pref] 
                       [storage storage_pref] 
                       [stoplist stoplist] 
                       [section group section_group]
                          [memory memsize]

|    resume [memory memsize]
|    optimize [token index_token | fast | full [maxtime (time | unlimited)]
|    sync [memory memsize]
|    add stopword word [language language]
|    add zone section section_name tag tag
|    add field section section_name tag tag [(VISIBLE | INVISIBLE)]
|    add attr section section_name tag tag@attr
|    add stop section tag'

optimize [token index_token | fast | full [maxtime (time | unlimited)]

Optimizes the index. Specify token, fast, or full optimization. You typically optimize after you synchronize the index.

When you optimize in token mode, Oracle optimizes only the index token index_token in full mode. Use this method of optimization to quickly optimize frequently searched terms.

When you optimize in fast mode, Oracle works on the entire index, compacting fragmented rows. However, in fast mode, old data is not removed.

When you optimize in full mode, you can optimize the whole index or a portion. This method compacts rows and removes old data.

You use the maxtime parameter to specify in minutes the time Oracle is to spend on the optimization operation. Oracle starts the optimization where it left off and optimizes until complete or until the time limit has been reached, whichever comes first. Specifying a time limit is useful for automating index optimization, where you set Oracle to optimize the index for a specified time on a regular basis.

When you specify maxtime unlimited, the entire index is optimized. This is the default. When you specify 0 for maxtime, Oracle performs minimal optimization.

Examples

Token Optimization

The following statement optimizes the token Oracle in token mode:

ALTER INDEX newsindex REBUILD PARAMETERS('optimize token Oracle');

Fast Optimization

The following statement optimizes newsindex in fast mode:

ALTER INDEX newsindex REBUILD PARAMETERS('optimize fast');

Full Optimization

To specify an optimization operation to last for three hours (180 minutes), issue the following statement:

ALTER INDEX newsindex REBUILD PARAMETERS('optimize full maxtime 180');

To optimize the entire index without regard to time, issue the following statement:

ALTER INDEX newsindex REBUILD PARAMETERS('optimize full maxtime unlimited');

To optimize the entire index and to allow queries to be issued during the optimization, issue the following statement:

ALTER INDEX newsindex REBUILD ONLINE PARAMETERS('optimize full maxtime 
unlimited');

MULTI_COLUMN_DATASTORE New Datastore Type

Use this datastore when your text is stored in more than one column. During indexing, the system concatenates the text columns and indexes the text as a single document.

MULTI_COLUMN_DATASTORE has the following attributes:

Attribute Attribute Value

columns

Specify a comma separated list of columns to be concatenated during indexing. You can also specify any expression allowable for the select statement column list for the base table. This includes expressions, PL/SQL functions, column aliases, and so on.
NUMBER and DATE column types are supported. They are converted to text before indexing using the default format mask. The TO_CHAR function can be used in the column list for formatting.
RAW and BLOB columns are directly concatenated as binary data.
LONG, LONG RAW, NCHAR, and NCLOB, nested table columns and collections are not supported.
You can specify no more than 500 columns.

Attribute	Attribute Value
columns	Specify a comma separated list of columns to be concatenated during indexing. You can also specify any expression allowable for the select statement column list for the base table. This includes expressions, PL/SQL functions, column aliases, and so on. NUMBER and DATE column types are supported. They are converted to text before indexing using the default format mask. The TO_CHAR function can be used in the column list for formatting. RAW and BLOB columns are directly concatenated as binary data. LONG, LONG RAW, NCHAR, and NCLOB, nested table columns and collections are not supported. You can specify no more than 500 columns.

Indexing and DML

To index, you must create a dummy column to specify in the CREATE INDEX statement. This column's contents are not made part of the virtual document, unless its name is specified in the columns attribute.

The index is synchronized only when the dummy column is updated. You can create triggers to propagate changes if needed.

Security

Only CTXSYS is allowed to create preferences for the MULTI_COLUMN_DATASTORE type. Any other user who attempts to create a MULTI_COLUMN_DATASTORE preference receives an error.

Oracle makes this restriction because when the columns attribute contains a function call, the call is made by the CTXSYS schema. The potential exists for a malicious CTXAPP users to execute arbitrary functions for which they do not have execute permission.

If this is too restrictive, you can create a stored procedure under CTXSYS to create MULTI_COLUMN_DATASTORE preferences. The effective user is CTXSYS, who creates and owns the preferences. However, you can call this procedure from any schema as CTXSYS.

Note:
For even better security, you can drop your CTXSYS stored procedures or preferences immediately after creating the index.For optimal security, consider using the USER_DATASTORE type, which checks user permissions during indexing.

MULTI_COLUMN_DATASTORE Example

The following example creates a multi-column datastore preference called my_multi with three text columns:

begin
ctx_ddl.create_preference('my_multi', 'MULTI_COLUMN_DATASTORE');
ctx_ddl.set_attribute('my_multi', 'columns', 'column1, column2, column3');
end;

Tagging Behavior

During indexing, the system creates a virtual document for each row. The virtual document is composed of the contents of the columns concatenated in the listing order with column name tags automatically added. For example:

create table mc(id primary key, name varchar2(10), address varchar2(80));
insert into mc values(1, 'John Smith', '123 Main Street');
exec ctx_ddl.set_attibute('mymds', 'columns', 'name, address');

This produces the following virtual text for indexing:

<NAME>
John Smith
</NAME>
<ADDRESS>
123 Main Street
</ADDRESS>

The system indexes the text between the tags, ignoring the tags themselves. To index these tags as sections, you can optionally create field sections with the BASIC_SECTION_GROUP.

Note:
No section group is created when you use the MULTI_COLUMN_DATASTORE. To create sections for these tags, you must create a section group.

When you use expressions or functions, the tag is composed of the first 30 characters of the expression unless a column alias is used.

For example, if your expression is as follows:

exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17');

then it produces the following virtual text:

<4 + 17>
21
</4 + 17>

If your expression is as follows:

exec ctx_ddl.set_attibute('mymds', 'columns', '4 + 17 col1');

then it produces the following virtual text:

<col1>
21
<col1>

The tags are in uppercase unless the column name or column alias is in lowercase and surrounded by double quotes. For example:

exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo');

produces the following virtual text:

<FOO>
content of foo
</FOO>

For lowercase tags, use the following:

exec ctx_ddl.set_attibute('mymds', 'COLUMNS', 'foo "foo"');

This expression produces:

<foo>
content of foo
</foo>

PROCEDURE_FILTER New Filter Type

Use the PROCEDURE_FILTER filter preference type to filter your documents with a stored procedure. The stored procedure is called each time a document needs to be filtered.

This type has the following attributes:

Attribute Purpose Allowable Values

procedure

Name of the filter stored procedure.

Any CTXSYS owned procedure. The procedure can be a PL/SQL or Java stored procedure.

input_type

Type of input argument for stored procedure.

VARCHAR2, BLOB, CLOB, FILE

output_type

Type of output argument for stored procedure.

VARCHAR2, CLOB, FILE

rowid_parameter

Include rowid parameter?

TRUE/FALSE

format_parameter

Include format parameter?

TRUE/FLASE

charset_parameter

Include charset parameter?

TRUE/FLASE

procedure

Specify the name of the stored procedure to use for filtering. The procedure can be a PL/SQL or Java stored procedure. The procedure can be a safe callout or call a safe callout.

The procedure must be owned by CTXSYS and have one of the following signatures:

PROCEDURE(IN BLOB, IN OUT NOCOPY CLOB)
PROCEDURE(IN CLOB, IN OUT NOCOPY CLOB)
PROCEDURE(IN VARCHAR, IN OUT NOCOPY CLOB)
PROCEDURE(IN BLOB, IN OUT NOCOPY VARCHAR2)
PROCEDURE(IN CLOB, IN OUT NOCOPY VARCHAR2)
PROCEDURE(IN VARCHAR2, IN OUT NOCOPY VARCHAR2)
PROCEDURE(IN BLOB, IN VARCHAR2)
PROCEDURE(IN CLOB, IN VARCHAR2)
PROCEDURE(IN VARCHAR2, IN VARCHAR2)

The first argument is the content of the unfiltered row as passed out by the datastore. The second argument is for the procedure to pass back the filtered document text.

The procedure attribute is mandatory and has no default.

input_type

Specify the type of the input argument of the filter procedure. You can specify one of the following:

Type Description

BLOB

The input argument is of type BLOB. The unfiltered document is contained in the BLOB passed in.

CLOB

The input argument is of type CLOB. The unfiltered document is contained in the CLOB passed in.
No pre-filtering or character set conversion is done. If the datastore outputs binary data, that binary data is written directly to the CLOB, with NLS doing implicit mapping to character data as best it can.

VARCHAR2

The input argument is of type VARCHAR2. The unfiltered document is contained in the VARCHAR2 passed in.
The document can be a maximum of 32767 bytes of data. If the unfiltered document is greater than this length, an error is raised for the document and the filter procedure is not called.

FILE

The input argument is of type VARCHAR2. The unfiltered document content is contained in a temporary file in the file system whose filename is stored in the VARCHAR2 passed in.
For example, the value of the passed-in VARCHAR2 might be 'tmp/mydoc.tmp' which means that the document content is stored in the file '/tmp/mydoc.tmp'.
The file input type is useful only when your procedure is a safe callout, which can read the file.

The input_type attribute s not mandatory. If not specified, BLOB is the default.

output_type

Specify the type of output argument of the filter procedure. You can specify one of the following types:

Type Description

CLOB

The output argument is IN OUT NOCOPY CLOB. Your procedure must write the filtered content to the CLOB passed in.

VARCHAR2

The output argument is IN OUT NOCOPY VARCHAR2. Your procedure must write the filtered content to the VARCHAR2 variable passed in.

FILE

The output argument must be IN VARCHAR2. On entering the filter procedure, the output argument is the name of a temporary file. The filter procedure must write the filtered contents to this named file.
Using a FILE output type is useful only when the procedure is a safe callout, which can write to the file.

The output_type attribute is not mandatory. If not specified, CLOB is the default.

rowid_ parameter

When you specify TRUE, the rowid of the document to be filtered is passed as the first parameter, before the input and output parameters.

For example, with INPUT_TYPE BLOB, OUTPUT_TYPE CLOB, and ROWID_PARAMETER TRUE, the filter procedure must have the signature as follows:

procedure(in rowid, in blob, in out nocopy clob)

This attribute is useful for when your procedure requires data from other columns or tables. This attribute is not mandatory. The default is FALSE.

format_parameter

When you specify TRUE, the value of the format column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid parameter, if enabled.

You specify the name of the format column at index time in the parameters string, using the keyword 'format column <columnname>'. The parameter type must be IN VARCHAR2.

The format column value can be read via the rowid parameter, but this attribute allows a single filter to work on multiple table structures, because the format attribute is abstracted and does not require the knowledge of the name of the table or format column.

FORMAT_PARAMETER is not mandatory. The default is FALSE.

charset_parameter

When you specify TRUE, the value of the charset column of the document being filtered is passed to the filter procedure before input and output parameters, but after the rowid and format parameter, if enabled.

You specify the name of the charset column at index time in the parameters string, using the keyword 'charset column <columnname>'. The parameter type must be IN VARCHAR2.

CHARSET_PARAMETER attribute is not mandatory. The default is FALSE.

Parameter Order

ROWID_PARAMETER, FORMAT_PARAMETER, and CHARSET_PARAMETER are all independent. The order is rowid, the format, then charset, but the filter procedure is passed only the minimum parameters required.

For example, assume that INPUT_TYPE is BLOB and OUTPUT_TYPE is CLOB. If your filter procedure requires all parameters, the procedure signature must be:

(id IN ROWID, format IN VARCHAR2, charset IN VARCHAR2, input IN BLOB, output IN 
OUT NOCOPY CLOB)

If your procedure requires only the ROWID, then the procedure signature must be:

(id IN ROWID,input IN BLOB, ouput IN OUT NOCOPY CLOB)

Create Index Requirements

In order to create an index using a PROCEDURE_FILTER preference, the index owner must have execute permission on the procedure.Oracle checks this at index time, which is similar to the security measures for USER_DATASTORE.

Error Handling

The filter procedure can raise any errors needed through the normal PL/SQL raise_application_error facility. These errors are propagated to the CTX_USER_INDEX_ERRORS view or reported to the user, depending on the context in which the filter is invoked.

Procedure Filter Preference Example

Consider a filter procedure CTXSYS.NORMALIZE that you define with the following signature:

PROCEDURE NORMALIZE(id IN ROWID, charset IN VARCHAR2, input IN CLOB, 
output IN OUT NOCOPY VARCHAR2);

To use this procedure as your filter, you set up your filter preference as follows:

begin
ctx_ddl.create_preference('myfilt', 'procedure_filter');
ctx_ddl.set_attribute('myfilt', 'procedure', 'normalize');
ctx_ddl.set_attribute('myfilt', 'input_type', 'clob');
ctx_ddl.set_attribute('myfilt', 'output_type', 'varchar2');
ctx_ddl.set_attribute('myfilt', 'rowid_parameter', 'TRUE');
ctx_ddl.set_attribute('myfilt', 'charset_parameter', 'TRUE');
end;

Prefix Indexing New Feature

You can create a prefix index to improve right-truncated wildcard searches such as TO%. Without a prefix index, right truncated wildcard queries are expanded using equivalence. This type of expansion can possibly result in large wordlists, degrading query performance.

To enable prefix indexing, use the BASIC_WORDLIST preference type. The following new attributes have been added:

Table 2-2

Attribute	Attribute Values
index_prefix	Specify YES to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO.
prefix_length_min	Specify the minimum length of indexed prefixes. Defaults to 1.
prefix_length_max	Specify the maximum length of indexed prefixes. Defaults to 64.

index_prefix

Specify yes to enable prefix indexing. Prefix indexing improves performance for right truncated wildcard searches such as TO%. Defaults to NO.

Note:
Enabling prefix indexing increases index size.

Prefix indexing chops up tokens into multiple prefixes to store in the $I table.For example, words TOKEN and TOY are normally indexed like this in the $I table:

Token Type Information

TOKEN

0

DOCID 1 POS 1

TOY

0

DOCID 1 POS 3

With prefix indexing, Oracle indexes the prefix substrings of these tokens as follows with a new token type of 6:

Token Type Information

TOKEN

0

DOCID 1 POS 1

TOY

0

DOCID 1 POS 3

T

6

DOCID 1 POS 1 POS 3

TO

6

DOCID 1 POS 1 POS 3

TOK

6

DOCID 1 POS 1

TOKE

6

DOCID 1 POS 1

TOKEN

6

DOCID 1 POS 1

TOY

6

DOCID 1 POS 3

Wildcard searches such as TO% are now faster because Oracle does no expansion of terms and merging of result sets. To obtain the result, Oracle need only examine the (TO,6) row.

prefix_length_min

Specify the minimum length of indexed prefixes. Defaults to 1.

For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.

Note:
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.

prefix_length_max

Specify the maximum length of indexed prefixes. Defaults to 64.

For example, setting prefix_length_min to 3 and prefix_length_max to 5 indexes all prefixes between 3 and 5 characters long.

Note:
A wildcard search whose pattern is below the minimum length or above the maximum length is searched using the slower method of equivalence expansion and merging.

Enabling Sub-string and Prefix Indexing

The following example sets the wordlist preference for prefix indexing. The example specifies that Oracle create token prefixes between 3 and 4 characters long:

begin 

ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST'); 
ctx_ddl.set_attribute('mywordlist','INDEX_PREFIX','YES');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',3);
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 4);

end

Fuzzy Operator New Syntax

Use the fuzzy operator to expand queries to include words that are spelled similarly to the specified term. This type of expansion is helpful for finding more accurate results when there are frequent misspellings in your document set.

The new fuzzy syntax enables you to rank the result set so that documents that contain words with high similarity to the query word are scored higher than documents with lower similarity. You can also limit the number of expanded terms.

Unlike stem expansion, the number of words generated by a fuzzy expansion depends on what is in the index. Results can vary significantly according to the contents of the index.

Supported Languages

Oracle8i interMedia Text supports fuzzy definitions for English, German, Italian, Dutch, Spanish, and OCR.

Stopwords

If the fuzzy expansion returns a stopword, the stopword is not included in the query or highlighted by CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP.

Base-Letter Conversion

If base-letter conversion is enabled for a text column and the query expression contains a fuzzy operator, Oracle operates on the base-letter form of the query.

New Syntax

fuzzy(term, score, numresults, weight)

Parameter Description

term

Specify the word on which to perform the fuzzy expansion. Oracle expands term to include words only in the index.

score

Specify a similarity score. Terms in the expansion that score below this number are discarded. Use a number between 1 and 80. The default is 60.

numresults

Specify the maximum number of terms to use in the expansion of term. Use a number between 1 and 5000. The default is 100.

weight

Specify WEIGHT or W for the results to be weighted according to their similarity scores.
Specify NOWEIGHT or N for no weighting of results.

Examples

Consider the CONTAINS query:

...CONTAINS(TEXT, 'fuzzy(government, 70, 6, weight)', 1) > 0;

This query expands to the first six fuzzy variations of government in the index that have a similarity score over 70.

In addition, documents in the result set are weighted according to their similarity to government. Documents containing words most similar to government receive the highest score.

You can skip unnecessary parameters using the appropriate number of commas. For example:

'fuzzy(government,,,weight)'

Backward Compatibility

The old fuzzy syntax from previous releases is still supported. This syntax is as follows:

Parameter Description

?term

Expands term to include all terms with similar spellings as the specified term.

CTX_DDL.ADD_INDEX New Procedure

Use this procedure to add a B-tree index to a catalog index preference. You create a catalog index preference to create a CTXCAT index.

Note:
Invoking CTX_DDL.ADD_INDEX after creating a CTXCAT index does not update the domain index to include the new index. You must add B-tree indexes with this procedure and then create your CTXCAT index with CREATE INDEX.

Syntax

CTX_DDL.ADD_INDEX(

set_name in varchar2,
column_list varchar2,
storage_clause varchar2);

set_name

Specify the name of the index set.

column_list

Specify a comma separated list of columns upon which to create the B-tree index. Order your columns according to your queries.

For example, if your structured query clause is 'column1=99 order by column2', specify 'column1, column2' for optimal query performance.

Similarly, if your structured query clause is 'column2=200 order by column1', specify 'column2, column1' for optimal query performance.

storage_clause

Specify a storage clause.

Example

Indexing

Consider a table called AUCTION with the following schema:

create table auction(

item_id number,
title varchar2(100),
category_id number,
price number,
bid_close date);

You can create a catalog index to support the different types of structured queries a user might enter.

To create the indexes, first create the index set preference then add the required indexes to it.

The following example creates the index set preference and adds five different indexes to it:

begin

ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset','bid_close');               /* index A */
ctx_ddl.add_index('auction_iset','category_id, bid_close');  /* index B */
ctx_ddl.add_index('auction_iset','bid_close, category_id');  /* index C */
ctx_ddl.add_index('auction_iset','price, bid_close');        /* index D */
ctx_ddl.add_index('auction_iset','bid_close, price');        /* index E */

end;

Create the combined catalog index with CREATE INDEX as follows:

create index auction_titlex on AUCTION(title) indextype is CTXCAT parameters 
('index set auction_iset');

Querying

To query the title column for the word camera, you can issue regular and mixed queries as follows:

select from AUCTION where CATSEARCH(title, 'camera', NULL)>0;

The following query uses index A:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000')>0;

The following query uses index B:

select from AUCTION where CATSEARCH(title, 'camera', 'category_id=99 order by 
bid_close desc')>0;

The following query uses index C:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 
order by category_id')>0;

The following query uses index D:

select from AUCTION where CATSEARCH(title, 'camera', 'price=200 order by bid_
close')>0;

The following query uses index E:

select from AUCTION where CATSEARCH(title, 'camera', 'bid_close=20-FEB-2000 
order by price')>0;

CTX_DDL.CREATE_INDEX_SET New Procedure

Use this procedure to create an index set for CTXCAT index types. You name this index set in the parameter clause of CREATE INDEX when you create a CTXCAT index.

Syntax

CTX_DDL.CREATE_INDEX_SET(set_name in  varchar2);

set_name

Specify the name of the index set. You name this index set in the parameter clause of CREATE INDEX when you create a CTXCAT index.

CTX_DDL.DROP_INDEX_SET New Procedure

Use this procedure to drop an index set.

Syntax

CTX_DDL.DROP_INDEX_SET(set_name in varchar2);

set_name

Specify the name of the index set to drop.

CTX_DDL.REMOVE_INDEX New Procedure

Use this procedure to remove a column from the column list.

Syntax

CTX_DDL.REMOVE_INDEX(set_name in varchar2, column_list in varchar2);

set_name

Specify the name of the index set to drop.

column_list

Specify the column list to remove from the index set preference.

CTX_DDL.OPTIMIZE_INDEX Updated Syntax

Use this procedure to optimize the index. You optimize your index after you synchronize it. Optimizing the index removes old data and minimizes index fragmentation. Optimizing the index can improve query response time

You can optimize in fast, full, or token mode. In token mode, you specify a specific token to be optimized. You can use token mode to optimize index tokens that are frequently searched, without spending time on optimizing tokens that are rarely referenced. An optimized token can improve query response time for that token.

Note:
Optimizing an index can result in better response time only if you insert, delete, or update documents in your base table after your initial indexing operation.

Using this procedure to optimize the index is the same as optimizing with the ALTER INDEX statement.

Syntax

CTX_DDL.OPTIMIZE_INDEX( 

idx_name  in  varchar2, 
optlevel  in  varchar2, 
maxtime   in  number default null 
token     in varchar2 default null

);

idx_name

Specify the name of the index.

optlevel

Specify optimization level as a string. You can specify one of the following methods for optimization:

Value Description

FAST or
CTX_DDL.OPTLEVEL_FAST

This method compacts fragmented rows. However, old data is not removed.

FULL or
CTX_DDL.OPTLEVEL_FULL

In this mode you can optimize the entire index or a portion of the index. This method compacts rows and removes old data.

TOKEN

This method lets you specify a specific token to be optimized. Oracle does a FULL optimization on the token you specify with token.
Use this method to optimize those tokens that are searched frequently.

maxtime

Specify maximum optimization time, in minutes, for FULL optimize.

When you specify the symbol CTX_DDL.MAXTIME_UNLIMITED (or pass in NULL), the entire index is optimized. This is the default.

token

Specify the token to be optimized.

Example

The following two examples optimize the index for fast optimization.

begin 
ctx_ddl.optimize_index('myidx','FAST'); 
end;

begin
ctx_ddl.optimize_index('myidx',CTX_DDL.OPTLEVEL_FAST);
end;

The following example optimizes the index token Oracle:

begin
ctx_ddl.optimize_index('myidx','token', TOKEN=>'Oracle');
end;

CTX_DDL.CREATE_STOPLIST Updated Syntax

Use this procedure to create a new, empty stoplist. Stoplists can contain words or themes that are not to be indexed.

You can also create multi-language stoplists to hold language-specific stopwords. A multi-lingual stoplist is useful when you index a table that contains documents in different languages, such as English, German, and Japanese.

You can add either stopwords, stopclasses, or stopthemes to a stoplist using ADD_STOPWORD, ADD_STOPCLASS, or ADD_STOPTHEME.

You can specify a stoplist in the parameter string of CREATE INDEX or ALTER INDEX to override the default stoplist CTXSYS.DEFAULT_STOPLIST.

Syntax

CTX_DDL.CREATE_STOPLIST(

stoplist_name IN VARCHAR2,
stoplist_type IN VARCHAR2 DEFAULT 'BASIC_STOPLIST');

stoplist_name

Specify the name of the stoplist to be created.

stoplist_type

Specify MULTI_STOPLIST to create a stoplist with language-specific stopwords.

Note:
When indexing a multi-lingual table with a multi-lingual stoplist, your table must have a language column.

Specify BASIC_STOPLIST to create a stoplist for a single language. This is the default.

Example

The following code creates a stoplist called mystop:

begin
ctx_ddl.create_stoplist('mystop', 'BASIC_STOPLIST');
end;

CTX_DDL.ADD_STOPWORD Updated Syntax

Use this procedure to add a single stopword to a stoplist. To create a list of stopwords, you must call this procedure once for each word.

Note:
The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.

Syntax

CTX_DDL.ADD_STOPWORD(

stoplist_name  in   varchar2,
stopword       in   varchar2,
language       in varchar2 default NULL

);

stoplist_name

Specify the name of the stoplist.

stopword

Specify the stopword to be added.

Language-specific stopwords must be unique across the other stopwords specific to the language. For example, it is valid to have a German die and an English die in the same stoplist.

The maximum number of stopwords, stopthemes, and stopclasses you can add to a stoplist is 4095.

language

Specify the language of stopword when stoplist_name is of type MULTI_STOPWORD. You must specify the NLS name or abbreviation of an Oracle-supported language.

Otherwise, specify NULL.

Example

Single Language Stoplist

The following example adds the stopwords because, notwithstanding, nonetheless, and therefore to the stoplist mystop:

begin
ctx_ddl.add_stopword('mystop', 'because');
ctx_ddl.add_stopword('mystop', 'notwithstanding');
ctx_ddl.add_stopword('mystop', 'nonetheless');
ctx_ddl.add_stopword('mystop', 'therefore');
end;

Multi-Language Stoplist

The following example adds the German word die to a multi-language stoplist:

begin

ctx_ddl.add_stopword('mulitstop', 'die', 'd');

end;

Note:
You can add stopwords after you create the index with ALTER INDEX.

CTX_THES.CREATE_TRANSLATION New Procedure

Use this procedure to create a new translation for a phrase in a specified language.

Syntax

CTX_THES.CREATE_TRANSLATION(tname       in    varchar2,
                            phrase      in    varchar2,
                            language    in    varchar2,
                            translation in    varchar2);

tname

Specify the name of the thesaurus, using no more than 30 characters.

phrase

Specify the phrase in the thesaurus to which to add a translation. Phrase must already exist in the thesaurus, or an error is raised.

language

Specify the language of the translation, using no more than 10 characters.

translation

Specify the translated term, using no more than 256 characters.

If a translation for this phrase already exists, this new translation is added without removing that original translation, so long as that original translation is not the same. Adding the same translation twice results in an error.

Example

The following code adds the Spanish translation for dog to my_thes:

begin
   ctx_thes.create_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;

CTX_THES.DROP_TRANSLATION New Procedure

Use this procedure to remove one or more translations for a phrase.

Syntax

CTX_THES.DROP_TRANSLATION (tname       in    varchar2,
                           phrase      in    varchar2,
                           language    in    varchar2 default null,
                           translation in    varchar2 default null);

tname

Specify the name of the thesaurus, using no more than 30 characters.

phrase

Specify the phrase in the thesaurus to which to remove a translation. The phrase must already exist in the thesaurus or an error is raised.

language

Optionally, specify the language of the translation, using no more than 10 characters. If not specified, the translation must also not be specified and all translations in all languages for the phrase are removed. An error is raised if the phrase has no translations.

translation

Optionally, specify the translated term to remove, using no more than 256 characters. If no such translation exists, an error is raised.

Example

The following code removes the Spanish translation for dog:

begin
   ctx_thes.drop_translation('my_thes', 'dog', 'SPANISH', 'PERRO');
end;

CTX_THES.HAS_RELATION New Procedure

Use this procedure to test that a thesaurus relation exists without actually doing the expansion. The function returns TRUE if the phrase has any of the relations in the specified list.

Syntax

CTX_THES.HAS_RELATION(phrase in varchar2, 
                      rel in varchar2, 
                      tname in varchar2 default 'DEFAULT')
  returns boolean;

phrase

Specify the phrase.

rel

Specify a single thesaural relation or a comma-separated list of relations, except PT. Specify 'ANY' for any relation.

tname

Specify the thesaurus name.

Example

The following example returns TRUE if the phrase "cat" in the DEFAULT thesaurus has any broader terms or broader generic terms:

begin 
 ctx_thes.has_relation('cat','BT,BTG');
end;

CTX_THES.UPDATE_TRANSLATION New Procedure

Use this procedure to update an existing translation.

Syntax

CTX_THES.UPDATE_TRANSLATION(tname       in     varchar2,
                             phrase      in     varchar2,
                             language    in     varchar2,
                             translation in     varchar2,
                             new_translation in varchar2);

tname

Specify the name of the thesaurus, using no more than 30 characters.

phrase

Specify the phrase in the thesaurus to which to update a translation. The phrase must already exist in the thesaurus or an error is raised.

language

Specify the language of the translation, using no more than 10 characters.

translation

Specify the translated term to update no more than 256 characters. If no such translation exists, an error is raised.

You can specify NULL if there is only one translation for the phrase. An error is raised if there is more than one translation for the term in the specified language.

new_translation

Optionally, specify the new form of the translated term.

Example

The following code updates the Spanish translation for dog:

begin
   ctx_thes.drop_translation('my_thes', 'dog', 'SPANISH', 'PERRO', 'CAN');
end;

CTX_DOC.TOKENS New Procedure

Use this procedure to identify all text tokens in a document. The tokens returned are those tokens which are inserted into the index.

Stop words are not returned. Section tags are not returned because they are not text tokens.

Syntax 1: In-Memory Table Storage

CTX_DOC.TOKENS(index_name      IN VARCHAR2,
               textkey         IN VARCHAR2,
               restab          IN OUT NOCOPY TOKEN_TAB);

Syntax 2: Result Table Storage

CTX_DOC.TOKENS(index_name      IN VARCHAR2,
               textkey         IN VARCHAR2,
               restab          IN VARCHAR2,
               query_id        IN NUMBER DEFAULT 0);

index_name

Specify the name of the index for the text column.

textkey

Specify the unique identifier (usually the primary key) for the document.

The textkey parameter can be one of the following:

a single column primary key value
encoded specification for a composite (multiple column) primary key.
the rowid of the row containing the document

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

restab

You can specify that this procedure store results to either a table or to an in-memory PL/SQL table.

The tokens returned are those tokens which are inserted into the index. Stop words are not returned. Section tags are not returned because they are not text tokens.

Token Table

To store results to a table, specify the name of the table. Token tables can be named anything, but must include the following columns, with names and data types as specified.

Table 2-3

Column Name	Type	Description
QUERY_ID	NUMBER	The identifier for the results generated by a particular call to CTX_DOC.TOKEN (only populated when table is used to store results from multiple TOKEN calls)
TOKEN	VARCHAR2(64)	The token string in the text.
OFFSET	NUMBER	The position of the token in the document, relative to the start of document which has a position of 1.
LENGTH	NUMBER	The character length of the token.

In-Memory Table

To store results to an in-memory table, specify the name of the in-memory table of type TOKEN_TAB. The TOKEN_TAB datatype is defined as follows:

type token_rec is record (

token varchar2(64);
offset number;
length number;

);

type token_tab is table of token_rec index by binary_integer;

CTX_DOC.TOKENS clears the TOKEN_TAB you specify before the operation.

query_id

Specify the identifier used to identify the row(s) inserted into restab.

Examples

In-Memory Tokens

The following example generates the tokens for document 1 and stores them in an in-memory table, declared as the_tokens. The example then loops through the table to display the document tokens.

declare
 the_tokens ctx_doc.token_tab;

begin
 ctx_doc.tokens('myindex','1',the_tokens);
 for i in 1..the_tokens.count loop
  dbms_output.put_line(the_tokens(i).token);
  end loop;
end;

CTX_DOC.THEMES Updated Syntax

Use the CTX_DOC.THEMES procedure to generate a list of themes for a document. Each theme is stored as a row in either a result table or in-memory PL/SQL table you specify.

Syntax 1: In-Memory Table Storage

CTX_DOC.THEMES(

index_name      IN VARCHAR2,
textkey         IN VARCHAR2,
restab          IN OUT THEME_TAB,
full_themes     IN BOOLEAN DEFAULT FALSE,
numthemes       IN NUMBER DEFAULT 50);

Syntax 2: Result Table Storage

CTX_DOC.THEMES(

index_name      IN VARCHAR2,
textkey         IN VARCHAR2,
restab          IN VARCHAR2,
query_id        IN NUMBER DEFAULT 0,
full_themes     IN BOOLEAN DEFAULT FALSE,
numthemes       IN NUMBER DEFAULT 50);

index_name

Specify the name of the index for the text column.

textkey

Specify the unique identifier (usually the primary key) for the document.

The textkey parameter can be one of the following:

a single column primary key value
an encoded specification for a composite (multiple column) primary key. When textkey is a composite key, you must encode the composite textkey string using the CTX_DOC.PKENCODE procedure.
the rowid of the row containing the document

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

restab

You can specify that this procedure store results to either a table or to an in-memory PL/SQL table.

To store results in a table, specify the name of the table.

See Also:
For more information about the structure of the theme result table, see the theme table description in the see Oracle8i interMedia Text Reference.

To store results in an in-memory table, specify the name of the in-memory table of type THEME_TAB. The THEME_TAB datatype is defined as follows:

type theme_rec is record (
   theme varchar2(2000);
   weight number;
);

type theme_tab is table of theme_rec index by binary_integer;

CTX_DOC.THEMES clears the THEME_TAB you specify before the operation.

query_id

Specify the identifier used to identify the row(s) inserted into restab.

full_themes

Specify whether this procedure generates a single theme or a hierarchical list of parent themes (full themes) for each document theme.

Specify TRUE for this procedure to write full themes to the THEME column of the result table.

Specify FALSE for this procedure to write single theme information to the THEME column of the result table. This is the default.

numthemes

Specify the number of themes to retrieve. For example, if you specify 10, the top 10 themes are returned for the document. The default is 50.

If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.

Examples

In-Memory Themes

The following example generates the top 10 themes for document 1 and stores them in an in-memory table called the_themes. The example then loops through the table to display the document themes.

declare
 the_themes ctx_doc.theme_tab;

begin
 ctx_doc.themes('myindex','1',the_themes, numthemes=>10);
 for i in 1..the_themes.count loop
  dbms_output.put_line(the_themes(i).theme||':'||the_themes(i).weight);
  end loop;
end;

Theme Table

The following example creates a theme table called CTX_THEMES:

create table CTX_THEMES (query_id number, 
                         theme varchar2(2000), 
                         weight number);

Single Themes

To obtain a list of the top 20 themes where each element in the list is a single theme, issue a statement like the following:

begin

 ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => FALSE, 
 numthemes=> 20);

end;

Full Themes

To obtain a list of the top 20 themes where each element in the list is a hierarchical list of parent themes, issue a statement like the following:

begin

ctx_doc.themes('newsindex','34','CTX_THEMES',1,full_themes => TRUE,      
numthemes=>20);

end;

CTX_DOC.GIST Updated Syntax

Use the CTX_DOC.GIST procedure to generate a gist and theme summaries for a document. You can generate paragraph-level or sentence-level gists or theme summaries.

Syntax 1: In-Memory Storage

CTX_DOC.GIST(

index_name    IN VARCHAR2, 
textkey       IN VARCHAR2, 
restab        IN OUT CLOB, 
glevel        IN VARCHAR2 DEFAULT 'P',
pov           IN VARCHAR2 DEFAULT NULL,
numParagraphs IN NUMBER DEFAULT 16,
maxPercent    IN NUMBER DEFAULT 10,
numthemes   IN NUMBER DEFAULT 50);

Syntax 2: Result Table Storage

CTX_DOC.GIST(

index_name    IN VARCHAR2, 
textkey       IN VARCHAR2, 
restab        IN VARCHAR2, 
query_id      IN NUMBER DEFAULT 0,
glevel        IN VARCHAR2 DEFAULT 'P',
pov           IN VARCHAR2 DEFAULT NULL,
numParagraphs IN NUMBER DEFAULT 16,
maxPercent    IN NUMBER DEFAULT 10,
numthemes     IN NUMBER DEFAULT 50);

index_name

Specify the name of the index associated with the text column containing the document identified by textkey.

textkey

Specify the unique identifier (usually the primary key) for the document.

The textkey parameter can be one of the following:

a single column primary key value
an encoded specification for a composite (multiple column) primary key. To encode a composite textkey, use the CTX_DOC.PKENCODE procedure.
the rowid of the row containing the document

You toggle between primary key and rowid identification using CTX_DOC.SET_KEY_TYPE.

restab

You can specify that this procedure store the gist and theme summaries to either a table or to an in-memory CLOB.

To store results to a table specify the name of the table.

See Also:
For more information about the structure of the gist result table, see the gist table description in the Oracle8i interMedia Text Reference.

To store results in memory, specify the name of the CLOB locator. If restab is NULL, a temporary CLOB is allocated and returned. You must de-allocate the locator after using it.

If restab is not NULL, the CLOB is truncated before the operation.

query_id

Specify an identifier to use to identify the row(s) inserted into restab.

glevel

Specify the type of gist or theme summary to produce. The possible values are:

P for paragraph
S for sentence

The default is P.

pov

Specify whether a gist or a single theme summary is generated. The type of gist or theme summary generated (sentence-level or paragraph-level) depends on the value specified for glevel.

To generate a gist for the entire document, specify a value of `GENERIC' for pov. To generate a theme summary for a single theme in a document, specify the theme as the value for pov.

When using result table storage and you do not specify a value for pov, this procedure returns the generic gist plus up to fifty theme summaries for the document.

When using in-memory result storage to a CLOB, you must specify a pov. However, if you do not specify pov, this procedure generates only a generic gist for the document.

Note:
The pov parameter is case sensitive. To return a gist for a document, specify `GENERIC' in all uppercase. To return a theme summary, specify the theme exactly as it is generated for the document.
Only the themes generated by CTX_DOC.THEMES Updated Syntax for a document can be used as input for pov.

numParagraphs

Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries. The default is 16.

Note:
The numParagraphs parameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the maxPercent parameter.
This means that the system always returns the smallest size gist or theme summary.

maxPercent

Specify the maximum number of document paragraphs (or sentences) selected for the document gist or theme summaries as a percentage of the total paragraphs (or sentences) in the document. The default is 10.

Note:
The maxPercent parameter is used only when this parameter yields a smaller gist or theme summary size than the gist or theme summary size yielded by the numParagraphs parameter.
This means that the system always returns the smallest size gist or theme summary.

numthemes

Specify the number of theme summaries to produce when you do not specify a value for pov. For example, if you specify 10, this procedure returns the top 10 theme summaries. The default is 50.

If you specify 0 or NULL, this procedure returns all themes in a document. If the document contains more than 50 themes, only the top 50 themes show conceptual hierarchy.

Examples

In-Memory Gist

The following example generates a non-default size generic gist of at most 10 paragraphs. The result is stored in memory in a CLOB locator. The code then de-allocates the returned CLOB locator after using it.

declare
  gklob clob;
  amt number := 40;
  line varchar2(80);

begin
 ctx_doc.gist('newsindex','34','gklob',1,glevel => 'P',pov => 'GENERIC',       
numParagraphs => 10);
  -- gklob is NULL when passed-in, so ctx-doc.gist will allocate a temporary
  -- CLOB for us and place the results there.
  
  dbms_lob.read(gklob, amt, 1, line);
  dbms_output.put_line('FIRST 40 CHARS ARE:'||line);
  -- have to de-allocate the temp lob
  dbms_lob.freetemporary(gklob);
 end;

Result Table Gists

The following example creates a gist table called CTX_GIST:

create table CTX_GIST (query_id  number,
                       pov       varchar2(80), 
                       gist      CLOB);

Gists and Theme Summaries

The following example returns a default sized paragraph level gist for document 34 as well as the top 10 theme summaries in the document:

begin
   ctx_doc.gist('newsindex','34','CTX_GIST', 1, glevel => 'P', numthemes=10);
end;

The following example generates a non-default size gist of at most 10 paragraphs:

begin
  ctx_doc.gist('newsindex','34','CTX_GIST',1,glevel => 'P',pov => 'GENERIC',   
numParagraphs => 10);
end;

The following example generates a gist whose number of paragraphs is at most 10 percent of the total paragraphs in document:

begin 
  ctx_doc.gist('newsindex','34','CTX_GIST',1, glevel =>'P',pov => 'GENERIC',  
maxPercent => 10);
end;

Theme Summary

The following example returns a paragraph level theme summary for insects for document 34. The default theme summary size is returned.

begin
   ctx_doc.gist('newsindex','34','CTX_GIST',1,glevel =>'P', pov => 'insects');
end;

Attribute	Purpose	Allowable Values
procedure	Name of the filter stored procedure.	Any CTXSYS owned procedure. The procedure can be a PL/SQL or Java stored procedure.
input_type	Type of input argument for stored procedure.	VARCHAR2, BLOB, CLOB, FILE
output_type	Type of output argument for stored procedure.	VARCHAR2, CLOB, FILE
rowid_parameter	Include rowid parameter?	TRUE/FALSE
format_parameter	Include format parameter?	TRUE/FLASE
charset_parameter	Include charset parameter?	TRUE/FLASE

Type	Description
BLOB	The input argument is of type BLOB. The unfiltered document is contained in the BLOB passed in.
CLOB	The input argument is of type CLOB. The unfiltered document is contained in the CLOB passed in. No pre-filtering or character set conversion is done. If the datastore outputs binary data, that binary data is written directly to the CLOB, with NLS doing implicit mapping to character data as best it can.
VARCHAR2	The input argument is of type VARCHAR2. The unfiltered document is contained in the VARCHAR2 passed in. The document can be a maximum of 32767 bytes of data. If the unfiltered document is greater than this length, an error is raised for the document and the filter procedure is not called.
FILE	The input argument is of type VARCHAR2. The unfiltered document content is contained in a temporary file in the file system whose filename is stored in the VARCHAR2 passed in. For example, the value of the passed-in VARCHAR2 might be 'tmp/mydoc.tmp' which means that the document content is stored in the file '/tmp/mydoc.tmp'. The file input type is useful only when your procedure is a safe callout, which can read the file.

Token	Type	Information
TOKEN	0	DOCID 1 POS 1
TOY	0	DOCID 1 POS 3
T	6	DOCID 1 POS 1 POS 3
TO	6	DOCID 1 POS 1 POS 3
TOK	6	DOCID 1 POS 1
TOKE	6	DOCID 1 POS 1
TOKEN	6	DOCID 1 POS 1
TOY	6	DOCID 1 POS 3

Parameter	Description
term	Specify the word on which to perform the fuzzy expansion. Oracle expands term to include words only in the index.
score	Specify a similarity score. Terms in the expansion that score below this number are discarded. Use a number between 1 and 80. The default is 60.
numresults	Specify the maximum number of terms to use in the expansion of term. Use a number between 1 and 5000. The default is 100.
weight	Specify WEIGHT or W for the results to be weighted according to their similarity scores. Specify NOWEIGHT or N for no weighting of results.

2 interMedia Text

Overview of New and Updated Features

Catalog (CTXCAT) Index

Indexing Example

Querying Example

Keyless Index

Multi-Column Datastore

URL Datastore Username and Password Support

File and URL Datastore Security

Procedure Filter

Multi-Language Stoplist

Prefix Indexing

Single Token Index Optimization

Document Tokens Service

Thesaurus Translation Calls

New Fuzzy Operator Functionality

Inso Filter Enhancements

Newly Supported Formats

Supported Platforms

Supplied French Knowledge Base

User-Defined Knowledge Bases

Limitations

Knowledge Base Character Set

CTXKBTC Knowledge Base Compiler Supports Stopthemes

Hierarchical Query Feedback

CATSEARCH New SQL Operator

Syntax

Examples

CREATE INDEX Updated Syntax

Purpose

Required Privileges

Syntax for New CATALOG Indextype

Parameter String Limitations for CTXCAT

Supported Preferences for CTXCAT

Table 2-1

CTXCAT Supported Column Types

CTXCAT System Parameters

CTXCAT Example

Syntax for CONTEXT Indextype

ALTER INDEX Updated Syntax

Purpose

REBUILD Syntax

optimize [token index_token | fast | full [maxtime (time | unlimited)]

Examples

Token Optimization

Fast Optimization

Full Optimization

MULTI_COLUMN_DATASTORE New Datastore Type

Indexing and DML

Security

MULTI_COLUMN_DATASTORE Example

Tagging Behavior

PROCEDURE_FILTER New Filter Type

Parameter Order

Create Index Requirements

Error Handling

Procedure Filter Preference Example

Prefix Indexing New Feature

Table 2-2

Enabling Sub-string and Prefix Indexing

Fuzzy Operator New Syntax

Supported Languages

Stopwords

Base-Letter Conversion

New Syntax

Examples

Backward Compatibility

CTX_DDL.ADD_INDEX New Procedure

Syntax

Example

Indexing

Querying

CTX_DDL.CREATE_INDEX_SET New Procedure

Syntax

CTX_DDL.DROP_INDEX_SET New Procedure

Syntax

CTX_DDL.REMOVE_INDEX New Procedure

Syntax

CTX_DDL.OPTIMIZE_INDEX Updated Syntax

Syntax

2
interMedia Text