Oracle Text Application Developer's Guide
Release 9.0.1

Part Number A90122-01

Introduction to Oracle Text, 4 of 8


Indexing Your Documents

Figure 1-2


To query your document collection, you must first index the text column of your text table. Indexing breaks your text into tokens, which are usually words, and creates a CONTEXT index that records each token and the documents that contain it. Such an inverted index lets you query on words and phrases. Figure 1-2 shows a text table within Oracle9i and its associated Oracle Text index.

Type of Index

Oracle Text supports the creation of three types of indexes depending on your application and text source. You use the CREATE INDEX statement to create all Oracle Text index types.

The following table describes these indexes and the type of applications you can build with them. The third column shows which query operator to use with the index.

Index Type  Application Type                                            Query Operator
----------  ----------------------------------------------------------  --------------
CONTEXT     Use this index to build a text retrieval application when   CONTAINS
            your text consists of large coherent documents. You can
            index documents of different formats such as MS Word,
            HTML, XML, or plain text.

            With a context index, you can customize your index in a
            variety of ways.

CTXCAT      Use this index type to improve mixed query performance.     CATSEARCH
            Suitable for querying small text fragments with
            structured criteria like dates, item names, and prices
            that are stored across columns.

CTXRULE     Use a CTXRULE index to build a document classification      MATCHES
            application. The CTXRULE index is an index created on a
            table of queries, where each query has a classification.

            Single documents (plain text, HTML, or XML) can be
            classified using the MATCHES operator.

Creating a CONTEXT Index

Once your text data is loaded in a table, you can use CREATE INDEX to create a context index. When you create an index and specify no parameter clause, an index is created with default parameters.

For example, the following command creates a context index called myindex on the text column in the docs table:

CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;
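
As a minimal sketch of the full cycle around this statement, the following assumes a hypothetical docs table with a numeric id primary key and a VARCHAR2 text column. The column definitions and sample rows are illustrative assumptions; only the names docs, text, and myindex come from the example above. It loads two rows, creates the default index, and queries it with CONTAINS:

```sql
-- Hypothetical table definition (assumed for illustration).
-- The base table of a CONTEXT index needs a primary key.
CREATE TABLE docs (
  id   NUMBER PRIMARY KEY,
  text VARCHAR2(200)
);

INSERT INTO docs (id, text)
  VALUES (1, 'Oracle Text indexes documents stored in the database.');
INSERT INTO docs (id, text)
  VALUES (2, 'A stoplist removes common words from the index.');
COMMIT;

-- Same statement as above: a CONTEXT index with default parameters.
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;

-- CONTAINS returns a relevance score; a score greater than 0 means a match.
-- The label 1 ties SCORE(1) to this CONTAINS call.
SELECT SCORE(1) AS relevance, id, text
  FROM docs
 WHERE CONTAINS(text, 'documents', 1) > 0
 ORDER BY SCORE(1) DESC;
```

The rows are committed before the index is created, so the initial index build picks them up. Rows added afterward need the synchronization described under Index Maintenance.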

General Defaults for All Languages

When you use CREATE INDEX to create a context index without explicitly specifying parameters, the system does the following for all languages by default:

You can always change the default indexing behavior by creating your own preferences and specifying these custom preferences in the parameter clause of CREATE INDEX.
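
One possible way to override a default is sketched below, assuming you want the index to preserve case. The preference name my_lexer is hypothetical, the docs table is the one from the earlier example, and the / terminator assumes a SQL*Plus session. See Oracle Text Reference for the CTX_DDL procedures and the lexer attributes.

```sql
BEGIN
  -- Create a lexer preference based on BASIC_LEXER
  -- (the preference name my_lexer is an assumption).
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  -- Keep the case of tokens instead of converting them to uppercase.
  ctx_ddl.set_attribute('my_lexer', 'mixed_case', 'YES');
END;
/

-- Name the custom preference in the parameter clause.
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('lexer my_lexer');
```

Any setting you do not name in the parameter clause keeps its default.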

Customizing Your CONTEXT Index

Using the parameter clause with CREATE INDEX, you can customize your context index. For example, in the parameter clause, you can specify where your text is stored, how you want it filtered for indexing, and whether sections should be created.

To index a set of HTML files loaded in the text column htmlfile, you can issue the CREATE INDEX statement, specifying datastore, filter and section group parameters as follows:

CREATE INDEX myindex ON doc(htmlfile)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group ctxsys.html_section_group');

See Also:

"Considerations For Indexing" in Chapter 2, "Indexing" for more information about the different ways you can create an index.

Oracle Text Reference for more information on the CREATE INDEX statement. 

Creating a CTXCAT Index

A CTXCAT index is an index optimized for mixed queries. You can create this type of index when you store small documents or text fragments and associated structured information. To query this index, you use the CATSEARCH operator and specify a structured clause, if any. Query performance with a CTXCAT index is usually better for structured queries than with a CONTEXT index.
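
The following is a brief sketch of that workflow rather than a complete application. The auction table, its columns, and the names auction_iset and auction_titlex are assumptions chosen for illustration, and the / terminator assumes SQL*Plus.

```sql
-- Hypothetical table of small text fragments plus a structured column.
CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(100),
  price   NUMBER
);

BEGIN
  -- An index set names the structured columns that CATSEARCH
  -- can filter or sort on.
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');
END;
/

CREATE INDEX auction_titlex ON auction(title) INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set auction_iset');

-- Text criterion first, structured clause second.
SELECT item_id, title, price
  FROM auction
 WHERE CATSEARCH(title, 'camera', 'order by price') > 0;

SELECT item_id, title, price
  FROM auction
 WHERE CATSEARCH(title, 'camera', 'price <= 300') > 0;
```

The structured clause can only reference columns that were added to the index set, which is why price is registered before the index is created.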

See Also:

"Creating a CTXCAT Index" in Chapter 2, "Indexing" for a complete example. 

Creating a CTXRULE Index

You create a CTXRULE index to build a document classification application in which an incoming stream of documents is routed according to content. You define the classification rules as queries, which you then index. You use the MATCHES operator to classify single documents.
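
A small sketch of such a rule table follows. The table name queries, its columns, the index name queryx, and the sample rules are all assumptions made for illustration.

```sql
-- Hypothetical rule table: each row pairs a category with the query
-- that defines it.
CREATE TABLE queries (
  query_id     NUMBER PRIMARY KEY,
  category     VARCHAR2(30),
  query_string VARCHAR2(80)
);

INSERT INTO queries (query_id, category, query_string)
  VALUES (1, 'databases', 'oracle');
INSERT INTO queries (query_id, category, query_string)
  VALUES (2, 'operating systems', 'unix OR windows');
COMMIT;

-- The CTXRULE index is built on the column that holds the queries.
CREATE INDEX queryx ON queries(query_string) INDEXTYPE IS CTXSYS.CTXRULE;

-- MATCHES takes the indexed query column and the document text to classify.
SELECT query_id, category
  FROM queries
 WHERE MATCHES(query_string,
               'Oracle announced that its market share in databases increased over the last year.') > 0;
```

Each row returned identifies a rule that the document satisfies, so the application can route the document to the corresponding category.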

See Also:

"Creating a CTXRULE Index" in Chapter 2, "Indexing" for a complete example. 

Index Maintenance

Index maintenance is necessary after your application inserts, updates, or deletes documents in your base table.

If your base table is static, that is, you do not insert, update, or delete documents after you create the initial index, you do not need to maintain your index.

However, if you perform DML operations (inserts, updates, or deletes) on your base table, you must update your index. You can synchronize your index manually with CTX_DDL.SYNC_INDEX.

The following example synchronizes the index myindex with 2 megabytes of memory:

begin
  ctx_ddl.sync_index('myindex', '2M');
end;
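
To see whether a synchronization is needed, you can look at the rows waiting to be indexed. The sketch below assumes the hypothetical docs table with id and text columns and the index myindex. Confirm the view's columns for your release in Oracle Text Reference.

```sql
-- A new row is not searchable through the CONTEXT index until the
-- index is synchronized.
INSERT INTO docs (id, text) VALUES (3, 'A newly inserted document.');
COMMIT;

-- Rows waiting to be indexed are listed in the CTX_USER_PENDING view.
SELECT pnd_index_name, pnd_rowid, pnd_timestamp
  FROM ctx_user_pending
 WHERE pnd_index_name = 'MYINDEX';
```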

If you synchronize your index regularly, you might also consider optimizing your index to reduce fragmentation and to remove old data.
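
As a sketch, optimization can be run with CTX_DDL.OPTIMIZE_INDEX. The index name myindex is carried over from the earlier example, the choice of the FULL level is an assumption about what suits your application, and the / terminator assumes SQL*Plus.

```sql
BEGIN
  -- FAST compacts fragmented rows only.
  -- FULL compacts rows and also removes data for deleted documents.
  ctx_ddl.optimize_index('myindex', 'FULL');
END;
/
```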

See Also:

"Managing DML Operations for a CONTEXT Index" in Chapter 2, "Indexing" for more information about synchronizing and optimizing the index. 


Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.