Oracle Text Application Developer's Guide Release 9.0.1 Part Number A90122-01
Introduction to Oracle Text
To query your document collection, you must first index the text column of your text table. Indexing breaks your text into tokens, which are usually words. The result is a CONTEXT index, which records each token and the documents that contain it. Such an inverted index lets you query on words and phrases. Figure 1-2 shows a text table within Oracle9i and its associated Oracle Text index.
Oracle Text supports the creation of three types of indexes depending on your application and text source. You use the CREATE INDEX statement to create all Oracle Text index types.
The following table describes these indexes and the type of applications you can build with them. The third column shows which query operator to use with the index.
Once your text data is loaded in a table, you can use CREATE INDEX to create a context index. When you create an index and specify no parameter clause, the index is created with default parameters.
For example, the following command creates a context index called myindex on the text column in the docs table:
CREATE INDEX myindex ON docs(text) INDEXTYPE IS CTXSYS.CONTEXT;
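As a rough illustration of what the index enables, the following sketch queries the docs table with the CONTAINS operator, which is the query operator for a context index. The id column and the search word are assumptions made for this example; they are not part of the sample shown above.

```sql
-- Hypothetical query: assumes docs has an id column in addition to text.
-- SCORE(1) returns the relevance score computed by the CONTAINS call labeled 1.
SELECT SCORE(1) AS relevance, id
  FROM docs
 WHERE CONTAINS(text, 'oracle', 1) > 0
 ORDER BY SCORE(1) DESC;
```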
When you use CREATE INDEX to create a context index without explicitly specifying parameters, the system does the following for all languages by default:
For document filtering to work correctly in your system, you must ensure that your environment is set up correctly to support the Inso filter.
To learn more about configuring your environment to use the Inso filter, see Oracle Text Reference.
Note:
You can always change the default indexing behavior by creating your own preferences and specifying these custom preferences in the parameter clause of CREATE INDEX.
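A minimal sketch of this approach might look like the following. The preference name my_lexer is hypothetical, and the choice of the mixed_case attribute is only an example of a setting you might override. The sketch also assumes that no index named myindex exists yet.

```sql
-- Hypothetical preference name (my_lexer); BASIC_LEXER is a predefined lexer type.
BEGIN
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  -- Keep tokens in their original case instead of folding them to uppercase.
  ctx_ddl.set_attribute('my_lexer', 'mixed_case', 'YES');
END;
/

-- Name the custom preference in the parameter clause.
CREATE INDEX myindex ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('lexer my_lexer');
```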
Using the parameter clause with CREATE INDEX, you can customize your context index. For example, in the parameter clause, you can specify where your text is stored, how you want it filtered for indexing, and whether sections should be created.
To index a set of HTML files loaded in the text column htmlfile, you can issue the CREATE INDEX statement, specifying datastore, filter, and section group parameters as follows:
CREATE INDEX myindex ON doc(htmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore ctxsys.default_datastore filter ctxsys.null_filter section group ctxsys.html_section_group');
See Also:
"Considerations For Indexing" in Chapter 2, "Indexing" for more information about the different ways you can create an index.
Oracle Text Reference for more information on the CREATE INDEX statement.
A CTXCAT index is an index optimized for mixed queries. You can create this type of index when you store small documents or text fragments and associated structured information. To query this index, you use the CATSEARCH operator and specify a structured clause, if any. Query performance with a CTXCAT index is usually better for structured queries than with a CONTEXT index.
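The following sketch shows one way a CTXCAT index and a mixed query could fit together. The auction table, its columns, and the names auction_iset and auction_titlex are all hypothetical and are used only for illustration.

```sql
-- Hypothetical table: short text fragments plus structured columns.
CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(100),
  price   NUMBER
);

-- An index set names the structured columns that CATSEARCH can filter on.
BEGIN
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');
END;
/

CREATE INDEX auction_titlex ON auction(title)
  INDEXTYPE IS CTXSYS.CTXCAT
  PARAMETERS ('index set auction_iset');

-- Mixed query: text criterion on title, structured criterion on price.
SELECT item_id, title
  FROM auction
 WHERE CATSEARCH(title, 'camera', 'price < 200') > 0;
```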
You create a CTXRULE index to build a document classification application in which an incoming stream of documents is routed according to content. You define the classification rules as queries, which you then index. You use the MATCHES operator to classify single documents.
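As an illustrative sketch, classification rules can be stored as rows in a table, indexed with CTXRULE, and then matched against an incoming document. The news_rules table, its contents, and the index name news_rulex are assumptions made for this example.

```sql
-- Hypothetical rules table: each row is a query that defines a category.
CREATE TABLE news_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(2000)
);

INSERT INTO news_rules VALUES (1, 'Sports',  'football or tennis');
INSERT INTO news_rules VALUES (2, 'Finance', 'stocks or bonds');
COMMIT;

-- Index the rule queries themselves.
CREATE INDEX news_rulex ON news_rules(query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify a single document: return the categories whose rule it satisfies.
SELECT category
  FROM news_rules
 WHERE MATCHES(query, 'Tennis fans watched the final yesterday') > 0;
```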
Index maintenance is necessary after your application inserts, updates, or deletes documents in your base table.
If your base table is static, that is, you do not insert, update, or delete documents after the initial indexing, you do not need to maintain your index.
However, if you perform DML operations (inserts, updates, or deletes) on your base table, you must update your index. You can synchronize your index manually with CTX_DDL.SYNC_INDEX.
The following example synchronizes the index myindex with 2 megabytes of memory:
begin
ctx_ddl.sync_index('myindex', '2M');
end;
If you synchronize your index regularly, you might also consider optimizing your index to reduce fragmentation and to remove old data.
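A maintenance pass along these lines could be sketched as follows. The check against the CTX_USER_PENDING view is optional, and the choice of the FULL optimization level is an assumption for this example; a faster, lighter level may suit some workloads better.

```sql
-- Optional check: list rows waiting to be synchronized for this index.
SELECT pnd_index_name, pnd_rowid
  FROM ctx_user_pending
 WHERE pnd_index_name = 'MYINDEX';

BEGIN
  -- Bring the index up to date with pending DML.
  ctx_ddl.sync_index('myindex', '2M');
  -- Compact fragmented index data and remove data for deleted documents.
  ctx_ddl.optimize_index('myindex', 'FULL');
END;
/
```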
See Also:
"Managing DML Operations for a CONTEXT Index" in Chapter 2, "Indexing" for more information about synchronizing and optimizing the index.
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.