Oracle8i interMedia Text Migration Release 8.1.5 A67845-01
This chapter discusses the changes to the Text indexing process that might affect your applications. The following topics are covered:
In pre-8.1.5, the index is created with the CTX_DDL package by first creating a policy and then using the policy to create the index.
In 8.1.5, a Text index is created as a special type of Oracle extensible index using standard SQL. This means that an interMedia Text 8.1.5 index operates like an Oracle index. It has a name by which it is referenced, and policies do not exist.
See Also:
For more information about creating a Text index, see "Procedure for Creating Index" in this chapter.
In 8.1.5, a single text index can contain both theme and word information. This is different from pre-8.1.5 where you needed a theme index in addition to a text index to issue theme queries.
By default in English, interMedia Text indexes theme information with word information. You can optionally enable and disable theme indexing with your lexer preference.
See Also:
To learn more about indexing theme information, see "Creating Preferences" in this chapter.
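As a rough sketch of how theme indexing might be disabled through the lexer preference, the block below creates a lexer preference and sets one attribute on it. The preference name my_lexer is illustrative, and the example assumes the BASIC_LEXER object and its INDEX_THEMES attribute; check the attribute names and allowed values in the Oracle8i interMedia Text Reference before relying on it.

```sql
begin
  -- my_lexer is an illustrative preference name
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  -- assumed attribute: NO turns theme indexing off; word indexing is unaffected
  ctx_ddl.set_attribute('my_lexer', 'INDEX_THEMES', 'NO');
end;
/
```

The preference takes effect only when it is named in the parameter string of the statement that creates the index.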
In pre-8.1.5, the system allows you to create more than one index on a text column. This is useful when you want a text column to have a text and theme index.
In 8.1.5, a column can have no more than a single domain index attached to it, which is in keeping with Oracle standards. However, a single Text index can contain theme information in addition to word information.
In pre-8.1.5, you can create a ConText index on a view. This might be useful when you need to index documents whose content is pieced together from different tables.
However, Oracle SQL standards do not support creating indexes on views. Therefore, in 8.1.5, if you need to index documents whose contents are in different tables, you can create a data storage preference using the USER_DATASTORE object, which is new for 8.1.5. With this object, you define a procedure that synthesizes documents at index time.
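The following is a minimal sketch of the USER_DATASTORE approach, not a definitive implementation. The table news_items with columns title and body, the procedure name synth_doc, and the preference name my_user_ds are all hypothetical. The procedure signature (a ROWID in, a CLOB in/out) and the PROCEDURE attribute are assumptions based on how this object is generally described; ownership rules, schema qualification, and grants are omitted and should be verified in the Oracle8i interMedia Text Reference.

```sql
-- Hypothetical procedure: assembles one document for the row being indexed.
create or replace procedure synth_doc(rid in rowid, tlob in out nocopy clob) is
begin
  for r in (select title, body from news_items where rowid = rid) loop
    -- writeappend rejects a null or zero-length buffer, so guard each piece
    if r.title is not null then
      dbms_lob.writeappend(tlob, length(r.title), r.title);
    end if;
    if r.body is not null then
      dbms_lob.writeappend(tlob, length(r.body), r.body);
    end if;
  end loop;
end;
/

begin
  -- my_user_ds is an illustrative preference name
  ctx_ddl.create_preference('my_user_ds', 'USER_DATASTORE');
  -- assumed attribute name: PROCEDURE
  ctx_ddl.set_attribute('my_user_ds', 'PROCEDURE', 'synth_doc');
end;
/

-- The indexed column only anchors the index; the text comes from synth_doc.
create index news_items_idx on news_items(title)
  indextype is ctxsys.context
  parameters('datastore my_user_ds');
```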
The pre-8.1.5 procedure for creating an index is
The process for creating an index is simpler because of the following
By default, the system expects your documents to be stored in a text column. Once this requirement is satisfied, you can create a text index using the CREATE INDEX SQL command as an extensible index of type ConText, without explicitly specifying any preferences.
See Also:
For more information about the out-of-box defaults, see Oracle8i interMedia Text Reference.
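For illustration, the simplest form of the statement might look like the sketch below, which reuses the table mytable, column news, and index name newsindex from the examples later in this chapter.

```sql
-- No PARAMETERS clause: every preference falls back to the system default.
create index newsindex on mytable(news)
  indextype is ctxsys.context;
```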
The 8.1.5 procedure for creating an index is:
See Also:
For more information about the preference objects available in the 8.1.5 release, see Oracle8i interMedia Text Reference.
In 8.1.5, the syntax for the CTX_DDL.CREATE_PREFERENCE and CTX_DDL.SET_ATTRIBUTE procedures has changed. In addition, the order in which you call these procedures has changed.
In 8.1.5, you create the preferences then set the attributes, which is the opposite order of what you do in pre-8.1.5.
See Also:
For a complete list of preference objects and their associated attributes, and the syntax for the CTX_DDL.CREATE_PREFERENCE and CTX_DDL.SET_ATTRIBUTE procedures, see the Oracle8i interMedia Text Reference.
The following example creates a custom data storage preference called mypref that tells the system that the files to be indexed are stored in the operating system. The example then uses CTX_DDL.SET_ATTRIBUTE to set the PATH attribute of mypref to the directory /docs.
begin
  ctx_ddl.create_preference('mypref', 'FILE_DATASTORE');
  ctx_ddl.set_attribute('mypref', 'PATH', '/docs');
end;
In pre-8.1.5, you create an index using CTX_DDL.CREATE_INDEX and name a policy.
In 8.1.5, you create the Text index as a type of extensible index using the CREATE INDEX SQL command. You name the index and optionally specify the preferences such as lexer and filter in the parameter string.
See Also:
To learn more about the CREATE INDEX command syntax, see the Oracle8i interMedia Text Reference.
The following example creates a Text index called newsindex on the news column in mytable. The index is created with the lexer preference called my_lexer and the stoplist called my_stop. Default attributes are used for the unspecified preferences.
create index newsindex on mytable(news) indextype is ctxsys.context parameters('lexer my_lexer stoplist my_stop');
In pre-8.1.5, you drop preferences using CTX_DDL.DROP_PREFERENCE, and you can only do so when all referenced policies have been deleted from the data dictionary.
In 8.1.5, you drop index preferences with the same procedure CTX_DDL.DROP_PREFERENCE. Because preferences exist separately from the index and because policies do not exist in 8.1.5, you need not drop your index before you drop a preference.
Dropping a preference does not affect the index that is using the dropped preference.
See Also:
To learn more about the syntax for the CTX_DDL.DROP_PREFERENCE procedure, see the Oracle8i interMedia Text Reference.
The following code drops the preference my_lexer.
begin ctx_ddl.drop_preference('my_lexer'); end;
In pre-8.1.5, you drop an index using CTX_DDL.DROP_INDEX.
In 8.1.5, you drop an index using the DROP INDEX command in SQL.
For example, to drop an index called newsindex, issue the following SQL command:
drop index newsindex;
If Oracle cannot determine the state of the index, for example as a result of an indexing crash, you cannot drop the index as described above. Instead use:
drop index newsindex force;
See Also: To learn more about the DROP INDEX command syntax, see the Oracle8i interMedia Text Reference.
In pre-8.1.5, when an indexing operation fails (creation or optimization), you can resume the operation using CTX_DDL.RESUME_FAILED_INDEX.
In interMedia Text 8.1.5, you resume a failed index operation using the ALTER INDEX command.
See Also:
To learn more about the ALTER INDEX command syntax, see the Oracle8i interMedia Text Reference.
The following command resumes the indexing operation on newsindex with 2 megabytes of memory:
ALTER INDEX newsindex rebuild parameters('resume memory 2M');
You can rebuild a valid index using ALTER INDEX. You might rebuild an index when you want to index with a new preference.
See Also:
To learn more about the ALTER INDEX command syntax for rebuilding an index, see the Oracle8i interMedia Text Reference.
The following command rebuilds the index, replacing the lexer preference with my_lexer.
ALTER INDEX newsindex rebuild parameters('replace lexer my_lexer');
In pre-8.1.5, to optimize an index, you use CTX_DDL.OPTIMIZE_INDEX and specify one of five different optimizing methods.
In 8.1.5, to optimize an index, you use the ALTER INDEX command in SQL with the REBUILD parameter. You can optimize the index in either fast or full mode.
See Also:
To learn more about optimizing the index with ALTER INDEX, see the Oracle8i interMedia Text Reference.
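By analogy with the other ALTER INDEX examples in this chapter, the two optimization modes might be invoked as sketched below. The parameter strings 'optimize fast' and 'optimize full' are assumed from the mode names given above; confirm the exact syntax and any additional options in the Oracle8i interMedia Text Reference.

```sql
-- Fast mode (assumed parameter string)
alter index newsindex rebuild parameters('optimize fast');

-- Full mode (assumed parameter string)
alter index newsindex rebuild parameters('optimize full');
```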
As in pre-8.1.5, the 8.1.5 Text index is updated automatically whenever there is an insert, delete, or update to the base table. A ctxsrv server must be running. This is known as background DML processing.
The following example starts a server and writes all server messages to a file named ctx.log:
ctxsrv -user ctxsys/ctxsys -personality M -log ctx.log &
See Also:
To learn more about background DML with ctxsrv, see the specification for ctxsrv in the Oracle8i interMedia Text Reference.
In pre-8.1.5, you synchronize the index using CTX_DML.SYNC. In addition, a ConText M server must be running.
In 8.1.5, you can update your index in batch mode by executing the ALTER INDEX command with the sync parameter. When you synchronize the index in batch mode, Oracle processes the pending inserts, updates, and deletes stored in the DML queue.
Because synchronizing an index in batch works on batches of inserts, updates and deletes, batch DML usually results in less index fragmentation than synchronizing the index immediately by running the ctxsrv daemon.
Note: No background ctxsrv server is required to synchronize an index in batch. If the ctxsrv daemon is running, it synchronizes the index immediately.
See Also:
To learn more about the ALTER INDEX command syntax, see the Oracle8i interMedia Text Reference.
The following example synchronizes the index with a runtime memory of 2 megabytes:
ALTER INDEX newsindex rebuild PARAMETERS('sync memory 2M');
In pre-8.1.5 a stoplist consisted of words that are not to be indexed. You recorded these words by calling CTX_DDL.SET_ATTRIBUTE for each stopword and then by creating a stoplist preference with CTX_DDL.CREATE_PREFERENCE.
Default stoplists are available in most of the supported languages. You manually set the stoplist for your language.
By default, the system sets the default stoplist to the language you specify in your database setup. There is no need to create or set stoplists, unless you want to customize the list.
In addition to defining your own stopwords in 8.1.5, you can define stopthemes, which are themes that are not to be indexed. This is available for English only.
You can also specify that numbers are not to be indexed. A class of alphanumeric characters, such as numbers, that is not to be indexed is a stopclass.
You record your own stopwords, stopthemes, and stopclasses by creating a single stoplist, to which you add the stopwords, stopthemes, and stopclasses. You specify the stoplist in the paramstring for CREATE INDEX.
In 8.1.5, you use the following procedures to manage stopwords, stopthemes, and stopclasses:
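As a hedged illustration of building a custom stoplist, the sketch below creates one stoplist and adds a stopword, a stoptheme, and a stopclass to it. The stoplist name my_stop matches the name used in the CREATE INDEX example earlier in this chapter; the stopword and stoptheme values are made up, and the procedure names and the NUMBERS stopclass are assumed to match the CTX_DDL package in this release.

```sql
begin
  ctx_ddl.create_stoplist('my_stop');
  -- illustrative stopword and stoptheme values
  ctx_ddl.add_stopword('my_stop', 'because');
  ctx_ddl.add_stoptheme('my_stop', 'banking');
  -- NUMBERS is the stopclass that excludes numbers from the index
  ctx_ddl.add_stopclass('my_stop', 'NUMBERS');
end;
/

-- Name the stoplist in the parameter string when creating the index.
create index newsindex on mytable(news)
  indextype is ctxsys.context
  parameters('stoplist my_stop');
```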
Defining document sections before you index enables you to query within the sections using the WITHIN operator. You define sections as part of a section group.
In pre-8.1.5, you create a section group and specify it in the Wordlist preference. You can create only user-defined zone sections and sentence and paragraph sections.
In 8.1.5, you create a section group and specify it in the paramstring for CREATE INDEX. To create a section group, use CTX_DDL.CREATE_SECTION_GROUP.
See Also:
To learn more about using CTX_DDL.CREATE_SECTION_GROUP, see its specification in the Oracle8i interMedia Text Reference.
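A minimal sketch of the 8.1.5 flow follows: create the section group, then name it in the parameter string. The group name my_sections is illustrative, and BASIC_SECTION_GROUP is assumed to be an available group type; in practice you would add sections to the group before creating the index.

```sql
begin
  -- my_sections is an illustrative name; the group type is an assumption
  ctx_ddl.create_section_group('my_sections', 'BASIC_SECTION_GROUP');
end;
/

create index newsindex on mytable(news)
  indextype is ctxsys.context
  parameters('section group my_sections');
```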
Within a section group, you can create three types of sections: zone sections, field sections, and special sections.
Zone sections (formerly known as user-defined sections in pre-8.1.5) are sections delimited by start and end tags. The <B> and </B> tags in HTML, for instance, mark a range of words that are to be rendered in boldface.
Zone sections can be nested within one another, can overlap, and can occur more than once in a document.
You create zone sections as part of a section group with CTX_DDL.ADD_ZONE_SECTION.
See Also:
To learn more about using CTX_DDL.ADD_ZONE_SECTION, see its specification in the Oracle8i interMedia Text Reference.
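To make the zone section idea concrete, the sketch below defines a zone section for the HTML <B> tag and then queries within it. The group name html_sections and section name bold are hypothetical, HTML_SECTION_GROUP is assumed to be an available group type, and the query assumes the index on mytable(news) was created with this section group.

```sql
begin
  ctx_ddl.create_section_group('html_sections', 'HTML_SECTION_GROUP');
  -- 'bold' is an illustrative section name; 'B' is the tag without angle brackets
  ctx_ddl.add_zone_section('html_sections', 'bold', 'B');
end;
/

-- Count documents where the word oracle appears in boldface.
select count(*)
  from mytable
 where contains(news, 'oracle within bold') > 0;
```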
Field sections are new for 8.1.5. Field sections are delimited by start and end tags. By default, the text within a field section is indexed as a sub-document separate from the rest of the document.
Unlike zone sections, field sections cannot nest or overlap. As such, field sections are best suited for non-repeating, non-overlapping sections such as TITLE and AUTHOR sections in news type documents.
Because of how field sections are indexed, WITHIN queries on field sections are usually faster than WITHIN queries on zone sections.
You create a field section as part of a section group using the CTX_DDL.ADD_FIELD_SECTION procedure.
See Also:
To learn more about using CTX_DDL.ADD_FIELD_SECTION, see its specification in the Oracle8i interMedia Text Reference.
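The TITLE and AUTHOR case mentioned above could be sketched as follows. The group name news_sections and the section names are illustrative, the group type is an assumption, and the query assumes the index on mytable(news) was created with this section group.

```sql
begin
  ctx_ddl.create_section_group('news_sections', 'BASIC_SECTION_GROUP');
  -- arguments: group name, section name, tag (without angle brackets)
  ctx_ddl.add_field_section('news_sections', 'title', 'TITLE');
  ctx_ddl.add_field_section('news_sections', 'author', 'AUTHOR');
end;
/

-- Count documents whose TITLE field contains the word oracle.
select count(*)
  from mytable
 where contains(news, 'oracle within title') > 0;
```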
In 8.1.5, special sections are the same as paragraph and sentence sections in pre-8.1.5.
To create sentence and paragraph sections, use the CTX_DDL.ADD_SPECIAL_SECTION procedure.
See Also:
To learn more about using CTX_DDL.ADD_SPECIAL_SECTION, see its specification in the Oracle8i interMedia Text Reference.
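As a final sketch, sentence and paragraph sections might be enabled and queried as shown below. The group name text_sections is illustrative, the group type is an assumption, and the query assumes the index on mytable(news) was created with this section group.

```sql
begin
  ctx_ddl.create_section_group('text_sections', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_special_section('text_sections', 'SENTENCE');
  ctx_ddl.add_special_section('text_sections', 'PARAGRAPH');
end;
/

-- Count documents in which dog and cat occur in the same sentence.
select count(*)
  from mytable
 where contains(news, '(dog and cat) within sentence') > 0;
```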