Oracle8i interMedia Text Reference
Release 8.1.5

A67843-01

1
Introduction to interMedia Text

This chapter introduces the main features of Oracle8i interMedia Text (iMT). It is provided to help you get started with indexing, querying, and document presentation.

The following topics are covered:

Overview

The goal of this chapter is to introduce the main features of interMedia Text as they pertain to designing a query application. The sections that follow mainly describe the out-of-box default behavior.

The general steps for enabling Text queries in a query application are the following:

  1. Load the text

  2. Index the text

  3. Issue queries

  4. Present the documents that satisfy a query

The sections that follow describe how Oracle8i interMedia Text helps you carry out these steps.

System-Defined Roles

Oracle8i interMedia Text provides the following two roles for system administrators and application developers:

CTXSYS Role

The CTXSYS role enables users to do the following:

CTXAPP Role

The CTXAPP role enables users to do the following:

Loading Documents

The default indexing behavior expects documents loaded in a text column.


Note:

Even though the system expects documents to be loaded in a text column, you can also store your documents in other ways, including the file system and as a URL.

For more information about data storage, see "Datastore Objects" in Chapter 3.  


Column Types

By default, the system expects your documents to be loaded in a text column. Your text column can be of type VARCHAR2, CLOB, BLOB, CHAR, or BFILE.


Note:

Storing data in the deprecated LONG and LONG RAW column types is supported only for migrating Oracle7 systems to Oracle8.

The column types NCLOB, DATE and NUMBER cannot be indexed.  
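The column-type rules above are easy to encode as a pre-flight check that an application might run before issuing CREATE INDEX. The following Python sketch is illustrative only: the helper name check_text_column_type and its return strings are hypothetical, and the type lists are taken solely from this section, so any type not mentioned here is reported as unknown rather than guessed.

```python
# Type lists taken only from the "Column Types" section above.
SUPPORTED = {"VARCHAR2", "CHAR", "CLOB", "BLOB", "BFILE"}
MIGRATION_ONLY = {"LONG", "LONG RAW"}  # deprecated; Oracle7-to-Oracle8 migration only
NOT_INDEXABLE = {"NCLOB", "DATE", "NUMBER"}


def check_text_column_type(column_type: str) -> str:
    """Classify a column type for Text indexing (hypothetical helper)."""
    # Drop any length specifier such as "(4000)" and normalize case and spacing.
    base = column_type.split("(")[0]
    normalized = " ".join(base.upper().split())
    if normalized in SUPPORTED:
        return "supported"
    if normalized in MIGRATION_ONLY:
        return "migration only"
    if normalized in NOT_INDEXABLE:
        return "not indexable"
    return "unknown"  # not covered by this section; consult the full reference


if __name__ == "__main__":
    for col in ("varchar2(4000)", "long raw", "NUMBER"):
        print(col, "->", check_text_column_type(col))
```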


Document Formats

Because the system can index most document formats, including HTML, PDF, Microsoft Word, and plain text, you can load any of these document types into the text column.

See Also:

For more information about the supported document formats, see Appendix C, "Supported Filter Formats".  

Loading Methods

Oracle enables you to load data using various methods, including the following:

Indexing Text

Once your text is loaded in a text column, you can run the CREATE INDEX command to create a Text index.

For example, the following command creates a Text index called myindex on the text column in the docs table:

create index myindex on docs(text) indextype is ctxsys.context;

General Defaults for All Languages

When you use CREATE INDEX without explicitly specifying parameters, the system does the following for all languages by default:

Of course, you can change the default indexing behavior by creating your own preferences and specifying these custom preferences in the parameter string of CREATE INDEX.
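The difference between relying on the defaults and supplying custom preferences comes down to the optional PARAMETERS clause of CREATE INDEX. The Python sketch below only builds the statement text. The helper name, the preference name my_lexer, and the subset of parameter keywords it accepts are assumptions for illustration; confirm the keyword list under CREATE INDEX in Chapter 2, and create any preference first with CTX_DDL.CREATE_PREFERENCE.

```python
from typing import Mapping, Optional

# Assumed subset of parameter-string keywords; confirm under CREATE INDEX in Chapter 2.
KNOWN_KEYWORDS = ("DATASTORE", "FILTER", "LEXER", "WORDLIST", "STOPLIST", "STORAGE")


def create_text_index_sql(index: str, table: str, column: str,
                          preferences: Optional[Mapping[str, str]] = None) -> str:
    """Build a CREATE INDEX statement for a Text (ctxsys.context) index.

    With no preferences the statement relies on the system defaults.
    `preferences` maps a parameter keyword (for example "LEXER") to the name
    of a preference created earlier with CTX_DDL.CREATE_PREFERENCE.
    """
    sql = f"CREATE INDEX {index} ON {table}({column}) INDEXTYPE IS ctxsys.context"
    if preferences:
        parts = []
        for keyword, pref_name in preferences.items():
            kw = keyword.upper()
            if kw not in KNOWN_KEYWORDS:
                raise ValueError(f"unexpected parameter keyword: {keyword}")
            parts.append(f"{kw} {pref_name}")
        # The whole parameter string is passed as one quoted SQL literal.
        sql += " PARAMETERS('" + " ".join(parts) + "')"
    return sql


if __name__ == "__main__":
    print(create_text_index_sql("myindex", "docs", "text"))
    # "my_lexer" is a hypothetical preference name.
    print(create_text_index_sql("myindex", "docs", "text", {"lexer": "my_lexer"}))
```

The statement is returned without a trailing semicolon, which is what most database drivers expect; add one when running it in SQL*Plus.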

See Also:

To learn more about creating your own custom preferences, see Chapter 3, "Indexing".

See also CTX_DDL.CREATE_PREFERENCE in Chapter 7.

To learn more about using CREATE INDEX, see its specification in Chapter 2.  

Language Specific Defaults

English

In addition to the general defaults, the system enables the following option for English language text:

Other Languages

By default, the following features are enabled:

Index Maintenance

Index maintenance is necessary after your application inserts, updates, or deletes documents in your base table.

If your base table is static, that is, you do not insert, update, or delete documents after creating the initial index, you do not need to maintain your index.

However, if you perform DML (inserts, updates, or deletes) on your base table, you must synchronize your index. You can synchronize it manually with ALTER INDEX, or you can run the ctxsrv server in the background, which synchronizes the index automatically at regular intervals.
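The rule described above, synchronize only when the base table has changed, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper name and the pending_dml counter are hypothetical (how your application tracks DML is up to you), execute stands for any callable that runs one SQL statement, and the ALTER INDEX ... REBUILD ONLINE PARAMETERS('sync') form is assumed, so verify it against ALTER INDEX in Chapter 2. Running ctxsrv in the background is the alternative to calling something like this yourself.

```python
from typing import Callable, Optional


def sync_if_needed(execute: Callable[[str], None], index: str,
                   pending_dml: int) -> Optional[str]:
    """Synchronize a Text index only when DML has occurred since the last sync.

    `execute` is injected (for example, a cursor's execute method) so the
    decision logic can be exercised without a database connection.
    """
    if pending_dml <= 0:
        return None  # static base table: nothing to synchronize
    # Assumed 8.1.5 syntax; check ALTER INDEX in Chapter 2 before relying on it.
    stmt = f"ALTER INDEX {index} REBUILD ONLINE PARAMETERS('sync')"
    execute(stmt)
    return stmt


if __name__ == "__main__":
    issued = []
    sync_if_needed(issued.append, "myindex", pending_dml=0)  # no statement issued
    sync_if_needed(issued.append, "myindex", pending_dml=5)  # one sync issued
    print(issued)
```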

See Also:

For more information about synchronizing the index, see ALTER INDEX in Chapter 2.

For more information about ctxsrv, see "ctxsrv" in Chapter 11.  

Querying

You issue Text queries using the CONTAINS operator in a SELECT statement. With CONTAINS, you can issue two types of queries: word queries and ABOUT queries.

Word Query Example

A word query is a query on the exact word or phrase you enter between the single quotes in the CONTAINS operator.

The following example finds all the documents in the text column that contain the word oracle. The score for each row is selected with the SCORE operator using a label of 1:

SELECT SCORE(1), title FROM news
           WHERE CONTAINS(text, 'oracle', 1) > 0;

In your query expression, you can use text operators such as AND and OR to achieve different results. You can also add structured predicates to the WHERE clause.
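Applications usually assemble the CONTAINS expression from user input. The Python sketch below shows one way to build the word query above, combine several terms with AND or OR, and append a structured predicate. The helper names, the title column default, and the pub_date column used in the test are hypothetical. The sketch handles only SQL quoting (doubling single quotes); it does not escape interMedia Text operator characters, and in production code passing the query expression as a bind variable is generally preferable to string concatenation.

```python
from typing import Iterable, Optional


def sql_literal(text: str) -> str:
    """Quote a string as an SQL literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def word_query_sql(table: str, column: str, terms: Iterable[str],
                   operator: str = "AND", label: int = 1,
                   columns: str = "title",
                   extra_predicate: Optional[str] = None) -> str:
    """Build a SELECT that runs a CONTAINS word query and returns SCORE(label)."""
    terms = list(terms)
    if not terms:
        raise ValueError("at least one search term is required")
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError("this sketch only combines terms with AND or OR")
    expression = f" {op} ".join(terms)  # Text query expression, e.g. "oracle AND text"
    sql = (f"SELECT SCORE({label}), {columns} FROM {table} "
           f"WHERE CONTAINS({column}, {sql_literal(expression)}, {label}) > 0")
    if extra_predicate:
        # Structured (non-text) condition added to the same WHERE clause.
        sql += f" AND {extra_predicate}"
    return sql


if __name__ == "__main__":
    print(word_query_sql("news", "text", ["oracle"]))
    print(word_query_sql("news", "text", ["oracle", "text"], operator="or"))
```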

See Also:

For more information about the different operators you can use in queries, see Chapter 4, "Query Operators".  

You can count the hits for a query using count(*), CTX_QUERY.COUNT_HITS, or CTX_QUERY.EXPLAIN.
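Two of these counting approaches can be sketched as statement builders. The Python below is illustrative: the helper names are hypothetical, and the CTX_QUERY.COUNT_HITS call assumes the signature (index name, query text, exact flag) returning a NUMBER, which you should check in Chapter 9 before relying on it. Because the exact flag is a PL/SQL BOOLEAN, which plain SQL cannot pass, the call is wrapped in an anonymous PL/SQL block rather than a SELECT.

```python
def _literal(text: str) -> str:
    """Quote text as an SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def count_hits_sql(table: str, column: str, query: str) -> str:
    """COUNT(*) over the rows that satisfy a CONTAINS query."""
    return f"SELECT COUNT(*) FROM {table} WHERE CONTAINS({column}, {_literal(query)}) > 0"


def count_hits_plsql(index: str, query: str, exact: bool = True) -> str:
    """Anonymous PL/SQL block calling CTX_QUERY.COUNT_HITS (assumed signature)."""
    flag = "TRUE" if exact else "FALSE"
    return (
        "DECLARE hits NUMBER; "
        f"BEGIN hits := CTX_QUERY.COUNT_HITS({_literal(index)}, {_literal(query)}, {flag}); "
        "DBMS_OUTPUT.PUT_LINE(hits); END;"
    )


if __name__ == "__main__":
    print(count_hits_sql("news", "text", "oracle"))
    print(count_hits_plsql("myindex", "oracle"))
```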

ABOUT Query Example

In all languages, ABOUT queries increase the number of relevant documents returned by a query.

In English, ABOUT queries can use the theme component of the index, which is created by default. As a result, the ABOUT operator returns documents based on the concepts in your query, not only on the exact word or phrase you specify.

For example, the following query finds all the documents in the text column that are about the subject politics, not just the documents that contain the word politics:

SELECT SCORE(1), title FROM news
           WHERE CONTAINS(text, 'about(politics)', 1) > 0;
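An ABOUT query differs from a word query only in the expression passed to CONTAINS, so a query builder just needs to wrap the concept in about(...). The Python sketch below is hypothetical (helper name, default title column, and the optional ordering flag are assumptions); ordering by SCORE(1) in descending order is a common way to list the most relevant documents first. It performs SQL quote doubling only, not escaping of Text query operators.

```python
def about_query_sql(table: str, column: str, concept: str, label: int = 1,
                    columns: str = "title", order_by_score: bool = False) -> str:
    """Build a SELECT that runs an ABOUT (concept) query through CONTAINS."""
    if "(" in concept or ")" in concept:
        raise ValueError("pass a bare word or phrase; about(...) is added here")
    expression = f"about({concept})"
    literal = "'" + expression.replace("'", "''") + "'"  # SQL quote doubling
    sql = (f"SELECT SCORE({label}), {columns} FROM {table} "
           f"WHERE CONTAINS({column}, {literal}, {label}) > 0")
    if order_by_score:
        sql += f" ORDER BY SCORE({label}) DESC"  # most relevant documents first
    return sql


if __name__ == "__main__":
    print(about_query_sql("news", "text", "politics", order_by_score=True))
```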

See Also:

For more information about the ABOUT operator, see ABOUT in Chapter 4, "Query Operators".  

Other Query Features

In your query application, you can use other query features. The following table lists some of these features and shows where to look in this book for more information.

Feature
    Where to Find More Information

Section Searching
    Chapter 7, "CTX_DDL Package" for defining sections.
    WITHIN Operator in Chapter 4 for queries.

Proximity Searching
    NEAR (;) Operator in Chapter 4.

Stem and Fuzzy Searching
    stem ($) and fuzzy (?) Operators in Chapter 4.

Thesaural Queries
    Chapter 4, "Query Operators" for using thesaurus operators in queries.
    Chapter 10, "CTX_THES Package" for browsing a thesaurus.
    "ctxload" in Chapter 11 for loading thesauri.

Case Sensitive Searching
Base Letter Conversion
Word Decompounding (German and Dutch)
Alternate Spelling (German, Dutch, and Swedish)
    "Lexer Objects" in Chapter 3 for enabling these features.

Optimizing Queries for Response Time
    Appendix A, "Working with the Extensible Query Optimizer".

Query Explain Plan
    CTX_QUERY.EXPLAIN procedure in Chapter 9.

Hierarchical Query Feedback
    CTX_QUERY.HFEEDBACK procedure in Chapter 9.

Document Presentation and Highlighting

Typically, a Text query application allows the user to view the documents returned by a query. The user selects a document from the hitlist and then your application presents the document in some form.

With interMedia Text, you can render a document in different ways. For example, you can present documents with query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English.

Table 1-1 describes the different output you can obtain and which procedure to use to obtain each type:

Table 1-1

Output
    Procedure

Highlighted document, plain text version
    CTX_DOC.MARKUP

Highlighted document, HTML version
    CTX_DOC.MARKUP

Highlight offset information for plain text version
    CTX_DOC.HIGHLIGHT

Highlight offset information for HTML version
    CTX_DOC.HIGHLIGHT

Plain text version, no highlights
    CTX_DOC.FILTER

HTML version of document, no highlights
    CTX_DOC.FILTER
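CTX_DOC.HIGHLIGHT returns offset information rather than a marked-up document, so an application that wants its own markup has to apply the offsets itself. The Python sketch below assumes each hit arrives as an (offset, length) pair with 1-based character offsets, already fetched from the highlight results by some means not shown; check Chapter 8 for the actual result layout. The helper name and the default <b> tags are hypothetical choices for illustration, and overlapping spans are rejected rather than merged.

```python
from typing import Iterable, Tuple


def apply_highlights(text: str, spans: Iterable[Tuple[int, int]],
                     start_tag: str = "<b>", end_tag: str = "</b>") -> str:
    """Wrap each (offset, length) span of `text` in start/end tags.

    Offsets are assumed to be 1-based character positions.
    """
    out = []
    pos = 0  # 0-based index of the next character not yet copied
    for offset, length in sorted(spans):
        start = offset - 1  # convert the 1-based offset to a 0-based index
        if start < pos or length <= 0 or start + length > len(text):
            raise ValueError(f"bad or overlapping span: {(offset, length)}")
        out.append(text[pos:start])  # unhighlighted text before the hit
        out.append(start_tag + text[start:start + length] + end_tag)
        pos = start + length
    out.append(text[pos:])  # remainder after the last hit
    return "".join(out)


if __name__ == "__main__":
    print(apply_highlights("Oracle builds text tools", [(1, 6), (15, 4)]))
```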

See Also:

For more information about these procedures, see Chapter 8, "CTX_DOC Package".  




Copyright © 1999 Oracle Corporation.

All Rights Reserved.
