Oracle Text Application Developer's Guide
Release 9.0.1

Part Number A90122-01

Introduction to Oracle Text, 2 of 8


What is Oracle Text?

Oracle Text is a tool that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.

Types of Query Applications

You can build two types of applications with Oracle Text:

Text Query Applications

The purpose of a text query application is to enable users to find text that contains one or more search terms. The text is usually a collection of documents. A good application can index and search common document formats such as HTML, XML, plain text, or Microsoft Word. For example, an application with a browser interface might enable users to query a company website consisting of HTML files, returning those files that match a query.

To build a text query application, you can create either a context or ctxcat index and query the index with CONTAINS or CATSEARCH respectively.
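As a minimal sketch of the two index types, the statements below create one index of each kind and query it with the matching operator. The table names (news, auction), column names, and index names are assumptions made for illustration, and the news table is assumed to have a primary key.

```sql
-- Hypothetical tables, assumed for illustration only.
CREATE TABLE news (
  id    NUMBER PRIMARY KEY,
  title VARCHAR2(200),
  text  CLOB
);

CREATE TABLE auction (
  item_id NUMBER PRIMARY KEY,
  title   VARCHAR2(200)
);

-- A context index is queried with CONTAINS.
CREATE INDEX news_text_idx ON news(text)
  INDEXTYPE IS CTXSYS.CONTEXT;

SELECT title
  FROM news
 WHERE CONTAINS(text, 'oracle') > 0;

-- A ctxcat index is queried with CATSEARCH.
-- The third argument is the structured part of the query; NULL means none.
CREATE INDEX auction_title_idx ON auction(title)
  INDEXTYPE IS CTXSYS.CTXCAT;

SELECT title
  FROM auction
 WHERE CATSEARCH(title, 'camera', NULL) > 0;
```

A context index is not updated automatically when rows change, so rows inserted after index creation may not appear in CONTAINS results until the index is synchronized.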

Document Classification Applications

A document classification application is one that classifies an incoming stream of documents based on their content. Such applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into categories such as politics, crime, and sports.

Oracle Text enables you to build these applications with the CTXRULE index type. This index type indexes the rules (queries) that define each class. When documents arrive, the MATCHES operator can be used to match each document with the rules that select it.
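The news agency example might be sketched as follows. The table news_rules, its columns, the index name, and the rule queries are hypothetical and chosen only to illustrate the pattern of indexing rules and then matching a document against them.

```sql
-- Hypothetical rule table: each row holds one category and the query that defines it.
CREATE TABLE news_rules (
  rule_id  NUMBER PRIMARY KEY,
  category VARCHAR2(30),
  query    VARCHAR2(2000)
);

INSERT INTO news_rules VALUES (1, 'politics', 'election OR government');
INSERT INTO news_rules VALUES (2, 'crime',    'robbery OR arrest');
INSERT INTO news_rules VALUES (3, 'sports',   'football OR tennis');
COMMIT;

-- The ctxrule index is built on the column that stores the rule queries.
CREATE INDEX news_rules_idx ON news_rules(query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: return every category whose rule selects it.
SELECT category
  FROM news_rules
 WHERE MATCHES(query, 'The government called an election for next spring.') > 0;
```

In an application, the literal document text in the last statement would normally be replaced by a bind variable holding the incoming article.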


Note:

Oracle Text supports document classification for only plain text, XML, and HTML documents. 


See Also:

"Indexing Your Documents" in this chapter for more information about these index types. 

Supported Document Formats

For text query applications, Oracle Text supports most document formats for indexing and querying, including plain text, HTML and formatted documents such as Microsoft Word.

For document classification applications, Oracle Text supports classifying plain text, HTML, and XML documents.

Theme Capabilities

With Oracle Text, you can search on document themes if your language is English or French. To do so, you use the ABOUT operator. For example, you can search for all documents that are about the concept politics. Documents returned might be about elections, governments, or foreign policy. The documents need not contain the word politics to score hits.
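A theme query for the politics example could look like the sketch below. It assumes a hypothetical news table with title and text columns, and a context index on the text column that was created with theme indexing enabled.

```sql
-- Return documents about the concept "politics", best matches first.
-- Table and column names are assumptions for illustration.
SELECT SCORE(1), title
  FROM news
 WHERE CONTAINS(text, 'about(politics)', 1) > 0
 ORDER BY SCORE(1) DESC;
```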

Theme information is derived from the supplied knowledge base, which is a hierarchical listing of categories and concepts. As the supplied knowledge base is a general view of the world, you can add to it new industry-specific concepts. With an augmented knowledge base, the system can process document themes more intelligently and so improve the accuracy of your theme searching.

With the supplied PL/SQL packages, you can also obtain document themes programmatically.
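One way to do this is with the CTX_DOC.THEMES procedure, which writes the themes of a single document to a result table. The sketch below assumes a hypothetical context index named news_text_idx, a document whose primary key value is 1, and a result table named news_themes; check the parameter list against Oracle Text Reference for your release.

```sql
-- Result table for themes (name is an assumption; the columns follow the theme table layout).
CREATE TABLE news_themes (
  query_id NUMBER,
  theme    VARCHAR2(2000),
  weight   NUMBER
);

BEGIN
  CTX_DOC.THEMES(
    index_name => 'news_text_idx',  -- hypothetical context index
    textkey    => '1',              -- primary key of the document to analyze
    restab     => 'news_themes',    -- table that receives one row per theme
    query_id   => 1                 -- tag used to identify this call's rows
  );
END;
/

-- List the themes found for the document, strongest first.
SELECT theme, weight
  FROM news_themes
 WHERE query_id = 1
 ORDER BY weight DESC;
```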

See Also:

Oracle Text Reference to learn more about the ABOUT operator. 

Themes in Other Languages

You can enable theme capabilities such as ABOUT queries in other languages besides English and French by loading a language-specific knowledge base.

See Also:

Adding a Language-Specific Knowledge Base in Chapter 7, "Working With a Thesaurus"

Query Language and Operators

To query, you use the SQL SELECT statement. Depending on your index, you can query text with either the CONTAINS operator, which is used with the context index, or the CATSEARCH operator, which is used with the ctxcat index. You use these operators in the WHERE clause of the SELECT statement as follows:

SELECT SCORE(1), title FROM news WHERE CONTAINS(text, 'oracle', 1) > 0;

To classify single documents, you use the MATCHES operator with a ctxrule index.

For text querying with the CONTAINS operator, Oracle Text provides a rich query language with operators that enable you to issue a variety of queries, including simple word queries, ABOUT queries, logical queries, and wildcard and thesaural expansion queries.
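The following sketch shows one query of each kind against a hypothetical news table with a context index on its text column. The search terms are arbitrary, and the last query additionally assumes that a thesaurus has been loaded under the default name.

```sql
-- Simple word query
SELECT title FROM news WHERE CONTAINS(text, 'oracle') > 0;

-- ABOUT query (theme search)
SELECT title FROM news WHERE CONTAINS(text, 'about(politics)') > 0;

-- Logical query: both terms must appear
SELECT title FROM news WHERE CONTAINS(text, 'oracle AND database') > 0;

-- Wildcard query: % matches any sequence of characters
SELECT title FROM news WHERE CONTAINS(text, 'data%') > 0;

-- Thesaural expansion: expands the term to its synonyms
-- (assumes a thesaurus is loaded; this fails if none exists)
SELECT title FROM news WHERE CONTAINS(text, 'SYN(car)') > 0;
```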

The CATSEARCH operator also supports some of the operations available with CONTAINS.
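As a rough illustration, CATSEARCH uses its own compact query syntax rather than the CONTAINS keywords. The queries below assume a hypothetical auction table with a ctxcat index on its title column; consult Oracle Text Reference for the full list of supported operations.

```sql
-- Words separated by a space must all appear (logical AND).
SELECT title FROM auction WHERE CATSEARCH(title, 'digital camera', NULL) > 0;

-- The vertical bar matches either word (logical OR).
SELECT title FROM auction WHERE CATSEARCH(title, 'camera | camcorder', NULL) > 0;

-- The minus sign excludes rows that contain the second word (logical NOT).
SELECT title FROM auction WHERE CATSEARCH(title, 'camera - film', NULL) > 0;
```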

See Also:

Chapter 3, "Querying" 

Document Services and Using a Thesaurus

You can also use the supplied Oracle Text PL/SQL packages for advanced features such as document presentation and thesaurus maintenance.
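For document presentation, a minimal sketch using the CTX_DOC.MARKUP procedure is shown below. It highlights the query terms in one document and stores the marked-up text in a result table. The index name news_text_idx, the document key 1, and the table name news_markup are assumptions for illustration.

```sql
-- Result table for marked-up documents (name is an assumption).
CREATE TABLE news_markup (
  query_id NUMBER,
  document CLOB
);

BEGIN
  CTX_DOC.MARKUP(
    index_name => 'news_text_idx',  -- hypothetical context index
    textkey    => '1',              -- primary key of the document to mark up
    text_query => 'oracle',         -- terms to highlight
    restab     => 'news_markup',    -- table that receives the marked-up text
    query_id   => 1,                -- tag used to identify this call's row
    tagset     => 'HTML_DEFAULT'    -- predefined tagset that wraps hits in HTML tags
  );
END;
/

SELECT document FROM news_markup WHERE query_id = 1;
```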

See Also:

Chapter 7, "Working With a Thesaurus"

Chapter 4, "Document Presentation" 

Prerequisites For Building Your Query Application

To build an Oracle Text query application, you must have the following:

The following sections describe these prerequisites and also describe the main features of a generic text query application.


Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.