Oracle Text Application Developer's Guide, Release 9.0.1, Part Number A90122-01
Introduction to Oracle Text
Oracle Text is a tool that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text.
You can build two types of applications with Oracle Text: text query applications and document classification applications.
The purpose of a text query application is to enable users to find text that contains one or more search terms. The text is usually a collection of documents. A good application can index and search common document formats such as HTML, XML, plain text, or Microsoft Word. For example, an application with a browser interface might enable users to query a company website consisting of HTML files, returning those files that match a query.
To build a text query application, you can create either a context or a ctxcat index and query it with the CONTAINS or CATSEARCH operator, respectively.
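As a minimal sketch, the statements below create both index types on a hypothetical news table. The table, its columns (id, title, text), the sample row, and the index names news_text_idx and news_title_idx are assumptions for illustration; CTXSYS.CONTEXT and CTXSYS.CTXCAT are the index types Oracle Text supplies.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE news (
  id    NUMBER PRIMARY KEY,
  title VARCHAR2(200),
  text  CLOB
);

INSERT INTO news VALUES (1, 'Oracle releases new database',
                         'Oracle announced a new release of its database today.');
COMMIT;

-- context index on the document body: queried with CONTAINS.
CREATE INDEX news_text_idx ON news(text) INDEXTYPE IS CTXSYS.CONTEXT;

-- ctxcat index on the short title column: queried with CATSEARCH.
CREATE INDEX news_title_idx ON news(title) INDEXTYPE IS CTXSYS.CTXCAT;
```

One design difference worth noting: a ctxcat index is maintained transactionally as rows change, while a context index must be brought up to date after DML, for example with CTX_DDL.SYNC_INDEX('news_text_idx').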
A document classification application classifies an incoming stream of documents based on their content. Such applications are also known as document routing or filtering applications. For example, an online news agency might need to classify incoming articles into categories such as politics, crime, and sports as they arrive.
Oracle Text enables you to build these applications with the CTXRULE index type. This index type indexes the rules (queries) that define each class. When documents arrive, the MATCHES operator can be used to match each document with the rules that select it.
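The following sketch shows the classification flow under stated assumptions: the rule table news_categories, its columns, the single-word rules, and the index name are hypothetical. Each row holds one category and the query that defines it; the CTXRULE index is built on the query column, and MATCHES compares an incoming document against all indexed rules.

```sql
-- Hypothetical rule table: one query (rule) per category.
CREATE TABLE news_categories (
  category VARCHAR2(30),
  query    VARCHAR2(200)
);

INSERT INTO news_categories VALUES ('politics', 'election');
INSERT INTO news_categories VALUES ('sports',   'football');
COMMIT;

-- Index the rules themselves with the CTXRULE index type.
CREATE INDEX news_categories_idx ON news_categories(query)
  INDEXTYPE IS CTXSYS.CTXRULE;

-- Classify one incoming document: list every category whose rule matches it.
SELECT category
  FROM news_categories
 WHERE MATCHES(query, 'The government called an early election.') > 0;
```

Note the inversion compared with a text query application: here the queries are stored and indexed, and the document is the input.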
For text query applications, Oracle Text supports most document formats for indexing and querying, including plain text, HTML and formatted documents such as Microsoft Word.
For document classification applications, Oracle Text supports classifying plain text, HTML, and XML documents.
With Oracle Text, you can search on document themes if your language is English or French. To do so, you use the ABOUT operator. For example, you can search for all documents that are about the concept politics. Documents returned might be about elections, governments, or foreign policy; they need not contain the word politics to score hits.
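A theme query is written as an ABOUT expression inside CONTAINS. This sketch assumes a hypothetical news table whose text column has a context index that includes theme information; the table and column names are illustrative.

```sql
-- Theme query: matches documents about politics even if the word never appears.
-- SCORE(1) reads the relevance score produced by the CONTAINS call labeled 1.
SELECT SCORE(1), title
  FROM news
 WHERE CONTAINS(text, 'ABOUT(politics)', 1) > 0
 ORDER BY SCORE(1) DESC;
```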
Theme information is derived from the supplied knowledge base, which is a hierarchical listing of categories and concepts. Because the supplied knowledge base offers a general view of the world, you can extend it with industry-specific concepts. With an augmented knowledge base, the system can process document themes more intelligently and thus improve the accuracy of your theme searches.
With the supplied PL/SQL packages, you can also obtain document themes programmatically.
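One way to do this is the CTX_DOC.THEMES procedure, which writes the themes of one document into a result table. This is a sketch under assumptions: the context index news_text_idx, the document key '1' (by default the row's primary key value), and the result table name news_themes are hypothetical, and the result table layout shown is the single-level theme layout.

```sql
-- Result table that CTX_DOC.THEMES fills: one row per theme with its weight.
CREATE TABLE news_themes (
  query_id NUMBER,
  theme    VARCHAR2(2000),
  weight   NUMBER
);

BEGIN
  -- Positional arguments: index name, document key, result table name.
  CTX_DOC.THEMES('news_text_idx', '1', 'news_themes');
END;
/

-- Inspect the strongest themes first.
SELECT theme, weight FROM news_themes ORDER BY weight DESC;
```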
You can enable theme capabilities such as ABOUT queries in languages other than English and French by loading a language-specific knowledge base.
To query, you use the SQL SELECT statement. Depending on your index, you can query text with either the CONTAINS operator, which is used with the context index, or the CATSEARCH operator, which is used with the ctxcat index. You use these operators in the WHERE clause of the SELECT statement as follows:
SELECT SCORE(1), title FROM news WHERE CONTAINS(text, 'oracle', 1) > 0;
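For comparison, here is a sketch of the corresponding query against a ctxcat index, assuming a hypothetical ctxcat index on the title column of the same news table. CATSEARCH takes the indexed column, the text query, and a structured query (NULL here, meaning no structured condition).

```sql
-- ctxcat index: CATSEARCH(column, text_query, structured_query).
-- Unlike CONTAINS, CATSEARCH does not take a score label.
SELECT title
  FROM news
 WHERE CATSEARCH(title, 'oracle', NULL) > 0;
```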
To classify single documents, you use the MATCHES operator with a ctxrule index.
For text querying with the CONTAINS operator, Oracle Text provides a rich query language with operators that enable you to issue a variety of queries, including simple word queries, ABOUT queries, logical queries, and wildcard and thesaural expansion queries.
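A few of these query types, sketched against a hypothetical news table with a context index on its text column. The SYN example additionally assumes that a thesaurus named DEFAULT has been loaded.

```sql
-- Logical query: both words must appear in the document.
SELECT title FROM news WHERE CONTAINS(text, 'oracle AND database') > 0;

-- Wildcard query: % matches any sequence of characters (for example database or databases).
SELECT title FROM news WHERE CONTAINS(text, 'datab%') > 0;

-- Thesaural expansion: SYN adds synonyms of politics from the DEFAULT thesaurus.
SELECT title FROM news WHERE CONTAINS(text, 'SYN(politics)') > 0;
```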
The CATSEARCH operator also supports some of the operations available with CONTAINS.
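CATSEARCH uses its own, more compact syntax for these operations. The sketch below assumes a hypothetical ctxcat index on the title column of a news table; here | expresses OR and - excludes the term that follows it.

```sql
-- Titles that contain oracle or database.
SELECT title FROM news WHERE CATSEARCH(title, 'oracle | database', NULL) > 0;

-- Titles that contain oracle but not java.
SELECT title FROM news WHERE CATSEARCH(title, 'oracle - java', NULL) > 0;
```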
You can also use the supplied Oracle Text PL/SQL packages for advanced features such as document presentation and thesaurus maintenance.
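As a rough sketch of both uses: CTX_DOC.MARKUP writes a copy of one document with the query terms highlighted into a result table, and CTX_THES.CREATE_THESAURUS creates an empty thesaurus that can then be populated with the CTX_THES package. The index name news_text_idx, the document key '1', and the names news_markup and news_thes are hypothetical.

```sql
-- Result table for CTX_DOC.MARKUP: the marked-up document is stored as a CLOB.
CREATE TABLE news_markup (
  query_id NUMBER,
  document CLOB
);

BEGIN
  -- Document presentation. Positional arguments: index name, document key,
  -- text query whose terms are highlighted, result table name.
  CTX_DOC.MARKUP('news_text_idx', '1', 'oracle', 'news_markup');

  -- Thesaurus maintenance: create a new, empty thesaurus.
  CTX_THES.CREATE_THESAURUS('news_thes');
END;
/
```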
To build an Oracle Text query application, you must have the following:
The following sections describe these prerequisites and also describe the main features of a generic text query application.
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.