Oracle Text Application Developer's Guide
Release 9.0.1

Part Number A90122-01
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback

Go to previous page Go to beginning of chapter Go to next page

Querying, 2 of 6


Overview of Queries

The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. Scores can be used to order the documents in the result set.

To issue an Oracle Text query, use the SQL SELECT statement with either the CONTAINS or CATSEARCH operator. You can use these operators programatically wherever you can use the SELECT statement, such as in PL/SQL cursors.

Use the MATCHES operator to classify documents with a CTXRULE index.

Querying with CONTAINS

When you create an index of type context, you must use the CONTAINS operator to issue your query. An index of type context is suited for indexing collections of large coherent documents.

With the CONTAINS operator, you can use a number of operators to define your search criteria. These operators enable you to issue logical, proximity, fuzzy, stemming, thesaurus and wildcard searches. With a correctly configured index, you can also issue section searches on documents that have internal structure such as HTML and XML.

With CONTAINS, you can also use the ABOUT operator to search on document themes.

CONTAINS SQL Example

In the SELECT statement, specify the query in the WHERE clause with the CONTAINS operator. Also specify the SCORE operator to return the score of each hit in the hitlist. The following example shows how to issue a query:

SELECT SCORE(1) title from news WHERE CONTAINS(text, 'oracle', 1) > 0;

You can order the results from the highest scoring documents to the lowest scoring documents using the ORDER BY clause as follows:

SELECT SCORE(1), title from news 
           WHERE CONTAINS(text, 'oracle', 1) > 0
           ORDER BY SCORE(1) DESC;

CONTAINS PL/SQL Example

In a PL/SQL application, you can use a cursor to fetch the results of the query.

The following example issues a CONTAINS query against the NEWS table to find all articles that contain the word oracle. The titles and scores of the first ten hits are output.

declare 
  rowno number := 0; 
begin 
  for c1 in (SELECT SCORE(1) score, title FROM news 
              WHERE CONTAINS(text, 'oracle', 1) > 0
              ORDER BY SCORE(1) DESC) 
  loop 
    rowno := rowno + 1; 
    dbms_output.put_line(c1.title||': '||c1.score); 
    exit when rowno = 10; 
  end loop; 
end; 

This example uses a cursor FOR loop to retrieve the first ten hits. An alias score is declared for the return value of the SCORE operator. The score and title are output to standard out using cursor dot notation.

Structured Query with CONTAINS

A structured query, also called a mixed query, is a query that has a CONTAINS predicate to query a text column and has another predicate to query a structured data column.

To issue a structured query, you specify the structured clause in the WHERE condition of the SELECT statement.

For example, the following SELECT statement returns all articles that contain the word oracle that were written on or after October 1, 1997:

SELECT SCORE(1), title, issue_date from news 
           WHERE CONTAINS(text, 'oracle', 1) > 0
           AND issue_date >= ('01-OCT-97') 
           ORDER BY SCORE(1) DESC;


Note:

Even though you can issue structured queries with CONTAINS, consider creating a ctxcat index and issuing the query with CATSEARCH, which offers better structured query performance. 


Querying with CATSEARCH

When you create an index of type ctxcat, you must use the CATSEARCH operator to issue your query. An index of type ctxcat is best suited when your application stores short text fragments in the text column and other associated information in related columns.

For example, an application serving an online auction site might have a table that stores item description in a text column and associated information such as date and price in other columns. With a ctxcat index, you can create b-tree indexes on one or more of these columns. The result is that when you use the CATSEARCH operator to search a ctxcat index, query performance is generally faster for mixed queries.

The operators available for CATSEARCH queries are limited to logical operations such as AND or OR. The operators you can use to define your structured criteria are greater than, less than, equality, BETWEEN, and IN.

CATSEARCH SQL Query

A typical query with CATSEARCH might include a structured clause as follows to find all rows that contain the word camera ordered by the bid_close date:

SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'order by bid_close desc')> 
0;

The type of structured query you can issue depends on how you create your sub-indexes.

See Also:

"Creating a CTXCAT Index" in Chapter 2, "Indexing"

CATSEARCH Structured Query

You specify the structured part of a CATSEARCH query with the structured_query parameter. The columns you name in the structured expression must have a corresponding sub-index.

For example, assuming that category_id and bid_close have a sub-index in the ctxcat index for the AUCTION table, you can issue the following structured query:

SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'category_id=99 order by 
bid_close desc')> 0;

CATSEARCH PL/SQL Example

You can use a cursor to process the output of a CATSEARCH query as you do for CONTAINS.

Querying with MATCHES

When you create an index of type CTXRULE, you must use the MATCHES operator to classify your documents. The CTXRULE index is essentially an index on the set of queries that define your classifications.

For example, if you have an incoming stream of documents that need to be routed according to content, you can create a set of queries that define your categories. You create the queries as rows in a text column. You then index the table to create a CTXRULE index. When documents arrive, you use the MATCHES operator to classify each document.

MATCHES SQL Query

A matches query finds all rows in a query table that match a given document. Assuming that a table querytable has a CTXRULE index associated with it, you can issue the following query:

SELECT classification FROM querytable WHERE MATCHES(text, 'Smith is a common 
name in the United States') > 0;

MATCHES PL/SQL Example

The following example assumes that the table of queries tdrbrn0101 has a CTXRULE index associated with it. It also assumes that the table newsfeed contains a set of news articles to be categorized.

This example loops through the newsfeed table, categorizing each article using the MATCHES operator. The results are stored in the tdrbrn0102 table.

PROMPT  Populate the category table based on newsfeed articles
PROMPT
set serveroutput on;
declare
  mypk   number;
  mytitle varchar2(1000);
  myarticles clob;
  mycategory varchar2(100);
  cursor doccur is select pk,title,articles from newsfeed;
  cursor mycur is  select category from tdrbrn0101 where matches(rule, 
myarticles)>0;  
  cursor rescur is select category, pk, title from tdrbrn0102 order by 
category,pk;

begin
  dbms_output.enable(1000000);
  open doccur;
  loop
    fetch doccur into mypk, mytitle, myarticles;
    exit when doccur%notfound;
    open mycur;
    loop
      fetch mycur into mycategory;
      exit when mycur%notfound;
      insert into tdrbrn0102 values(mycategory, mypk, mytitle);
    end loop;
    close mycur;
    commit;
  end loop;
  close doccur;
  commit;

end;
/

The following example displays the categorized articles by category.

PROMPT  display the list of articles for every category
PROMPT

declare
  mypk   number;
  mytitle varchar2(1000);
  mycategory varchar2(100);
  cursor catcur is select category from tdrbrn0101 order by category;
  cursor rescur is select pk, title from tdrbrn0102 where category=mycategory 
order by pk;

begin
  dbms_output.enable(1000000);
  open catcur;
  loop
    fetch catcur into mycategory;
    exit when catcur%notfound;
    dbms_output.put_line('********** CATEGORY: '||mycategory||' *************');
open rescur;
    loop
      fetch rescur into mypk, mytitle;
      exit when rescur%notfound;
dbms_output.put_line('**  ('||mypk||'). '||mytitle);
    end loop;
    close rescur;
    dbms_output.put_line('**');
    dbms_output.put_
line('*******************************************************');
  end loop;
  close catcur; 
end;
/

Word and Phrase Queries

A word query is a query on a word or phrase. For example, to find all the rows in your text table that contain the word dog, you issue a query specifying dog as your query term.

You can issue word queries with both CONTAINS and CATSEARCH SQL operators.

If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase and Oracle searches for the entire string during a query.

For example, to find all documents that contain the phrase international law, you issue your query with the phrase international law.

Querying Stopwords

Stopwords are words for which Oracle does not create an index entry. They are usually common words in your language that are unlikely to be searched on by themselves.

Oracle Text includes a default list of stopwords for your language. This list is called a stoplist. For example, in English, the words this and that are defined as stopwords in the default stoplist. You can modify the default stoplist or create new stoplists with the CTX_DDL package. You can also add stopwords after indexing with the ALTER INDEX statement.

You cannot query on a stopword by itself or on a phrase composed of only stopwords. For example, a query on the word this returns no hits when this is defined as a stopword.

You can query on phrases that contain stopwords as well as non-stopwords such as this boy talks to that girl. This is possible because the Oracle Text index records the position of stopwords even though it does not create an index entry for them.

When you include a stopword within your query phrase, the stopword matches any word. For example, the query:

'Jack was big'

matches phrases such as Jack is big and Jack grew big assuming was is a stopword.

ABOUT Queries and Themes

An ABOUT query is a query on a document theme. A document theme is a concept that is sufficiently developed in the text. For example, an ABOUT query on US politics might return documents containing information about US presidential elections and US foreign policy. Documents need not contain the exact phrase US politics to be returned.

During indexing, document themes are derived from the knowledge base, which is a hierarchical list of categories and concepts that represents a view of the world. Some examples of themes in the knowledge catalog are concrete concepts such as jazz music, football, or Nelson Mandela. Themes can also be abstract concepts such as happiness or honesty.

During indexing, the system can also identify and index document themes that are sufficiently developed in the document, but do not exist in the knowledge base.

You can augment the knowledge base to define concepts and terms specific to your industry or query application. When you do so, ABOUT queries are more precise for the added concepts.

ABOUT queries perform best when you create a theme component in your index. Theme components are created by default for English and French.


Note:

ABOUT queries are supported with only the CONTAINS operator. 


Querying Stopthemes

Oracle enables you to query on themes with the ABOUT operator. A stoptheme is a theme that is not to be indexed. You can add and remove stopthemes with the CTX_DLL package. You can add stopthemes after indexing with the ALTER INDEX statement.

Query Expressions

A query expression is everything in between the single quotes in the text_query argument of the CONTAINS or CATSEARCH operator. What you can include in a query expression in a CONTAINS query is different from what you can include in a CATSEARCH operator.

CONTAINS Operator

A CONTAINS query expression can contain query operators that enable logical, proximity, thesaural, fuzzy, and wildcard searching. Querying with stored expressions is also possible. Within the query expression, you can use grouping characters to alter operator precedence.

With CONTAINS, you can also use the ABOUT query to query document themes.

See Also:

"Query Operators for CONTAINS" in this chapter. 

CATSEARCH Operator

With the CATSEARCH operator, you specify your query expression with the text_query operator and your optional structured criteria with the structured_query argument. The text_query argument is limited to querying words and phrases. You can use logical operations, such as logical and, or, and not.

With structured_query argument, you specify your structured criteria. You can use the following SQL operations:

You can also use ORDER BY clause to order your output.

See Also:

"Query Operators for CATSEARCH" in this chapter. 

MATCHES Operator

The MATCHES operator takes a document as input and finds all rows in a query table that match it. You do not specify query expressions in the MATCHES operator.

Case-Sensitive Searching

Oracle Text supports case-sensitivity for word and ABOUT queries.

Word Queries

Word queries are case-insensitive by default. This means that a query on the term dog returns the rows in your text table that contain the word dog, Dog, or DOG.

You can enable case-sensitive searching by enabling the mixed_case attribute in your BASIC_LEXER index preference. With a case-sensitive index, your queries must be issued in exact case. This means that a query on Dog matches only documents with Dog. Documents with dog or DOG are not returned as hits.

Stopwords and Case-Sensitivity

If you have case-sensitivity enabled for word queries and you issue a query on a phrase containing stopwords and non-stopwords, you must specify the correct case for the stopwords. For example, a query on this boy talks to that girl does not return text that contains the phrase This boy talks to that girl, assuming this is a stopword.

ABOUT Queries

ABOUT queries give the best results when your query is formulated with proper case. This is because the normalization of your query is based on the knowledge catalog which is case-sensitive. Attention to case is required especially for words that have different meanings depending on case, such as turkey the bird and Turkey the country.

However, you need not enter your query in exact case to obtain relevant results from an ABOUT query. The system does its best to interpret your query. For example, if you enter a query of ORACLE and the system does not find this concept in the knowledge catalog, the system might use Oracle as a related concept for look-up.

Query Feedback

Feedback information provides broader term, narrower term, and related term information for a specified query with a context index. You obtain this information programatically with the CTX_QUERY.HFEEDBACK procedure.

Broader term, narrower term, and related term information is useful for suggesting other query terms to the user in your query application.

The feedback information returned is obtained from the knowledge base and contains only those terms that are also in the index. This increases the chances that terms returned from HFEEDBACK produce hits over the currently indexed document set.

See Also:

Oracle Text Reference for more information about using CTX_QUERY.HFEEDBACK 

Query Explain Plan

Explain plan information provides a graphical representation of the parse tree for a CONTAINS query expression. You can obtain this information programatically with the CTX_QUERY.EXPLAIN procedure.

Explain plan information tells you how a query is expanded and parsed without having the system execute the query. Obtaining explain information is useful for knowing the expansion for a particular stem, wildcard, thesaurus, fuzzy, soundex, or ABOUT query. Parse trees also show the following information:


Go to previous page Go to beginning of chapter Go to next page
Oracle
Copyright © 1996-2001, Oracle Corporation.

All Rights Reserved.
Go To Documentation Library
Home
Go To Product List
Book List
Go To Table Of Contents
Contents
Go To Index
Index

Master Index

Feedback