4 Querying with Oracle Text

This chapter describes Oracle Text querying and associated features.

This chapter contains the following topics:

4.1 Overview of Queries

The basic Oracle Text query takes a query expression, usually a word with or without operators, as input. Oracle Text returns all documents (previously indexed) that satisfy the expression along with a relevance score for each document. Scores can be used to order the documents in the result set.

To enter an Oracle Text query, use the SQL SELECT statement. Depending on the type of index you create, you use either the CONTAINS or CATSEARCH operator in the WHERE clause. You can use these operators programatically wherever you can use the SELECT statement, such as in PL/SQL cursors.

Use the MATCHES operator to classify documents with a CTXRULE index.

4.1.1 Querying with CONTAINS

When you create an index of type CONTEXT, you must use the CONTAINS operator to enter your query. An index of type CONTEXT is suited for indexing collections of large coherent documents.

With the CONTAINS operator, you can use a number of operators to define your search criteria. These operators enable you to enter logical, proximity, fuzzy, stemming, thesaurus and wildcard searches. With a correctly configured index, you can also enter section searches on documents that have internal structure such as HTML and XML.

With CONTAINS, you can also use the ABOUT operator to search on document themes.

4.1.1.1 CONTAINS SQL Example

In the SELECT statement, specify the query in the WHERE clause with the CONTAINS operator. Also specify the SCORE operator to return the score of each hit in the hitlist. The following example shows how to enter a query:

SELECT SCORE(1), title from news WHERE CONTAINS(text, 'oracle', 1) > 0;

You can order the results from the highest scoring documents to the lowest scoring documents using the ORDER BY clause as follows:

SELECT SCORE(1), title from news 
           WHERE CONTAINS(text, 'oracle', 1) > 0
           ORDER BY SCORE(1) DESC;

The CONTAINS operator must always be followed by the > 0 syntax, which specifies that the score value returned by the CONTAINS operator must be greater than zero for the row to be returned.

When the SCORE operator is called in the SELECT statement, the CONTAINS operator must reference the score label value in the third parameter as in the previous example.

4.1.1.2 CONTAINS PL/SQL Example

In a PL/SQL application, you can use a cursor to fetch the results of the query.

The following example enters a CONTAINS query against the NEWS table to find all articles that contain the word oracle. The titles and scores of the first ten hits are output.

declare 
  rowno number := 0; 
begin 
  for c1 in (SELECT SCORE(1) score, title FROM news 
              WHERE CONTAINS(text, 'oracle', 1) > 0
              ORDER BY SCORE(1) DESC) 
  loop 
    rowno := rowno + 1; 
    dbms_output.put_line(c1.title||': '||c1.score); 
    exit when rowno = 10; 
  end loop; 
end; 

This example uses a cursor FOR loop to retrieve the first ten hits. An alias score is declared for the return value of the SCORE operator. The score and title are output to standard out using cursor dot notation.

4.1.1.3 Structured Query with CONTAINS

A structured query, also called a mixed query, is a query that has a CONTAINS predicate to query a text column and has another predicate to query a structured data column.

To enter a structured query, you specify the structured clause in the WHERE condition of the SELECT statement.

For example, the following SELECT statement returns all articles that contain the word oracle that were written on or after October 1, 1997:

SELECT SCORE(1), title, issue_date from news 
           WHERE CONTAINS(text, 'oracle', 1) > 0
           AND issue_date >= ('01-OCT-97') 
           ORDER BY SCORE(1) DESC;

Note:

Even though you can enter structured queries with CONTAINS, consider creating a CTXCAT index and issuing the query with CATSEARCH, which offers better structured query performance.

4.1.2 Querying with CATSEARCH

When you create an index of type CTXCAT, you must use the CATSEARCH operator to enter your query. An index of type CTXCAT is best suited when your application stores short text fragments in the text column and other associated information in related columns.

For example, an application serving an online auction site might have a table that stores item description in a text column and associated information such as date and price in other columns. With a CTXCAT index, you can create b-tree indexes on one or more of these columns. The result is that when you use the CATSEARCH operator to search a CTXCAT index, query performance is generally faster for mixed queries.

The operators available for CATSEARCH queries are limited to logical operations such as AND or OR. The operators you can use to define your structured criteria are greater than, less than, equality, BETWEEN, and IN.

4.1.2.1 CATSEARCH SQL Query

A typical query with CATSEARCH might include a structured clause as follows to find all rows that contain the word camera ordered by the bid_close date:

SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'order by bid_close desc')> 0;

The type of structured query you can enter depends on how you create your sub-indexes.

As shown in the previous example, you specify the structured part of a CATSEARCH query with the third structured_query parameter. The columns you name in the structured expression must have a corresponding sub-index.

For example, assuming that category_id and bid_close have a sub-index in the ctxcat index for the AUCTION table, enter the following structured query:

SELECT FROM auction WHERE CATSEARCH(title, 'camera', 'category_id=99 order by bid_close desc')> 0;
4.1.2.2 CATSEARCH Example

The following example shows a field section search against a CTXCAT index using CONTEXT grammar by means of a query template in a CATSEARCH query.

-- Create and populate table
create table BOOKS (ID number, INFO varchar2(200), PUBDATE DATE);
 
insert into BOOKS values(1, '<author>NOAM CHOMSKY</author><subject>CIVIL
   RIGHTS</subject><language>ENGLISH</language><publisher>MIT
   PRESS</publisher>', '01-NOV-2003');
 
insert into BOOKS values(2, '<author>NICANOR PARRA</author><subject>POEMS 
  AND ANTIPOEMS</subject><language>SPANISH</language>
  <publisher>VASQUEZ</publisher>', '01-JAN-2001');
 
insert into BOOKS values(1, '<author>LUC SANTE</author><subject>XML
  DATABASE</subject><language>FRENCH</language><publisher>FREE
  PRESS</publisher>', '15-MAY-2002');
 
commit;
 
-- Create index set and section group
exec ctx_ddl.create_index_set('BOOK_INDEX_SET');
exec ctx_ddl.add_index('BOOK_INDEX_SET','PUBDATE');
 
exec ctx_ddl.create_section_group('BOOK_SECTION_GROUP',
      'BASIC_SECTION_GROUP');
exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','AUTHOR','AUTHOR');
exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','SUBJECT','SUBJECT');
exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','LANGUAGE','LANGUAGE');
exec ctx_ddl.add_field_section('BOOK_SECTION_GROUP','PUBLISHER','PUBLISHER'); 
 
-- Create index
create index books_index on books(info) indextype is ctxsys.ctxcat
  parameters('index set book_index_set section group book_section_group');
 
-- Use the index
-- Note that: even though CTXCAT index can be created with field sections, it
-- cannot be accessed using CTXCAT grammar (default for CATSEARCH).
-- We need to use query template with CONTEXT grammar to access field 
-- sections with CATSEARCH
 
select  id, info from books
where catsearch(info,
'<query>
      <textquery grammar="context">
              NOAM within author and english within language
      </textquery>
 </query>',
'order by pubdate')>0; 

4.1.3 Querying with MATCHES

When you create an index of type CTXRULE, you must use the MATCHES operator to classify your documents. The CTXRULE index is essentially an index on the set of queries that define your classifications.

For example, if you have an incoming stream of documents that need to be routed according to content, you can create a set of queries that define your categories. You create the queries as rows in a text column. It is possible to create this type of table with the CTX_CLS.TRAIN procedure.

You then index the table to create a CTXRULE index. When documents arrive, you use the MATCHES operator to classify each document

4.1.3.1 MATCHES SQL Query

A MATCHES query finds all rows in a query table that match a given document. Assuming that a table querytable has a CTXRULE index associated with it, enter the following query:

SELECT classification FROM querytable WHERE MATCHES(query_string,:doc_text) > 0;

Note the bind variable :doc_text which contains the document CLOB to be classified.

Combining everything into a simple example:

   create table queries (
      query_id      number,
      query_string  varchar2(80)
    );

    insert into queries values (1, 'oracle');
    insert into queries values (2, 'larry or ellison');
    insert into queries values (3, 'oracle and text');
    insert into queries values (4, 'market share');

    create index queryx on queries(query_string)
      indextype is ctxsys.ctxrule;

    select query_id from queries
     where matches(query_string, 
                   'Oracle announced that its market share in databases 
                    increased over the last year.')>0

This query will return queries 1 (the word oracle appears in the document) and 4 (the phrase market share appears in the document), but not 2 (neither the word larry nor the word ellison appears, and not 3 (there is no text in the document, so it does not match the query).

Note that, in this example, the document was passed in as a string for simplicity. Typically, your document would be passed in a bind variable.

The document text used in a matches query can be VARCHAR2 or CLOB. It does not accept BLOB input, so you cannot match filtered documents directly. Instead, you must filter the binary content to CLOB using the AUTO_FILTER filter. For the following example, we make two assumptions: one, that the document data is in the bind variable :doc_blob; and, two, that we have already defined a policy, my_policy, that CTX_DOC.POLICY_FILTER can use. For example:

  declare
    doc_text clob;
  begin
    -- create a temporary CLOB to hold the document text
    doc_text := dbms_lob.createtemporary(doc_text, TRUE, DBMS_LOB.SESSION);
 
    -- create a simple policy for this example
    ctx_ddl.create_preference(preference_name => 'fast_filter',
                        object_name       => 'AUTO_FILTER');
    ctx_ddl.set_attribute(preference_name => 'fast_filter',
                        attribute_name    => 'OUTPUT_FORMATTING',
                        attribute_value   => 'FALSE');
    ctx_ddl.create_policy(policy_name     => 'my_policy',
                        filter            => 'fast_filter);

    -- call ctx_doc.policy_filter to filter the BLOB to CLOB data
    ctx_doc.policy_filter('my_policy', :doc_blob, doc_text, FALSE);

    -- now do the matches query using the CLOB version
    for c1 in (select * from queries where matches(query_string, doc_text)>0)
    loop
      -- do what you need to do here
    end loop;

    dbms_lob.freetemporary(doc_text);
  end;

The procedure CTX_DOC.POLICY_FILTER filters the BLOB into the CLOB data, because you need to get the text into a CLOB to enter a MATCHES query. It takes as one argument the name of a policy you have already created with CTX_DDL.CREATE_POLICY.

See Also:

Oracle Text Reference for information on CTX_DOC.POLICY_FILTER

If your file is text in the database character set, then you can create a BFILE and load it to a CLOB using the function DBMS_LOB.LOADFROMFILE, or you can use UTL_FILE to read the file into a temp CLOB locator.

If your file needs AUTO_FILTER filtering, then you can load the file into a BLOB instead, and call CTX_DOC.POLICY_FILTER, as previously shown.

See Also:

Classifying Documents in Oracle Text for more extended classification examples

4.1.3.2 MATCHES PL/SQL Example

The following example assumes that the table of queries profiles has a CTXRULE index associated with it. It also assumes that the table newsfeed contains a set of news articles to be categorized.

This example loops through the newsfeed table, categorizing each article using the MATCHES operator. The results are stored in the results table.

PROMPT  Populate the category table based on newsfeed articles
PROMPT
set serveroutput on;
declare
  mypk   number;
  mytitle varchar2(1000);
  myarticles clob;
  mycategory varchar2(100);
  cursor doccur is select pk,title,articles from newsfeed;
  cursor mycur is  select category from profiles where matches(rule, myarticles)>0;  
  cursor rescur is select category, pk, title from results order by category,pk;

begin
  dbms_output.enable(1000000);
  open doccur;
  loop
    fetch doccur into mypk, mytitle, myarticles;
    exit when doccur%notfound;
    open mycur;
    loop
      fetch mycur into mycategory;
      exit when mycur%notfound;
      insert into results values(mycategory, mypk, mytitle);
    end loop;
    close mycur;
    commit;
  end loop;
  close doccur;
  commit;

end;

The following example displays the categorized articles by category.

PROMPT  display the list of articles for every category
PROMPT
set serveroutput on;

declare
  mypk   number;
  mytitle varchar2(1000);
  mycategory varchar2(100);
  cursor catcur is select category from profiles order by category;
  cursor rescur is select pk, title from results where category=mycategory order by pk;

begin
  dbms_output.enable(1000000);
  open catcur;
  loop
    fetch catcur into mycategory;
    exit when catcur%notfound;
    dbms_output.put_line('********** CATEGORY: '||mycategory||' *************');
open rescur;
    loop
      fetch rescur into mypk, mytitle;
      exit when rescur%notfound;
dbms_output.put_line('**  ('||mypk||'). '||mytitle);
    end loop;
    close rescur;
    dbms_output.put_line('**');
dbms_output.put_line('*******************************************************');
  end loop;
  close catcur; 
end;

See Also:

Classifying Documents in Oracle Text for more extended classification examples

4.1.4 Word and Phrase Queries

A word query is a query on a word or phrase. For example, to find all the rows in your text table that contain the word dog, enter a query specifying dog as your query term.

You can enter word queries with both CONTAINS and CATSEARCH SQL operators. However, phrase queries are interpreted differently.

4.1.4.1 CONTAINS Phrase Queries

If multiple words are contained in a query expression, separated only by blank spaces (no operators), the string of words is considered a phrase and Oracle Text searches for the entire string during a query.

For example, to find all documents that contain the phrase international law, enter your query with the phrase international law.

4.1.4.2 CATSEARCH Phrase Queries

With the CATSEARCH operator, the AND operator is inserted between words in phrases. For example, a query such as international law is interpreted as international AND law.

4.1.5 Querying Stopwords

Stopwords are words for which Oracle Text does not create an index entry. They are usually common words in your language that are unlikely to be searched on by themselves.

Oracle Text includes a default list of stopwords for your language. This list is called a stoplist. For example, in English, the words this and that are defined as stopwords in the default stoplist. You can modify the default stoplist or create new stoplists with the CTX_DDL package. You can also add stopwords after indexing with the ALTER INDEX statement.

You cannot query on a stopword by itself or on a phrase composed of only stopwords. For example, a query on the word this returns no hits when this is defined as a stopword.

You can query on phrases that contain stopwords as well as non-stopwords such as this boy talks to that girl. This is possible because the Oracle Text index records the position of stopwords even though it does not create an index entry for them.

When you include a stopword within your query phrase, the stopword matches any word. For example, the query:

'Jack was big'

matches phrases such as Jack is big and Jack grew big assuming was is a stopword. Note that this query matches grew, even though it is not a stopword.

Starting with Oracle Database 12c Release 2 (12.2), stopwords and any unary operators on stopwords are ignored at the initial stages of a query resulting in different query results than earlier releases. For example:

SQL> select count(1) from tabx where contains(text,'$the')>0; 
 . 
  COUNT(1) 
 ---------- 
        0

The above query does not return any documents as the is a stopword and $ operator along with the stopword is ignored during query processing.

SQL> select count(1) from tabx where contains(text,'first and $the')>0; 
 . 
  COUNT(1) 
 ---------- 
        2

The above query returns documents containing first as the is a stopword, so it is ignored along with the $ operator. No expansion is performed irrespective of the specified $ operator.

4.1.6 ABOUT Queries and Themes

An ABOUT query is a query on a document theme. A document theme is a concept that is sufficiently developed in the text. For example, an ABOUT query on US politics might return documents containing information about US presidential elections and US foreign policy. Documents need not contain the exact phrase US politics to be returned.

During indexing, document themes are derived from the knowledge base, which is a hierarchical list of categories and concepts that represents a view of the world. Some examples of themes in the knowledge catalog are concrete concepts such as jazz music, football, or Nelson Mandela. Themes can also be abstract concepts such as happiness or honesty.

During indexing, the system can also identify and index document themes that are sufficiently developed in the document, but do not exist in the knowledge base.

You can augment the knowledge base to define concepts and terms specific to your industry or query application. When you do so, ABOUT queries are more precise for the added concepts.

ABOUT queries perform best when you create a theme component in your index. Theme components are created by default for English and French.

Querying Stopthemes

Oracle Text enables you to query on themes with the ABOUT operator. A stoptheme is a theme that is not to be indexed. You can add and remove stopthemes with the CTX_DDL package. You can add stopthemes after indexing with the ALTER INDEX statement.

4.1.7 Query Expressions

A query expression is everything in between the single quotes in the text_query argument of the CONTAINS or CATSEARCH operator. What you can include in a query expression in a CONTAINS query is different from what you can include in a CATSEARCH operator.

4.1.7.1 CONTAINS Operators

A CONTAINS query expression can contain query operators that enable logical, proximity, thesaural, fuzzy, and wildcard searching. Querying with stored expressions is also possible. Within the query expression, you can use grouping characters to alter operator precedence. This book refers to these operators as the CONTEXT grammar.

With CONTAINS, you can also use the ABOUT query to query document themes.

See Also:

"The CONTEXT Grammar"

4.1.7.2 CATSEARCH Operator

With the CATSEARCH operator, you specify your query expression with the text_query argument and your optional structured criteria with the structured_query argument. The text_query argument enables you to query words and phrases. You can use logical operations, such as logical and, or, and not. This book refers to these operators as the CTXCAT grammar.

If you want to use the much richer set of operators supported by the CONTEXT grammar, you can use the query template feature with CATSEARCH.

With structured_query argument, you specify your structured criteria. You can use the following SQL operations:

  • =

  • <=

  • >=

  • >

  • <

  • IN

  • BETWEEN

You can also use ORDER BY clause to order your output.

See Also:

"The CTXCAT Grammar"

4.1.7.3 MATCHES Operator

Unlike CONTAINS and CATSEARCH, MATCHES does not take a query expression as input.

Instead, the MATCHES operator takes a document as input and finds all rows in a query (rule) table that match it. As such, you can use MATCHES to classify documents according to the rules they match.

See Also:

"Querying with MATCHES"

4.1.8 Case-Sensitive Searching

Oracle Text supports case-sensitivity for word and ABOUT queries.

4.1.8.1 Word Queries

Word queries are not case-insensitive by default. This means that a query on the term dog returns the rows in your text table that contain the word dog, but not Dog, or DOG.

You can enable or disable case-sensitive searching with the MIXED_CASE attribute in your BASIC_LEXER index preference. With a case-sensitive index, your queries must be entered in exact case. For example, a query on Dog matches only documents with Dog. Documents with dog or DOG are not returned as hits.

To enable case-insensitive searching, set the MIXED_CASE attribute in your BASIC_LEXER index preference to NO.

Stopwords and Case-Sensitivity

If you have case-sensitivity enabled for word queries and you enter a query on a phrase containing stopwords and non-stopwords, then you must specify the correct case for the stopwords. For example, a query on the dog does not return text that contains The Dog, assuming that the is a stopword.

4.1.8.2 ABOUT Queries

ABOUT queries give the best results when your query is formulated with proper case. This is because the normalization of your query is based on the knowledge catalog which is case-sensitive. Attention to case is required especially for words that have different meanings depending on case, such as turkey the bird and Turkey the country.

However, you need not enter your query in exact case to obtain relevant results from an ABOUT query. The system does its best to interpret your query. For example, if you enter a query of ORACLE and the system does not find this concept in the knowledge catalog, the system might use Oracle as a related concept for look-up.

4.1.9 Query Feedback

Feedback information provides broader term, narrower term, and related term information for a specified query with a context index. You obtain this information programatically with the CTX_QUERY.HFEEDBACK procedure.

Broader term, narrower term, and related term information is useful for suggesting other query terms to the user in your query application.

The feedback information returned is obtained from the knowledge base and contains only those terms that are also in the index. This increases the chances that terms returned from HFEEDBACK produce hits over the currently indexed document set.

See Also:

Oracle Text Reference for more information about using CTX_QUERY.HFEEDBACK

4.1.10 Query Explain Plan

Explain plan information provides a graphical representation of the parse tree for a CONTAINS query expression. You can obtain this information programatically with the CTX_QUERY.EXPLAIN procedure.

Explain plan information tells you how a query is expanded and parsed without having the system execute the query. Obtaining explain information is useful for knowing the expansion for a particular stem, wildcard, thesaurus, fuzzy, soundex, or ABOUT query. Parse trees also show the following information:

  • Order of execution

  • ABOUT query normalization

  • Query expression optimization

  • Stop-word transformations

  • Breakdown of composite-word tokens for supported languages

    See Also:

    Oracle Text Reference for more information about using CTX_QUERY.EXPLAIN

4.1.11 Using a Thesaurus in Queries

Oracle Text enables you to define a thesaurus for your query application.

Defining a custom thesaurus enables you to process queries more intelligently. Because users of your application might not know which words represent a topic, you can define synonyms or narrower terms for likely query terms. You can use the thesaurus operators to expand your query to include thesaurus terms.

4.1.12 About Document Section Searching

Section searching enables you to narrow text queries down to sections within documents.

Section searching can be implemented when your documents have internal structure, such as HTML and XML documents. For example, you can define a section for the <H1> tag that enables you to query within this section using the WITHIN operator.

You can set the system to automatically create sections from XML documents.

You can also define attribute sections to search attribute text in XML documents.

Note:

Section searching is supported for only word queries with a CONTEXT index.

4.1.13 Using Query Templates

Query templates are an alternative to the existing query languages. Rather than passing a query string to CONTAINS or CATSEARCH, you pass a structured document that contains the query string in a tagged element. Within this structured document, or query template, you can enable additional query features:

4.1.14 Query Rewrite

Query applications sometimes parse end user queries, interpreting a query string in one or more ways using different operator combinations. For example, if a user enters a query of kukui nut, your application might enter the queries {kukui nut} and {kukui or nut} to increase recall.

The query rewrite feature enables you to submit a single query that expands the original query into the rewritten versions. The results are returned with no duplication.

You specify your rewrite sequences with the query template feature. The rewritten versions of the query are executed efficiently with a single call to CONTAINS or CATSEARCH.

The following template defines a query rewrite sequence. The query of {kukui nut} is rewritten as follows:

{kukui} {nut}

{kukui} ; {nut}

{kukui} AND {nut}

{kukui} ACCUM {nut}

The query rewrite template for these transformations is as follows:

select id from docs where CONTAINS (text,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT"> kukui nut
     <progression>
       <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", " ; "))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>
       <seq><rewrite>transform((TOKENS, "{", "}", "ACCUM"))</rewrite></seq>
     </progression>
   </textquery>
  <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;

4.1.15 Query Relaxation

Query relaxation enables your application to execute the most restrictive version of a query first, progressively relaxing the query until the required number of hits are obtained.

For example, your application might search first on black pen and then the query is relaxed to black NEAR pen to obtain more hits.

The following query template defines a query relaxation sequence. The query of black pen is entered in sequence as the following:

{black} {pen}

{black} NEAR {pen}

{black} AND {pen}

{black} ACCUM {pen}

The query relaxation template for these transformations is as follows:

select id from docs where CONTAINS (text,
 '<query>
   <textquery lang="ENGLISH" grammar="CONTEXT">
     <progression>
       <seq>{black} {pen}</seq>
       <seq>{black} NEAR {pen}</seq>
       <seq>{black} AND {pen}</seq>
       <seq>{black} ACCUM {pen}</seq>
     </progression>
   </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;

Query hits are returned in this sequence with no duplication as long as the application needs results.

Query relaxation is most effective when your application needs the top-N hits to a query, which you can obtain with the DOMAIN_INDEX_SORT hint or in a PL/SQL cursor.

Using query templating to relax a query as such is more efficient than re-executing a query.

4.1.16 Query Language

When you use the MULTI_LEXER to index a column containing documents in different languages, you can specify which language lexer to use during querying. You do so using the lang parameter in the query template, which specifies the document-level lexer.

select id from docs where CONTAINS (text,
'<query><textquery lang="french">bon soir</textquery></query>')>0;

See Also:

Oracle Text Reference for information on LANGUAGE and lang with ALTER INDEX and document sublexer

4.1.17 Ordering By SDATA Sections

You can order the query results according to the content of SDATA sections using the <order> and <orderkey> elements of the query template.

In the following example, the first level of ordering is done on the SDATA section price, which is sorted in the ascending order. The second and third level of ordering is done by the SDATA section pub_date and score, both of which are sorted in the descending order.

select id from docs where CONTAINS (text, '
<query>
   <textquery lang="ENGLISH" grammar="CONTEXT"> Oracle </textquery>
   <score datatype="INTEGER" algorithm="COUNT"/>
   <order>
       <orderkey> SDATA(price) ASC </orderkey>
       <orderkey> SDATA(pub_date) DESC </orderKey>
       <orderkey> Score DESC </orderkey>
   </order>
</query>', 1)>0;

Note:

You can add additional SDATA sections to an index without completely rebuilding the index. Refer to the ADD SDATA SECTION parameter string under ALTER INDEX in Oracle Text Reference.

See Also:

Oracle Text Reference for syntax of <order> and <orderkey> elements of the query template

4.1.18 Alternative and User-defined Scoring

You can use query templating to specify alternative scoring algorithms to use, other than the default, to customize how CONTAINS is scored, and to enable SDATA to be used as part of the scoring expressions. In this way, you can mathematically define the scoring expression using not only pre-defined scoring components, but also SDATA components.

With alternative user-defined scoring, you can:

  • Specify the scoring expressions of terms by defining arithmetic expressions that define how the query should be scored, using

    • predefined scoring algorithms: DISCRETE, OCCURRENCE, RELEVANCE, and COMPLETION;

    • arithmetic operations: plus, minus, multiply, divide;

    • arithmetic functions: ABS(n), finding the absolute value of n ; LOG(n), finding the base-10 logarithmic value of n;

    • Numeric literals.

  • Specify the scoring expressions at the term level.

  • Specify terms that should not be taken into account when calculating the score.

  • Specify how the score from child elements of OR and AND operators should be merged.

  • Use SDATA that stores numeric or DATETIME values to affect the final score of the document.

The following example specifies an alternative scoring algorithm:

select id from docs where CONTAINS (text,
'<query>        
 <textquery grammar="CONTEXT" lang="english"> mustang  </textquery>     
 <score datatype="float" algorithm="DEFAULT"/>     
</query>')>0

The following query templating example includes SDATA values as part of the final score:

select id from docs where CONTAINS (text,
'<query>
<textquery grammar="CONTEXT" lang="english"> mustang </textquery>
<score datatype="float" algorithm="DEFAULT" normalization_expr ="doc_score+SDATA(price)"/>
</query>')>0"

4.1.19 Alternative Grammar

Query template enables you to use the CONTEXT grammar with CATSEARCH queries and vice-versa.

select id from docs where CONTAINS (text,
'<query> 
  <textquery grammar="CTXCAT">San Diego</textquery>
  <score datatype="integer"/>
</query>')>0;

4.1.20 Query Analysis

Oracle Text enables you to create a log of queries and to analyze the queries it contains. For example, suppose you have an application that searches a database of large animals, and your analysis of its queries shows that users are continually searching for the word mouse; this analysis might induce you to rewrite your application so that a search for mouse redirects the user to a database of small animals instead of simply returning an unsuccessful search.

With query analysis, you can find out

  • Which queries were made

  • Which queries were successful

  • Which queries were unsuccessful

  • How many times each query was made

You can combine these factors in various ways, such as determining the 50 most frequent unsuccessful queries made by your application.

You start query logging with CTX_OUTPUT.START_QUERY_LOG. The query log will contain all queries made to all context indexes that the program is using until a CTX_OUTPUT.END_QUERY_LOG procedure is entered. Use CTX_REPORT.QUERY_LOG_SUMMARY to get a report of queries made.

See Also:

Oracle Text Reference for syntax and examples for these procedures

4.1.21 Other Query Features

In your query application, you can use other query features such as proximity searching. Table 4-1 lists some of these features.

Table 4-1 Other Oracle Text Query Features

Feature Description Implement With

Case Sensitive Searching

Enables you to search on words or phrases exactly as entered in the query. For example, a search on Roman returns documents that contain Roman and not roman.

BASIC_LEXER when you create the index

Base Letter Conversion

Queries words with or without diacritical marks such as tildes, accents, and umlauts. For example, with a Spanish base-letter index, a query of energía matches documents containing both energía and energia.

BASIC_LEXER when you create the index

Word Decompounding

(German and Dutch)

Enables searching on words that contain specified term as sub-composite.

BASIC_LEXER when you create the index

Alternate Spelling

(German, Dutch, and Swedish)

Searches on alternate spellings of words.

BASIC_LEXER when you create the index

Proximity Searching

Searches for words near one another.

NEAR operator when you enter the query

Expanded operator containing the functionality of PHRASE, NEAR and AND operators.

Breaks a document into clumps based on the given query. Each clump is classified based on a primary feature, and scored based on secondary features. The final document score adds clump scores such that the ordering of primary features determines the initial ordering of document scores.

NEAR2 operator when you enter the query

Stemming

Searches for words with same root as specified term.

$ operator at when you enter the query

Fuzzy Searching

Searches for words that have similar spelling to specified term.

FUZZY operator when you enter the query

Query Explain Plan

Generates query parse information.

CTX_QUERY.EXPLAIN PL/SQL procedure after you index

Hierarchical Query Feedback

Generates broader term, narrower term and related term information for a query.

CTX_QUERY.HFEEDBACK PL/SQL procedure after you index.

Browse index

Browses the words around a seed word in the index.

CTX_QUERY.BROWSE_WORDS PL/SQL after you index.

Count hits

Counts the number of hits in a query.

CTX_QUERY.COUNT_HITS PL/SQL procedure after you index.

Stored Query Expression

Stores the text of a query expression for later reuse in another query.

CTX_QUERY.STORE_SQE PL/SQL procedure after you index.

Thesaural Queries

Uses a thesaurus to expand queries.

Thesaurus operators such as SYN and BT as well as the ABOUT operator.

Use CTX_THES package to maintain thesaurus.

4.2 The CONTEXT Grammar

The CONTEXT grammar is the default grammar for CONTAINS. With this grammar, you can add complexity to your searches with operators. You use the query operators in your query expression. For example, the logical operator AND enables you to search for all documents that contain two different words. The ABOUT operator enables you to search on concepts.

You can also use the WITHIN operator for section searching, the NEAR operator for proximity searches, the stem, fuzzy, and thesaural operators for expanding a query expression.

With CONTAINS, you can also use the CTXCAT grammar with the query template feature.

The following sections describe some of the Oracle Text operators.

See Also:

Oracle Text Reference for complete information about using query operators

4.2.1 ABOUT Query

Use the ABOUT operator in English or French to query on a concept. The query string is usually a concept or theme that represents the idea to be searched on. Oracle Text returns the documents that contain the theme.

Word information and theme information are combined into a single index. To enter a theme query, your index must have a theme component which is created by default in English and French.

Enter a theme query using the ABOUT operator inside the query expression. For example, to retrieve all documents that are about politics, write your query as follows:

SELECT SCORE(1), title FROM news 
           WHERE CONTAINS(text, 'about(politics)', 1) > 0
           ORDER BY SCORE(1) DESC;

See Also:

Oracle Text Reference for more information about using the ABOUT operator

4.2.2 Logical Operators

Logical operators such as AND or OR allow you to limit your search criteria in a number of ways. Table 4-2 describes some of these operators.

Table 4-2 Logical Operators

Operator Symbol Description Example Expression

AND

&

Use the AND operator to search for documents that contain at least one occurrence of each of the query terms.

Score returned is the minimum of the operands.

'cats AND dogs'
'cats & dogs'

OR

|

Use the OR operator to search for documents that contain at least one occurrence of any of the query terms.

Score returned is the maximum of the operands.

'cats | dogs'
'cats OR dogs'

NOT

~

Use the NOT operator to search for documents that contain one query term and not another.

To obtain the documents that contain the term animals but not dogs, use the following expression:

'animals ~ dogs'

ACCUM

,

Use the ACCUM operator to search for documents that contain at least one occurrence of any of the query terms. The accumulate operator ranks documents according to the total term weight of a document.

The following query returns all documents that contain the terms dogs, cats and puppies giving the highest scores to the documents that contain all three terms:

'dogs, cats, puppies'

EQUIV

=

Use the EQUIV operator to specify an acceptable substitution for a word in a query.

The following example returns all documents that contain either the phrase alsatians are big dogs or German shepherds are big dogs:

'German shepherds=alsatians are big dogs'

4.2.3 Section Searching and HTML and XML

Section searching is useful when your document set is HTML or XML. For HTML, you can define sections using embedded tags and then use the WITHIN operator to search these sections.

For XML, you can have the system automatically create sections for you. You can query with the WITHIN operator or with the INPATH operator for path searching.

4.2.4 Proximity Queries with NEAR, NEAR_ACCUM, and NEAR2 Operators

You can search for terms that are near to one another in a document with the NEAR operator.

For example, to find all the documents where dog is within 6 words of cat, enter the following query:

'near((dog, cat), 6)'

The NEAR operator is now modified to change how the distance is measured between phrases in NESTED NEAR.

The NEAR_ACCUM operator combines the functionality of the NEAR operator with that of the ACCUM operator. Like NEAR, it returns terms that are within a given proximity of each other; however, if one term is not found, it ranks documents according to the frequency of the occurrence of the term that is found.

The NEAR2 operator combines the functionality of PHRASE, NEAR, and AND operators. In addition, the NEAR2 operator can optionally use position information to boost the scores of its hits. That is, if one phrase hit occurs at the beginning of a document, and another at the end of the document, then a higher weight is given to the first hit as compared to the second hit.

See Also:

Oracle Text Reference for more information about using the NEAR, NEAR_ACCUM, and NEAR2 operators

4.2.5 Fuzzy, Stem, Soundex, Wildcard and Thesaurus Expansion Operators

You can expand your queries into longer word lists with operators such as wildcard, fuzzy, stem, soundex, and thesaurus.

See Also:

4.2.6 Using CTXCAT Grammar

You can use the CTXCAT grammar in CONTAINS queries. To do so, use a query template specification in the text_query parameter of CONTAINS.

You might take advantage of the CTXCAT grammar when you need an alternative and simpler query grammar.

See Also:

Oracle Text Reference for more information about using these operators

4.2.7 Stored Query Expressions

You can use the procedure CTX_QUERY.STORE_SQE to store the definition of a query without storing any results. Referencing the query with the CONTAINS SQL operator references the definition of the query. In this way, stored query expressions make it easy for defining long or frequently used query expressions.

Stored query expressions are not attached to an index. When you call CTX_QUERY.STORE_SQE, you specify only the name of the stored query expression and the query expression.

The query definitions are stored in the Text data dictionary. Any user can reference a stored query expression.

See Also:

Oracle Text Reference to learn more about the syntax of CTX_QUERY.STORE_SQE

4.2.7.1 Defining a Stored Query Expression

You define and use a stored query expression as follows:

  1. Call CTX_QUERY.STORE_SQE to store the queries for the text column. With STORE_SQE, you specify a name for the stored query expression and a query expression.
  2. Call the stored query expression in a query expression using the SQE operator. Oracle Text returns the results of the stored query expression in the same way it returns the results of a regular query. The query is evaluated at the time the stored query expression is called.

    You can delete a stored query expression using REMOVE_SQE.

4.2.7.2 SQE Example

The following example creates a stored query expression called disaster that searches for documents containing the words tornado, hurricane, or earthquake:

begin
ctx_query.store_sqe('disaster', 'tornado | hurricane | earthquake');
end;

To execute this query in an expression, write your query as follows:

SELECT SCORE(1), title from news 
   WHERE CONTAINS(text, 'SQE(disaster)', 1) > 0
   ORDER BY SCORE(1);

See Also:

Oracle Text Reference to learn more about the syntax of CTX_QUERY.STORE_SQE

4.2.8 Calling PL/SQL Functions in CONTAINS

You can call user-defined functions directly in the CONTAINS clause as long as the function satisfies the requirements for being named in a SQL statement. The caller must also have EXECUTE privilege on the function.

For example, assuming the function french returns the French equivalent of an English word, you can search on the French word for cat by writing:

SELECT SCORE(1), title from news 
   WHERE CONTAINS(text, french('cat'), 1) > 0
   ORDER BY SCORE(1);

See Also:

Oracle Database SQL Language Reference for more information about creating user functions and calling user functions from SQL

4.2.9 Optimizing for Response Time

A CONTAINS query optimized for response time provides a fast solution for when you need the highest scoring documents from a hitlist.

The following example returns the first twenty hits to standard out. This example uses the FIRST_ROWS(n) hint and a cursor.

declare 
cursor c is  
  select /*+ FIRST_ROWS(20) */ title, score(1) score 
    from news where contains(txt_col, 'dog', 1) > 0 order by score(1) desc; 
begin 
  for c1 in c 
  loop 
    dbms_output.put_line(c1.score||':'||substr(c1.title,1,50)); 
    exit when c%rowcount = 21; 
  end loop; 
end; 
/

Besides using query hints, there are other factors that can influence query response time such as:

4.2.10 Counting Hits

To count the number of hits returned from a query with only a CONTAINS predicate, you can use CTX_QUERY.COUNT_HITS in PL/SQL or COUNT(*) in a SQL SELECT statement.

If you want a rough hit count, you can use CTX_QUERY.COUNT_HITS in estimate mode (EXACT parameter set to FALSE). With respect to response time, this is the fastest count you can get.

To count the number of hits returned from a query that contains a structured predicate, use the COUNT(*) function in a SELECT statement.

4.2.10.1 SQL Count Hits Example

To find the number of documents that contain the word oracle, enter the query with the SQL COUNT function as follows:

SELECT count(*) FROM news WHERE CONTAINS(text, 'oracle', 1) > 0;
4.2.10.2 Counting Hits with a Structured Predicate

To find the number of documents returned by a query with a structured predicate, use COUNT(*) as follows:

SELECT COUNT(*) FROM news WHERE CONTAINS(text, 'oracle', 1) > 0 and author = 'jones';
4.2.10.3 PL/SQL Count Hits Example

To find the number of documents that contain the word oracle, use COUNT_HITS as follows:

declare count number;
begin
  count := ctx_query.count_hits(index_name => my_index, text_query => 'oracle', exact => TRUE);
 dbms_output.put_line('Number of docs with oracle:');
 dbms_output.put_line(count);
end;

See Also:

Oracle Text Reference to learn more about the syntax of CTX_QUERY.COUNT_HITS

4.2.11 Using DEFINESCORE and DEFINEMERGE for User-defined Scoring

The DEFINESCORE operator enables you to define how the score for a term or phrase is to be calculated. The DEFINEMERGE operator defines how to merge scores of child elements of AND and OR operators. You can also use the alternative scoring template with SDATA to affect the final scoring of the document.

See Also:

4.3 The CTXCAT Grammar

The CTXCAT grammar is the default grammar for CATSEARCH. This grammar supports logical operations such as AND and OR as well as phrase queries.

The CATSEARCH query operators have the following syntax:

Table 4-3 CATSEARCH Query Operator Syntax

Operation Syntax Description of Operation

Logical AND

a b c

Returns rows that contain a, b and c.

Logical OR

a | b | c

Returns rows that contain a, b, or c.

Logical NOT

a - b

Returns rows that contain a and not b.

hyphen with no space

a-b

Hyphen treated as a regular character.

For example, if the hyphen is defined as skipjoin, words such as web-site treated as the single query term website.

Likewise, if the hyphen is defined as a printjoin, words such as web-site treated as web site with the space in the CTXCAT query language.

" "

"a b c"

Returns rows that contain the phrase "a b c".

For example, entering "Sony CD Player" means return all rows that contain this sequence of words.

( )

(A B) | C

Parentheses group operations. This query is equivalent to the CONTAINS query (A &B) | C.

You can use the CONTEXT grammar in CATSEARCH queries. To do so, use a query template specification in the text_query parameter.

You might use the CONTAINS grammar as such when you need to enter proximity, thesaurus, or ABOUT queries with a CTXCAT index.

See Also:

Oracle Text Reference for more information about using these operators