Customizing the Query Syntax Expansion

Oracle Ultra Search uses the Oracle Text engine to index and search documents. When an end-user specifies a certain query string, Oracle Ultra Search takes that string and transforms it into an Oracle Text query expression. This process is called "Query Syntax Expansion".

Oracle Ultra Search provides a default Query Syntax Expansion implementation. The code for this implementation is contained in the WK_QUERYEXP PL/SQL package, which you can view in the $ORACLE_HOME/ultrasearch/admin/wk0queryexp.pkb file on the Oracle Server host.

The purpose of this document is to provide you with an understanding of how you can customize the Query Syntax Expansion implementation to suit your organization's preferences.

Default Query Syntax Expansion Implementation

The default Query Syntax Expansion implementation directly affects the following:

  1. The way the end-user enters a query string (which we refer to as the "End-User Query Syntax")
  2. The way the documents matching the query are scored (which we simply refer to as "Scoring")
  3. The way the end-user's query string is transformed into an Oracle Text query string (which we refer to as the "Expansion Rules")

(1) End-User Query Syntax

The end-user query syntax defined by the default Query Syntax Expansion implementation is similar to the standard text query syntax employed by most search engines on the web.

Token

A token is either a single word or a phrase. A phrase is a string of words enclosed in double quotes (").

Operators

The default implementation defines three operators: [+], [-] and [*]. These operators belong to the default implementation only. In your own custom implementation, you are free to change them to whatever you prefer.

Plus operator [+]: The plus operator specifies that the token immediately following it must appear in all documents included in the search result.

Minus operator [-]: The minus operator specifies that the token immediately following it cannot appear in any document included in the search result.

Asterisk operator [*]: The asterisk operator specifies a wildcard search. It matches zero or more characters. A token starting with the asterisk is ignored. The asterisk can only be specified at the end (right side) or in the middle of a token. For example, "hel*o" and "hell*" use the asterisk correctly but "*ello" is illegal.

Summary

The following table summarizes the rules that govern the Ultra Search end-user query syntax:

Note: In this document, all end-user query strings are enclosed in square brackets. For example, the end-user query string Oracle Applications is written as [Oracle Applications].

Rule: Description

Single word search: Entering one word finds documents that contain that word. For example, searching for [Oracle] finds all documents that contain the word "Oracle" anywhere in the document.

Multiple word search: Entering more than one word finds documents that contain any of those words, in any order. For example, searching for [Oracle Applications] finds documents that contain "Oracle" or "Applications" or "Oracle Applications".

Compulsory inclusion [+]: Attaching a [+] in front of a word requires that the word be found in every matching document. For example, searching for [Oracle + Applications] finds only documents that contain the word "Applications". Note: in a multiple word search, you can attach a [+] in front of every token, including the very first token.

Compulsory exclusion [-]: Attaching a [-] in front of a word requires that the word not be found in any matching document. For example, searching for [Oracle - Applications] finds only documents that do not contain the word "Applications". Note: in a multiple word search, you can attach a [-] in front of every token except the very first token.

Phrase matching ["..."]: Putting quotes around a set of words finds only documents that contain that precise phrase. For example, searching for ["Oracle Applications"] retrieves only documents that contain the string "Oracle Applications".

Wildcard matching [*]: Attaching a [*] to the right-hand side of a word matches any word that begins with the given characters. For example, searching for the string [Ora*] retrieves documents that contain any word beginning with "Ora", such as "Oracle" and "Orator". You can also insert an asterisk in the middle of a word. For example, searching for the string [A*e] retrieves documents that contain words such as "Apple", "Ate" and "Ape". Wildcard matching requires more computational processing power and is generally slower than other types of queries.
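
The syntax rules above can be sketched as a small tokenizer. The following Python sketch is illustrative only and is not the WK_QUERYEXP code. The names Token and tokenize, the regular expression, and the choice to accept white space between an operator and its token (as in [Oracle + Applications]) are assumptions made for this example.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    op: str       # "+", "-" or "" for a token without an operator
    text: str     # the word or phrase, without the surrounding quotes
    quoted: bool  # True if the token was entered as "..."


# An optional [+] or [-], optional white space, then either a quoted
# phrase or a bare word. A bare word may not start with an operator.
_TOKEN_RE = re.compile(r'([+-])?\s*(?:"([^"]+)"|([^\s"+-][^\s"]*))')


def tokenize(query: str) -> List[Token]:
    """Split an end-user query string into tokens (hypothetical helper)."""
    tokens = []
    for op, phrase, word in _TOKEN_RE.findall(query):
        text = phrase or word
        # A token starting with the asterisk is ignored.
        if text.startswith("*"):
            continue
        tokens.append(Token(op, text, bool(phrase)))
    return tokens


if __name__ == "__main__":
    for q in ['Oracle + Applications', '"Oracle Applications" -Financials', '*ello hel*o']:
        print(q, "->", tokenize(q))
```

In this sketch, a query such as [*ello hel*o] yields only the token hel*o, because the token with the leading asterisk is dropped.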

(2) Scoring

There are three ways documents are matched against an end-user query string. These three ways are referred to as scoring "classes". Documents are scored and ranked higher if they satisfy the requirements for a higher class. Within each class, documents are also ranked differently depending on how well they match the conditions of that scoring class.

Class 1 is the most heavily weighted class. The score is derived from the number of occurrences of a precise phrase in a document. A document that has more occurrences of the precise phrase will have a higher score than a document that has fewer.

Class 2 is the next most heavily weighted class. In this class, the closer together the tokens appear in a document, the higher the score. For example, the end-user query string [Oracle Applications Financials] may find three documents, none of which contains the precise phrase "Oracle Applications Financials". Document X contains all three tokens "Oracle", "Applications" and "Financials" in the same sentence, separated by other words. Document Y contains the individual tokens in the same paragraph but in different sentences. Document Z contains the same three tokens, but each token resides in a different paragraph. In this scenario, document X has the highest score because its tokens are closest together. Likewise, Y has a higher score than Z.

Class 3 is the least heavily weighted class. A document that contains more of the query tokens gets a higher score. For example, the end-user query string [Oracle Applications Financials] may find three documents. Document X might contain all three tokens. Document Y may contain the tokens "Oracle" and "Applications" only. Document Z may contain only the token "Oracle". In this scenario, document X has a higher score than Y. Likewise, Y has a higher score than Z.

(3) Expansion Rules

As mentioned earlier, the end-user query is expanded to an Oracle Text query. The rules for the expanded query string are expressed in BNF (Backus-Naur Form) notation. These are the rules that Ultra Search uses in its default Query Syntax Expansion implementation.

Rules

First, we will state the rules that define an expanded query:

<expanded query> ::= (<expression> within <title section>)*2, <expression>

<expression> ::= <generic query expression> | <simple query expression>

<generic query expression> ::= (([ <plus expression>*100 & ]) (<main expression>)) [ <minus expression> ]

<simple query expression> ::= (<phrase expression>)*2, (<main expression>)

<main expression> ::= (<near expression>)*2, (<accum expression>)

Next, we define the terms used in the rules above:

A <plus expression> is an AND expression of all plus tokens.

A <minus expression> is a NOT expression of all minus tokens.

A <phrase expression> is a PHRASE formed by all tokens in the <main expression>.

A <near expression> is a NEAR expression of all tokens except minus tokens.

An <accum expression> is an ACCUMULATE expression of all tokens except minus tokens.

A <simple query expression> is used only when the end-user query has multiple tokens and contains no operators and no double quotes. Otherwise, a <generic query expression> is used.

If every token is either a plus token or a minus token (that is, no token is without an operator), the <plus expression> and the <accum expression> are eliminated.
      

Examples of applying the rules

The following table illustrates how the default Query Syntax Expansion implementation converts end-user query strings to query strings that Oracle Text understands.

End-user query string
    Expanded query string understandable by Oracle Text

[Oracle]
    ((({Oracle}) within TITLE__31)*2,({Oracle}))

[Oracle + Applications]
    ((((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))))

[Oracle - Applications]
    (((({Oracle})~{Applications}) within TITLE__31)*2,(({Oracle})~{Applications}))

["Oracle Applications"]
    ((({Oracle Applications}) within TITLE__31)*2,({Oracle Applications}))

[Ora*]
    ((((Ora%)) within TITLE__31)*2,((Ora%)))

[Oracle Applications]
    (((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))
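
The rules and examples above can be modelled in a few lines of code. The following Python sketch is a reading of the BNF rules and the example strings, not the WK_QUERYEXP implementation. The function names, the (operator, text, quoted) token tuples, the handling of several plus or minus tokens, and the handling of queries in which every token has an operator are assumptions; only the six example strings in the table above come from this document. Note that the examples write the *100 weight of the <plus expression> as two nested *10 weights, and the sketch follows them.

```python
from typing import List, Tuple

Token = Tuple[str, str, bool]  # (operator, text, quoted); operator is "+", "-" or ""
TITLE_SECTION = "TITLE__31"    # section name copied from the examples above


def term(text: str) -> str:
    """Render one token: a wildcard token becomes (..%..), others {text}."""
    if "*" in text:
        return "(" + text.replace("*", "%") + ")"
    return "{" + text + "}"


def main_expression(texts: List[str]) -> str:
    """<main expression> ::= (<near expression>)*2, (<accum expression>)"""
    terms = [term(t) for t in texts]
    if len(terms) == 1:
        return "(" + terms[0] + ")"
    return "((" + ";".join(terms) + ")*2,(" + ",".join(terms) + "))"


def expression(tokens: List[Token]) -> str:
    plus = [text for op, text, _ in tokens if op == "+"]
    minus = [text for op, text, _ in tokens if op == "-"]
    kept = [text for op, text, _ in tokens if op != "-"]  # all but minus tokens
    if not kept:
        raise ValueError("the query needs at least one token that is not excluded")

    # Simple query: multiple tokens, no operator and no double quote.
    simple = len(tokens) > 1 and all(
        op == "" and not quoted and "*" not in text for op, text, quoted in tokens
    )
    if simple:
        phrase = "{" + " ".join(kept) + "}"
        return "((" + phrase + ")*2," + main_expression(kept) + ")"

    has_plain = any(op == "" for op, _, _ in tokens)
    if has_plain:
        expr = main_expression(kept)
        if plus:
            plus_expr = "((" + "&".join(term(t) for t in plus) + ")*10)*10"
            expr = "(" + plus_expr + "&" + expr + ")"
    else:
        # Assumption: with no operator-free token, only the NEAR part remains.
        expr = "(" + ";".join(term(t) for t in kept) + ")"
    if minus:
        expr = "(" + expr + "".join("~" + term(t) for t in minus) + ")"
    return expr


def expand(tokens: List[Token]) -> str:
    """<expanded query> ::= (<expression> within <title section>)*2, <expression>"""
    expr = expression(tokens)
    return "((" + expr + " within " + TITLE_SECTION + ")*2," + expr + ")"


if __name__ == "__main__":
    print(expand([("", "Oracle", False), ("-", "Applications", False)]))
```

Called with the tokens of [Oracle - Applications], as in the last lines of the sketch, expand returns the expanded string shown for that query in the table above.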

Customizing the rules

You can customize this expansion to suit your organization's purposes by defining and implementing your own Query Syntax Expansion. To do so, you will need to understand the requirements of Oracle Text queries. Explaining the details of Oracle Text queries is beyond the scope of this document. Please refer to the Oracle9i Text Application Developer's Guide and Oracle9i Text Reference to understand the requirements of Oracle Text queries. After you have understood how Oracle Text queries work, you can proceed to the next section on what you must do to implement your customized expansion rules.

Implementing the customized rules

Assuming that you have read the Oracle9i Text Application Developer's Guide and Oracle9i Text Reference documents, you are now ready to customize Ultra Search to use your very own implementation of the Query Syntax Expansion.

To facilitate this, Ultra Search allows you to modify the WK_QUERYEXP package. That package defines two PL/SQL functions that you must edit: expand_main and expand_attr. The expand_main function is applied to the query string entered by the end-user. The expand_attr function is applied to each search attribute specified in an advanced search. The return value of each expand_attr call is appended to the return value of the expand_main function. The resulting query string is what is passed to Oracle Text.

The expand_main function

This function takes the query string entered in the basic search box or advanced search box and converts it to an Oracle Text query string according to your custom Query Syntax Expansion implementation rules.

CREATE OR REPLACE FUNCTION expand_main(query varchar2)
RETURN varchar2
AS
  newqry varchar2(4000);
BEGIN
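  -- Replace this assignment with code that converts the input query
  -- string into an Oracle Text query string according to your custom
  -- rules. The pass-through below only keeps the skeleton compilable.
  newqry := query;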
  return newqry;  
END;

The expand_attr function

This function is applied to each search attribute in an advanced search. It takes each attribute and converts it to an Oracle Text query string according to your custom Query Syntax Expansion implementation rules.

CREATE OR REPLACE FUNCTION expand_attr(query varchar2)
RETURN varchar2
AS
  newqry varchar2(4000);
BEGIN
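  -- Replace this assignment with code that converts a search attribute
  -- into an Oracle Text query string according to your custom rules.
  -- The pass-through below only keeps the skeleton compilable.
  newqry := query;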
  return newqry;
END;

Important notes

  1. All customized functions are instance-specific and should be defined in the schema of the Ultra Search instance user.
  2. Make sure that they are executed with definer's rights.

Example of combining the expand_main return value with the expand_attr values

The following example illustrates how the default Query Syntax Expansion implementation converts the end-user query string Oracle Applications to a query string that Oracle Text understands. The additional clause introduced by the two search attributes is the final part of the second expanded string, starting at the & operator that follows the main expression.

End-user query string
    Expanded query string understandable by Oracle Text

[Oracle Applications]
    (((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))

[Oracle Applications] with the Title attribute restricted to "MyTitle" and the Author attribute restricted to "MyAuthor"
    ((((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications})))))&(((({MyTitle}) WITHIN TITLE__31)&(({MyAuthor}) WITHIN AUTHOR__32))*10)*10
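
The way the attribute clause is attached to the main expression can be read off the example above. The following Python sketch reproduces that string shape only; it is not how Ultra Search assembles the query internally. The names combine and expand_attr_sketch, and the assumption that each attribute value is simply wrapped in braces and parentheses, are hypothetical.

```python
from typing import List, Tuple


def expand_attr_sketch(value: str) -> str:
    """Stand-in for expand_attr: wrap the attribute value as ({value})."""
    return "({" + value + "})"


def combine(main_query: str, attributes: List[Tuple[str, str]]) -> str:
    """Append attribute restrictions to an already expanded main query.

    attributes is a list of (value, section name) pairs, for example
    ("MyTitle", "TITLE__31"). With no attributes, the main query is
    returned unchanged.
    """
    if not attributes:
        return main_query
    clauses = [
        "(" + expand_attr_sketch(value) + " WITHIN " + section + ")"
        for value, section in attributes
    ]
    # The attribute clauses are ANDed together, weighted by two nested
    # *10 factors, and ANDed with the parenthesized main query.
    return "(" + main_query + ")&((" + "&".join(clauses) + ")*10)*10"


if __name__ == "__main__":
    main = "((({Oracle}) within TITLE__31)*2,({Oracle}))"
    print(combine(main, [("MyTitle", "TITLE__31"), ("MyAuthor", "AUTHOR__32")]))
```

Passing the first expanded string from the table above as main_query, together with the MyTitle and MyAuthor pairs, yields the second expanded string.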