Oracle Ultra Search provides a default Query Syntax Expansion implementation.
The code for this implementation is contained in the WK_QUERYEXP PL/SQL
package. It can be viewed in the $ORACLE_HOME/ultrasearch/admin/wk0queryexp.pkb
file on the Oracle Server host.
This document explains how you can customize the Query Syntax Expansion
implementation to suit your organization's preferences.
Default Query Syntax Expansion Implementation
The default Query Syntax Expansion implementation directly affects
the following:
- The way the end-user enters a query string (which we refer to as
the "End-User Query Syntax")
- The way the documents matching the query are scored (which we simply
refer to as "Scoring")
- The way the end-user's query string is transformed into an Oracle
Text query string (which we refer to as the "Expansion Rules")
(1) End-User Query Syntax
The end-user query syntax defined by the default Query Syntax Expansion
implementation is similar to the standard text query syntax employed
by most search engines on the web.
Token
A token is defined as a string enclosed in double-quotes (").
It may be a single word or a phrase.
Operators
The default implementation defines three operators: [+], [-] and [*].
These operators are defined by the default implementation only. You are
free to change them to whatever you prefer in your own custom
implementation.
- Plus operator [+]: The plus operator specifies that the token
immediately following it must appear in every document included in
the search result.
- Minus operator [-]: The minus operator specifies that the token
immediately following it cannot appear in any document included in
the search result.
- Asterisk [*]: The asterisk operator specifies a wildcard search.
It matches zero or more characters. A token starting with the asterisk
is ignored. The asterisk can only be specified at the end (right side)
or in the middle of a token. For example, "hel*o" and "hell*"
use the asterisk correctly but "*ello" is illegal.
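The wildcard rule can be made concrete with a small helper. The sketch below is not part of WK_QUERYEXP: the function name to_text_term is hypothetical, and the two output forms ({word} for a plain token, (prefix%) for a wildcard token) are inferred from the expanded query strings shown later in this document. It assumes that any surrounding double quotes and any [+] or [-] operator have already been removed from the token.

```sql
-- Hypothetical helper, not part of WK_QUERYEXP: converts one end-user
-- token into an Oracle Text term.
CREATE OR REPLACE FUNCTION to_text_term(token VARCHAR2)
RETURN VARCHAR2
AS
BEGIN
  IF token IS NULL OR SUBSTR(token, 1, 1) = '*' THEN
    -- A token that starts with the asterisk is ignored.
    RETURN NULL;
  ELSIF INSTR(token, '*') > 0 THEN
    -- Wildcard token: the asterisk becomes the Oracle Text % wildcard.
    RETURN '(' || REPLACE(token, '*', '%') || ')';
  ELSE
    -- Plain token: braces escape the text so that it is matched literally.
    RETURN '{' || token || '}';
  END IF;
END;
/
```

With this sketch, to_text_term('Ora*') returns (Ora%) and to_text_term('Oracle') returns {Oracle}. A token that itself contains a brace would break the escape, so a real implementation would need to handle that case.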
Summary
The following table summarizes the rules that govern the Ultra Search
end-user query syntax:
Note: All end-user query strings are shown in square brackets. For
example, the end-user query string Oracle Applications is written
as [Oracle Applications].
Rule | Description
Single word search | Entering one word finds documents that contain that word. For example, searching for [Oracle] finds all documents that contain the word "Oracle" anywhere in the document.
Multiple word search | Entering more than one word finds documents that contain any of those words, in any order. For example, searching for [Oracle Applications] finds documents that contain "Oracle" or "Applications" or "Oracle Applications".
Compulsory inclusion [+] | Attaching a [+] in front of a word requires that the word be found in every matching document. For example, searching for [Oracle + Applications] finds only documents that contain the word "Applications". Note: in a multiple word search, you can attach a [+] in front of every token, including the very first token.
Compulsory exclusion [-] | Attaching a [-] in front of a word requires that the word not be found in any matching document. For example, searching for [Oracle - Applications] finds only documents that do not contain the word "Applications". Note: in a multiple word search, you can attach a [-] in front of every token except the very first token.
Phrase Matching ["..."] | Putting quotes around a set of words finds only documents that contain that precise phrase. For example, searching for ["Oracle Applications"] retrieves only documents that contain the string "Oracle Applications".
Wildcard Matching [*] | Attaching a [*] to the right-hand side of a word matches any word that begins with the characters to its left. For example, searching for [Ora*] retrieves documents that contain words beginning with "Ora", such as "Oracle" and "Orator". You can also insert an asterisk in the middle of a word. For example, searching for [A*e] retrieves documents that contain words such as "Apple", "Ate" and "Ape". Wildcard matching requires more computational processing power and is generally slower than other types of queries.
(2) Scoring
There are three ways documents are matched against an end-user query
string. These three ways are referred to as scoring "classes".
Documents are scored and ranked higher if they satisfy the requirements
for a higher class. Within each class, documents are also ranked differently
depending on how well they match the conditions of that scoring class.
Class 1 is the most heavily weighted class. The score is derived from
the number of occurrences of a precise phrase in a document. A document
that has more instances of the precise phrase has a higher
score than a document that has fewer occurrences of the precise
phrase.
Class 2 is the next most heavily weighted class. In this class, the
closer together the tokens appear in a document, the higher the score.
For example, the end-user query string [Oracle Applications Financials]
may find three documents, none of which contains
the precise phrase "Oracle Applications Financials".
Document X contains all three tokens "Oracle", "Applications"
and "Financials" in the same sentence, separated by
other words. Document Y contains the individual tokens in the same
paragraph but in different sentences. Document Z contains the same
three tokens, but each token is in a different paragraph. In this
scenario, document X has the highest score because the tokens are closest
together. Likewise, Y has a higher score than Z.
Class 3 is the least heavily weighted class. A document that contains
more of the query tokens gets a higher score. For example, the end-user
query string [Oracle Applications Financials] may find three documents.
Document X might contain all three tokens. Document Y may contain only
the tokens "Oracle" and "Applications". Document Z may
contain only the token "Oracle". In this scenario, document
X has a higher score than Y. Likewise, Y has a higher score than Z.
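One way to picture the three classes is as three Oracle Text sub-expressions combined in a single query: an exact phrase, a NEAR expression (the ; operator) and an ACCUMULATE expression (the , operator), with the more important parts weighted by *2. The mapping of classes to operators is an inference from the expansion rules in section (3), not a statement taken from the default implementation. The table my_docs and its columns url and doc_text are hypothetical, and a CONTEXT index on doc_text is assumed.

```sql
-- Hypothetical table and columns; assumes a CONTEXT index on doc_text.
-- Phrase part (weighted), then NEAR part (weighted), then ACCUMULATE part.
SELECT SCORE(1) AS relevance, url
FROM   my_docs
WHERE  CONTAINS(doc_text,
         '({Oracle Applications Financials})*2,(({Oracle};{Applications};{Financials})*2,({Oracle},{Applications},{Financials}))',
         1) > 0
ORDER  BY SCORE(1) DESC;
```

Documents that satisfy more of the sub-expressions accumulate a higher score, which is why a document containing the precise phrase sorts ahead of one that only contains scattered tokens.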
(3) Expansion Rules
As mentioned earlier, the end-user query is expanded to an Oracle Text
query. The rules for the expanded query string are given in BNF
(Backus-Naur Form) notation. These are the rules used by the default
Query Syntax Expansion implementation.
Rules
First, we will state the rules that define an expanded query:
<expanded query> ::= (<expression> within <title section>)*2, <expression>
<expression> ::= <generic query expression> | <simple query expression>
<generic query expression> ::= (([ <plus expression>*100 & ]) (<main expression>)) [ <minus expression> ]
<simple query expression> ::= (<phrase expression>)*2, (<main expression>)
<main expression> ::= (<near expression>)*2, (<accum expression>)
The terms used in these rules have the following meanings:
- A <plus expression> is an AND expression of all plus tokens.
- A <minus expression> is a NOT expression of all minus tokens.
- A <phrase expression> is a PHRASE formed by all tokens in the <main expression>.
- A <near expression> is a NEAR expression of all tokens except minus tokens.
- An <accum expression> is an ACCUMULATE expression of all tokens except minus tokens.
- A <simple query expression> is used only when the end-user query
has multiple tokens and contains no operators or double quotes.
Otherwise, a <generic query expression> is used.
- If every token is either a plus token or a minus token (that is, no
token is without an operator), the <plus expression> and the
<accum expression> are eliminated.
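As an illustration of the simple query expression rule, the following sketch builds the phrase, NEAR and ACCUMULATE parts from a list of plain words. The function name simple_expression is hypothetical and the code is not taken from WK_QUERYEXP. It assumes that the input holds two or more words separated by spaces and contains no operators, double quotes or braces.

```sql
-- Hypothetical sketch: builds
--   (<phrase expression>)*2, ((<near expression>)*2, (<accum expression>))
-- from space-separated plain words.
CREATE OR REPLACE FUNCTION simple_expression(words VARCHAR2)
RETURN VARCHAR2
AS
  rest       VARCHAR2(4000) := LTRIM(RTRIM(words));
  word       VARCHAR2(4000);
  pos        PLS_INTEGER;
  phrase     VARCHAR2(4000);
  near_expr  VARCHAR2(4000);
  accum_expr VARCHAR2(4000);
BEGIN
  WHILE rest IS NOT NULL LOOP
    -- Split off the next word at the first space.
    pos := INSTR(rest, ' ');
    IF pos = 0 THEN
      word := rest;
      rest := NULL;
    ELSE
      word := SUBSTR(rest, 1, pos - 1);
      rest := LTRIM(SUBSTR(rest, pos + 1));
    END IF;
    IF phrase IS NULL THEN
      phrase     := word;
      near_expr  := '{' || word || '}';
      accum_expr := '{' || word || '}';
    ELSE
      phrase     := phrase || ' ' || word;
      near_expr  := near_expr  || ';{' || word || '}';  -- ; is NEAR
      accum_expr := accum_expr || ',{' || word || '}';  -- , is ACCUMULATE
    END IF;
  END LOOP;
  RETURN '(({' || phrase || '})*2,((' || near_expr || ')*2,('
         || accum_expr || ')))';
END;
/
```

For the input 'Oracle Applications' this returns (({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))), which is the <expression> part of the expanded query shown for [Oracle Applications] in this document. An empty input is not handled here and would need a guard in real code.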
Examples of applying the rules
The following table illustrates how the default Query Syntax Expansion
implementation converts end-user query strings to Oracle Text compatible
query strings.
End-user query string | Expanded query string understandable by Oracle Text
[Oracle] | ((({Oracle}) within TITLE__31)*2,({Oracle}))
[Oracle + Applications] | ((((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,((({Applications})*10)*10&(({Oracle};{Applications})*2,({Oracle},{Applications}))))
[Oracle - Applications] | (((({Oracle})~{Applications}) within TITLE__31)*2,(({Oracle})~{Applications}))
["Oracle Applications"] | ((({Oracle Applications}) within TITLE__31)*2,({Oracle Applications}))
[Ora*] | ((((Ora%)) within TITLE__31)*2,((Ora%)))
[Oracle Applications] | (((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))
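The expansion of [Oracle + Applications] can be reproduced by assembling the pieces named in the rules by hand. The anonymous block below is only a worked illustration with hard-coded values, and the variable names are hypothetical. The section name TITLE__31 and the way the *100 weight is written as two nested *10 weights are copied from the expanded strings in this document.

```sql
-- Worked illustration only; run in SQL*Plus with SET SERVEROUTPUT ON.
DECLARE
  plus_expr  VARCHAR2(4000) := '({Applications})';
  near_expr  VARCHAR2(4000) := '{Oracle};{Applications}';
  accum_expr VARCHAR2(4000) := '{Oracle},{Applications}';
  main_expr  VARCHAR2(4000);
  expr       VARCHAR2(4000);
  expanded   VARCHAR2(4000);
BEGIN
  -- <main expression> ::= (<near expression>)*2, (<accum expression>)
  main_expr := '((' || near_expr || ')*2,(' || accum_expr || '))';
  -- <generic query expression> with a plus expression, no minus expression
  expr := '((' || plus_expr || '*10)*10&' || main_expr || ')';
  -- <expanded query> ::= (<expression> within <title section>)*2, <expression>
  expanded := '((' || expr || ' within TITLE__31)*2,' || expr || ')';
  DBMS_OUTPUT.PUT_LINE(expanded);
END;
/
```

The printed string matches the expanded query string for [Oracle + Applications]. The same expression appears twice, once restricted to the title section with a higher weight and once unrestricted.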
Customizing the rules
You can customize this expansion to suit your organization's purposes
by defining and implementing your own Query Syntax Expansion. To do
so, you will need to understand the requirements of Oracle Text queries.
Explaining the details of Oracle Text queries is beyond the scope
of this document. Please refer to the Oracle9i Text Application
Developer's Guide and Oracle9i Text Reference to understand
the requirements of Oracle Text queries. After you have understood
how Oracle Text queries work, you can proceed to the next section
on what you must do to implement your customized expansion rules.
Implementing your customized expansion rules
Assuming that you have read the Oracle9i Text Application Developer's
Guide and Oracle9i Text Reference documents, you are now
ready to customize Ultra Search to use your very own implementation
of the Query Syntax Expansion.
To facilitate this, Ultra Search allows you to modify the WK_QUERYEXP
package. The package defines two PL/SQL functions that you
must edit: expand_main and expand_attr. The expand_main function
is applied to the query string entered by the end-user. The expand_attr
function is applied to each search attribute specified in an advanced search.
The return value of each expand_attr call is appended to the return
value of the expand_main function. The resulting query string is what is
passed to Oracle Text as the query.
The expand_main function
This function takes the query string entered in the basic search
box or advanced search box and converts it to an Oracle Text query
string according to your custom Query Syntax Expansion implementation
rules.
CREATE OR REPLACE FUNCTION expand_main(query VARCHAR2)
RETURN VARCHAR2
AS
  newqry VARCHAR2(4000);
BEGIN
  -- Convert the input query string into an Oracle Text query string
  -- according to your custom rules. As a placeholder, the input is
  -- passed through unchanged.
  newqry := query;
  RETURN newqry;
END;
/
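As a sketch of what a custom rule might look like, the version below requires every word of the query to be present by joining the words with the Oracle Text AND operator (&). This rule is a hypothetical example, not the default behavior. It deliberately ignores the [+], [-], [*] and double-quote syntax, treats those characters as literal text, and only splits on spaces. Passing an empty query through unchanged is an assumption about what the caller expects.

```sql
-- Hypothetical custom rule: every word must appear in the document.
CREATE OR REPLACE FUNCTION expand_main(query VARCHAR2)
RETURN VARCHAR2
AS
  words  VARCHAR2(4000);
  newqry VARCHAR2(4000);
BEGIN
  -- Remove braces so user input cannot close the {} escape, then trim.
  words := LTRIM(RTRIM(REPLACE(REPLACE(query, '{'), '}')));
  IF words IS NULL THEN
    RETURN query;  -- nothing to expand (assumed pass-through)
  END IF;
  -- Collapse runs of spaces so that no empty {} term is produced.
  WHILE INSTR(words, '  ') > 0 LOOP
    words := REPLACE(words, '  ', ' ');
  END LOOP;
  -- Escape each word with braces and join the words with AND (&).
  newqry := '({' || REPLACE(words, ' ', '}&{') || '})';
  RETURN newqry;
END;
/
```

For the input 'Oracle Applications' this returns ({Oracle}&{Applications}). Because it replaces the existing function, it is best tried first on a test instance.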
The expand_attr function
This function is applied to each search attribute in an advanced
search. It takes each attribute and converts it to an Oracle Text
query string according to your custom Query Syntax Expansion implementation
rules.
CREATE OR REPLACE FUNCTION expand_attr(query VARCHAR2)
RETURN VARCHAR2
AS
  newqry VARCHAR2(4000);
BEGIN
  -- Convert a search attribute into an Oracle Text query string
  -- according to your custom rules. As a placeholder, the input is
  -- passed through unchanged.
  newqry := query;
  RETURN newqry;
END;
/
Important notes
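- All customized functions are instance-specific and should be defined
in the schema of the Ultra Search instance user.
- Make sure that they are executed with definer's rights. This is the
default for stored functions; specifying AUTHID DEFINER in the function
declaration makes it explicit.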
Example of combining the expand_main return value with the expand_attr values
The following example illustrates how the default Query Syntax Expansion
implementation converts the end-user query string [Oracle Applications]
to an Oracle Text compatible query string, first on its own and then
with two search attributes. The additional clause introduced by the two
search attributes is everything from the first & operator onward in the
second expanded query string.
End-user query string | Expanded query string understandable by Oracle Text
[Oracle Applications] | (((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))))
[Oracle Applications] with the Title attribute restricted to "MyTitle" and the Author attribute restricted to "MyAuthor" | ((((({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications}))) within TITLE__31)*2,(({Oracle Applications})*2,(({Oracle};{Applications})*2,({Oracle},{Applications})))))&(((({MyTitle}) WITHIN TITLE__31)&(({MyAuthor}) WITHIN AUTHOR__32))*10)*10
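The shape of the combined string can be reproduced by hand. The anonymous block below uses the shorter main expression for [Oracle] to keep the output readable, and the variable names are hypothetical. How Ultra Search joins the return values internally is not shown in this document, so the wrapping parentheses, the & operator and the nested *10 weights are simply copied from the example.

```sql
-- Worked illustration only; run in SQL*Plus with SET SERVEROUTPUT ON.
DECLARE
  main_part   VARCHAR2(4000) := '((({Oracle}) within TITLE__31)*2,({Oracle}))';
  title_attr  VARCHAR2(200)  := '(({MyTitle}) WITHIN TITLE__31)';
  author_attr VARCHAR2(200)  := '(({MyAuthor}) WITHIN AUTHOR__32)';
  combined    VARCHAR2(4000);
BEGIN
  -- Main expression AND (both attribute clauses), heavily weighted.
  combined := '(' || main_part || ')&((' || title_attr || '&'
              || author_attr || ')*10)*10';
  DBMS_OUTPUT.PUT_LINE(combined);
END;
/
```

Because the attribute clauses are joined to the main expression with &, a document must satisfy every attribute restriction as well as the main query to be returned.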