Administrator Guide


Default Behavior of Search Service

This appendix describes the default behavior of portal searches. This information is available to users through online help. This appendix includes the following sections: Types of Search, Search Syntax, Results Ranking, Basic Search Behavior, and Advanced Search Behavior.

 


Types of Search

The portal provides basic and advanced search tools for typical and advanced users, respectively. The fundamental search syntax and behavior are the same in basic and advanced search, but basic search adds automatic broadening, ranking features, and syntax correction. The following table specifies the search type implemented in the search tools available through different areas of the portal.

Table E-1 Portal Areas
Portal Area | Search Type | Description
Banner search | Basic | Searches the following portal objects: banner fields, the Knowledge Directory, portlets, communities, users, Collaboration items, and Publisher items.
Advanced search | Advanced | Allows composition of complex queries on specific document or object properties. Allows searches on date fields as well as text fields. Allows restriction to specific object type. Advanced search also enables searching of all (or any combination of) indexable portal objects, including many which are not searched in banner search, such as content crawlers, jobs, and Web services.
Federated Search | n/a | Federated search allows you to query multiple search Web services and receive collated results. Portal search can be included as one of the search services. The portal search option from this page behaves similarly to basic search, except only documents in the Knowledge Directory are searched. Spell correction, Best Bets, and other customizations made with the Search Results Manager do not apply.
Object selection | Basic | Search functionality enables end users to search for portlets when adding portlets to pages or search for communities when joining communities.
Administrative object search | Basic | Administrators can search the Administrative Objects Directory, optionally filtering by folder and object type. Search for specific kinds of portal objects is also integrated into the creation of various kinds of administrative objects. For instance, when creating a remote content crawler, the administrator is presented with the option of searching the available content source objects.
Filters | Advanced | Allows you to create an advanced search query that documents must match to be allowed into a particular folder in the Knowledge Directory.
Snapshot Query | Advanced | A search query that allows you to specify conditions for searching portal objects and, optionally, display the results in a Content Snapshot Portlet and/or e-mail the results to users. You can limit your search by language, object type, folder, property, and text conditions.

 


Search Syntax

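This section describes the expected behavior of supported search syntax. It includes the following topics: Operator Modes, Precedence and Parentheses, Punctuation, Case Sensitivity, Stemming, Wildcards, Quoted Phrases, Thesaurus Expansion, Search Language, and Examples.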

Operator Modes

The Search Service parses each query to determine which of three operator modes to use for the query: Internet Style, Query Operators, or Bag of Words.

The following table summarizes the behavior of operators.

Table E-2 Search Operators
Operator | Meaning | Alternate
<AND> | Boolean operator that connects terms that must both match the items returned. | AND, '&' (ampersand)
<OR> | Boolean operator that connects terms in which either can match the items returned. | OR, ACCRUE, ANY, '|' (vertical bar), ',' (comma)
<NOT> | Items must not match the term. | NOT, AND NOT
<NEAR> | Terms must occur within N words of each other, regardless of order. | NEAR, <NEAR/25>
<ORDER> | First term must precede the second term. |
<WORD> | Turns off stemming or alternate case, requiring exact spelling. |
<PHRASE> | Terms must appear as sequential terms in a phrase. | Surround terms in " (double quotes)
<SENTENCE> | Same as <NEAR/10>. |
<PARAGRAPH> | Same as <NEAR/50>. |
+ (plus) | Term must appear in the items returned. |
- (minus) | Term must not appear in the items returned. |
* (asterisk) | The wildcard specifies that the result must match 0 or more characters at the beginning or end of a word. |

There are certain circumstances in which a user can unintentionally invoke a more advanced search mode by inadvertently using operators. Examples include the following queries:

Table E-3 Example Queries
Query | Equivalent to...
The young and the restless | "the young" <AND> "the restless"
File not found | file <AND> <NOT> found
Error -217439239 | Error <AND> <NOT> 217439239

In each of these examples, enclosing the query in double quotes yields the desired effect.
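A client application could guard against this by quoting free-text input before it reaches the search box. The following is a minimal sketch, not part of the Search Service: the function names are hypothetical, and the operator list is only a small English sample drawn from Table E-2.

```python
import re

# Assumption: a small sample of operator words from Table E-2; the real
# Search Service recognizes more operators, including localized ones.
OPERATOR_WORDS = {"and", "or", "not", "near"}


def looks_like_operator_query(query: str) -> bool:
    """Return True if plain text would likely be parsed as an operator query."""
    tokens = query.split()
    has_operator_word = any(t.lower() in OPERATOR_WORDS for t in tokens)
    # A token that starts with + or - looks like an Internet Style operator.
    has_plus_minus = any(re.match(r"^[+-]\S", t) for t in tokens)
    return has_operator_word or has_plus_minus


def quote_if_needed(query: str) -> str:
    """Wrap the query in double quotes so it is searched as a literal phrase."""
    already_quoted = len(query) >= 2 and query.startswith('"') and query.endswith('"')
    if already_quoted or not looks_like_operator_query(query):
        return query
    return f'"{query}"'
```

Note that quoting also turns off stemming for the quoted terms, so a helper like this trades recall for predictability.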

Precedence and Parentheses

The Internet Style mode operators '+' and '-' take precedence over the other search operators. For example, +big dog <order> cat matches all documents that contain the term big, and boosts the ranking of any documents that contain any of the three terms dog, order, or cat. Because the query is handled in Internet Style mode, <order> is treated as an ordinary term rather than as an operator.

Within query operators mode, the operators have the following precedence classes, from greatest to least:

Parentheses can be used to override operator precedence. The following two queries are equivalent (the parentheses do not affect the semantics of the search).

This search matches documents that meet one of two conditions:

On the other hand, the parentheses in the following query override the default operator precedence:

a and b near (c or d)

This search matches documents containing the terms a and b and either c or d, where b is in close proximity to c or d.
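The grouping in this query can be made concrete with a small evaluator. This is only a sketch under stated assumptions: matching is case-insensitive with no stemming, "within N words" is read as a position difference of at most N, and the default distance of 25 is the NEAR default listed in the examples table of this appendix. The function names are hypothetical.

```python
from typing import List

DEFAULT_NEAR = 25  # default NEAR distance given in this appendix


def positions(tokens: List[str], term: str) -> List[int]:
    """Word positions at which `term` occurs (case-insensitive, no stemming)."""
    return [i for i, tok in enumerate(tokens) if tok.lower() == term.lower()]


def near(tokens: List[str], left: List[str], right: List[str], n: int = DEFAULT_NEAR) -> bool:
    """True if any `left` term is within n words of any `right` term, in either order."""
    left_pos = [p for t in left for p in positions(tokens, t)]
    right_pos = [p for t in right for p in positions(tokens, t)]
    return any(abs(lp - rp) <= n for lp in left_pos for rp in right_pos)


def matches_a_and_b_near_c_or_d(text: str, n: int = DEFAULT_NEAR) -> bool:
    """Evaluate the query: a AND (b NEAR (c OR d))."""
    tokens = text.split()
    has_a = bool(positions(tokens, "a"))
    return has_a and near(tokens, ["b"], ["c", "d"], n)
```

The term a only has to appear somewhere in the document; the proximity constraint applies to b and the parenthesized group alone.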

Punctuation

Punctuation is treated specially. The following rules describe the interpretation of punctuation characters.

Case Sensitivity

All searches are case-insensitive, except when the <WORD> operator is used.

Table E-4 Case Sensitivity Examples
Query | Matches
BEA | Items containing BEA, bea, or any other case variant.
"Search Service" | Items containing the phrase Search Service or any other case variant.
<WORD> BEA | Items containing BEA, but not bea or Bea.

Stemming

Word stemming is applied to all individual terms in the search query, except within quoted phrases, or when the <WORD> operator is used. The stemming of query terms means that a query term will match documents containing morphological variants of that term. For example, a search for dogs AND go would match a document containing the terms dog and went. (This example applies to English; stemming employs language-specific information and depends on the user's locale and the language used to index the document.)
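The effect of stemming on matching can be illustrated with a toy lookup table. This is a sketch only: the table and function names are hypothetical, and a real stemmer relies on language-specific morphology rather than a hand-written dictionary.

```python
# Hypothetical English stem table, just large enough for the example above.
STEMS = {"dogs": "dog", "went": "go", "goes": "go", "going": "go"}


def stem(term: str) -> str:
    """Map a term to its stem, lowercasing first; unknown terms stem to themselves."""
    lowered = term.lower()
    return STEMS.get(lowered, lowered)


def matches_all(query_terms, document: str) -> bool:
    """True if every query term shares a stem with some word in the document."""
    doc_stems = {stem(word) for word in document.split()}
    return all(stem(term) in doc_stems for term in query_terms)
```

With this table, `matches_all(["dogs", "go"], "The dog went home")` is true, mirroring the dogs AND go example.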

Wildcards

The wildcard operator is used to search for prefixes, suffixes, and substrings of indexed terms. Wildcards cannot be used within quoted phrases.

Table E-5 Wildcard Examples
Search Type | Query | Matches
prefix | cat* | Finds all documents containing terms that start with cat, such as caterpillar.
suffix | *cat | Finds all documents with terms that end in cat, such as tomcat.
substring | *cat* | Finds all documents with terms that contain cat, such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not).

Terms generated by wildcard expansion are not stemmed.

Wildcard expansion is performed internally by replacing each pattern with a limited list of terms that match the pattern before actually executing the query. Very broad wildcard expressions might therefore return a partial list of results.
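The expansion step described above can be sketched as follows. The cap of three terms, the vocabulary, and the function names are assumptions made for illustration; the actual limit used by the Search Service is not stated here. The sketch uses Python's `fnmatch.fnmatchcase`, which also gives `?` and `[...]` special meaning, so it only approximates the portal's wildcard syntax.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

MAX_EXPANSION = 3  # hypothetical cap; the real limit is not documented here


def validate_pattern(pattern: str) -> None:
    """Reject substring patterns (*abc*) with fewer than three literal characters."""
    is_substring = pattern.startswith("*") and pattern.endswith("*")
    if is_substring and len(pattern.strip("*")) < 3:
        raise ValueError(f"substring wildcard needs at least three characters: {pattern}")


def expand_wildcard(pattern: str, vocabulary: Iterable[str], limit: int = MAX_EXPANSION) -> List[str]:
    """Replace a wildcard pattern with at most `limit` matching indexed terms."""
    validate_pattern(pattern)
    matches = sorted(t for t in vocabulary if fnmatchcase(t.lower(), pattern.lower()))
    # A broad pattern is truncated here, which is why results can be partial.
    return matches[:limit]
```

Because the expanded terms are taken directly from the index vocabulary, they are used as-is, consistent with the rule that wildcard-generated terms are not stemmed.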

Quoted Phrases

A quoted phrase in the user search query matches only documents that contain the given sequence of terms. For instance, a search for "big dog" will not match a document that contains the terms big and dog if it does not contain the phrase big dog. Stemming is not applied to terms within a quoted phrase. Also, wildcards cannot be used within quoted phrases.
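Phrase matching amounts to finding the query terms as a consecutive run of words. The sketch below assumes whitespace word breaking and case-insensitive comparison; the function names are hypothetical and the real word breaker is language-specific.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a simplification of real word breaking)."""
    return text.lower().split()


def contains_phrase(document: str, phrase: str) -> bool:
    """True if the phrase's terms occur consecutively, in order, in the document."""
    doc, terms = tokenize(document), tokenize(phrase)
    if not terms:
        return False
    width = len(terms)
    # Slide a window of the phrase's length across the document.
    return any(doc[i:i + width] == terms for i in range(len(doc) - width + 1))
```

Since no stemming is applied, a document containing only big dogs does not match the phrase big dog in this sketch.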

Thesaurus Expansion

If thesaurus expansion is enabled, it is applied to the individual terms in a basic search in all three search modes (Internet Style, Query Operators, and Bag of Words). Thesaurus expansion is not applied to quoted phrases. A term that is expanded by a thesaurus entry is not eligible for automatic spelling correction.

Automatic spell correction is applied only as a fallback, when the uncorrected terms do not match any documents. Thesaurus expansion, by contrast, is always applied to every individual search term whenever it is enabled.
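The interaction between the two features can be sketched per term. Everything in this example is hypothetical (the thesaurus, the spelling table, the set of indexed terms, and the function name), and "matches no documents" is simplified to "the term is not in the index".

```python
from typing import Dict, List, Set

# Hypothetical data for illustration only.
THESAURUS: Dict[str, List[str]] = {"car": ["automobile"]}
SPELLING: Dict[str, str] = {"hotle": "hotel"}
INDEXED_TERMS: Set[str] = {"automobile", "hotel", "dog"}


def expand_term(term: str) -> List[str]:
    """Return the alternative terms to search for one query term."""
    if term in THESAURUS:
        # Thesaurus expansion is always applied when enabled, and an
        # expanded term is not eligible for spelling correction.
        return [term] + THESAURUS[term]
    if term not in INDEXED_TERMS and term in SPELLING:
        # Spelling correction is only a fallback for terms that match nothing.
        return [SPELLING[term]]
    return [term]
```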

Search Language

Documents and portal objects are indexed with a language setting that determines how word breaking and stemming are applied. When a user issues a search query, word breaking and stemming are applied according to the user account locale settings. Search results are best when the language used for the search matches the language of the documents being searched. However, searches are normally applied to documents in all languages. Cross-language searches do not benefit from localized stemming and word breaking, but can still return useful results.

The advanced search page offers the ability to restrict searches to a particular language.

When searching portal content via the Search box in the portal banner, the text of the query is processed using the language setting of the user interface. If the portal interface is German, the query is tokenized and stemmed using German language rules, providing optimal search results for documents indexed using German linguistic rules.

If the search collection contains documents in other languages, you can still retrieve them with a query using the appropriate text (assuming the user interface permits entry of the necessary characters). Typing English words into the search box of a portal using a German interface applies German linguistic rules to the query text. Because English stemming is not used, the query is not able to match alternate English word forms; however, English language documents containing the entered words are retrieved.

Although you can enter Asian language text into a European language search box (if a compatible character encoding is used), you should limit the text to a single word or manually separate words with white space to be able to match Asian content in the search collection.

The Advanced Search page provides additional functionality for searching in a multi-language document collection. A pop-up list allows the user to select the language to use for query processing. Linguistic rules for tokenizing and stemming the selected language are used when processing the query text. Among other things, this means that Asian text can be entered without unnecessary white space.

The query operators recognized by Simple Search and Advanced Search are sensitive to the language setting. For example, the AND operator can be specified as "UND" when the query is processed as German. Localized operators are available for the following languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian (Bokmål), Norwegian (Nynorsk), Portuguese, and Spanish. All other languages use English operators.

Examples

The descriptions of searches below do not include any of the query expansion or ranking techniques that are employed in basic search. Except where otherwise noted, all matches are case-insensitive.

Table E-6 Query Examples
Query | Expected Behavior
Dog | Searches for documents containing any stem variant of Dog.
<WORD> Dog | Searches for documents containing Dog as specified exactly with no stemming or lowercasing. This is the only case-sensitive form of search.
Big <PHRASE> Dog | Searches for documents containing the exact phrase big dog without stemming.
"Big Dog" | Same as Big <PHRASE> Dog.
cat AND dog | Searches for documents containing stem variants of cat and dog. Equivalent to cat <AND> dog.
cat <ALL> dog | Same as cat AND dog.
cat OR dog | Searches for documents containing stem variants of cat or dog.
cat, dog | Same as cat OR dog.
cat <ANY> dog | Same as cat OR dog.
cat <ACCRUE> dog | Same as cat OR dog.
cat NOT dog | Searches for documents containing stem variants of cat but not containing stem variants of dog.
cat AND NOT dog | Same as cat NOT dog.
cat NEAR dog | Finds stem variants of cat occurring near dog (default is within 25 words).
cat NEAR/15 dog | Finds stem variants of cat within 15 words of dog.
cat <ORDER><NEAR/15> dog | Finds stem variants of cat within 15 words before dog. Can also use more convenient syntax cat <ORDER NEAR/15> dog.
cat <ORDER> dog | Finds stem variants of cat anywhere before dog.
cat <SENTENCE> dog | Finds stem variants of cat within 10 words of dog.
cat <PARAGRAPH> dog | Finds stem variants of cat within 50 words of dog.
cat <XYZ> dog | Finds stem variants of cat and dog. The unsupported operator XYZ is ignored.
cat* | Finds all documents containing terms that start with cat, such as caterpillar.
*cat | Finds all documents with terms that end in cat such as tomcat.
*cat* | Finds all documents with terms that contain cat such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not).
dog * | Finds documents containing stem variants of dog. The singleton wildcard is treated as stray punctuation.
dog cat bird | Finds documents containing stem variants of all three terms, dog, cat, and bird. (Bag of Words mode)
big dog AND bird | Finds documents containing the phrase big dog, and stem variants of the term bird. (Query Operators mode with implicit phrase construction)
dog cat +bird | Finds documents containing stem variants of bird. The rank is boosted for documents containing stem variants of dog or cat. The words dog and cat are not joined into a phrase in Internet Style mode.
+dog -cat bird | Finds documents that contain stem variants of dog but do not contain stem variants of cat, and ranks documents with both dog and bird highest.
bird -cat | Finds documents that contain stem variants of bird but do not contain stem variants of cat.
bag-of-words | Searches for documents containing stem variants of the three terms: bag, of, and words. Punctuation marks are treated as spaces when quotation marks are not present.
"Mr. Jones" | Searches for the phrase mr. jones. Punctuation marks are considered part of the search string if they are included within quoted phrases.
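The last two examples (bag-of-words and "Mr. Jones") can be illustrated with a small query splitter. This is a sketch, not the Search Service tokenizer: the function name is hypothetical, and it deliberately ignores operator characters such as +, -, and *, treating all unquoted punctuation as spaces.

```python
import re
from typing import List


def split_query(query: str) -> List[str]:
    """Split a query into quoted phrases (kept verbatim) and bare terms."""
    parts: List[str] = []
    # Each match is either a quoted phrase or a run of unquoted text.
    for quoted, unquoted in re.findall(r'"([^"]*)"|([^"]+)', query):
        if quoted:
            # Punctuation inside quotes stays part of the search string.
            parts.append(quoted.lower())
        else:
            # Outside quotes, every punctuation mark is treated as a space.
            parts.extend(re.sub(r"[^\w\s]", " ", unquoted).lower().split())
    return parts
```

For instance, `split_query("bag-of-words")` yields three separate terms, while `split_query('"Mr. Jones"')` yields the single phrase `mr. jones` with its period intact.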

 


Results Ranking

Search results are ranked according to relevance, by default. The following topics in this section describe the factors that determine relevance:

Term Frequency

The number of times a query term (or its stemmed and case variant forms) appears in a searchable item has a large influence on the relevance ranking of the item. All other things being equal, items which contain more instances of a query term will rank higher than items containing fewer instances. This is known as term frequency based ranking.
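Term frequency based ranking can be sketched as a simple occurrence count. This is an illustration only: it ignores stemming and every other relevance factor, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def term_frequency_score(document: str, query_terms: List[str]) -> int:
    """Count how often the query terms occur in the document (case-insensitive)."""
    counts = Counter(document.lower().split())
    return sum(counts[term.lower()] for term in query_terms)


def rank(documents: Dict[str, str], query_terms: List[str]) -> List[str]:
    """Return document names ordered from most to fewest query-term occurrences."""
    return sorted(
        documents,
        key=lambda name: term_frequency_score(documents[name], query_terms),
        reverse=True,
    )
```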

Metadata (field) Weighting

Basic searches are performed across several document fields, and some fields are weighted higher than other fields, so that, for instance, a match on an object name ranks higher than a match on an object description. By default, the fields searched are name, description, and full-text content. For information on modifying or adding field weights, see Modifying the Properties Searched and the Relevance Weight for Properties.
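Field weighting can be pictured as scaling each field's match count by a per-field weight. The weights below are invented for illustration: the source states only that a name match ranks higher than a description match, and the relative weight of full-text content is an assumption. The function name is hypothetical.

```python
from typing import Dict

# Hypothetical weights; only "name outranks description" comes from the text.
FIELD_WEIGHTS: Dict[str, float] = {"name": 3.0, "description": 2.0, "content": 1.0}


def weighted_score(item: Dict[str, str], term: str) -> float:
    """Sum per-field occurrence counts of `term`, scaled by each field's weight."""
    term = term.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        occurrences = item.get(field, "").lower().split().count(term)
        score += weight * occurrences
    return score
```

Under these weights, an object named dog scores higher for the query dog than an object that mentions dog only in its description.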

Phrases and Proximity

In basic search, Bag of Words mode employs special relevancy ranking features which emphasize phrase and proximity matches with the search phrase, even though the user did not employ quotes or proximity operators.

The search phrase terms are used to generate three queries:

The three queries are combined with the OR operator into a single query, and the relevance ranking is designed to ensure that results from group #1 always rank above results from group #2, which in turn rank above results from group #3.

For example, if you enter "san francisco" hotels, the following queries would be generated:

The search results pages for banner and advanced search allow you to sort the search results by last-modified date, folder, or object type.

 


Basic Search Behavior

Basic search adds some special features in order to increase the chances that a search will return relevant results. As noted in the previous section, term proximity can boost the relevancy ranking in basic search. Automatic spelling correction is also applied only in basic search.

In basic search, if a user's query causes a syntax error in Internet Style mode or Query Operators mode, it is automatically retried in Bag of Words mode to be as forgiving as possible of user error. For example, the query dog and causes a syntax error in Query Operators mode because the and operator is missing its right-hand operand. The query is then passed to Bag of Words mode, which attaches no special operator significance to and, and therefore retrieves documents containing both the term dog and the term and.
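The retry behavior can be sketched with a toy parser. This is an assumption-laden illustration: the parser only detects a binary operator with a missing operand, the operator list is a small sample, the choice between modes is not modeled, and all names are hypothetical.

```python
from typing import List, Tuple

BINARY_OPERATORS = {"and", "or", "near"}  # small sample of the supported operators


class QuerySyntaxError(Exception):
    """Raised when an operator is missing one of its operands."""


def parse_query_operators(query: str) -> List[str]:
    """Toy Query Operators parser: rejects a leading or trailing binary operator."""
    tokens = query.split()
    if tokens and (tokens[0].lower() in BINARY_OPERATORS
                   or tokens[-1].lower() in BINARY_OPERATORS):
        raise QuerySyntaxError(f"operator is missing an operand: {query!r}")
    return tokens


def basic_search_parse(query: str) -> Tuple[str, List[str]]:
    """Try Query Operators mode first, then retry as Bag of Words on a syntax error."""
    try:
        return "query operators", parse_query_operators(query)
    except QuerySyntaxError:
        # Bag of Words attaches no operator meaning to any token.
        return "bag of words", query.split()
```

Here `basic_search_parse("dog and")` falls back to Bag of Words and keeps both dog and and as plain terms.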

 


Advanced Search Behavior

Advanced search behavior is intended to support complex, precise queries. Therefore it generally does not employ the automatic broadening features of basic search, such as broad cross-field searching or automatic spell correction. Stemming, however, is applied in advanced search.

The Text Search portion of advanced search searches across name, description, and full-text content. Additional property criteria are applied only to the fields specifically selected in each criterion.

User queries that cause syntax errors in Internet Style mode or Query Operators mode will display an error message in the user interface; the search will not fall back to Bag of Words mode.

