Oracle® Fusion Middleware Administrator's Guide for Oracle WebCenter Interaction
10g Release 4 (10.3.3.0.0)

Part Number E14107-05

F Default Behavior of Search Service

This appendix describes the default behavior of portal search, including search syntax and text search rules.

It includes the following sections:

About the Different Types of Search

The portal provides banner and advanced search tools for typical and advanced users, respectively. The fundamental search syntax and behavior are the same in banner and advanced search, but banner search adds automatic broadening, ranking features, and syntax correction. The following list gives, for each area of the portal that offers search, the search type it implements and a description.

  • Banner search (search type: Banner): Searches the following portal objects: banner fields, the Knowledge Directory, portlets, communities, users, Oracle WebCenter Collaboration items, and Publisher items.

  • Advanced search (search type: Advanced): Allows composition of complex queries on specific document or object properties. Allows searches on date fields as well as text fields, and restriction to a specific object type. Advanced search also enables searching of all (or any combination of) indexable portal objects, including many that are not searched in banner search, such as content crawlers, jobs, and Web services.

  • Federated Search (search type: n/a): Federated search enables you to query multiple search Web services and receive collated results. Portal search can be included as one of the search services. The portal search option from the federated search page behaves similarly to banner search, except that only documents in the Knowledge Directory are searched. Spell correction, Best Bets, and other customizations made with the Search Results Manager do not apply.

  • Object selection (search type: Banner): Enables end users to search for portlets when adding portlets to pages, or for communities when joining communities.

  • Administrative object search (search type: Banner): Administrators can search the Administrative Objects Directory, optionally filtering by folder and object type. Search for specific kinds of portal objects is also integrated into the creation of various kinds of administrative objects. For instance, when creating a remote content crawler, the administrator is presented with the option of searching the available content source objects.

  • Filters (search type: Advanced): Filters enable you to create an advanced search query that documents must match to be allowed into a particular folder in the Knowledge Directory.

  • Snapshot Query (search type: Advanced): A search query that enables you to specify conditions for searching portal objects and, optionally, display the results in a Content Snapshot Portlet and/or e-mail the results to users. You can limit your search by language, object type, folder, property, and text conditions.


Elements of Search Syntax

There are several syntax elements that work together in search.

About Operator Modes

The Search Service parses queries to determine which operator modes to use for the query.

Bag of Words Mode

If the query does not include any search operators (+/-, AND, OR, NEAR, and so on), the Search Service parses the query in Bag of Words mode. Each word in the query must be present in all of the search results; the Boolean AND operator is implicit.

Query Operators Mode

If the query includes query operators, the Search Service parses the query in Query Operators mode.

The query operators AND, OR, NOT, and NEAR are recognized without any special marking (for example, cat AND dog), but all other operators must be surrounded by angle brackets (for example, <WORD>) to be recognized as having special meaning.

A query that contains three or more terms and an operator is parsed as if the terms on each side of the operator were quoted phrases.

Example: Search Service and Notification

This query is parsed as: "Search Service" AND Notification
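The implicit phrase rule can be illustrated with a short Python sketch. It is a toy model, not the Search Service's parser: the function group_implicit_phrases is hypothetical, it assumes the query is already known to be in Query Operators mode, and it treats only the bare operators AND, OR, NOT, and NEAR (in any case) as separators.

```python
# A toy model of implicit phrase construction in Query Operators mode.
# Assumption: only the bare operators AND, OR, NOT, and NEAR split the query,
# matched case-insensitively. This is not the Search Service's real parser.

BARE_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def group_implicit_phrases(query: str) -> str:
    """Quote every run of two or more terms found between bare operators."""
    parts: list[str] = []
    run: list[str] = []

    def flush() -> None:
        # A multi-word run becomes a quoted phrase; a single word stays bare.
        if len(run) > 1:
            parts.append('"' + " ".join(run) + '"')
        elif run:
            parts.append(run[0])
        run.clear()

    for word in query.split():
        if word.upper() in BARE_OPERATORS:
            flush()
            parts.append(word.upper())
        else:
            run.append(word)
    flush()
    return " ".join(parts)


print(group_implicit_phrases("Search Service and Notification"))
# prints: "Search Service" AND Notification
```

Under the same assumptions, The young and the restless becomes "The young" AND "the restless", which is why a title like this is better entered as a quoted phrase.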

Search operators are localized for the following European languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, and Spanish. If you put angle brackets around the operators, the English versions are also recognized. For example, in the Spanish locale, the following queries are equivalent: perro Y gato, perro <AND> gato, and perro gato. However, perro AND gato is not equivalent in the Spanish locale, because AND is not surrounded by angle brackets.

Anything enclosed in angle brackets but not recognized as one of the supported operators is ignored.
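The following Python sketch models how localized and bracketed operators might be normalized before a query is parsed. The tables and the function name are hypothetical and deliberately incomplete: they contain only the localized spellings quoted in this appendix (Spanish Y and German UND for AND), plus the English operator names, which serve as the fallback for locales without localized operators.

```python
import re

# Hypothetical, deliberately incomplete tables of operators recognized without
# angle brackets. Only the localized spellings quoted in this appendix appear.
BARE_OPERATORS_BY_LOCALE = {
    "en": {"AND": "AND", "OR": "OR", "NOT": "NOT", "NEAR": "NEAR"},
    "es": {"Y": "AND"},
    "de": {"UND": "AND"},
}

# English operator names in angle brackets are recognized in every locale.
BRACKETED_OPERATORS = {
    "AND": "AND", "ALL": "AND", "OR": "OR", "ANY": "OR", "ACCRUE": "OR",
    "NOT": "NOT", "NEAR": "NEAR", "ORDER": "ORDER", "WORD": "WORD",
    "PHRASE": "PHRASE", "SENTENCE": "SENTENCE", "PARAGRAPH": "PARAGRAPH",
}


def normalize_operators(query: str, locale: str) -> list[str]:
    """Return lower-cased terms and canonical operator names, in query order."""
    # Locales without localized operators fall back to the English ones.
    bare = BARE_OPERATORS_BY_LOCALE.get(locale, BARE_OPERATORS_BY_LOCALE["en"])
    tokens: list[str] = []
    for word in query.split():  # assumes whitespace-separated tokens
        bracketed = re.fullmatch(r"<([^>/]*)(/\d+)?>", word)
        if bracketed:
            name = bracketed.group(1).upper()
            if name in BRACKETED_OPERATORS:
                # Keep an optional distance suffix, as in <NEAR/10>.
                tokens.append(BRACKETED_OPERATORS[name] + (bracketed.group(2) or ""))
            # Anything else in angle brackets is ignored.
        elif word.upper() in bare:
            tokens.append(bare[word.upper()])
        else:
            tokens.append(word.lower())
    return tokens
```

In this sketch, perro gato in the Spanish locale normalizes to ["perro", "gato"], which Bag of Words mode also treats as an implicit AND, while perro AND gato keeps and as an ordinary search term.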

Internet Style Mode

If the query includes operators common to internet search engines such as AltaVista and Google, the Search Service parses the search in Internet Style mode. All terms preceded by a plus (+) are required. All terms preceded by a minus (-) are excluded. If at least one term is preceded by a +, then any “plain” terms not preceded by a + or - are used to boost ranking of results, but are not required. For example, consider the following query: +dog -cat bird

This query returns documents that contain dog but do not contain cat, and ranks documents with both dog and bird highest. Compare this to a similar query: bird -cat

This query returns documents that contain bird but do not contain cat. Absent any + terms, the plain term bird is treated as a required term.
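A minimal Python sketch of the Internet Style rules follows. It assumes each document is represented as a set of already-normalized terms; stemming and the service's real ranking are not modeled, and the function name is hypothetical.

```python
# Toy model of Internet Style mode (not the Search Service's implementation).
# Each document is a set of already-normalized terms; stemming is ignored.


def internet_style_search(query: str, docs: dict[str, set[str]]) -> list[str]:
    """Return ids of matching documents, best-ranked first."""
    words = query.split()
    required = [w[1:] for w in words if w.startswith("+")]
    excluded = [w[1:] for w in words if w.startswith("-")]
    plain = [w for w in words if not w.startswith(("+", "-"))]
    if not required:
        # Without any + terms, plain terms are required (as in "bird -cat").
        required, plain = plain, []
    hits = []
    for doc_id, terms in docs.items():
        if all(t in terms for t in required) and not any(t in terms for t in excluded):
            boost = sum(t in terms for t in plain)  # plain terms only raise rank
            hits.append((-boost, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


docs = {"d1": {"dog"}, "d2": {"dog", "bird"}, "d3": {"dog", "cat"}, "d4": {"bird"}}
print(internet_style_search("+dog -cat bird", docs))  # ['d2', 'd1']
print(internet_style_search("bird -cat", docs))       # ['d2', 'd4']
```

Because the sketch requires plain terms whenever no + term is present, a query with no operators at all behaves like Bag of Words mode, where every term is required.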

Search String Operators

Each entry lists the operator, any alternative forms, its meaning, and, where available, an example query with example results.

  • <AND> (alternatives: AND, & (ampersand)): Connects two terms that must both be included in each item returned. Example: holiday <AND> schedule returns Holiday Schedule.

  • <OR> (alternatives: OR, ACCRUE, ANY, | (vertical bar), and , (comma)): Connects two terms where at least one must be included in each item returned. Example: holiday <OR> vacation returns Holiday Schedule, Christmas Holiday Party, and Scheduling Vacation.

  • <NOT> (alternatives: NOT, AND NOT): The term must not appear in items returned. Example: holiday <NOT> vacation returns Holiday Schedule and Christmas Holiday Party.

  • <NEAR/N> (alternative: NEAR): The terms must appear within N words of each other, regardless of order, in items returned. Example: early <NEAR/10> retirement returns Plan early for your retirement.

  • <ORDER>: Both terms must appear in items returned, and the first term must precede the second term. Example: song <ORDER> bird returns song bird (not bird song).

  • <WORD>: Turns off stemming, alternate case, and spell correction.

  • <PHRASE> (alternative: surround the terms in " (double quotes)): The terms must appear sequentially, as a phrase, in items returned.

  • <SENTENCE>: Same as <NEAR/10>.

  • <PARAGRAPH>: Same as <NEAR/50>.

  • + (plus): The term must appear in the items returned.

  • - (minus): The term must not appear in the items returned.

  • * (asterisk): The wildcard matches 0 or more characters at the beginning or end of a word. Example: sub* returns subdirectory, subject, and subjective.

  • > (right angle bracket): The top best bet operator brings the user directly to the top best bet for a term, such as a community, document, or portlet. Example: >HR navigates you to the HR Community.
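The positional operators can be sketched as simple checks over a document given as a list of tokens. This is an illustration under stated assumptions, not the service's matcher: stemming is ignored, the function names are hypothetical, and "within N words" is taken to mean that the word positions differ by at most N.

```python
# Toy evaluators for the positional operators over a document given as a list
# of normalized tokens. Stemming is ignored, and "within N words" is assumed to
# mean a difference of at most N between word positions.


def positions(tokens: list[str], term: str) -> list[int]:
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(tokens: list[str], a: str, b: str, n: int) -> bool:
    """<NEAR/N>: a and b occur within n words of each other, in either order."""
    return any(abs(i - j) <= n for i in positions(tokens, a) for j in positions(tokens, b))


def sentence(tokens: list[str], a: str, b: str) -> bool:
    return near(tokens, a, b, 10)  # <SENTENCE> is the same as <NEAR/10>


def paragraph(tokens: list[str], a: str, b: str) -> bool:
    return near(tokens, a, b, 50)  # <PARAGRAPH> is the same as <NEAR/50>


def order(tokens: list[str], a: str, b: str) -> bool:
    """<ORDER>: a occurs somewhere before b."""
    return any(i < j for i in positions(tokens, a) for j in positions(tokens, b))


def phrase(tokens: list[str], words: list[str]) -> bool:
    """<PHRASE> or double quotes: the words occur consecutively."""
    k = len(words)
    return any(tokens[i:i + k] == words for i in range(len(tokens) - k + 1))


doc = "plan early for your retirement".split()
print(near(doc, "early", "retirement", 10))        # True
print(order("bird song".split(), "song", "bird"))  # False
```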


There are certain circumstances in which a user can unintentionally invoke a more advanced search mode by inadvertently using operators. Examples include the following queries, each followed by the query it is equivalent to:

  • The young and the restless is equivalent to “the young” <AND> “the restless”

  • File not found is equivalent to file <AND> <NOT> found

  • Error -217439239 is equivalent to Error <AND> <NOT> 217439239


In each of these examples, enclosing the query in double quotes yields the desired effect.

Precedence and Parentheses

The Internet Style mode operators '+' and '-' take precedence over the other search operators. For example, +big dog <order> cat matches all documents that contain the term big, boosting the ranking of any documents that contain either of the terms dog or cat.

Within query operators mode, the operators have the following precedence classes, from greatest to least:

  1. NEAR, ORDER, PHRASE, SENTENCE, PARAGRAPH

  2. NOT

  3. AND

  4. OR

Parentheses can be used to override operator precedence. The following two queries are equivalent (the parentheses do not affect the semantics of the search):

  • a and b near c or d

  • (a and (b near c)) or d

This search matches documents that meet one of two conditions:

  • The document contains the term d

  • The document contains the terms a, b, and c, with b and c in close proximity

On the other hand, the parentheses in the following query override the default operator precedence:

a and b near (c or d)

This search matches documents containing the terms a and b and either c or d, where b is in close proximity to c or d.
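The precedence rules above can be checked with a small precedence-climbing parser. The sketch is hypothetical and limited: it handles only terms, parentheses, and the binary operators in the precedence list, recognizes AND, OR, NOT, and NEAR without brackets, drops the /N distance of <NEAR/N>, and rejects adjacent terms that have no operator between them.

```python
import re

# Precedence classes from the list above; a higher number binds more tightly.
PRECEDENCE = {"NEAR": 4, "ORDER": 4, "PHRASE": 4, "SENTENCE": 4, "PARAGRAPH": 4,
              "NOT": 3, "AND": 2, "OR": 1}
BARE = {"AND", "OR", "NOT", "NEAR"}  # recognized without angle brackets


def tokenize(query: str) -> list[str]:
    """Split a query into terms, parentheses, and operators written as <NAME>."""
    out = []
    for tok in re.findall(r"\(|\)|<[^>]*>|[^\s()<>]+", query):
        if tok.startswith("<"):
            name = tok[1:-1].upper().split("/")[0]  # <NEAR/10> -> NEAR
            if name in PRECEDENCE:
                out.append(f"<{name}>")
            # Unrecognized <...> operators are dropped.
        elif tok.upper() in BARE:
            out.append(f"<{tok.upper()}>")
        else:
            out.append(tok)
    return out


def parse(query: str):
    """Return a nested (operator, left, right) tree; no recovery from bad input."""
    tokens = tokenize(query)
    pos = 0

    def prec(tok: str) -> int:
        return PRECEDENCE[tok[1:-1]] if tok.startswith("<") else 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression(0)
            pos += 1  # skip the matching ")"
            return node
        return tok

    def expression(min_prec: int):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and prec(tokens[pos]) > min_prec:
            op = tokens[pos][1:-1]
            pos += 1
            left = (op, left, expression(PRECEDENCE[op]))  # left-associative
        return left

    tree = expression(0)
    if pos != len(tokens):
        raise ValueError("adjacent terms without an operator are not supported")
    return tree


print(parse("a and b near c or d"))
# ('OR', ('AND', 'a', ('NEAR', 'b', 'c')), 'd')
```

The printed tree matches the reading given above: either d, or a together with b near c.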

Punctuation

Punctuation is treated specially in searches.

The following rules describe the interpretation of punctuation characters.

  • Quotation marks are always interpreted as operators signifying a quoted phrase. It is therefore impossible to search for a quotation mark (there is no escape character, such as a backslash, which would remove the special significance of the quotation marks).

  • All other punctuation loses any special operator significance inside of quotation marks. (The same holds for all operators, such as AND.)

  • Outside of quotation marks, punctuation either has significance as an operator, or it is ignored. The following punctuation has special operator significance outside of quotation marks:

    • Left and right angle brackets (<>) enclose operators, as in <NEAR>

    • Comma (,) is treated as OR

    • Ampersand (&) is treated as AND

    • Vertical bar (|) is treated as OR

    • Plus (+) and minus (-) are interpreted as Internet Style syntax

    • Asterisk (*) is interpreted as a wildcard character

  • Punctuation is always split apart from adjoining alphanumeric characters. For example, an advanced search for bag-of-words matches documents containing the three tokens bag, of, and words. The same punctuation tokenization takes place on text stored in the index, so splitting a query term such as bag-of-words does not prevent the search from matching a document containing the phrase bag-of-words.

  • Underscore is treated as punctuation, meaning you must enclose a term containing an underscore in quotes to get an exact match (for example, "HOST_NAME" matches HOST_NAME, but without the quotes, it also matches HOST NAME).
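A rough Python model of this tokenization follows. It assumes that Python's Unicode-aware regular expressions are a fair stand-in for the service's word breaker for European languages (they are not for Asian languages), and the function names are hypothetical.

```python
import re


def tokenize_text(text: str) -> list[str]:
    """Split text into lower-cased alphanumeric tokens.

    Every other character, including the underscore, acts as a separator.
    """
    # [^\W_] matches a letter or digit: a word character that is not "_".
    return [tok.lower() for tok in re.findall(r"[^\W_]+", text)]


def matches_all_tokens(query: str, document: str) -> bool:
    """Unquoted match: every query token occurs somewhere in the document."""
    doc_tokens = set(tokenize_text(document))  # same tokenizer on both sides
    return all(tok in doc_tokens for tok in tokenize_text(query))


print(tokenize_text("bag-of-words"))  # ['bag', 'of', 'words']
print(tokenize_text("HOST_NAME"))     # ['host', 'name']
```

Applying the same tokenizer to the query and to the indexed text is what lets the unquoted query bag-of-words still match a document containing bag-of-words.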

Note:

  • Terms generated by wildcard expansion are not stemmed.

  • Wildcard expansion is performed internally by replacing each pattern with a limited list of terms that match the pattern before actually executing the query. Very broad wildcard expressions might therefore return a partial list of results.

Case Sensitivity

All searches are case-insensitive, except when the <WORD> operator is used.

Table F-1 Case Sensitivity Examples

Each entry gives a query and the items it matches.

  • Oracle: Items containing Oracle, oracle, or any other case variant.

  • "Search Service": Items containing the phrase Search Service or any other case variant.

  • <WORD> Oracle: Items containing Oracle, but not oracle or ORACLE.


Stemming

Word stemming is applied to all individual terms in the search query, except within quoted phrases, or when the <WORD> operator is used. The stemming of query terms means that a query term will match documents containing morphological variants of that term. For example, a search for dogs AND go would match a document containing the terms dog and went. (This example applies to English; stemming employs language-specific information and depends on the user's locale and the language used to index the document.)

Note:

  • Terms generated by wildcard expansion are not stemmed.

  • Stemming is not applied to terms within a quoted phrase.
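The interaction of case folding, stemming, and the <WORD> operator can be sketched as a comparison between one query term and one indexed token. The stem table is a hypothetical stand-in for the locale-specific stemmer, and the function names are illustrative only.

```python
# Hypothetical stem table standing in for the language-specific stemmer; a
# real stemmer derives these forms from linguistic rules for the locale.
TOY_STEMS = {"dogs": "dog", "went": "go", "goes": "go", "gone": "go"}


def stem(word: str) -> str:
    lowered = word.lower()  # ordinary searches ignore case
    return TOY_STEMS.get(lowered, lowered)


def term_matches(query_term: str, doc_token: str, word_operator: bool = False) -> bool:
    """Compare one query term with one indexed token."""
    if word_operator:
        # <WORD>: exact, case-sensitive comparison with no stemming.
        return query_term == doc_token
    return stem(query_term) == stem(doc_token)
```

With this table, dogs matches dog and go matches went, mirroring the English example above, while <WORD> Oracle matches only the exact token Oracle.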

Wildcards

The wildcard operator (*) is used to search for partial matches (prefixes, suffixes, and substrings) of indexed terms.

Wildcard expansion is performed internally by replacing each pattern with a limited list of terms that match the pattern before actually executing the query. Very broad wildcard expressions might therefore return a partial list of results.

Note:

  • Terms generated by wildcard expansion are not stemmed.

  • Wildcards cannot be used within quoted phrases.

Table F-2 Wildcard Examples

Each entry gives the search type, an example query, and what it matches.

  • Prefix (cat*): Finds all documents containing terms that start with cat, such as caterpillar.

  • Suffix (*cat): Finds all documents with terms that end in cat, such as tomcat.

  • Substring (*cat*): Finds all documents with terms that contain cat, such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not).
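Wildcard expansion can be sketched as a lookup against the index vocabulary. The cap MAX_EXPANSIONS and its value are hypothetical (this appendix says only that the list of expansions is limited), as is the choice to raise an error for mid-string patterns shorter than three characters.

```python
# Hypothetical cap; the appendix says only that expansion uses a "limited list".
MAX_EXPANSIONS = 3


def expand_wildcard(pattern: str, vocabulary: list[str]) -> list[str]:
    """Replace a wildcard pattern with matching indexed terms (not stemmed)."""
    core = pattern.strip("*")
    if not core:
        return []  # a lone * is treated as stray punctuation
    if pattern.startswith("*") and pattern.endswith("*"):
        if len(core) < 3:
            # Assumption: reject short mid-string patterns with an error.
            raise ValueError("mid-string wildcards need at least three characters")
        matches = [t for t in vocabulary if core in t]
    elif pattern.endswith("*"):
        matches = [t for t in vocabulary if t.startswith(core)]
    elif pattern.startswith("*"):
        matches = [t for t in vocabulary if t.endswith(core)]
    else:
        return [pattern]  # no leading or trailing wildcard: use the term as-is
    # Broad patterns are truncated, which is why results can be partial.
    return sorted(matches)[:MAX_EXPANSIONS]
```

The truncation in the last line is the reason a very broad pattern such as a* can return only part of the documents that would otherwise match.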


Quoted Phrases

A quoted phrase in the user search query matches only documents that contain the given sequence of terms. For instance, a search for "big dog" will not match a document that contains the terms big and dog if it does not contain the phrase big dog.

Note:

  • Stemming is not applied to terms within a quoted phrase.

  • Wildcards cannot be used within quoted phrases.

Thesaurus Expansion

Thesaurus expansion allows a term or phrase in a user's search to be replaced with a set of custom related terms before the actual search is performed. This feature improves search quality by handling unique, obscure, or industry-specific terminology.

Thesaurus expansion has the following characteristics:

  • It is applied to each term in a banner search.

  • It is applied in all three search modes (Internet Style, Query Operators, and Bag of Words).

  • It is not applied to quoted phrases.

  • If a term is expanded by a thesaurus entry, then it is not eligible for automatic spelling correction.

  • Unlike automatic spell correction, which is applied only as a fallback when the non-corrected terms do not match any documents, thesaurus expansion is always applied to all individual search terms.
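The rules above can be summarized in a small Python sketch. The thesaurus entry for pto is hypothetical, as are the names QueryPart and apply_thesaurus; whether quoted phrases are eligible for spell correction is not stated here, so the sketch simply leaves them out.

```python
from dataclasses import dataclass


@dataclass
class QueryPart:
    text: str
    quoted: bool = False


def apply_thesaurus(parts: list[QueryPart], thesaurus: dict[str, list[str]]):
    """Return (alternatives, spell_correctable) for each part of the query."""
    result = []
    for part in parts:
        # Quoted phrases are never expanded.
        entry = None if part.quoted else thesaurus.get(part.text.lower())
        if entry:
            # Expansion is always applied, not only as a fallback, and an
            # expanded term is not eligible for automatic spell correction.
            result.append((entry, False))
        else:
            # Assumption: this sketch also keeps quoted phrases out of spell correction.
            result.append(([part.text], not part.quoted))
    return result


thesaurus = {"pto": ["pto", "paid time off", "vacation"]}  # hypothetical entry
print(apply_thesaurus([QueryPart("PTO"), QueryPart("policy")], thesaurus))
```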

How Language Settings Apply to Search

Documents and portal objects are indexed with a language setting that determines how word breaking and stemming are applied. When a user issues a search query, word breaking and stemming are applied according to the user account locale settings. Search results are best when the language used for the search matches the language of the documents being searched. However, searches are normally applied to documents in all languages. Cross-language searches do not benefit from localized stemming and word breaking, but can still return useful results.

The advanced search page offers the ability to restrict searches to a particular language.

  • The user account search preferences give the option of returning only documents that were indexed using the language of the locale.

  • Portal objects can have localized names and descriptions. Banner searches are performed against the default object names and descriptions and the names and descriptions of the locale.

When searching portal content through the Search box in the portal banner, the text of the query is processed using the language setting of the user interface. If the portal interface is German, the query is tokenized and stemmed using German language rules, providing optimal search results for documents indexed using German linguistic rules.

If the search collection contains documents in other languages, you can still retrieve them with a query using the appropriate text (assuming the user interface permits entry of the necessary characters). Typing English words into the search box of a portal using a German interface applies German linguistic rules to the query text. Because English stemming is not used, the query is not able to match alternate English word forms; however, English language documents containing the entered words are retrieved.

Although you can enter Asian language text into a European language search box (if a compatible character encoding is used), you should limit the text to a single word or manually separate words with white space to be able to match Asian content in the search collection.

The Advanced Search page provides additional functionality for searching in a multi-language document collection. A pop-up list allows the user to select the language to use for query processing. Linguistic rules for tokenizing and stemming the selected language are used when processing the query text.

The query operators recognized by Simple Search and Advanced Search are sensitive to the language setting. For example, the AND operator can be specified as “UND” when the query is processed as German. Localized operators are available for the following languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, and Spanish. All other languages use English operators.

Search Service Language Support

The portal provides support for 61 languages.

Of the languages supported by the portal, the following languages include support for word stemming and compound decomposition. This additional information is used to enhance results of the full-text index.

  • Chinese (Simplified)

  • Chinese (Traditional)

  • Czech

  • Danish

  • Dutch

  • English

  • Finnish

  • French

  • German

  • Greek

  • Hungarian

  • Italian

  • Japanese

  • Korean

  • Norwegian (Bokmal)

  • Norwegian (Nynorsk)

  • Polish

  • Portuguese

  • Russian

  • Spanish

  • Swedish

  • Turkish

The following languages are supported at a reduced level.

  • Afrikaans

  • Albanian

  • Arabic

  • Basque

  • Belarusian

  • Bengali

  • Bulgarian

  • Catalan

  • Cornish

  • Croatian

  • Esperanto

  • Estonian

  • Faeroese

  • Gallegan

  • Hebrew

  • Hindi

  • Icelandic

  • Indonesian

  • Irish

  • Kalaallisut

  • Konkani

  • Latvian

  • Lithuanian

  • Macedonian

  • Maltese

  • Manx

  • Marathi

  • Persian

  • Romanian

  • Serbian

  • Serbian-Croatian

  • Slovak

  • Slovenian

  • Swahili

  • Tamil

  • Telugu

  • Thai

  • Ukrainian

  • Vietnamese

Using Text Search Rules

When you search for text, you generally can just type the text you are looking for (the search string). However, there are a few rules you should be aware of:

Note:

Search strings are case-insensitive (uppercase A is the same as lowercase a), except when the <WORD> operator is used.

Search Examples

The descriptions of searches below do not include any of the query expansion or ranking techniques that are employed in banner search. Except where otherwise noted, all matches are case-insensitive.

Each entry gives a query followed by its expected behavior.

  • Dog: Searches for documents containing any stem variant of Dog.

  • <WORD> Dog: Searches for documents containing Dog exactly as specified, with no stemming or lowercasing. This is the only case-sensitive form of search.

  • Big <PHRASE> Dog: Searches for documents containing the exact phrase big dog, without stemming.

  • “Big Dog”: Same as Big <PHRASE> Dog.

  • cat AND dog: Searches for documents containing stem variants of cat and dog. Equivalent to cat <AND> dog.

  • cat <ALL> dog: Same as cat AND dog.

  • cat OR dog: Searches for documents containing stem variants of cat or dog.

  • cat, dog: Same as cat OR dog.

  • cat <ANY> dog: Same as cat OR dog.

  • cat <ACCRUE> dog: Same as cat OR dog.

  • cat NOT dog: Searches for documents containing stem variants of cat but not containing stem variants of dog.

  • cat AND NOT dog: Same as cat NOT dog.

  • cat NEAR dog: Finds stem variants of cat occurring near dog (by default, within 25 words).

  • cat NEAR/15 dog: Finds stem variants of cat within 15 words of dog.

  • cat <ORDER><NEAR/15> dog: Finds stem variants of cat within 15 words before dog. The more convenient syntax cat <ORDER NEAR/15> dog can also be used.

  • cat <ORDER> dog: Finds stem variants of cat anywhere before dog.

  • cat <SENTENCE> dog: Finds stem variants of cat within 10 words of dog.

  • cat <PARAGRAPH> dog: Finds stem variants of cat within 50 words of dog.

  • cat <XYZ> dog: Finds stem variants of cat and dog. The unsupported operator XYZ is ignored.

  • cat*: Finds all documents containing terms that start with cat, such as caterpillar.

  • *cat: Finds all documents with terms that end in cat, such as tomcat.

  • *cat*: Finds all documents with terms that contain cat, such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not).

  • dog *: Finds documents containing stem variants of dog. The singleton wildcard is treated as stray punctuation.

  • dog cat bird: Finds documents containing stem variants of all three terms, dog, cat, and bird (Bag of Words mode).

  • big dog AND bird: Finds documents containing the phrase big dog and stem variants of the term bird (Query Operators mode with implicit phrase construction).

  • dog cat +bird: Finds documents containing stem variants of bird. The rank is boosted for documents containing stem variants of dog or cat. The words dog and cat are not joined into a phrase in Internet Style mode.

  • +dog -cat bird: Finds documents that contain stem variants of dog but do not contain stem variants of cat, and ranks documents with both dog and bird highest.

  • bird -cat: Finds documents that contain stem variants of bird but do not contain stem variants of cat.

  • bag-of-words: Searches for documents containing stem variants of the three terms bag, of, and words. Punctuation marks are treated as spaces when quotation marks are not present.

  • “Mr. Jones”: Searches for the phrase mr. jones. Punctuation marks are considered part of the search string if they are included within quoted phrases.


How Search Results Are Ranked

Search results are ranked according to relevance, by default. There are several factors that determine relevance.

How Term Frequency Factors in Relevance

The number of times a query term (or its stemmed and case variant forms) appears in a searchable item has a large influence on the relevance ranking of the item. All other things being equal, items which contain more instances of a query term will rank higher than items containing fewer instances. This is known as term-frequency-based ranking.

About Metadata (Field) Weighting

Banner searches are performed across several document fields, and some fields are weighted higher than other fields, so that, for instance, a match on an object name ranks higher than a match on an object description. By default, the fields searched are name, description, and full-text content.
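Term frequency and field weighting can be combined into a toy score. This is not the service's ranking formula: the weights are hypothetical (only the ordering of name above description comes from this appendix), the function name is illustrative, and tokens are compared without stemming.

```python
# Hypothetical weights: the appendix states only that a name match outranks a
# description match; the values and the content weight are assumptions.
FIELD_WEIGHTS = {"name": 3.0, "description": 2.0, "content": 1.0}


def relevance(item: dict[str, str], terms: list[str]) -> float:
    """Weighted term-frequency score over the default banner-search fields."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = item.get(field, "").lower().split()  # no stemming in this toy
        for term in terms:
            score += weight * tokens.count(term.lower())  # term frequency
    return score


print(relevance({"name": "Holiday Schedule"}, ["holiday"]))  # 3.0
```

Both effects described above appear in the sketch: more occurrences of a term raise the score, and an occurrence in the name counts for more than one in the description.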

How Phrases and Proximity Factor in Relevance

In banner search, Bag of Words mode employs special relevancy ranking features which emphasize phrase and proximity matches with the search phrase, even though the user did not employ quotes or proximity operators.

The search phrase terms are used to generate three queries:

  1. All words joined together as a single phrase

  2. Stem variants of all words and all quoted phrases <ORDER><NEAR> each other

  3. Stem variants of all words and all quoted phrases joined together with AND

The three queries are combined with the OR operator into a single query, and the relevance ranking is designed to ensure that results from query 1 always rank above results from query 2, which in turn rank above results from query 3.

For example, if you enter "san francisco" hotels, the following queries would be generated:

  • “san francisco hotels”

  • “san francisco” <ORDER><NEAR> hotels

  • “san francisco” AND hotels
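The three generated queries can be reproduced with a short sketch. The function name is hypothetical, stem variants are not modeled, and the ranking that keeps results from query 1 above query 2 and query 2 above query 3 is outside its scope.

```python
import re


def banner_queries(user_query: str) -> list[str]:
    """Build the three banner-search queries; ranking is not modeled."""
    # Each unit is either a quoted phrase or a single unquoted word.
    units = [quoted or word for quoted, word in re.findall(r'"([^"]+)"|(\S+)', user_query)]

    def fmt(unit: str) -> str:
        return f'"{unit}"' if " " in unit else unit

    return [
        '"' + " ".join(units) + '"',                    # 1. one phrase
        " <ORDER><NEAR> ".join(fmt(u) for u in units),  # 2. ordered proximity
        " AND ".join(fmt(u) for u in units),            # 3. all terms
    ]


for q in banner_queries('"san francisco" hotels'):
    print(q)
# "san francisco hotels"
# "san francisco" <ORDER><NEAR> hotels
# "san francisco" AND hotels
```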

The search results pages for banner and advanced search allow you to sort the search results by last-modified date, folder, or object type.

About Banner Search Behavior

Banner search adds some special features to increase the chances that a search will return relevant results.

Banner search has several characteristics:

About Advanced Search Behavior

Advanced search behavior is intended to support complex, precise queries. Therefore it generally does not employ the automatic broadening features of banner search, such as broad cross-field searching or automatic spell correction. Stemming, however, is applied in advanced search.

The Text Search portion of advanced search will search across name, description and full text content. Additional property criteria are applied only to the fields specifically selected in each criterion.

User queries that cause syntax errors in Internet Style mode or Query Operators mode will display an error message in the user interface; the search will not fall back to Bag of Words mode.