Oracle® Fusion Middleware Administrator's Guide for Oracle WebCenter Interaction 10g Release 3 (10.3.0.1) Part Number E14107-02
This appendix describes the default behavior of the portal searches.
The portal provides basic and advanced search tools for typical and advanced users, respectively. The fundamental search syntax and behavior are the same in basic and advanced search, but basic search adds automatic broadening, ranking features, and syntax correction. The following table specifies the search type implemented in the search tools available through different areas of the portal.
Portal Area | Search Type | Description |
---|---|---|
Banner search | Basic | Searches the following portal objects: banner fields, the Knowledge Directory, portlets, communities, users, Oracle WebCenter Collaboration items, and Publisher items. |
Advanced search | Advanced | Allows composition of complex queries on specific document or object properties. Allows searches on date fields as well as text fields. Allows restriction to specific object type. Advanced search also enables searching of all (or any combination of) indexable portal objects, including many of which are not searched in banner search, such as content crawlers, jobs, and web services. |
Federated Search | n/a | Federated search allows you to query multiple search web services and receive collated results. Portal search can be included as one of the search services. The portal search option from this page behaves similarly to basic search, except only documents in the Knowledge Directory are searched. Spell correction, Best Bets, and other customizations made with the Search Results Manager do not apply. |
Object selection | Basic | Search functionality enables end users to search for portlets when adding portlets to pages or search for communities when joining communities. |
Administrative object search | Basic | Administrators can search the Administrative Objects Directory, optionally filtering by folder and object type. Search for specific kinds of portal objects is also integrated into the creation of various kinds of administrative objects. For instance, when creating a remote content crawler, the administrator is presented with the option of searching the available content source objects. |
Filters | Advanced | Allows you to create an advanced search query that documents must match to be allowed into a particular folder in the Knowledge Directory. |
Snapshot Query | Advanced | A search query that allows you to specify conditions for searching portal objects and, optionally, display the results in a Content Snapshot Portlet and/or e-mail the results to users. You can limit your search by language, object type, folder, property, and text conditions. |
There are several syntax elements that work together in search.
The Search Service parses queries to determine which operator modes to use for the query.
If the query does not include any search operators (+/-, AND, OR, NEAR, etc.), the Search Service parses the query in Bag of Words mode. Each word in the query must be present in all of the search results; the Boolean AND operator is implicit.
If the query includes query operators, the Search Service parses the query in Query Operators mode.
Query operators AND, OR, NOT, and NEAR are recognized without any special marking (for example, cat AND dog), but all other operators must be surrounded by angle brackets (for example, <WORD>) to be recognized as having special meaning.
A query that contains three or more terms and an operator is parsed as if the terms on each side of the operator were quoted phrases.
Example: Search Service and Notification
This query is parsed as: "Search Service" AND Notification
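The implicit phrase rule can be sketched as a single pass over the query that quotes each multi-word run of plain terms sitting between operators. This is an illustration under simplifying assumptions, not the Search Service's parser: it splits on whitespace, recognizes only the bare English operators AND, OR, NOT, and NEAR (case-insensitively, consistent with the "The young and the restless" example in this appendix), and the function name `add_implicit_phrases` is hypothetical.

```python
# Hypothetical sketch of implicit phrase construction; not the Search Service's code.
BARE_OPERATORS = {"AND", "OR", "NOT", "NEAR"}

def add_implicit_phrases(query):
    """Quote each run of two or more plain terms that sits between operators."""
    tokens = query.split()
    if not any(t.upper() in BARE_OPERATORS for t in tokens):
        return query  # no operators: Bag of Words mode, nothing to quote

    output = []
    run = []  # plain terms collected since the last operator

    def flush():
        if len(run) > 1:
            output.append('"' + " ".join(run) + '"')
        elif run:
            output.append(run[0])
        run.clear()

    for token in tokens:
        if token.upper() in BARE_OPERATORS:
            flush()
            output.append(token.upper())
        else:
            run.append(token)
    flush()
    return " ".join(output)

print(add_implicit_phrases("Search Service and Notification"))
# "Search Service" AND Notification
```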
Search operators are localized for the following European languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, and Spanish. If you put angle brackets around the operators, the English versions are also recognized. For example, in the Spanish locale, the following queries are equivalent: perro Y gato, perro <AND> gato, and perro gato. However, perro AND gato is not equivalent in the Spanish locale, because AND is not surrounded by angle brackets.
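The lookup rule for localized operators can be modeled as two tables: bracketed English operators that work in every locale, and bare operators that depend on the locale. This is a deliberately partial sketch: only the Spanish Y and German UND spellings of AND come from this guide, the remaining localized words are not filled in, and the table layout and the function `recognize_operator` are hypothetical. Locales without their own table fall back to English, as described for languages without localized operators.

```python
# Hypothetical, deliberately partial operator tables; only the localized AND
# spellings named in this guide (Spanish Y, German UND) are filled in.
BARE_OPERATORS = {
    "en": {"AND": "AND", "OR": "OR", "NOT": "NOT", "NEAR": "NEAR"},
    "es": {"Y": "AND"},
    "de": {"UND": "AND"},
}
BRACKETED_ENGLISH = {"<AND>": "AND", "<OR>": "OR", "<NOT>": "NOT", "<NEAR>": "NEAR"}

def recognize_operator(token, locale):
    """Return the canonical operator for token, or None if it is a plain term."""
    upper = token.upper()
    if upper in BRACKETED_ENGLISH:
        return BRACKETED_ENGLISH[upper]  # bracketed English works in every locale
    # Bare operators use the locale's own words; unlisted locales use English.
    table = BARE_OPERATORS.get(locale, BARE_OPERATORS["en"])
    return table.get(upper)

print(recognize_operator("Y", "es"))    # AND
print(recognize_operator("AND", "es"))  # None
```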
Anything enclosed in angle brackets but not recognized as one of the supported operators is ignored.
If the query includes operators common to internet search engines such as AltaVista and Google, the Search Service parses the search in Internet Style mode. All terms preceded by a plus (+) are required. All terms preceded by a minus (-) are excluded. If at least one term is preceded by a +, then any “plain” terms not preceded by a + or - are used to boost ranking of results, but are not required. For example, consider the following query: +dog -cat bird
This query returns documents that contain dog but do not contain cat, and ranks documents with both dog and bird highest. Compare this to a similar query: bird -cat
This query returns documents that contain bird but do not contain cat. Absent any + terms, the plain term bird is treated as a required term.
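The Internet Style rules can be captured in a few lines. The sketch below is illustrative only: it ignores stemming and the actual ranking formula, represents a document as a set of terms, and uses a hypothetical function name. It returns None when a document is excluded and otherwise a count of matched optional terms that could feed a ranking boost.

```python
# Hypothetical sketch of Internet Style mode semantics over a document's term set.
def internet_style_match(query, doc_terms):
    """Return None if the document is excluded, else a boost count for ranking."""
    required, excluded, plain = [], [], []
    for token in query.split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            plain.append(token)
    # With no "+" terms, the plain terms become required.
    if not required:
        required, plain = plain, []
    if any(t not in doc_terms for t in required):
        return None
    if any(t in doc_terms for t in excluded):
        return None
    # Plain terms only boost ranking when "+" terms are present.
    return sum(1 for t in plain if t in doc_terms)

print(internet_style_match("+dog -cat bird", {"dog", "bird"}))  # 1
print(internet_style_match("bird -cat", {"dog"}))               # None
```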
Operator | Description | Example Search Text | Example Search Results |
---|---|---|---|
<AND> (alternatives: AND, & (ampersand)) | Connects two terms that must both be included in each item returned. | holiday <AND> schedule | Holiday Schedule |
<OR> (alternatives: OR, ACCRUE, ANY, vertical bar (\|), comma (,)) | Connects two terms where at least one must be included in each item returned. | holiday <OR> vacation | Holiday Schedule, Christmas Holiday Party, Scheduling Vacation |
<NOT> (alternatives: NOT, AND NOT) | Term must not appear in items returned. | holiday <NOT> vacation | Holiday Schedule, Christmas Holiday Party |
<NEAR/N> (alternative: NEAR) | Terms must appear within N words of each other, regardless of order, in items returned. | early <NEAR/10> retirement | Plan early for your retirement |
<ORDER> | Both terms must appear in items returned, and the first term must precede the second term. | song <ORDER> bird | song bird (not bird song) |
<WORD> | Turns off stemming, alternate case, and spell correction. | | |
<PHRASE> (alternative: surround terms in double quotes (")) | Both terms must appear sequentially, in a phrase, in items returned. | | |
<SENTENCE> | Same as <NEAR/10>. | | |
<PARAGRAPH> | Same as <NEAR/50>. | | |
+ (plus) | Term must appear in the items returned. | | |
- (minus) | Term must not appear in the items returned. | | |
* (asterisk) | The wildcard matches 0 or more characters at the beginning or end of a word. | sub* | subdirectory, subject, subjective |
> (right angle bracket) | The top best bet operator brings the user directly to the top best bet for a term, such as a community, document, or portlet. | >HR | You are navigated to the HR Community. |
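The proximity rows in this table can be modeled with word positions. The sketch below is a simplified illustration, not the Search Service's implementation: it assumes a document is a list of already-normalized words, measures "within N words" as the difference between word positions (an assumption), and uses a hypothetical helper named `within`.

```python
# Hypothetical sketch of NEAR/N and ORDER over word positions.
SENTENCE_WINDOW = 10   # <SENTENCE> is documented as <NEAR/10>
PARAGRAPH_WINDOW = 50  # <PARAGRAPH> is documented as <NEAR/50>

def within(words, first, second, n=None, ordered=False):
    """True if first and second occur within n positions (n=None means anywhere).

    With ordered=True, first must precede second, as with <ORDER>.
    """
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    for i in first_positions:
        for j in second_positions:
            if ordered and j <= i:
                continue  # <ORDER> requires first before second
            if n is None or abs(j - i) <= n:
                return True
    return False

doc = "plan early for your retirement".split()
print(within(doc, "early", "retirement", n=10))                  # True
print(within("bird song".split(), "song", "bird", ordered=True))  # False
```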
There are certain circumstances in which a user can unintentionally invoke a more advanced search mode by inadvertently using operators. Examples include the following queries:
Query | Equivalent to... |
---|---|
The young and the restless | “the young” <AND> “the restless” |
File not found | file <AND> <NOT> found |
Error -217439239 | Error <AND> <NOT> 217439239 |
In each of these examples, enclosing the query in double quotes yields the desired effect.
The Internet Style mode operators '+' and '-' take precedence over the other search operators. For example, the query +big dog <order> cat matches all documents that contain the term big, boosting the ranking of any documents that also contain dog or cat.
Within query operators mode, the operators have the following precedence classes, from greatest to least:
NEAR, ORDER, PHRASE, SENTENCE, PARAGRAPH
NOT
AND
OR
Parentheses can be used to override operator precedence. The following two queries are equivalent (the parentheses do not affect the semantics of the search).
a and b near c or d
(a and (b near c)) or d
This search matches documents that meet one of two conditions:
The document contains the term d
The document contains the terms a, b, and c, with b and c in close proximity
On the other hand, the parentheses in the following query override the default operator precedence:
a and b near (c or d)
This search matches documents containing the terms a and b and either c or d, where b is in close proximity to c or d.
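The precedence classes above can be checked with a small precedence-climbing parser that prints the implied grouping. This is a hypothetical sketch, not the portal's parser: it treats every operator as binary and left-associative, accepts the operator names as bare words (the real syntax requires angle brackets for ORDER, PHRASE, SENTENCE, and PARAGRAPH), and does not handle forms such as NEAR/10.

```python
import re

# Higher number binds more tightly, following the precedence classes above.
PRECEDENCE = {"NEAR": 4, "ORDER": 4, "PHRASE": 4, "SENTENCE": 4, "PARAGRAPH": 4,
              "NOT": 3, "AND": 2, "OR": 1}

def tokenize(query):
    return re.findall(r"\(|\)|[^\s()]+", query)

def parse(query):
    """Parse a query into nested (operator, left, right) tuples."""
    tokens = tokenize(query)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = expression(1)
            pos += 1  # skip the closing ")"
            return node
        return token

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens):
            op = tokens[pos].upper()
            prec = PRECEDENCE.get(op)
            if prec is None or prec < min_prec:
                break  # ")" or a looser operator ends this level
            pos += 1
            right = expression(prec + 1)  # left-associative
            left = (op, left, right)
        return left

    return expression(1)

def render(node):
    """Show the grouping with explicit parentheses."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({render(left)} {op} {render(right)})"

print(render(parse("a and b near c or d")))    # ((a AND (b NEAR c)) OR d)
print(render(parse("a and b near (c or d)")))  # (a AND (b NEAR (c OR d)))
```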
Punctuation is treated specially in searches.
The following rules describe the interpretation of punctuation characters.
Quotation marks are always interpreted as operators signifying a quoted phrase. It is therefore impossible to search for a quotation mark (there is no escape character, such as a backslash, which would remove the special significance of the quotation marks).
All other punctuation loses any special operator significance inside of quotation marks. (The same holds for all operators, such as AND.)
Outside of quotation marks, punctuation either has significance as an operator, or it is ignored. The following punctuation has special operator significance outside of quotation marks:
Left and right angle brackets (<>) enclose operators, as in <NEAR>
Comma (,) is treated as OR
Ampersand (&) is treated as AND
Vertical bar (|) is treated as OR
Plus (+) and minus (-) are interpreted as Internet Style syntax
Asterisk (*) is interpreted as a wildcard character
Punctuation is always split apart from adjoining alphanumeric characters. For example, an advanced search for bag-of-words matches documents containing the three tokens bag, of, and words.
Underscore is treated as punctuation. This means you must enclose a term containing an underscore in quotes to get an exact match (for example, "HOST_NAME" matches HOST_NAME, but without the quotes, it also matches HOST NAME).
Symmetrical punctuation tokenization takes place on text stored in the index, so the explosion of a query term such as bag-of-words does not prevent the search from matching a document containing the phrase bag-of-words.
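Symmetrical tokenization means the same splitting function runs on both the query and the indexed text, so the exploded tokens line up. The sketch below assumes that anything other than a letter or digit (including underscore) separates tokens and that matching is case-insensitive; the exact character classes used by the Search Service are not documented here, and both function names are hypothetical. It models only unquoted queries.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric runs; punctuation, including underscore, separates tokens."""
    return [token.lower() for token in re.findall(r"[^\W_]+", text)]

def unquoted_match(query, document_text):
    """Unquoted query: every query token must appear somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(document_text))

print(tokenize("bag-of-words"))                  # ['bag', 'of', 'words']
print(unquoted_match("HOST_NAME", "HOST NAME"))  # True
```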
All searches are case-insensitive, except when the <WORD> operator is used.
Word stemming is applied to all individual terms in the search query, except within quoted phrases or when the <WORD> operator is used. The stemming of query terms means that a query term will match documents containing morphological variants of that term. For example, a search for dogs AND go would match a document containing the terms dog and went. (This example applies to English; stemming employs language-specific information and depends on the user's locale and the language used to index the document.)
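The interaction between case folding, stemming, and <WORD> can be sketched as follows. A real stemmer is language-specific and far more general; here a tiny hypothetical lookup table stands in for one, and the function names are illustrative only.

```python
# Hypothetical sketch: a toy lookup table stands in for a real, language-specific stemmer.
TOY_STEMS = {"dogs": "dog", "went": "go", "goes": "go", "going": "go"}

def normalize(term):
    """Default matching: lowercase, then reduce to a stem."""
    lowered = term.lower()
    return TOY_STEMS.get(lowered, lowered)

def term_matches(query_term, document_term, exact=False):
    """exact=True models <WORD>: no lowercasing and no stemming."""
    if exact:
        return query_term == document_term
    return normalize(query_term) == normalize(document_term)

print(term_matches("go", "went"))              # True
print(term_matches("Dog", "dog", exact=True))  # False
```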
Note:
Terms generated by wildcard expansion are not stemmed.
Stemming is not applied to terms within a quoted phrase.
The wildcard operator (*) is used to search for partial matches (prefixes, suffixes, and substrings) of indexed terms.
Wildcard expansion is performed internally by replacing each pattern with a limited list of terms that match the pattern before actually executing the query. Very broad wildcard expressions might therefore return a partial list of results.
Note:
Terms generated by wildcard expansion are not stemmed.
Wildcards cannot be used within quoted phrases.
Table E-2 Wildcard Examples
Search Type | Query | Matches |
---|---|---|
prefix | cat* | Finds all documents containing terms that start with cat, such as caterpillar. |
suffix | *cat | Finds all documents with terms that end in cat, such as tomcat. |
substring | *cat* | Finds all documents with terms that contain cat, such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not). |
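Wildcard expansion against the index vocabulary, including the capped expansion list that can make broad patterns return partial results, might look like the sketch below. The cap of 3 and the alphabetical choice of which terms survive the cap are hypothetical; the real limit and selection order are not documented here. A lone * is ignored, matching the treatment of a singleton wildcard as stray punctuation.

```python
# Hypothetical sketch of wildcard expansion against an index vocabulary.
MAX_EXPANSIONS = 3  # hypothetical cap; the real limit is not documented here

def matches_pattern(pattern, term):
    core = pattern.strip("*")
    if pattern.startswith("*") and pattern.endswith("*"):
        return core in term          # substring
    if pattern.endswith("*"):
        return term.startswith(core)  # prefix
    if pattern.startswith("*"):
        return term.endswith(core)    # suffix
    return term == pattern

def expand_wildcard(pattern, vocabulary):
    core = pattern.strip("*")
    if not core:
        return []  # a lone * is treated as stray punctuation
    if pattern.startswith("*") and pattern.endswith("*") and len(core) < 3:
        raise ValueError("mid-string wildcard needs at least three characters")
    matches = sorted(t for t in vocabulary if matches_pattern(pattern, t))
    return matches[:MAX_EXPANSIONS]  # broad patterns are truncated

vocab = ["caterpillar", "cat", "tomcat", "tomcats", "dog", "category", "catalog"]
print(expand_wildcard("cat*", vocab))  # ['cat', 'catalog', 'category']
print(expand_wildcard("*cat", vocab))  # ['cat', 'tomcat']
```

Because expansion happens before the query runs, caterpillar is dropped from the cat* expansion above even though it matches the pattern.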
A quoted phrase in the user search query matches only documents that contain the given sequence of terms. For instance, a search for "big dog" will not match a document that contains the terms big and dog if it does not contain the phrase big dog.
Note:
Stemming is not applied to terms within a quoted phrase.
Wildcards cannot be used within quoted phrases.
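Phrase matching reduces to finding the query's word sequence as a contiguous run in the document. This minimal sketch assumes whitespace tokenization (a simplification of the index's real tokenization), compares words case-insensitively, and applies no stemming, so dogs does not satisfy a phrase containing dog. The function name is hypothetical.

```python
def phrase_match(phrase, document_text):
    """True if the phrase's words occur consecutively in the document.

    Words are compared case-insensitively but without stemming.
    """
    terms = phrase.lower().split()
    words = document_text.lower().split()
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))

print(phrase_match("big dog", "a big dog barked"))  # True
print(phrase_match("big dog", "a big brown dog"))   # False
```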
Thesaurus expansion allows a term or phrase in a user's search to be replaced with a set of custom related terms before the actual search is performed. This feature improves search quality by handling unique, obscure, or industry-specific terminology.
Thesaurus expansion has the following characteristics:
It is applied to each term in a basic search.
It is applied in all three search modes (Internet Style, Query Operators, and Bag of Words).
It is not applied to quoted phrases.
If a term is expanded by a thesaurus entry, then it is not eligible for automatic spelling correction.
Unlike automatic spell correction, which is applied only as a fallback when the non-corrected terms do not match any documents, thesaurus expansion is always applied to all individual search terms.
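The characteristics above can be sketched as a rewrite step that turns each unquoted term with a thesaurus entry into an OR group, leaves quoted phrases alone, and records which terms were expanded so that spelling correction can skip them. The PTO entry and the function name are hypothetical, and the sketch handles single-term entries only, not multi-word thesaurus phrases.

```python
import re

# Hypothetical thesaurus entry; real entries are configured by an administrator.
THESAURUS = {"pto": ["pto", "vacation", "holiday"]}

def expand_query(query, thesaurus):
    """Replace each unquoted term that has a thesaurus entry with an OR group.

    Returns the rewritten query and the set of expanded terms, which a caller
    would then exclude from automatic spelling correction.
    """
    parts, expanded = [], set()
    for token in re.findall(r'"[^"]*"|\S+', query):
        key = token.lower()
        if token.startswith('"') or key not in thesaurus:
            parts.append(token)  # quoted phrases and unknown terms pass through
        else:
            parts.append("(" + " OR ".join(thesaurus[key]) + ")")
            expanded.add(key)
    return " ".join(parts), expanded

print(expand_query('PTO "PTO policy"', THESAURUS))
# ('(pto OR vacation OR holiday) "PTO policy"', {'pto'})
```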
Documents and portal objects are indexed with a language setting that determines how word breaking and stemming are applied. When a user issues a search query, word breaking and stemming are applied according to the user account locale settings. Search results are best when the language used for the search matches the language of the documents being searched. However, searches are normally applied to documents in all languages. Cross-language searches do not benefit from localized stemming and word breaking, but can still return useful results.
The advanced search page offers the ability to restrict searches to a particular language.
The user account search preferences give the option of returning only documents that were indexed using the language of the locale.
Portal objects can have localized names and descriptions. Basic searches are performed against the default object names and descriptions and the names and descriptions of the locale.
When searching portal content via the Search box in the portal banner, the text of the query is processed using the language setting of the user interface. If the portal interface is German, the query is tokenized and stemmed using German language rules, providing optimal search results for documents indexed using German linguistic rules.
If the search collection contains documents in other languages, you can still retrieve them with a query using the appropriate text (assuming the user interface permits entry of the necessary characters). Typing English words into the search box of a portal using a German interface applies German linguistic rules to the query text. Because English stemming is not used, the query is not able to match alternate English word forms; however, English language documents containing the entered words are retrieved.
Although you can enter Asian language text into a European language search box (if a compatible character encoding is used), you should limit the text to a single word or manually separate words with white space to be able to match Asian content in the search collection.
The Advanced Search page provides additional functionality for searching in a multi-language document collection. A pop-up list allows the user to select the language to use for query processing. Linguistic rules for tokenizing and stemming the selected language are used when processing the query text. Among other things, this means that Asian text can be entered without unnecessary white space.
The query operators recognized by basic search and advanced search are sensitive to the language setting. For example, the AND operator can be specified as “UND” when the query is processed as German. Localized operators are available for the following languages: English, Danish, Dutch, Finnish, French, German, Italian, Norwegian (Bokmal), Norwegian (Nynorsk), Portuguese, and Spanish. All other languages use English operators.
The portal provides support for 61 languages.
Of the languages supported by the portal, the following languages include support for word stemming and compound decomposition. This additional information is used to enhance results of the full-text index.
Chinese (Simplified)
Chinese (Traditional)
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Hungarian
Italian
Japanese
Korean
Norwegian (Bokmal)
Norwegian (Nynorsk)
Polish
Portuguese
Russian
Spanish
Swedish
Turkish
The following languages are supported at a reduced level.
Afrikaans
Albanian
Arabic
Basque
Belarusian
Bengali
Bulgarian
Catalan
Cornish
Croatian
Esperanto
Estonian
Faeroese
Gallegan
Hebrew
Hindi
Icelandic
Indonesian
Irish
Kalaallisut
Konkani
Latvian
Lithuanian
Macedonian
Maltese
Manx
Marathi
Persian
Romanian
Serbian
Serbian-Croatian
Slovak
Slovenian
Swahili
Tamil
Telugu
Thai
Ukrainian
Vietnamese
When you search for text, you generally can just type the text you are looking for (the search string). However, there are a few rules you should be aware of:
Note:
Search strings are case-insensitive; that is, uppercase A is the same as lowercase a.
To find objects or documents containing all terms in your search string, separate your terms with spaces. This is the same as using AND.
To find objects or documents containing one or more of the terms in your search string, separate your terms with commas. This is the same as using OR.
To search for an exact phrase, type quotation marks (") around the phrase.
To specify that a term must be included in each result, type a plus (+) in front of the term.
To exclude a term from the results, type a minus (-) in front of the term.
Note:
Do not include a space after the plus or minus.
Do not use the plus or minus in the same search with other search string operators.
The descriptions of searches below do not include any of the query expansion or ranking techniques that are employed in basic search. Except where otherwise noted, all matches are case-insensitive.
Query | Expected Behavior |
---|---|
Dog | Searches for documents containing any stem variant of Dog. |
<WORD> Dog | Searches for documents containing Dog as specified exactly with no stemming or lowercasing. This is the only case-sensitive form of search. |
Big <PHRASE> Dog | Searches for documents containing the exact phrase big dog without stemming. |
“Big Dog” | Same as Big <PHRASE> Dog. |
cat AND dog | Searches for documents containing stem variants of cat and dog. Equivalent to cat <AND> dog. |
cat <ALL> dog | Same as cat AND dog. |
cat OR dog | Searches for documents containing stem variants of cat or dog. |
cat, dog | Same as cat OR dog. |
cat <ANY> dog | Same as cat OR dog. |
cat <ACCRUE> dog | Same as cat OR dog. |
cat NOT dog | Searches for documents containing stem variants of cat but not containing stem variants of dog. |
cat AND NOT dog | Same as cat NOT dog. |
cat NEAR dog | Finds stem variants of cat occurring near dog (default is within 25 words). |
cat NEAR/15 dog | Finds stem variants of cat within 15 words of dog. |
cat <ORDER><NEAR/15> dog | Finds stem variants of cat within 15 words before dog. The more convenient syntax cat <ORDER NEAR/15> dog can also be used. |
cat <ORDER> dog | Finds stem variants of cat anywhere before dog. |
cat <SENTENCE> dog | Finds stem variants of cat within 10 words of dog. |
cat <PARAGRAPH> dog | Finds stem variants of cat within 50 words of dog. |
cat <XYZ> dog | Finds stem variants of cat and dog. The unsupported operator XYZ is ignored. |
cat* | Finds all documents containing terms that start with cat, such as caterpillar. |
*cat | Finds all documents with terms that end in cat such as tomcat. |
*cat* | Finds all documents with terms that contain cat such as tomcats. Mid-string wildcard expressions must contain at least three characters (for example, *abc* is legal but *bc* is not). |
dog * | Finds documents containing stem variants of dog. The singleton wildcard is treated as stray punctuation. |
dog cat bird | Finds documents containing stem variants of all three terms, dog, cat, and bird. (Bag of Words mode) |
big dog AND bird | Finds documents containing the phrase big dog, and stem variants of the term bird. (Query Operators mode with implicit phrase construction) |
dog cat +bird | Finds documents containing stem variants of bird. The rank is boosted for documents containing stem variants of dog or cat. The words dog and cat are not joined into a phrase in Internet Style mode. |
+dog -cat bird | Finds documents that contain stem variants of dog but do not contain stem variants of cat, and ranks documents with both dog and bird highest. |
bird -cat | Finds documents that contain stem variants of bird but do not contain stem variants of cat. |
bag-of-words | Searches for documents containing stem variants of the three terms: bag, of, and words. Punctuation marks are treated as spaces when quotation marks are not present. |
“Mr. Jones” | Searches for the phrase mr. jones. Punctuation marks are considered part of the search string if they are included within quoted phrases. |
Search results are ranked according to relevance, by default. There are several factors that determine relevance.
The number of times a query term (or its stemmed and case variant forms) appears in a searchable item has a large influence on the relevance ranking of the item. All other things being equal, items which contain more instances of a query term will rank higher than items containing fewer instances. This is known as term-frequency-based ranking.
Basic searches are performed across several document fields, and some fields are weighted higher than other fields, so that, for instance, a match on an object name ranks higher than a match on an object description. By default, the fields searched are name, description, and full-text content.
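Term-frequency ranking with field weighting can be illustrated with a simple weighted count. This is a simplification under stated assumptions: the weights below are hypothetical (the guide only says a name match outranks a description match), terms are counted after whitespace splitting and lowercasing, and real relevance scoring also normalizes for document length and term rarity.

```python
# Hypothetical sketch: term-frequency scoring across weighted fields.
FIELD_WEIGHTS = {"name": 3.0, "description": 2.0, "content": 1.0}  # assumed values

def score(query_terms, item):
    """Sum weight * occurrence count for each query term in each searched field."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = item.get(field, "").lower().split()
        for term in query_terms:
            total += weight * words.count(term.lower())
    return total

named = {"name": "Holiday Schedule", "description": "", "content": "schedule"}
body_only = {"name": "Notes", "description": "", "content": "holiday"}
print(score(["holiday"], named) > score(["holiday"], body_only))  # True
```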
In basic search, Bag of Words mode employs special relevancy ranking features which emphasize phrase and proximity matches with the search phrase, even though the user did not employ quotes or proximity operators.
The search phrase terms are used to generate three queries:
All words joined together as a single phrase
Stem variants of all words and all quoted phrases <ORDER><NEAR> each other
Stem variants of all words and all quoted phrases joined together with AND
The three queries are combined with the OR operator into a single query, and the relevance ranking is designed to ensure that results from query 1 always rank above results from query 2, which in turn rank above results from query 3.
For example, if you enter "san francisco" hotels, the following queries would be generated:
“san francisco hotels”
“san francisco” <ORDER><NEAR> hotels
“san francisco” AND hotels
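The query generation step can be reproduced with a short function that keeps quoted phrases intact. This sketch only builds the query strings; the stem-variant matching and the tiered ranking that keeps query 1 above query 2 above query 3 are not modeled, and the function names are hypothetical.

```python
import re

def bag_of_words_queries(query):
    """Build the phrase, ordered-proximity, and AND queries from a basic search."""
    tokens = re.findall(r'"[^"]*"|\S+', query)  # quoted phrases stay intact
    words = [token.strip('"') for token in tokens]
    return [
        '"' + " ".join(words) + '"',   # 1: everything as one phrase
        " <ORDER><NEAR> ".join(tokens),  # 2: ordered proximity
        " AND ".join(tokens),            # 3: plain conjunction
    ]

def combined_query(query):
    """OR the three queries together into one query."""
    return " OR ".join("(" + q + ")" for q in bag_of_words_queries(query))

for q in bag_of_words_queries('"san francisco" hotels'):
    print(q)
# "san francisco hotels"
# "san francisco" <ORDER><NEAR> hotels
# "san francisco" AND hotels
```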
The search results pages for banner and advanced search allow you to sort the search results by last-modified date, folder, or object type.
Basic search adds some special features in order to increase the chances that a search will return relevant results.
Basic search has several characteristics:
In basic search, if a user search query causes syntax errors in Internet Style mode or Query Operators mode, it is automatically retried in Bag of Words mode to be as forgiving as possible of user error. For example, the query dog and causes a syntax error in Query Operators mode, because the and operator is missing its right-hand operand. The query is then passed to Bag of Words mode, which attaches no special operator significance to and, and therefore retrieves documents containing both the term dog and the term and.
Term proximity can boost the relevancy ranking in basic search.
Automatic spelling correction is applied only in basic search.
Advanced search behavior is intended to support complex, precise queries. Therefore it generally does not employ the automatic broadening features of basic search, such as broad cross-field searching or automatic spell correction. Stemming, however, is applied in advanced search.
The Text Search portion of advanced search will search across name, description and full text content. Additional property criteria are applied only to the fields specifically selected in each criterion.
User queries that cause syntax errors in Internet Style mode or Query Operators mode will display an error message in the user interface; the search will not fall back to Bag of Words mode.
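The difference between basic and advanced search on malformed queries can be sketched as mode detection followed by a syntax check whose failure is either absorbed (basic) or surfaced (advanced). Everything here is a hypothetical simplification: mode detection looks only for +/- prefixes, the bare operators AND, OR, NOT, and NEAR, and angle-bracketed tokens, and the syntax check only verifies that a bare operator has a term on both sides. Internet Style syntax errors are not modeled.

```python
import re

BINARY_OPERATORS = {"AND", "OR", "NOT", "NEAR"}

class QuerySyntaxError(Exception):
    pass

def detect_mode(query):
    tokens = query.split()
    if any(t[0] in "+-" and len(t) > 1 for t in tokens):
        return "internet_style"  # + and - take precedence
    if any(t.upper() in BINARY_OPERATORS or re.search(r"<[^>]+>", t) for t in tokens):
        return "query_operators"
    return "bag_of_words"

def check_syntax(query):
    """Very small check: a bare operator must have a term on both sides."""
    tokens = query.split()
    for i, t in enumerate(tokens):
        if t.upper() in BINARY_OPERATORS and (i == 0 or i == len(tokens) - 1):
            raise QuerySyntaxError(f"operator {t!r} is missing an operand")

def resolve_mode(query, basic=True):
    mode = detect_mode(query)
    if mode == "query_operators":
        try:
            check_syntax(query)
        except QuerySyntaxError:
            if not basic:
                raise  # advanced search reports the error to the user
            mode = "bag_of_words"  # basic search retries as plain words
    return mode

print(resolve_mode("dog and", basic=True))  # bag_of_words
```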