The phrase module states that results containing the user’s query as an exact phrase, or a subset of the exact phrase, should be considered more relevant than matches simply containing the user’s search terms scattered throughout the text.

Records that have the phrase are ranked higher than records which do not contain the phrase.

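The phrase module has several options that you can use to customize its behavior. The phrase options described in the sections below are subphrasing, approximate matching, and query expansion (thesaurus and stemming).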

The various options can go in parentheses, including considerFieldRanks.

Ranking based on length of subphrases

When you configure the phrase module, you have the option of enabling subphrasing.

Subphrasing ranks results based on the length of their subphrase matches. In other words, results that match a three-term subphrase are considered more relevant than results that match a two-term subphrase, and so on. A subphrase is defined as a contiguous subset of the query terms the user entered, in the order that the user entered them. For example, the query “fax cover sheets” contains the subphrases “fax”, “cover”, “sheets”, “fax cover”, “cover sheets”, and “fax cover sheets”, but not “fax sheets”.
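The definition of a subphrase can be sketched in a few lines of Python. This is only an illustration of the definition above, not the module's implementation; the function name `subphrases` is made up for this example.

```python
def subphrases(terms):
    """Return every contiguous, in-order run of the query terms."""
    n = len(terms)
    return [
        " ".join(terms[start:start + length])
        for length in range(1, n + 1)          # shortest subphrases first
        for start in range(n - length + 1)     # slide across the query
    ]

query_terms = "fax cover sheets".split()
result = subphrases(query_terms)
print(result)
```

Because only contiguous runs are generated, “fax sheets” never appears in the list.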

Content contained inside nested quotes in a phrase is treated as one term. For example, consider the following phrase:

The quoted text (“to be or not to be”) is treated as one query term, so this example consists of four query terms even though it has a total of nine words.
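The following sketch shows one way to split a query so that quoted text counts as a single term. The helper `query_terms` and the sample query are assumptions made for illustration; the sample was chosen only because it has four terms and nine words, like the case described above.

```python
import re

def query_terms(query):
    """Split a query into terms, keeping each double-quoted span as one term."""
    # Each match is either a quoted span (first group) or a bare word (second group).
    tokens = re.findall(r'"([^"]*)"|(\S+)', query)
    return [quoted if quoted else word for quoted, word in tokens]

# Hypothetical query used only for this example.
example = 'the question is "to be or not to be"'
terms = query_terms(example)
word_count = len(example.replace('"', "").split())
print(terms, len(terms), word_count)
```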

When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.

Using approximate matching

Approximate matching is faster than standard phrase matching, at the cost of somewhat less exact results.

With approximate matching enabled, the phrase module looks at a limited number of positions in each result where a phrase match could exist, rather than at all of the positions. Only this limited number of possible occurrences is considered, regardless of whether there are later occurrences that are better, more relevant matches.
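The trade-off can be illustrated with a small sketch. It assumes that a candidate position is any place where the first phrase term occurs and that a hypothetical `max_candidates` parameter caps how many candidates are examined; the actual module may choose and limit positions differently.

```python
def find_phrase(tokens, phrase, max_candidates=None):
    """Return True if phrase occurs in tokens.

    Candidate positions are the places where the first phrase term occurs.
    When max_candidates is set, only that many candidates are examined
    (the approximate behavior); None examines every candidate.
    """
    examined = 0
    for i, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        if max_candidates is not None and examined >= max_candidates:
            break  # later, possibly better, occurrences are ignored
        examined += 1
        if tokens[i:i + len(phrase)] == phrase:
            return True
    return False

text = "new york city and new york state".split()
phrase = "new york state".split()

exact = find_phrase(text, phrase)                      # examines both candidates
approx = find_phrase(text, phrase, max_candidates=1)   # examines only the first
print(exact, approx)
```

In this sketch the approximate search misses the match because the only candidate it examines is the one that starts “new york city”.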

The approximate setting is appropriate in cases where the runtime performance of the standard phrase module is inadequate because of large result contents and/or high site load.

Applying thesaurus and stemming

Applying thesaurus and stemming adjustments to the original phrase is generically known as query expansion.

With query expansion enabled, the phrase module ranks results that match a phrase’s expanded forms in the same stratum as results that match the original phrase. Consider the following example:

The query “US government” is expanded to “United States government” for matching purposes, but the phrase module gives a score of two to any results matching “United States government” because the original, unexpanded version of the query, “US government”, only had two terms.
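A minimal sketch of this scoring rule follows. The `THESAURUS` table and the function names are assumptions made for illustration, and the sketch covers only a thesaurus-style expansion, not stemming.

```python
# Hypothetical thesaurus entry: one original term maps to a multi-word expansion.
THESAURUS = {"us": ["united", "states"]}

def contains(tokens, phrase):
    """True if phrase appears as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def expand(terms):
    """Replace each term that has a thesaurus entry with its expansion."""
    expanded = []
    for term in terms:
        expanded.extend(THESAURUS.get(term, [term]))
    return expanded

def phrase_score(text, query):
    """Score a whole-phrase match by the length of the original query."""
    tokens = text.lower().split()
    terms = query.lower().split()
    if contains(tokens, terms) or contains(tokens, expand(terms)):
        return len(terms)  # original length, even when the expanded form matched
    return 0

score_expanded = phrase_score("the United States government said", "US government")
score_original = phrase_score("the US government said", "US government")
score_none = phrase_score("state government", "US government")
print(score_expanded, score_original, score_none)
```

Both the expanded and the original match receive the same score, so they land in the same stratum.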

Summary of phrase option interactions

The three configuration settings for the phrase module can be used in a variety of combinations for different effects. The following table summarizes the behavior of each combination.

| Subphrase | Approximate | Expansion | Description |
| --- | --- | --- | --- |
| Off | Off | Off | Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not. |
| Off | Off | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. |
| Off | On | Off | Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record. |
| Off | On | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record. |
| On | Off | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. |
| On | Off | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Extend subphrases to facilitate matching but rank based on the length of the original subphrase (before extension). Note: This combination can have a negative performance impact on query throughput. |
| On | On | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Look only at the first possible phrase match within each record. |
| On | On | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expand the query to facilitate matching but rank based on the length of the original subphrase (before extension). Look only at the first possible phrase match within each record. |

Note: You should only use one phrase module in any given search interface and set all of your options in it.

Results with multiple matches

If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.

Stop words and phrase behavior

When using the phrase module, stop words are always treated like non-stop word terms and stratified accordingly.

For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).
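The ranks in this example can be reproduced with a short sketch. The function name is made up for illustration, and the code assumes simple whitespace tokenization with no stop word removal, so “and” is matched like any other term.

```python
def longest_subphrase_match(text, query):
    """Length of the longest contiguous query subphrase found in text."""
    tokens = text.lower().split()
    terms = query.lower().split()
    best = 0
    for start in range(len(terms)):
        for end in range(start + 1, len(terms) + 1):
            sub = terms[start:end]
            n = len(sub)
            # Check every position in the text for this subphrase.
            if any(tokens[i:i + n] == sub for i in range(len(tokens) - n + 1)):
                best = max(best, n)
    return best

query = "raining cats and dogs"
rank_a = longest_subphrase_match("fat cats and hungry dogs", query)  # matches "cats and"
rank_b = longest_subphrase_match("fat cats and dogs", query)         # matches "cats and dogs"
print(rank_a, rank_b)
```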

Cross-field matches and phrase behavior

An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match. (In other words, matches created by concatenating fields are not considered by the phrase module.)
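The multiple-match rule and the single-field rule can be sketched together. The record layout and the field names `title` and `description` are hypothetical; the point is only that each field is scored separately and the longest match wins.

```python
def longest_match(tokens, terms):
    """Length of the longest contiguous run of terms found in tokens."""
    best = 0
    for start in range(len(terms)):
        for end in range(start + 1, len(terms) + 1):
            sub = terms[start:end]
            n = len(sub)
            if any(tokens[i:i + n] == sub for i in range(len(tokens) - n + 1)):
                best = max(best, n)
    return best

def record_score(fields, query):
    """Score a record by its longest subphrase match within any single field."""
    terms = query.lower().split()
    return max(
        (longest_match(value.lower().split(), terms) for value in fields.values()),
        default=0,
    )

# Hypothetical record with two text fields.
record = {"title": "fax cover", "description": "sheets and fax machines"}
query = "fax cover sheets"

score = record_score(record, query)
# For contrast only: what the score would be if the fields were concatenated.
joined = longest_match(" ".join(record.values()).lower().split(), query.split())
print(score, joined)
```

The record scores two, from “fax cover” in the title, even though gluing the two fields together would produce a three-term match.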

Treatment of wildcards with the phrase module

The phrase module translates each wildcard in a query into a generic placeholder for a single term.

Note: Only the asterisk (*) is supported as a wildcard.

For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.

When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the phrase module’s point of view.

Consider the search query “sparkling w* wine” with the MatchAny mode enabled. In MatchAny mode, search results only need to contain one of the requested terms to be valid, so a list of search results for this query could contain phrases that look like this:

sparkling white wine
sparkling refreshing wine
sparkling wet wine
sparkling soda
wine cooler

When phrase relevance ranking is applied to these search results, the phrase module looks for matches to “sparkling * wine”, not “sparkling w* wine.” Therefore, three results (“sparkling white wine,” “sparkling refreshing wine,” and “sparkling wet wine”) are considered phrase matches for the purposes of ranking.

These results are placed in the first stratum. The other two results are placed in the second stratum.

When subphrasing is enabled, the behavior becomes a bit more complex. Again, we have to remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):

cold
sparkling *
* wine
cold sparkling *
sparkling * wine
cold sparkling * wine

Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term.

Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases.

As with regular subphrasing, stratification is based on the number of terms in the subphrase, and the wildcard placeholders count toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. Again, this is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.”

Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Conversely, the phrase “sparkling cold white wine” is also not a match because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, to be considered phrase matches by the phrase module.
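The wildcard rules can be sketched as follows. All names here are made up for illustration, and the code is a simplified model of the behavior described in this section rather than the module's actual algorithm: each wildcard term becomes a placeholder that matches exactly one term, and a subphrase that sits next to a wildcard without including it is dropped.

```python
WILDCARD = "*"

def normalize(query):
    """Replace every wildcard term, such as 'w*', with a generic placeholder."""
    return [WILDCARD if "*" in term else term for term in query.lower().split()]

def wildcard_subphrases(terms):
    """Contiguous subphrases, minus those subsumed by a neighboring wildcard."""
    n = len(terms)
    kept = []
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            end = start + length
            sub = terms[start:end]
            if all(t == WILDCARD for t in sub):
                continue  # a bare placeholder is not a subphrase on its own
            if start > 0 and terms[start - 1] == WILDCARD:
                continue  # subsumed by the subphrase that starts at the wildcard
            if end < n and terms[end] == WILDCARD:
                continue  # subsumed by the subphrase that ends at the wildcard
            kept.append(sub)
    return kept

def matches_at(tokens, i, sub):
    """True if sub matches tokens at position i; a wildcard matches any one term."""
    window = tokens[i:i + len(sub)]
    return len(window) == len(sub) and all(
        s == WILDCARD or s == t for s, t in zip(sub, window)
    )

def score(text, query):
    """Length of the longest wildcard subphrase that matches text."""
    tokens = text.lower().split()
    best = 0
    for sub in wildcard_subphrases(normalize(query)):
        if any(matches_at(tokens, i, sub) for i in range(len(tokens))):
            best = max(best, len(sub))
    return best

subs = [" ".join(s) for s in wildcard_subphrases(normalize("cold sparkling w* wine"))]
print(subs)
print(score("sparkling white wine", "sparkling w* wine"))
print(score("sparkling wine", "sparkling w* wine"))
print(score("sparkling cold white wine", "sparkling w* wine"))
```

In this model only a result that scores the full query length counts as a whole-phrase match, so “sparkling wine” and “sparkling cold white wine” fall short of “sparkling * wine”, as described above.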


Copyright © 1997, 2017 Oracle and/or its affiliates. All rights reserved.