Oracle Commerce Guided Search

the question is "to be or not to be"

The quoted text ("to be or not to be") is treated as one query term, so this example consists of four query terms even though it has a total of nine words.
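
The term count above can be reproduced with a small sketch. It uses Python's standard `shlex` module as a stand-in tokenizer; this is an illustrative assumption, not how the MDEX Engine parses queries, and the helper name `query_terms` is hypothetical.

```python
import shlex


def query_terms(query: str) -> list[str]:
    """Split a query into terms, keeping each double-quoted phrase as one term.

    shlex.split is only a stand-in tokenizer for illustration; the engine's
    own query parsing rules are not described in this section.
    """
    return shlex.split(query)


terms = query_terms('the question is "to be or not to be"')
print(terms)       # ['the', 'question', 'is', 'to be or not to be']
print(len(terms))  # 4
```

Note that `shlex` also treats apostrophes as quote characters, so this sketch is only suitable for queries like the one shown.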

When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.

Using approximate matching

Approximate matching is faster than the standard Phrase module's matching, at the cost of somewhat less exact results.

With approximate matching enabled, the Phrase module examines only a limited number of the positions in each result where a phrase match could exist, rather than all of them. Only these candidate occurrences are considered, even if later occurrences would be better, more relevant matches.

The approximate setting is appropriate in cases where the runtime performance of the standard Phrase module is inadequate because of large result contents and/or high site load.
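
The trade-off can be illustrated with a minimal sketch. It assumes that a candidate position is any place where the first term of the phrase occurs, and that the approximate setting caps how many candidates are checked. Both the function names and the `max_candidates` parameter are hypothetical; the engine's actual candidate selection is not described here.

```python
def candidate_positions(tokens, phrase):
    """Positions where a phrase match could start (first term matches)."""
    return [i for i, tok in enumerate(tokens) if tok == phrase[0]]


def has_phrase(tokens, phrase, max_candidates=None):
    """Check for a contiguous phrase match.

    max_candidates=None examines every candidate position (exact behavior);
    a small integer examines only the earliest candidates (approximate).
    """
    positions = candidate_positions(tokens, phrase)
    if max_candidates is not None:
        positions = positions[:max_candidates]
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in positions)


record = "new york is not new york city".split()
phrase = "new york city".split()

print(has_phrase(record, phrase))                    # True
print(has_phrase(record, phrase, max_candidates=1))  # False
```

In this sketch the capped check misses the later, valid occurrence, which is the kind of inexactness the approximate setting accepts in exchange for less work per result.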

Applying spelling correction, thesaurus, and stemming

Applying spelling correction, thesaurus, and stemming adjustments to the original phrase is generically known as query expansion.

With query expansion enabled, the Phrase module ranks results that match a phrase’s expanded forms in the same stratum as results that match the original phrase.

Consider the following example:

A thesaurus entry exists that expands "US" to "United States".
The user queries for "US government".

The query "US government" is expanded to "United States government" for matching purposes, but the Phrase module gives a score of two to any results matching "United States government" because the original, unexpanded version of the query, "US government", only had two terms.
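
A minimal sketch of this scoring rule follows. The thesaurus structure, the function names, and the score of 0 for non-matching results are assumptions made for illustration only.

```python
def contains_phrase(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def expand(terms, thesaurus):
    """Return the original terms plus one variant per applicable thesaurus entry."""
    forms = [terms]
    for i, term in enumerate(terms):
        if term in thesaurus:
            forms.append(terms[:i] + thesaurus[term] + terms[i + 1:])
    return forms


def phrase_score(record, query, thesaurus):
    """Score a whole-phrase match by the length of the ORIGINAL query."""
    original = query.lower().split()
    tokens = record.lower().split()
    if any(contains_phrase(tokens, form) for form in expand(original, thesaurus)):
        return len(original)  # expansion never inflates the score
    return 0  # assumption: non-matching results score 0 in this sketch


thesaurus = {"us": ["united", "states"]}  # hypothetical thesaurus entry
print(phrase_score("The United States government issued a statement",
                   "US government", thesaurus))  # 2
```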

Summary of Phrase option interactions

The three configuration settings for the Phrase module can be used in a variety of combinations for different effects.

The following matrix describes the behavior of each combination.

Subphrase	Approximate	Expansion	Description
Off	Off	Off	Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not.
Off	Off	On	Ranks results into two strata: those that match the original, or an expanded version, of the query as a whole phrase, and those that do not.
Off	On	Off	Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Looks only at the first possible phrase match within each record.
Off	On	On	Ranks results into two strata: those that match the original, or an expanded version, of the query as a whole phrase, and those that do not. Looks only at the first possible phrase match within each record.
On	Off	Off	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase.
On	Off	On	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expands subphrases to facilitate matching but ranks based on the length of the original subphrase (before expansion). Note: This combination can have a negative performance impact on query throughput.
On	On	Off	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Looks only at the first possible phrase match within each record.
On	On	On	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expands the query to facilitate matching but ranks based on the length of the original subphrase (before expansion). Looks only at the first possible phrase match within each record.

Note

Use only one Phrase module in any given search interface, and set all of your options in it.

Effect of search modes on Phrase behavior

Oracle Commerce provides a variety of search modes to facilitate matching during search (MatchAny, MatchAll, MatchPartial, and so on).

These modes determine only which results match a user’s query; they have no effect on how the results are ranked after the matches have been found. Therefore, the Phrase module works as described in this section, regardless of search mode. The one exception to this rule is MatchBoolean. Phrase, like the other relevance ranking modules, is never applied to the results of MatchBoolean queries.

Results with multiple matches

If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.

Stop words and Phrase behavior

When using the Phrase module, stop words are always treated like any other query term and are stratified accordingly.

For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).
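
The ranks in this example can be checked with a short sketch that finds the longest run of consecutive query terms appearing in a result. The function name is hypothetical, and the sketch assumes simple whitespace tokenization with no special handling of stop words.

```python
def longest_subphrase_match(query: str, text: str) -> int:
    """Length (in terms) of the longest contiguous query subphrase found in text.

    Stop words such as "and" are counted like any other term.
    """
    q = query.lower().split()
    t = text.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and t[j-1]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(q) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if q[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


query = "raining cats and dogs"
print(longest_subphrase_match(query, "fat cats and hungry dogs"))  # 2
print(longest_subphrase_match(query, "fat cats and dogs"))         # 3
```

The first result scores two because "cats and" is its longest matching subphrase; the stop word "and" counts toward the length.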

Cross-field matches and Phrase behavior

An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match.

(In other words, matches created by concatenating fields are not considered by the Phrase module.)
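
The per-field rule, together with the longest-match rule for results with multiple matches, can be sketched as follows. The record layout (a dictionary of field names to text), the field names, and the function names are assumptions for illustration.

```python
def contains(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def longest_match(terms, field_tokens):
    """Longest query subphrase (in terms) found inside a single field."""
    n = len(terms)
    for length in range(n, 0, -1):          # try the longest subphrases first
        for start in range(n - length + 1):
            if contains(field_tokens, terms[start:start + length]):
                return length
    return 0


def record_score(query, record):
    """Score a record by its best single-field match; fields are never joined."""
    terms = query.lower().split()
    return max(
        (longest_match(terms, text.lower().split()) for text in record.values()),
        default=0,
    )


split_record = {"title": "red leather", "description": "jacket with zip"}
whole_record = {"title": "leather jacket", "description": "a red leather jacket"}

print(record_score("red leather jacket", split_record))  # 2
print(record_score("red leather jacket", whole_record))  # 3
```

The first record scores two even though its fields, read end to end, would spell out the whole phrase.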

Treatment of wildcards with the Phrase module

The Phrase module translates each wildcard in a query into a generic placeholder for a single term.

For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.

When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the Phrase module’s point of view.

Consider the search query “sparkling w* wine” with the MatchAny mode enabled. In MatchAny mode, search results only need to contain one of the requested terms to be valid, so a list of search results for this query could contain phrases that look like this:

sparkling white wine
sparkling refreshing wine
sparkling wet wine
sparkling soda
wine cooler

When phrase relevance ranking is applied to these search results, the Phrase module looks for matches to “sparkling * wine,” not “sparkling w* wine.” Therefore, three results (“sparkling white wine,” “sparkling refreshing wine,” and “sparkling wet wine”) are considered phrase matches for the purposes of ranking. These results are placed in the first stratum. The other two results are placed in the second stratum.
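
The two strata for this example can be reproduced with a small sketch. The function names are hypothetical, and the sketch assumes whitespace tokenization and that any query term containing "*" becomes the generic placeholder.

```python
def generic_phrase(query):
    """Replace every wildcard term (e.g. 'w*') with the generic placeholder '*'."""
    return ["*" if "*" in term else term for term in query.lower().split()]


def is_phrase_match(text, pattern):
    """True if text contains the pattern, where '*' stands for exactly one term."""
    tokens = text.lower().split()
    n = len(pattern)
    return any(
        all(p == "*" or p == tok for p, tok in zip(pattern, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


results = [
    "sparkling white wine",
    "sparkling refreshing wine",
    "sparkling wet wine",
    "sparkling soda",
    "wine cooler",
]

pattern = generic_phrase("sparkling w* wine")
first = [r for r in results if is_phrase_match(r, pattern)]
second = [r for r in results if not is_phrase_match(r, pattern)]

print(pattern)  # ['sparkling', '*', 'wine']
print(first)    # ['sparkling white wine', 'sparkling refreshing wine', 'sparkling wet wine']
print(second)   # ['sparkling soda', 'wine cooler']
```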

When subphrasing is enabled, the behavior becomes a bit more complex. Again, we have to remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):

cold
sparkling *
* wine
cold sparkling *
sparkling * wine
cold sparkling * wine

Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term. Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases.
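
One way to reproduce this breakdown is sketched below. It assumes two rules inferred from the list above: a subphrase is dropped when the term immediately before or after it is a wildcard, and a subphrase made up only of wildcards is dropped. The function name is hypothetical.

```python
def wildcard_subphrases(query):
    """List the subphrases that survive after wildcards become '*' placeholders."""
    terms = ["*" if "*" in t else t for t in query.lower().split()]
    n = len(terms)
    kept = []
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            end = start + length
            sub = terms[start:end]
            if all(t == "*" for t in sub):
                continue  # assumption: a bare placeholder is not a subphrase
            if start > 0 and terms[start - 1] == "*":
                continue  # subsumed by the subphrase that includes the wildcard
            if end < n and terms[end] == "*":
                continue  # subsumed by the subphrase that includes the wildcard
            kept.append(" ".join(sub))
    return kept


for subphrase in wildcard_subphrases("cold sparkling w* wine"):
    print(subphrase)
# cold
# sparkling *
# * wine
# cold sparkling *
# sparkling * wine
# cold sparkling * wine
```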

As with regular subphrasing, stratification is based on the number of terms in the subphrase, and the wildcard placeholders are counted toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. This is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.”

Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Conversely, the phrase “sparkling cold white wine” is also not a match because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, for them to be considered phrase matches by the Phrase module.
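
The one-term-per-wildcard rule can be expressed as a regular expression, as in the sketch below. This is an illustration using Python's `re` module with whitespace-delimited terms, not a description of how the Phrase module is implemented; `phrase_regex` is a hypothetical helper.

```python
import re


def phrase_regex(generic_phrase: str) -> re.Pattern:
    """Build a regex in which each '*' matches exactly one whitespace-delimited term."""
    parts = [r"\S+" if term == "*" else re.escape(term)
             for term in generic_phrase.split()]
    # The lookarounds keep the match aligned to whole terms.
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)")


pattern = phrase_regex("sparkling * wine")

print(bool(pattern.search("sparkling white wine")))       # True
print(bool(pattern.search("sparkling wine")))             # False: no term for the wildcard
print(bool(pattern.search("sparkling cold white wine")))  # False: two terms for one wildcard
```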

Notes about the Phrase module

Keep the following points in mind when using the Phrase module.

If a query contains only one word, then that word constitutes the entire phrase, and all of the results that match it are put into one stratum (score = 1). The module can still rank the results into two strata overall: one for records that contain the phrase and a lower-ranking stratum for records that do not contain the phrase.
Because of the way hyphenated words are positionally indexed, Oracle recommends that you enable subphrase if your results contain hyphenated words.

MDEX Engine Developer's Guide