Oracle Commerce Cloud Service

phrase module

The phrase module states that results containing the user’s query as an exact phrase, or a subset of the exact phrase, should be considered more relevant than matches simply containing the user’s search terms scattered throughout the text.

Records that have the phrase are ranked higher than records which do not contain the phrase.

The phrase module has a variety of options that you use to customize its behavior. The phrase options are:

Rank based on length of subphrases
Use approximate subphrase/phrase matching
Apply spell correction, thesaurus, and stemming

The various options can go in parentheses, including considerFieldRanks.

Ranking based on length of subphrases

When you configure the phrase module, you have the option of enabling subphrasing.

Subphrasing ranks results based on the length of their subphrase matches. In other words, results that match three terms are considered more relevant than results that match two terms, and so on. A subphrase is defined as a contiguous subset of the query terms the user entered, in the order that he or she entered them. For example, the query “fax cover sheets” contains the subphrases “fax”, “cover”, “sheets”, “fax cover”, “cover sheets”, and “fax cover sheets”, but not “fax sheets”.

Content contained inside nested quotes in a phrase is treated as one term. For example, consider the following phrase:

the question is “to be or not to be”

The quoted text (“to be or not to be”) is treated as one query term, so this example consists of four query terms even though it has a total of nine words.

When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.

Subphrase	Approximate	Expansion	Description
Off	Off	Off	Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not.
Off	Off	On	Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not.
Off	On	Off	Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record.
Off	On	On	Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record.
On	Off	Off	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase.
On	Off	On	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Extend subphrases to facilitate matching but rank based on the length of the original subphrase (before extension). Note This combination can have a negative performance impact on query throughput.
On	On	Off	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Look only at the first possible phrase match within each record.
On	On	On	Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expand the query to facilitate matching but rank based on the length of the original subphrase (before extension). Look only at the first possible phrase match within each record. Note: You should only use one phrase module in any given search interface and set all of your options in it.

Results with multiple matches

If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.

Stop words and phrase behavior

When using the phrase module, stop words are always treated like non-stop word terms and stratified accordingly.

For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).

Cross-field matches and phrase behavior

An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match. (In other words, matches created by concatenating fields are not considered by the phrase module.)

Treatment of wildcards with the phrase module

The phrase module translates each wildcard in a query into a generic placeholder for a single term.

Note: Only the asterisk (*) is supported as a wildcard.

For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.

When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the phrase module’s point of view.

Consider the search query “sparkling w* wine” with the MatchAny mode enabled. In MatchAny mode, search results only need to contain one of the requested terms to be valid, so a list of search results for this query could contain phrases that look like this:

sparkling white wine
sparkling refreshing wine
sparkling wet wine
sparkling soda
wine cooler

When phrase relevance ranking is applied to these search results, the phrase module looks for matches to “sparkling * wine” not “sparkling w* wine.” Therefore, there are three results—”sparkling white wine,” “sparkling refreshing wine,” and “sparkling wet wine”—that are considered phrase matches for the purposes of ranking.

These results are placed in the first stratum. The other two results are placed in the second stratum. When subphrasing is enabled, the behavior becomes a bit more complex. Again, we have to remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):

Cold
sparkling
* wine
cold sparkling *
sparkling * wine
cold sparkling * wine

Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term.

Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases. Like regular subphrase, stratification is based on the number of terms in the subphrase, and the wildcard placeholders are counted toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. Again, this is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.” Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Conversely, the phrase “sparkling cold white wine” is also not a match because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, for them to be considered phrase matches by the phrase module.

Extending Oracle Commerce Cloud