The Phrase module treats results that contain the user’s query as an exact phrase, or as a subset of the exact phrase, as more relevant than results that simply contain the user’s search terms scattered throughout the text.
Records that have the phrase are ranked higher than records which do not contain the phrase.
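The Phrase module has a variety of options that you can use to customize its behavior.
The Phrase options are subphrasing, approximate matching, and query expansion (applying spelling correction, thesaurus, and stemming).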
When you add the Phrase module in the Relevance Ranking Modules editor, you are presented with the following editor that allows you to set these options.
When you configure the Phrase module, you have the option of enabling subphrasing.
Subphrasing ranks results based on the length of their subphrase matches. In other words, results that match three terms are considered more relevant than results that match two terms, and so on.
A subphrase is defined as a contiguous subset of the query terms the user entered, in the order that he or she entered them. For example, the query "fax cover sheets" contains the subphrases "fax", "cover", "sheets", "fax cover", "cover sheets", and "fax cover sheets", but not "fax sheets".
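The subphrase definition above can be made concrete with a short enumeration. This is a minimal Python sketch that assumes the query has already been split into terms. `subphrases` is a hypothetical helper for illustration, not part of any Oracle Commerce API.

```python
def subphrases(terms: list[str]) -> list[tuple[str, ...]]:
    """Return every contiguous subset of the query terms, in query order."""
    result = []
    n = len(terms)
    for length in range(1, n + 1):            # shortest subphrases first
        for start in range(n - length + 1):   # every possible starting position
            result.append(tuple(terms[start:start + length]))
    return result


if __name__ == "__main__":
    for sp in subphrases(["fax", "cover", "sheets"]):
        print(" ".join(sp))
```

Because only contiguous slices are taken, ("fax", "sheets") never appears in the output.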
Content contained inside nested quotes in a phrase is treated as one term. For example, consider the following phrase:
the question is "to be or not to be"
The quoted text ("to be or not to be") is treated as one query term, so this example consists of four query terms even though it has a total of nine words.
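The nested-quote rule can be sketched as a small tokenizer. This Python example assumes that double quotes delimit the nested phrase and that unquoted terms are separated by whitespace. How Oracle Commerce actually tokenizes queries (punctuation, escaping, and so on) is not described here, and `query_terms` is a hypothetical name.

```python
import re

# A double-quoted span is captured as one term; any other run of
# non-whitespace characters is captured as an ordinary term.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def query_terms(query: str) -> list[str]:
    """Split a query into terms, keeping quoted text together as one term."""
    terms = []
    for quoted, bare in _TOKEN.findall(query):
        term = quoted or bare
        if term:                     # skip empty quotes such as ""
            terms.append(term)
    return terms


if __name__ == "__main__":
    terms = query_terms('the question is "to be or not to be"')
    print(terms)                                # the four query terms
    print(sum(len(t.split()) for t in terms))   # total number of words
```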
When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.
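The difference between the two modes can be sketched for a single field as follows. This is an illustrative Python model, not the engine's implementation: it assumes exact term equality stands in for the engine's matching, and the 1/0 labels for the two non-subphrase strata are arbitrary (only their order matters). `longest_match` and `phrase_stratum` are hypothetical names.

```python
def longest_match(query: list[str], field: list[str]) -> int:
    """Length of the longest run of consecutive query terms that also
    appears consecutively, in the same order, in the field."""
    best = 0
    for qi in range(len(query)):
        for fi in range(len(field)):
            k = 0
            while (qi + k < len(query) and fi + k < len(field)
                   and query[qi + k] == field[fi + k]):
                k += 1
            best = max(best, k)
    return best


def phrase_stratum(query: list[str], field: list[str], subphrase: bool) -> int:
    """Stratum for one field; a higher number ranks higher."""
    longest = longest_match(query, field)
    if subphrase:
        return longest                          # score = matched subphrase length
    return 1 if longest == len(query) else 0    # whole phrase or not: two strata


if __name__ == "__main__":
    query = "fax cover sheets".split()
    for text in ["printable fax cover sheets", "free fax cover templates",
                 "sheets of fax paper"]:
        field = text.split()
        print(text, phrase_stratum(query, field, True),
              phrase_stratum(query, field, False))
```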
Approximate matching provides faster matching than the standard Phrase module, at the cost of somewhat less exact results.
With approximate matching enabled, the Phrase module examines only a limited number of the positions in each result where a phrase match could exist, rather than all of them. Only these possible occurrences are considered, even if later occurrences would be better, more relevant matches.
The approximate setting is appropriate in cases where the runtime performance of the standard Phrase module is inadequate because of large result contents and/or high site load.
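The trade-off can be illustrated with a sketch that contrasts checking every position with checking only the first few candidates. How the engine chooses candidate positions is not documented here; this Python example assumes a candidate is any position where the first query term occurs, and `max_positions` is a hypothetical parameter, not a documented setting.

```python
def exact_phrase_match(query: list[str], field: list[str]) -> bool:
    """Standard behavior: test every position in the field."""
    n = len(query)
    return any(field[i:i + n] == query for i in range(len(field) - n + 1))


def approximate_phrase_match(query: list[str], field: list[str],
                             max_positions: int = 1) -> bool:
    """Approximate behavior: test only the first few candidate positions.

    ASSUMPTION: a candidate position is any place the first query term
    occurs; max_positions is hypothetical.
    """
    n = len(query)
    candidates = [i for i, term in enumerate(field) if term == query[0]]
    for start in candidates[:max_positions]:
        if field[start:start + n] == query:
            return True
    return False   # later, better occurrences are never examined


if __name__ == "__main__":
    query = "fax cover sheets".split()
    field = "fax machine and fax cover sheets".split()
    print(exact_phrase_match(query, field), approximate_phrase_match(query, field))
```

With `max_positions=1`, the first "fax" is followed by "machine", so the genuine phrase later in the field is missed; the standard check scans every position and finds it, which is where the extra cost comes from on large result contents.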
Applying spelling correction, thesaurus, and stemming adjustments to the original phrase is generically known as query expansion.
With query expansion enabled, the Phrase module ranks results that match a phrase’s expanded forms in the same stratum as results that match the original phrase.
Consider the following example:
The query "US government" is expanded to "United States government" for matching purposes, but the Phrase module gives a score of two to any results matching "United States government" because the original, unexpanded version of the query, "US government", only had two terms.
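The "US government" example can be sketched for the whole-phrase case. This Python example assumes a hypothetical `EXPANSIONS` table standing in for whatever spelling-correction, thesaurus, and stemming configuration produces the expanded forms. The point it illustrates is that any expanded form may match, but the score is always the length of the original query.

```python
from itertools import product

# HYPOTHETICAL expansion table: each term maps to its alternative forms,
# original form first. Real expansions would come from the
# spelling-correction, thesaurus, and stemming configuration.
EXPANSIONS = {
    "US": [["US"], ["United", "States"]],
}


def expanded_forms(query: list[str]):
    """Yield the original query and every expanded variant of it."""
    options = [EXPANSIONS.get(term, [[term]]) for term in query]
    for combo in product(*options):
        yield [word for piece in combo for word in piece]


def contains_phrase(phrase: list[str], field: list[str]) -> bool:
    n = len(phrase)
    return any(field[i:i + n] == phrase for i in range(len(field) - n + 1))


def expanded_phrase_score(query: list[str], field: list[str]) -> int:
    """Match any expanded form, but score by the ORIGINAL query length."""
    if any(contains_phrase(form, field) for form in expanded_forms(query)):
        return len(query)
    return 0


if __name__ == "__main__":
    query = ["US", "government"]
    field = "the United States government announced".split()
    print(expanded_phrase_score(query, field))
```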
The three configuration settings for the Phrase module can be used in a variety of combinations for different effects.
The following matrix describes the behavior of each combination.
Subphrase | Approximate | Expansion | Description
---|---|---|---
Off | Off | Off | Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not.
Off | Off | On | Ranks results into two strata: those that match the original query, or an expanded version of it, as a whole phrase, and those that do not.
Off | On | Off | Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Looks only at the first possible phrase match within each record.
Off | On | On | Ranks results into two strata: those that match the original query, or an expanded version of it, as a whole phrase, and those that do not. Looks only at the first possible phrase match within each record.
On | Off | Off | Ranks results into N strata, where N equals the length of the query and each result’s score equals the length of its matched subphrase.
On | Off | On | Ranks results into N strata, where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expands subphrases to facilitate matching but ranks based on the length of the original subphrase (before expansion). Note: This combination can have a negative performance impact on query throughput.
On | On | Off | Ranks results into N strata, where N equals the length of the query and each result’s score equals the length of its matched subphrase. Looks only at the first possible phrase match within each record.
On | On | On | Ranks results into N strata, where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expands the query to facilitate matching but ranks based on the length of the original subphrase (before expansion). Looks only at the first possible phrase match within each record.
Note
You should only use one Phrase module in any given search interface and set all of your options in it.
Oracle Commerce provides a variety of search modes to facilitate matching during search (MatchAny, MatchAll, MatchPartial, and so on).
These modes only determine which results match a user’s query; they have no effect on how the results are ranked after the matches have been found. Therefore, the Phrase module works as described in this section regardless of search mode. The one exception to this rule is MatchBoolean: Phrase, like the other relevance ranking modules, is never applied to the results of MatchBoolean queries.
If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.
When using the Phrase module, stop words are always treated like non-stop word terms and stratified accordingly.
For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).
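The stop-word example can be checked with a short sketch. It assumes subphrasing is enabled and uses Python's standard `difflib.SequenceMatcher.find_longest_match` to find the longest shared run of terms. `STOP_WORDS` and `drop_stop_words` are hypothetical and exist only to contrast with what the module actually does.

```python
from difflib import SequenceMatcher

# HYPOTHETICAL stop-word list, used only to show what the Phrase module
# does NOT do: it never removes stop words before stratifying.
STOP_WORDS = {"and"}


def longest_run(query: list[str], field: list[str]) -> int:
    """Size of the longest block of consecutive terms shared by both lists."""
    matcher = SequenceMatcher(None, query, field, autojunk=False)
    return matcher.find_longest_match(0, len(query), 0, len(field)).size


def drop_stop_words(terms: list[str]) -> list[str]:
    return [t for t in terms if t not in STOP_WORDS]


if __name__ == "__main__":
    query = "raining cats and dogs".split()
    for text in ["fat cats and hungry dogs", "fat cats and dogs"]:
        field = text.split()
        kept = longest_run(query, field)     # Phrase module behavior
        dropped = longest_run(drop_stop_words(query), drop_stop_words(field))
        print(f"{text!r}: rank {kept} (would be {dropped} if stop words were removed)")
```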
An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match.
(In other words, matches created by concatenating fields are not considered by the Phrase module.)
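The per-field rule, together with the longest-match rule for records with several matches, can be sketched as follows. This Python example assumes a hypothetical record representation (a dict from field name to term list) and uses a standard longest-common-substring dynamic program over terms; `longest_run` and `record_stratum` are illustrative names only.

```python
def longest_run(query: list[str], field: list[str]) -> int:
    """Longest run of consecutive query terms found consecutively in one field."""
    best = 0
    prev = [0] * (len(field) + 1)   # prev[j]: run length ending at the previous query term and field[j-1]
    for q in query:
        cur = [0] * (len(field) + 1)
        for j, f in enumerate(field, start=1):
            if q == f:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def record_stratum(query: list[str], fields: dict[str, list[str]]) -> int:
    """Score each field on its own and keep the longest subphrase match."""
    return max((longest_run(query, terms) for terms in fields.values()), default=0)


if __name__ == "__main__":
    query = "fax cover sheets".split()
    record = {"title": ["fax", "cover"], "description": ["sheets", "for", "offices"]}
    print(record_stratum(query, record))
```

The record above scores 2, not 3: "fax cover" in the title and "sheets" in the description are never joined into one phrase across the field boundary.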
The Phrase module translates each wildcard in a query into a generic placeholder for a single term.
For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.
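The wildcard translation can be sketched in a few lines. This Python example assumes any term containing "*" is a wildcard term and that results are plain lists of terms; `to_generic` and `matches_phrase` are hypothetical helpers, not part of any Oracle Commerce API.

```python
def to_generic(terms: list[str]) -> list[str]:
    """Replace every wildcard term (for example "w*") with the generic "*" placeholder."""
    return ["*" if "*" in t else t for t in terms]


def matches_phrase(pattern: list[str], field: list[str]) -> bool:
    """True if the field contains the pattern, where "*" stands for exactly one term."""
    n = len(pattern)
    for i in range(len(field) - n + 1):
        window = field[i:i + n]
        if all(p == "*" or p == f for p, f in zip(pattern, window)):
            return True
    return False


if __name__ == "__main__":
    pattern = to_generic("sparkling w* wine".split())
    print(pattern)
    print(matches_phrase(pattern, "sparkling refreshing wine".split()))
```

The window length always equals the pattern length, so each placeholder consumes exactly one term: "sparkling wine" and "sparkling cold white wine" do not match.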
When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the Phrase module’s point of view.
Consider the search query “sparkling w* wine” with the MatchAny mode enabled. In MatchAny mode, search results only need to contain one of the requested terms to be valid, so a list of search results for this query could contain phrases that look like this:
sparkling white wine
sparkling refreshing wine
sparkling wet wine
sparkling soda
wine cooler
When phrase relevance ranking is applied to these search results, the Phrase module looks for matches to “sparkling * wine,” not “sparkling w* wine.” Therefore, three results (“sparkling white wine,” “sparkling refreshing wine,” and “sparkling wet wine”) are considered phrase matches for the purposes of ranking. These results are placed in the first stratum. The other two results are placed in the second stratum.
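The two-stratum outcome for these five results can be reproduced with a small sketch. It treats each result as a single space-separated field and uses Python's `re` module as a stand-in for the engine's positional matching. `phrase_regex` and `stratify` are hypothetical helpers.

```python
import re


def phrase_regex(query: str) -> re.Pattern:
    """Compile the generic form of a wildcard query into a regex over
    space-separated terms; each wildcard term matches exactly one term."""
    parts = [r"\S+" if "*" in term else re.escape(term) for term in query.split()]
    return re.compile(r"(?:^|\s)" + r"\s+".join(parts) + r"(?:\s|$)")


def stratify(query: str, results: list[str]) -> tuple[list[str], list[str]]:
    """Split results into (phrase matches, everything else), keeping order."""
    pattern = phrase_regex(query)
    first = [r for r in results if pattern.search(r)]
    second = [r for r in results if not pattern.search(r)]
    return first, second


if __name__ == "__main__":
    results = ["sparkling white wine", "sparkling refreshing wine",
               "sparkling wet wine", "sparkling soda", "wine cooler"]
    first, second = stratify("sparkling w* wine", results)
    print("first stratum:", first)
    print("second stratum:", second)
```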
When subphrasing is enabled, the behavior becomes a bit more complex. Again, we have to remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):
cold
sparkling *
* wine
cold sparkling *
sparkling * wine
cold sparkling * wine
Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term. Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases.
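The breakdown rule can be sketched as a filter over the ordinary subphrases. This Python example is inferred from the example list for “cold sparkling w* wine” rather than from a documented algorithm: it drops any subphrase whose immediate neighbor in the query is a wildcard, and it also assumes (because the list omits it) that a wildcard on its own is not a subphrase. `wildcard_subphrases` is a hypothetical name.

```python
def wildcard_subphrases(terms: list[str]) -> list[str]:
    """List the subphrases of a generic wildcard query, dropping those that
    an adjacent wildcard always extends by one more term."""
    n = len(terms)
    kept = []
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            sub = terms[start:start + length]
            if all(t == "*" for t in sub):
                # ASSUMPTION: a wildcard on its own is not listed as a subphrase.
                continue
            before = terms[start - 1] if start > 0 else None
            after = terms[start + length] if start + length < n else None
            if before == "*" or after == "*":
                # Subsumed: it always matches together with the neighboring wildcard.
                continue
            kept.append(" ".join(sub))
    return kept


if __name__ == "__main__":
    for sp in wildcard_subphrases(["cold", "sparkling", "*", "wine"]):
        print(sp)
```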
As with regular subphrasing, stratification is based on the number of terms in the subphrase, and the wildcard placeholders count toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. Again, this is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.”
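Scoring against the wildcard subphrases can be sketched by reusing the six subphrases listed above for “cold sparkling w* wine.” This Python example assumes each result is a single list of terms and that the score is the term count, placeholders included, of the longest subphrase found in it. `contains_pattern` and `wildcard_score` are hypothetical helpers.

```python
# The six subphrases listed above for "cold sparkling w* wine".
SUBPHRASES = ["cold", "sparkling *", "* wine",
              "cold sparkling *", "sparkling * wine", "cold sparkling * wine"]


def contains_pattern(pattern: list[str], field: list[str]) -> bool:
    """True if the field contains the pattern; "*" matches any single term."""
    n = len(pattern)
    for i in range(len(field) - n + 1):
        if all(p == "*" or p == f for p, f in zip(pattern, field[i:i + n])):
            return True
    return False


def wildcard_score(field: list[str]) -> int:
    """Term count of the longest matching subphrase; placeholders count as terms."""
    return max((len(sp.split()) for sp in SUBPHRASES
                if contains_pattern(sp.split(), field)), default=0)


if __name__ == "__main__":
    for text in ["a cold drink", "sparkling white", "sparkling soda",
                 "cold sparkling soda"]:
        print(text, wildcard_score(text.split()))
```

“sparkling white” and “sparkling soda” both land in the same stratum because each matches “sparkling *”, regardless of which term fills the placeholder.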
Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Likewise, the phrase “sparkling cold white wine” is not a match, because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, for them to be considered phrase matches by the Phrase module.
Keep the following points in mind when using the Phrase module.
If a query contains only one word, then that word constitutes the entire phrase, and all of the results that contain it are put into one stratum (score = 1). The module can still rank the results into two strata, however: one for records that contain the phrase and a lower-ranking stratum for records that do not.
Because of the way hyphenated words are positionally indexed, Oracle recommends that you enable subphrase if your results contain hyphenated words.