Configure the Ranking of Records in Search Results
You can determine the order in which records appear in search results by configuring a relevance ranking strategy.
This section applies to Open Storefront Framework
(OSF).
The relevance ranking strategy is an ordered list of one or more relevance ranking modules, each of which uses different criteria to sort the records in the search results.
Note:
Retail Digital Commerce includes a basic default relevancy
ranking strategy. Modify this default to order search
results in a way that is helpful to your shoppers. Successful strategies
typically include the modules Phrase, MaxField, Glom, Static, and WFreq.
This section covers the following topics:
Understand how Relevance Ranking Modules sort Results
Relevance ranking modules are applied in the order in which they are listed in the relevance ranking strategy. For example, the following JSON configuration of a relevance ranking strategy:
"relRankStrategy": "exact(considerFieldRanks),glom,static(quantity_sold,descending)"
invokes the modules named exact, glom, and static, in that order.
The first module applies its criteria to sort records into various strata. Each stratum contains records that have the same relevance ranking according to the first module’s criteria. The next module sorts the records in each stratum into substrata according to its own criteria; each substratum contains records of the same relevance ranking.
The sorting continues in this fashion until all modules have been applied, or until there are no further ties among records. If any ties remain after all modules have been invoked, the ties are resolved by a default sorting rule.
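The stratum-by-stratum sorting described above behaves like a sort on a tuple of module scores. The following is a minimal sketch, not the product's implementation: the `rank` function, the record fields, and the scores are hypothetical stand-ins for real module output.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Module = Callable[[Record], int]  # assumption: a higher score means more relevant

def rank(records: List[Record], modules: List[Module]) -> List[Record]:
    """Sort by the first module's score; each later module only breaks ties."""
    # Tuples compare element by element, which mirrors strata and substrata.
    return sorted(records, key=lambda r: tuple(-m(r) for m in modules))

# Hypothetical scores for three records.
records = [
    {"id": "A", "phrase": 1, "sold": 10},
    {"id": "B", "phrase": 2, "sold": 5},
    {"id": "C", "phrase": 1, "sold": 30},
]
ordered = rank(records, [lambda r: r["phrase"], lambda r: r["sold"]])
print([r["id"] for r in ordered])  # ['B', 'C', 'A']
```

Record B wins on the first module alone. Records A and C tie there, so the second module decides their order.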
The field and maxfield modules take into account the priority of records. A record’s priority
corresponds to the position in the search interface’s fields array of the member that the record matches. Records matched to
members closer to the beginning of the fields array
have higher priority than records matched to members closer to the
end. If a record matches more than one member, its priority is based
on the member that is closest to the beginning of the fields array.
The exact, first, nterms, and proximity modules can optionally
take a parameter named considerFieldRanks. The considerFieldRanks parameter indicates that the module should
further sort records according to their priority, after the module
has sorted records according to its own criteria. In the relevance
ranking strategy, the parameter is specified in parentheses after
the module name; for example:
"relRankStrategy": "exact(considerFieldRanks),glom,static(quantity_sold,descending)"
For information about the relevance ranking modules, see Understand each relevance ranking module in detail.
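To check a strategy string before importing it, it can help to split it into modules and parameters. The sketch below is an illustration only: `parse_strategy` is a hypothetical helper, and the grammar it assumes (a name, optionally followed by comma-separated parameters in parentheses) is inferred from the examples in this section.

```python
import re
from typing import List, Tuple

def parse_strategy(strategy: str) -> List[Tuple[str, List[str]]]:
    """Split a relRankStrategy value into (module, parameters) pairs, in order."""
    modules = []
    # A module name, optionally followed by a parenthesized parameter list.
    for name, params in re.findall(r"(\w+)(?:\(([^)]*)\))?", strategy):
        modules.append((name, [p.strip() for p in params.split(",") if p.strip()]))
    return modules

parsed = parse_strategy("exact(considerFieldRanks),glom,static(quantity_sold,descending)")
for name, params in parsed:
    print(name, params)
```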
Understand which modules are more useful for commerce applications
The most commonly used modules in commerce applications are as follows:
- phrase (all options turned on)
- glom
- maxfield
- static
The following modules are less commonly used:
- field
- wfreq
The following modules are not ordinarily useful in commerce applications:
- first
- interp
- nterms
- proximity
- stem
- thesaurus
Oracle recommends against the use of the following modules:
- exact, because of its effect on performance. Use phrase instead.
- freq, because it adds up the counts of every word, affecting performance.
Configure your Relevance Ranking Strategy
You configure your relevance ranking strategy by exporting the entire search configuration for the cloud application, editing the part of the exported configuration that applies to the relevance ranking strategy, and then re-importing the entire search configuration. To do this, follow these steps:
- Issue the following GET command, which exports the entire search configuration for the cloud application in a ZIP file:
GET /gsadmin/v1/cloud/content/system/rankingRules/defaultStandardSearch
For more information about the GET endpoint, see Export all configuration in ZIP format.
- Back up the ZIP file before opening it or extracting any of its contents.
Important:
When you re-import the edited search configuration, you overwrite all existing search configuration. For this reason, it is important to keep a back-up of the original configuration.
- Unzip the ZIP file and extract the JSON file containing the search configuration.
- Open the JSON file containing the search configuration and find the relRankStrategy attribute of the resultsList object, in the contentItem object named “Guided Search Service”.
Note: If the resultsList object or the relRankStrategy attribute is not defined, you must add them in the location shown in this example.
- Edit the value of the relRankStrategy attribute to specify the relevance ranking modules that you want the strategy to comprise. The order in which you specify the modules is significant. For more information, see Understand how relevance ranking modules sort results.
- Zip up the entire search configuration, including the edits that you made to the configuration of the relevance ranking strategy.
- Initiate the following POST command, which imports the search configuration in the ZIP file:
POST /gsadmin/v1/cloud.zip
- If your relevance ranking strategy does not produce the results that you intended, make a copy of your backed-up search configuration, edit the relevance ranking strategy in the copy to produce the intended results, and import the copy.
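The editing step can be scripted once the JSON file has been extracted. The sketch below is only an illustration under stated assumptions: the sample JSON is a made-up fragment, not the real exported layout, and `find_results_list` and `set_strategy` are hypothetical helpers. It locates the first resultsList object wherever it is nested and sets its relRankStrategy attribute.

```python
import json
from typing import Any, Optional

def find_results_list(node: Any) -> Optional[dict]:
    """Depth-first search for the first object stored under a "resultsList" key."""
    if isinstance(node, dict):
        if isinstance(node.get("resultsList"), dict):
            return node["resultsList"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_results_list(child)
        if found is not None:
            return found
    return None

def set_strategy(config_text: str, strategy: str) -> str:
    """Return the configuration text with relRankStrategy replaced."""
    config = json.loads(config_text)
    results_list = find_results_list(config)
    if results_list is None:
        # The procedure above says to add the object by hand in this case.
        raise ValueError("no resultsList object found")
    results_list["relRankStrategy"] = strategy
    return json.dumps(config, indent=2)

# Hypothetical fragment; a real exported file contains more than this.
sample = '{"contentItem": {"name": "Guided Search Service", "resultsList": {}}}'
updated = json.loads(set_strategy(sample, "phrase,glom,maxfield"))
print(updated["contentItem"]["resultsList"]["relRankStrategy"])
```

Run a script like this on a copy of the extracted file, never on the backup.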
Understand each Relevance Ranking Module in Detail
This section contains detailed descriptions of the relevance ranking modules.
exact Module
The exact module groups results into three strata based on how
well they match the query string. This includes the following:
- The highest stratum contains results whose complete text matches the user’s query exactly.
- The middle stratum contains results that contain the user’s query but are not an exact match.
- The lowest stratum contains all other types of matches, such as matches that would not be matches without synonyms.
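The three strata can be pictured with a small classifier. This is a sketch under assumptions that the text does not state: whitespace tokenization, case-insensitive comparison, and "contains the query" read as containing the query terms contiguously and in order. The function name `exact_stratum` is hypothetical.

```python
def exact_stratum(field_text: str, query: str) -> int:
    """Return 0 (highest), 1 (middle), or 2 (lowest) for one field value."""
    text = field_text.lower().split()
    terms = query.lower().split()
    if text == terms:
        return 0  # the complete text matches the query exactly
    for i in range(len(text) - len(terms) + 1):
        if text[i:i + len(terms)] == terms:
            return 1  # the text contains the query but has extra words
    return 2  # every other kind of match, for example a synonym match

print(exact_stratum("Red Shoes", "red shoes"))           # 0
print(exact_stratum("Bright red shoes", "red shoes"))    # 1
print(exact_stratum("Crimson footwear", "red shoes"))    # 2
```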
The exact module can optionally be specified
with the considerFieldRanks parameter, as follows:
exact(considerFieldRanks)
Specifying this
parameter causes the exact module to sort records
according to their priorities in your search interface after it has
sorted them according to its own criteria.
Important:
The exact module is computationally expensive,
especially on large text fields. It is intended for use only on small
text fields (such as dimension values or small property values such
as part IDs). Use of this module on large text fields can result in very
poor performance and/or application failures due to request timeouts.
The phrase module, with and without approximation
turned on, does similar but less complex ranking that can be used
as a higher performance substitute.
field Module
The field module ranks records according to their priority in
your search interface. A record’s rank is the priority of the search
interface member that the record matches. If a record matches more
than one member, the record’s rank corresponds to the highest priority
among the matching members.
For example, suppose that:
- Record A matches the third, sixth, and eighth members in the fields array of your search interface.
- Record B matches the first, fourth, and seventh members.
The earliest member that Record B matches is the first member, and the earliest member that Record A matches is the third member. As a result, Record B has the higher priority and appears before Record A in the search results.
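The comparison in this example reduces to taking the earliest matched position for each record. A minimal sketch, assuming 1-based positions in the fields array and a hypothetical `priority` helper:

```python
from typing import Iterable

def priority(matched_positions: Iterable[int]) -> int:
    """Earliest fields-array position that a record matches; smaller ranks higher."""
    return min(matched_positions)

matches = {"Record A": [3, 6, 8], "Record B": [1, 4, 7]}
ordered = sorted(matches, key=lambda name: priority(matches[name]))
print(ordered)  # ['Record B', 'Record A']
```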
first Module
Note:
The first module is not commonly used in commerce
applications.
Designed primarily for use with unstructured data, the first module ranks documents by how close the query terms are to the beginning of the document. The first module groups its results into strata of different sizes. The strata are not the same size, because while the first word is probably more relevant than the tenth word, the 301st is probably not significantly more relevant than the 310th word. This module assumes that the closer a word is to the beginning of a document, the more likely it is to be relevant.
The first module works as follows:
When
the query has a single term, the first module retrieves
the first absolute position of the word in the document, then calculates
which stratum contains that position. The score for this document
is based upon that stratum; earlier strata are better than later strata.
When the query has multiple terms, first determines
the first absolute position for each of the query terms, and then
calculates the median position. This median is treated as the position
of this query in the document and can be used with stratification
as described in the single word case.
With query expansion (using
stemming or the thesaurus), the first module treats
expanded terms as if they occurred in the source query. For example,
the phrase glucose intloerance would be corrected to glucose intolerance
(with intloerance spell-corrected to intolerance). The first module then continues
as it does in the non-expansion case. The first position of each term
is computed and the median of these is taken.
In a partially
matched query, where only some of the query terms cause a document
to match, first behaves as if the intersection of
terms that occur in the document and terms that occur in the original
query were the entire query. For example, if the query cat bird dog
is partially matched to a document on the terms cat and bird, then
the document is scored as if the query were cat bird. If no terms
match, then the document is scored in the lowest stratum.
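The position calculation for single-term, multi-term, and partially matched queries can be sketched as follows. This is an approximation with assumptions: positions are 0-based word offsets, the median of an even number of positions is the mean of the two middle values, and the mapping from a position to a stratum is left out because the text does not give the stratum boundaries. `first_position` is a hypothetical name.

```python
from statistics import median
from typing import Optional

def first_position(document: str, query: str) -> Optional[float]:
    """Median of the first positions of the query terms found in the document."""
    words = document.lower().split()
    # Terms missing from the document are dropped, as for a partial match.
    firsts = [words.index(t) for t in query.lower().split() if t in words]
    return median(firsts) if firsts else None

doc = "the cat chased a bird up the tree"
print(first_position(doc, "cat"))           # 1
print(first_position(doc, "cat bird dog"))  # 2.5
```

A result of None corresponds to the case in which no terms match.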
The first module is supported for wildcard queries.
The first module can optionally be specified with the considerFieldRanks parameter. Specifying this parameter
causes the first module to sort records according to their priorities
in your search interface after it has sorted them according to its
own criteria.
freq Module
The freq (frequency) module provides result scoring based on
the number of occurrences of the user’s query terms in the result
text.
Results with more occurrences of the user search terms are considered more relevant.
The score produced by the freq module for a result record is the sum of the frequencies
of all user search terms in all fields (properties or dimensions in
the search interface in question) that match a sufficient number of
terms. The number of terms depends on the match mode, such as all
terms in a MatchAll query, a sufficient number of
terms in a MatchPartial query, and so on. Cross-field
match records are assigned a score of zero. Total scores are capped
at 1024; in other words, if the sum of frequencies of the user search
terms in all matching fields is greater than or equal to 1024, the
record gets a score of 1024 from the freq module.
For example, suppose we have the following record:
{Title="test record", Abstract="this is a test", Text="one test this is"}
A MatchAll search for “test this” causes freq to assign a score of 4, because this and test occur
a total of 4 times in the fields that match all search terms (Abstract and Text, in this case). The number
of phrase occurrences (just one in the Text field)
does not matter, only the sum of the individual word occurrences.
Also note that the occurrence of test in the Title field does not contribute to the score, since that field did not
match all of the terms.
A MatchAll search for
one record would hit this record, assuming that cross-field matching
was enabled. But the record would get a score of zero from freq, because no single field matches all of the terms. The freq module ignores matches due to query expansion (that is, such
matches are given a rank of 0).
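The two scores in this example can be reproduced with a short calculation. The sketch covers MatchAll only and assumes whitespace tokenization; `freq_score` is a hypothetical name, and query expansion is not modeled.

```python
from typing import Dict

def freq_score(record: Dict[str, str], query: str, cap: int = 1024) -> int:
    """Sum term occurrences over the fields that contain every query term."""
    terms = query.lower().split()
    total = 0
    for text in record.values():
        words = text.lower().split()
        if all(t in words for t in terms):  # the field satisfies MatchAll by itself
            total += sum(words.count(t) for t in terms)
    return min(total, cap)  # scores are capped at 1024

record = {"Title": "test record", "Abstract": "this is a test", "Text": "one test this is"}
print(freq_score(record, "test this"))   # 4
print(freq_score(record, "one record"))  # 0
```

The second query is a cross-field match: no single field contains both terms, so nothing is counted.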
glom Module
The glom module ranks single-field matches ahead
of cross-field matches and also ahead of non-matches (records that
do not contain the search term). It serves as a useful tie-breaker
function in combination with the maxfield module
and is commonly used in commerce applications.
If you want a
strategy that ranks single-field matches first, cross-field matches
second, and no matches third, then use the glom module
followed by the nterms module. glom treats all matches the same, whether or not they are due to synonyms
or other forms of query expansion.
The glom module considers a single-field match to be one in which a single
field has enough terms to satisfy the conditions of the match mode.
For this reason, in MatchAny search mode, cross-field
matches are impossible, because a single term is sufficient to create
a match. Every match is considered to be a single-field match, even
if there were several search terms.
For MatchPartial search mode, if the required number of matches is two, the glom module considers a record to be a single-field match
if it has at least one field that contains two or more of the search
terms. You cannot rank results based on how many terms match within
a single field.
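The single-field test described for each match mode comes down to one threshold. In this sketch the `required` argument is an assumption about how the match mode is expressed: the number of query terms for MatchAll, 1 for MatchAny, and the configured minimum for MatchPartial. `is_single_field_match` is a hypothetical name.

```python
from typing import Dict, List

def is_single_field_match(record: Dict[str, str], terms: List[str], required: int) -> bool:
    """True if at least one field alone contains enough of the query terms."""
    for text in record.values():
        words = text.lower().split()
        if sum(t.lower() in words for t in terms) >= required:
            return True
    return False

record = {"Title": "red wool sweater", "Color": "blue"}
print(is_single_field_match(record, ["red", "sweater"], 2))   # True
print(is_single_field_match(record, ["blue", "sweater"], 2))  # False, cross-field only
print(is_single_field_match(record, ["blue", "sweater"], 1))  # True under MatchAny
```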
interp Module
The interp (interpreted) module assigns a score to each result
record based on the query processing techniques used to obtain the
match. These matching techniques include partial matching, cross-attribute
matching, thesaurus, and stemming matching.
Specifically, the Interpreted module ranks results as follows:
- All non-partial matches are ranked ahead of all partial matches.
- Within the above strata, all single-field matches are ranked ahead of all cross-field matches.
- Within the above strata, all thesaurus matches are ranked below all non-thesaurus matches.
- Within the above strata, all stemming matches are ranked below all non-stemming matches.
Note: Because the interp module comprises
the matching techniques of the spell, glom, stem, and thesaurus modules,
there is no need to add them to your relevance ranking strategy if
you are using interp.
proximity Module
The proximity module ranks how close the query terms
are to each other in a document by counting the number of intervening
words. It is designed primarily for use with unstructured data.
Like the first module, the proximity module groups
its results into variable sized strata, because the difference in
significance of an interval of one word and one of two words is usually
greater than the difference in significance of an interval of 21 words
and 22. If no terms match, the document is placed in the lowest stratum.
Single words and phrases get assigned to the best stratum because
there are no intervening words. When the query has multiple terms, proximity behaves as follows:
- All of the absolute positions for each of the query terms are computed.
- The smallest range that includes at least one instance of each of the query terms is calculated. This range’s length is given in number of words. The score for each document is the stratum that contains the difference of the range’s length and the number of terms in the query; smaller differences are better than larger differences.
Under query expansion (that is, stemming and the thesaurus), the expanded terms are treated as if they were in the query, so the proximity metric is computed using the locations of the expanded terms in the matching document.
For example, if a user searches for
“big cats” and a document contains the sentence, “Big Bird likes his
cat” (stemming takes cats to cat), then the proximity metric is computed
just as if the sentence were, “Big Bird likes his cats.”
The proximity module scores partially matched queries as if
the query contains only the matching terms. For example, if a user
searches for “cat dog fish” and a document is partially matched that
contains only cat and fish, then the document is scored as if the
query “cat fish” had been entered.
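The range calculation can be sketched with a brute-force window search. This is an illustration only: it assumes whitespace tokenization, does no stemming (so the document below already contains "cats"), and stops at the difference value because the stratum boundaries are not given in the text. `proximity_gap` is a hypothetical name.

```python
from typing import Optional

def proximity_gap(document: str, query: str) -> Optional[int]:
    """Length of the smallest range covering every matched term, minus the term count."""
    words = document.lower().split()
    # Partial match: score as if the query held only the terms that occur.
    terms = {t for t in query.lower().split() if t in words}
    if not terms:
        return None
    best = len(words)
    for start, word in enumerate(words):
        if word not in terms:
            continue
        seen = set()
        for end in range(start, len(words)):
            if words[end] in terms:
                seen.add(words[end])
            if seen == terms:
                best = min(best, end - start + 1)
                break
    return best - len(terms)

print(proximity_gap("big bird likes his cats", "big cats"))  # 3
print(proximity_gap("my cat ate the fish", "cat dog fish"))  # 2
print(proximity_gap("big bird likes his cats", "bird"))      # 0
```

A single term always yields 0, which matches the statement that single words are assigned to the best stratum.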
Note:
The proximity module does not work with Boolean searches, cross-field matching,
or wildcard search. It assigns all such matches a score of zero.
maxfield Module
This module ranks based on field priority and gives equal weight to cross-field matches.
The maxfield (Maximum
Field) module behaves in the same way as the field module, except
in how it scores cross-field matches. Unlike field, which assigns a static score to cross-field matches, maxfield selects the score of the highest-ranked field that contributed to
the match.
nterms Module
The nterms (number of terms) module assigns rank
based on the number of terms that it finds.
The nterms module ranks matches according to how many query terms they match.
For example, in a three-word query, results that match all three words
will be ranked above results that match only two, which will be ranked
above results that match only one, which will be ranked above results
that had no matches.
numfields Module
The numfields (number of fields) module ranks results
based on the number of fields in the associated search interface in
which a match occurs.
Note that only whole-field matches are counted, not cross-field matches. A result that matches two fields matches each of those fields completely, while a cross-field match typically does not match any field completely.
numfields treats all matches the same, whether or not they are due to query
expansion. The numfields module is only useful in
conjunction with record search operations.
phrase Module
The phrase module states that results containing the user’s
query as an exact phrase, or a subset of the exact phrase, should
be considered more relevant than matches simply containing the user’s
search terms scattered throughout the text.
Records that have the phrase are ranked higher than records which do not contain the phrase.
The phrase module has a variety of
options that you use to customize its behavior. The phrase options
are as follows:
- Rank based on length of subphrases
- Use approximate subphrase/phrase matching
- Apply spell correction, thesaurus, and stemming
The options, including considerFieldRanks, are specified in parentheses after the module name.
Ranking based on length of subphrases
When you configure the phrase module, you have the option of enabling subphrasing.
Subphrasing ranks results based on the length of their subphrase matches. In other words, results that match three terms are considered more relevant than results that match two terms, and so on. A subphrase is defined as a contiguous subset of the query terms the user entered, in the order that he or she entered them. For example, the query “fax cover sheets” contains the subphrases “fax”, “cover”, “sheets”, “fax cover”, “cover sheets”, and “fax cover sheets”, but not “fax sheets”.
Content contained inside nested quotes in a phrase is treated as one term. For example, consider the following phrase:
the question is “to be or not to be”
The quoted text (“to be or not to be”) is treated as one query term, so this example consists of four query terms even though it has a total of nine words.
When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.
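The definition of a subphrase as a contiguous, in-order subset of the query terms can be checked by enumeration. A minimal sketch with a hypothetical `subphrases` helper; a quoted section would have to be passed in as a single term.

```python
from typing import List

def subphrases(terms: List[str]) -> List[str]:
    """Every contiguous run of query terms, in the order entered."""
    return [
        " ".join(terms[i:j])
        for i in range(len(terms))
        for j in range(i + 1, len(terms) + 1)
    ]

result = subphrases(["fax", "cover", "sheets"])
print(result)
# ['fax', 'fax cover', 'fax cover sheets', 'cover', 'cover sheets', 'sheets']
```

"fax sheets" never appears, because its terms are not adjacent in the query.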
Using approximate matching
Approximate matching provides higher-performance matching, as compared to the standard phrase module, with somewhat less exact results.
With approximate matching enabled, the phrase module looks for phrase matches in a limited number of positions in each result, rather than all the positions. Only this limited number of possible occurrences is considered, regardless of whether there are later occurrences that are better, more relevant matches.
The approximate setting is appropriate in cases where the runtime performance of the standard phrase module is inadequate because of large result contents and/or high site load.
Applying thesaurus and stemming
Applying thesaurus and stemming adjustments to the original phrase is generically known as query expansion.
With query expansion enabled, the phrase module ranks results that match a phrase’s expanded forms in the same stratum as results that match the original phrase. Consider the following example:
- A thesaurus entry exists that expands “US” to “United States”.
- The user queries for “US government”.
The query “US government” is expanded to “United States government” for matching purposes, but the phrase module gives a score of two to any results matching “United States government” because the original, unexpanded version of the query, “US government”, only had two terms.
Summary of phrase option interactions
The three configuration settings for the phrase module can be used in a variety of combinations for different effects. The following table summarizes the behavior of each combination.
| Subphrase | Approximate | Expansion | Description |
|---|---|---|---|
| Off | Off | Off | Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not. |
| Off | Off | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. |
| Off | On | Off | Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record. |
| Off | On | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record. |
| On | Off | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. |
| On | Off | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Extend subphrases to facilitate matching but rank based on the length of the original subphrase (before extension). Note: This combination can have a negative performance impact on query throughput. |
| On | On | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Look only at the first possible phrase match within each record. |
| On | On | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expand the query to facilitate matching but rank based on the length of the original subphrase (before extension). Look only at the first possible phrase match within each record. |

Note: You should only use one phrase module in any given search interface and set all of your options in it.
Results with multiple matches
If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.
Stop words and phrase behavior
When using the phrase module, stop words are always treated like non-stop word terms and
stratified accordingly.
For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).
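The ranks in this example are the length of the longest run of query terms that appears contiguously in the result. The sketch below assumes whitespace tokenization and treats stop words like any other term, as described; `longest_subphrase_match` is a hypothetical name.

```python
def longest_subphrase_match(query: str, text: str) -> int:
    """Length, in terms, of the longest query subphrase found in the text."""
    q = query.lower().split()
    w = text.lower().split()
    best = 0
    for i in range(len(q)):
        for j in range(len(w)):
            k = 0
            # Extend the run while query and text keep agreeing term by term.
            while i + k < len(q) and j + k < len(w) and q[i + k] == w[j + k]:
                k += 1
            best = max(best, k)
    return best

print(longest_subphrase_match("raining cats and dogs", "fat cats and hungry dogs"))  # 2
print(longest_subphrase_match("raining cats and dogs", "fat cats and dogs"))         # 3
```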
Cross-field matches and phrase behavior
An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match. (In other words, matches created by concatenating fields are not considered by the phrase module.)
Treatment of wildcards with the phrase module
The phrase module translates each wildcard in a query into a generic placeholder for a single term.
Note:
Only the asterisk (*) is supported as a wildcard.
For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.
When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the phrase module’s point of view.
Consider the search
query “sparkling w* wine” with the MatchAny mode
enabled. In MatchAny mode, search results only need
to contain one of the requested terms to be valid, so a list of search
results for this query could contain phrases that look like this:
sparkling white wine
sparkling refreshing wine
sparkling wet wine
sparkling soda
wine cooler
When phrase relevance ranking is applied to these search results, the phrase module looks for matches to “sparkling * wine”, not “sparkling w* wine.” Therefore, there are three results (“sparkling white wine,” “sparkling refreshing wine,” and “sparkling wet wine”) that are considered phrase matches for the purposes of ranking.
These results are placed in the first stratum. The other two results are placed in the second stratum.
When subphrasing is enabled, the behavior becomes a bit more complex. Again, we have to remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):
cold
sparkling *
* wine
cold sparkling *
sparkling * wine
cold sparkling * wine
Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term.
Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases.
As with regular subphrasing, stratification is based on the number of terms in the subphrase, and the wildcard placeholders count toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. Again, this is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.”
Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Likewise, the phrase “sparkling cold white wine” is not a match because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, for them to be considered phrase matches by the phrase module.
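The one-term-per-wildcard rule can be sketched as a fixed-length window comparison. This illustrates whole-phrase matching only (subphrasing off), assumes whitespace tokenization, and uses a hypothetical `matches_wildcard_phrase` helper.

```python
def matches_wildcard_phrase(pattern: str, text: str) -> bool:
    """True if the text contains the phrase, with each wildcard standing for one term."""
    # Any query term that contains an asterisk becomes a generic placeholder.
    p = ["*" if "*" in term else term for term in pattern.lower().split()]
    w = text.lower().split()
    for i in range(len(w) - len(p) + 1):
        window = w[i:i + len(p)]
        if all(pt == "*" or pt == wt for pt, wt in zip(p, window)):
            return True
    return False

for candidate in ["sparkling white wine", "sparkling refreshing wine",
                  "sparkling wine", "sparkling cold white wine", "sparkling soda"]:
    print(candidate, matches_wildcard_phrase("sparkling w* wine", candidate))
```

"sparkling refreshing wine" counts as a phrase match even though "refreshing" does not start with w, because the module compares against "sparkling * wine".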
static Module
The static module assigns rank based on a configurable sort
key.
The static module assigns a static or
constant data-specific value to each search result, depending on the
type of search operation performed and depending on optional parameters
that can be passed to the module.
For record search operations, the first parameter to the module specifies a property, which will define the sort order assigned by the module. The second parameter can be specified as ascending or descending to indicate the sort order to use for the specified property.
For example, using the module
static(Availability,descending)
sorts the result records in descending order with respect to their
assignments from the Availability property. Using
the module
static(Title,ascending)
sorts the result records in ascending order by their Title property assignments.
In a catalog application, setting the
static module by Price, descending leads to more
expensive products being displayed first.
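For record search, the effect of the two parameters resembles an ordinary sort on one property. A minimal sketch with made-up product records; `static_sort` is a hypothetical helper, not part of the product.

```python
from typing import Dict, List

def static_sort(records: List[Dict], prop: str, order: str = "descending") -> List[Dict]:
    """Order records by a single property, ascending or descending."""
    if order not in ("ascending", "descending"):
        raise ValueError("order must be 'ascending' or 'descending'")
    return sorted(records, key=lambda r: r[prop], reverse=(order == "descending"))

# Hypothetical catalog records.
products = [
    {"Title": "Mug", "Price": 8},
    {"Title": "Kettle", "Price": 35},
    {"Title": "Teapot", "Price": 22},
]
by_price = [p["Title"] for p in static_sort(products, "Price", "descending")]
by_title = [p["Title"] for p in static_sort(products, "Title", "ascending")]
print(by_price)  # ['Kettle', 'Teapot', 'Mug']
print(by_title)  # ['Kettle', 'Mug', 'Teapot']
```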
For dimension search,
the first parameter can be specified as nbins, depth, or rank:
- Specifying nbins causes the static module to sort result dimension values by the number of associated records in the full data set.
- Specifying depth causes the static module to sort result dimension values by their depth in the dimension hierarchy.
- Specifying rank causes dimension values to be sorted by the ranks assigned to them for the application.
stem Module
The stem module ranks matches due to stemming below other kinds
of matches.
The stem module assigns a rank
of 0 to matches from stemming, and a rank of 1 to matches from all other sources.
That is, it ignores all other sorts of query expansion.
thesaurus Module
The thesaurus module ranks matches due to thesaurus
entries below other sorts of matches. It assigns a rank of 0 (the lowest possible
priority) to matches from the thesaurus, and a rank of 1 to matches from all
other sources. That is, it ignores all other sorts of query expansion.
wfreq Module
Like the freq module, the wfreq (weighted frequency) module scores results based on the frequency
of user query terms in the result.
Additionally, the wfreq module weights the individual query term frequencies
for each result by the information content (overall frequency in the
complete data set) of each query term. Less frequent query terms (that
is, terms that would result in fewer search results) are weighted
more heavily than more frequently occurring terms.
Note:
The wfreq module ignores matches due to query expansion; that
is, it assigns the lowest possible priority to records included in
the search results list because of such matches.