Configure the ranking of records in search results

You can determine the order in which records appear in search results by configuring a relevance ranking strategy.

The relevance ranking strategy is an ordered list of one or more relevance ranking modules, each of which uses different criteria to sort the records in the search results.

Note: Oracle Commerce includes a basic default relevance ranking strategy. Modify this default to order search results in a way that is helpful to your shoppers. Successful strategies typically include the phrase, maxfield, glom, static, and wfreq modules.

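This section covers the following topics:

  • Understand how relevance ranking modules sort results
  • Understand which modules are more useful for commerce applications
  • Configure your relevance ranking strategy
  • Understand each relevance ranking module in detail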

Understand how relevance ranking modules sort results

Relevance ranking modules are applied in the order in which they are listed in the relevance ranking strategy. For example, the following JSON format configuration of a relevance ranking strategy,

"relRankStrategy": "exact(considerFieldRanks),glom,static(quantity_sold,descending)"

invokes modules named exact, glom, and static, in that order.
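As an illustration only, the following Python sketch splits a strategy string of this form into module names and parameters. The function name parse_strategy is hypothetical and not part of Oracle Commerce; the sketch assumes that modules are separated by commas and that parameters appear in a single pair of parentheses after the module name, as in the example above.

```python
def parse_strategy(strategy):
    """Split a relRankStrategy string into (module, parameters) pairs, in order."""
    entries, token, depth = [], "", 0
    for ch in strategy + ",":
        # A comma outside parentheses ends the current module entry.
        if ch == "," and depth == 0:
            if token.strip():
                entries.append(token.strip())
            token = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        token += ch
    parsed = []
    for entry in entries:
        name, _, rest = entry.partition("(")
        params = [p.strip() for p in rest.rstrip(")").split(",") if p.strip()]
        parsed.append((name.strip(), params))
    return parsed


strategy = "exact(considerFieldRanks),glom,static(quantity_sold,descending)"
for name, params in parse_strategy(strategy):
    print(name, params)
```

The list order returned by the sketch is the order in which the modules would be applied.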

The first module applies its criteria to sort records into various strata. Each stratum contains records that have the same relevance ranking according to the first module’s criteria. The next module sorts the records in each stratum into substrata according to its own criteria; each substratum contains records of the same relevance ranking.

The sorting continues in this fashion until all modules have been applied, or until there are no further ties among records. If any ties remain after all modules have been invoked, the ties are resolved by a default sorting rule.
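The cascading behavior described above resembles a multi-key sort. The following Python sketch is a conceptual model only, not the engine's implementation: the per-module scores are made-up values, higher scores are assumed to rank first, and Python's stable sort stands in for the default rule that resolves remaining ties.

```python
# Hypothetical per-module scores for four records (higher is better).
records = [
    {"id": "A", "phrase": 1, "glom": 1, "quantity_sold": 40},
    {"id": "B", "phrase": 2, "glom": 0, "quantity_sold": 10},
    {"id": "C", "phrase": 1, "glom": 1, "quantity_sold": 75},
    {"id": "D", "phrase": 1, "glom": 0, "quantity_sold": 90},
]


def rank(records, modules):
    """Sort by the first module, then break ties with each later module."""
    def key(record):
        # Negate each score so that higher scores sort first.
        return tuple(-module(record) for module in modules)
    # sorted() is stable, so records that tie on every module keep their
    # input order (a stand-in for the default sorting rule).
    return sorted(records, key=key)


modules = [
    lambda r: r["phrase"],
    lambda r: r["glom"],
    lambda r: r["quantity_sold"],
]
ordered = [r["id"] for r in rank(records, modules)]
print(ordered)
```

In this model, record B leads because it wins on the first module; records A, C, and D tie there, so the second and third modules decide their order.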

The field and maxfield modules take into account the priority of records. A record’s priority corresponds to the position in the search interface’s fields array of the member that the record matches. Records matched to members closer to the beginning of the fields array have higher priority than records matched to members closer to the end. If a record matches more than one member, its priority is based on the member that is closest to the beginning of the fields array.

The exact, first, nterms, and proximity modules can optionally take a parameter named considerFieldRanks. The considerFieldRanks parameter indicates that the module should further sort records according to their priority, after the module has sorted records according to its own criteria. In the relevance ranking strategy, the parameter is specified in parentheses after the module name; for example:

"relRankStrategy": "exact(considerFieldRanks),glom,static(quantity_sold,descending)"

For information about the relevance ranking modules, see Understand each relevance ranking module in detail.

Understand which modules are more useful for commerce applications

The most commonly used modules in commerce applications are as follows:

  • phrase (all options turned on)
  • glom
  • maxfield
  • static

The following modules are less commonly used:

  • field
  • wfreq

The following modules are not ordinarily useful in commerce applications:

  • first
  • interp
  • nterms
  • proximity
  • stem
  • thesaurus

Oracle recommends against the use of the following modules:

  • exact, because of its effect on performance. Use phrase instead.
  • freq, because it adds up the counts of every word, affecting performance.

Configure your relevance ranking strategy

You configure your relevance ranking strategy by exporting the entire search configuration for the cloud application, editing the part of the exported configuration that applies to the relevance ranking strategy, and then re-importing the entire search configuration. To do this, follow these steps:

  1. Issue the following GET command, which exports the entire search configuration for the cloud application in a ZIP file:
    GET /gsadmin/v1/cloud.zip

    For more information about the GET endpoint, see Export all configuration in ZIP format.

  2. Back up the ZIP file before opening it or extracting any of its contents.

    IMPORTANT: When you re-import the edited search configuration, you overwrite all existing search configuration. For this reason, it is important to keep a back-up of the original configuration.

  3. Unzip the ZIP file and extract the JSON file containing the search configuration.
  4. Open the JSON file containing the search configuration and find the relRankStrategy attribute of the resultsList object, in the contentItem object named “Guided Search Service”.

    Note: If the resultsList object or the relRankStrategy attribute is not defined, you must add them in the location shown in this example.

  5. Edit the value of the relRankStrategy attribute to specify the relevance ranking modules that you want the strategy to comprise. The order in which you specify the modules is significant. For more information, see Understand how relevance ranking modules sort results.
  6. Zip up the entire search configuration, including the edits that you made to the configuration of the relevance ranking strategy.
  7. Initiate the following POST command, which imports the search configuration in the ZIP file:
    POST /gsadmin/v1/cloud.zip
  8. If your relevance ranking strategy does not produce the results that you intended, make a copy of your backed-up search configuration, edit the relevance ranking strategy in the copy to produce the intended results, and import the copy.
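The edit in steps 4 and 5 can also be scripted. The following Python sketch is a minimal illustration under stated assumptions: it operates on an already-parsed contentItem object, the "name" key and the function name set_rel_rank_strategy are hypothetical, and the exact layout of the exported JSON file is not shown here and may differ.

```python
import json


def set_rel_rank_strategy(content_item, modules):
    """Set resultsList.relRankStrategy, creating resultsList if it is missing."""
    results_list = content_item.setdefault("resultsList", {})
    # Modules are joined in the order given; order is significant.
    results_list["relRankStrategy"] = ",".join(modules)
    return content_item


# Hypothetical stand-in for the "Guided Search Service" contentItem object.
item = json.loads('{"name": "Guided Search Service"}')
set_rel_rank_strategy(item, ["maxfield", "glom", "static(quantity_sold,descending)"])
print(json.dumps(item, indent=2))
```

Work on a copy of the exported file so that the backed-up configuration stays untouched.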

Understand each relevance ranking module in detail

This section contains detailed descriptions of the relevance ranking modules.

exact module

The exact module groups results into three strata based on how well they match the query string:

  • The highest stratum contains results whose complete text matches the user’s query exactly.
  • The middle stratum contains results that contain the user’s query but are not an exact match.
  • The lowest stratum contains all other types of matches, such as matches that would not be matches without synonyms.

The exact module can optionally be specified with the considerFieldRanks parameter, as follows:

exact(considerFieldRanks)

Specifying this parameter causes the exact module to sort records according to their priorities in your search interface after it has sorted them according to its own criteria.

Important: The exact module is computationally expensive, especially on large text fields. It is intended for use only on small text fields (such as dimension values or small property values such as part IDs). Using this module on large text fields results in very poor performance, application failures due to request timeouts, or both. The phrase module, with or without approximation turned on, performs similar but less complex ranking and can be used as a higher-performance substitute.

field module

The field module ranks records according to their priority in your search interface. A record’s rank is the priority of the search interface member that the record matches. If a record matches more than one member, the record’s rank corresponds to the highest priority among the matching members.

For example, suppose that:

  • Record A matches the third, sixth, and eighth members in the fields array of your search interface.
  • Record B matches the first, fourth, and seventh members.

The earliest member that Record B matches is the first member, and the earliest member that Record A matches is the third member. As a result, Record B has the higher priority and appears before Record A in the search results.
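The comparison in this example can be sketched in Python as follows. This is an illustrative model only; the function name priority is hypothetical, and positions are counted from 1 to mirror "first", "third", and so on.

```python
def priority(matched_positions):
    """Return a record's priority: the earliest matching position in the fields array."""
    # A lower position means a higher priority.
    return min(matched_positions)


# 1-based positions of the search interface members that each record matches.
matches = {
    "Record A": [3, 6, 8],
    "Record B": [1, 4, 7],
}

ordered = sorted(matches, key=lambda name: priority(matches[name]))
print(ordered)
```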

first module

Note: The first module is not commonly used in commerce applications.

Designed primarily for use with unstructured data, the first module ranks documents by how close the query terms are to the beginning of the document. This module assumes that the closer a word is to the beginning of a document, the more likely it is to be relevant. The first module groups its results into strata of different sizes. The strata are not the same size because, while the first word is probably more relevant than the tenth word, the 301st word is probably not significantly more relevant than the 310th word.

The first module works as follows:

When the query has a single term, the first module retrieves the first absolute position of the word in the document, then calculates which stratum contains that position. The score for this document is based upon that stratum; earlier strata are better than later strata.

When the query has multiple terms, first determines the first absolute position for each of the query terms, and then calculates the median position. This median is treated as the position of this query in the document and can be used with stratification as described in the single word case.

With query expansion (using stemming, spelling correction, or the thesaurus), the first module treats expanded terms as if they occurred in the source query. For example, the phrase glucose intloerance would be corrected to glucose intolerance (with intloerance spell-corrected to intolerance). The first module then continues as it does in the non-expansion case: it computes the first position of each term and takes the median of these positions.

In a partially matched query, where only some of the query terms cause a document to match, the first module behaves as if the intersection of terms that occur in the document and terms that occur in the original query were the entire query. For example, if the query cat bird dog is partially matched to a document on the terms cat and bird, then the document is scored as if the query were cat bird. If no terms match, then the document is scored in the lowest stratum.
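The position calculation described above can be sketched in Python. This is a simplified model under stated assumptions: positions are counted from 0, the document is already tokenized into lowercase words, and the mapping from position to stratum is omitted because the stratum boundaries are not specified here. The function name first_position is hypothetical.

```python
from statistics import median


def first_position(query_terms, document_terms):
    """Return the median of the first positions of the query terms that occur."""
    positions = [
        document_terms.index(term)      # first absolute position of the term
        for term in query_terms
        if term in document_terms       # partial match: ignore missing terms
    ]
    if not positions:
        return None                     # no terms match: lowest stratum
    return median(positions)


document = "the cat chased a bird".split()
print(first_position(["cat"], document))
print(first_position(["cat", "bird", "dog"], document))
```

The second call scores the document as if the query were cat bird, because dog does not occur in the document.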

The first module is supported for wildcard queries.

The first module can optionally be specified with the considerFieldRanks parameter. Specifying this parameter causes the first module to sort records according to their priorities in your search interface after it has sorted them according to its own criteria.

freq module

The freq (frequency) module provides result scoring based on the number of occurrences of the user’s query terms in the result text.

Results with more occurrences of the user search terms are considered more relevant.

The score produced by the freq module for a result record is the sum of the frequencies of all user search terms in all fields (properties or dimensions in the search interface in question) that match a sufficient number of terms. The number of terms depends on the match mode, such as all terms in a MatchAll query, a sufficient number of terms in a MatchPartial query, and so on. Cross-field match records are assigned a score of zero. Total scores are capped at 1024; in other words, if the sum of frequencies of the user search terms in all matching fields is greater than or equal to 1024, the record gets a score of 1024 from the freq module.

For example, suppose we have the following record:

{Title="test record", Abstract="this is a test", Text="one test this is"}

A MatchAll search for “test this” causes freq to assign a score of 4, because this and test occur a total of 4 times in the fields that match all search terms (Abstract and Text, in this case). The number of phrase occurrences (just one in the Text field) does not matter, only the sum of the individual word occurrences. Also note that the occurrence of test in the Title field does not contribute to the score, since that field did not match all of the terms.

A MatchAll search for “one record” would hit this record, assuming that cross-field matching was enabled. However, the record would get a score of zero from freq, because no single field matches all of the terms. The freq module ignores matches due to query expansion (that is, such matches are given a rank of 0).
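The two examples above can be reproduced with a short Python sketch. It models only the MatchAll case with simple whitespace tokenization, and the function name freq_score is hypothetical; it is not the module's actual implementation.

```python
record = {
    "Title": "test record",
    "Abstract": "this is a test",
    "Text": "one test this is",
}


def freq_score(record, query_terms, cap=1024):
    """Sum term frequencies over the fields that contain every query term."""
    score = 0
    for text in record.values():
        words = text.lower().split()
        # MatchAll: a field contributes only if it matches all of the terms.
        if all(term in words for term in query_terms):
            score += sum(words.count(term) for term in query_terms)
    return min(score, cap)


print(freq_score(record, ["test", "this"]))    # Abstract and Text contribute
print(freq_score(record, ["one", "record"]))   # cross-field match only
```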

glom module

The glom module ranks single-field matches ahead of cross-field matches and also ahead of non-matches (records that do not contain the search term). It serves as a useful tie-breaker function in combination with the maxfield module and is commonly used in commerce applications.

If you want a strategy that ranks single-field matches first, cross-field matches second, and no matches third, then use the glom module followed by the nterms module. glom treats all matches the same, whether or not they are due to synonyms or other forms of query expansion.

The glom module considers a single-field match to be one in which a single field has enough terms to satisfy the conditions of the match mode. For this reason, in MatchAny search mode, cross-field matches are impossible, because a single term is sufficient to create a match. Every match is considered to be a single-field match, even if there were several search terms.

For MatchPartial search mode, if the required number of matches is two, the glom module considers a record to be a single-field match if it has at least one field that contains two or more of the search terms. You cannot rank results based on how many terms match within a single field.
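The classification described above can be modeled in Python as follows. This is a conceptual sketch: the function name glom_class and the required parameter (the number of terms that the match mode demands) are assumptions made for illustration, and fields are tokenized on whitespace.

```python
def glom_class(fields, query_terms, required):
    """Classify a record as a single-field match, cross-field match, or non-match."""
    matched_anywhere = set()
    for text in fields.values():
        words = set(text.lower().split())
        hits = {term for term in query_terms if term in words}
        if len(hits) >= required:
            return "single-field"       # one field satisfies the match mode
        matched_anywhere |= hits
    if len(matched_anywhere) >= required:
        return "cross-field"            # satisfied only by combining fields
    return "non-match"


fields = {"Title": "test record", "Text": "one test this is"}
print(glom_class(fields, ["test", "this"], required=2))   # MatchAll
print(glom_class(fields, ["one", "record"], required=2))  # MatchAll
print(glom_class(fields, ["one", "record"], required=1))  # MatchAny
```

With required set to 1, as in MatchAny mode, any field that contains a term already satisfies the mode, so the cross-field branch is never reached.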

interp module

The interp (interpreted) module assigns a score to each result record based on the query processing techniques used to obtain the match. These matching techniques include partial matching, cross-attribute matching, thesaurus, and stemming matching.

Specifically, the interp module ranks results as follows:

  1. All non-partial matches are ranked ahead of all partial matches.
  2. Within the above strata, all single-field matches are ranked ahead of all cross-field matches.
  3. Within the above strata, all thesaurus matches are ranked below all non-thesaurus matches.
  4. Within the above strata, all stemming matches are ranked below all non-stemming matches.
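The nested ordering in the list above behaves like a sort on a four-part key. The following Python sketch illustrates the idea; the Match class and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    record_id: str
    partial: bool = False
    cross_field: bool = False
    thesaurus: bool = False
    stemming: bool = False


def interp_key(match):
    # False sorts before True, so each criterion ranks the "plain" match first.
    return (match.partial, match.cross_field, match.thesaurus, match.stemming)


matches = [
    Match("r1", stemming=True),
    Match("r2"),
    Match("r3", partial=True),
    Match("r4", cross_field=True),
    Match("r5", thesaurus=True),
]
ordered = [m.record_id for m in sorted(matches, key=interp_key)]
print(ordered)
```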

Note: Because the interp module comprises the matching techniques of the spell, glom, stem, and thesaurus modules, there is no need to add them to your relevance ranking strategy if you are using interp.

proximity module

The proximity module ranks how close the query terms are to each other in a document by counting the number of intervening words. It is designed primarily for use with unstructured data.

Like the first module, the proximity module groups its results into variable-sized strata, because the difference in significance between an interval of one word and an interval of two words is usually greater than the difference in significance between an interval of 21 words and an interval of 22 words. If no terms match, the document is placed in the lowest stratum.

Single words and phrases get assigned to the best stratum because there are no intervening words. When the query has multiple terms, proximity behaves as follows:

  1. All of the absolute positions for each of the query terms are computed.
  2. The smallest range that includes at least one instance of each of the query terms is calculated. This range’s length is given in number of words. The score for each document is the stratum that contains the difference of the range’s length and the number of terms in the query; smaller differences are better than larger differences.

Under query expansion (that is, stemming and the thesaurus), the expanded terms are treated as if they were in the query, so the proximity metric is computed using the locations of the expanded terms in the matching document.

For example, if a user searches for “big cats” and a document contains the sentence, “Big Bird likes his cat” (stemming takes cats to cat), then the proximity metric is computed just as if the sentence were, “Big Bird likes his cats.”

The proximity module scores partially matched queries as if the query contains only the matching terms. For example, if a user searches for “cat dog fish” and a partially matched document contains only cat and fish, then the document is scored as if the query “cat fish” had been entered.
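The range calculation described above can be sketched in Python. This is a simplified model with several assumptions: the document is already tokenized and lowercased, expanded terms have already been substituted, and the final mapping from the difference to a stratum is omitted because the stratum boundaries are not specified here. The function name proximity_difference is hypothetical.

```python
def proximity_difference(query_terms, document_terms):
    """Return (smallest covering range length) minus (number of matching terms)."""
    # Partial match: score as if the query held only the terms that occur.
    terms = set(query_terms) & set(document_terms)
    if not terms:
        return None                       # no terms match: lowest stratum
    hits = [(i, w) for i, w in enumerate(document_terms) if w in terms]
    best = None
    for start_idx, (start, _) in enumerate(hits):
        seen = set()
        for end, word in hits[start_idx:]:
            seen.add(word)
            if seen == terms:             # range now covers every term
                length = end - start + 1
                if best is None or length < best:
                    best = length
                break
    return best - len(terms)


document = "big bird likes his cats".split()
print(proximity_difference(["big", "cats"], document))
print(proximity_difference(["big", "bird"], document))
```

A difference of zero corresponds to terms that are adjacent, which is why single words and exact phrases land in the best stratum.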

Note: The proximity module does not work with Boolean searches, cross-field matching, or wildcard search. It assigns all such matches a score of zero.

maxfield module

This module ranks based on field priority and gives equal weight to cross-field matches.

The maxfield (Maximum Field) module behaves in the same way as the field module, except in how it scores cross-field matches. Unlike field, which assigns a static score to cross-field matches, maxfield selects the score of the highest-ranked field that contributed to the match.

nterms module

The nterms (number of terms) module assigns rank based on the number of terms that it finds.

The nterms module ranks matches according to how many query terms they match. For example, in a three-word query, results that match all three words will be ranked above results that match only two, which will be ranked above results that match only one, which will be ranked above results that had no matches.

numfields module

The numfields (number of fields) module ranks results based on the number of fields in the associated search interface in which a match occurs.

Note that the module counts whole-field matches rather than cross-field matches. Therefore, a result that matches two fields matches each field completely, while a cross-field match typically does not match any field completely.

numfields treats all matches the same, whether or not they are due to query expansion. The numfields module is only useful in conjunction with record search operations.

phrase module

The phrase module treats results that contain the user’s query as an exact phrase, or a subset of the exact phrase, as more relevant than matches that simply contain the user’s search terms scattered throughout the text.

Records that contain the phrase are ranked higher than records that do not contain the phrase.

The phrase module has a variety of options that you use to customize its behavior. The phrase options are as follows:

  • Rank based on length of subphrases
  • Use approximate subphrase/phrase matching
  • Apply spell correction, thesaurus, and stemming

You specify the options, including considerFieldRanks, in parentheses after the module name.

Ranking based on length of subphrases

When you configure the phrase module, you have the option of enabling subphrasing.

Subphrasing ranks results based on the length of their subphrase matches. In other words, results that match three terms are considered more relevant than results that match two terms, and so on. A subphrase is defined as a contiguous subset of the query terms the user entered, in the order that he or she entered them. For example, the query “fax cover sheets” contains the subphrases “fax”, “cover”, “sheets”, “fax cover”, “cover sheets”, and “fax cover sheets”, but not “fax sheets”.
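The definition of a subphrase can be made concrete with a short Python sketch. The function name subphrases is hypothetical; the sketch simply enumerates every contiguous run of query terms in their original order.

```python
def subphrases(query_terms):
    """Return every contiguous, in-order subset of the query terms."""
    n = len(query_terms)
    return [
        " ".join(query_terms[i:j])
        for i in range(n)
        for j in range(i + 1, n + 1)
    ]


result = subphrases(["fax", "cover", "sheets"])
print(result)
print("fax sheets" in result)
```

Because only contiguous runs are generated, “fax sheets” never appears in the result.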

Content contained inside nested quotes in a phrase is treated as one term. For example, consider the following phrase:

the question is “to be or not to be”

The quoted text (“to be or not to be”) is treated as one query term, so this example consists of four query terms even though it has a total of nine words.

When subphrasing is not enabled, results are ranked into two strata: those that matched the entire phrase and those that did not.

Using approximate matching

Approximate matching provides higher-performance matching, as compared to the standard phrase module, with somewhat less exact results.

With approximate matching enabled, the phrase module looks for phrase matches in a limited number of positions in each result, rather than all the positions. Only this limited number of possible occurrences is considered, regardless of whether there are later occurrences that are better, more relevant matches.

The approximate setting is appropriate in cases where the runtime performance of the standard phrase module is inadequate because of large result contents and/or high site load.

Applying thesaurus and stemming

Applying thesaurus and stemming adjustments to the original phrase is generically known as query expansion.

With query expansion enabled, the phrase module ranks results that match a phrase’s expanded forms in the same stratum as results that match the original phrase. Consider the following example:

  • A thesaurus entry exists that expands “US” to “United States”.
  • The user queries for “US government”.

The query “US government” is expanded to “United States government” for matching purposes, but the phrase module gives a score of two to any results matching “United States government” because the original, unexpanded version of the query, “US government”, only had two terms.

Summary of phrase option interactions

The three configuration settings for the phrase module can be used in a variety of combinations for different effects. The following table summarizes the behavior of each combination.

Subphrase | Approximate | Expansion | Description
Off | Off | Off | Default. Ranks results into two strata: those that match the user’s query as a whole phrase, and those that do not.
Off | Off | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not.
Off | On | Off | Ranks results into two strata: those that match the original query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record.
Off | On | On | Ranks results into two strata: those that match the original, or an extended version, of the query as a whole phrase, and those that do not. Look only at the first possible phrase match within each record.
On | Off | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase.
On | Off | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Extend subphrases to facilitate matching but rank based on the length of the original subphrase (before extension). Note: This combination can have a negative performance impact on query throughput.
On | On | Off | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Look only at the first possible phrase match within each record.
On | On | On | Ranks results into N strata where N equals the length of the query and each result’s score equals the length of its matched subphrase. Expand the query to facilitate matching but rank based on the length of the original subphrase (before extension). Look only at the first possible phrase match within each record.

Note: You should only use one phrase module in any given search interface and set all of your options in it.

Results with multiple matches

If a single result has multiple subphrase matches, either within the same field or in several different fields, the result is slotted into a stratum based on the length of the longest subphrase match.

Stop words and phrase behavior

When using the phrase module, stop words are always treated like non-stop word terms and stratified accordingly.

For example, the query “raining cats and dogs” will result in a rank of two for a result containing “fat cats and hungry dogs” and a rank of three for a result containing “fat cats and dogs” (this example assumes subphrase is enabled).
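The ranks in this example can be checked with a small Python sketch that finds the longest query subphrase occurring contiguously in a field. It assumes subphrasing is enabled, approximate matching and query expansion are off, and text is tokenized on whitespace; the function name longest_subphrase_length is hypothetical.

```python
def longest_subphrase_length(query_terms, field_terms):
    """Return the length of the longest query subphrase found in the field."""
    best = 0
    for i in range(len(query_terms)):
        for j in range(i + 1, len(query_terms) + 1):
            sub = query_terms[i:j]
            width = len(sub)
            if width <= best:
                continue                  # cannot improve on the current best
            # Slide a window of the same width across the field text.
            if any(
                field_terms[k:k + width] == sub
                for k in range(len(field_terms) - width + 1)
            ):
                best = width
    return best


query = "raining cats and dogs".split()
print(longest_subphrase_length(query, "fat cats and hungry dogs".split()))
print(longest_subphrase_length(query, "fat cats and dogs".split()))
```

The stop word “and” is counted like any other term, which is what lifts the second result to a rank of three.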

Cross-field matches and phrase behavior

An entire phrase, or subphrase, must appear in a single field in order for it to be considered a match. (In other words, matches created by concatenating fields are not considered by the phrase module.)

Treatment of wildcards with the phrase module

The phrase module translates each wildcard in a query into a generic placeholder for a single term.

Note: Only the asterisk (*) is supported as a wildcard.

For example, the query “sparkling w* wine” becomes “sparkling * wine” during phrase relevance ranking, where “*” indicates a single term. This generic wildcard replacement causes slightly different behavior depending on whether subphrasing is enabled.

When subphrasing is not enabled, all results that match the generic version of the wildcard phrase exactly are still placed into the first stratum. It is important, however, to understand what constitutes a matching result from the phrase module’s point of view.

Consider the search query “sparkling w* wine” with the MatchAny mode enabled. In MatchAny mode, search results only need to contain one of the requested terms to be valid, so a list of search results for this query could contain phrases that look like this:

sparkling white wine

sparkling refreshing wine

sparkling wet wine

sparkling soda

wine cooler

When phrase relevance ranking is applied to these search results, the phrase module looks for matches to “sparkling * wine”, not “sparkling w* wine”. Therefore, three results (“sparkling white wine”, “sparkling refreshing wine”, and “sparkling wet wine”) are considered phrase matches for the purposes of ranking.

These results are placed in the first stratum. The other two results are placed in the second stratum.

When subphrasing is enabled, the behavior becomes a bit more complex. Again, remember that wildcards become generic placeholders and match any single term in a result. This means that any subphrase that is adjacent to a wildcard will, by definition, match at least one additional term (the wildcard). Because of this behavior, subphrases break down differently. The subphrases for “cold sparkling w* wine” break down into the following (note that w* changes to *):

cold

sparkling *

* wine

cold sparkling *

sparkling * wine

cold sparkling * wine

Notice that the subphrases “sparkling,” “wine,” and “cold sparkling” are not included in this list. Because these subphrases are adjacent to the wildcard, we know that the subphrases will match at least one additional term.

Therefore, these subphrases are subsumed by the “sparkling *”, “* wine”, and “cold sparkling *” subphrases.

As with regular subphrasing, stratification is based on the number of terms in the subphrase, and the wildcard placeholders count toward the length of the subphrase. To continue the example above, results that contain “cold” get a score of one, results that contain “sparkling *” get a score of two, and so on. Again, this is the case even if the matching result phrases are different, for example, “sparkling white” and “sparkling soda.”

Finally, it is important to note that, while the wildcard can be replaced by any term, a term must still exist. In other words, search results that contain the phrase “sparkling wine” are not acceptable matches for the phrase “sparkling * wine” because there is no term to substitute for the wildcard. Conversely, the phrase “sparkling cold white wine” is also not a match because each wildcard can be replaced by one, and only one, term. Even when wildcards are present, results must contain the correct number of terms, in the correct order, for them to be considered phrase matches by the phrase module.
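The single-term placeholder rule can be illustrated with a Python sketch. It is a conceptual model only: the function names to_generic and matches_wildcard_phrase are hypothetical, text is tokenized on whitespace, and only whole-phrase matching (subphrasing off) is shown.

```python
def to_generic(query):
    """Replace each wildcard term (such as w*) with the generic placeholder *."""
    return ["*" if "*" in term else term for term in query.split()]


def matches_wildcard_phrase(pattern_terms, text_terms):
    """Return True if the pattern occurs contiguously; * matches exactly one term."""
    width = len(pattern_terms)
    for start in range(len(text_terms) - width + 1):
        window = text_terms[start:start + width]
        if all(p == "*" or p == w for p, w in zip(pattern_terms, window)):
            return True
    return False


pattern = to_generic("sparkling w* wine")
for phrase in [
    "sparkling white wine",
    "sparkling refreshing wine",
    "sparkling soda",
    "sparkling wine",
    "sparkling cold white wine",
]:
    print(phrase, matches_wildcard_phrase(pattern, phrase.split()))
```

In this model, “sparkling refreshing wine” matches even though “refreshing” does not start with w, because ranking uses the generic pattern rather than the original wildcard term.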

static module

The static module assigns rank based on a configurable sort key.

The static module assigns a static or constant data-specific value to each search result, depending on the type of search operation performed and depending on optional parameters that can be passed to the module.

For record search operations, the first parameter to the module specifies a property, which will define the sort order assigned by the module. The second parameter can be specified as ascending or descending to indicate the sort order to use for the specified property.

For example, using the module

static(Availability,descending)

sorts the result records in descending order with respect to their assignments from the Availability property. Using the module

static(Title,ascending)

sorts the result records in ascending order by their Title property assignments.

In a catalog application, configuring the static module to sort by a Price property in descending order displays more expensive products first.
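For record search, the effect of the two parameters is comparable to an ordinary sort on one property. The following Python sketch is illustrative only; the records, the Price values, and the function name static_sort are made up for the example.

```python
def static_sort(records, prop, order="ascending"):
    """Order records by one property, ascending or descending."""
    return sorted(records, key=lambda r: r[prop], reverse=(order == "descending"))


# Hypothetical catalog records.
records = [
    {"id": "r1", "Title": "Kettle", "Price": 20.0},
    {"id": "r2", "Title": "Blender", "Price": 75.0},
    {"id": "r3", "Title": "Toaster", "Price": 35.0},
]

by_price = [r["id"] for r in static_sort(records, "Price", "descending")]
by_title = [r["id"] for r in static_sort(records, "Title", "ascending")]
print(by_price)
print(by_title)
```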

For dimension search, the first parameter can be specified as nbins, depth, or rank:

  • Specifying nbins causes the static module to sort result dimension values by the number of associated records in the full data set.
  • Specifying depth causes the static module to sort result dimension values by their depth in the dimension hierarchy.
  • Specifying rank causes dimension values to be sorted by the ranks assigned to them for the application.

stem module

The stem module ranks matches due to stemming below other kinds of matches.

The stem module assigns a rank of 0 to matches from stemming, and a rank of 1 to matches from all other sources. That is, it ignores all other sorts of query expansion.

thesaurus module

The thesaurus module ranks matches due to thesaurus entries below other sorts of matches. It assigns a rank of 0 (the lowest possible priority) to matches from the thesaurus, and a rank of 1 to matches from all other sources. That is, it ignores all other sorts of query expansion.

wfreq module

Like the freq module, the wfreq (weighted frequency) module scores results based on the frequency of user query terms in the result.

Additionally, the wfreq module weights the individual query term frequencies for each result by the information content (overall frequency in the complete data set) of each query term. Less frequent query terms (that is, terms that would result in fewer search results) are weighted more heavily than more frequently occurring terms.

Note: The wfreq module ignores matches due to query expansion; that is, it assigns the lowest possible priority to records included in the search results list because of such matches.