Order facet values by statistical significance

To make it easier for shoppers to find the products that best meet their requirements, you can sort facet values (also known as dimension values) according to their statistical significance.

When facet values are sorted by statistical significance, shoppers are first presented with the facet values that are most relevant to their current navigation state.

Highlight relevant facet values

Sorting facet values by statistical significance is a useful technique when you want to highlight facet values that are relevant to the shopper’s current search rather than facet values that are generally popular. Sorting facet values by their statistical significance is especially useful when a facet has a large number of facet values. In such a case, the shopper is aided by having the facet values that are most relevant to the current navigation state presented first.

Sorting facet values by frequency, on the other hand, can be useful when the number of facet values is smaller (for example, small enough to be displayed in a single list of facet values).

The value of sorting by statistical significance is illustrated by the following use cases.

In a catalog with a feature facet containing 2,000 possible values, you can do the following:

  • Highlight the features that are most relevant to a search for “waterproof camera”, such as “waterproof”, “shockproof”, “dustproof”, or “GPS-enabled”.
  • Highlight the features that are most relevant to a search for “compact camera”, such as “built-in flash”, “autofocus”, “portrait mode”, or “landscape mode”.

In a catalog with a product tag facet containing 20,000 possible values, you can do the following:

  • Highlight the product tags that are most relevant to a search for “landscape photograph”, such as “mountain”, “forest”, or “ocean”.
  • Highlight the product tags that are most relevant to a search for “Venice canvas print”, such as “canal”, “boat”, or “reflection”.

In a catalog with a brand dimension containing 200 possible values, you can do the following:

  • Highlight the brands that specialize in digital SLR cameras when a shopper searches for “dslr camera”.
  • Highlight the brands that specialize in sports cameras when a shopper searches for “waterproof camera”.

Calculate statistical significance

A facet value's statistical significance is based on the difference between its background frequency and its foreground frequency. Frequencies are defined as follows:

  • The background frequency is the proportion of records in the entire catalog that match the facet value.
  • The foreground frequency is the proportion of records in the current search that match the facet value.

A facet value's statistical significance is calculated only after all specified record filters and security filters have been applied to the set of records in the catalog. Because different sets of filters can be applied to a catalog on different sites, the statistical significance of a facet value can vary from site to site.

A facet value is considered statistically significant if its foreground frequency is higher than its background frequency. The statistical significance of a facet value increases as the relative difference between foreground and background frequencies of that facet value increases.
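
The comparison described above can be sketched in Python. This is only an illustration: the exact scoring formula used by the search engine is not given here, so the simple foreground-to-background ratio below is an assumption, as are the function names, the facet value names, and all of the counts. Each frequency is treated as a share of records (matches divided by the size of the search result or of the catalog).

```python
def relative_frequency(matches: int, total: int) -> float:
    """Share of `total` records that match the facet value."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matches / total


def frequency_ratio(fg_matches: int, fg_total: int,
                    bg_matches: int, bg_total: int) -> float:
    """Foreground frequency divided by background frequency.

    Illustrative score only (an assumption, not the engine's formula).
    A ratio above 1 means the value is more common in the search
    than in the catalog as a whole.
    """
    background = relative_frequency(bg_matches, bg_total)
    if background == 0:
        raise ValueError("facet value has no records in the catalog")
    return relative_frequency(fg_matches, fg_total) / background


# Hypothetical numbers: a search returning 200 of 50,000 catalog records.
SEARCH_SIZE = 200
CATALOG_SIZE = 50_000

# facet value -> (matching records in the search, matching records in the catalog)
counts = {
    "generalist": (120, 10_000),
    "specialist": (40, 50),
    "unrelated": (1, 5_000),
}

scores = {
    name: frequency_ratio(fg, SEARCH_SIZE, bg, CATALOG_SIZE)
    for name, (fg, bg) in counts.items()
}

# Most significant first; the raw match counts play no part in the ordering.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this sketch, "specialist" is listed ahead of "generalist" even though it matches fewer records in the search, because its share of the search results is far larger than its share of the catalog.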

Note that the number of records tagged to each facet value is ignored when facet values are sorted by statistical significance. For example, three facet values might be sorted by statistical significance as follows:

naugahyde (10)
polyester (7)
leather (23)

As a more detailed example, suppose that a brand facet has facet values named Rugged Cameras and Acme Camera Corporation, and that a shopper is searching for “waterproof cameras”. The catalog, which contains a total of 100,000 records, includes the following:

  • 100 records for waterproof cameras manufactured by Acme Camera Corporation, out of a total of 2,000 records for cameras of any type manufactured by Acme Camera Corporation.
  • 20 records for waterproof cameras manufactured by Rugged Cameras. There are no other records in the catalog for cameras manufactured by Rugged Cameras, which makes only waterproof cameras.

Selecting the facet value ‘Acme Camera Corporation’ would produce 100 matching records.

Selecting the facet value ‘Rugged Cameras’ would produce only 20 matching records. Nevertheless, you might want to list Rugged Cameras in the facet values list before Acme Camera Corporation, because it has a higher statistical significance.

Rugged Cameras has a higher statistical significance because its foreground frequency (20 matches out of 20 records in the search) is so much greater than its background frequency (20 matches out of 100,000 records in the catalog). Acme Camera Corporation shows a much smaller increase in foreground frequency over background frequency.

In most cases, a higher statistical significance of a facet value is a sign of a quality that has relevance or value for the shopper’s search. In the example above, the waterproof cameras manufactured by Rugged Cameras, which have a greater statistical significance, are preferable to those manufactured by Acme Camera Corporation because waterproof cameras are the specialty of Rugged Cameras, while Acme Camera Corporation is a generic manufacturer that does not specialize in one type of camera.

As with all configuration of facet sorting, sorting by statistical significance must be configured on a per-dimension basis. There is no mechanism to configure a default sort behavior across all dimensions.

Configure sorting by statistical significance

To configure sorting by statistical significance, use the REST API for setting attributes. For more information, see Sample search and navigation REST API.

To import configuration for sorting facet values by statistical significance, send a request to an endpoint similar to the following:

http://host:port/gsadmin/v1/cloud/attributes/system/facet_name

where facet_name is the name of the facet whose facet values are to be sorted.

The body of the request must be a dimension object definition that includes a displayConfig attribute; for example:

{
    "ecr:type": "dimension",
    "mergeAction": "UPDATE",
    "displayConfig": {
        "sort": "sig,desc"
    }
}

where sig,desc specifies that facet values are sorted in descending order of their statistical significance. Because descending order is the default for sorting by statistical significance, you can specify sig instead of sig,desc.
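
A request of this kind could be assembled as in the following Python sketch, which uses only the standard library. Several details are assumptions made for illustration and are not confirmed by this section: the HTTP method (PUT), the host localhost and port 8080, the helper name build_sort_request, and the use of product.brand as the facet name. Authentication is omitted, and the request is only built, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_sort_request(host: str, port: int, facet_name: str,
                       method: str = "PUT") -> Request:
    """Build (but do not send) a request that sets sig,desc sorting.

    The HTTP method is an assumption; check the REST API reference
    for the method your installation expects.
    """
    body = {
        "ecr:type": "dimension",
        "mergeAction": "UPDATE",
        "displayConfig": {"sort": "sig,desc"},
    }
    url = (
        f"http://{host}:{port}/gsadmin/v1/cloud/attributes/system/"
        f"{quote(facet_name, safe='')}"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Hypothetical host, port, and facet name.
req = build_sort_request("localhost", 8080, "product.brand")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it, pass `req` to urllib.request.urlopen() once credentials are set up.
```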

Get statistical significance values for debugging

When you sort facet values by statistical significance, the statistical significance of each facet value is assigned to its DGraph.Significance property. The value of the DGraph.Significance property can be useful for debugging.

The following example illustrates the DGraph.Significance property of the Rugged Sports Cameras and Acme Camera Corporation facet values:

{
    "@type": "RefinementMenu",
    "displayName": "Brand",
    "dimensionName": "product.brand",
    "multiSelect": true,
    "refinements": [
      {
        "label": "Rugged Sports Cameras",
        "link": "?Ntt=waterproof+cameras",
        "count": 20,
        "properties": {
          "DGraph.Significance": "99.9",
          "DGraph.Spec": "Rugged Sports Cameras"
        }
      },
      {
        "label": "Acme Camera Corporation",
        "link": "?Ntt=waterproof+cameras",
        "count": 100,
        "properties": {
          "DGraph.Significance": "12.0",
          "DGraph.Spec": "Acme Camera Corporation"
        }
      }
    ]
}
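
When debugging, it can help to pull the significance values out of a response like the one above and list them next to the record counts. The following Python sketch assumes the response has already been retrieved as JSON text with the structure shown; the helper name significance_report is hypothetical, and the sample is a trimmed copy of the example response.

```python
import json


def significance_report(menu: dict) -> list:
    """Return (label, count, significance) tuples, most significant first.

    Refinements without a DGraph.Significance property are skipped.
    """
    rows = []
    for refinement in menu.get("refinements", []):
        raw = refinement.get("properties", {}).get("DGraph.Significance")
        if raw is None:
            continue
        # The property is delivered as a string, so convert it for sorting.
        rows.append((refinement["label"], refinement.get("count"), float(raw)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Trimmed copy of the example response.
sample = json.loads("""
{
  "dimensionName": "product.brand",
  "refinements": [
    {"label": "Acme Camera Corporation", "count": 100,
     "properties": {"DGraph.Significance": "12.0"}},
    {"label": "Rugged Sports Cameras", "count": 20,
     "properties": {"DGraph.Significance": "99.9"}}
  ]
}
""")

report = significance_report(sample)
for label, count, significance in report:
    print(f"{label}: count={count}, significance={significance}")
```

Listing the count beside the significance makes it easy to confirm that a facet value with fewer matching records can still be ranked first.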