Robots.txt with Categories and Facets

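When you use categories and facets on your website, there are a few scenarios to consider because they affect how you create your robots.txt file. These scenarios are: the site uses only categories, the site uses categories and facets, the site uses facets as URL paths, and the site uses facets as URL parameters.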

With faceted navigation and facets as URL paths, your site can have an exceptionally high number of URLs, many of which produce duplicate or similar results. When a visitor or a search engine selects multiple facets, each combination of options within each facet generates a new URL. When you multiply the number of facets by the number of options available, plus any combination of them, you can see how the number of URLs grows exponentially.
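To get a feel for the growth described above, here is a rough sketch that counts the filtered URLs a site could expose. The facet names and option counts are hypothetical, and the sketch assumes that any subset of options can be selected within a facet and that each distinct selection yields exactly one URL.

```python
from math import prod

# Hypothetical facets and option counts, for illustration only.
facet_options = {"color": 8, "size": 5, "brand": 12}

def count_facet_urls(options_per_facet):
    """Count distinct filtered URLs, assuming any subset of options can be
    selected in each facet and each selection maps to exactly one URL."""
    # A facet with n options contributes 2**n subsets (including "none selected").
    total = prod(2 ** n for n in options_per_facet.values())
    # Subtract the one unfiltered page, where nothing is selected in any facet.
    return total - 1

url_count = count_facet_urls(facet_options)
print(url_count)
```

Under these assumptions, even three modest facets produce tens of millions of possible URLs per category, which is why deciding what crawlers may visit matters.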

SuiteCommerce Advanced has some built-in provisions for managing URLs for faceted navigation. The URL always presents facets in the same order, regardless of the order in which the facets were applied. For example, suppose two of your facets are color and size. If one visitor selects color and then size, and another visitor selects size and then color, the URLs for both are identical.
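The idea of a fixed facet order can be sketched as follows. This is not SuiteCommerce Advanced code: the `FACET_ORDER` list, the `build_facet_url` helper, and the facet values `blue` and `large` are all assumptions made for illustration.

```python
# Hypothetical fixed facet order; a real site derives this from its configuration.
FACET_ORDER = ["color", "size"]

def build_facet_url(category_path, selections):
    """Build a URL path whose facets follow FACET_ORDER, regardless of the
    order in which the visitor applied them.

    selections is a list of (facet, value) pairs in click order.
    """
    chosen = dict(selections)
    segments = [category_path.strip("/")]
    for facet in FACET_ORDER:
        if facet in chosen:
            segments += [facet, chosen[facet]]
    return "/" + "/".join(segments)

# Two visitors apply the same filters in a different order.
first = build_facet_url("/category/men", [("color", "blue"), ("size", "large")])
second = build_facet_url("/category/men", [("size", "large"), ("color", "blue")])
print(first)
print(first == second)
```

Both visitors end up on the same path, so the click order alone never creates a second URL for the same result set.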

This facet URL standardization prevents some duplication of content, but you can still end up with similar content on different URLs. Search engines see this as the same content on different pages, and your search engine ranking may suffer for it. For this reason, review your facets and determine which ones the search engines should crawl and which ones they should not.

The following sections describe the rules to follow for facets as URL paths when you create the robots.txt file.

Site Uses Only Categories

When your site uses only categories, you do not need to worry about preventing indexing of facet pages. However, you must still use the robots.txt file to specify any other pages on your site that you do not want search engines to index.

Site Uses Categories and Facets

Some sites may be set up to use both categories and facets. In this situation, the configuration of your robots.txt file can become more complex depending on the structure of your site, the categorization of your products, and the configuration of your faceted navigation. If all of your item pages are accessible on the category pages, and your facets are configured to use URL paths for navigation, then you can configure the robots.txt file to prevent indexing of all facet pages. If you have some items that are only accessible with faceted navigation, then allow indexing of those facets and disallow indexing of the other facets.

Note:

If your site uses URL parameters for facets, there is no need to disallow facets in the robots.txt file because URL parameters do not impact SEO. The Facets as URL parameters feature is available with the Elbrus release of SuiteCommerce Advanced and later.

All Items Accessible Through Categories

If all items on your site are accessible by browsing categories, then you can block indexing of all facets. Search engines will index all pages on your site except the facet pages.

Here is an example of how to configure the robots.txt file in this scenario.

The site has two categories and two facets.

Categories: men and women (for example, /category/men and /category/women)

Facets: color and size

To prevent indexing of facets, the contents of the robots.txt file would be as follows:

    User-agent: *
    Disallow: */color/*
    Disallow: */size/*

These entries in the robots.txt file tell the search engine to ignore any URL that includes the color or size facet. Because each rule begins and ends with the * wildcard, the pages are not indexed, regardless of the position of the facet in the URL.
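To check which paths the two rules above would cover, a small matcher can help. The sketch below assumes the common wildcard semantics, where * matches any run of characters (including none), a trailing $ anchors the end, and rules otherwise match as prefixes. Individual crawlers may differ, and the helper names and the facet values `blue` and `large` are made up for this example.

```python
import re

def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts and let each '*' stand for any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix-style matching.
    return re.match(regex, path) is not None

DISALLOW = ["*/color/*", "*/size/*"]

def is_blocked(path):
    """True if any Disallow rule matches the path."""
    return any(rule_matches(rule, path) for rule in DISALLOW)

sample_paths = [
    "/category/men",
    "/category/men/color/blue",
    "/category/women/size/large/color/blue",
    "/category/men/colorful-shirts",
]
for path in sample_paths:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Because the rules include the surrounding slashes, a path such as /category/men/colorful-shirts is not caught by the color rule in this model.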

Some Items Accessible Only Through Facets

If some items on your site are accessible only on a facet page and your site uses URL paths for facets, you must identify those items and configure your robots.txt file to disallow some facets while allowing others. First, identify the facets that should be indexed and allow them in the robots.txt file.

This example uses the same categories and facets as the previous example:

Categories: men and women

Facets: color and size

The possible URL combinations for items with one facet filter include:

You have the same combination for the women category.

For pages with multiple facet filters, the URLs include:

You have the same combination for the women category.

All of the items are visible to search engines if they are allowed to index the categories and the facet titles. In this case, the URLs example.com/category/[men or women]/color/ and example.com/category/[men or women]/size/ are the ones that should be indexed because those are the facet pages that show all items, for example /category/men/color/ or /category/men/size/. The search engine should not index any URL with multiple facets or any URL with a facet value. To achieve this, create the robots.txt file as follows:

    User-agent: *
    Disallow: /*/*/*/*

The search engine interprets this disallow statement as follows:

1st * = category <- allowed

2nd * = men or women <- allowed

3rd * = facet title <- allowed

4th * = facet values <- disallowed and not indexed

This tells the search engine to ignore everything after the fourth slash, which, in the sample URLs, is where the facet values appear. Because the search engine is allowed to crawl the facet titles, all products are shown by default with pagination.
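The effect of the four-wildcard rule can be explored with a short sketch. It assumes the common wildcard semantics (* matches any run of characters, including none, and rules match as prefixes). The helper name and the facet values `blue` and `large` are hypothetical.

```python
import re

def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path,
    treating each '*' as any run of characters and matching as a prefix."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None

RULE = "/*/*/*/*"

# Example paths for the men category; the same applies to women.
checks = {
    path: rule_matches(RULE, path)
    for path in [
        "/category/men",                        # category page
        "/category/men/color",                  # facet title
        "/category/men/color/blue",             # facet value
        "/category/men/color/blue/size/large",  # multiple facets
        "/category/men/color/",                 # facet title with trailing slash
    ]
}
for path, blocked in checks.items():
    print(path, "blocked" if blocked else "allowed")
```

One detail worth verifying on your own site: in this matching model a trailing slash after the facet title counts as the fourth slash, so /category/men/color/ matches the rule while /category/men/color does not.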

Site Uses Facets as URL Paths

For sites that use only faceted search and no categories, you must determine the best way to block all facets that do not change the page content. For example, you may need to disallow the following facets:

Site Uses Facets as URL Parameters

If your site uses URL parameters for facets, you do not need to disallow facets in the robots.txt file because URL parameters do not impact SEO. For more information, see Facets as Parameters.

Related Topics

SEO and Item Reviews
