Robots.txt with Categories and Facets
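When your website uses categories and facets, consider which of the following scenarios applies, because each one affects how you create your robots.txt file:
-
Site Uses Only Categories
-
Site Uses Categories and Facets
-
Site Uses Facets as URL Paths
-
Site Uses Facets as URL Parameters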
With faceted navigation and facets as URL paths, you can end up with a huge number of URLs, many with duplicate or similar results. When a visitor or search engine selects multiple facets, every option in each facet can generate a new URL. Because the options of each facet combine with the options of every other facet, the number of URLs multiplies quickly and can explode as you add facets.
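To make that growth concrete, here's a minimal Python sketch that counts the filtered URLs a single category page can produce. It assumes, hypothetically, two selection modes: single-value (each facet is either unset or set to one option) and multi-select (any subset of a facet's options). The function name and facet counts are illustrative only.

```python
from math import prod


def count_facet_urls(option_counts: dict[str, int], multi_select: bool = True) -> int:
    """Count the distinct filtered URLs for one category page.

    option_counts maps a facet name to its number of options.
    Multi-select: each facet can take any subset of its options (2**n states).
    Single-value: each facet is unset or set to exactly one option (n + 1 states).
    The unfiltered category page itself is excluded from the count.
    """
    states = [(2 ** n if multi_select else n + 1) for n in option_counts.values()]
    return prod(states) - 1


facets = {"color": 3, "size": 3}
print(count_facet_urls(facets, multi_select=False))
print(count_facet_urls(facets))
```

With only color and size at three options each, single-value selection already gives 15 filtered URLs per category (the six single-value and nine two-facet URLs listed for each category later in this section), and multi-select raises that to 63. Each additional facet multiplies the total again.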
SuiteCommerce Advanced has some built-in ways to manage URLs for faceted navigation. The URL always shows facets in the same order, no matter the order in which they were applied. For example, suppose two of your facets are color and size. If one visitor selects color and then size, and another visitor selects size and then color, the URLs for both are identical.
This standardization prevents some duplicate content, but you can still get similar content on different URLs. Search engines can treat this as the same content on different pages, which can hurt your ranking. That's why it's important to review your facets and decide which ones search engines should crawl and which they shouldn't.
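The standardization described above can be sketched in Python: applied facets are written into the path in one fixed order, whatever order the visitor clicked them. The `FACET_ORDER` list and the `/category/<name>/<facet>/<value>` layout are assumptions for illustration (the layout mirrors the sample URLs later in this section); this is not SuiteCommerce Advanced's actual implementation.

```python
# Hypothetical fixed order in which facets appear in the URL path.
FACET_ORDER = ["color", "size"]


def facet_path(category: str, applied: list[tuple[str, str]]) -> str:
    """Build a facet URL path from (facet, value) pairs in the order they were applied.

    The pairs are re-sorted by FACET_ORDER, so the click order never changes the URL.
    """
    ordered = sorted(applied, key=lambda pair: FACET_ORDER.index(pair[0]))
    segments = ["category", category]
    for facet, value in ordered:
        segments += [facet, value]
    return "/" + "/".join(segments)


a = facet_path("men", [("color", "blue"), ("size", "l")])
b = facet_path("men", [("size", "l"), ("color", "blue")])
print(a)       # /category/men/color/blue/size/l
print(a == b)  # True
```

Because both click orders produce `/category/men/color/blue/size/l`, a crawler sees one URL instead of two for the same result set.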
These are the rules to follow for facets as URL paths when you create the robots.txt file:
-
If the applied facet changes the content of the search result page, let search engines crawl it.
-
If the applied facet does not change the content of the search result page, use the robots.txt file to disallow indexing of that page.
Site Uses Only Categories
When your site uses only categories, you don't need to block facet pages from being indexed. You still need to use the robots.txt file to list any other pages you don't want search engines to index.
Site Uses Categories and Facets
Some sites are set up to use both categories and facets. In this case, setting up your robots.txt file can get more complex, depending on your site's structure, how your products are categorized, and your faceted navigation settings. If all your item pages are accessible from the category pages and your facets use URL paths for navigation, you can set up your robots.txt file to prevent indexing of all facet pages. If some items are accessible only through faceted navigation, allow indexing for the facets that lead to those items and disallow it for the others.
If your site uses URL parameters for facets, you don't need to disallow facets in the robots.txt file because URL parameters don't affect SEO. The Facets as URL parameters feature is available in the Elbrus release of SuiteCommerce Advanced and later.
All Items Accessible Through Categories
If all items on your site can be found by browsing categories, you can block indexing for all facets. Search engines will index all your pages except the facet pages.
Here's an example of how to set up the robots.txt file for this scenario.
The site has two categories and two facets.
Categories
-
Men
-
Women
Facets
-
Color = red, green, blue
-
Size = L, M, S
To prevent indexing of facets, your robots.txt file would look like this:
User-agent: *
Disallow: */color/*
Disallow: */size/*
These entries in the robots.txt file tell search engines to ignore any URL that contains the color or size facet. Because of the * wildcards, these pages won't get indexed no matter where the facet appears in the URL.
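To check which URLs these rules catch before you publish them, here's a minimal Python sketch of the wildcard matching that RFC 9309 (the Robots Exclusion Protocol) specifies: `*` matches any run of characters, including `/` and nothing at all, a trailing `$` anchors the end of the path, and a rule is matched from the start of the URL path. Individual crawlers may differ, so treat this as a testing approximation, not any crawler's actual code. The helper names `rule_to_regex` and `is_disallowed` are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regular expression.

    '*' matches any run of characters (including '/' and the empty string);
    a trailing '$' anchors the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


rules = ["*/color/*", "*/size/*"]
for path in ["/category/men", "/category/men/color/red", "/category/women/size/m"]:
    print(path, "blocked" if is_disallowed(path, rules) else "crawlable")
```

Here the category page stays crawlable, while both facet pages match one of the rules.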
Some Items Accessible Only Through Facets
If some items are accessible only on a facet page and your site uses URL paths for facets, you'll need to identify those items and update your robots.txt file to disallow some facets while allowing others. Start by working out which facets should be indexed, and make sure the robots.txt file allows them.
This example uses the same categories and facets as the previous example:
Categories
-
Men
-
Women
Facets
-
Color = red, green, blue
-
Size = L, M, S
Here are the possible URL combinations for items with one facet filter:
-
example.com/category/men/color/
-
example.com/category/men/color/red
-
example.com/category/men/color/green
-
example.com/category/men/color/blue
-
example.com/category/men/size/
-
example.com/category/men/size/l
-
example.com/category/men/size/m
-
example.com/category/men/size/s
You’ve got the same combinations for the women category.
For pages with multiple facet filters, the URLs look like this:
-
example.com/category/men/color/blue/size/l
-
example.com/category/men/color/blue/size/m
-
example.com/category/men/color/blue/size/s
-
example.com/category/men/color/red/size/l
-
example.com/category/men/color/red/size/m
-
example.com/category/men/color/red/size/s
-
example.com/category/men/color/green/size/l
-
example.com/category/men/color/green/size/m
-
example.com/category/men/color/green/size/s
You’ve got the same combinations for the women category.
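The lists above follow a simple pattern, which this Python sketch reproduces for each category: one facet-title URL per facet, one URL per single facet value, and one URL per color and size pair in the fixed color-then-size order. The `CATEGORIES` and `FACETS` values come from this example; the generator itself is only an illustration.

```python
from itertools import product

CATEGORIES = ["men", "women"]
FACETS = {"color": ["red", "green", "blue"], "size": ["l", "m", "s"]}


def facet_urls(category: str) -> list[str]:
    """List the facet-title, single-facet, and two-facet URLs for one category."""
    base = f"example.com/category/{category}"
    urls = []
    for facet, values in FACETS.items():
        urls.append(f"{base}/{facet}/")                  # facet title
        urls += [f"{base}/{facet}/{v}" for v in values]  # one facet value
    for color, size in product(FACETS["color"], FACETS["size"]):
        urls.append(f"{base}/color/{color}/size/{size}")  # two facet values
    return urls


for category in CATEGORIES:
    print(category, len(facet_urls(category)))
```

Each category yields 17 URLs: 2 facet titles, 6 single facet values, and 9 color and size pairs.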
All the items will be visible to search engines if they're allowed to index the category pages and the facet titles. In this case, the URLs to index are example.com/category/[men or women]/color/ and example.com/category/[men or women]/size/, for example /category/men/color/ or /category/men/size/, because these facet-title pages show all items. Search engines shouldn't index any URL with a facet value or with multiple facets. To set up this kind of indexing, you'd create the robots.txt file like this:
User-agent: *
Disallow: /*/*/*/*
You can read this disallow statement segment by segment:
-
1st * = category <- allowed
-
2nd * = men or women <- allowed
-
3rd * = facet title <- allowed
-
4th * = facet values <- disallowed, so not indexed
In other words, the rule is meant to block URLs that have a segment after the fourth slash, which, as the sample URLs show, is where the facet values appear. Because search engines can still crawl the facet-title pages, which show all products by default with pagination, every item remains reachable.
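To see what `Disallow: /*/*/*/*` actually catches, this Python sketch applies the wildcard matching from RFC 9309, where each `*` matches any run of characters, including `/` and the empty string. Under that matching, the rule effectively blocks any path containing at least four slashes, so it's worth testing your site's real facet URLs against it. The helper name `matches_rule` is hypothetical, and individual crawlers may behave differently.

```python
import re


def matches_rule(path: str, rule: str) -> bool:
    """Return True if a robots.txt rule with '*' wildcards matches the path from its start.

    '$' end anchors are not handled in this sketch.
    """
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, path) is not None


RULE = "/*/*/*/*"
paths = [
    "/category/men",                    # category page
    "/category/men/color",              # facet title, no trailing slash
    "/category/men/color/",             # facet title, trailing slash
    "/category/men/color/red",          # facet value
    "/category/men/color/blue/size/l",  # multiple facets
]
for path in paths:
    print(f"{path:35} {'blocked' if matches_rule(path, RULE) else 'crawlable'}")
```

In this sketch, `/category/men/color` stays crawlable, but `/category/men/color/` with a trailing slash is blocked too, because the last `*` can match nothing. If your facet-title URLs end with a slash, as in the sample list above, test them before relying on this rule.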
Site Uses Facets as URL Paths
For sites that use only faceted search and no categories, you’ll need to figure out the best way to block all facets that don’t change the page content. For example, you may need to disallow the following facets:
-
Price
-
Marketing Facets
-
Seasonal Facets
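Once you've decided which facets don't change the page content, the robots.txt entries can follow the same `*/facet/*` form used in the earlier color and size example. This Python sketch builds those lines from a list of facet names as they appear in your URLs; the names `price`, `marketing`, and `season` are hypothetical placeholders, and the helper `facet_disallow_block` is illustrative only.

```python
def facet_disallow_block(blocked_facets: list[str], user_agent: str = "*") -> str:
    """Build a robots.txt group that disallows the given facet URL components.

    Uses the '*/<facet>/*' pattern form so the facet is matched wherever
    it appears in the path.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: */{facet}/*" for facet in blocked_facets]
    return "\n".join(lines)


print(facet_disallow_block(["price", "marketing", "season"]))
```

This prints a `User-agent: *` line followed by one `Disallow` line per facet, ready to paste into your robots.txt file after you confirm the facet names match your actual URLs.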
Site Uses Facets as URL Parameters
If your site uses URL parameters for facets, you don’t need to disallow facets in the robots.txt file because URL parameters don't affect SEO. For more information, see Facets as Parameters.