Edit the robots.txt file

The robots.txt file controls how web robots access and index your store’s pages.

This section applies to both Open Storefront Framework (OSF) and Storefront Classic.

For more information about robots.txt and the Robots Exclusion Protocol, visit www.robotstxt.org.

If you run multiple sites within a single instance of Commerce, each site has its own robots.txt file. See Configure Sites to learn how to create multiple sites.

To view your store’s current robots.txt file:

  1. Enter the following URL into your browser:

    https://[store url]/robots.txt

    where [store url] is the base URL for your store.

  2. Commerce displays the contents of the current robots.txt file.
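You can also retrieve the file from a script, since it is served with a plain HTTP GET. The following is a minimal sketch using Python and the requests library; the store URL shown is a placeholder that you must replace with your own.

import requests

# Replace with your store's base URL.
store_url = "https://www.example.com"

# robots.txt is served from the root of the storefront.
response = requests.get(f"{store_url}/robots.txt")
response.raise_for_status()
print(response.text)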

If you run multiple sites and keep language versions in directories, for example, example.com/de/ and example.com/es/, you do not need to create a separate robots.txt file for each version. Separate robots.txt files are only required when you run multiple sites on separate subdomains.

The Commerce robots.txt file below shows the updated contents after the recommended changes have been made:

User-agent: *
Disallow: /cart
Disallow: /en/cart
Disallow: /checkout
Disallow: /en/checkout
Disallow: /profile
Disallow: /en/profile
Disallow: /searchresults
Disallow: /en/searchresults
Disallow: /confirmation
Disallow: /en/confirmation
Disallow: /wishlist_settings
Disallow: /en/wishlist_settings
Disallow: /wishlist
Disallow: /en/wishlist

Sitemap: http://[store url]:8080/sitemap.xml

User-agent: * means that the exclusion rules apply to all robots. You can replace the * (asterisk) with the name of a specific robot to exclude, for example, Googlebot or Bingbot.

Each Disallow: /[page] entry indicates a page that robots should not visit. You should not remove any of the Disallow: entries from the default robots.txt file, though you might want to include additional pages that you want robots to ignore. If you are testing your store and do not want any robots to crawl any pages, you might want your robots.txt file to look like this:

User-agent: *
Disallow: /

If you plan to use your staging site as your production site when development and testing are complete, you will need to change the contents of the robots.txt file to the custom settings presented above. If you tested on a separate staging domain, Commerce provides a valid default robots.txt file for your production storefront when you go live.

You cannot edit the robots.txt file in the administration UI. You must edit it with the Commerce Admin REST API. See Use the REST APIs for information about the REST APIs.

To update the robots.txt file, issue a PUT request to /ccadmin/v1/merchant/robots. The body of the request must include the entire contents of the file, in text/plain format.

When you update the robots.txt file, it will not be overwritten until the next PUT request is sent to /ccadmin/v1/merchant/robots.

If you run multiple sites within a single instance of Commerce, you must specify the site whose robots.txt file you are updating in the x-ccsite header in the PUT request. If you do not specify a site, the request updates the default site’s robots.txt file.
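The following is a minimal sketch of such an update using Python and the requests library. The admin host, access token, and site ID are placeholders, it assumes the new contents are kept in a local robots.txt file, and the x-ccsite header can be omitted if you run a single site.

import requests

# Placeholders: replace with your admin host, a valid access token,
# and (for multi-site instances) the ID of the site to update.
admin_host = "https://admin.example.com"
access_token = "<access_token>"
site_id = "siteUS"

# The body must be the entire contents of the robots.txt file, as plain text.
with open("robots.txt") as f:
    robots_txt = f.read()

response = requests.put(
    f"{admin_host}/ccadmin/v1/merchant/robots",
    headers={
        "Content-Type": "text/plain",
        "Authorization": f"Bearer {access_token}",
        "x-ccsite": site_id,  # omit this header to update the default site
    },
    data=robots_txt,
)
response.raise_for_status()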

The following example shows a PUT request that adds your error page to the list of pages for robots to ignore.

PUT /ccadmin/v1/merchant/robots HTTP/1.1
Content-Type: text/plain
Authorization: Bearer <access_token>

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /profile
Disallow: /searchresults
Disallow: /confirmation
Disallow: /wishlist_settings
Disallow: /wishlist
Disallow: /error

Sitemap: http://{occs-host}/sitemap.xml

Note: The XML sitemap is an index of page URLs on your store that is available for crawling by search engines. It helps search engines to crawl your site more intelligently. Only pages, products, and collections that can be seen by anonymous shoppers (that is, visitors to your store who are not logged in) are included in the generated sitemaps. Each sitemap.xml includes a <lastmod> tag that provides the date and time the item was last published. See Understand XML sitemaps for more information.
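As an illustration, the following sketch lists each URL and its <lastmod> value from a generated sitemap. The store URL is a placeholder, and the script assumes sitemap.xml is a standard urlset using the standard sitemap namespace; a sitemap index would need one more level of iteration.

import requests
import xml.etree.ElementTree as ET

# Replace with your store's base URL.
store_url = "https://www.example.com"

# Standard sitemap namespace.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(f"{store_url}/sitemap.xml").content)
for url in root.findall("sm:url", ns):
    loc = url.findtext("sm:loc", default="", namespaces=ns)
    lastmod = url.findtext("sm:lastmod", default="", namespaces=ns)
    print(loc, lastmod)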

Upload a custom robots.txt file

The updateRobotsFile endpoint allows you to upload a custom robots.txt file. However, in previous versions of Commerce, the RobotsManager automatically replaced this custom robots.txt file with an auto-generated one when publishing or on startup. In that case, you had to contact support, who would manually disable automatic robots.txt generation.

In the current version of Commerce, the updateRobotsFile endpoint automatically disables automatic robots.txt file generation. Additionally, the new /ccadmin/v1/merchant/seoConfig endpoint allows you to query, or update, the status of automatic robots.txt file generation.
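For example, the following sketch queries that status using Python and the requests library. The admin host and access token are placeholders, and because the exact shape of the response is not documented here, the script simply prints the returned JSON.

import requests

# Placeholders: replace with your admin host and a valid access token.
admin_host = "https://admin.example.com"
access_token = "<access_token>"

# Query the SEO configuration, which includes the status of
# automatic robots.txt file generation.
response = requests.get(
    f"{admin_host}/ccadmin/v1/merchant/seoConfig",
    headers={"Authorization": f"Bearer {access_token}"},
)
response.raise_for_status()
print(response.json())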

Understand internal search results pages

Internal search refers to the search option on your own website, which generates internal search results pages. To avoid creating low-quality pages, internal search results are excluded from crawling in the robots.txt file and therefore do not appear in Search Engine Results Pages (SERPs).

Note: You can use Google Analytics to track internal search queries and use the reports to regularly monitor for:
  • the discovery of new keywords that customers use when searching for your products.
  • hints of navigational issues: users may be looking for existing categories that are difficult to find in your main navigation.