Edit the robots.txt file
The robots.txt file controls how web robots access and index your store’s pages. For more information about robots.txt and the Robots Exclusion Protocol, visit www.robotstxt.org.
If you run multiple sites within a single instance of Commerce, each site has its own robots.txt file. See Configure Sites to learn how to create multiple sites.
To view your store’s current robots.txt file:
- Enter the following URL into your browser: https://[store url]/robots.txt where [store url] is the base URL for your store.
- Commerce displays the contents of the current robots.txt file.
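The steps above can also be scripted. The following is a minimal sketch using Python's standard library; the store URL shown is a placeholder that you would replace with your own base URL:

```python
from urllib.request import urlopen


def robots_url(store_url: str) -> str:
    """Build the robots.txt URL from the store's base URL."""
    return store_url.rstrip("/") + "/robots.txt"


# Fetch and print the current robots.txt contents
# (uncomment and substitute your store's base URL):
# print(urlopen(robots_url("https://www.example.com")).read().decode())
```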
If you run multiple sites and keep language versions in directories, for example, example.com/de/ and example.com/es/, you do not need to create a separate robots.txt file for each version. A separate file is required only when you run multiple sites on different subdomains.
The Commerce robots.txt file below shows the updated contents after these recommendations have been applied:
User-agent: *
Disallow: /cart
Disallow: /en/cart
Disallow: /checkout
Disallow: /en/checkout
Disallow: /profile
Disallow: /en/profile
Disallow: /searchresults
Disallow: /en/searchresults
Disallow: /confirmation
Disallow: /en/confirmation
Disallow: /wishlist_settings
Disallow: /en/wishlist_settings
Disallow: /wishlist
Disallow: /en/wishlist
Sitemap: http://[store url]:8080/sitemap.xml
User-agent: * means that the exclusion rules should apply to all robots. You can replace the * (asterisk) with the name of a specific robot to exclude, for example, Googlebot or Bingbot.
Each Disallow: /[page] entry indicates a page that robots should not visit. You should not remove any of the Disallow: entries from the default robots.txt file, though you might want to include additional pages that you want robots to ignore. If you are testing your store and do not want any robots to crawl any pages, you might want your robots.txt file to look like this:
User-agent: *
Disallow: /
If you plan to use your staging site as your production site when development and testing are complete, you must change the contents of the robots.txt file to the custom settings presented above. If you tested on a separate staging domain, Commerce supplies a valid default robots.txt for your production storefront when you go live.
You cannot edit the robots.txt file in the administration UI. You must edit it with the Commerce Admin REST API. See Use the REST APIs for information about the REST APIs.
To update the robots.txt file, issue a PUT request to /ccadmin/v1/merchant/robots. The body of the request must include the entire contents of the file, in text/plain format.
When you update the robots.txt file, it is not overwritten until the next PUT request is sent to /ccadmin/v1/merchant/robots.
If you run multiple sites within a single instance of Commerce, you must specify the site whose robots.txt file you are updating in the x-ccsite header of the PUT request. If you do not specify a site, the request updates the default site’s robots.txt file.
The following example shows a PUT request that adds your error page to the list of pages for robots to ignore.
PUT /ccadmin/v1/merchant/robots HTTP/1.1
Content-Type: text/plain
Authorization: Bearer <access_token>
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /profile
Disallow: /searchresults
Disallow: /confirmation
Disallow: /wishlist_settings
Disallow: /wishlist
Disallow: /error
Sitemap: http://{occs-host}/sitemap.xml
Note: The XML sitemap is an index of page URLs on your store that is available for crawling by search engines. It helps search engines to crawl your site more intelligently. Only pages, products, and collections that can be seen by anonymous shoppers (that is, visitors to your store who are not logged in) are included in the generated sitemaps. Each sitemap.xml includes a <lastmod> tag that provides the date and time the item was last published. See Understand XML sitemaps for more information.
Upload a custom robots.txt file
The updateRobotsFile endpoint allows you to upload a custom robots.txt file. However, in previous versions of Commerce, the RobotsManager automatically replaced this custom robots.txt file with an auto-generated one when publishing or on startup. In that case, you had to contact support, who would manually disable automatic robots.txt generation.
In the current version of Commerce, the updateRobotsFile endpoint automatically disables automatic robots.txt file generation. Additionally, the new /ccadmin/v1/merchant/seoConfig endpoint allows you to query, or update, the status of automatic robots.txt file generation.
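Checking the current generation status is a simple GET request against the endpoint named above. The following is a sketch using Python's standard library; the host and token are placeholders, and the shape of the response payload is not documented here, so the call itself is left commented:

```python
import json
from urllib.request import Request, urlopen


def build_seo_config_get(host: str, token: str) -> Request:
    """Build a GET request for the automatic robots.txt generation status."""
    return Request(
        f"https://{host}/ccadmin/v1/merchant/seoConfig",
        headers={"Authorization": f"Bearer {token}"},
    )


# Example usage (placeholder host and token):
# config = json.load(urlopen(build_seo_config_get("admin.example.com",
#                                                 "<access_token>")))
```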
Understand internal search results pages
Internal search refers to the search option on your own website, which generates internal search results pages. To prevent the creation of low-quality pages, internal search results are excluded from crawling in the robots.txt file and do not appear in Search Engine Results Pages (SERPs).
Analyzing internal search queries can nevertheless provide useful insights, such as:
- the discovery of new keywords that customers use when searching for your products.
- hints of navigational issues: users may be looking for existing categories that are difficult to find in your main navigation.