The robots.txt file controls how web robots access and index your store’s pages. For more information about robots.txt and the Robots Exclusion Protocol, visit www.robotstxt.org.

If you run multiple sites within a single instance of Commerce Cloud, each site has its own robots.txt file. See Configure Sites in Extending Oracle Commerce Cloud to learn how to create multiple sites.

To view your store’s current robots.txt file:

  1. Enter the following URL into your browser:

    https://[store url]/robots.txt

    where [store url] is the base URL for your store.

  2. Commerce Cloud displays the contents of the current robots.txt file.
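
You can also retrieve the file programmatically. The following is a minimal sketch using Python's requests library; www.example.com is a placeholder for your store's base URL:

import requests

# Fetch and print the store's current robots.txt file.
# www.example.com is a placeholder; use your store's base URL.
response = requests.get("https://www.example.com/robots.txt")
response.raise_for_status()
print(response.text)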

The default Commerce Cloud robots.txt file looks like this:

User-agent: *
Disallow: /cart
Disallow: /en/cart
Disallow: /checkout
Disallow: /en/checkout
Disallow: /profile
Disallow: /en/profile
Disallow: /searchresults
Disallow: /en/searchresults
Disallow: /confirmation
Disallow: /en/confirmation
Disallow: /wishlist_settings
Disallow: /en/wishlist_settings
Disallow: /wishlist
Disallow: /en/wishlist

Sitemap: http://[store url]:8080/sitemap.xml

User-agent: * means that the exclusion rules should apply to all robots. You can replace the * (asterisk) with the name of a specific robot to exclude, for example, Googlebot for Google's crawler.

Each Disallow: /[page] entry indicates a page that robots should not visit. Do not remove any of the Disallow: entries from the default robots.txt file, though you can add Disallow: entries for any additional pages that you want robots to ignore. If you are testing your store and do not want robots to crawl any pages, you might want your robots.txt file to look like this:

User-agent: *
Disallow: /

You cannot edit the robots.txt file in the administration UI. You must edit it with the Commerce Cloud Admin REST API. See Extending Oracle Commerce Cloud for information about the REST APIs.

To update the robots.txt file, issue a PUT request to /ccadmin/v1/merchant/robots. The body of the request must include the entire contents of the file, in text/plain format.
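
For example, the following is a minimal sketch of such a request using Python's requests library. The admin host name, the access token, and the abbreviated file contents shown here are placeholder assumptions for illustration:

import requests

# Placeholder values; substitute your administration server host name and a
# valid access token for the Admin REST API.
admin_host = "https://admin.example.com"
access_token = "<access_token>"

# The request body is the complete robots.txt file, sent as plain text.
# The entries here are abbreviated; send the full set of entries you want.
robots_txt = (
    "User-agent: *\n"
    "Disallow: /cart\n"
    "Disallow: /checkout\n"
    "Disallow: /error\n"
    "\n"
    "Sitemap: https://www.example.com/sitemap.xml\n"
)

response = requests.put(
    admin_host + "/ccadmin/v1/merchant/robots",
    headers={
        "Content-Type": "text/plain",
        "Authorization": "Bearer " + access_token,
    },
    data=robots_txt,
)
response.raise_for_status()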

After you update the robots.txt file, your changes persist; the file is not overwritten until the next PUT request is sent to /ccadmin/v1/merchant/robots.

If you run multiple sites within a single instance of Commerce Cloud, you must specify the site whose robots.txt file you are updating in the x-ccsite header in the PUT request. If you do not specify a site, the request updates the default site’s robots.txt file.
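
Continuing the sketch above, a hypothetical site ID (siteUS) would be passed in the x-ccsite header like this:

response = requests.put(
    admin_host + "/ccadmin/v1/merchant/robots",
    headers={
        "Content-Type": "text/plain",
        "Authorization": "Bearer " + access_token,
        "x-ccsite": "siteUS",  # hypothetical site ID; omit to update the default site
    },
    data=robots_txt,
)
response.raise_for_status()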

The following example shows a PUT request that adds your error page to the list of pages for robots to ignore.

PUT /ccadmin/v1/merchant/robots HTTP/1.1
Content-Type: text/plain
Authorization: Bearer <access_token>

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /profile
Disallow: /searchresults
Disallow: /confirmation
Disallow: /wishlist_settings
Disallow: /wishlist
Disallow: /error

Sitemap: http://{occs-host}/sitemap.xml

Note: The XML sitemap is an index of page URLs on your store that is available for crawling by search engines. It helps search engines to crawl your site more intelligently. Only pages, products, and collections that can be seen by anonymous shoppers (that is, visitors to your store who are not logged in) are included in the generated sitemaps.

