Oracle Commerce Guided Search - Strategy for using the site.xml file

The strategy for using the default.xml and site.xml configuration files is to have exactly one directory that contains the default.xml file but no site.xml file. This directory is the default configuration directory.

You then create a separate directory for each crawl-specific configuration. A per-crawl directory does not contain the default.xml file; instead, it contains a site.xml file that is customized for that crawl configuration.
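The layout described above can be checked mechanically. The following is a minimal sketch, not part of the product: the function name check_layout is hypothetical, and it assumes that the per-crawl directories sit next to the default directory under one common configuration root, which the text above does not require.

```python
from pathlib import Path


def check_layout(conf_root, default_name="default"):
    """Return a list of layout problems found under conf_root (empty if none).

    Assumption: every subdirectory of conf_root other than the default
    directory is a per-crawl configuration directory.
    """
    root = Path(conf_root)
    problems = []

    # The default directory holds default.xml and must not hold site.xml.
    default_dir = root / default_name
    if not (default_dir / "default.xml").is_file():
        problems.append(f"{default_dir}: missing default.xml")
    if (default_dir / "site.xml").exists():
        problems.append(f"{default_dir}: unexpected site.xml")

    # Each per-crawl directory holds site.xml and must not hold default.xml.
    for crawl_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if crawl_dir.name == default_name:
            continue
        if not (crawl_dir / "site.xml").is_file():
            problems.append(f"{crawl_dir}: missing site.xml")
        if (crawl_dir / "default.xml").exists():
            problems.append(f"{crawl_dir}: unexpected default.xml")

    return problems
```

Calling check_layout on the configuration root returns an empty list when the layout follows the strategy, and one message per violation otherwise.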

When you run a crawl, you point to that crawl's configuration directory with the -c command-line option. However, the Web Crawler is hard-coded to read the configuration files in the workspace/conf/web-crawler/default directory first and then those in the per-crawl directory (which can override the default files). For this reason, do not change the name or location of the workspace/conf/web-crawler/default directory, and do not rename or move the default.xml file.
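The read order described above (default directory first, per-crawl directory second, later values winning) can be modeled in a few lines. This is an illustrative sketch only, not the Web Crawler's actual loading code. It assumes that both files hold property elements with name and value children; the function names read_properties and effective_config are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_properties(xml_path):
    """Parse name/value pairs from one configuration file.

    Assumption: settings are stored as <property> elements that each
    contain a <name> and a <value> child. A missing file yields no settings.
    """
    props = {}
    path = Path(xml_path)
    if not path.is_file():
        return props
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_dir, crawl_dir):
    """Merge settings in read order: default directory, then per-crawl."""
    merged = {}
    merged.update(read_properties(Path(default_dir) / "default.xml"))
    # Values from site.xml replace same-named values from default.xml.
    merged.update(read_properties(Path(crawl_dir) / "site.xml"))
    return merged
```

Under these assumptions, a setting that appears only in default.xml keeps its default value, while a setting that also appears in the per-crawl site.xml takes the per-crawl value.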
