After you install the CAS, the configuration files are in the following locations:
The workspace/conf/web-crawler/default directory contains all of the files listed above except the site.xml file. This directory is the global configuration directory; do not rename it or remove its default.xml file. Note that the settings in most of its files can be overridden by the versions in the crawl-specific configuration directories.
The workspace/conf/web-crawler/polite-crawl directory contains only the site.xml and crawl-urlfilter.txt files.
The workspace/conf/web-crawler/non-polite-crawl directory also contains only the site.xml and crawl-urlfilter.txt files. This site.xml contains more aggressive settings, such as no fetcher delay (versus a 1-second delay in the polite version) and a maximum of 52 threads (versus 1 in the polite version).
You can use a text editor to edit the files.
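Before editing, it can help to check which copy of a file a given crawl would pick up. The following Python snippet is a minimal sketch of the override relationship described above, at the level of whole files only: it looks for a file in the crawl-specific directory first and falls back to the default directory. The function name resolve_config and the lookup order are assumptions made for illustration; this is not the crawler's own resolution logic, which may merge individual settings rather than whole files.

```python
from pathlib import Path
from typing import Optional

# Root of the web crawler configuration, relative to the CAS installation.
CONF_ROOT = Path("workspace/conf/web-crawler")


def resolve_config(name: str, crawl_dir: str, root: Path = CONF_ROOT) -> Optional[Path]:
    """Return the crawl-specific copy of a file if it exists,
    otherwise the copy in the default directory, otherwise None.

    Hypothetical helper: it only illustrates the file-level layout,
    not how the crawler itself loads its settings.
    """
    for directory in (root / crawl_dir, root / "default"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

For example, resolve_config("crawl-urlfilter.txt", "polite-crawl") would return the copy in the polite-crawl directory when one is present there, while resolve_config("default.xml", "polite-crawl") would fall back to the copy in the default directory.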

