The workspace/conf/web-crawler/default directory is the default configuration directory.
The crawler uses this directory when you do not specify the -c flag.
You can also use the -c flag to override one or more configuration files in the default configuration directory with files from another configuration directory.
For example, assume you have a directory named intsites that contains a site.xml file for a specific crawl and no other configuration files.
You would then use the -c flag to point to that directory:
.\bin\web-crawler -c conf\web\intsites -d 2 -s conf\web\intsites\int.lst
In this example, the crawl uses the site.xml file from the intsites directory and reads all other configuration files from the default configuration directory.
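The override rule amounts to a per-file lookup: a file present in the directory given with -c is used, and any file missing there falls back to the default configuration directory. The Python sketch below models only that rule for illustration. The function name resolve_config_file is hypothetical, and this is not the crawler's actual implementation, which may differ in details such as how it treats subdirectories.

```python
from pathlib import Path
from typing import Optional


def resolve_config_file(name: str, default_dir: Path,
                        override_dir: Optional[Path] = None) -> Path:
    """Hypothetical model of how a configuration file is located.

    If an override directory (the one passed with -c) contains the file,
    that copy wins; otherwise the copy in the default directory is used.
    """
    if override_dir is not None:
        candidate = override_dir / name
        # Only files that actually exist in the override directory take precedence.
        if candidate.is_file():
            return candidate
    # Fall back to the default configuration directory.
    return default_dir / name
```

Under this model, resolve_config_file("site.xml", Path("conf/web-crawler/default"), Path("conf/web/intsites")) returns the intsites copy of site.xml if that file exists, while any other configuration file name resolves to the default directory.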