You can edit the crawl-urlfilter.txt files so that the crawl accepts specific hosts.

The crawl-urlfilter.txt files in the configuration directories (default, polite, and non-polite) all contain the following rule, commented out:

# accept hosts in MY.DOMAIN.NAME
# +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME.com/

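To limit the crawl to a specific domain, uncomment the rule by removing the leading # from the +^http:// line, and replace MY.DOMAIN.NAME with that domain.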
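
As a minimal sketch, assume the domain to crawl is example.com (a hypothetical name, not taken from the shipped files). Because the rule as shipped already ends in .com, this example also assumes the placeholder stands for the part of the name before it. After editing, the two lines might read:

# accept hosts in example.com
+^http://([a-z0-9]*\.)*example.com/

The comment line stays commented; only the rule itself is activated. The ([a-z0-9]*\.)* group allows any number of subdomain labels, so this rule would accept both http://example.com/ and http://www.example.com/. The dot before com is left unescaped, as in the original, so it matches any single character; writing example\.com is the stricter form.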


