Fetcher properties

The fetcher is the Web Crawler component that actually fetches pages from Web sites. You set the fetcher properties in the default.xml file.

The properties listed in the following table configure the behavior of the fetcher.
Property Name | Property Value
--- | ---
fetcher.delay | Value in seconds (default is 2.0). Specifies the number of seconds a fetcher will delay between successive requests to the same server. If you have multiple threads per host, the delay is on a per-thread basis, not across all threads.
fetcher.delay.max | Value in seconds (default is 30). Specifies the maximum amount of time to wait between page requests.
fetcher.threads.total | Integer (default is 100). Specifies the number of threads the fetcher should use. This value also determines the maximum number of requests that are made at once (because each thread handles one connection).
fetcher.threads.per-host | Integer (default is 1). Specifies the maximum number of threads that should be allowed to access a host at one time.
fetcher.retry.max | Integer (default is 3). Specifies the maximum number of times that a page will be retried. The page is skipped if it cannot be fetched in this number of retries.
fetcher.retry.delay | Value in seconds (default is 5). Specifies the delay between subsequent retries on the same page. If this value is less than the fetcher.delay value, then the value of fetcher.delay is used instead.
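
As an illustration of how these properties might appear in default.xml, the following sketch sets each one to its documented default. The element layout (a configuration root containing property elements with name and value children) is an assumption based on the common Hadoop-style configuration format and is not confirmed by this page; check it against the structure of your own default.xml before copying anything.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the <configuration>/<property>/<name>/<value> layout is an
     assumption; match it to the existing structure of your default.xml. -->
<configuration>

  <!-- Seconds between successive requests to the same server (per thread). -->
  <property>
    <name>fetcher.delay</name>
    <value>2.0</value>
  </property>

  <!-- Maximum seconds to wait between page requests. -->
  <property>
    <name>fetcher.delay.max</name>
    <value>30</value>
  </property>

  <!-- Total fetcher threads; also the maximum number of simultaneous requests. -->
  <property>
    <name>fetcher.threads.total</name>
    <value>100</value>
  </property>

  <!-- Maximum threads allowed to access one host at the same time. -->
  <property>
    <name>fetcher.threads.per-host</name>
    <value>1</value>
  </property>

  <!-- Maximum number of times a page is retried before it is skipped. -->
  <property>
    <name>fetcher.retry.max</name>
    <value>3</value>
  </property>

  <!-- Seconds between retries of the same page. This value (5) is greater
       than fetcher.delay (2.0), so it is used as written; a smaller value
       would be replaced by the fetcher.delay value. -->
  <property>
    <name>fetcher.retry.delay</name>
    <value>5</value>
  </property>

</configuration>
```

With these values, a page that keeps failing is retried up to 3 times, with 5 seconds between retries, before the fetcher skips it.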