The fetcher is the Web Crawler component that retrieves pages from Web sites. You set the fetcher properties in the default.xml file.
| Property Name | Property Value |
|---|---|
| fetcher.delay | Value in seconds (default is 2.0). Specifies the number of seconds a fetcher will delay between successive requests to the same server. If you have multiple threads per host, the delay is on a per-thread basis, not across all threads. |
| fetcher.delay.max | Value in seconds (default is 30). Specifies the maximum amount of time to wait between page requests. |
| fetcher.threads.total | Integer (default is 100). Specifies the number of threads the fetcher should use. This value also determines the maximum number of requests that are made at once (because each thread handles one connection). |
| fetcher.threads.per-host | Integer (default is 1). Specifies the maximum number of threads that should be allowed to access a host at one time. |
| fetcher.retry.max | Integer (default is 3). Specifies the maximum number of times that a page will be retried. The page is skipped if it cannot be fetched within this number of retries. |
| fetcher.retry.delay | Value in seconds (default is 5). Specifies the delay between successive retries on the same page. If this value is less than the fetcher.delay value, then the value of fetcher.delay is used instead. |
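
Two of the rules in the table interact with each other: the retry delay never drops below fetcher.delay, and fetcher.delay applies per thread rather than per host. The following Python sketch works through that arithmetic using the default values above. It is only an illustration and not part of the Web Crawler itself; the function names are hypothetical, and the per-host rate is an upper bound that ignores the time the server takes to respond.

```python
def effective_retry_delay(retry_delay: float, fetcher_delay: float) -> float:
    """Delay (seconds) applied between retries of the same page.

    Per the table, fetcher.delay is used when fetcher.retry.delay is smaller.
    """
    return max(retry_delay, fetcher_delay)


def max_requests_per_second_per_host(fetcher_delay: float,
                                     threads_per_host: int) -> float:
    """Upper bound on the request rate against a single host.

    The delay is enforced per thread, so each of the threads allowed on a
    host can issue at most one request every fetcher_delay seconds.
    """
    if fetcher_delay <= 0:
        raise ValueError("fetcher_delay must be positive")
    return threads_per_host / fetcher_delay


# Default values taken from the property table.
FETCHER_DELAY = 2.0
FETCHER_RETRY_DELAY = 5.0
THREADS_PER_HOST = 1

if __name__ == "__main__":
    # Defaults: the retry delay of 5 s exceeds fetcher.delay, so it is kept.
    print(effective_retry_delay(FETCHER_RETRY_DELAY, FETCHER_DELAY))
    # A retry delay of 1 s is below fetcher.delay, so 2 s is used instead.
    print(effective_retry_delay(1.0, FETCHER_DELAY))
    # One thread per host with a 2 s delay: at most 0.5 requests per second.
    print(max_requests_per_second_per_host(FETCHER_DELAY, THREADS_PER_HOST))
    # Four threads per host with the same delay: at most 2 requests per second.
    print(max_requests_per_second_per_host(FETCHER_DELAY, 4))
```

Because the delay is per thread, raising fetcher.threads.per-host multiplies the load on a single server; if the goal is to stay polite to a host, fetcher.delay may need to be increased along with it.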