A TIMEOUT element specifies the maximum amount of time, in seconds, that a SPIDER can spend crawling a URL.
The TIMEOUT value applies to the entire scope of a crawling operation, including both the time spent connecting to a host and the time spent transferring data to or from that host. Choose the TIMEOUT value carefully, because even normal host name lookups can take considerable time. Limiting operations to less than a few minutes risks aborting normal operations.
<!ELEMENT TIMEOUT EMPTY>
<!ATTLIST TIMEOUT
VALUE CDATA #REQUIRED
>
The following section describes the TIMEOUT element's attribute.
VALUE
Specifies the number of seconds after which the crawling operation aborts. This value must be an integer greater than zero.
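As a rough illustration of the constraint on VALUE, the fragment below contrasts an acceptable setting with ones that would violate it. The specific numbers are arbitrary choices for this sketch, and how the spider reports an invalid value is not described here.

```xml
<!-- Acceptable: a whole number of seconds greater than zero (here, five minutes). -->
<TIMEOUT VALUE="300"/>

<!-- Not acceptable under the rule above (shown commented out):
     <TIMEOUT VALUE="0"/>      zero is not greater than zero
     <TIMEOUT VALUE="2.5"/>    not an integer
     <TIMEOUT/>                VALUE is #REQUIRED by the DTD
-->
```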
The TIMEOUT element has no sub-elements.
This example shows initialization values of a spider, including the TIMEOUT value for the crawl.
<SPIDER_INIT>
<!-- Abort fetch operations that take over 120 seconds. -->
<TIMEOUT VALUE="120"/>
<!-- Abort fetch operations when a connection isn't made within 15 seconds. -->
<CONNECT_TIMEOUT VALUE="15"/>
<!-- Abort fetch operations if the transfer rate drops below
1024 bytes/second for 5 seconds. -->
<MIN_TRANSFER_RATE MIN_RATE="1024" MAX_TIME="5"/>
<!-- Tell Web servers we are a Netscape client making the fetch request. -->
<AGENT_NAME>Netscape</AGENT_NAME>
<!-- Configure the Spider to use the proxy server host1.acme.com:8080
when fetching HTTP URLs and host2.acme.com:8443 when
fetching HTTPS URLs. However, the Spider does not use a proxy for
URLs served by either host3.com or host4.com. -->
<PROXY_CONFIG>
<PROXY_HTTP HOST="host1.acme.com" PORT="8080"/>
<PROXY_HTTPS HOST="host2.acme.com" PORT="8443"/>
<PROXY_BYPASS>host3.com</PROXY_BYPASS>
<PROXY_BYPASS>host4.com</PROXY_BYPASS>
</PROXY_CONFIG>
<!-- Spider will use proxy host1.acme.com:8080 for this URL. -->
<ROOT_URL>http://www.endeca.com</ROOT_URL>
<!-- Spider will use proxy host2.acme.com:8443 for this URL. -->
<ROOT_URL>https://outlook.endeca.com</ROOT_URL>
<!-- Spider won't use a proxy for these two URLs. -->
<ROOT_URL>http://host3.com:6000</ROOT_URL>
<ROOT_URL>http://host4.com:6000</ROOT_URL>
</SPIDER_INIT>