TIMEOUT

A TIMEOUT element specifies the maximum amount of time, in seconds, that a SPIDER can spend crawling a URL.

The TIMEOUT value applies to the entire crawling operation for a URL, including both the time spent connecting to a host and the time spent transferring data to or from that host. Choose the TIMEOUT value carefully, because even normal host name lookups can take considerable time. Setting the limit to less than a few minutes risks aborting operations that would otherwise complete normally.
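
As a minimal sketch of such a setting, the fragment below allows each crawl operation up to five minutes in total. The figure of 300 seconds is an illustrative assumption, not a recommended default; a suitable value depends on how slowly the target hosts and name lookups respond.

<!-- Illustrative value only: abort any fetch operation that takes
     longer than 300 seconds in total (connecting plus transferring). -->
<TIMEOUT VALUE="300"/>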

Note: This element is deprecated because the Spider component has been deprecated and will be removed in a future version.

DTD

<!ELEMENT TIMEOUT EMPTY>
<!ATTLIST TIMEOUT
    VALUE     CDATA    #REQUIRED
>

Attributes

The following section describes the TIMEOUT element's attribute.

VALUE

Specifies the number of seconds after which the crawling operation aborts. This value must be an integer greater than zero.

Sub-elements

The TIMEOUT element has no sub-elements.

Example

This example shows initialization values of a spider, including the TIMEOUT value for the crawl.

<SPIDER_INIT>
  <!-- Abort fetch operations that take over 120 seconds. -->
  <TIMEOUT VALUE="120"/>
  <!-- Abort fetch operations when a connection isn't made within 15 seconds. -->
  <CONNECT_TIMEOUT VALUE="15"/>
  <!-- Abort fetch operations if the transfer rate drops below 
       1024 bytes/second for 5 seconds. -->
  <MIN_TRANSFER_RATE MIN_RATE="1024" MAX_TIME="5"/>
  <!-- Tell Web servers we are a Netscape client making the fetch request. -->
  <AGENT_NAME>Netscape</AGENT_NAME>
  <!-- Configure the Spider to use the proxy server host1.acme.com:8080
       when fetching HTTP URLs and host2.acme.com:8443 when fetching
       HTTPS URLs. However, the Spider does not use a proxy for URLs
       served by either host3.com or host4.com. -->
  <PROXY_CONFIG>
    <PROXY_HTTP HOST="host1.acme.com" PORT="8080"/>
    <PROXY_HTTPS HOST="host2.acme.com" PORT="8443"/>
    <PROXY_BYPASS>host3.com</PROXY_BYPASS>
    <PROXY_BYPASS>host4.com</PROXY_BYPASS>
  </PROXY_CONFIG>
  <!-- Spider will use proxy host1.acme.com:8080 for this URL. -->
  <ROOT_URL>http://www.endeca.com</ROOT_URL>
  <!-- Spider will use proxy host2.acme.com:8443 for this URL. -->
  <ROOT_URL>https://outlook.endeca.com</ROOT_URL>
  <!-- Spider won't use a proxy for these two URLs. -->
  <ROOT_URL>http://host3.com:6000</ROOT_URL>
  <ROOT_URL>http://host4.com:6000</ROOT_URL>
</SPIDER_INIT>