A CONNECT_TIMEOUT element specifies the maximum time in seconds that the SPIDER should wait for a connection to be made to a host during crawling.
If you omit this element (its VALUE attribute is required whenever the element is present), the SPIDER waits indefinitely for a connection, unless a TIMEOUT element specifies a value that limits the overall time of crawling operations.
<!ELEMENT CONNECT_TIMEOUT EMPTY>
<!ATTLIST CONNECT_TIMEOUT
    VALUE CDATA #REQUIRED
>
The following section describes the CONNECT_TIMEOUT element's attribute.
VALUE
Specifies the maximum number of seconds to wait for a host connection. This value must be an integer greater than zero.
The CONNECT_TIMEOUT element has no sub-elements.
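Because the DTD declares VALUE only as CDATA, a DTD-validating parser cannot enforce the "integer greater than zero" rule on its own. The following is a minimal sketch of an external check, written in Python with the standard library; the function name `check_connect_timeout` and the wording of the messages are illustrative assumptions and are not part of the product.

```python
import xml.etree.ElementTree as ET


def check_connect_timeout(xml_text):
    """Return a list of problems found in CONNECT_TIMEOUT elements.

    Hypothetical helper for illustration only; not a product API.
    """
    problems = []
    root = ET.fromstring(xml_text)
    # iter() also yields the root element itself if its tag matches.
    for elem in root.iter("CONNECT_TIMEOUT"):
        value = elem.get("VALUE")
        if value is None:
            # The DTD marks VALUE as #REQUIRED.
            problems.append("VALUE attribute is missing")
        elif not (value.isascii() and value.isdigit() and int(value) > 0):
            # Only plain ASCII digits that form a number above zero pass.
            problems.append(
                f"VALUE must be an integer greater than zero, got {value!r}"
            )
        if len(elem) > 0:
            # The element is declared EMPTY, so child elements are invalid.
            problems.append("CONNECT_TIMEOUT must not have sub-elements")
    return problems
```

Calling the function on a configuration string returns an empty list when every CONNECT_TIMEOUT element satisfies the rules above, and one message per violation otherwise.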
This example shows initialization values of a Spider component, including connection and proxy information. The CONNECT_TIMEOUT value is set to 15 seconds.
<SPIDER_INIT>
  <!-- Abort fetch operations that take longer than 120 seconds. -->
  <TIMEOUT VALUE="120"/>

  <!-- Abort fetch operations when a connection isn't made within 15 seconds. -->
  <CONNECT_TIMEOUT VALUE="15"/>

  <!-- Abort fetch operations if the transfer rate drops below
       1024 bytes/second for 5 seconds. -->
  <MIN_TRANSFER_RATE MIN_RATE="1024" MAX_TIME="5"/>

  <!-- Have the Spider tell web servers that it's a Netscape client
       making the fetch request. -->
  <AGENT_NAME>Netscape</AGENT_NAME>

  <!-- Configure the Spider to use the proxy server running on
       host1.acme.com:8080 when fetching HTTP URLs and
       host2.acme.com:8443 when fetching HTTPS URLs but not to use
       a proxy for URLs served by either host3.com or host4.com. -->
  <PROXY_CONFIG>
    <PROXY_HTTP HOST="host1.acme.com" PORT="8080"/>
    <PROXY_HTTPS HOST="host2.acme.com" PORT="8443"/>
    <PROXY_BYPASS>host3.com</PROXY_BYPASS>
    <PROXY_BYPASS>host4.com</PROXY_BYPASS>
  </PROXY_CONFIG>

  <!-- Spider will use proxy host1.acme.com:8080 for this URL -->
  <ROOT_URL>http://www.endeca.com</ROOT_URL>

  <!-- Spider will use proxy host2.acme.com:8443 for this URL -->
  <ROOT_URL>https://outlook.endeca.com</ROOT_URL>

  <!-- Spider won't use a proxy for this URL -->
  <ROOT_URL>http://host3.com:6000</ROOT_URL>

  <!-- Spider won't use a proxy for this URL -->
  <ROOT_URL>http://host4.com:6000</ROOT_URL>
</SPIDER_INIT>