You set the HTTP transport properties in the default.xml file.
Property Name | Property Value |
---|---|
http.agent.name | Required. String that contains the name of the user agent originating the request (default is endeca webcrawler). This value is used for the HTTP User-Agent request header. |
http.robots.ignore | Boolean value. Specifies whether the crawler ignores the restrictions in a site's robots.txt file. |
http.robots.agents | Comma-delimited list of agent strings, in decreasing order of precedence (default is endeca webcrawler,*). The agent strings are checked against the User-Agent field in the robots.txt file. It is recommended that you put the value of http.agent.name as the first agent name and keep the asterisk (*) at the end of the list. |
http.robots.403.allow | Boolean value (default is true). Some servers return HTTP status 403 (Forbidden) if robots.txt does not exist. Setting this value to false means that such sites are treated as forbidden, while setting it to true means that the site can be crawled. |
http.agent.description | String value (default is empty). Provides descriptive text about the crawler. The text is used in the User-Agent header, appearing in parentheses after the agent name. |
http.agent.url | String value (default is empty). Specifies the URL that appears in the User-Agent header, in parentheses after the agent name. Convention dictates that the URL point to a page explaining the purpose and behavior of this crawler. |
http.agent.email | String value (default is empty). Specifies the email address that appears in the HTTP From request header and the User-Agent header. A good practice is to mangle this address (e.g., "info at example dot com") to deter address harvesting by spammers. |
http.agent.version | String value (default is WebCrawler). Specifies the crawler version string. The version is used in the User-Agent header. |
http.timeout | Integer value (default is 10000). Specifies the default network timeout in milliseconds. |
http.content.limit | Integer value (default is 1048576). Sets the length limit, in bytes, for downloaded content. If the value is a positive integer, content longer than the limit is not downloaded and the page is skipped. If the value is a negative integer, no limit is placed on the content length. Oracle does not recommend setting this value to 0, because that limits the crawl to producing 0-byte content. |
http.redirect.max | Integer value (default is 5). Sets the maximum number of redirects the fetcher follows when trying to fetch a page. If set to 0 or a negative value, the fetcher does not immediately follow redirected URLs, but instead records them for later fetching. |
http.useHttp11 | Boolean value (default is false). If true, the crawler uses HTTP/1.1; if false, it uses HTTP/1.0. |
http.cookies | String value (default is empty). Specifies the cookies to be used by the HTTPClient. |
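As an illustration, the sketch below shows how a few of these properties might be overridden in default.xml. It assumes the Nutch-style property element format used by the crawler's configuration files; the values shown are examples, not recommendations.

```xml
<!-- Example overrides for default.xml (Nutch-style property format assumed). -->
<property>
  <name>http.agent.name</name>
  <value>endeca webcrawler</value>
</property>
<property>
  <name>http.timeout</name>
  <!-- Raise the network timeout from 10 to 30 seconds. -->
  <value>30000</value>
</property>
<property>
  <name>http.content.limit</name>
  <!-- A negative value disables the download size limit. -->
  <value>-1</value>
</property>
```

When http.agent.description, http.agent.url, and http.agent.email are also set, the resulting User-Agent header typically takes the form agent-name/version (description; url; email).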