You can set the authentication properties in the default.xml file.
The HTTPClient supports four different types of HTTP authentication schemes: Basic, Digest, NTLM, and form-based.
These schemes can be used to authenticate with HTTP servers or proxies. The table below lists the properties that correspond to each authentication scheme.
| Property Name | Property Value |
|---|---|
| http.auth.basic | String value (default is empty). Specifies the credentials to be used by the HTTPClient for Basic authentication. If the value is empty, Basic authentication is not done for the crawl. |
| http.auth.digest | String value (default is empty). Specifies the credentials to be used by the HTTPClient for Digest authentication. If the value is empty, Digest authentication is not done for the crawl. |
| http.auth.ntlm | String value (default is empty). Specifies the credentials to be used by the HTTPClient for NTLM authentication. If the value is empty, NTLM authentication is not done for the crawl. |
|  | File name (default is |
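This section does not show how a property is written inside default.xml. The sketch below assumes the file uses name/value property elements, which is common for crawler configuration files but is not confirmed here. The helper `property_element` is hypothetical and only illustrates how a credential string might be wrapped; check the result against the layout of your own default.xml before using it.

```python
import xml.etree.ElementTree as ET


def property_element(name, value):
    """Build a <property> element with <name> and <value> children.

    Assumption: default.xml stores each setting as a name/value pair
    in this shape. Adjust the tag names if your file differs.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# The credential string follows the triple-tilde format used for http.auth.basic.
element = property_element(
    "http.auth.basic",
    "jjones~~~hello123~~~myhost~~~ANY_PORT~~~ANY_REALM",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The tilde and pipe delimiters are not special characters in XML, so the credential string needs no escaping of its own. A password that contains `&` or `<` would be escaped by ElementTree when the element is serialized.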
If a Web server uses HTTP Basic authentication to restrict access
to Web sites, you can specify authentication credentials that
enable the Web Crawler to access password-protected pages.
The http.auth.basic property sets the credentials to be used by the HTTPClient for Basic authentication.
The credentials must be specified in this format:
USERNAME1~~~PASSWORD1~~~HOST1~~~PORT1~~~REALM1|||USERNAME2~~~...
where:
Note that the triple-tilde delimiter (~~~) must be used to separate the values.
A sample credential specification is:
jjones~~~hello123~~~myhost~~~ANY_PORT~~~ANY_REALM
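Assembling the value by hand is error-prone when several credentials are involved, so a small helper can do the joining. The following is a minimal sketch, not part of the Web Crawler: `build_basic_credentials` is a hypothetical name, and the use of the triple pipe (|||) between credentials is read from the format line above. The check that rejects values containing a delimiter is an assumption, because this section does not document any escaping mechanism.

```python
FIELD_SEP = "~~~"  # separates the values within one credential
ENTRY_SEP = "|||"  # separates one credential from the next


def build_basic_credentials(entries):
    """Join (username, password, host, port, realm) tuples into one property value."""
    parts = []
    for entry in entries:
        if len(entry) != 5:
            raise ValueError(f"expected 5 fields, got {len(entry)}: {entry!r}")
        fields = [str(field) for field in entry]
        for field in fields:
            # Assumption: no escaping is available, so a delimiter inside
            # a value would make the string ambiguous.
            if FIELD_SEP in field or ENTRY_SEP in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
        parts.append(FIELD_SEP.join(fields))
    return ENTRY_SEP.join(parts)


value = build_basic_credentials(
    [("jjones", "hello123", "myhost", "ANY_PORT", "ANY_REALM")]
)
print(value)  # jjones~~~hello123~~~myhost~~~ANY_PORT~~~ANY_REALM
```

Passing more than one tuple produces a single string with the credentials joined by the triple pipe.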
If a Web server uses HTTP Digest authentication to restrict access
to Web sites, you can use the http.auth.digest property to set the
credentials used by the HTTPClient for Digest authentication.
The credentials must be specified in this format:
USERNAME1~~~PASSWORD1~~~HOST1~~~PORT1~~~REALM1|||USERNAME2~~~...
where the meanings of the arguments are the same as for Basic authentication.
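Because the Digest value uses the same five-field layout, it can be sanity-checked before it is placed in the configuration. The sketch below splits a value back into its fields and reports a malformed credential. `parse_credentials` is a hypothetical helper, and the second credential in the example (asmith) is invented for illustration.

```python
def parse_credentials(value, field_count=5):
    """Split a credential string into tuples, checking the field count of each."""
    entries = []
    for index, chunk in enumerate(value.split("|||"), start=1):
        fields = chunk.split("~~~")
        if len(fields) != field_count:
            raise ValueError(
                f"credential {index} has {len(fields)} fields, expected {field_count}"
            )
        entries.append(tuple(fields))
    return entries


digest_value = (
    "jjones~~~hello123~~~myhost~~~ANY_PORT~~~ANY_REALM"
    "|||asmith~~~secret456~~~otherhost~~~8080~~~ANY_REALM"
)
for username, _password, host, port, realm in parse_credentials(digest_value):
    print(username, host, port, realm)
```

A common mistake this catches is a single or double tilde typed in place of the triple-tilde delimiter, which leaves the credential with too few fields.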
If a Web server uses HTTP NTLM authentication to restrict access
to Web sites, you can specify authentication credentials that
enable the Web Crawler to access password-protected pages.
The http.auth.ntlm property sets the credentials to be used by the HTTPClient for NTLM authentication.
Note: The Web Crawler supports only Version 1 of the NTLM authentication scheme.
The credentials must be specified in this format:
USERNAME1~~~PASSWORD1~~~HOST1~~~PORT1~~~REALM1~~~DOMAIN1|||USERNAME2~~~...
where:
Note that the triple-tilde delimiter (~~~) must be used to separate the values.
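The NTLM value carries a sixth field, the domain, after the realm. One way to keep the field order straight is to model a credential as a small record and serialize it. This is a sketch under stated assumptions: `NtlmCredential` and `join_ntlm` are hypothetical names, and every value in the example (host, port, realm, domain) is a placeholder rather than a documented setting.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class NtlmCredential:
    # Field order mirrors USERNAME~~~PASSWORD~~~HOST~~~PORT~~~REALM~~~DOMAIN.
    username: str
    password: str
    host: str
    port: str
    realm: str
    domain: str

    def to_property_value(self):
        """Serialize the six fields with the triple-tilde delimiter."""
        return "~~~".join(astuple(self))


def join_ntlm(credentials):
    """Join several NTLM credentials with the triple-pipe delimiter."""
    return "|||".join(c.to_property_value() for c in credentials)


# Placeholder values for illustration only.
ntlm_value = join_ntlm(
    [NtlmCredential("jjones", "hello123", "myhost", "80", "myrealm", "MYDOMAIN")]
)
print(ntlm_value)  # jjones~~~hello123~~~myhost~~~80~~~myrealm~~~MYDOMAIN
```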
If you are crawling sites that implement form-based authentication, you supply the credentials in a form-credentials.xml file.
To configure form-based authentication: