Oracle® Secure Enterprise Search Administrator's Guide 10g Release 1 (10.1.7) Beta Part Number B32011-01
Beta Draft
The crawler logs its activities as a set of numbered messages.
The following table lists the most common crawler error messages, with the likely cause and the suggested action for each.
| Message ID | Message | Comment | Action | 
|---|---|---|---|
| 30025 | {0}: Connection refused | The Web site refuses the URL access request. | Check the network setup of the machine running the crawler. | 
| 30027 | Not allowed URL: {0} | A URL link violates boundary rules and is discarded. | Confirm that the URL indeed can be ignored. | 
| 30030 | Malformed URL: {0} | The URL is not properly formed. | Verify the URL. | 
| 30031 | Excluded by ROBOTS.TXT: {0} | The robots.txt rule from the Web site of the URL does not allow the URL to be crawled. | Configure the crawler to ignore robots.txt rules only if you manage the target Web site. This is done on the Home - Sources - Crawling Parameters page. | 
| 30040 | Ignore URL: {0} | Redirection to this URL is not allowed by the boundary rules. | Confirm that the URL indeed should be ignored. | 
| 30041 | {0}: excluded by MIME type inclusion rule, URL is {1} | The content type of the URL is not in the MIME type inclusion list. | Check whether the specified content type should be included. | 
| 30054 | Excessively long URL: {0} | The URL string is too long, and the URL is ignored. | N/A | 
| 30057 | {0}: timeout reading document | The target Web site is too slow sending page content. | Increase the crawler timeout threshold from the crawler configuration page. The default is 30 seconds. | 
| 30083 | {0}: Duplicate document ignored | A document with the same content has been seen before within the same crawl session. This could be an indication of URL looping; that is, a generation of different URLs pointing back to the same page. | Check if the URL is generated correctly. If necessary, disable indexing dynamic URLs. This is done on the Home - Sources - Crawling Parameters page. | 
| 30126 | Binary document reported as text document: "{0}" | A binary file has been sent by the Web site as a text document. In most cases, the URL in question points to a binary-format document, such as a PDF, rather than a text document. | Correct the Web site content type setting for the URL, if possible. | 
| 30188 | Login form not specified for "{0}" | Unable to perform HTML form login, because the name of the form is not set. In general, the name of the form should be automatically set by the crawler. | Identify the URL of the login page, and check whether this is a regular HTML form login page or an SSO login page. Report the problem to Oracle Support. | 
| 30199 | Encountered an error while responding to the following HTTP authentication request: [{0}] | Unable to authenticate through the target URL. | Verify whether the authentication request is basic authentication or digest authentication. Also confirm the provided authentication credentials. | 
| 30201 | Missing authentication credentials | Authentication data is not available to access the URL. | Check the type of authentication needed and provide the credentials through the source customization page. | 
| 30206 | Ignoring "{0}" due to host (or redirected host) connection problem | The crawler is unable to contact the server of the URL. | Verify that the Web site in question is up and try to recrawl. | 
| 30209 | Document size ({0}) too big, ignored: {1} | Document size exceeds the default limit of 10 megabytes. | Increase the document size limit on the Global Settings - Crawler Configuration page. | 
| 30215 | Excluded by crawling depth limit({0}): {1} | A previously crawled URL is excluded because the crawling depth limit has been reduced. | Confirm that the depth limit is correct. | 
| 30782 | Invalid document attribute {0} - ignored | An attribute picked up from the document is not defined for the source, so it is ignored. | Most likely this is safe to ignore, unless you know that this particular attribute should be defined for this source. In that case, contact Oracle Support. |
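
When many of these messages appear in a single crawl, it can help to tally them by message ID before working through the actions in the table. The following Python sketch illustrates one way to do that. It is not part of the product: the `CRAWLER_MESSAGES` dictionary copies only three rows from the table, the names `render` and `summarize` are hypothetical, and the log line layout (`<message ID> <message text>`) is an assumption that you would need to adjust to match your actual crawler log. The sketch also assumes that `{0}` and `{1}` are positional placeholders filled in the order given.

```python
import re
from collections import Counter

# Subset of the table above: message ID -> (message template, suggested action).
CRAWLER_MESSAGES = {
    30025: ("{0}: Connection refused",
            "Check the network setup of the machine running the crawler."),
    30057: ("{0}: timeout reading document",
            "Increase the crawler timeout threshold."),
    30209: ("Document size ({0}) too big, ignored: {1}",
            "Increase the document size limit."),
}

# ASSUMPTION: each log line starts with a five-digit message ID followed by
# the message text. The real crawler log layout may differ.
LINE_PATTERN = re.compile(r"^(\d{5})\s+(.*)$")


def render(message_id, *args):
    """Fill the positional placeholders ({0}, {1}) of a known template."""
    template, _action = CRAWLER_MESSAGES[message_id]
    return template.format(*args)


def summarize(lines):
    """Count how often each known message ID occurs in the given lines."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and int(match.group(1)) in CRAWLER_MESSAGES:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "30025 " + render(30025, "http://example.com/a"),
        "30025 " + render(30025, "http://example.com/b"),
        "30057 " + render(30057, "http://example.com/slow"),
    ]
    # Print the most frequent messages first, with the suggested action.
    for message_id, count in summarize(sample).most_common():
        print(message_id, count, CRAWLER_MESSAGES[message_id][1])
```

Listing the most frequent IDs first makes it easier to see whether a crawl is dominated by one cause, such as an unreachable host, before changing any crawler settings.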