E Error Messages

The crawler uses a set of messages to log its crawling activities.

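The following table lists the most common crawler error messages. In the message text, {0} and {1} are placeholders for values filled in when the message is logged, such as the URL, the document size, or the crawling depth limit.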

| Message ID | Message | Comment | Action |
|---|---|---|---|
| 30025 | {0}: Connection refused | The Web site refused the request to access the URL. | Check the network setup of the computer running the crawler. |
| 30027 | Not allowed URL: {0} | The URL violates the boundary rules and is discarded. | Confirm that the URL can indeed be ignored. |
| 30030 | Malformed URL: {0} | The URL is not properly formed. | Verify the URL. |
| 30031 | Excluded by ROBOTS.TXT: {0} | The robots.txt file on the URL's Web site does not allow the URL to be crawled. | Configure the crawler to ignore the robots.txt rules only if you manage the target Web site. You can change this setting on the Home - Sources - Crawling Parameters page. |
| 30040 | Ignore URL: {0} | The boundary rules do not allow redirection to this URL. | Confirm that the URL should indeed be ignored. |
| 30041 | {0}: excluded by MIME type inclusion rule, URL is {1} | The content type of the URL is not in the MIME type inclusion list. | Check whether the specified content type should be included. |
| 30054 | Excessively long URL: {0} | The URL string is too long, so the URL is ignored. | N/A |
| 30057 | {0}: timeout reading document | The target Web site is sending the page content too slowly. | Increase the crawler timeout threshold on the crawler configuration page. The default is 30 seconds. |
| 30083 | {0}: Duplicate document ignored | A document with the same content has already been seen in the same crawl session. This can indicate URL looping, that is, the generation of different URLs that point back to the same page. | Check whether the URL is generated correctly. If necessary, disable indexing of dynamic URLs on the Home - Sources - Crawling Parameters page. |
| 30126 | Binary document reported as text document: "{0}" | The Web site sent a binary file labeled as a text document. In most cases, the document at the URL is in a binary format, such as PDF, and is not a text document. | If possible, correct the content type setting for the URL on the Web site. |
| 30188 | Login form not specified for "{0}" | HTML form login cannot be performed because the name of the form is not set. Normally, the crawler sets the form name automatically. | Identify the URL of the login page, and check whether it is a regular HTML form login page or an SSO login page. Report the problem to Oracle Support. |
| 30199 | Encountered an error while responding to the following HTTP authentication request: [{0}] | The crawler cannot authenticate to the target URL. | Verify whether the authentication request uses basic or digest authentication. Also confirm the provided authentication credentials. |
| 30201 | Missing authentication credentials | The authentication data required to access the URL is not available. | Check the type of authentication required, and provide the credentials on the source customization page. |
| 30206 | Ignoring "{0}" due to host (or redirected host) connection problem | The crawler cannot contact the server of the URL. | Verify that the Web site is up, and try to recrawl. |
| 30209 | Document size ({0}) too big, ignored: {1} | The document size exceeds the configured limit, which is 10 megabytes by default. | Increase the document size limit on the Global Settings - Crawler Configuration page. |
| 30215 | Excluded by crawling depth limit({0}): {1} | A previously crawled URL is excluded because the crawling depth limit has been reduced. | Confirm that the depth limit is correct. |
| 30782 | Invalid document attribute {0} - ignored | An attribute picked up from the document is not defined for the source, so it is ignored. | This message can usually be ignored safely, unless you know that this attribute should be defined for this source. In that case, contact Oracle Support. |
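Messages 30027, 30030, and 30054 all describe URLs rejected before any document is fetched. The following sketch is a hypothetical illustration of such checks, not the crawler's implementation. The length limit `MAX_URL_LENGTH`, the boundary prefix in `INCLUDE_PREFIXES`, the rule that a well-formed URL needs an `http` or `https` scheme and a host, and the function name `check_url` are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical values; the table does not state the real URL length limit
# or the boundary rules of any particular source.
MAX_URL_LENGTH = 2048
INCLUDE_PREFIXES = ("http://www.example.com/docs/",)


def check_url(url: str) -> Optional[str]:
    """Return a message for a rejected URL, or None if the URL may be crawled."""
    if len(url) > MAX_URL_LENGTH:
        return "Excessively long URL: {0}".format(url)
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some invalid input, e.g. a broken IPv6 host.
        return "Malformed URL: {0}".format(url)
    # Assumed definition of "well formed": an http(s) scheme and a host name.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "Malformed URL: {0}".format(url)
    # Assumed boundary rule: the URL must start with an included prefix.
    if not url.startswith(INCLUDE_PREFIXES):
        return "Not allowed URL: {0}".format(url)
    return None


print(check_url("http://www.example.com/docs/intro.html"))  # None
print(check_url("http://www.example.com/shop/cart.html"))   # Not allowed URL: http://www.example.com/shop/cart.html
```

The length check runs first so that very long strings are rejected without being parsed; that ordering is a choice made for this sketch only.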
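Message 30031 comes from the robots exclusion protocol, in which a Web site publishes a robots.txt file that tells crawlers which paths they may not fetch. The sketch below is not the crawler's code; it uses Python's standard `urllib.robotparser` module to show how a robots.txt rule excludes a URL. The robots.txt content, the user-agent name `ExampleCrawler`, the function name `robots_check`, and the URLs are hypothetical.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served by the target Web site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# parse() takes the robots.txt content as a list of lines.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def robots_check(url: str, agent: str = "ExampleCrawler") -> Optional[str]:
    """Return a 30031-style message if robots.txt forbids the URL, else None."""
    if not _parser.can_fetch(agent, url):
        return "Excluded by ROBOTS.TXT: {0}".format(url)
    return None


print(robots_check("http://www.example.com/private/report.html"))
# Excluded by ROBOTS.TXT: http://www.example.com/private/report.html
print(robots_check("http://www.example.com/index.html"))
# None
```

This is why the table recommends ignoring robots.txt rules only for a Web site you manage: the file expresses the site owner's wishes, and a crawler that honors it simply skips the disallowed paths.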
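Message 30083 means that a document's content matches one already seen in the same crawl session. The sketch below shows one common technique for detecting this: hash each document's content with SHA-256 and remember the digests seen so far. This is an assumption for illustration; the table does not say how the crawler itself compares documents. The class name `CrawlSession` and the URLs are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class CrawlSession:
    """Hypothetical per-session duplicate tracker keyed by content digest."""

    def __init__(self) -> None:
        # Maps content digest -> first URL seen with that content.
        self._seen: Dict[str, str] = {}

    def check(self, url: str, content: bytes) -> Optional[str]:
        """Return a 30083-style message if this content was already seen."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return "{0}: Duplicate document ignored".format(url)
        self._seen[digest] = url
        return None


session = CrawlSession()
page = b"<html><body>Catalog</body></html>"
print(session.check("http://www.example.com/catalog?id=1", page))  # None
# A looping dynamic URL that returns the same page is flagged as a duplicate.
print(session.check("http://www.example.com/catalog?id=1&s=2", page))
```

Hashing the content rather than comparing URLs is what exposes URL looping: different dynamic URLs that return identical bytes produce the same digest.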
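Messages 30041, 30209, and 30215 report documents dropped by configurable limits. The following sketch is a hypothetical illustration of such checks, not the crawler's implementation, and it fills the {0} and {1} placeholders with Python's `str.format`. Only the 10-megabyte default size limit comes from the table; the inclusion list `MIME_INCLUDE`, the depth limit `MAX_DEPTH`, the reading of 10 megabytes as 10 * 1024 * 1024 bytes, the assumption that {0} in message 30041 is the content type, and the function name `check_document` are all assumptions.

```python
from typing import Optional

# Assumed values for illustration; the real limits are set on the crawler
# configuration pages named in the table.
MAX_DOC_BYTES = 10 * 1024 * 1024  # 10 MB default, assumed to be binary megabytes
MIME_INCLUDE = {"text/html", "text/plain", "application/pdf"}  # hypothetical list
MAX_DEPTH = 2  # hypothetical crawling depth limit


def check_document(url: str, mime_type: str, size: int, depth: int) -> Optional[str]:
    """Return the first exclusion message that applies, or None to keep the document."""
    if depth > MAX_DEPTH:
        return "Excluded by crawling depth limit({0}): {1}".format(MAX_DEPTH, url)
    if mime_type not in MIME_INCLUDE:
        # Assumption: {0} is the content type and {1} is the URL.
        return "{0}: excluded by MIME type inclusion rule, URL is {1}".format(mime_type, url)
    if size > MAX_DOC_BYTES:
        return "Document size ({0}) too big, ignored: {1}".format(size, url)
    return None


print(check_document("http://www.example.com/a.html", "text/html", 2048, 1))  # None
print(check_document("http://www.example.com/b.zip", "application/zip", 2048, 1))
# application/zip: excluded by MIME type inclusion rule, URL is http://www.example.com/b.zip
```

The order of the three checks is arbitrary in this sketch; a document that breaks several limits reports only the first one tested.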