Oracle® Secure Enterprise Search Administrator's Guide
11g Release 1 (11.1.2.0.0)

E14130-05

B URL Crawler Status Codes

The crawler uses a set of status codes to indicate the result of crawling a URL. In addition to the standard HTTP status codes, it uses its own codes for situations that are not HTTP related.

Only URLs with status 200 are indexed. If a record exists in the EQ$URL table but its status is not 200, then the crawler encountered an error while trying to fetch the document. Status values below 600 map directly to the corresponding HTTP status codes.

See Also:

"Status Code Definitions" in Hypertext Transfer Protocol -- HTTP/1.1 at

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
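When triaging EQ$URL records, it can help to turn these rules into code. The following Python sketch is a hypothetical illustration only (the function name and category strings are not part of the product); it classifies a status value using just the rules stated above:

```python
def classify_status(code):
    """Coarsely classify a crawler URL status code from EQ$URL.

    Per the rules above: 0 means enqueued but not yet processed,
    200 means indexed, any other value below 600 maps directly to
    an HTTP status code, and 600 or above is a crawler-specific
    condition.
    """
    if code == 0:
        return "enqueued, not yet processed"
    if code == 200:
        return "indexed"
    if code < 600:
        return f"HTTP error {code}"
    return "crawler-specific error"

print(classify_status(200))  # indexed
print(classify_status(404))  # HTTP error 404
print(classify_status(902))  # crawler-specific error
```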

The following table lists the URL status codes, the document container codes used by the crawler plug-in, and the corresponding EQG message codes. A dash (-) indicates that no document container code applies.

Code       Document Container Code            EQG Code  Description
0          -                                  N/A       A URL that has been enqueued but not yet processed
200        STATUS_OK_FOR_INDEX                N/A       URL OK
400        STATUS_BAD_REQUEST                 30009     Bad request
401        STATUS_AUTH_REQUIRED               30007     Authorization required
402        -                                  30011     Payment required
403        STATUS_ACCESS_FORBIDDEN            30010     Access forbidden
404        STATUS_NOTFOUND                    30008     Not found
405        -                                  30012     Method not allowed
406        -                                  30013     Not acceptable
407        STATUS_PROXY_REQUIRED              30014     Proxy authentication required
408        STATUS_REQUEST_TIMEOUT             30015     Request timeout
409        -                                  30016     Conflict
410        -                                  30017     Gone
414        -                                  30066     Request URI too large
500        STATUS_SERVER_ERROR                10018     Internal server error
501        -                                  10019     Not implemented
502        STATUS_BAD_GATEWAY                 10020     Bad gateway
503        STATUS_FETCH_ERROR                 10021     Service unavailable
504        -                                  10022     Gateway timeout
505        -                                  10023     HTTP version not supported
902        STATUS_READ_TIMEOUT                30057     Timeout reading the document
903        STATUS_FILTER_ERROR                30065     Filtering failed
904        STATUS_OUT_OF_MEMORY               30003     Out of memory error
905        STATUS_IO_EXCEPTION                30002     IOException while processing the URL
906        STATUS_CONNECTION_REFUSED          30025     Connection refused
907        -                                  30079     Socket bind exception
908        -                                  30081     Filter not available
909        -                                  30082     Duplicate document detected
910        STATUS_DUPLICATE_DOC               30083     Duplicate document ignored
911        STATUS_EMPTY_DOC                   30106     Empty document
951        STATUS_OK_BUT_NO_INDEX             N/A       URL not indexed (this can happen when robots.txt specifies that the document should not be indexed)
952        STATUS_OK_CRAWLED                  N/A       URL crawled
953        -                                  N/A       Metatag redirection
954        -                                  30000     HTTP redirection
955        -                                  N/A       Blacklisted URL
956        -                                  31017     URL is not unique
957        -                                  N/A       Sentry URL (a URL used as a placeholder)
958        STATUS_CANNOT_READ                 30173     Document read error
959        STATUS_LOGIN_FAILED                30183     Form login failed
960        STATUS_DOC_SIZE_TOO_BIG            30209     Document too large; ignored
962        STATUS_DOC_MIME_TYPE_EXCLUDED      30041     Document excluded based on its MIME type
964        STATUS_DOC_BOUNDARY_RULE_EXCLUDED  30258     Document excluded based on boundary rules
1001       -                                  30001     Data type is not text/html
1002       -                                  30004     Broken network data stream
1003       -                                  30005     HTTP redirect location does not exist
1004       -                                  30006     Bad relative URL
1005       -                                  30024     HTTP error
1006       -                                  30058     Error parsing the HTTP header
1007       -                                  30067     Invalid URL table column name
1009       -                                  30126     Binary document reported as a text document
1010       -                                  30112     Invalid display URL
1011       PORTAL_XMLURL_FAIL                 31011     Invalid XML from OracleAS Portal
1020-1024  -                                  N/A       URL is not reachable. The status starts at 1020 and increases by one with each retry; after five tries (if it reaches 1025), the URL is deleted.
1026-1029  -                                  N/A       URL cannot be found. The status changes from 404 to 1026 when the URL cannot be found on re-crawl, and it increases by one with each retry; after five tries (if it reaches 1030), the URL is deleted.
1111       -                                  N/A       URL remained in the queue even after a successful crawl, indicating that the crawler had a problem processing this document. Investigate by crawling the URL in a separate source and checking the crawler log for errors.
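The retry bookkeeping described for the 1020-1024 and 1026-1029 ranges can be sketched as follows. This is a hypothetical Python illustration of the documented behavior, not product code; the function name and the "deleted" marker are invented for the example:

```python
def simulate_retries(first_status, deleted_at):
    """Walk a retry status range as described above.

    The status starts at first_status (1020 for unreachable URLs,
    1026 for URLs not found on re-crawl) and increases by one with
    each failed retry; once it reaches deleted_at (1025 or 1030,
    respectively), the URL is deleted.
    """
    history = [first_status]
    status = first_status
    while True:
        status += 1
        if status >= deleted_at:
            history.append("deleted")
            return history
        history.append(status)

print(simulate_retries(1020, 1025))  # [1020, 1021, 1022, 1023, 1024, 'deleted']
print(simulate_retries(1026, 1030))  # [1026, 1027, 1028, 1029, 'deleted']
```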