Oracle® Secure Enterprise Search Administrator's Guide 11g Release 2 (11.2.2), Part Number E23427-01
The crawler uses a set of codes to indicate the result of crawling a URL. In addition to the standard HTTP status codes, it uses its own codes for situations that are not HTTP-related.
Only URLs with status 200 are indexed. If a record exists in EQ$URL with a status other than 200, then the crawler encountered an error while trying to fetch that document. A status of less than 600 maps directly to the HTTP status code.
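The classification rules above can be sketched as a small helper. This is an illustrative function, not part of the SES API; the bucket names are assumptions:

```python
def status_class(code):
    """Bucket a crawler status code per the rules above (illustrative only).

    Only status 200 means the URL was indexed. A status below 600 maps
    directly to the HTTP status code returned by the web server; higher
    values are crawler-specific.
    """
    if code == 0:
        return "enqueued"   # URL enqueued but not yet processed
    if code == 200:
        return "indexed"    # only these URLs are indexed
    if code < 600:
        return "http"       # standard HTTP status (e.g. 404)
    return "crawler"        # SES crawler-specific (e.g. 902, 1020)
```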
See Also:
"Status Code Definitions" in Hypertext Transfer Protocol -- HTTP/1.1

The following table lists the URL status codes, the document container codes used by the crawler plug-in, and the EQG codes.
| Code | Description | Document Container Code | EQG Code |
|---|---|---|---|
| 0 | A URL that has been enqueued but not yet processed | N/A | N/A |
| 200 | URL OK | STATUS_OK_FOR_INDEX | N/A |
| 400 | Bad request | STATUS_BAD_REQUEST | 30009 |
| 401 | Authorization required | STATUS_AUTH_REQUIRED | 30007 |
| 402 | Payment required | N/A | 30011 |
| 403 | Access forbidden | STATUS_ACCESS_FORBIDDEN | 30010 |
| 404 | Not found | STATUS_NOTFOUND | 30008 |
| 405 | Method not allowed | N/A | 30012 |
| 406 | Not acceptable | N/A | 30013 |
| 407 | Proxy authentication required | STATUS_PROXY_REQUIRED | 30014 |
| 408 | Request timeout | STATUS_REQUEST_TIMEOUT | 30015 |
| 409 | Conflict | N/A | 30016 |
| 410 | Gone | N/A | 30017 |
| 414 | Request URI too large | N/A | 30066 |
| 500 | Internal server error | STATUS_SERVER_ERROR | 10018 |
| 501 | Not implemented | N/A | 10019 |
| 502 | Bad gateway | STATUS_BAD_GATEWAY | 10020 |
| 503 | Service unavailable | STATUS_FETCH_ERROR | 10021 |
| 504 | Gateway timeout | N/A | 10022 |
| 505 | HTTP version not supported | N/A | 10023 |
| 902 | Timeout reading document | STATUS_READ_TIMEOUT | 30057 |
| 903 | Filtering failed | STATUS_FILTER_ERROR | 30065 |
| 904 | Out of memory error | STATUS_OUT_OF_MEMORY | 30003 |
| 905 | IOException in processing URL | STATUS_IO_EXCEPTION | 30002 |
| 906 | Connection refused | STATUS_CONNECTION_REFUSED | 30025 |
| 907 | Socket bind exception | N/A | 30079 |
| 908 | Filter not available | N/A | 30081 |
| 909 | Duplicate document detected | N/A | 30082 |
| 910 | Duplicate document ignored | STATUS_DUPLICATE_DOC | 30083 |
| 911 | Empty document | STATUS_EMPTY_DOC | 30106 |
| 951 | URL not indexed (this can happen if robots.txt specifies that a certain document should not be indexed) | STATUS_OK_BUT_NO_INDEX | N/A |
| 952 | URL crawled | STATUS_OK_CRAWLED | N/A |
| 953 | Metatag redirection | N/A | N/A |
| 954 | HTTP redirection | N/A | 30000 |
| 955 | Black list URL | N/A | N/A |
| 956 | URL is not unique | N/A | 31017 |
| 957 | Sentry URL (URL used as a placeholder) | N/A | N/A |
| 958 | Document read error | STATUS_CANNOT_READ | 30173 |
| 959 | Form login failed | STATUS_LOGIN_FAILED | 30183 |
| 960 | Document size too big; ignored | STATUS_DOC_SIZE_TOO_BIG | 30209 |
| 962 | Document excluded based on MIME type | STATUS_DOC_MIME_TYPE_EXCLUDED | 30041 |
| 964 | Document excluded based on boundary rules | STATUS_DOC_BOUNDARY_RULE_EXCLUDED | 30258 |
| 1001 | Data type is not TEXT/HTML | N/A | 30001 |
| 1002 | Broken network data stream | N/A | 30004 |
| 1003 | HTTP redirect location does not exist | N/A | 30005 |
| 1004 | Bad relative URL | N/A | 30006 |
| 1005 | HTTP error | N/A | 30024 |
| 1006 | Error parsing HTTP header | N/A | 30058 |
| 1007 | Invalid URL table column name | N/A | 30067 |
| 1009 | Binary document reported as text document | N/A | 30126 |
| 1010 | Invalid display URL | N/A | 30112 |
| 1011 | Invalid XML from OracleAS Portal | PORTAL_XMLURL_FAIL | 31011 |
| 1020-1024 | URL is not reachable. The status starts at 1020 and increases by one with each retry. After five tries (when it reaches 1025), the URL is deleted. | N/A | N/A |
| 1026-1029 | URL cannot be found. The status changes from 404 to 1026 when a URL cannot be found on re-crawl, and it increases by one with each retry. After five tries (when it reaches 1030), the URL is deleted. | N/A | N/A |
| 1111 | URL remained in the queue after a successful crawl, indicating that the crawler had a problem processing the document. To investigate, crawl the URL in a separate source and check the crawler log for errors. | N/A | N/A |
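The retry escalation described for codes 1020-1024 and 1026-1029 can be modeled as follows. This is an illustrative sketch, not part of the SES crawler; the function and constant names are assumptions, chosen only to mirror the thresholds stated in the table:

```python
# Thresholds taken from the table above (names are illustrative).
UNREACHABLE_FIRST, UNREACHABLE_DELETE = 1020, 1025
NOT_FOUND_FIRST, NOT_FOUND_DELETE = 1026, 1030

def next_status(current, reachable, found):
    """Model the status escalation after one re-crawl attempt.

    Returns the new status for the URL, or None when the URL would
    be deleted from EQ$URL after five failed tries.
    """
    if reachable and found:
        return 200                      # fetched OK; eligible for indexing
    if not reachable:
        first, delete_at = UNREACHABLE_FIRST, UNREACHABLE_DELETE
    else:                               # reachable but missing (was a 404)
        first, delete_at = NOT_FOUND_FIRST, NOT_FOUND_DELETE
    # First failure moves the URL to the start of the range;
    # each subsequent retry adds one.
    status = current + 1 if first <= current < delete_at else first
    return None if status >= delete_at else status
```

For example, a URL that keeps failing with "not reachable" moves through 1020, 1021, 1022, 1023, 1024, and is then deleted, while a 404 on re-crawl becomes 1026.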