
Oracle Secure Enterprise Search Java API Reference
11g Release 1 (11.1.2.0.0)

E14433-02


oracle.search.sdk.crawler
Interface DocumentContainer


public interface DocumentContainer

DocumentContainer is an interface used by a crawler plug-in to submit or retrieve document information.
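The following is a minimal sketch of how a crawler plug-in might hand a text document to a container and mark its status. The Oracle SES SDK is not assumed to be on the classpath, so `MiniDocumentContainer`, `RecordingContainer`, and `TextSubmitter` are hypothetical stand-ins that mirror only a subset of the documented methods. A real plug-in would program against `DocumentContainer` itself, whose `setDocumentStatus` also declares `ProcessingException`.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in mirroring three DocumentContainer methods so the sketch
// compiles without the SES SDK. The real setDocumentStatus also declares
// ProcessingException.
interface MiniDocumentContainer {
    void setDocument(Reader doc);
    void setDocumentStatus(int status);
    int getDocumentStatus();
}

// Hypothetical in-memory implementation that simply records what it is given.
class RecordingContainer implements MiniDocumentContainer {
    Reader document;
    int status;

    @Override public void setDocument(Reader doc) { this.document = doc; }
    @Override public void setDocumentStatus(int status) { this.status = status; }
    @Override public int getDocumentStatus() { return status; }
}

class TextSubmitter {
    // Submits a text body and marks it with the caller-supplied status code,
    // for example DocumentContainer.STATUS_OK_FOR_INDEX in a real plug-in.
    static void submitText(MiniDocumentContainer container, String body, int status) {
        container.setDocument(new StringReader(body));
        container.setDocumentStatus(status);
    }
}
```

The status is passed in rather than hard-coded because the numeric values of the `STATUS_*` fields are listed only under Constant Field Values.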


Field Summary
static int STATUS_ACCESS_FORBIDDEN
          Access to this document is forbidden.
static int STATUS_AUTH_REQUIRED
          Authorization is required to access this document.
static int STATUS_BAD_GATEWAY
           
static int STATUS_BAD_REQUEST
           
static int STATUS_CANNOT_READ
          The document cannot be read.
static int STATUS_CONNECTION_REFUSED
          The connection was refused when accessing this document.
static int STATUS_DOC_BOUNDARY_RULE_EXCLUDED
          The document was excluded based on boundary rules.
static int STATUS_DOC_MIME_TYPE_EXCLUDED
          The document was excluded based on MIME type.
static int STATUS_DOC_SIZE_TOO_BIG
          The document was too big to crawl.
static int STATUS_DUPLICATE_DOC
          A duplicate document exists and will be ignored.
static int STATUS_EMPTY_DOC
          The document has no contents.
static int STATUS_FETCH_ERROR
          The document cannot be retrieved.
static int STATUS_FILTER_ERROR
          The document cannot be filtered.
static int STATUS_IO_EXCEPTION
          An I/O exception occurred when processing this document.
static int STATUS_LOGIN_FAILED
          Log-in failed for this document.
static int STATUS_NOTFOUND
          The document cannot be found.
static int STATUS_OK_BUT_NO_INDEX
          The document is valid but should not be indexed yet.
static int STATUS_OK_CRAWLED
          The document was crawled successfully.
static int STATUS_OK_FOR_INDEX
          The document is ready to be indexed.
static int STATUS_OUT_OF_MEMORY
          An out-of-memory error occurred when processing this document.
static int STATUS_PROXY_REQUIRED
           
static int STATUS_READ_TIMEOUT
          A read timeout occurred on this document.
static int STATUS_REQUEST_TIMEOUT
           
static int STATUS_SERVER_ERROR
          The server had a problem delivering this document.
static int STATUS_STORE_USER
           

 

Method Summary
 void addAttachment(String name, InputStream doc, String contentType)
          Adds an attachment read from an InputStream.
 void addAttachment(String name, Reader doc, String contentType)
          Adds a text document attachment.
 void clearAttachments()
          Removes all attachments.
 InputStream getBinaryDocumentStream()
          Gets the original content of a binary document.
 Reader getDocumentReader()
          Gets the reader for the data that was set in the DocumentContainer.
 int getDocumentStatus()
          Gets the status code of the document, such as one of the DocumentContainer.STATUS_* constants.
 InputStream getDocumentStream()
          Gets the stream for the data that was set in the DocumentContainer.
 DocumentMetadata getMetadata()
          Gets the metadata object associated with this document container.
 void setDocument(InputStream doc)
          Sets a document using the InputStream.
 void setDocument(Reader doc)
          Sets the document to be processed.
 void setDocumentStatus(int status)
          Sets the status of the document.
 void setMetadata(DocumentMetadata meta)
          Sets the DocumentMetadata object.

 

Field Detail

STATUS_OK_FOR_INDEX

static final int STATUS_OK_FOR_INDEX
The document is ready to be indexed.
See Also:
Constant Field Values

STATUS_BAD_REQUEST

static final int STATUS_BAD_REQUEST
See Also:
Constant Field Values

STATUS_AUTH_REQUIRED

static final int STATUS_AUTH_REQUIRED
Authorization is required to access this document.
See Also:
Constant Field Values

STATUS_ACCESS_FORBIDDEN

static final int STATUS_ACCESS_FORBIDDEN
Access to this document is forbidden.
See Also:
Constant Field Values

STATUS_NOTFOUND

static final int STATUS_NOTFOUND
The document cannot be found.
See Also:
Constant Field Values

STATUS_PROXY_REQUIRED

static final int STATUS_PROXY_REQUIRED
See Also:
Constant Field Values

STATUS_REQUEST_TIMEOUT

static final int STATUS_REQUEST_TIMEOUT
See Also:
Constant Field Values

STATUS_SERVER_ERROR

static final int STATUS_SERVER_ERROR
The server had a problem delivering this document.
See Also:
Constant Field Values

STATUS_BAD_GATEWAY

static final int STATUS_BAD_GATEWAY
See Also:
Constant Field Values

STATUS_FETCH_ERROR

static final int STATUS_FETCH_ERROR
The document cannot be retrieved.
See Also:
Constant Field Values

STATUS_READ_TIMEOUT

static final int STATUS_READ_TIMEOUT
A read timeout occurred on this document.
See Also:
Constant Field Values

STATUS_FILTER_ERROR

static final int STATUS_FILTER_ERROR
The document cannot be filtered.
See Also:
Constant Field Values

STATUS_OUT_OF_MEMORY

static final int STATUS_OUT_OF_MEMORY
An out-of-memory error occurred when processing this document.
See Also:
Constant Field Values

STATUS_IO_EXCEPTION

static final int STATUS_IO_EXCEPTION
An I/O exception occurred when processing this document.
See Also:
Constant Field Values

STATUS_CONNECTION_REFUSED

static final int STATUS_CONNECTION_REFUSED
The connection was refused when accessing this document.
See Also:
Constant Field Values

STATUS_DUPLICATE_DOC

static final int STATUS_DUPLICATE_DOC
A duplicate document exists and will be ignored.
See Also:
Constant Field Values

STATUS_EMPTY_DOC

static final int STATUS_EMPTY_DOC
The document has no contents.
See Also:
Constant Field Values

STATUS_LOGIN_FAILED

static final int STATUS_LOGIN_FAILED
Log-in failed for this document.
See Also:
Constant Field Values

STATUS_OK_BUT_NO_INDEX

static final int STATUS_OK_BUT_NO_INDEX
The document is valid but should not be indexed yet.
See Also:
Constant Field Values

STATUS_OK_CRAWLED

static final int STATUS_OK_CRAWLED
The document was crawled successfully.
See Also:
Constant Field Values

STATUS_CANNOT_READ

static final int STATUS_CANNOT_READ
The document cannot be read.
See Also:
Constant Field Values

STATUS_DOC_SIZE_TOO_BIG

static final int STATUS_DOC_SIZE_TOO_BIG
The document was too big to crawl.
See Also:
Constant Field Values

STATUS_DOC_MIME_TYPE_EXCLUDED

static final int STATUS_DOC_MIME_TYPE_EXCLUDED
The document was excluded based on MIME type.
See Also:
Constant Field Values

STATUS_DOC_BOUNDARY_RULE_EXCLUDED

static final int STATUS_DOC_BOUNDARY_RULE_EXCLUDED
The document was excluded based on boundary rules.
See Also:
Constant Field Values

STATUS_STORE_USER

static final int STATUS_STORE_USER
See Also:
Constant Field Values

Method Detail

setMetadata

void setMetadata(DocumentMetadata meta)
Sets the DocumentMetadata object, which supplies metadata about the document such as its author and last-modified date.
Parameters:
meta - the metadata object

getMetadata

DocumentMetadata getMetadata()
Gets the metadata object associated with this document container.

getDocumentReader

Reader getDocumentReader()
Gets the reader for the data that was set in the DocumentContainer.

getDocumentStream

InputStream getDocumentStream()
Gets the stream for the data that was set in the DocumentContainer.

getBinaryDocumentStream

InputStream getBinaryDocumentStream()
Gets the original content of a binary document.

setDocument

void setDocument(Reader doc)
Sets the document to be processed.
Parameters:
doc - the reader to read the document

setDocument

void setDocument(InputStream doc)
Sets a document using the InputStream. Typically used for binary documents.
Parameters:
doc - the input stream to read the document
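Because `setDocument(InputStream)` is described as typically used for binary documents, a plug-in may want to route text content through the `Reader` overload instead. The sketch below assumes that convention and a simple "starts with `text/`" test on the MIME type; `DualSetDocumentContainer`, `OverloadRecorder`, and `ContentRouter` are hypothetical names standing in for the real `DocumentContainer`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical stand-in with the two documented setDocument overloads.
interface DualSetDocumentContainer {
    void setDocument(Reader doc);
    void setDocument(InputStream doc);
}

// Hypothetical implementation that records which overload was called.
class OverloadRecorder implements DualSetDocumentContainer {
    String lastOverload;

    @Override public void setDocument(Reader doc) { lastOverload = "reader"; }
    @Override public void setDocument(InputStream doc) { lastOverload = "stream"; }
}

final class ContentRouter {
    private ContentRouter() {}

    // Text content is decoded with the given charset and passed as a Reader;
    // everything else (for example PDF or image bytes) is passed as raw bytes.
    static void setContent(DualSetDocumentContainer container, byte[] body,
                           String contentType, Charset charset) {
        boolean isText = contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/");
        if (isText) {
            container.setDocument(new InputStreamReader(new ByteArrayInputStream(body), charset));
        } else {
            container.setDocument(new ByteArrayInputStream(body));
        }
    }
}
```

A `null` content type falls through to the binary path, which avoids decoding bytes with a guessed charset.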

addAttachment

void addAttachment(String name,
                   InputStream doc,
                   String contentType)
Adds an attachment read from an InputStream.
Parameters:
name - the name of the attachment, or null if unknown
doc - the input stream to read the attachment
contentType - the MIME type of the attachment, or null if unknown

addAttachment

void addAttachment(String name,
                   Reader doc,
                   String contentType)
Adds a text document attachment. Only text/plain, text/html, and multipart/mixed (email) attachments are supported.
Parameters:
name - the name of the attachment, or null if unknown
doc - the input text reader to read the attachment
contentType - the MIME type of the attachment, or null if unknown
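Since only three MIME types are supported for text attachments, a plug-in might check the type before calling `addAttachment(String, Reader, String)`. This is a sketch with a hypothetical `AttachmentTypes` helper; it assumes that MIME parameters such as `; charset=UTF-8` and letter case should be ignored, and it accepts `null` because the method allows `null` for an unknown type.

```java
import java.util.Locale;
import java.util.Set;

final class AttachmentTypes {
    private AttachmentTypes() {}

    // Base MIME types documented as supported for text attachments.
    private static final Set<String> SUPPORTED_TEXT_TYPES =
            Set.of("text/plain", "text/html", "multipart/mixed");

    // Returns true if contentType is null (allowed by addAttachment for an
    // unknown type) or if its base type, ignoring parameters and case,
    // is one of the supported types.
    static boolean isSupportedTextAttachment(String contentType) {
        if (contentType == null) {
            return true;
        }
        int semicolon = contentType.indexOf(';');
        String base = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return SUPPORTED_TEXT_TYPES.contains(base);
    }
}
```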

clearAttachments

void clearAttachments()
Removes all attachments.

getDocumentStatus

int getDocumentStatus()
Gets the status code of the document, such as one of the DocumentContainer.STATUS_* constants.

setDocumentStatus

void setDocumentStatus(int status)
                       throws ProcessingException
Sets the status of the document. Only system-defined status codes (900-1200), HTTP status codes (200-505), and user-defined status codes in the range 10000-19999 are allowed. The DocumentContainer.STATUS_* constants are a subset of the system-defined status codes.
Parameters:
status - the document status code
Throws:
ProcessingException
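A plug-in that computes its own status codes may want to check them before calling `setDocumentStatus`. The sketch below encodes the allowed ranges as inclusive bounds; the hypothetical `StatusCodeRanges` helper assumes the user-defined range ends at 19999 and that every bound is inclusive, neither of which the description above states explicitly.

```java
final class StatusCodeRanges {
    private StatusCodeRanges() {}

    // HTTP (200-505), system-defined (900-1200), and user-defined
    // (assumed 10000-19999) ranges, all treated as inclusive.
    static boolean isAllowed(int status) {
        return inRange(status, 200, 505)
                || inRange(status, 900, 1200)
                || inRange(status, 10000, 19999);
    }

    private static boolean inRange(int value, int low, int high) {
        return value >= low && value <= high;
    }
}
```

Checking up front lets the plug-in log a clear message for a bad code instead of relying on the `ProcessingException` thrown by the container.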


Copyright © 2006, 2010, Oracle and/or its affiliates. All rights reserved.