Oracle Secure Enterprise Search Java API Reference
10g Release 1 (10.1.8)

B32260-01


oracle.search.sdk.crawler
Interface DocumentContainer


public interface DocumentContainer

DocumentContainer is an interface used by a crawler plugin to submit and retrieve document information.


Field Summary
static int STATUS_ACCESS_FORBIDDEN
          Access to this document is forbidden
static int STATUS_AUTH_REQUIRED
          Authorization required to access this document
static int STATUS_BAD_GATEWAY
          Bad gateway for accessing this document
static int STATUS_BAD_REQUEST
          The request to access this document is invalid
static int STATUS_CANNOT_READ
          The document cannot be read
static int STATUS_CONNECTION_REFUSED
          Connection refused when accessing this document
static int STATUS_DOC_BOUNDARY_RULE_EXCLUDED
          The document was excluded based on boundary rules
static int STATUS_DOC_MIME_TYPE_EXCLUDED
          The document was excluded based on mime type
static int STATUS_DOC_SIZE_TOO_BIG
          The document was too big to handle
static int STATUS_DUPLICATE_DOC
          Duplicate document, which should be ignored
static int STATUS_EMPTY_DOC
          The document has no contents
static int STATUS_FETCH_ERROR
          The document cannot be retrieved
static int STATUS_FILTER_ERROR
          The document cannot be filtered
static int STATUS_IO_EXCEPTION
          IO exception when processing this document
static int STATUS_LOGIN_FAILED
          Login failed for this document
static int STATUS_NOTFOUND
          The document cannot be found
static int STATUS_OK_BUT_NO_INDEX
          The document is fine, but do not index it yet
static int STATUS_OK_CRAWLED
          The document was crawled ok
static int STATUS_OK_FOR_INDEX
          The document is ready to be indexed
static int STATUS_OUT_OF_MEMORY
          Out of memory error when processing this document
static int STATUS_PROXY_REQUIRED
          Proxy required to access this document
static int STATUS_READ_TIMEOUT
          Read timeout on this document
static int STATUS_REQUEST_TIMEOUT
          The request to access this document timed out
static int STATUS_SERVER_ERROR
          The server has a problem delivering this document

 

Method Summary
 void addAttachment(java.lang.String name, java.io.InputStream doc, java.lang.String contentType)
          Adds a binary document attachment.
 void addAttachment(java.lang.String name, java.io.Reader doc, java.lang.String contentType)
          Adds a text document attachment.
 void clearAttachments()
          Remove all attachments
 java.io.Reader getDocumentReader()
          Get the reader for the data that was set in the documentContainer.
 int getDocumentStatus()
          Get the status of the document (DocumentContainer.STATUS*)
 java.io.InputStream getDocumentStream()
          Get the stream for the data that was set in the documentContainer.
 DocumentMetadata getMetadata()
          Get the metadata object associated with this document container
 void setDocument(java.io.InputStream doc)
          Set a document using the InputStream.
 void setDocument(java.io.Reader doc)
          Set the document to be processed.
 void setDocumentStatus(int status)
          Set the status of the document.
 void setMetadata(DocumentMetadata meta)
          Set the Document Metadata object.

 

Field Detail

STATUS_OK_FOR_INDEX

public static final int STATUS_OK_FOR_INDEX
The document is ready to be indexed
See Also:
Constant Field Values

STATUS_BAD_REQUEST

public static final int STATUS_BAD_REQUEST
The request to access this document is invalid
See Also:
Constant Field Values

STATUS_AUTH_REQUIRED

public static final int STATUS_AUTH_REQUIRED
Authorization required to access this document
See Also:
Constant Field Values

STATUS_ACCESS_FORBIDDEN

public static final int STATUS_ACCESS_FORBIDDEN
Access to this document is forbidden
See Also:
Constant Field Values

STATUS_NOTFOUND

public static final int STATUS_NOTFOUND
The document cannot be found
See Also:
Constant Field Values

STATUS_PROXY_REQUIRED

public static final int STATUS_PROXY_REQUIRED
Proxy required to access this document
See Also:
Constant Field Values

STATUS_REQUEST_TIMEOUT

public static final int STATUS_REQUEST_TIMEOUT
The request to access this document timed out
See Also:
Constant Field Values

STATUS_SERVER_ERROR

public static final int STATUS_SERVER_ERROR
The server has a problem delivering this document
See Also:
Constant Field Values

STATUS_BAD_GATEWAY

public static final int STATUS_BAD_GATEWAY
Bad gateway for accessing this document
See Also:
Constant Field Values

STATUS_FETCH_ERROR

public static final int STATUS_FETCH_ERROR
The document cannot be retrieved
See Also:
Constant Field Values

STATUS_READ_TIMEOUT

public static final int STATUS_READ_TIMEOUT
Read timeout on this document
See Also:
Constant Field Values

STATUS_FILTER_ERROR

public static final int STATUS_FILTER_ERROR
The document cannot be filtered
See Also:
Constant Field Values

STATUS_OUT_OF_MEMORY

public static final int STATUS_OUT_OF_MEMORY
Out of memory error when processing this document
See Also:
Constant Field Values

STATUS_IO_EXCEPTION

public static final int STATUS_IO_EXCEPTION
IO exception when processing this document
See Also:
Constant Field Values

STATUS_CONNECTION_REFUSED

public static final int STATUS_CONNECTION_REFUSED
Connection refused when accessing this document
See Also:
Constant Field Values

STATUS_DUPLICATE_DOC

public static final int STATUS_DUPLICATE_DOC
Duplicate document, which should be ignored
See Also:
Constant Field Values

STATUS_EMPTY_DOC

public static final int STATUS_EMPTY_DOC
The document has no contents
See Also:
Constant Field Values

STATUS_LOGIN_FAILED

public static final int STATUS_LOGIN_FAILED
Login failed for this document
See Also:
Constant Field Values

STATUS_OK_BUT_NO_INDEX

public static final int STATUS_OK_BUT_NO_INDEX
The document is fine, but do not index it yet
See Also:
Constant Field Values

STATUS_OK_CRAWLED

public static final int STATUS_OK_CRAWLED
The document was crawled ok
See Also:
Constant Field Values

STATUS_CANNOT_READ

public static final int STATUS_CANNOT_READ
The document cannot be read
See Also:
Constant Field Values

STATUS_DOC_SIZE_TOO_BIG

public static final int STATUS_DOC_SIZE_TOO_BIG
The document was too big to handle
See Also:
Constant Field Values

STATUS_DOC_MIME_TYPE_EXCLUDED

public static final int STATUS_DOC_MIME_TYPE_EXCLUDED
The document was excluded based on mime type
See Also:
Constant Field Values

STATUS_DOC_BOUNDARY_RULE_EXCLUDED

public static final int STATUS_DOC_BOUNDARY_RULE_EXCLUDED
The document was excluded based on boundary rules
See Also:
Constant Field Values

Method Detail

setMetadata

public void setMetadata(DocumentMetadata meta)
Set the Document Metadata object. This is required to provide metadata about the document, such as the author and the last modified date.
Parameters:
meta - the metadata object

getMetadata

public DocumentMetadata getMetadata()
Get the metadata object associated with this document container
Returns:
the metadata object

getDocumentReader

public java.io.Reader getDocumentReader()
Get the reader for the data that was set in the documentContainer.
Returns:
the reader object already set using the setDocument call. Null if not set.

getDocumentStream

public java.io.InputStream getDocumentStream()
Get the stream for the data that was set in the documentContainer.
Returns:
the stream object already set using the setDocument call. Null if not set.
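
Because each getter returns null when the matching setDocument overload was never called, a caller generally has to check both. The following is a minimal sketch of that check, not SDK code. DocumentSource is a hypothetical stand-in that mirrors only the two getter signatures above so the example compiles without the SES SDK, and decoding the stream as UTF-8 is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in mirroring the two getters of DocumentContainer.
interface DocumentSource {
    Reader getDocumentReader();      // null if no Reader was set
    InputStream getDocumentStream(); // null if no InputStream was set
}

final class DocumentSourceSketch {
    // Returns a Reader over whichever form was set, or null if neither was.
    // Treating the stream as UTF-8 text is an assumption of this sketch.
    static Reader openAsText(DocumentSource source) {
        Reader reader = source.getDocumentReader();
        if (reader != null) {
            return reader;
        }
        InputStream stream = source.getDocumentStream();
        if (stream != null) {
            return new InputStreamReader(stream, StandardCharsets.UTF_8);
        }
        return null;
    }

    // Drains a Reader into a String, rethrowing I/O failures unchecked.
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With the real interface, the same null checks would apply directly to a DocumentContainer instance.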

setDocument

public void setDocument(java.io.Reader doc)
Set the document to be processed.
Parameters:
doc - the reader to read the document

setDocument

public void setDocument(java.io.InputStream doc)
Set a document using the InputStream. Typically used for binary documents.
Parameters:
doc - the input stream to read the document
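
The two setDocument overloads split along text versus binary content. The sketch below illustrates one way a plugin might route content to the appropriate overload. DocumentSink is a hypothetical stand-in that mirrors only the two overloads above so the example compiles without the SES SDK, and the helper names submitText and submitBinary are illustrative, not part of the API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in mirroring the two setDocument overloads.
interface DocumentSink {
    void setDocument(Reader doc);
    void setDocument(InputStream doc);
}

final class SetDocumentSketch {
    // Character data goes through the Reader overload.
    static void submitText(DocumentSink sink, String text) {
        sink.setDocument(new StringReader(text));
    }

    // Binary data goes through the InputStream overload.
    static void submitBinary(DocumentSink sink, byte[] bytes) {
        sink.setDocument(new ByteArrayInputStream(bytes));
    }
}
```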

addAttachment

public void addAttachment(java.lang.String name,
                          java.io.InputStream doc,
                          java.lang.String contentType)
Adds a binary document attachment. Note that the name of the attachment is not indexed.
Parameters:
name - the name of the attachment; null if unknown
doc - the input stream to read the attachment
contentType - the MIME type of the attachment; null if unknown

addAttachment

public void addAttachment(java.lang.String name,
                          java.io.Reader doc,
                          java.lang.String contentType)
Adds a text document attachment. Note that only text/plain, text/html, and multipart/mixed (email) attachments are supported.
Parameters:
name - the name of the attachment; null if unknown. Note that the name of the attachment is not indexed.
doc - the input text reader to read the attachment
contentType - the MIME type of the attachment; null if unknown
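
Since the Reader overload accepts only three content types, a plugin may want to check the type before calling it. The following is a minimal sketch of such a check. The class and method names are hypothetical, and both the case-insensitive match and the stripping of parameters such as "; charset=UTF-8" are assumptions of the sketch rather than documented behavior.

```java
import java.util.Locale;

final class TextAttachmentTypes {
    // True only for text/plain, text/html, and multipart/mixed.
    // A null (unknown) type yields false; the caller decides what to do then.
    static boolean isSupported(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop any parameters after the first semicolon (assumption).
        int semicolon = contentType.indexOf(';');
        String base = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return base.equals("text/plain")
                || base.equals("text/html")
                || base.equals("multipart/mixed");
    }
}
```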

clearAttachments

public void clearAttachments()
Remove all attachments

getDocumentStatus

public int getDocumentStatus()
Get the status of the document (DocumentContainer.STATUS*)

setDocumentStatus

public void setDocumentStatus(int status)
                       throws ProcessingException
Set the status of the document. Only system-defined status codes (900-1200), HTTP status codes (200-505), and user-defined status codes in the range 10000-1999 are allowed. Note that the codes defined in DocumentContainer.STATUS* are a subset of the system-defined status codes.
Parameters:
status - the document status code
Throws:
ProcessingException
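
A plugin can check a code against the documented ranges before calling setDocumentStatus. The sketch below covers only the system-defined (900-1200) and HTTP (200-505) ranges. Treating the bounds as inclusive is an assumption, the class and method names are hypothetical, and the user-defined range is left out because its upper bound is not clear from the description above.

```java
final class StatusRangeSketch {
    // System-defined range from the description above; inclusive bounds assumed.
    static boolean isSystemDefined(int status) {
        return status >= 900 && status <= 1200;
    }

    // HTTP range from the description above; inclusive bounds assumed.
    static boolean isHttpStatus(int status) {
        return status >= 200 && status <= 505;
    }

    // True if the code falls in either of the two ranges checked here.
    static boolean isSystemOrHttp(int status) {
        return isSystemDefined(status) || isHttpStatus(status);
    }
}
```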


Copyright © 2006, Oracle. All rights reserved.