
Oracle Secure Enterprise Search Java API Reference
11g Release 1 (11.1.2.0.0)

E14433-02


oracle.search.sdk.crawler
Interface DocumentService


public interface DocumentService

DocumentService is an interface used by a document service plug-in to submit document attributes and document contents to the crawler.


Field Summary
static int STATUS_ATTRIBUTE_CHANGE
          Attribute has been modified.
static int STATUS_CONTENT_CHANGE
          Document content has been modified.
static int STATUS_LANGUAGE_CHANGE
          Document language has changed.
static int STATUS_NO_CHANGE
          No change to the document attribute, content, and index status.
static int STATUS_NO_INDEX
          Do not index this document.

 

Method Summary
 void close()
          Shuts down the document service plug-in.
 void init()
          Initializes the document service plug-in.
 int process(DocumentContainer doc)
          Processes the document.

 

Field Detail

STATUS_NO_CHANGE

static final int STATUS_NO_CHANGE
No change to the document attribute, content, and index status.
See Also:
Constant Field Values

STATUS_NO_INDEX

static final int STATUS_NO_INDEX
Do not index this document.
See Also:
Constant Field Values

STATUS_ATTRIBUTE_CHANGE

static final int STATUS_ATTRIBUTE_CHANGE
Attribute has been modified. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values

STATUS_CONTENT_CHANGE

static final int STATUS_CONTENT_CHANGE
Document content has been modified. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values

STATUS_LANGUAGE_CHANGE

static final int STATUS_LANGUAGE_CHANGE
Document language has changed. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values
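
The status constants are meant to be combined into a single return value, and STATUS_NO_INDEX takes precedence over the change flags. The sketch below illustrates that rule under stated assumptions: StatusFlags is a hypothetical stand-in class, and its numeric values are placeholders chosen so that each flag occupies a distinct bit. The real values are listed under Constant Field Values and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DocumentService constants.
// The numeric values are placeholders, not the values defined by the SDK.
final class StatusFlags {
    static final int STATUS_NO_CHANGE = 0;
    static final int STATUS_NO_INDEX = 1;
    static final int STATUS_ATTRIBUTE_CHANGE = 2;
    static final int STATUS_CONTENT_CHANGE = 4;
    static final int STATUS_LANGUAGE_CHANGE = 8;

    private StatusFlags() {}

    // True if every bit of 'flag' is set in 'status'.
    static boolean has(int status, int flag) {
        return (status & flag) == flag;
    }

    // Summarizes a combined status value. STATUS_NO_INDEX is checked first
    // because the change flags are ignored when it is set.
    static String describe(int status) {
        if (has(status, STATUS_NO_INDEX)) {
            return "skip indexing";
        }
        List<String> changes = new ArrayList<>();
        if (has(status, STATUS_ATTRIBUTE_CHANGE)) changes.add("attribute");
        if (has(status, STATUS_CONTENT_CHANGE)) changes.add("content");
        if (has(status, STATUS_LANGUAGE_CHANGE)) changes.add("language");
        return changes.isEmpty() ? "no change" : String.join(", ", changes);
    }
}
```

With these placeholder values, describe(STATUS_ATTRIBUTE_CHANGE | STATUS_CONTENT_CHANGE) reports both changes, while any value that includes STATUS_NO_INDEX reports only that the document is skipped.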

Method Detail

init

void init()
          throws DocumentServiceException
Initializes the document service plug-in.
Throws:
DocumentServiceException - If unable to initialize the plug-in.

process

int process(DocumentContainer doc)
            throws DocumentServiceException
Processes the document. This is the entry point for document processing. The doc argument, a DocumentContainer, provides access to the document content and attributes. The content is in HTML format; for binary documents, the original content is available through DocumentContainer.getBinaryDocumentStream. Document attributes are accessed through the DocumentMetadata object obtained from doc.
The method returns an integer made up of the action flags for an index, attribute, language, or content change, combined with bitwise OR. For example, a return value of (STATUS_ATTRIBUTE_CHANGE | STATUS_CONTENT_CHANGE) means that both the document content and an attribute changed. For a content change, use DocumentContainer.setDocument(Reader doc) to return the new content. For an attribute change, use DocumentMetadata.
Document properties such as display URL, access URL, content type, crawl depth, source hierarchy, last modified date, ACLInfo, and content length cannot be changed. The crawler stops if a fatal DocumentServiceException or an uncaught exception is thrown.
Parameters:
doc - DocumentContainer
Returns:
int - A status code indicating the required action from the crawler.
Throws:
DocumentServiceException - If unable to process the document. This stops the crawler. Any uncaught exception is treated as a fatal error.
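
A minimal sketch of a process implementation that rewrites the content and reports the change follows. It is not SDK code: the types below are simplified stand-ins declared locally so that the example compiles on its own, the constant values are placeholders, and getDocument() is an assumed accessor name. Only setDocument(Reader) is taken from the description above. InMemoryContainer and WhitespaceCollapsingService are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stand-in for the SDK exception type (constructor shape is an assumption).
class DocumentServiceException extends Exception {
    DocumentServiceException(String message, Throwable cause) { super(message, cause); }
}

// Stand-in container exposing only what this sketch needs.
interface DocumentContainer {
    Reader getDocument();          // assumed accessor name
    void setDocument(Reader doc);  // named in the reference text
}

// Stand-in for the interface; constant values are placeholders.
interface DocumentService {
    int STATUS_NO_CHANGE = 0;
    int STATUS_CONTENT_CHANGE = 4;
    void init() throws DocumentServiceException;
    int process(DocumentContainer doc) throws DocumentServiceException;
    void close() throws DocumentServiceException;
}

final class Readers {
    private Readers() {}
    // Reads the whole stream into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
        return sb.toString();
    }
}

// Hypothetical in-memory container, used only to exercise the sketch.
class InMemoryContainer implements DocumentContainer {
    private String text;
    InMemoryContainer(String text) { this.text = text; }
    public Reader getDocument() { return new StringReader(text); }
    public void setDocument(Reader doc) {
        try { text = Readers.readAll(doc); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
    String text() { return text; }
}

// Collapses runs of whitespace and reports whether the content changed.
class WhitespaceCollapsingService implements DocumentService {
    public void init() {}

    public int process(DocumentContainer doc) throws DocumentServiceException {
        String original;
        try {
            original = Readers.readAll(doc.getDocument());
        } catch (IOException e) {
            throw new DocumentServiceException("Unable to read document content", e);
        }
        String cleaned = original.replaceAll("\\s+", " ").trim();
        if (cleaned.equals(original)) {
            return STATUS_NO_CHANGE;              // nothing to hand back
        }
        doc.setDocument(new StringReader(cleaned)); // return the new content
        return STATUS_CONTENT_CHANGE;
    }

    public void close() {}
}
```

The point of the sketch is the pairing: new content is handed back through setDocument, and the matching flag is returned so the crawler knows to act on it. A plug-in that also changed attributes would OR STATUS_ATTRIBUTE_CHANGE into the return value.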

close

void close()
           throws DocumentServiceException
Shuts down the document service plug-in.
Throws:
DocumentServiceException - If unable to close the plug-in.

Copyright © 2006, 2010, Oracle and/or its affiliates. All rights reserved.