
Oracle Secure Enterprise Search Java API Reference
10g Release 1 (10.1.8.2)

E10465-01


oracle.search.sdk.crawler
Interface DocumentService


public interface DocumentService

DocumentService is the interface implemented by a document service plug-in to submit document attributes and/or document content to the crawler.


Field Summary
static int STATUS_ATTRIBUTE_CHANGE
          Document attributes have been modified.
static int STATUS_CONTENT_CHANGE
          Document content has been modified.
static int STATUS_LANGUAGE_CHANGE
          Document language has changed.
static int STATUS_NO_CHANGE
          No change to the document attributes, content, or index status.
static int STATUS_NO_INDEX
          Do not index this document.
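The process method documentation describes return values built by OR-ing these constants together, which implies they are bit flags. The following is a minimal sketch of composing and testing such flags. StatusFlags is a hypothetical stand-in, and its constant values are assumptions chosen as distinct bits; the real values are listed on the Constant Field Values page and may differ, so plug-in code should always use the constants from DocumentService itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the DocumentService status constants.
// All values below are ASSUMED (distinct bits, NO_CHANGE as zero);
// use the real constants from oracle.search.sdk.crawler.DocumentService.
final class StatusFlags {
    static final int STATUS_NO_CHANGE = 0;        // assumed: no bits set
    static final int STATUS_NO_INDEX = 1;         // assumed value
    static final int STATUS_ATTRIBUTE_CHANGE = 2; // assumed value
    static final int STATUS_CONTENT_CHANGE = 4;   // assumed value
    static final int STATUS_LANGUAGE_CHANGE = 8;  // assumed value

    private StatusFlags() {}

    // Returns true if every bit of flag is set in status.
    static boolean has(int status, int flag) {
        return (status & flag) == flag;
    }

    // Lists the names of the flags set in status, e.g. for logging.
    static List<String> describe(int status) {
        List<String> names = new ArrayList<>();
        if (status == STATUS_NO_CHANGE) {
            names.add("STATUS_NO_CHANGE");
            return names;
        }
        if (has(status, STATUS_NO_INDEX)) names.add("STATUS_NO_INDEX");
        if (has(status, STATUS_ATTRIBUTE_CHANGE)) names.add("STATUS_ATTRIBUTE_CHANGE");
        if (has(status, STATUS_CONTENT_CHANGE)) names.add("STATUS_CONTENT_CHANGE");
        if (has(status, STATUS_LANGUAGE_CHANGE)) names.add("STATUS_LANGUAGE_CHANGE");
        return names;
    }
}
```

For example, `StatusFlags.describe(StatusFlags.STATUS_ATTRIBUTE_CHANGE | StatusFlags.STATUS_CONTENT_CHANGE)` lists both change flags and nothing else.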


Method Summary
 void close()
          Shut down the document service plug-in.
 void init()
          Initialize the document service plug-in.
 int process(DocumentContainer doc)
          Process the document; this is the entry point for document processing.
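The three methods form a lifecycle: init() once, process() once per document, and close() at shutdown. The sketch below models that calling sequence from the crawler's side. SimpleDocumentService and CrawlerDriver are hypothetical stand-ins (a String replaces DocumentContainer and a plain Exception replaces DocumentServiceException), and calling close() from a finally block is this sketch's own choice, not documented behavior of the SES crawler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirror of DocumentService: a String stands in for
// DocumentContainer and Exception for DocumentServiceException.
interface SimpleDocumentService {
    void init() throws Exception;
    int process(String doc) throws Exception;
    void close() throws Exception;
}

// Hypothetical driver illustrating the order in which the methods are called.
final class CrawlerDriver {
    private CrawlerDriver() {}

    // Calls init() once, process() for each document, and close() even if
    // processing fails part-way. Returns the status code for each document.
    static List<Integer> run(SimpleDocumentService service, List<String> docs) throws Exception {
        List<Integer> statuses = new ArrayList<>();
        service.init();
        try {
            for (String doc : docs) {
                statuses.add(service.process(doc));
            }
        } finally {
            service.close();
        }
        return statuses;
    }
}
```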


Field Detail

STATUS_NO_CHANGE

public static final int STATUS_NO_CHANGE
No change to the document attributes, content, or index status.
See Also:
Constant Field Values

STATUS_NO_INDEX

public static final int STATUS_NO_INDEX
Do not index this document.
See Also:
Constant Field Values

STATUS_ATTRIBUTE_CHANGE

public static final int STATUS_ATTRIBUTE_CHANGE
Document attributes have been modified. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values

STATUS_CONTENT_CHANGE

public static final int STATUS_CONTENT_CHANGE
Document content has been modified. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values

STATUS_LANGUAGE_CHANGE

public static final int STATUS_LANGUAGE_CHANGE
Document language has changed. This is ignored if STATUS_NO_INDEX is set.
See Also:
Constant Field Values
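Because STATUS_ATTRIBUTE_CHANGE, STATUS_CONTENT_CHANGE, and STATUS_LANGUAGE_CHANGE are all ignored when STATUS_NO_INDEX is set, code that interprets a returned status can apply that precedence rule first. A minimal sketch follows; StatusPrecedence is hypothetical and its constant values are assumptions, not the real SES values.

```java
// Hypothetical model of the precedence rule stated in the field details:
// when STATUS_NO_INDEX is set, the change flags are ignored.
// Constant values are ASSUMED; the real ones may differ.
final class StatusPrecedence {
    static final int STATUS_NO_CHANGE = 0;        // assumed
    static final int STATUS_NO_INDEX = 1;         // assumed
    static final int STATUS_ATTRIBUTE_CHANGE = 2; // assumed
    static final int STATUS_CONTENT_CHANGE = 4;   // assumed
    static final int STATUS_LANGUAGE_CHANGE = 8;  // assumed

    private StatusPrecedence() {}

    // Returns the status that actually takes effect: STATUS_NO_INDEX alone
    // if it is set, otherwise the status unchanged.
    static int effective(int status) {
        if ((status & STATUS_NO_INDEX) != 0) {
            return STATUS_NO_INDEX;
        }
        return status;
    }
}
```

With this rule, returning `STATUS_NO_INDEX | STATUS_CONTENT_CHANGE` has the same effect as returning STATUS_NO_INDEX alone.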

Method Detail

init

public void init()
          throws DocumentServiceException
Initialize the document service plug-in.
Throws:
DocumentServiceException
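init() is a natural place to prepare anything the plug-in needs for every document, and to fail early by throwing DocumentServiceException if it cannot. The sketch below assumes a hypothetical plug-in that reads a comma-separated blockedTerms property from a java.util.Properties object; BlockedTermsPlugin, the blockedTerms key, and StandInDocumentServiceException are all illustrative names, not part of the SES API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical stand-in for oracle.search.sdk.crawler.DocumentServiceException.
class StandInDocumentServiceException extends Exception {
    StandInDocumentServiceException(String message) { super(message); }
}

// Hypothetical plug-in fragment showing only initialization.
class BlockedTermsPlugin {
    private final Properties config;
    private Set<String> blockedTerms = Collections.emptySet();

    BlockedTermsPlugin(Properties config) { this.config = config; }

    // Parses the "blockedTerms" property (an assumed key) once, so that
    // process() does not repeat the work for every document.
    public void init() throws StandInDocumentServiceException {
        String raw = config.getProperty("blockedTerms");
        if (raw == null || raw.trim().isEmpty()) {
            throw new StandInDocumentServiceException("missing property: blockedTerms");
        }
        Set<String> terms = new LinkedHashSet<>();
        for (String term : raw.split(",")) {
            String t = term.trim().toLowerCase(Locale.ROOT);
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        blockedTerms = Collections.unmodifiableSet(terms);
    }

    Set<String> blockedTerms() { return blockedTerms; }
}
```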

process

public int process(DocumentContainer doc)
            throws DocumentServiceException
Process the document; this is the entry point for document processing. The input argument doc provides access to the document content and attributes through DocumentContainer. The content is in HTML format. For binary documents, the original content can be accessed through DocumentContainer.getBinaryDocumentStream. Document attributes are accessed through the DocumentMetadata object obtained from doc.

This method returns an integer combining the action flags for indexing, attribute change, language change, and/or content change. For example, returning (STATUS_ATTRIBUTE_CHANGE | STATUS_CONTENT_CHANGE) indicates that both the document content and its attributes have changed. If the content has changed, use DocumentContainer.setDocument(Reader doc) to return the new content. If attributes have changed, modify them through DocumentMetadata. Note that document properties such as display URL, access URL, content type, crawl depth, source hierarchy, last modified date, ACLInfo, and content length cannot be changed.

The crawler shuts down if a fatal DocumentServiceException or any uncaught exception is thrown.
Parameters:
doc - the DocumentContainer that provides access to the document content and attributes
Returns:
an int status code indicating the action required from the crawler
Throws:
DocumentServiceException - if unable to process the document. Throw a fatal DocumentServiceException to stop the crawler. Any uncaught exception is treated as a fatal error.
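The process() contract above (read HTML content, optionally replace it with setDocument(Reader), optionally change attributes, and report what changed with OR-ed flags) can be sketched as follows. This is not the SES API: StandInContainer and TaggingProcessor are hypothetical; only setDocument(Reader) is named in this reference, while getDocument() and getMetadata() are assumed accessor names, a Map stands in for DocumentMetadata, the "classification" attribute is invented for illustration, and the constant values are assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DocumentContainer. setDocument(Reader) is named in
// the reference; getDocument() and getMetadata() are ASSUMED accessors.
class StandInContainer {
    private Reader document;
    private final Map<String, String> metadata = new HashMap<>(); // stands in for DocumentMetadata

    StandInContainer(String html) { this.document = new StringReader(html); }
    Reader getDocument() { return document; }
    void setDocument(Reader doc) { this.document = doc; }
    Map<String, String> getMetadata() { return metadata; }
}

// Hypothetical process() logic: skip drafts; otherwise redact a marker,
// set an attribute, and report both changes with a bitwise OR.
class TaggingProcessor {
    static final int STATUS_NO_CHANGE = 0;        // assumed value
    static final int STATUS_NO_INDEX = 1;         // assumed value
    static final int STATUS_ATTRIBUTE_CHANGE = 2; // assumed value
    static final int STATUS_CONTENT_CHANGE = 4;   // assumed value

    int process(StandInContainer doc) throws IOException {
        // In this stand-in, reading consumes the Reader.
        String html = readAll(doc.getDocument());
        if (html.contains("DRAFT")) {
            return STATUS_NO_INDEX; // any change flags would be ignored anyway
        }
        if (!html.contains("CONFIDENTIAL")) {
            return STATUS_NO_CHANGE;
        }
        // Hand the rewritten HTML back and record a (hypothetical) attribute.
        doc.setDocument(new StringReader(html.replace("CONFIDENTIAL", "[redacted]")));
        doc.getMetadata().put("classification", "restricted");
        return STATUS_ATTRIBUTE_CHANGE | STATUS_CONTENT_CHANGE;
    }

    // Reads the whole Reader into a String.
    private static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

Returning STATUS_NO_INDEX for drafts instead of throwing keeps the crawl running; throwing is reserved for documents that genuinely cannot be processed.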
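The Throws clause distinguishes fatal errors, which stop the crawler, from ordinary processing failures. The sketch below models that policy from the crawler's side. How the real DocumentServiceException is marked as fatal is not described on this page, so PluginException uses an assumed boolean flag; treating a non-fatal exception as "skip this document and continue" is likewise an assumption. PluginException, DocProcessor, and ErrorPolicy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: the fatal flag is an ASSUMED mechanism.
class PluginException extends Exception {
    private final boolean fatal;

    PluginException(String message, boolean fatal) {
        super(message);
        this.fatal = fatal;
    }

    boolean isFatal() { return fatal; }
}

// Hypothetical, simplified process() signature (String replaces DocumentContainer).
interface DocProcessor {
    int process(String doc) throws PluginException;
}

// Hypothetical crawler-side policy: fatal exceptions and uncaught runtime
// exceptions stop the crawl (documented); a non-fatal exception is assumed
// to skip only the current document.
final class ErrorPolicy {
    private ErrorPolicy() {}

    // Returns the documents that were processed successfully, in order.
    static List<String> crawl(DocProcessor processor, List<String> docs) {
        List<String> done = new ArrayList<>();
        for (String doc : docs) {
            try {
                processor.process(doc);
                done.add(doc);
            } catch (PluginException e) {
                if (e.isFatal()) {
                    break; // fatal: stop crawling
                }
                // non-fatal: skip this document and continue
            } catch (RuntimeException e) {
                break; // uncaught exceptions are treated as fatal
            }
        }
        return done;
    }
}
```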

close

public void close()
           throws DocumentServiceException
Shut down the document service plug-in.
Throws:
DocumentServiceException - if unable to close the plug-in
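close() is where resources held since initialization should be released, reporting any failure as DocumentServiceException. A minimal sketch, assuming a hypothetical plug-in that keeps an audit log open in a java.io.Writer; AuditingPlugin and CloseFailedException are illustrative names, and making close() safe to call twice is a defensive choice here, not a documented requirement.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical stand-in for DocumentServiceException.
class CloseFailedException extends Exception {
    CloseFailedException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical plug-in fragment that keeps an audit log open until close().
class AuditingPlugin {
    private Writer auditLog;

    AuditingPlugin(Writer log) { this.auditLog = log; }

    // Appends one URL per line to the audit log.
    void record(String url) throws IOException {
        auditLog.write(url);
        auditLog.write('\n');
    }

    // Closes the log (Writer.close() flushes first). Safe to call more than once.
    public void close() throws CloseFailedException {
        if (auditLog == null) {
            return;
        }
        try {
            auditLog.close();
        } catch (IOException e) {
            throw new CloseFailedException("unable to close audit log", e);
        } finally {
            auditLog = null;
        }
    }

    boolean isOpen() { return auditLog != null; }
}
```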



Copyright © 2006, 2007, Oracle. All rights reserved.