CrawlingThreadService (Oracle Secure Enterprise Search Java API Reference)

Oracle Secure Enterprise Search Java API Reference
11g Release 1 (11.1.2.0.0)
E14433-02

oracle.search.sdk.crawler
Interface CrawlingThreadService

public interface CrawlingThreadService

CrawlingThreadService is an interface used by a crawler plug-in to perform crawl-related tasks. Its execution is context-specific to the crawling thread that invokes the plug-in crawl() method.

Field Summary
`static int`	`DOC_EXCLUDED_BY_MIMETYPE`
`static int`	`DOC_EXCLUDED_BY_SIZE`
`static int`	`DOC_EXCLUDED_BY_URL_BOUNDARY`
`static int`	`DOC_INCLUDED`

Method Summary
`int`	`checkDocumentExcluded(DocumentMetadata meta)` Checks if the document should be crawled.
`String`	`inferMimeType(String url)` Infers the MIME type from the URL suffix.
`void`	`markStatusNotChanged(DocumentMetadata meta)` Marks a URL entry as not requiring any changes or updates.
`void`	`submitForProcessing(DocumentContainer target)` Submits the document for processing.

Field Detail

DOC_INCLUDED

static final int DOC_INCLUDED

See Also: Constant Field Values

DOC_EXCLUDED_BY_URL_BOUNDARY

static final int DOC_EXCLUDED_BY_URL_BOUNDARY

See Also: Constant Field Values

DOC_EXCLUDED_BY_MIMETYPE

static final int DOC_EXCLUDED_BY_MIMETYPE

See Also: Constant Field Values

DOC_EXCLUDED_BY_SIZE

static final int DOC_EXCLUDED_BY_SIZE

See Also: Constant Field Values

Method Detail

submitForProcessing

void submitForProcessing(DocumentContainer target)
                         throws ProcessingException

Submits the document for processing. The document is indexed if its status code is DocumentContainer.STATUS_OK_FOR_INDEX. After processing is done, the document is automatically removed from the queue. The DocumentMetadata in the submitted target is cleared automatically if the operation succeeds.

Parameters: target - the document container containing the content and metadata
Throws: ProcessingException

markStatusNotChanged

void markStatusNotChanged(DocumentMetadata meta)
                          throws ProcessingException

Marks a URL entry as not requiring any changes or updates. This removes the entry from the URL queue without re-indexing it or performing any other operation on it. Use this method when re-crawling content if a particular URL has not changed.

Parameters: meta - the metadata object corresponding to the URL entry
Throws: ProcessingException
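
The re-crawl decision described for markStatusNotChanged can be sketched roughly as follows. This is only an illustration, not the SDK's own logic: the interfaces and exception at the top are local stand-ins that mirror the signatures listed on this page so the example compiles without the SES SDK, and the RecrawlSketch class, its handle method, and the two timestamp parameters are hypothetical. How a plug-in actually detects a change is up to the plug-in.

```java
// Local stand-ins (assumption) so this sketch compiles without the SES SDK.
// Remove them when compiling against the real oracle.search.sdk.crawler package.
interface DocumentMetadata {}

interface DocumentContainer {}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

interface CrawlingThreadService {
    void markStatusNotChanged(DocumentMetadata meta) throws ProcessingException;

    void submitForProcessing(DocumentContainer target) throws ProcessingException;
}

class RecrawlSketch {
    // Hypothetical helper: the timestamps stand for whatever change signal the
    // plug-in has available (for example, a last-modified time it stored earlier).
    // Returns true if the document was submitted, false if it was marked unchanged.
    static boolean handle(CrawlingThreadService service,
                          DocumentMetadata meta,
                          DocumentContainer target,
                          long storedLastModified,
                          long currentLastModified) throws ProcessingException {
        if (currentLastModified <= storedLastModified) {
            // Nothing changed: drop the entry from the URL queue without re-indexing.
            service.markStatusNotChanged(meta);
            return false;
        }
        // Content changed: hand the container over for processing.
        service.submitForProcessing(target);
        return true;
    }
}
```

Either call removes the entry from the queue, so the helper invokes exactly one of them per URL.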

checkDocumentExcluded

int checkDocumentExcluded(DocumentMetadata meta)

Checks whether the document should be crawled. The check stops at the first rule that excludes the document and returns only the status code for that rule.
To avoid the overhead of processing excluded documents, call this method before enqueuing or submitting the document. If the document size or MIME type is not available, the rules based on them are not applied. The check order is: boundary, MIME type, and size.

The internal exclusion check always runs when a document is submitted.

Parameters: meta - the document metadata
Returns: CrawlingThreadService.DOC_INCLUDED, CrawlingThreadService.DOC_EXCLUDED_BY_URL_BOUNDARY, CrawlingThreadService.DOC_EXCLUDED_BY_MIMETYPE, or CrawlingThreadService.DOC_EXCLUDED_BY_SIZE
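
A plug-in might use the return value along these lines to skip excluded documents before submitting them. This is a minimal sketch under stated assumptions: the types at the top are local stand-ins that mirror only the signatures on this page, the numeric constant values are placeholders (the real ones are listed under Constant Field Values), and ExclusionCheckSketch with its submitIfIncluded method is a hypothetical helper.

```java
// Local stand-ins (assumption) so this sketch compiles without the SES SDK.
// Remove them when compiling against the real oracle.search.sdk.crawler package.
interface DocumentMetadata {}

interface DocumentContainer {}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

interface CrawlingThreadService {
    // Placeholder values (assumption); see "Constant Field Values" for the real ones.
    int DOC_INCLUDED = 0;
    int DOC_EXCLUDED_BY_URL_BOUNDARY = 1;
    int DOC_EXCLUDED_BY_MIMETYPE = 2;
    int DOC_EXCLUDED_BY_SIZE = 3;

    int checkDocumentExcluded(DocumentMetadata meta);

    void submitForProcessing(DocumentContainer target) throws ProcessingException;
}

class ExclusionCheckSketch {
    // Hypothetical helper: submits the container only when no rule excludes it
    // and returns a short label describing the outcome.
    static String submitIfIncluded(CrawlingThreadService service,
                                   DocumentMetadata meta,
                                   DocumentContainer target) throws ProcessingException {
        int status = service.checkDocumentExcluded(meta);
        if (status == CrawlingThreadService.DOC_INCLUDED) {
            service.submitForProcessing(target);
            return "submitted";
        }
        // Only the first failing rule is reported (boundary, then MIME type, then size).
        if (status == CrawlingThreadService.DOC_EXCLUDED_BY_URL_BOUNDARY) {
            return "skipped: URL boundary";
        }
        if (status == CrawlingThreadService.DOC_EXCLUDED_BY_MIMETYPE) {
            return "skipped: MIME type";
        }
        if (status == CrawlingThreadService.DOC_EXCLUDED_BY_SIZE) {
            return "skipped: size";
        }
        return "skipped: unrecognized status " + status;
    }
}
```

Checking first saves the cost of fetching and building content for a document that the submit-time check would reject anyway.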

inferMimeType

String inferMimeType(String url)

Infers the MIME type from the URL suffix.

Parameters: url - the document URL
Returns: the MIME type, such as text/html. Returns null if no application is associated with the suffix or the URL has no suffix.
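
Because inferMimeType can return null, callers need to handle that case. One possible pattern is sketched below. The interface is a local stand-in that mirrors only the signature above, and MimeTypeSketch, mimeTypeOrDefault, and the fallback parameter are hypothetical names introduced for illustration.

```java
// Local stand-in (assumption) so this sketch compiles without the SES SDK.
// Remove it when compiling against the real oracle.search.sdk.crawler package.
interface CrawlingThreadService {
    String inferMimeType(String url);
}

class MimeTypeSketch {
    // Hypothetical helper: returns the inferred MIME type, or the caller's
    // fallback when the service cannot infer one from the URL suffix.
    static String mimeTypeOrDefault(CrawlingThreadService service,
                                    String url,
                                    String fallback) {
        String inferred = service.inferMimeType(url);
        return inferred != null ? inferred : fallback;
    }
}
```

Whether a fallback is appropriate depends on the source being crawled. A plug-in could equally leave the MIME type unset, in which case the MIME-type exclusion rule is not applied.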