com.plumtree.server
Interface IPTCrawler

All Superinterfaces:
IPTLocalizable, IPTObject, IPTServerContext, IPTStorable, IPTUnknown

public interface IPTCrawler
extends IPTObject

Version:
$Revision$
Author:
OlegS

Method Summary
 int ClearCardRejectsMemory(boolean bExecute)
          Produces a tally of previously rejected cards and optionally purges them from the crawler's history.
 int ClearDeletedCardsMemory(boolean bExecute, boolean bScopeDataSource)
          Produces a tally of previously deleted cards and optionally purges them from the crawler's history.
 IPTAccessList GetCardACL()
          Returns the Access Control List assigned to cards created by this crawler.
 void GetCardExpirationDelay(java.lang.Object plTimeUnits, java.lang.Object plNumber)
          Deprecated.  
 int GetCardExpirationDelayUnits()
          Returns the time units expressed as a PT_SCHEDULETYPES used to compute the expiration rate for cards brought in by this crawler.
 int GetCardExpirationDelayValue()
          Returns the number of units used to compute the expiration delay.
 void GetCardMissingDocumentDeletionDelay(java.lang.Object plTimeUnits, java.lang.Object plNumber)
          Deprecated.  
 int GetCardMissingDocumentDeletionDelayUnits()
          Returns the time units expressed as a PT_SCHEDULETYPES used to compute the missing document deletion delay for cards brought in by this crawler.
 int GetCardMissingDocumentDeletionDelayValue()
          Returns the number of units used to compute the missing document deletion delay.
 void GetCardRefreshRate(java.lang.Object plTimeUnits, java.lang.Object plNumber)
          Deprecated.  
 int GetCardRefreshRateUnits()
          Returns the time units expressed as a PT_SCHEDULETYPES used to compute the card refresh rate for cards brought in by this crawler.
 int GetCardRefreshRateValue()
          Returns the number of units used to compute the next card refresh rate.
 java.lang.Object[][] GetCardRefreshSetting(int lRefreshSetting)
          For any requested PT_CARDREFRESHSETTING, returns a 2D array of dimensions [0][2], the first two elements of which contain the number of units and the unit size (PT_SCHEDULETYPES), respectively.
 boolean GetCheckLinkOnly()
          Determines whether the crawler has been configured to mark its cards for link validation only.
 java.lang.String GetContentLanguage()
          Returns the two character language designation determining how text retrieved from source documents will be indexed.
 java.lang.String GetCrawlDescription()
          Gets a human readable description of the intended crawl activity.
 com.plumtree.openfoundation.util.IXPPropertyBag GetCrawlerRuntimeConfiguration()
          Returns the property bag containing all the crawler runtime configuration values.
 java.lang.String GetCrawlerTag()
          Gets the current value each of the cards brought in by this crawler will inherit for their Crawler Tag property.
 int GetDataSourceID()
          Returns the ID identifying the single data source on which this crawler depends.
 IPTDocumentTypeMap GetDocumentTypeMap()
          Retrieves a reference to the IPTDocumentTypeMap object used to determine which Document Type should apply to incoming documents based on a file extension or MIME type.
 int GetSettings()
          Returns a value representing an OR'd list of PT_CRAWLER_SETTINGS.
 com.plumtree.openfoundation.util.IXPPropertyBag GetStartLocation()
          Retrieves the start location or root node designation for the crawl.
 IPTTaxonomist GetTaxonomist()
          Returns a reference to the crawler's taxonomist component containing a combination of settings for card classification behavior.
 void Initialize(int lDataSourceID)
          Initializes internal crawler resources. From the Data Source, the Crawler gets the supported data formats from the Data Source Provider Registry, then goes to the Top Level Document Type Map and picks out the sections it needs (or that are relevant to it).
 void SetCardExpirationDelay(int lTimeUnits, int lNumber)
          Sets the expiration delay for cards created by this crawler.
 void SetCardMissingDocumentDeletionDelay(int lTimeUnits, int lNumber)
          Sets the missing document deletion delay for cards created by this crawler.
 void SetCardRefreshRate(int lTimeUnits, int lNumber)
          Sets the card refresh interval for cards created by this crawler.
 void SetCheckLinkOnly(boolean Value)
          Configures the crawler to mark its cards for link validation only or not.
 void SetContentLanguage(java.lang.String Value)
          Sets the two character language designation determining how text retrieved from source documents will be indexed.
 void SetCrawlerRuntimeConfiguration(com.plumtree.openfoundation.util.IXPPropertyBag Value)
          Sets the property bag representing the runtime configuration parameters for this crawler.
 void SetCrawlerTag(java.lang.String Value)
          Sets the value that each of the cards brought in by this crawler will inherit for their Crawler Tag property.
 void SetSettings(int Value)
          Sets a value representing an OR'd list of PT_CRAWLER_SETTINGS.
 void SetStartLocation(com.plumtree.openfoundation.util.IXPPropertyBag Value)
          Sets the start location or root node designation for the crawl.
 void SetStartLocation(java.lang.String Value)
          Sets the start location or root node designation for the crawl.
 
Methods inherited from interface com.plumtree.server.IPTObject
GetAdminFolderID, GetClassID, GetCreated, GetImageUUID, GetLastModified, GetObjectProperties, SetAdminFolderID, SetImageUUID, SetLastModified
 
Methods inherited from interface com.plumtree.server.IPTLocalizable
GetDescription, GetIsLocalized, GetLocalizedDescription, GetLocalizedDescriptions, GetLocalizedName, GetLocalizedNames, GetName, GetPrimaryLang, GetSupportsLocalization, SetDescription, SetIsLocalized, SetLocalizedDescriptions, SetLocalizedNames, SetName, SetPrimaryLang
 
Methods inherited from interface com.plumtree.server.IPTUnknown
GetInterfaces
 
Methods inherited from interface com.plumtree.server.IPTServerContext
GetAccessLevel, GetACL, GetLastModifiedBy, GetLockState, GetObjectID, GetOwnerID, GetServerContextSettings, GetSession, GetSettings, LockObject, SetLastModifiedBy, SetObjectID, SetOwnerID, SetServerContextSettings, SetSettings, UnlockObject
 
Methods inherited from interface com.plumtree.server.IPTStorable
Store
 

Method Detail

Initialize

void Initialize(int lDataSourceID)
Initializes internal crawler resources. From the Data Source, the Crawler gets the supported data formats from the Data Source Provider Registry, then goes to the Top Level Document Type Map and picks out the sections it needs (or that are relevant to it). This initializes the Crawler's Document Type Map (which can then be edited by the user). The Data Source also lets the Crawler determine which DataSourceCrawlProvider it uses.

Parameters:
lDataSourceID - The ID identifying the single data source on which this crawler depends.

GetDataSourceID

int GetDataSourceID()
Returns the ID identifying the single data source on which this crawler depends.

Returns:
an int representing the crawler's single data source.

GetDocumentTypeMap

IPTDocumentTypeMap GetDocumentTypeMap()
Retrieves a reference to the IPTDocumentTypeMap object used to determine which Document Type should apply to incoming documents based on a file extension or MIME type.

Returns:
an IPTDocumentTypeMap object for mapping Document Types
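
The kind of lookup described above can be illustrated with a minimal sketch. It does not use the real IPTDocumentTypeMap API, whose methods are not listed on this page: the class DocTypeMapSketch, the numeric Document Type IDs, and the precedence (file extension first, then MIME type, then a default) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a document type map; not the Plumtree API.
class DocTypeMapSketch {
    static final int DEFAULT_DOC_TYPE = 0; // assumed fallback Document Type ID

    private final Map<String, Integer> byExtension = new HashMap<>();
    private final Map<String, Integer> byMimeType = new HashMap<>();

    void mapExtension(String ext, int docTypeId) {
        byExtension.put(ext.toLowerCase(Locale.ROOT), docTypeId);
    }

    void mapMimeType(String mime, int docTypeId) {
        byMimeType.put(mime.toLowerCase(Locale.ROOT), docTypeId);
    }

    // Resolves by file extension first, then by MIME type, then falls back to the default.
    int resolve(String fileName, String mimeType) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            Integer id = byExtension.get(ext);
            if (id != null) {
                return id;
            }
        }
        if (mimeType != null) {
            Integer id = byMimeType.get(mimeType.toLowerCase(Locale.ROOT));
            if (id != null) {
                return id;
            }
        }
        return DEFAULT_DOC_TYPE;
    }

    public static void main(String[] args) {
        DocTypeMapSketch map = new DocTypeMapSketch();
        map.mapExtension("pdf", 101);      // illustrative IDs only
        map.mapMimeType("text/html", 102);
        System.out.println(map.resolve("report.PDF", null));
        System.out.println(map.resolve("index", "text/html"));
        System.out.println(map.resolve("notes.xyz", "application/x-unknown"));
    }
}
```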

GetTaxonomist

IPTTaxonomist GetTaxonomist()
Returns a reference to the crawler's taxonomist component containing a combination of settings for card classification behavior.

Returns:
an IPTTaxonomist object with classification settings.

GetSettings

int GetSettings()
Returns a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards.

Returns:
An int value resulting from an OR'd product of PT_CRAWLER_SETTINGS.

SetSettings

void SetSettings(int Value)
Sets a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards.

Parameters:
Value - An int value resulting from an OR'd product of PT_CRAWLER_SETTINGS.
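
Because the settings value is a set of OR'd flags, individual behaviors are normally toggled with bitwise operations. The sketch below shows that pattern only. The flag names and bit values are hypothetical, since the actual PT_CRAWLER_SETTINGS constants are not listed on this page.

```java
// Hypothetical flag values; the real PT_CRAWLER_SETTINGS constants are not listed here.
class CrawlerSettingsSketch {
    static final int MIRROR = 1;        // assumed bit
    static final int REFRESH = 1 << 1;  // assumed bit
    static final int REIMPORT = 1 << 2; // assumed bit

    // Turns a flag on without disturbing the other bits.
    static int enable(int settings, int flag) {
        return settings | flag;
    }

    // Turns a flag off without disturbing the other bits.
    static int disable(int settings, int flag) {
        return settings & ~flag;
    }

    // Reports whether every bit of the flag is present.
    static boolean isSet(int settings, int flag) {
        return (settings & flag) == flag;
    }

    public static void main(String[] args) {
        int settings = enable(enable(0, MIRROR), REFRESH);
        System.out.println(isSet(settings, MIRROR));
        System.out.println(isSet(settings, REIMPORT));
        settings = disable(settings, MIRROR);
        System.out.println(isSet(settings, MIRROR));
    }
}
```

With a real crawler, the same read-modify-write pattern would presumably start from the value returned by GetSettings() and end with a call to SetSettings(int).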

GetStartLocation

com.plumtree.openfoundation.util.IXPPropertyBag GetStartLocation()
Retrieves the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format.

Returns:
a property bag specifying this crawl's start location.

SetStartLocation

void SetStartLocation(java.lang.String Value)
Sets the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format. The String provided is expected to be a streamed version of the standard property bag format.

Parameters:
Value - a String specifying this crawl's start location.

SetStartLocation

void SetStartLocation(com.plumtree.openfoundation.util.IXPPropertyBag Value)
Sets the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format.

Parameters:
Value - a property bag specifying this crawl's start location.

GetCrawlDescription

java.lang.String GetCrawlDescription()
Gets a human readable description of the intended crawl activity. This is distinct from the crawler object's description (see the inherited GetDescription) and does not support localization.

Returns:
A readable String describing crawl behavior.

ClearCardRejectsMemory

int ClearCardRejectsMemory(boolean bExecute)
Produces a tally of previously rejected cards and optionally purges them from the crawler's history.

Parameters:
bExecute - If passed as true, purges rejected card history.
Returns:
The sum of all cards rejected by this crawler at the time of the call.

ClearDeletedCardsMemory

int ClearDeletedCardsMemory(boolean bExecute,
                            boolean bScopeDataSource)
Produces a tally of previously deleted cards and optionally purges them from the crawler's history.

Parameters:
bExecute - If passed as true, purges deleted card history.
Returns:
The sum of all cards brought in by this crawler and subsequently deleted.
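
Since both Clear methods return a tally whether or not they purge, a caller can do a dry run first and purge only when warranted. The sketch below illustrates that pattern under assumptions: RejectHistory is a hypothetical one-method stand-in for the crawler, the threshold policy is invented for illustration, and an in-memory fake replaces a live portal server.

```java
// Minimal stand-in exposing only the method used here; not the full IPTCrawler.
interface RejectHistory {
    int ClearCardRejectsMemory(boolean bExecute);
}

class PurgeSketch {
    // Purges only when the tally exceeds a caller-chosen threshold; returns the tally.
    static int purgeIfAbove(RejectHistory crawler, int threshold) {
        int tally = crawler.ClearCardRejectsMemory(false); // dry run: count only
        if (tally > threshold) {
            crawler.ClearCardRejectsMemory(true); // second call performs the purge
        }
        return tally;
    }

    public static void main(String[] args) {
        // In-memory fake so the sketch runs without a portal server.
        int[] rejected = {5};
        RejectHistory fake = bExecute -> {
            int count = rejected[0];
            if (bExecute) {
                rejected[0] = 0;
            }
            return count;
        };
        System.out.println(purgeIfAbove(fake, 3));
        System.out.println(fake.ClearCardRejectsMemory(false));
    }
}
```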

GetCardACL

IPTAccessList GetCardACL()
Returns the Access Control List assigned to cards created by this crawler.

Returns:
The Access Control List this crawler uses for its cards.

GetCardRefreshRate

void GetCardRefreshRate(java.lang.Object plTimeUnits,
                        java.lang.Object plNumber)
Deprecated. 

Retrieves the card refresh interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
plTimeUnits - An Integer to hold the PT_SCHEDULETYPES value.
plNumber - An Integer to hold the number of time units applied.

SetCardRefreshRate

void SetCardRefreshRate(int lTimeUnits,
                        int lNumber)
Sets the card refresh interval for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The card refresh rate is the amount of time that should elapse before the Card Refresh Agent should attempt to refresh the card's properties against any changes to the source document. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
lTimeUnits - An int indicating the PT_SCHEDULETYPES value.
lNumber - An int indicating the number of time units applied.
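
The unit-and-multiplier scheme used by the refresh rate, missing document deletion delay, and expiration delay amounts to multiplying a unit time span by a count. A minimal sketch of that arithmetic follows. The unit codes (UNIT_HOURS, UNIT_DAYS, UNIT_WEEKS) are hypothetical, because the actual PT_SCHEDULETYPES values are not listed on this page.

```java
import java.time.Duration;

class RefreshIntervalSketch {
    // Hypothetical unit codes; the real PT_SCHEDULETYPES values are not given here.
    static final int UNIT_HOURS = 1;
    static final int UNIT_DAYS = 2;
    static final int UNIT_WEEKS = 3;

    // Converts a (unit code, number) pair into an elapsed-time interval.
    static Duration interval(int timeUnits, int number) {
        switch (timeUnits) {
            case UNIT_HOURS:
                return Duration.ofHours(number);
            case UNIT_DAYS:
                return Duration.ofDays(number);
            case UNIT_WEEKS:
                return Duration.ofDays(7L * number);
            default:
                throw new IllegalArgumentException("Unknown unit code: " + timeUnits);
        }
    }

    public static void main(String[] args) {
        // Two weeks expressed in days and six hours expressed in minutes.
        System.out.println(interval(UNIT_WEEKS, 2).toDays());
        System.out.println(interval(UNIT_HOURS, 6).toMinutes());
    }
}
```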

GetCardMissingDocumentDeletionDelay

void GetCardMissingDocumentDeletionDelay(java.lang.Object plTimeUnits,
                                         java.lang.Object plNumber)
Deprecated. 

Retrieves the missing document deletion delay interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. Missing document deletion delay refers to the amount of time that should elapse after a source document cannot be found before its card is removed by the Card Refresh Agent. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
plTimeUnits - An Integer to hold the PT_SCHEDULETYPES value.
plNumber - An Integer to hold the number of time units applied.

SetCardMissingDocumentDeletionDelay

void SetCardMissingDocumentDeletionDelay(int lTimeUnits,
                                         int lNumber)
Sets the missing document deletion delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The missing document deletion delay is the amount of time that should elapse after a card's source document cannot be retrieved before the Card Refresh Agent attempts to delete the card. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
lTimeUnits - An int indicating the PT_SCHEDULETYPES value.
lNumber - An int indicating the number of time units applied.

GetCardExpirationDelay

void GetCardExpirationDelay(java.lang.Object plTimeUnits,
                            java.lang.Object plNumber)
Deprecated. 

Retrieves the expiration interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The expiration delay refers to the amount of time that should elapse before a card is removed from the portal directory. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
plTimeUnits - An Integer to hold the PT_SCHEDULETYPES value.
plNumber - An Integer to hold the number of time units applied.

SetCardExpirationDelay

void SetCardExpirationDelay(int lTimeUnits,
                            int lNumber)
Sets the expiration delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The expiration delay rate is the amount of time that should elapse before the Card Refresh Agent should expire or remove the card. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.

Parameters:
lTimeUnits - An int indicating the PT_SCHEDULETYPES value.
lNumber - An int indicating the number of time units applied.

GetCheckLinkOnly

boolean GetCheckLinkOnly()
Determines whether the crawler has been configured to mark its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties.

Returns:
a boolean, if true indicates link validation only.

SetCheckLinkOnly

void SetCheckLinkOnly(boolean Value)
Configures whether the crawler marks its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties.

Parameters:
Value - a boolean, if true sets link validation only.

GetCardRefreshSetting

java.lang.Object[][] GetCardRefreshSetting(int lRefreshSetting)
For any requested PT_CARDREFRESHSETTING, returns a 2D array of dimensions [0][2], the first two elements of which contain the number of units and the unit size (PT_SCHEDULETYPES), respectively.

Parameters:
lRefreshSetting - a PT_CARDREFRESHSETTING
Returns:
a 2D array containing the quantity and time units of the interval
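
Unpacking such a return value could look like the sketch below. It rests on assumptions the description does not fully settle: that the two values sit in the first row of the array, in the order number of units then unit type, and that each element is a numeric wrapper such as Integer. The class and helper names are hypothetical.

```java
class RefreshSettingSketch {
    // Returns {number, units} from the first row, validating the shape defensively.
    static int[] unpack(Object[][] setting) {
        if (setting == null || setting.length == 0
                || setting[0] == null || setting[0].length < 2) {
            throw new IllegalArgumentException("Expected at least one row with two elements");
        }
        int number = ((Number) setting[0][0]).intValue(); // assumed: quantity comes first
        int units = ((Number) setting[0][1]).intValue();  // assumed: unit type comes second
        return new int[] {number, units};
    }

    public static void main(String[] args) {
        // Fake return value standing in for a call to GetCardRefreshSetting.
        Object[][] fake = { { Integer.valueOf(2), Integer.valueOf(3) } };
        int[] pair = unpack(fake);
        System.out.println(pair[0] + " unit(s) of schedule type " + pair[1]);
    }
}
```

Validating the shape before casting keeps a malformed or empty array from surfacing as an ArrayIndexOutOfBoundsException deep inside calling code.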

GetCardRefreshRateUnits

int GetCardRefreshRateUnits()
Returns the time units expressed as a PT_SCHEDULETYPES used to compute the card refresh rate for cards brought in by this crawler.

Returns:
an int as a PT_SCHEDULETYPES

GetCardRefreshRateValue

int GetCardRefreshRateValue()
Returns the number of units used to compute the next card refresh rate.

Returns:
an int for the number of refresh units.

GetCardMissingDocumentDeletionDelayUnits

int GetCardMissingDocumentDeletionDelayUnits()
Returns the time units expressed as a PT_SCHEDULETYPES used to compute the missing document deletion delay for cards brought in by this crawler.

Returns:
an int as a PT_SCHEDULETYPES

GetCardMissingDocumentDeletionDelayValue

int GetCardMissingDocumentDeletionDelayValue()
Returns the number of units used to compute the missing document deletion delay.

Returns:
an int for the number of deletion delay units.

GetCardExpirationDelayUnits

int GetCardExpirationDelayUnits()
Returns the time units expressed as a PT_SCHEDULETYPES used to compute the expiration rate for cards brought in by this crawler.

Returns:
an int as a PT_SCHEDULETYPES

GetCardExpirationDelayValue

int GetCardExpirationDelayValue()
Returns the number of units used to compute the expiration delay.

Returns:
an int for the number of expiration units.

GetContentLanguage

java.lang.String GetContentLanguage()
Returns the two character language designation determining how text retrieved from source documents will be indexed.

Returns:
a String containing a two letter language code

SetContentLanguage

void SetContentLanguage(java.lang.String Value)
Sets the two character language designation determining how text retrieved from source documents will be indexed.

Parameters:
Value - a String containing a two letter language code

GetCrawlerTag

java.lang.String GetCrawlerTag()
Gets the current value each of the cards brought in by this crawler will inherit for their Crawler Tag property. The crawler tag property allows cards from a particular crawl to be marked with a user supplied string providing a searchable parameter to identify related cards.

Returns:
a String containing the user supplied crawler tag.

SetCrawlerTag

void SetCrawlerTag(java.lang.String Value)
Sets the value that each of the cards brought in by this crawler will inherit for their Crawler Tag property. The crawler tag property allows cards from a particular crawl to be marked with a user supplied string providing a searchable parameter to identify related cards.

Parameters:
Value - a String containing the user supplied crawler tag.

SetCrawlerRuntimeConfiguration

void SetCrawlerRuntimeConfiguration(com.plumtree.openfoundation.util.IXPPropertyBag Value)
Sets the property bag representing the runtime configuration parameters for this crawler. The runtime configuration includes settings such as the number of threads the crawler uses to fetch URLs and the number of threads used to index cards. Since the crawling algorithm and design are constantly evolving, a generic bag is an appropriate place to store the configuration values. This way, new tuning knobs for the crawl algorithm can be added with minimal modifications to the crawler model.

Parameters:
Value - the property bag containing all the crawler runtime configuration values.
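
The appeal of a generic bag is that readers can ask for a knob by name and fall back to a default when it is absent, so new knobs need no interface change. The sketch below illustrates that idea with a plain java.util.Map as a stand-in. It does not use IXPPropertyBag, whose methods are not listed on this page, and the key names fetchThreads and indexThreads are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Map-based stand-in for a property bag; IXPPropertyBag's own methods are not shown here.
class RuntimeConfigSketch {
    static final String FETCH_THREADS = "fetchThreads"; // hypothetical key
    static final String INDEX_THREADS = "indexThreads"; // hypothetical key

    // Reads an int setting, falling back to a default when the key is absent or not numeric.
    static int getInt(Map<String, Object> bag, String key, int defaultValue) {
        Object value = bag.get(key);
        return (value instanceof Number) ? ((Number) value).intValue() : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> bag = new HashMap<>();
        bag.put(FETCH_THREADS, 8); // explicitly configured knob
        System.out.println(getInt(bag, FETCH_THREADS, 4));
        System.out.println(getInt(bag, INDEX_THREADS, 2)); // absent, so the default applies
    }
}
```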

GetCrawlerRuntimeConfiguration

com.plumtree.openfoundation.util.IXPPropertyBag GetCrawlerRuntimeConfiguration()
Returns the property bag representing the runtime configuration parameters for this crawler.

Returns:
a property bag containing all the crawler runtime configuration values
See Also:
SetCrawlerRuntimeConfiguration(IXPPropertyBag)


Copyright 2008 Plumtree Software Inc. All Rights Reserved.