Aqualogic Interaction API  
 

IPTCrawler Members

IPTCrawler overview

Public Instance Methods

ClearCardRejectsMemory Produces tally and optionally purges crawler history of previously rejected cards.
ClearDeletedCardsMemory Produces tally and optionally purges crawler history of previously deleted cards.
GetCardACL Returns the Access Control List assigned to cards created by this crawler.
GetCardExpirationDelay Retrieves the expiration interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The expiration delay refers to the amount of time that should elapse before a card is removed from the portal directory. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
GetCardExpirationDelayUnits Returns the time units expressed as a PT_SCHEDULETYPES used to compute the expiration rate for cards brought in by this crawler.
GetCardExpirationDelayValue Returns the number of units used to compute the expiration delay.
GetCardMissingDocumentDeletionDelay Retrieves the missing document deletion delay interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. Missing document deletion delay refers to the amount of time that should elapse after a source document cannot be found before its card is removed by the Card Refresh Agent. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
GetCardMissingDocumentDeletionDelayUnits Returns the time units expressed as a PT_SCHEDULETYPES used to compute the missing document deletion delay for cards brought in by this crawler.
GetCardMissingDocumentDeletionDelayValue Returns the number of units used to compute the missing document deletion delay.
GetCardRefreshRate Retrieves the card refresh interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
GetCardRefreshRateUnits Returns the time units expressed as a PT_SCHEDULETYPES used to compute the card refresh rate for cards brought in by this crawler.
GetCardRefreshRateValue Returns the number of units used to compute the next card refresh rate.
GetCardRefreshSetting For any requested PT_CARDREFRESHSETTING, returns a 2D array of dimensions [0][2], the first two elements of which contain the number of units and the unit size (PT_SCHEDULETYPES), respectively.
GetCheckLinkOnly Determines whether the crawler has been configured to mark its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties.
GetContentLanguage Returns the two-character language designation determining how text retrieved from source documents will be indexed.
GetCrawlDescription Gets a human-readable description of the intended crawl activity. This is distinct from the crawl description and does not support localization.
GetCrawlerRuntimeConfiguration Returns the property bag representing the runtime configuration parameters for this crawler.
GetCrawlerTag Gets the current value that each of the cards brought in by this crawler will inherit for its Crawler Tag property. The crawler tag property allows cards from a particular crawl to be marked with a user-supplied string providing a searchable parameter to identify related cards.
GetDataSourceID Returns the ID identifying the single data source on which this crawler depends.
GetDocumentTypeMap Retrieves a reference to the IPTDocumentTypeMap object used to determine which Document Type should apply to incoming documents based on a file extension or MIME type.
GetSettings Returns a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards.
GetStartLocation Retrieves the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format.
GetTaxonomist Returns a reference to the crawler's taxonomist component containing a combination of settings for card classification behavior.
Initialize Initializes internal crawler resources. From the Data Source, the Crawler can get the supported data formats from the Data Source Provider Registry and then go to the Top Level Document Type Map and pick out the sections that it needs or that are relevant to the Crawler. This initializes the Crawler's Document Type Map (which can now be edited by the user). Also from the Data Source, the Crawler is able to determine what its DataSourceCrawlProvider is.
SetCardExpirationDelay Sets the expiration delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The expiration delay rate is the amount of time that should elapse before the Card Refresh Agent should expire or remove the card. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
SetCardMissingDocumentDeletionDelay Sets the missing document deletion delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The missing document deletion delay is the amount of time that should elapse before the Card Refresh Agent should attempt to delete the card after its source document cannot be retrieved. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
SetCardRefreshRate Sets the card refresh interval for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The card refresh rate is the amount of time that should elapse before the Card Refresh Agent should attempt to refresh the card's properties against any changes to the source document. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span.
SetCheckLinkOnly Configures whether the crawler marks its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties.
SetContentLanguage Sets the two-character language designation determining how text retrieved from source documents will be indexed.
SetCrawlerRuntimeConfiguration Sets the property bag representing the runtime configuration parameters for this crawler. The runtime configuration includes settings such as the number of threads that a crawler uses to fetch URLs, the number of threads used to index cards, etc. Since the crawling algorithm and design are constantly evolving, it is appropriate to have a generic bag to store the config values. This way new crawl algorithm tuning knobs can be added with minimal modifications to the crawler model.
SetCrawlerTag Sets the value that each of the cards brought in by this crawler will inherit for its Crawler Tag property. The crawler tag property allows cards from a particular crawl to be marked with a user-supplied string providing a searchable parameter to identify related cards.
SetSettings Sets a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards.
SetStartLocation Overloaded. Sets the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format. The String provided is expected to be a streamed version of the standard property bag format.
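
Example

Several of the methods above describe an interval as a unit (one of the PT_SCHEDULETYPES values) plus a multiplier, and GetSettings/SetSettings describe a value built by OR-ing PT_CRAWLER_SETTINGS flags together. The following is a minimal, self-contained sketch of those two ideas only. It does not call the real IPTCrawler interface: the class name, the UNIT_* and SETTING_* constants, and their numeric values are hypothetical stand-ins chosen for illustration, since this page does not list the actual enumeration values or method signatures.

```java
import java.time.Duration;

public class CrawlerDelaySketch {
    // Hypothetical stand-ins for PT_SCHEDULETYPES. The real numeric
    // values are not given on this page; these are assumptions.
    public static final int UNIT_MINUTES = 1;
    public static final int UNIT_HOURS = 2;
    public static final int UNIT_DAYS = 3;
    public static final int UNIT_WEEKS = 4;

    // Hypothetical stand-ins for PT_CRAWLER_SETTINGS bit flags (assumed values).
    public static final int SETTING_MIRROR = 1 << 0;
    public static final int SETTING_REFRESH = 1 << 1;
    public static final int SETTING_REIMPORT = 1 << 2;

    // Combines a unit and a multiplier into a single interval, mirroring
    // the "units" and "value" pair described for the delay methods.
    public static Duration toDuration(int units, int value) {
        switch (units) {
            case UNIT_MINUTES: return Duration.ofMinutes(value);
            case UNIT_HOURS:   return Duration.ofHours(value);
            case UNIT_DAYS:    return Duration.ofDays(value);
            case UNIT_WEEKS:   return Duration.ofDays(7L * value);
            default:
                throw new IllegalArgumentException("Unknown unit: " + units);
        }
    }

    // Tests whether a single flag is present in an OR'd settings value.
    public static boolean hasSetting(int settings, int flag) {
        return (settings & flag) != 0;
    }

    public static void main(String[] args) {
        // A refresh rate of "3 days" expressed as (units, value).
        Duration refresh = toDuration(UNIT_DAYS, 3);
        System.out.println("Refresh interval in hours: " + refresh.toHours());

        // Settings are combined with bitwise OR and tested with bitwise AND.
        int settings = SETTING_MIRROR | SETTING_REFRESH;
        System.out.println("Mirror enabled: " + hasSetting(settings, SETTING_MIRROR));
        System.out.println("Re-import enabled: " + hasSetting(settings, SETTING_REIMPORT));
    }
}
```

In real code the unit and flag constants would come from the PT_SCHEDULETYPES and PT_CRAWLER_SETTINGS enumerations, and the values would be read from and written to an IPTCrawler instance rather than computed locally.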

See Also

IPTCrawler Interface | com.plumtree.server Namespace