![]() | Produces a tally and optionally purges the crawler history of previously rejected cards. |
![]() | Produces a tally and optionally purges the crawler history of previously deleted cards. |
![]() | Returns the Access Control List assigned to cards created by this crawler. |
![]() | Retrieves the expiration interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The expiration delay refers to the amount of time that should elapse before a card is removed from the portal directory. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Returns the time unit, expressed as a PT_SCHEDULETYPES value, used to compute the expiration delay for cards brought in by this crawler. |
![]() | Returns the number of units used to compute the expiration delay. |
![]() | Retrieves the missing deletion delay interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. Missing document deletion delay refers to the amount of time that should elapse after a source document cannot be found before its card is removed by the Card Refresh Agent. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Returns the time unit, expressed as a PT_SCHEDULETYPES value, used to compute the missing document deletion delay for cards brought in by this crawler. |
![]() | Returns the number of units used to compute the missing document deletion delay. |
![]() | Retrieves the card refresh interval for cards created by this crawler. The first parameter is returned as an Integer carrying a value indicating the units from which the interval is calculated. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Returns the time unit, expressed as a PT_SCHEDULETYPES value, used to compute the card refresh rate for cards brought in by this crawler. |
![]() | Returns the number of units used to compute the card refresh rate. |
![]() | For any requested PT_CARDREFRESHSETTING, returns a 2D array of dimensions [0][2], the first two elements of which contain the number of units and the unit size (PT_SCHEDULETYPES), respectively. |
![]() | Determines whether the crawler has been configured to mark its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties. |
![]() | Returns the two-character language designation determining how text retrieved from source documents will be indexed. |
![]() | Gets a human-readable description of the intended crawl activity. This is distinct from the crawl description and does not support localization. |
![]() | |
![]() | Gets the current value that each card brought in by this crawler will inherit for its Crawler Tag property. The Crawler Tag property allows cards from a particular crawl to be marked with a user-supplied string, providing a searchable parameter to identify related cards. |
![]() | Returns the ID identifying the single data source on which this crawler depends. |
![]() | Retrieves a reference to the IPTDocumentTypeMap object used to determine which Document Type should apply to incoming documents based on a file extension or MIME type. |
![]() | Returns a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards. |
![]() | Retrieves the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format. |
![]() | Returns a reference to the crawler's taxonomist component containing a combination of settings for card classification behavior. |
![]() | Initializes internal crawler resources. From the Data Source, the Crawler can get the supported data formats from the Data Source Provider Registry and then go to the Top Level Document Type Map and pick out the sections that are relevant to the Crawler. This initializes the Crawler's Document Type Map (which can then be edited by the user). Also from the Data Source, the Crawler can determine its DataSourceCrawlProvider. |
![]() | Sets the expiration delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The expiration delay rate is the amount of time that should elapse before the Card Refresh Agent should expire or remove the card. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Sets the missing deletion delay for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The missing deletion delay rate is the amount of time that should elapse before the Card Refresh Agent should attempt to delete the card after its source document cannot be retrieved. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Sets the card refresh interval for cards created by this crawler. The first parameter is passed as an int whose value indicates the units from which the interval is calculated. The card refresh rate is the amount of time that should elapse before the Card Refresh Agent should attempt to refresh the card's properties against any changes to the source document. The unit will correspond to one of the PT_SCHEDULETYPES values. The second parameter is a multiplier to the unit time span. |
![]() | Configures whether the crawler marks its cards for link validation only. If true, this setting indicates that the Card Refresh Agent will only attempt to validate the continued existence of the source document and will avoid refreshing the card's properties. |
![]() | Sets the two-character language designation determining how text retrieved from source documents will be indexed. |
![]() | Sets the property bag representing the runtime configuration parameters for this crawler. The runtime configuration includes such settings as the number of threads that a crawler uses to fetch URLs, the number of threads used to index cards, etc. Since the crawling algorithm and design are constantly evolving, it is appropriate to have a generic bag to store the configuration values. This way, new crawl algorithm tuning knobs can be added with minimal modifications to the crawler model. |
![]() | Sets the value that each card brought in by this crawler will inherit for its Crawler Tag property. The Crawler Tag property allows cards from a particular crawl to be marked with a user-supplied string, providing a searchable parameter to identify related cards. |
![]() | Sets a value representing an OR'd list of PT_CRAWLER_SETTINGS. These settings govern a range of crawling behaviors including such directives as whether to mirror, refresh, or re-import cards. |
![]() | Overloaded. Sets the start location or root node designation for the crawl. This information is specified in a Data Source Provider specific format. The String provided is expected to be a streamed version of the standard property bag format. |
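
Several members above describe an interval as a unit (one of the PT_SCHEDULETYPES values) plus a multiplier, and the crawler settings as an OR'd list of PT_CRAWLER_SETTINGS. The following is a minimal sketch of those two ideas only. It does not call the IPTCrawler API: the enum `ScheduleUnit`, the `FLAG_*` constants, and their values are hypothetical stand-ins, since the actual constants are not listed in this section.

```java
import java.time.Duration;

// Illustration only: none of these names or values come from com.plumtree.server.
final class CrawlerIntervalSketch {

    // Hypothetical stand-in for PT_SCHEDULETYPES: each unit has a fixed time span.
    enum ScheduleUnit {
        MINUTE(Duration.ofMinutes(1)),
        HOUR(Duration.ofHours(1)),
        DAY(Duration.ofDays(1)),
        WEEK(Duration.ofDays(7));

        final Duration span;

        ScheduleUnit(Duration span) {
            this.span = span;
        }
    }

    // Hypothetical stand-ins for PT_CRAWLER_SETTINGS bits (assumed values).
    static final int FLAG_MIRROR = 1;
    static final int FLAG_REFRESH = 1 << 1;
    static final int FLAG_REIMPORT = 1 << 2;

    // The interval is the unit time span multiplied by the unit count.
    static Duration interval(ScheduleUnit unit, int multiplier) {
        if (multiplier < 0) {
            throw new IllegalArgumentException("multiplier must not be negative");
        }
        return unit.span.multipliedBy(multiplier);
    }

    // True when every bit of 'flag' is present in the OR'd settings value.
    static boolean hasFlag(int settings, int flag) {
        return (settings & flag) == flag;
    }

    public static void main(String[] args) {
        // A refresh interval of 3 units of DAY.
        Duration refresh = interval(ScheduleUnit.DAY, 3);
        System.out.println(refresh.toHours()); // prints 72

        // Combine two settings with bitwise OR, then test for membership.
        int settings = FLAG_MIRROR | FLAG_REIMPORT;
        System.out.println(hasFlag(settings, FLAG_MIRROR));  // prints true
        System.out.println(hasFlag(settings, FLAG_REFRESH)); // prints false
    }
}
```

With the real interface, the unit and multiplier would be the two parameters passed to or returned from the interval members above, and the combined integer would be the value passed to or returned from the crawler settings members.
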
IPTCrawler Interface | com.plumtree.server Namespace