Oracle Commerce Guided Search - Crawling in incremental mode

Crawling in incremental mode

Crawling in incremental mode means that CAS processes only content whose metadata, as stored in the crawl history, has changed since the last crawl. Specifically, CAS checks all properties on each record to see whether any have changed. If any property has changed, the CAS Server crawls that content again. This applies when CAS itself calculates the incremental difference. Alternatively, an extension developer using the CAS Extension API may choose to calculate incremental changes in a data source extension.
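The property comparison described above can be illustrated with a minimal Java sketch. This is not CAS code and does not use the CAS Server API or the CAS Extension API: the class name IncrementalDiffSketch, the method changedRecordIds, and the map-based representation of the crawl history are all hypothetical. The sketch also assumes that a record with no entry in the crawl history is treated as changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names are not part of any CAS API.
final class IncrementalDiffSketch {

    /**
     * Returns the IDs of records that need to be crawled again: records whose
     * properties differ in any way from the stored crawl history, plus records
     * that have no history entry at all (an assumption of this sketch).
     */
    static List<String> changedRecordIds(
            Map<String, Map<String, String>> crawlHistory,
            Map<String, Map<String, String>> currentMetadata) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : currentMetadata.entrySet()) {
            Map<String, String> previous = crawlHistory.get(entry.getKey());
            // Map.equals compares every key/value pair, so any changed,
            // added, or removed property marks the record as changed.
            if (!entry.getValue().equals(previous)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> history = Map.of(
                "a.txt", Map.of("size", "10", "modified", "2020-01-01"),
                "b.txt", Map.of("size", "20", "modified", "2020-01-01"));
        Map<String, Map<String, String>> current = Map.of(
                "a.txt", Map.of("size", "10", "modified", "2020-01-01"),
                "b.txt", Map.of("size", "25", "modified", "2020-02-01"));
        // Only b.txt has properties that differ from the history.
        System.out.println(changedRecordIds(history, current));
    }
}
```

The sketch does not handle content that was deleted since the last crawl; a real implementation would also need to detect history entries that no longer have matching content.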

CAS automatically determines which crawling mode is necessary. By default, CAS attempts to crawl in incremental mode. CAS switches to crawling in full mode if the crawl's configuration has unavailableIncrementalSwitchesToFullCrawl set to true and any of the following conditions is true:

A data source has not been crawled before, which means no crawl history exists.
A Record Store instance does not contain at least one record generation. (This applies to cases where the CAS Server is configured to output to a Record Store instance rather than a file on disk.)
Seeds have been removed from the data source configuration (adding seeds does not require crawling in full mode).
The document conversion setting has changed.
Folder filters or file filters have been added, modified, or removed in the data source configuration.
Repository properties have been changed, such as the Gather native properties option for file system data sources.

If unavailableIncrementalSwitchesToFullCrawl is set to false and any of the above conditions is true, the crawl fails and throws an exception.
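The mode selection rules above can be summarized in a small Java sketch. It only models the decision and does not call the CAS Server API: the class CrawlModeSketch, the Mode and Blocker enums, the chooseMode method, and the choice of IllegalStateException are hypothetical. Only the unavailableIncrementalSwitchesToFullCrawl setting name comes from the crawl configuration described here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; these names are not part of any CAS API.
final class CrawlModeSketch {

    enum Mode { INCREMENTAL, FULL }

    // The conditions listed above that make an incremental crawl unavailable.
    enum Blocker {
        NO_CRAWL_HISTORY,
        NO_RECORD_GENERATION,
        SEEDS_REMOVED,
        DOCUMENT_CONVERSION_CHANGED,
        FILTERS_CHANGED,
        REPOSITORY_PROPERTIES_CHANGED
    }

    static Mode chooseMode(Set<Blocker> blockers,
                           boolean unavailableIncrementalSwitchesToFullCrawl) {
        if (blockers.isEmpty()) {
            // Nothing prevents an incremental crawl, so use the default mode.
            return Mode.INCREMENTAL;
        }
        if (unavailableIncrementalSwitchesToFullCrawl) {
            // Incremental is unavailable, but the configuration allows a full crawl.
            return Mode.FULL;
        }
        // Incremental is unavailable and switching is disabled: the crawl fails.
        throw new IllegalStateException("Incremental crawl unavailable: " + blockers);
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(EnumSet.noneOf(Blocker.class), false));
        System.out.println(chooseMode(EnumSet.of(Blocker.SEEDS_REMOVED), true));
        try {
            chooseMode(EnumSet.of(Blocker.FILTERS_CHANGED), false);
        } catch (IllegalStateException e) {
            System.out.println("Crawl failed: " + e.getMessage());
        }
    }
}
```

Note that adding seeds is deliberately not modeled as a blocker, because adding seeds does not require crawling in full mode.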

This switch from incremental to full mode can occur no matter how you run a crawl (using the CAS Console, the CAS Server API, or the CAS Server Command-line Utility).

After you click Start in CAS Console, you can click the link under Acquisition Status to see a status message indicating whether a full or incremental crawl is running. After you crawl a data source using the API, the status message is returned.
