Full and incremental crawling modes

The CAS Server crawls a data source in one of two modes:

Crawling in full mode

Crawling in full mode means that CAS processes all the content in a data source according to the filtering criteria you specify. As part of crawling a data source, CAS creates metadata information and stores it in a crawl history. This history includes the Id of each record and information about all properties on the record.

Crawling in incremental mode

Crawling in incremental mode means that CAS processes only the content whose metadata information, stored in the crawl history, has changed since the last crawl. Specifically, CAS checks all properties on each record to see whether any have changed. If any property has changed, the CAS Server crawls that content again. This behavior applies when CAS itself calculates the incremental difference. Alternatively, an extension developer using the CAS Extension API may choose to calculate incremental changes in a data source extension.
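The property comparison described above can be illustrated with a minimal sketch. This is not the CAS Server's implementation or API. The history layout (record Id mapped to property names and values), the function name records_to_recrawl, and the sample property names are all hypothetical. The sketch also assumes that a record with no entry in the history counts as changed.

```python
from typing import Dict, Mapping, Set

# Hypothetical shape of a crawl history: record Id -> property name -> value.
# Illustration only; this is not the CAS Server's actual storage format.
CrawlHistory = Dict[str, Mapping[str, str]]


def records_to_recrawl(history: CrawlHistory, current: CrawlHistory) -> Set[str]:
    """Return the Ids of records whose properties differ from the stored history."""
    changed = set()
    for record_id, properties in current.items():
        previous = history.get(record_id)
        # Assumption: a record with no history entry is treated as changed.
        if previous is None or dict(previous) != dict(properties):
            changed.add(record_id)
    return changed


history = {
    "doc-1": {"size": "120", "modified": "2011-03-01"},
    "doc-2": {"size": "300", "modified": "2011-03-02"},
}
current = {
    "doc-1": {"size": "120", "modified": "2011-03-01"},  # unchanged
    "doc-2": {"size": "310", "modified": "2011-04-15"},  # properties changed
    "doc-3": {"size": "50", "modified": "2011-04-16"},   # not in history
}

print(sorted(records_to_recrawl(history, current)))  # ['doc-2', 'doc-3']
```

Only doc-2 and doc-3 are selected. doc-1 is skipped because every property matches its history entry.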

CAS automatically determines which crawling mode is necessary. By default, CAS attempts to crawl in incremental mode. CAS switches to crawling in full mode if the crawl's configuration has unavailableIncrementalSwitchesToFullCrawl set to true and any of the following conditions is true:
  • A data source has not been crawled before, which means no crawl history exists.
  • A Record Store instance does not contain at least one record generation. (This applies to cases where the CAS Server is configured to output to a Record Store instance rather than a file on disk.)
  • Seeds have been removed from the data source configuration (adding seeds does not require crawling in full mode).
  • The document conversion setting has changed.
  • Folder filters or file filters have been added, modified, or removed in the data source configuration.
  • Repository properties have been changed, such as the Gather native properties option for file system data sources.

If unavailableIncrementalSwitchesToFullCrawl is set to false and any of the above conditions is true, the crawl fails and throws an exception.
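The decision rules above can be summarized in a short sketch. It models the documented behavior only and does not call any CAS API. The names CrawlMode, IncrementalUnavailableError, and choose_crawl_mode are hypothetical, and the blocking conditions are passed in as plain strings rather than detected from a real crawl configuration.

```python
from enum import Enum
from typing import Iterable


class CrawlMode(Enum):
    INCREMENTAL = "incremental"
    FULL = "full"


class IncrementalUnavailableError(Exception):
    """Hypothetical stand-in for the exception thrown when the crawl fails."""


def choose_crawl_mode(blocking_conditions: Iterable[str],
                      switches_to_full: bool) -> CrawlMode:
    """Pick a mode from the conditions that prevent an incremental crawl.

    blocking_conditions: descriptions of any listed conditions that hold,
        for example "no crawl history exists".
    switches_to_full: the value of unavailableIncrementalSwitchesToFullCrawl.
    """
    reasons = list(blocking_conditions)
    if not reasons:
        # Default: nothing prevents an incremental crawl.
        return CrawlMode.INCREMENTAL
    if switches_to_full:
        # At least one condition holds and switching is allowed.
        return CrawlMode.FULL
    # At least one condition holds and switching is not allowed.
    raise IncrementalUnavailableError(
        "incremental crawl unavailable: " + "; ".join(reasons)
    )


print(choose_crawl_mode([], True).value)                      # incremental
print(choose_crawl_mode(["seeds were removed"], True).value)  # full
```

Calling choose_crawl_mode(["seeds were removed"], False) raises IncrementalUnavailableError, which mirrors the failure case described above.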

This switch from incremental to full mode can occur no matter how you run a crawl (using the CAS Console, the CAS Server API, or the CAS Server Command-line Utility).

After you click Start in CAS Console, you can click the link under Acquisition Status to see a status message indicating whether a full or incremental crawl is running. After you crawl a data source using the API, the status message is returned.

Incremental mode and MDEX compatible output

An incremental crawl processes only data records. It does not process any configuration stored in the IFCR (such as dimensions and properties, precedence rules, and so on), and it does not crawl dimension value records. By contrast, a full crawl processes data records, configuration in the IFCR, and dimension value records.