A crawl configuration specifies settings and processing instructions for a crawl. When you start a crawl, CAS Server executes the instructions in the following order: sourceConfig, textExtractionConfig, manipulatorConfig, and outputConfig. This topic provides additional detail about execution order.
When CAS Server starts a crawl, the following happens:
1. CAS Server crawls files and folders according to the seeds and settings in sourceConfig, and creates an Endeca record for each file and folder crawled.
2. If textExtractionConfig is enabled and contains document conversion settings, CAS Server performs document conversion and stores the converted text as a property on the Endeca record.
3. If one or more manipulatorConfig elements are present, CAS Server passes the record to each manipulator for processing according to its manipulatorConfig settings. Manipulators execute in the order in which they are nested within manipulatorConfigs.
4. CAS Server then writes the record to a Record Store instance (or an output file) according to the settings in outputConfig.
This processing continues until all files and folders are crawled and all records are processed. In this way, Endeca records are propagated through a crawl configuration.
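The per-record flow described in this topic can be illustrated with a small Python sketch. This is only a conceptual model, not the CAS Server API: the names CrawlConfig, run_crawl, add_tag, and upper_tag are hypothetical, each seed is treated as a single item rather than a folder tree to traverse, and an in-memory list stands in for a Record Store instance or output file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# All names below are hypothetical illustrations, not part of CAS Server.
Record = Dict[str, str]
Manipulator = Callable[[Record], Record]


@dataclass
class CrawlConfig:
    # Stands in for sourceConfig (simplified: each seed is one crawled item).
    seeds: List[str]
    # Stands in for textExtractionConfig; None means extraction is disabled.
    extract_text: Optional[Callable[[str], str]] = None
    # Stands in for the manipulatorConfig entries, kept in configured order.
    manipulators: List[Manipulator] = field(default_factory=list)
    # Stands in for the outputConfig target (assumed in-memory for this sketch).
    output: List[Record] = field(default_factory=list)


def run_crawl(config: CrawlConfig) -> List[Record]:
    for path in config.seeds:
        # 1. Source step: create one record per crawled item.
        record: Record = {"path": path}
        # 2. Optional text extraction: store converted text as a property.
        if config.extract_text is not None:
            record["text"] = config.extract_text(path)
        # 3. Manipulators run one after another, in configured order.
        for manipulate in config.manipulators:
            record = manipulate(record)
        # 4. Output step: write the finished record to the target.
        config.output.append(record)
    return config.output


def add_tag(record: Record) -> Record:
    return {**record, "tag": "reviewed"}


def upper_tag(record: Record) -> Record:
    # Depends on add_tag having run first, so ordering matters.
    return {**record, "tag": record["tag"].upper()}


config = CrawlConfig(
    seeds=["/docs/a.txt", "/docs/b.txt"],
    extract_text=lambda path: f"text of {path}",
    manipulators=[add_tag, upper_tag],
)
records = run_crawl(config)
```

Because upper_tag reads the property that add_tag writes, swapping the two manipulators would raise a KeyError in this sketch, which mirrors why the nesting order of manipulators matters.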