The Endeca Information Access Platform supports file system and Web crawlers that gather data from source documents and write out Endeca records based on those documents. You can modify the records as necessary in your pipeline by adding dimension and property values, and then build an Endeca application to access the records. This allows your application users to search and navigate the document contents. The following crawlers are supported:
- The Endeca Web Crawler is appropriate for large-scale crawling of Web documents. It supports crawling HTTP and HTTPS sites and is available as part of the Endeca CAS package.
- The Endeca CAS Server can run both file system and CMS (Content Management System) crawls. It supports crawling Windows and UNIX systems and is available as part of the Endeca CAS package.
- The Endeca Crawler is a lightweight crawler that is configured via a Spider component in Developer Studio. It supports crawling HTTP and HTTPS sites as well as file systems. Note that the Endeca Crawler is deprecated; it is recommended that you use the Endeca Web Crawler or the Endeca CAS Server for your crawling requirements instead.
For more information on the CAS crawlers, see the Endeca Web Crawler Guide or the Endeca CAS Server Guide. For details on the Endeca Crawler, see the Endeca Forge Guide.