The Endeca Information Access Platform supports file system and Web crawlers that gather data from source documents and write out Endeca records based on those documents. You can modify the records as necessary in your pipeline by adding dimension and property values, and then build an Endeca application to access the records. This allows your application users to search and navigate the document contents. The following crawlers are supported:
- The Endeca Web Crawler is appropriate for large-scale crawling of Web documents. It supports crawling HTTP and HTTPS sites and is available as part of the Endeca CAS package.
- The Endeca CAS Server can run both file system and CMS (Content Management System) crawls. It supports crawling Windows and UNIX systems and is available as part of the Endeca CAS package.
- The Endeca Crawler is a lightweight crawler that is configured via a Spider component in Developer Studio. It supports crawling HTTP and HTTPS sites as well as file systems. Note that the Endeca Crawler is deprecated; it is recommended that you use the Endeca Web Crawler or the Endeca CAS Server for your crawling requirements instead.
For more information on the CAS crawlers, see the Endeca Web Crawler Guide or the Endeca CAS Server Guide. For details on the Endeca Crawler, see the Endeca Forge Guide.