Overview of the Endeca Crawler

The Endeca Crawler can crawl file systems as well as HTTP and HTTPS hosts to fetch documents in a variety of formats.

You use a record adapter to read the documents into a crawler pipeline. Once the documents are in the pipeline, Forge processes them and converts them into Endeca records. Based on each document’s content, these records can contain property values, dimension values, and metadata. You can then build an Endeca application that accesses the records, allowing your application users to search and navigate the document contents they contain.

Keep in mind that, on its own, the Endeca IAP can process only HTML and TXT documents. To process over 200 additional document types, install the optional Endeca Document Conversion Module.

Important: The Endeca Crawler is deprecated and will be removed in a future version of the Endeca Information Access Platform. If you are beginning a new project, use the Endeca Web Crawler instead. The Endeca Web Crawler is a component of the Endeca Content Acquisition System.