The Endeca Crawler can crawl file systems as well as HTTP and HTTPS hosts to fetch documents in a variety of formats.
You use a record adapter to read the documents into a crawler pipeline. Once the documents are in the pipeline, Forge processes them and converts them into Endeca records. These records can contain property values, dimension values, and metadata based on each document’s content. You can then build an Endeca application that accesses the records, allowing your application users to search and navigate the document contents the records contain.
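The data flow described above (crawler fetches documents, a record adapter feeds them into the pipeline, Forge turns each one into a record) can be modeled conceptually as follows. This is a minimal Python sketch only, not Endeca code: real crawler pipelines are configured with Endeca's own tooling, and every name here (`FetchedDocument`, `Record`, `record_adapter`, `to_record`, `run_pipeline`, the keyword-to-dimension rules) is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a document fetched by the crawler
# from a file system or an HTTP/HTTPS host.
@dataclass
class FetchedDocument:
    url: str
    content_type: str
    body: str


# Hypothetical stand-in for an Endeca record: property values,
# dimension values, and metadata derived from one document.
@dataclass
class Record:
    properties: dict = field(default_factory=dict)
    dimension_values: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def record_adapter(documents):
    """Yield documents one at a time into the pipeline (models a record adapter)."""
    for doc in documents:
        yield doc


def to_record(doc, dimension_rules):
    """Convert one document into a record (models Forge processing)."""
    record = Record()
    # Property values taken directly from the document.
    record.properties["url"] = doc.url
    record.properties["text"] = doc.body
    # Metadata about the document rather than its content.
    record.metadata["content_type"] = doc.content_type
    # Assign dimension values with a naive keyword match (illustrative only).
    lowered = doc.body.lower()
    for keyword, dim_value in dimension_rules.items():
        if keyword in lowered:
            record.dimension_values.append(dim_value)
    return record


def run_pipeline(documents, dimension_rules):
    """Read documents through the adapter and convert each into a record."""
    return [to_record(d, dimension_rules) for d in record_adapter(documents)]


# Example usage with made-up documents and dimension rules.
docs = [
    FetchedDocument("http://example.com/a.html", "text/html", "Quarterly sales report"),
    FetchedDocument("file:///docs/b.txt", "text/plain", "Engineering notes"),
]
rules = {"sales": "Department > Sales", "engineering": "Department > Engineering"}
records = run_pipeline(docs, rules)
```

Each resulting record carries the three kinds of data mentioned above; the keyword match merely stands in for whatever dimension-mapping logic an actual pipeline would apply.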
Keep in mind that, on its own, the Endeca IAP can process only HTML and plain-text (TXT) documents. To process over 200 document types, install the optional Endeca Document Conversion Module.