This section describes how to configure and run the Endeca Crawler.
Overview of the Endeca Crawler
The Endeca Crawler can crawl file systems as well as HTTP and HTTPS hosts to fetch documents in a variety of formats.
About installing the Endeca Crawler
The components required to set up an Endeca Crawler application are included in Developer Studio, so installing Developer Studio gives you all the software needed to run a crawl.
Source documents and Endeca records
In the context of an Endeca Crawler application, Endeca records represent both the data in the source documents and metadata about the source documents.
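To illustrate how a single record can carry both kinds of information, the following minimal sketch models a record as a set of named properties, each holding one or more values. The class, the helper function, and all property names (`Example.Source.URL` and so on) are hypothetical and chosen only for illustration; they are not the property names the Endeca Crawler actually generates.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRecord:
    # Hypothetical model: each property name maps to a list of values,
    # so a property can hold more than one value.
    properties: dict[str, list[str]] = field(default_factory=dict)

    def add(self, name: str, value: str) -> None:
        # Append the value, creating the property on first use.
        self.properties.setdefault(name, []).append(value)


def make_record(url: str, mime_type: str, text: str) -> CrawlRecord:
    rec = CrawlRecord()
    # Metadata about the source document (hypothetical property names).
    rec.add("Example.Source.URL", url)
    rec.add("Example.Source.MimeType", mime_type)
    # Data extracted from the source document itself.
    rec.add("Example.Document.Text", text)
    return rec


record = make_record("http://example.com/a.html", "text/html", "Hello, world")
print(record.properties)
```

Keeping metadata and extracted content in the same property structure means downstream processing can treat both uniformly.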
Crawling errors
Processing source documents, including retrieving them and extracting their text, can introduce problems. This section lists several common errors and their workarounds, where available.
Endeca Crawler operational details
This section documents operational details of the Endeca Crawler, including how URLs are processed and how Forge generates property names.
The full crawling pipeline
These sections describe how to create and configure a full crawling pipeline using Developer Studio.
About configuring authentication
Forge can be configured for basic or HTTPS authentication, client authentication, or authentication against a Microsoft Exchange server.
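For reference, HTTP basic authentication (RFC 7617) sends credentials as a Base64-encoded `user:password` string in an `Authorization` header. The sketch below uses only the Python standard library to show what that header looks like. It does not reflect Forge's configuration format; the assumption is that Forge builds this header itself once basic authentication is configured.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    # RFC 7617: join the credentials with a colon, encode them as UTF-8,
    # then Base64-encode the result and prefix it with the scheme name.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


# Example credentials taken from RFC 7617.
print(basic_auth_header("Aladdin", "open sesame"))
# prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```

Base64 is an encoding, not encryption, which is why basic credentials are normally sent only over HTTPS.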