Creating an Endeca Crawler

Begin your pipeline by creating an Endeca Crawler that crawls your unstructured documents. Most of the steps for creating a crawler pipeline that includes Stratify are the same as for a typical crawler pipeline. The two differ only in the components that follow the spider component.

To create an Endeca Crawler pipeline:

Create a record adapter to read source documents. For details, see "Creating a record adapter to read documents" in the chapter titled "The Endeca Crawler" in this guide.
Create a record manipulator. For this task and the following substeps, see "Creating a record manipulator" in this guide:
1. Add a RETRIEVE_URL expression.
2. Convert documents to text.
3. (Optional) Identify the language of the documents.
4. (Optional, Recommended) Remove document body properties.
Create a spider component. See "Creating a spider" in the chapter titled "The Endeca Crawler".
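
The order of the steps above can be sketched as a small checklist model. This is only an illustration and does not use any Endeca API: the `Step` class, the `CRAWLER_PIPELINE` list, and the `required_steps` helper are hypothetical names introduced here, and the step labels are paraphrased from the procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of the procedure (hypothetical model, not an Endeca class)."""
    name: str
    optional: bool = False
    substeps: list = field(default_factory=list)


# The crawler pipeline steps, in the order given in the procedure above.
CRAWLER_PIPELINE = [
    Step("record adapter"),
    Step("record manipulator", substeps=[
        Step("RETRIEVE_URL expression"),
        Step("convert documents to text"),
        Step("identify language", optional=True),
        Step("remove document body properties", optional=True),
    ]),
    Step("spider"),
]


def required_steps(steps):
    """Return the names of all non-optional steps, flattened in order."""
    names = []
    for step in steps:
        if step.optional:
            continue
        names.append(step.name)
        names.extend(required_steps(step.substeps))
    return names


if __name__ == "__main__":
    for name in required_steps(CRAWLER_PIPELINE):
        print(name)
```

Running the script lists the mandatory steps in sequence, which makes it clear that the spider component comes last and that the Stratify-specific components are added only after it.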

After creating your Endeca Crawler pipeline, proceed to configure your Stratify Classification Server.