Creating an Endeca Crawler

Begin your pipeline by creating an Endeca Crawler that crawls your unstructured documents. Most of the steps for creating a crawler pipeline that includes Stratify are the same as for a typical crawler pipeline. The difference lies in the components that follow the spider component.

To create an Endeca Crawler pipeline:

  1. Create a record adapter to read source documents. For details, see "Creating a record adapter to read documents" in the chapter titled "The Endeca Crawler" in this guide.
  2. Create a record manipulator. For this step and the substeps below, see "Creating a record manipulator" in this guide:
    1. Add a RETRIEVE_URL expression.
    2. Convert documents to text.
    3. (Optional) Identify the language of the documents.
    4. (Optional, Recommended) Remove document body properties.
  3. Create a spider component. See "Creating a spider" in the chapter titled "The Endeca Crawler".
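The following sketch may help you check a pipeline against the procedure above. It models the three components in the order the procedure creates them, along with the record manipulator's expression steps. It is not Endeca's API or the pipeline file format: the Component class, the validate function, and the step labels other than RETRIEVE_URL (convert_to_text, identify_language, remove_body) are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical model of the crawler pipeline procedure. These names are
# illustrative only; they are not Endeca Developer Studio objects or the
# pipeline file format.

@dataclass
class Component:
    kind: str  # "record_adapter", "record_manipulator", or "spider"
    name: str
    expressions: list = field(default_factory=list)

# Record manipulator steps in the order the procedure lists them.
# Only RETRIEVE_URL is named in the procedure; the other labels are placeholders.
# The boolean marks whether the step is optional.
MANIPULATOR_STEPS = [
    ("RETRIEVE_URL", False),
    ("convert_to_text", False),
    ("identify_language", True),
    ("remove_body", True),  # optional, but recommended
]

# Components in the order the procedure creates them.
REQUIRED_ORDER = ["record_adapter", "record_manipulator", "spider"]


def validate(pipeline):
    """Return a list of problems; an empty list means the pipeline follows the procedure."""
    problems = []
    kinds = [c.kind for c in pipeline]
    if kinds != REQUIRED_ORDER:
        problems.append(f"expected components {REQUIRED_ORDER}, got {kinds}")
        return problems

    exprs = pipeline[1].expressions
    known = [name for name, _ in MANIPULATOR_STEPS]
    unknown = [e for e in exprs if e not in known]
    if unknown:
        problems.append(f"unknown expressions: {unknown}")
        return problems

    # Every non-optional step must be present.
    for name, optional in MANIPULATOR_STEPS:
        if not optional and name not in exprs:
            problems.append(f"missing required expression {name}")

    # Steps that are present must keep the listed order.
    positions = [known.index(e) for e in exprs]
    if positions != sorted(positions):
        problems.append(f"expressions out of order: {exprs}")
    return problems


# Usage: a pipeline that skips the optional language-identification step.
pipeline = [
    Component("record_adapter", "CrawlerAdapter"),
    Component("record_manipulator", "CrawlerManipulator",
              ["RETRIEVE_URL", "convert_to_text", "remove_body"]),
    Component("spider", "Spider"),
]
print(validate(pipeline))  # []
```

Keeping the steps in one ordered table makes the order itself the rule. Adding or reordering a step means changing only MANIPULATOR_STEPS.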

After creating your Endeca Crawler pipeline, proceed to configure your Stratify Classification Server.