The CAS Server and the Endeca Web
Crawler create Endeca records ready for processing by any type of Forge
pipeline (baseline or partial).
The CAS Server stores records either in a Record Store instance or
in a file on disk; by default, it writes to a Record Store instance.
The Web Crawler stores records in a file on disk by default, but it can
be configured to store records in a Record Store instance. (Using a Record
Store instance is the recommended approach in both cases.)
To read the records into a Forge pipeline, you add an input record
adapter to your Developer Studio project.
If the record adapter is reading from a CAS output file, you set the
adapter's input format to either XML or binary, matching the output format
you configured for the crawl. The URL field of the record adapter specifies
the location of the CAS output file.
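For the file-based case, the record adapter definition stored in the project's Pipeline.epx might look like the following sketch. The adapter name and URL value are placeholders, and the element and attribute names reflect the general shape of a Developer Studio record adapter definition rather than an exact schema.

```xml
<!-- Input record adapter reading a CAS output file in XML format.
     NAME and the URL value are illustrative placeholders. -->
<RECORD_ADAPTER NAME="LoadCrawlerOutput"
                DIRECTION="INPUT"
                FORMAT="XML">
  <!-- Location of the CAS output file; use the binary format instead
       if the crawl was configured to write binary output. -->
  <URL>../data/incoming/crawl-output.xml</URL>
</RECORD_ADAPTER>
```

In practice you set these values through the record adapter editor in Developer Studio rather than editing the project file by hand.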
If the record adapter is reading from a Record Store instance, you
configure the record adapter as a custom adapter.
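When reading from a Record Store instance, the custom adapter carries the connection details as pass-through values. The sketch below is illustrative only: the pass-through names are hypothetical placeholders (consult the CAS documentation for the exact names and adapter class), but the information they convey, the host, port, and Record Store instance name, is what the adapter needs to locate the instance.

```xml
<!-- Custom input record adapter connecting to a Record Store instance.
     The PASS_THROUGH names shown are illustrative placeholders; the
     actual names are defined in the CAS documentation. -->
<RECORD_ADAPTER NAME="LoadRecordStore" DIRECTION="INPUT" FORMAT="CUSTOM">
  <PASS_THROUGH NAME="HOST">localhost</PASS_THROUGH>
  <PASS_THROUGH NAME="PORT">8500</PASS_THROUGH>
  <PASS_THROUGH NAME="INSTANCE_NAME">MyCrawl</PASS_THROUGH>
</RECORD_ADAPTER>
```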
Depending on the needs of your application, you can create these types
of Forge pipelines:
- Baseline-update pipeline. This type of pipeline is intended for sites
that perform only full crawls and wish to perform only baseline updates.
The topics in this chapter describe this type of Forge pipeline.
- Delta-update pipeline. This type of pipeline is intended for sites
that perform both full and incremental crawls and wish to perform baseline
updates on both sets of data. This type of application is not documented
in this guide.
- Baseline-update and partial-update pipelines. These pipelines are used
if the site wants to perform partial updates. This type of application is
not documented in this guide. Instead, refer to the Endeca Partial Updates
Guide.
Regardless of the type of Forge pipeline, you use Endeca Developer Studio
both to create the pipeline and to perform the rest of the back-end
application tasks (such as creating Endeca properties and dimensions,
search interfaces, and so on).