The CAS Server and the Endeca Web
Crawler create Endeca records ready for processing by any type of Forge
pipeline (baseline or partial).
The CAS Server stores records either in a Record Store instance or
in a file on disk; by default, it writes to a Record Store instance.
The Web Crawler stores records in a file on disk by default, but it can
be configured to store records in a Record Store instance. (Using a Record
Store instance is the recommended approach in both cases.)
To read the records into a Forge pipeline, you add an input record
adapter to your Developer Studio project.
If the record adapter is reading from a CAS output file, you set the
adapter's input format to either XML or binary, matching the output format
you configured for the crawl. The URL field of the record adapter specifies
the location of the CAS output file.
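For the file-based case, the record adapter definition stored in the project's Pipeline.epx might look like the following sketch. The adapter name and URL value are placeholders, and the element and attribute names reflect the general shape of a Developer Studio record adapter definition rather than an exact schema.

```xml
<!-- Input record adapter reading a CAS output file in XML format.
     NAME and the URL value are illustrative placeholders. -->
<RECORD_ADAPTER NAME="LoadCrawlerOutput"
                DIRECTION="INPUT"
                FORMAT="XML">
  <!-- Location of the CAS output file; use the binary format instead
       if the crawl was configured to write binary output. -->
  <URL>../data/incoming/crawl-output.xml</URL>
</RECORD_ADAPTER>
```

In practice you set these values through the record adapter editor in Developer Studio rather than editing the project file by hand.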
If the record adapter is reading from a Record Store instance, you
configure the record adapter as a custom adapter.
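When reading from a Record Store instance, the custom adapter carries the connection details as pass-through values. The sketch below is illustrative only: the pass-through names are hypothetical placeholders (consult the CAS documentation for the exact names and adapter class), but the information they convey, the host, port, and Record Store instance name, is what the adapter needs to locate the instance.

```xml
<!-- Custom input record adapter connecting to a Record Store instance.
     The PASS_THROUGH names shown are illustrative placeholders; the
     actual names are defined in the CAS documentation. -->
<RECORD_ADAPTER NAME="LoadRecordStore" DIRECTION="INPUT" FORMAT="CUSTOM">
  <PASS_THROUGH NAME="HOST">localhost</PASS_THROUGH>
  <PASS_THROUGH NAME="PORT">8500</PASS_THROUGH>
  <PASS_THROUGH NAME="INSTANCE_NAME">MyCrawl</PASS_THROUGH>
</RECORD_ADAPTER>
```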
Depending on the needs of your application, you can create these types
of Forge pipelines:
- Baseline-update pipeline. This type of pipeline is intended for sites
that perform only full crawls and wish to perform only baseline updates.
The topics in this chapter describe this type of Forge pipeline.
- Delta-update pipeline. This type of pipeline is intended for sites
that perform both full and incremental crawls and wish to perform baseline
updates on both sets of data. This type of application is not documented
in this guide.
- Baseline-update and partial-update pipelines. These pipelines are used
if the site wants to perform partial updates. This type of application is
not documented in this guide. Instead, refer to the Endeca Partial Updates
Guide.
Regardless of the type of Forge pipeline, you use Endeca Developer Studio
both to create the pipeline and to perform the rest of the back-end
application tasks (such as creating Endeca properties and dimensions,
search interfaces, and so on).