About pipelines

A pipeline functions as the script for the entire data transformation process that occurs when you run the Forge program.

The pipeline specifies details such as the format and location of the source data, any changes to make to the source data (standardization), and the mapping method to use for each of the source data’s properties. A pipeline is composed of a collection of components. Each component performs a specific function during the transformation of your source data into Endeca records. Components are connected by links, which give the pipeline a sequential flow from inputs to outputs.
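The idea of components connected by links can be pictured as a small directed graph. The following Python sketch is only a conceptual illustration, not Forge code or a Developer Studio API: the component names (load_source, standardize, map_properties, write_records) and the run_order helper are hypothetical, chosen to mirror the steps described above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each link connects an upstream component to a downstream component.
# These names are hypothetical; a real pipeline defines its own components.
links = [
    ("load_source", "standardize"),
    ("standardize", "map_properties"),
    ("map_properties", "write_records"),
]


def run_order(links):
    """Return the components in an order where every link points forward."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set())
        # TopologicalSorter expects each node mapped to its predecessors.
        graph.setdefault(target, set()).add(source)
    return list(TopologicalSorter(graph).static_order())


order = run_order(links)
print(" -> ".join(order))
```

Because the links form a single chain, the resulting order runs from the input component to the output component, which is the sequential flow that the links give a pipeline.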

You add and edit components in your pipeline using the Pipeline Diagram editor. The pipeline diagram graphically depicts the components in your pipeline and the links between them. It describes the series of transformations that occur in the process of converting raw data to a format that the Endeca MDEX Engine can use, making it easy for you to trace the logic of your data transformation. The pipeline diagram is the best way to navigate your pipeline and maintain a high-level view of it as it grows in size and complexity.

All Endeca projects require a main pipeline to process baseline updates. A baseline update (also called a full update) is a complete re-index of the entire dataset. Baseline updates occur infrequently, usually once per day or once per week. They usually involve the customer generating an extract from their database system and making the files accessible either on an FTP server or on the indexing server. Forge and the Dgidx Indexer process this data, which is then made available through the MDEX Engine. You define the steps in a baseline update in your project's main pipeline.

Endeca projects may optionally contain a partial pipeline to process partial updates. A partial update changes only a small part of the overall dataset. Because partial updates affect a small percentage of the total records in the system, they can occur much more frequently than baseline updates. They consist of a much smaller extract from the customer’s database and contain volatile information. For example, the price and availability of products on a retail store site are usually volatile.
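The difference between the two update types can be sketched in a few lines of Python. This is a conceptual model only and does not reflect how Forge, Dgidx, or the MDEX Engine store or update data: the in-memory dictionary index, the record fields, and the baseline_update and partial_update functions are all assumptions made for illustration.

```python
def baseline_update(full_extract):
    """Rebuild the whole index from a complete extract (a full re-index)."""
    return {record["id"]: dict(record) for record in full_extract}


def partial_update(index, changes):
    """Apply a small set of changed fields to a copy of an existing index."""
    updated = {record_id: dict(record) for record_id, record in index.items()}
    for change in changes:
        # Only the fields present in the change overwrite existing values.
        updated.setdefault(change["id"], {"id": change["id"]}).update(change)
    return updated


# Hypothetical retail records: name is stable, price and availability are volatile.
full_extract = [
    {"id": "A1", "name": "Lamp", "price": 30.0, "available": True},
    {"id": "B2", "name": "Desk", "price": 120.0, "available": True},
]

index = baseline_update(full_extract)
index = partial_update(index, [{"id": "A1", "price": 25.0, "available": False}])
print(index["A1"])
```

In this sketch the baseline update reads every record, while the partial update touches only record A1 and leaves its name and all of record B2 unchanged, which is why the smaller update is cheap enough to run often.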

You can create, view, and edit both types of pipelines in Developer Studio. You access your baseline and partial update pipelines through the Pipeline tab and through the Pipeline Diagram and Partial Pipeline Diagram editors.

Note: Refer to Section I in the Endeca Forge Guide for an introduction to main pipelines. Refer to the Endeca Partial Updates Guide for details on using and implementing partial pipelines.