Update mode

The UPDATE_MODE pass-through specifies which type of data update the pipeline performs.

The UPDATE_MODE pass-through is optional and accepts one of three values: STATELESS, STATEFUL, or PARTIAL. If the pass-through is omitted, the term extractor performs a STATELESS update.
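
The defaulting rule can be illustrated with a short Python sketch. This is not the term extractor's own code: the function name resolve_update_mode and the dictionary of pass-through values are hypothetical, and treating the values as case-sensitive is an assumption made only for this example.

```python
# Illustrative sketch only; names here are hypothetical, not part of the product.
VALID_UPDATE_MODES = ("STATELESS", "STATEFUL", "PARTIAL")
DEFAULT_UPDATE_MODE = "STATELESS"  # used when UPDATE_MODE is omitted


def resolve_update_mode(pass_throughs):
    """Return the effective update mode for a dict of pass-through values."""
    mode = pass_throughs.get("UPDATE_MODE")
    if mode is None:
        # The pass-through is optional; omission means a STATELESS update.
        return DEFAULT_UPDATE_MODE
    if mode not in VALID_UPDATE_MODES:
        # Assumption: anything other than the three documented values is rejected.
        raise ValueError(f"Unsupported UPDATE_MODE: {mode!r}")
    return mode
```

For example, resolve_update_mode({}) yields STATELESS, while resolve_update_mode({"UPDATE_MODE": "PARTIAL"}) yields PARTIAL.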

STATELESS mode

STATELESS is analogous to a baseline update and performs the following actions:
  1. Extracts terms from all records.
  2. Performs corpus-level and record-level filtering on all records.
  3. Emits all records.
  4. Does not create state files.

STATEFUL mode

STATEFUL performs the following actions:
  1. Extracts terms from new records only.
  2. Performs corpus-level and record-level filtering on all records (i.e., both previously processed records and new records).
  3. Emits all records (i.e., both previous and new records).
  4. Creates term state files.

For more information on STATEFUL mode, see the "Using STATEFUL mode for tuning" topic in Chapter 6.

PARTIAL mode

PARTIAL is analogous to a partial update and performs the following actions:
  1. Extracts terms from new records only.
  2. Performs record-level filtering on new records only. Corpus-level filtering is not performed at all, so the previous corpus information is left as-is.
  3. Emits new records only.
  4. Does not create state files (i.e., previous term state files are left as-is).

For details on performing PARTIAL updates with a Term Discovery pipeline, see the "Partial updates for term extraction" topic in Chapter 6.
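
The actions listed for the three modes can be compared side by side. The following Python sketch simply restates those lists as a lookup table; the ModeBehavior class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModeBehavior:
    """Hypothetical summary of what one update mode does."""
    extracts_terms_from: str            # "all" or "new" records
    corpus_filtering_on: Optional[str]  # "all" records, or None if not performed
    record_filtering_on: str            # "all" or "new" records
    emits: str                          # "all" or "new" records
    creates_state_files: bool


# Values are taken directly from the per-mode action lists above.
MODE_BEHAVIOR = {
    "STATELESS": ModeBehavior("all", "all", "all", "all", False),
    "STATEFUL": ModeBehavior("new", "all", "all", "all", True),
    "PARTIAL": ModeBehavior("new", None, "new", "new", False),
}

# Which modes write term state files?
state_writers = [m for m, b in MODE_BEHAVIOR.items() if b.creates_state_files]
```

Laid out this way, the table shows that STATEFUL is the only mode that creates term state files, and PARTIAL is the only mode that skips corpus-level filtering.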

Notes on update modes

Keep the following notes in mind when using the update modes:
  • STATELESS mode is recommended for sites that perform baseline updates exclusively.
  • STATEFUL mode is recommended for sites that implement partial updates. At such sites, the baseline update pipeline uses STATEFUL mode and the partial update pipeline uses PARTIAL mode.
  • The STATELESS and STATEFUL modes do not need pre-existing state files in order to run.
  • The PARTIAL mode requires existing term state files. Therefore, a STATEFUL update must be performed before a PARTIAL update.
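
The notes above amount to a mode-selection rule and a precondition check. A minimal Python sketch follows; the helper names recommended_modes and can_run and the boolean state_files_exist flag are hypothetical, and how a real pipeline detects state files is not covered here.

```python
def recommended_modes(uses_partial_updates):
    """Return the recommended UPDATE_MODE for each pipeline at a site."""
    if uses_partial_updates:
        # The baseline pipeline creates the state files that PARTIAL needs.
        return {"baseline": "STATEFUL", "partial": "PARTIAL"}
    # Sites that perform baseline updates exclusively.
    return {"baseline": "STATELESS"}


def can_run(mode, state_files_exist):
    """Check the state-file precondition for a given update mode."""
    if mode in ("STATELESS", "STATEFUL"):
        return True  # no pre-existing state files needed
    if mode == "PARTIAL":
        return state_files_exist  # requires files from an earlier STATEFUL run
    raise ValueError(f"Unsupported UPDATE_MODE: {mode!r}")
```

In this sketch, can_run("PARTIAL", False) returns False, which corresponds to attempting a PARTIAL update before any STATEFUL update has been performed.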