The UPDATE_MODE pass-through specifies which type of data update the pipeline performs.
The UPDATE_MODE pass-through is optional and takes one of three values: STATELESS, STATEFUL, or PARTIAL. If the pass-through is omitted, the term extractor performs a STATELESS update.
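The defaulting rule can be illustrated with a short Python sketch. This is not the term extractor's own code: the `pass_throughs` dictionary and the `resolve_update_mode` helper are hypothetical names used only for illustration, and rejecting an unrecognized value with an error is an assumption, since the behavior for invalid values is not described here.

```python
from enum import Enum


class UpdateMode(Enum):
    """The three documented UPDATE_MODE values."""
    STATELESS = "STATELESS"
    STATEFUL = "STATEFUL"
    PARTIAL = "PARTIAL"


def resolve_update_mode(pass_throughs):
    """Return the effective update mode for a (hypothetical) pass-through dict.

    An omitted UPDATE_MODE pass-through means a STATELESS update.
    """
    raw = pass_throughs.get("UPDATE_MODE")
    if raw is None:
        return UpdateMode.STATELESS
    # Assumption: an unrecognized value is rejected (Enum raises ValueError).
    return UpdateMode(raw)
```

For example, `resolve_update_mode({})` yields `UpdateMode.STATELESS`, while `resolve_update_mode({"UPDATE_MODE": "PARTIAL"})` yields `UpdateMode.PARTIAL`.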
STATELESS mode
STATELESS is analogous to a baseline update and performs the following actions:
- Extracts terms from all records.
- Performs corpus-level and record-level filtering on all records.
- Emits all records.
- Does not create state files.
STATEFUL mode
STATEFUL performs the following actions:
- Extracts terms from new records only.
- Performs corpus-level and record-level filtering on all records (i.e., both previously processed records and new records).
- Emits all records (i.e., both previous and new records).
- Creates term state files.
For more information on STATEFUL mode, see the "Using STATEFUL mode for tuning" topic in Chapter 6.
PARTIAL mode
PARTIAL is analogous to a partial update and performs the following actions:
- Extracts terms from new records only.
- Performs record-level filtering on new records only. Corpus-level filtering is not performed at all, so the previous corpus information is left as-is.
- Emits new records only.
- Does not create state files (i.e., previous term state files are left as-is).
For details on performing PARTIAL updates with a Term Discovery pipeline, see the "Partial updates for term extraction" topic in Chapter 6.
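The three mode descriptions above can be compared side by side. The following Python sketch simply restates them as data; the `ModeBehavior` class, its field names, and the `MODE_BEHAVIOR` table are hypothetical and are not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeBehavior:
    """Summary of one update mode (illustrative field names)."""
    extracts_terms_from: str       # "all" or "new" records
    record_level_filtering: str    # "all" or "new" records
    corpus_level_filtering: str    # "all" records, or "none" if not performed
    emits: str                     # "all" or "new" records
    creates_state_files: bool


MODE_BEHAVIOR = {
    "STATELESS": ModeBehavior("all", "all", "all", "all", False),
    "STATEFUL": ModeBehavior("new", "all", "all", "all", True),
    "PARTIAL": ModeBehavior("new", "new", "none", "new", False),
}

# Only STATEFUL mode writes term state files.
modes_creating_state = [name for name, b in MODE_BEHAVIOR.items()
                        if b.creates_state_files]
```

Laid out this way, STATEFUL differs from STATELESS only in where terms are extracted from and in creating state files, while PARTIAL restricts every action to new records and skips corpus-level filtering.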
Notes on update modes
Keep the following notes in mind when using the update modes:
- STATELESS mode is recommended for sites that perform baseline updates exclusively.
- STATEFUL mode is recommended for sites that implement partial updates. At such sites, the baseline update pipeline uses STATEFUL mode, while the partial update pipeline uses PARTIAL mode.
- The STATELESS and STATEFUL modes do not need pre-existing state files in order to run.
- The PARTIAL mode requires existing term state files. Therefore, a STATEFUL update must be performed before a PARTIAL update.
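The state-file requirement in the notes above could be enforced by a pre-flight check in a site's own update scripts. The Python sketch below is one possible approach under stated assumptions: the `check_mode_preconditions` helper and the `state_dir` argument are hypothetical, and treating "any file in the directory" as a term state file is a simplification, since the actual file names and locations are not described here.

```python
from pathlib import Path


def check_mode_preconditions(mode, state_dir):
    """Raise RuntimeError if `mode` cannot run given the contents of `state_dir`.

    Assumption: term state files live directly in `state_dir` (hypothetical).
    """
    if mode in ("STATELESS", "STATEFUL"):
        # These modes do not need pre-existing state files.
        return
    if mode != "PARTIAL":
        raise ValueError(f"unknown update mode: {mode}")
    path = Path(state_dir)
    has_state = path.is_dir() and any(p.is_file() for p in path.iterdir())
    if not has_state:
        raise RuntimeError(
            "PARTIAL mode requires term state files; "
            "run a STATEFUL update first"
        )
```

A script might call `check_mode_preconditions("PARTIAL", state_dir)` before launching the partial update pipeline and fall back to scheduling a STATEFUL baseline update if the check fails.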