You can use STATEFUL mode to tune the values you set for the corpus-level and record-level pass-throughs in the CAS manipulator.
Before you begin, make sure the UPDATE_MODE pass-through in the baseline pipeline is set to STATEFUL.
The general procedure for using STATEFUL mode for tuning is:
Generate a baseline update, index the records, and start the MDEX Engine.
Run searches against the P_AllTerms property and check the quality of the clusters.
Add a MAX_INPUT_RECORDS pass-through set to 0 (zero) to the CAS manipulator.
Generate another baseline update. The update will be much faster because no terms will be extracted. However, a full corpus- and record-level filtering operation will be performed.
Repeat the above steps (except step 4) until you are satisfied with the results.
When you finish, be sure to remove the MAX_INPUT_RECORDS pass-through so all your source records will be processed.
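The bookkeeping around MAX_INPUT_RECORDS in the procedure above can be sketched as follows. This is only an illustrative model, not part of the product: it assumes the manipulator's pass-throughs are represented as a plain Python dictionary, and the function names enable_tuning_run and finish_tuning are hypothetical. In practice the pass-throughs are edited in the pipeline configuration itself.

```python
from typing import Dict


def enable_tuning_run(pass_throughs: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the pass-throughs with MAX_INPUT_RECORDS set to "0".

    Hypothetical helper: the dictionary only models the pass-through
    names and values; it does not talk to a real pipeline.
    """
    # Tuning relies on STATEFUL mode, so refuse to continue without it.
    if pass_throughs.get("UPDATE_MODE") != "STATEFUL":
        raise ValueError("UPDATE_MODE must be set to STATEFUL before tuning")
    updated = dict(pass_throughs)  # leave the caller's dictionary untouched
    updated["MAX_INPUT_RECORDS"] = "0"
    return updated


def finish_tuning(pass_throughs: Dict[str, str]) -> Dict[str, str]:
    """Return a copy without MAX_INPUT_RECORDS so all source records are processed."""
    updated = dict(pass_throughs)
    updated.pop("MAX_INPUT_RECORDS", None)  # no error if it was never added
    return updated
```

Under this model, enable_tuning_run would be applied once before the faster tuning updates, and finish_tuning once the results are satisfactory, so that the MAX_INPUT_RECORDS pass-through is not accidentally left in place.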

