The RECORD_NTERMS pass-through limits the number of terms that can be tagged on a record.

You can use the RECORD_NTERMS pass-through to implement one of two strategies for limiting the number of terms tagged on records: a hard limit (a single integer) or a cut-off window (a range of two integers).

You cannot combine the two strategies. In either strategy, CAS determines which terms have the highest relevance for the record. Note that this pass-through is recommended mainly for collections that contain large documents.

To establish a cut-off window, specify the RECORD_NTERMS pass-through as a range of two integers, which set the lower and upper limits of the window. CAS scans this window for the optimal cut-off, which is the point where term informativeness drops off most sharply. Use this strategy when you want CAS to be sensitive to actual term informativeness rather than apply a hard limit.

You can think of the term range as providing a fuzzy neighborhood to be used instead of a hard limit. For example, instead of RECORD_NTERMS having a hard limit of 32, you can set it to a range of 24-36. This range establishes a window where a record can have a minimum of 24 terms and a maximum of 36 terms. CAS determines the optimal cut-off within that window for each record.
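The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration, not the CAS implementation: the function name select_terms, the (term, score) input format, and the rule of cutting at the largest score drop between consecutive terms are all assumptions made for this sketch, and the scores are invented.

```python
from typing import List, Optional, Tuple

ScoredTerm = Tuple[str, float]  # (term, informativeness score); hypothetical format


def select_terms(terms: List[ScoredTerm], lower: int,
                 upper: Optional[int] = None) -> List[ScoredTerm]:
    """Keep the most informative terms for one record (illustrative only).

    With only `lower`, it acts as a hard limit. With `lower` and `upper`,
    the cut-off is placed inside the window [lower, upper] at the point
    where the score drops most between two consecutive terms.
    """
    ranked = sorted(terms, key=lambda t: t[1], reverse=True)

    # Hard-limit strategy: keep at most `lower` terms.
    if upper is None:
        return ranked[:lower]

    # Fewer terms than the lower bound: nothing to cut.
    if len(ranked) <= lower:
        return ranked

    # Candidate cut-offs k keep ranked[:k]; k must leave a "next" term
    # so that the drop ranked[k-1] -> ranked[k] can be measured.
    # (Simplification: this sketch never keeps every term once the
    # record has more terms than the lower bound.)
    last = min(upper, len(ranked) - 1)
    best_k = max(range(lower, last + 1),
                 key=lambda k: ranked[k - 1][1] - ranked[k][1])
    return ranked[:best_k]


# Invented data: 40 terms whose scores fall steadily, with one sharp
# drop after the 27th term.
example = [(f"term{i}", 1.0 - 0.01 * i - (0.3 if i >= 27 else 0.0))
           for i in range(40)]

print(len(select_terms(example, 32)))      # hard limit of 32 -> 32
print(len(select_terms(example, 24, 36)))  # window 24-36 -> 27
```

With the hard limit, five low-scoring terms past the drop are still tagged; with the window, the cut-off lands at the drop itself.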

For example, assume that 40 terms were extracted from Record A and also from Record B:

When using the range version of this pass-through, keep the following in mind:

