RECORD_NTERMS and RECORD_FRACT_OF_MEDIAN

Recommendation: How you use these pass-throughs depends on the length of the text in the property that contains the candidate terms. Two scenarios are considered here: properties with short text and properties with long text.

For short text (such as the P_Description property in the wine data set shipped with the sample reference implementation), do not use these pass-throughs, so that all the terms are kept.

For long text (such as news sites or sites with long articles), use the range version of RECORD_NTERMS to set the number of terms you want per record: for example, a range of 16-24 or, if more terms are wanted, a range of 24-30. (Keep in mind that the lower limit should be greater than 10 and the upper limit should not be much larger than the lower limit.) Set RECORD_FRACT_OF_MEDIAN to 1.1 for relatively small documents, 1.2 for larger documents, and 1.5 for very large documents.
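The short-text and long-text recommendations can be summarized as a small decision helper. This is a minimal Python sketch, not part of the product. The function name recommend_passthroughs, the size labels "small", "large" and "very_large", and the 1.5x ratio used to interpret "not much larger" are all hypothetical assumptions (the ratio simply accepts both example ranges, 16-24 and 24-30). The sketch only computes the values; it does not show how they are supplied to the indexer.

```python
from typing import Dict, Tuple

# Hypothetical labels for the document sizes described above, mapped to the
# recommended RECORD_FRACT_OF_MEDIAN values.
FRACT_OF_MEDIAN_BY_SIZE: Dict[str, float] = {
    "small": 1.1,       # relatively small documents
    "large": 1.2,       # larger documents
    "very_large": 1.5,  # very large documents
}


def recommend_passthroughs(
    long_text: bool,
    doc_size: str = "small",
    nterms_range: Tuple[int, int] = (16, 24),
    # Assumption: "not much larger" is read as upper <= 1.5 * lower.
    max_spread_ratio: float = 1.5,
) -> Dict[str, str]:
    """Return suggested pass-through values, or an empty dict for short text."""
    if not long_text:
        # Short text: omit both pass-throughs so that all terms are kept.
        return {}

    lower, upper = nterms_range
    if lower <= 10:
        raise ValueError("RECORD_NTERMS lower limit should be greater than 10")
    if upper < lower:
        raise ValueError("RECORD_NTERMS upper limit must not be below the lower limit")
    if upper > lower * max_spread_ratio:
        raise ValueError("RECORD_NTERMS upper limit is much larger than the lower limit")
    if doc_size not in FRACT_OF_MEDIAN_BY_SIZE:
        raise ValueError(f"unknown doc_size: {doc_size!r}")

    return {
        "RECORD_NTERMS": f"{lower}-{upper}",  # range version, e.g. "16-24"
        "RECORD_FRACT_OF_MEDIAN": str(FRACT_OF_MEDIAN_BY_SIZE[doc_size]),
    }


if __name__ == "__main__":
    print(recommend_passthroughs(long_text=False))
    print(recommend_passthroughs(long_text=True, doc_size="large"))
    print(recommend_passthroughs(True, "very_large", nterms_range=(24, 30)))
```

For a short-text property the helper returns an empty dict, which mirrors the advice to leave both pass-throughs unset. For long text it rejects ranges whose lower limit is 10 or less, or whose upper limit is far above the lower one.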

