The following thesaurus clean-up rules should be observed to avoid performance problems related to expensive and non-useful thesaurus search query expansions.
Do not create a two-way thesaurus entry for a word with multiple meanings. For example, khaki can refer to a color as well as to a style of pants. If you create a two-way thesaurus entry for
khaki = pants
then a user’s search for khaki towels could return irrelevant results for pants.
Do not create a two-way thesaurus entry between a general and several more-specific terms, such as:
top = shirt = sweater = vest
This increases the number of results the user has to go through while reducing the overall accuracy of the items returned. In this instance, better results are attained by creating individual one-way thesaurus entries between the general term top and each of the more-specific terms.
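The difference between the two approaches can be illustrated with a minimal sketch. The expand function and the tuple-based entry layout below are hypothetical and exist only for illustration; they are not the product's API or file format.

```python
# Hypothetical in-memory model of thesaurus entries (illustration only).
def expand(term, two_way=(), one_way=()):
    """Return the set of terms that a query for `term` expands to."""
    expanded = {term}
    # Two-way entry: every form expands to every other form in the group.
    for group in two_way:
        if term in group:
            expanded.update(group)
    # One-way entry: only the source form expands to its target forms.
    for source, targets in one_way:
        if term == source:
            expanded.update(targets)
    return expanded


two_way = [("top", "shirt", "sweater", "vest")]
one_way = [("top", ("shirt",)), ("top", ("sweater",)), ("top", ("vest",))]

print(sorted(expand("sweater", two_way=two_way)))  # ['shirt', 'sweater', 'top', 'vest']
print(sorted(expand("sweater", one_way=one_way)))  # ['sweater']
print(sorted(expand("top", one_way=one_way)))      # ['shirt', 'sweater', 'top', 'vest']
```

With the two-way entry, a search for sweater also pulls in shirts and vests. With the one-way entries, only a search for the general term top is broadened.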
A thesaurus entry should never include a term that is a substring of another term in the entry.
For example, consider the two-way equivalency:
Adam and Eve = Eve
If users type Eve, they get results for Eve or (Adam and Eve), which are the same results they would have gotten for Eve without the thesaurus. If users type Adam and Eve, they get results for (Adam and Eve) or Eve, so the "Adam and" part of the query is effectively ignored.
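A check for this rule could be sketched as follows. The substring_conflicts helper is a hypothetical name, and the sketch assumes a case-insensitive, character-level substring comparison; a real clean-up script might compare whole words instead.

```python
def substring_conflicts(forms):
    """Return (shorter, longer) pairs where one form is contained in another.

    Assumption: forms are compared case-insensitively, character by character.
    """
    normalized = [form.lower() for form in forms]
    conflicts = []
    for i, shorter in enumerate(normalized):
        for j, longer in enumerate(normalized):
            if i != j and shorter in longer:
                conflicts.append((forms[i], forms[j]))
    return conflicts


print(substring_conflicts(["Adam and Eve", "Eve"]))  # [('Eve', 'Adam and Eve')]
print(substring_conflicts(["Aethelstan", "Athelstan"]))  # []
```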
Stop words such as "and" or "the" should not be used in single-word thesaurus forms. For example, if "the" has been configured as a stop word, an equivalency between "thee" and "the" is not useful.
You can use stop words in multi-word thesaurus forms, because multi-word thesaurus forms are handled as phrases. In phrases, a stop word is treated as a literal word and not a stop word.
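The stop-word rule can be checked with a small sketch like the one below. The STOP_WORDS set and the stop_word_forms helper are assumptions made for illustration; the actual stop-word list comes from your own configuration.

```python
# Assumed stop-word configuration (illustration only).
STOP_WORDS = {"and", "the"}


def stop_word_forms(forms, stop_words=STOP_WORDS):
    """Return the single-word forms that are themselves stop words.

    Multi-word forms are skipped, because they are handled as phrases
    in which a stop word is treated as a literal word.
    """
    flagged = []
    for form in forms:
        words = form.lower().split()
        if len(words) == 1 and words[0] in stop_words:
            flagged.append(form)
    return flagged


print(stop_word_forms(["thee", "the"]))          # ['the']
print(stop_word_forms(["Adam and Eve", "Eve"]))  # []
```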
Avoid multi-word thesaurus forms where single-word forms are appropriate. In particular, avoid multi-word forms that users are unlikely to type as phrases and for which phrase expansion is unlikely to provide relevant additional results.
For example, the two-way thesaurus entry:
Aethelstan, King Of England (D. 939) = Athelstan, King Of England (D. 939)
should be replaced with the single-word form:
Aethelstan = Athelstan
Thesaurus forms should not use non-searchable characters. For example, the one-way thesaurus entry:
Pikes Peak -> Pike’s Peak
should be used only if the apostrophe (') is enabled as a search character.
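A thesaurus form can be screened for non-searchable characters with a sketch such as the following. It assumes that alphanumeric characters are always searchable and that any other character must be explicitly enabled as a search character; the non_searchable_chars helper and its search_chars parameter are hypothetical names.

```python
def non_searchable_chars(form, search_chars=""):
    """Return the characters in `form` that would not be searchable.

    Assumption: letters and digits are searchable by default, whitespace
    separates words, and every other character must appear in `search_chars`.
    """
    return sorted(
        {
            ch
            for ch in form
            if not (ch.isalnum() or ch.isspace() or ch in search_chars)
        }
    )


print(non_searchable_chars("Pike's Peak"))                    # ["'"]
print(non_searchable_chars("Pike's Peak", search_chars="'"))  # []
print(non_searchable_chars("Pikes Peak"))                     # []
```

An empty result means the form uses only characters that the assumed configuration treats as searchable.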