The thesaurus is a comma-delimited file, in which each line represents a single thesaurus entry.
The first comma-delimited element on a line is the name of the thesaurus entry. The remaining elements on that line are the search tokens that should be treated as synonyms for the thesaurus entry. Each synonym can be assigned a weight that determines the amount each match contributes to the overall query score. For example, a file that contains the following two lines defines thesaurus entries for couch and dog:
couch,sofa[0.9],divan[0.5],davenport[0.4]
dog,canine,doggy[0.85],pup[0.7],mutt[0.3]
Searches for couch generate results with text matching the terms couch, sofa, divan, and davenport. Searches for dog generate results with text matching the terms dog, canine, doggy, pup, and mutt. In the example shown, canine has no explicit weight, so a match on canine contributes as much to the relevance score of a matching item as a match on dog. This is equivalent to a default synonym weighting of 1.0. In contrast, a match on pup contributes only 0.7 times (70%) as much to the relevance score as a match on dog.
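The entry format and default weighting described above can be illustrated with a short Python sketch. The function name parse_entry and the regular expression are hypothetical and written only for this illustration; they are not part of the Search Service, which may parse the file differently.

```python
import re

# Matches one synonym element with an optional trailing weight,
# for example "sofa[0.9]" or "canine". Hypothetical pattern for illustration.
_SYNONYM = re.compile(r"^(?P<term>.*?)(?:\[(?P<weight>[0-9]*\.?[0-9]+)\])?$")


def parse_entry(line):
    """Split one thesaurus line into (entry_name, {synonym: weight})."""
    elements = [element.strip() for element in line.split(",")]
    name = elements[0]
    synonyms = {}
    for element in elements[1:]:
        match = _SYNONYM.match(element)
        weight = match.group("weight")
        # A synonym without an explicit weight gets the default of 1.0.
        synonyms[match.group("term").strip()] = float(weight) if weight else 1.0
    return name, synonyms


print(parse_entry("dog,canine,doggy[0.85],pup[0.7],mutt[0.3]"))
# ('dog', {'canine': 1.0, 'doggy': 0.85, 'pup': 0.7, 'mutt': 0.3})
```

Under these assumptions, canine carries the same weight as the entry name itself, while pup is scaled down to 0.7.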
Entry names and synonyms can also be phrases, as in the following line:
new york city,big apple[0.9],gotham[0.5]
Searches for the phrase “new york city” also return items containing “big apple” or “gotham.” Thesaurus expansion for a phrase entry occurs only for searches on the complete phrase, not for searches on the individual words that make up the phrase. Similarly, synonym entries are treated as phrases and not as individual terms. So while a search for “new york city” returns items containing “big apple” or “gotham,” a search for new (or for york, or for city, or for “new york”) does not. Conversely, an item that contains big or apple but not the phrase “big apple” is not returned by a search for “new york city.”
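The whole-phrase behavior can be sketched in Python as follows. The names THESAURUS, expand_query, and contains_phrase are hypothetical, and the lowercasing and whitespace tokenization are simplifying assumptions made for this illustration rather than a description of how the Search Service tokenizes text.

```python
# Hypothetical in-memory form of the phrase entry shown above.
THESAURUS = {
    "new york city": {"big apple": 0.9, "gotham": 0.5},
}


def expand_query(query, thesaurus=THESAURUS):
    """Return {phrase: weight}; the query itself always has weight 1.0."""
    # Assumption: matching ignores case and extra whitespace.
    phrase = " ".join(query.lower().split())
    expanded = {phrase: 1.0}
    # The lookup uses the complete phrase, so "new york" or "city" finds nothing.
    expanded.update(thesaurus.get(phrase, {}))
    return expanded


def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


print(expand_query("new york city"))
print(expand_query("new york"))
print(contains_phrase("a big red apple", "big apple"))
```

In this sketch, only the complete phrase “new york city” picks up the synonyms, and a text that contains big and apple separately does not count as containing “big apple.”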
wastrel,ne er do well[0.7]
wastrel,ne er do well[0.7],neer do well[0.7]
# furniture entries
couch,sofa[0.9],divan[0.5],davenport[0.4]
#chair,stool[5.0]
# animal entries
dog,canine[0.9],doggy[0.85],pup[0.7],mutt[0.3]
In this example, the Search Service parses two thesaurus entries: couch and dog. There is no entry for chair, because the line that defines it begins with # and is treated as a comment.
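A minimal Python sketch of this comment handling follows. The function name load_entries is hypothetical, and skipping blank lines is an assumption; the only behavior taken from the example is that lines beginning with # produce no entry.

```python
SAMPLE = """\
# furniture entries
couch,sofa[0.9],divan[0.5],davenport[0.4]
#chair,stool[5.0]
# animal entries
dog,canine[0.9],doggy[0.85],pup[0.7],mutt[0.3]
"""


def load_entries(text):
    """Return {entry_name: [raw synonym elements]} for non-comment lines."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with "#" are comments; blank lines are skipped
        # as well (the latter is an assumption of this sketch).
        if not line or line.startswith("#"):
            continue
        name, *synonyms = [element.strip() for element in line.split(",")]
        entries[name] = synonyms
    return entries


print(list(load_entries(SAMPLE)))
# ['couch', 'dog']
```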
A CDF thesaurus file can have at most 50,000 distinct entries (lines). Each entry can have at most 50 comma-delimited elements (including the name of the entry). If either of these limits is exceeded, the customize utility exits with an appropriate error message.
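These limits can be checked before running the customize utility. The following Python sketch is one hedged way to do so; check_limits and the message wording are hypothetical, and treating comment and blank lines as not counting toward the entry limit is an assumption, not documented behavior.

```python
MAX_ENTRIES = 50_000   # maximum entries (lines) per thesaurus file
MAX_ELEMENTS = 50      # maximum elements per entry, including the entry name


def check_limits(lines):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    entry_count = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Assumption: comments and blank lines are not counted as entries.
        if not line or line.startswith("#"):
            continue
        entry_count += 1
        element_count = len(line.split(","))
        if element_count > MAX_ELEMENTS:
            problems.append(
                f"line {number}: {element_count} elements (limit {MAX_ELEMENTS})"
            )
    if entry_count > MAX_ENTRIES:
        problems.append(f"{entry_count} entries (limit {MAX_ENTRIES})")
    return problems


print(check_limits(["couch,sofa[0.9],divan[0.5]", "# comment", "dog,canine"]))
# []
```

An empty list means that neither limit was exceeded under the assumptions above.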