The Thesaurus feature enables you to configure search terms in queries to match words or concepts that are synonymous or similar in meaning.
For example, if you define a thesaurus entry that maps the words automobile and car to each other, a search for automobile can return results for both automobile and car.
The thesaurus supports multi-word equivalences. For example, the Thesaurus might specify that the phrase Mark Twain is interchangeable with the phrase Samuel Clemens. The forms in a single equivalence also need not have the same number of words. For example, you can specify that wine opener is equivalent to corkscrew.
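A sketch of how these two multi-word equivalences might be written in Thesaurus.xml. It assumes the two-way THESAURUS_ENTRY / THESAURUS_FORM syntax used for the france/french entry, and that a phrase is entered as the text of a single form element, as "Red wine" is in the one-way example:

```xml
<THESAURUS>
  <!-- Two-word phrase equivalent to another two-word phrase -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>Mark Twain</THESAURUS_FORM>
    <THESAURUS_FORM>Samuel Clemens</THESAURUS_FORM>
  </THESAURUS_ENTRY>
  <!-- Forms in one entry may differ in word count: two words vs. one -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>wine opener</THESAURUS_FORM>
    <THESAURUS_FORM>corkscrew</THESAURUS_FORM>
  </THESAURUS_ENTRY>
</THESAURUS>
```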
Multi-word equivalences are matched on a phrase basis. For example, if a thesaurus equivalence between wine opener and corkscrew is defined, then a search for corkscrew will match the text stainless steel wine opener, but will not match the text an effective opener for wine casks.
Thesaurus equivalences can be either one-way or two-way:
One-way mapping specifies only one direction of equivalence. That is, one "From" term is mapped to one or more "To" terms, but none of the "To" terms are mapped to the "From" term. Only one "From" term can be specified in a one-way Thesaurus entry.
In most cases, a one-way Thesaurus entry maps a term with a broad range of meanings to one or more terms with specific meanings. For example, a one-way Thesaurus entry can map the general term "Red wine" to a number of different, more specific terms, such as "Merlot", "Shiraz", and "Bordeaux". In such a case, the reverse mappings would not be useful to the customer; someone searching for a bottle of Merlot, for example, is not likely to be interested in red wines other than Merlot. These one-way mappings can be defined in Thesaurus.xml as follows:

<THESAURUS_ENTRY_ONEWAY>
  <THESAURUS_FORM_FROM>Red wine</THESAURUS_FORM_FROM>
  <THESAURUS_FORM_TO>Merlot</THESAURUS_FORM_TO>
  <THESAURUS_FORM_TO>Shiraz</THESAURUS_FORM_TO>
  <THESAURUS_FORM_TO>Bordeaux</THESAURUS_FORM_TO>
</THESAURUS_ENTRY_ONEWAY>
The "To" terms of a one-way Thesaurus entry must be in your catalog, but the "From" search term need not be. Thus, in the preceding example, the search term "Red wine" need not be in your catalog, but the terms "Merlot", "Shiraz", and "Bordeaux" must be in order to appear in the results list.
Two-way (or all-to-all) mapping makes every word or phrase in the entry equivalent to every other one, in both directions. For example, a two-way mapping among stove, range, and oven means that a search for any one of these words will return all results matching any of them (that is, the mapping marks the forms as strictly interchangeable).
When you define a two-way mapping, you do not specify "From" or "To" terms. Instead, you specify two or more terms that are mapped to each other, as follows:
<THESAURUS>
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>france</THESAURUS_FORM>
    <THESAURUS_FORM>french</THESAURUS_FORM>
  </THESAURUS_ENTRY>
</THESAURUS>
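Because a two-way entry takes two or more forms, the stove, range, and oven mapping described earlier could plausibly be written as a single entry with three forms. This is a sketch that reuses only the elements shown above; the lowercase spelling simply follows the france/french example:

```xml
<THESAURUS>
  <!-- All three forms are interchangeable: a search for any one
       returns results for all three -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>stove</THESAURUS_FORM>
    <THESAURUS_FORM>range</THESAURUS_FORM>
    <THESAURUS_FORM>oven</THESAURUS_FORM>
  </THESAURUS_ENTRY>
</THESAURUS>
```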
Unlike the Stemming module, the Thesaurus feature enables you to define multiple equivalences for a single word or phrase. These multiple equivalences are considered independent.
For example, you might define one equivalence between football and NFL, and another between football and soccer. With these two equivalences, a search for NFL will return hits for NFL and hits for football, a search for soccer will return hits for soccer and football, and a search for football will return all of the hits for football, NFL, and soccer. However, searches for NFL will not return hits for soccer (and vice versa).
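In Thesaurus.xml, this independence would come from defining two separate entries rather than one. The sketch below assumes the two-way syntax shown earlier. Placing all three forms in a single THESAURUS_ENTRY would instead mark NFL and soccer as strictly interchangeable, which is exactly what this example avoids:

```xml
<THESAURUS>
  <!-- Equivalence 1: football <-> NFL -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>football</THESAURUS_FORM>
    <THESAURUS_FORM>NFL</THESAURUS_FORM>
  </THESAURUS_ENTRY>
  <!-- Equivalence 2: football <-> soccer (independent of equivalence 1,
       so NFL and soccer are not mapped to each other) -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>football</THESAURUS_FORM>
    <THESAURUS_FORM>soccer</THESAURUS_FORM>
  </THESAURUS_ENTRY>
</THESAURUS>
```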
This non-transitive nature of the thesaurus is useful for defining equivalences containing ambiguous terms such as football. The word football is sometimes used interchangeably with soccer, but in other cases football refers to American football, which is played professionally in the NFL. In other words, the term football is ambiguous.
When you define equivalences for ambiguous terms, you do not want their specific meanings to overlap into one another. People searching for soccer do not want hits for NFL, but they may want at least some of the hits associated with the more general term football.
Thesaurus entries are essentially used to produce alternate forms of the user query, which in turn are used to produce additional query results. As a rule, the MDEX Engine will expand the user query into the maximum possible set of alternate queries based on the available thesaurus entries.
This behavior is particularly important in the presence of overlapping thesaurus forms. For example, suppose that you define an equivalence between red wine and vino rosso, and a second equivalence between wine opener and corkscrew. The query red wine opener might match the thesaurus entries in two different ways: red wine could be mapped to vino rosso based on the first entry; or wine opener could be mapped to corkscrew based on the second entry.
Under the maximal-expansion rule, the MDEX Engine resolves this ambiguity by expanding to every possible query. In other words, it returns hits for all of the following queries: red wine opener, vino rosso opener, and red corkscrew.
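The two overlapping equivalences in this example could be defined as follows. This is a sketch that assumes both are two-way entries, since the text calls them equivalences without stating a direction. The forms overlap on the word wine, which is why the query red wine opener can match either entry, but not both in the same alternate query:

```xml
<THESAURUS>
  <!-- Matches "red wine" in "red wine opener" -> "vino rosso opener" -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>red wine</THESAURUS_FORM>
    <THESAURUS_FORM>vino rosso</THESAURUS_FORM>
  </THESAURUS_ENTRY>
  <!-- Matches "wine opener" in "red wine opener" -> "red corkscrew" -->
  <THESAURUS_ENTRY>
    <THESAURUS_FORM>wine opener</THESAURUS_FORM>
    <THESAURUS_FORM>corkscrew</THESAURUS_FORM>
  </THESAURUS_ENTRY>
</THESAURUS>
```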