The Thesaurus feature enables you to configure search terms in queries to match words or concepts that are synonymous or similar in meaning.

For example, if you define a thesaurus entry that maps the words automobile and car to each other, a search for automobile returns hits for both automobile and car.

The thesaurus also supports multi-word equivalences. For example, a thesaurus entry might specify that the phrase Mark Twain is interchangeable with the phrase Samuel Clemens. The forms in a single equivalence do not need to contain the same number of words: for example, you can specify that wine opener is equivalent to corkscrew.

Multi-word equivalences are matched on a phrase basis: the words of a multi-word form must appear together and in the same order in the text. For example, if you define an equivalence between wine opener and corkscrew, a search for corkscrew matches the text stainless steel wine opener, but does not match the text an effective opener for wine casks.
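The phrase-basis rule can be illustrated with a toy matcher. This is a minimal sketch, not the MDEX Engine's implementation: it assumes simple whitespace tokenization and case-insensitive comparison, and the function names `tokenize`, `contains_phrase`, and `matches` are hypothetical.

```python
# Toy model of phrase-basis matching (hypothetical; not the MDEX Engine's code).
# Assumes text is tokenized on whitespace and compared case-insensitively.


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return text.lower().split()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` occur in `text` contiguously and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    # Slide a window the length of the phrase across the text.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches(text: str, query: str, equivalents: list[str]) -> bool:
    """A text matches if it contains the query or any equivalent form as a phrase."""
    return any(contains_phrase(text, form) for form in [query, *equivalents])


# Equivalence between corkscrew and wine opener, as in the example above.
print(matches("stainless steel wine opener", "corkscrew", ["wine opener"]))         # True
print(matches("an effective opener for wine casks", "corkscrew", ["wine opener"]))  # False
```

The second text contains both wine and opener, but not as the contiguous phrase wine opener, so it does not match.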

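Thesaurus equivalences can be either one-way or two-way. In a two-way equivalence, every form is interchangeable with every other form, so the automobile and car entry above returns hits for both words whichever one is searched for. In a one-way equivalence, a search for the source form also returns hits for the target forms, but a search for a target form does not return hits for the source form.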
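One way to model the difference between one-way and two-way equivalences is a lookup table from each form to the forms that a search for it should also return. In this Python sketch, `ToyThesaurus` and its methods are hypothetical (not an Endeca API), and the one-way entry from red wine to merlot is an invented illustration.

```python
from collections import defaultdict


class ToyThesaurus:
    """Hypothetical in-memory thesaurus; not an Endeca API."""

    def __init__(self) -> None:
        # Maps each form to the extra forms a search for it should return.
        self._expansions: dict[str, set[str]] = defaultdict(set)

    def add_two_way(self, *forms: str) -> None:
        """Make every form interchangeable with every other form."""
        for form in forms:
            self._expansions[form].update(f for f in forms if f != form)

    def add_one_way(self, source: str, *targets: str) -> None:
        """A search for `source` also returns the targets, but not vice versa."""
        self._expansions[source].update(targets)

    def expand(self, term: str) -> set[str]:
        """Return the term plus every form it expands to."""
        return {term} | self._expansions.get(term, set())


thesaurus = ToyThesaurus()
thesaurus.add_two_way("automobile", "car")
thesaurus.add_one_way("red wine", "merlot")  # hypothetical one-way entry

print(sorted(thesaurus.expand("car")))       # ['automobile', 'car']
print(sorted(thesaurus.expand("red wine")))  # ['merlot', 'red wine']
print(sorted(thesaurus.expand("merlot")))    # ['merlot']
```

In this model, a two-way entry is simply a one-way entry recorded in each direction.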

Unlike the Stemming module, the Thesaurus feature enables you to define multiple equivalences for a single word or phrase. Each of these equivalences is applied independently of the others.

For example, you might define one equivalence between football and NFL, and another between football and soccer. With these two equivalences, a search for NFL returns hits for NFL and for football, a search for soccer returns hits for soccer and for football, and a search for football returns all of the hits for football, NFL, and soccer. However, a search for NFL does not return hits for soccer, and vice versa.
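The non-transitive behavior can be sketched by expanding a term only through the equivalences that directly contain it, without expanding the result again. This is a hypothetical Python model, not the MDEX Engine's code; `expand_transitively` is included only to show what chaining would produce if equivalences were transitive.

```python
# Hypothetical model: each equivalence is an independent group of interchangeable forms.
EQUIVALENCES = [
    {"football", "NFL"},
    {"football", "soccer"},
]


def expand(term: str, equivalences: list[set[str]]) -> set[str]:
    """Return the term plus the forms of every equivalence that contains it.

    The result is not expanded again, so equivalences never chain together.
    """
    result = {term}
    for group in equivalences:
        if term in group:
            result |= group
    return result


def expand_transitively(term: str, equivalences: list[set[str]]) -> set[str]:
    """For contrast only: keep following equivalences until nothing new appears."""
    result = {term}
    frontier = [term]
    while frontier:
        for form in expand(frontier.pop(), equivalences):
            if form not in result:
                result.add(form)
                frontier.append(form)
    return result


print(sorted(expand("NFL", EQUIVALENCES)))               # ['NFL', 'football']
print(sorted(expand("soccer", EQUIVALENCES)))            # ['football', 'soccer']
print(sorted(expand("football", EQUIVALENCES)))          # ['NFL', 'football', 'soccer']
print(sorted(expand_transitively("NFL", EQUIVALENCES)))  # ['NFL', 'football', 'soccer']
```

The last line shows the overlap that the non-transitive design avoids: chaining NFL to football and then football to soccer would return soccer hits for an NFL search.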

This non-transitive nature of the thesaurus is useful for defining equivalences containing ambiguous terms such as football. The word football is sometimes used interchangeably with soccer, but in other cases football refers to American football, which is played professionally in the NFL. In other words, the term football is ambiguous.

When you define equivalences for ambiguous terms, you do not want their specific meanings to overlap into one another. People searching for soccer do not want hits for NFL, but they may want at least some of the hits associated with the more general term football.

In effect, thesaurus entries produce alternate forms of the user query, and each alternate form contributes additional query results. As a rule, the MDEX Engine expands the user query into the largest possible set of alternate queries that the available thesaurus entries allow.

This behavior is particularly important in the presence of overlapping thesaurus forms. For example, suppose that you define an equivalence between red wine and vino rosso, and a second equivalence between wine opener and corkscrew. The query red wine opener might match the thesaurus entries in two different ways: red wine could be mapped to vino rosso based on the first entry; or wine opener could be mapped to corkscrew based on the second entry.

The maximal-expansion rule resolves this ambiguity by expanding to all possible queries. In other words, the MDEX Engine returns hits for all of the queries: red wine opener, vino rosso opener, and red corkscrew.
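Maximal expansion amounts to enumerating every way of substituting non-overlapping thesaurus forms in the query. The following Python sketch does this with a recursion over word positions. It assumes two-way entries, whitespace tokenization, and lowercase matching; `alternate_queries` is a hypothetical name, not part of the MDEX Engine.

```python
from functools import lru_cache

# Hypothetical two-way equivalences from the example above.
EQUIVALENCES = [
    ("red wine", "vino rosso"),
    ("wine opener", "corkscrew"),
]


def alternate_queries(query: str, equivalences: list[tuple[str, ...]]) -> list[str]:
    """Enumerate every query produced by substituting non-overlapping thesaurus forms."""
    tokens = tuple(query.lower().split())

    # Map each form (as a word tuple) to the other forms in its equivalence.
    alternatives: dict[tuple[str, ...], set[tuple[str, ...]]] = {}
    for group in equivalences:
        forms = [tuple(f.lower().split()) for f in group]
        for form in forms:
            alternatives.setdefault(form, set()).update(f for f in forms if f != form)

    @lru_cache(maxsize=None)
    def rewrite(i: int) -> tuple[tuple[str, ...], ...]:
        """All rewritings of tokens[i:]."""
        if i == len(tokens):
            return ((),)
        # Option 1: keep the word at position i unchanged.
        results = [(tokens[i],) + rest for rest in rewrite(i + 1)]
        # Option 2: replace a thesaurus form that starts at position i.
        for form, replacements in alternatives.items():
            end = i + len(form)
            if tokens[i:end] == form:
                for replacement in sorted(replacements):
                    results.extend(replacement + rest for rest in rewrite(end))
        return tuple(results)

    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(" ".join(words) for words in rewrite(0)))


print(alternate_queries("red wine opener", EQUIVALENCES))
# ['red wine opener', 'red corkscrew', 'vino rosso opener']
```

Because the two entries share the word wine, no alternate query can apply both substitutions at once, which is why this sketch yields exactly the three queries listed above.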

