The MDEX Engine provides several advanced tuning options that let you adjust the performance and behavior of the Spelling Correction feature.

To understand these tuning parameters, you first need to understand the internal process that the MDEX Engine uses to generate spelling suggestions.

At a high level, the spelling engine performs the following steps to generate alternate spelling suggestions for a given query:

  1. If the user’s query returns more than a threshold number of hits, do not generate suggestions. This threshold is the hthresh parameter.

  2. For each word in the user’s search query, compute the N words in the data set whose spelling is most similar to it (a separate set of N words is computed for each query term). The value of N is set internally and is not user-configurable.

  3. For each word in the user’s search query, pick the M most likely replacement words (where M <= N) from the N most similar words determined in step 2. The choice is based on a scoring process that combines factors such as spelling similarity and word frequency (number of hits). This narrows the set of possible spelling replacements for each query word to M. The value of M is set internally and is not user-configurable.

  4. Consider combinations of these replacements for the user’s query words, keeping only combinations that increase the number of hits by more than a threshold percentage relative to the user’s original query, without reducing the number of query terms matched. This gain threshold percentage is set internally and is not user-configurable.

  5. Score each such alternate query using a combination of factors, such as the spelling similarity of the words used and the number of hits the query generates, then select the K best queries and use them as suggestions. K (the maximum number of replacement queries to generate) is the nsug parameter.

  6. Finally, use the word-break analysis feature to consider alternate queries computed by changing the word divisions in the user’s query. Using the same scoring technique and limits on suggested queries described in steps 4 and 5, include alternate word-break queries in the final suggestion set.
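
Steps 1 through 5 above can be sketched roughly as follows. This is an illustrative toy model, not the MDEX Engine's implementation: the function names, the similarity measure (Python's difflib ratio), the scoring formulas, and the default values chosen for N, M, and the gain threshold are all assumptions made for the example. Only the names hthresh and nsug come from the description above. The requirement in step 4 about not reducing the number of matched terms is simplified to "a hit must contain every query term", and the word-break analysis in step 6 is left out.

```python
from difflib import SequenceMatcher
from itertools import product
from math import prod


def count_hits(terms, docs):
    """Number of documents that contain every term (toy stand-in for a search)."""
    return sum(1 for doc in docs if all(t in doc for t in terms))


def similarity(a, b):
    """Assumed spelling-similarity measure; the real engine's measure is not documented here."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(query, docs, hthresh=1, nsug=1, n=5, m=2, gain=0.5):
    """Return up to nsug alternate queries for a list of query terms.

    docs is a list of sets of words. n, m and gain are hypothetical values
    for the internally fixed N, M and gain threshold.
    """
    base_hits = count_hits(query, docs)

    # Step 1: a query with more than hthresh hits gets no suggestions.
    if base_hits > hthresh:
        return []

    # Word frequency = number of documents containing the word.
    freq = {}
    for doc in docs:
        for word in doc:
            freq[word] = freq.get(word, 0) + 1

    per_term = []
    for term in query:
        # Step 2: the N words most similar in spelling to this term.
        nearest = sorted(freq, key=lambda w: similarity(term, w), reverse=True)[:n]
        # Step 3: the M most likely replacements, by similarity times frequency.
        likely = sorted(nearest, key=lambda w: similarity(term, w) * freq[w],
                        reverse=True)[:m]
        per_term.append(likely)

    # Step 4: keep combinations whose hit count beats the original by the gain threshold.
    scored = []
    for combo in product(*per_term):
        if list(combo) == list(query):
            continue
        combo_hits = count_hits(combo, docs)
        if combo_hits > 0 and combo_hits > base_hits * (1 + gain):
            sim = prod(similarity(t, w) for t, w in zip(query, combo))
            scored.append((combo_hits * sim, " ".join(combo)))

    # Step 5: the nsug best-scoring alternate queries become the suggestions.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:nsug]]


docs = [{"red", "wine"}, {"red", "wine", "glass"}, {"white", "wine"}]
print(suggest(["red", "wnie"], docs))
```

In this toy model, raising hthresh makes the engine attempt correction for queries that already return some results, and raising nsug lets more alternate queries through; the other limits stay fixed, mirroring the parameters described above as internal.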

