Logic used for spelling correction

At a high level, the spelling engine in Oracle Endeca Server performs the following steps to correct the spelling of a given search query.

  1. If the search terms generate more than a certain number of hits without any correction, then the spelling engine does not generate any corrections or suggestions.

    For automatic correction, the hit threshold is 1. For the Did You Mean feature, the hit threshold is 20.

  2. For each term in the query, the spelling engine finds the 32 corrections with the lowest spelling scores. A low spelling score signifies that the correction is similar to the search term.

    For the Aspell mode that the Dgraph process uses for English, the spelling score is based on phonetic distance. The 32 corrections are then pruned to those with a spelling score below a certain threshold. For automatic correction, the spelling threshold is 125; for Did You Mean, it is 175.

  3. The spelling engine tests each correction in place of the original search term that it corrects. Only corrections that increase the number of hits (relative to the original query) without reducing the number of terms matched are eligible to be returned.
  4. The spelling engine selects the best correction based on which of the eligible corrections has the highest number of hits. For record search, this is the number of records matched. For value search, this is the number of records associated with the set of values matched.
    Note: For more information about the difference in the treatment of results between record search and value search, see the topic How value search treats number of results.
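
The four steps above can be sketched in Python as follows. This is an illustrative sketch only, not the Dgraph implementation: the names best_correction, Candidate, count_hits, terms_matched, and suggest are hypothetical, and the callbacks stand in for the search index and for the Aspell-based candidate generator. Only the numeric thresholds (1 and 20 hits, 32 candidates, scores of 125 and 175) come from the text. Whether each comparison is strict follows the wording of the steps ("more than", "below") and is otherwise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Values quoted in the steps above.
HIT_THRESHOLD = {"auto": 1, "dym": 20}       # step 1
SCORE_THRESHOLD = {"auto": 125, "dym": 175}  # step 2
MAX_CANDIDATES = 32                          # step 2


@dataclass(frozen=True)
class Candidate:
    term: str   # proposed replacement for one query term
    score: int  # spelling score; lower means closer to the original term


def best_correction(
    query: List[str],
    mode: str,  # "auto" or "dym" (hypothetical labels)
    count_hits: Callable[[List[str]], int],
    terms_matched: Callable[[List[str]], int],
    suggest: Callable[[str], Iterable[Candidate]],
) -> Optional[List[str]]:
    """Return the corrected query with the most hits, or None."""
    base_hits = count_hits(query)
    # Step 1: the uncorrected query already has enough hits.
    if base_hits > HIT_THRESHOLD[mode]:
        return None

    base_matched = terms_matched(query)
    best, best_hits = None, base_hits
    for i, term in enumerate(query):
        # Step 2: keep the 32 lowest-scoring candidates, then prune by score.
        nearest = sorted(suggest(term), key=lambda c: c.score)[:MAX_CANDIDATES]
        for cand in nearest:
            if cand.score >= SCORE_THRESHOLD[mode]:
                continue
            # Step 3: try the candidate in place of the original term.
            trial = query[:i] + [cand.term] + query[i + 1:]
            hits = count_hits(trial)
            if hits <= base_hits or terms_matched(trial) < base_matched:
                continue  # not eligible
            # Step 4: prefer the eligible correction with the most hits.
            if hits > best_hits:
                best, best_hits = trial, hits
    return best
```

In this sketch, ties between eligible corrections go to the first one tested; the text does not say how the engine breaks ties.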

To change the Dgraph process configuration for automatic spelling correction and Did You Mean (DYM), rebuild the spelling dictionary with the updateSpellingDictionaries operation of the Data Ingest Web Service.

Suggestions for automatic correction are not exposed by the Oracle Endeca Server; that is, you cannot update the dictionary manually in the installed product.

In the Global Configuration Record, you can configure indexing parameters such as the minimum number of word occurrences and the minimum and maximum word length. These parameters set the boundaries that tell the Dgraph process of the Oracle Endeca Server which words to include in the spelling dictionary.
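
As a rough illustration of how such boundaries might select dictionary words, consider the Python sketch below. The names DictionaryBounds and dictionary_words are hypothetical, the field names are not the actual Global Configuration Record attribute names, and the numeric values in the usage example are arbitrary. Treating the bounds as inclusive is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DictionaryBounds:
    # Hypothetical field names; not the real configuration attributes.
    min_occurrences: int
    min_length: int
    max_length: int


def dictionary_words(tokens: Iterable[str], bounds: DictionaryBounds) -> List[str]:
    """Return the indexed words that fall within the configured bounds."""
    counts = Counter(tokens)
    return sorted(
        word
        for word, n in counts.items()
        if n >= bounds.min_occurrences
        and bounds.min_length <= len(word) <= bounds.max_length
    )


# Example with arbitrary values: rare or out-of-range words are left out.
tokens = ["tv", "tv", "tv", "camera", "camera", "cmaera", "television"]
words = dictionary_words(tokens, DictionaryBounds(2, 3, 8))
```

Raising the minimum occurrence count in this sketch keeps one-off misspellings such as "cmaera" out of the dictionary, so they cannot be offered as corrections.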