Configuring constraints for spelling dictionaries

The Oracle Endeca Server selects words for the spelling dictionary based on pre-defined constraints. Modifying these constraints can help improve the performance of spell-corrected searches.

The constraint settings are stored in the Global Configuration Record (GCR).

You can use these configuration settings to tune the kinds of spelling corrections that the Oracle Endeca Server produces. For example, raising the minimum number of word occurrences steers the spelling correction algorithm away from infrequent terms and toward more popular (frequently occurring) terms, which are often more likely to match what users intended to search for.
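To make the effect of these constraints concrete, the following Python sketch models how the three kinds of settings (minimum occurrences, minimum word length, maximum word length) might filter candidate words. It is a hypothetical illustration, not the Dgraph's implementation: the function name `select_spelling_words`, whitespace tokenization, lower-casing, and counting occurrences across all values are assumptions. The default values are the ones listed in step 1 below.

```python
from collections import Counter


def select_spelling_words(values, min_occur, min_length, max_length):
    # Hypothetical model: count every word across all values
    # (assumed: whitespace-separated tokens, compared lower-cased).
    counts = Counter(word.lower() for value in values for word in value.split())
    # Keep only the words that satisfy all three constraints.
    return {
        word
        for word, n in counts.items()
        if n >= min_occur and min_length <= len(word) <= max_length
    }


# Documented defaults for record search and for value search (managed attributes).
RECORD_DEFAULTS = {"min_occur": 4, "min_length": 3, "max_length": 16}
DVAL_DEFAULTS = {"min_occur": 1, "min_length": 3, "max_length": 16}

sample_values = ["Red Wine"] * 4 + ["Merlot", "ox"]

print(sorted(select_spelling_words(sample_values, **RECORD_DEFAULTS)))  # ['red', 'wine']
print(sorted(select_spelling_words(sample_values, **DVAL_DEFAULTS)))    # ['merlot', 'red', 'wine']
```

With the record-search defaults, a word that occurs fewer than four times ("merlot") is left out, while the value-search defaults keep it; a word shorter than three characters ("ox") is excluded under both.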

To configure the settings that the Dgraph process of the Oracle Endeca Server uses to generate spelling dictionary entries:

  1. In the editor of your choice, edit the constraints in the GCR that the Dgraph should use for adding words to the spelling dictionary.

    You can edit the settings separately for the dictionary entries used by record search and by value search. The SpellingRecord settings apply to words in attribute values assigned to records, and the SpellingDVal settings apply to words in managed attribute values. You can specify the following settings in the Global Configuration Record:

    mdex-config_SpellingRecordMinWordOccur (Type: Int; default: 4)
        Specifies the minimum number of times a word must occur in a standard attribute value (record assignment on an attribute) for it to be indexed for spelling correction.

    mdex-config_SpellingRecordMinWordLength (Type: Int; default: 3)
        Specifies the minimum number of characters that a word must contain in a standard attribute value (record assignment on an attribute) for it to be indexed for spelling correction.

    mdex-config_SpellingRecordMaxWordLength (Type: Int; default: 16)
        Specifies the maximum number of characters that a word in a standard attribute value may contain for it to be indexed for spelling correction.

    mdex-config_SpellingDValMinWordOccur (Type: Int; default: 1)
        Specifies the minimum number of times a word must occur in a managed attribute value for it to be indexed for spelling correction.

    mdex-config_SpellingDValMinWordLength (Type: Int; default: 3)
        Specifies the minimum number of characters that a word must contain in a managed attribute value for it to be indexed for spelling correction.

    mdex-config_SpellingDValMaxWordLength (Type: Int; default: 16)
        Specifies the maximum number of characters that a word in a managed attribute value may contain for it to be indexed for spelling correction.
  2. Send the updated GCR to the Oracle Endeca Server, either directly through the Configuration Web Service or through Integrator ETL. For details, see the section on the Configuration Web Service in this guide or, if you are using Integrator ETL, the Oracle Endeca Information Discovery Integrator ETL User's Guide.
  3. For these changes to take effect, run the Endeca Server endeca-cmd update-spelling-dictionaries command on the data domain.
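Before sending the GCR in step 2, it can help to sanity-check the new values. The following Python sketch merges overrides onto the documented defaults and rejects unknown names and obviously inconsistent settings. The helper `build_spelling_settings` is hypothetical, and the rule that every value must be a positive integer is an assumption, not a documented constraint. The sketch only prepares values; it does not call the Configuration Web Service or Integrator ETL.

```python
# Documented defaults for the six GCR spelling constraints.
SPELLING_DEFAULTS = {
    "mdex-config_SpellingRecordMinWordOccur": 4,
    "mdex-config_SpellingRecordMinWordLength": 3,
    "mdex-config_SpellingRecordMaxWordLength": 16,
    "mdex-config_SpellingDValMinWordOccur": 1,
    "mdex-config_SpellingDValMinWordLength": 3,
    "mdex-config_SpellingDValMaxWordLength": 16,
}


def build_spelling_settings(overrides):
    """Hypothetical helper: merge overrides onto the defaults and check them."""
    unknown = set(overrides) - set(SPELLING_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown spelling settings: {sorted(unknown)}")
    settings = {**SPELLING_DEFAULTS, **overrides}
    # Assumed rule: every setting is a positive integer (bool is rejected explicitly).
    for name, value in settings.items():
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    # A minimum word length above the maximum would exclude every word.
    for prefix in ("Record", "DVal"):
        min_len = settings[f"mdex-config_Spelling{prefix}MinWordLength"]
        max_len = settings[f"mdex-config_Spelling{prefix}MaxWordLength"]
        if min_len > max_len:
            raise ValueError(f"{prefix}: minimum length {min_len} exceeds maximum {max_len}")
    return settings


settings = build_spelling_settings({"mdex-config_SpellingRecordMinWordOccur": 10})
print(settings["mdex-config_SpellingRecordMinWordOccur"])  # 10
```

Keeping unchanged settings at their defaults makes it clear which values were deliberately tuned when you later compare the GCR before and after the update.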