You can modify the default stemming dictionaries by running Dgidx with the --stemming-updates flag and specifying an XML file that contains the additions and deletions you want to make. Dgidx processes the file by adding entries to, and deleting entries from, the static stemming dictionary file.
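As a minimal sketch of the invocation described above, the following Python helper assembles a Dgidx argument list that includes the --stemming-updates flag. Only the flag name comes from this topic. The executable name `dgidx`, the placement of the update file directly after the flag, the file name `stemming_updates.xml`, and the pass-through `other_args` are assumptions for illustration; check the Dgidx flag reference for the exact syntax on your installation.

```python
from typing import List, Sequence


def build_dgidx_command(
    updates_file: str,
    other_args: Sequence[str],
    executable: str = "dgidx",  # assumed executable name; adjust for your platform
) -> List[str]:
    """Assemble an argument list that passes a stemming update file to Dgidx.

    The flag --stemming-updates is documented in this topic. Passing the file
    as the next argument is an assumption, as are the remaining arguments,
    which the caller supplies unchanged.
    """
    return [executable, "--stemming-updates", updates_file, *other_args]


# Hypothetical file name and placeholder arguments, for illustration only.
command = build_dgidx_command("stemming_updates.xml", ["<other Dgidx arguments>"])
print(" ".join(command))
```

The helper only builds the list and does not run anything, so you can inspect the command before handing it to a process launcher such as `subprocess.run`.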

The default static stemming dictionary files are stored in Endeca\MDEX\version\conf\stemming (on Windows) and /usr/local/endeca/MDEX/version/conf/stemming (on UNIX).

For most supported languages, the stemming directory contains two types of stemming dictionaries per language. One dictionary (<RFC 3066 Language Code>_word_forms_collection.xml) contains stemming entries that support accented characters for the particular <RFC 3066 Language Code>.

The other dictionary (<RFC 3066 Language Code>-x-folded_word_forms_collection.xml) contains stemming entries in which all accented characters have been folded down to their unaccented equivalents for the particular <RFC 3066 Language Code>. If this dictionary is present, it is the static stemming dictionary that Dgidx uses when you specify --diacritic-folding. For details about how to map accented characters to unaccented characters, refer to the Oracle Commerce Guided Search Internationalization Guide.
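The naming convention for the two dictionary types can be sketched in Python as follows. The function names are hypothetical, the language code `fr` is used purely as an illustration (this topic does not list which languages ship with dictionaries), and `version` is the same placeholder used in the default paths above.

```python
from pathlib import PurePosixPath


def stemming_dictionary_name(language_code: str, folded: bool = False) -> str:
    """Return the dictionary file name for an RFC 3066 language code.

    folded=False gives the dictionary that supports accented characters;
    folded=True gives the "-x-folded" dictionary used with --diacritic-folding.
    """
    suffix = "-x-folded" if folded else ""
    return f"{language_code}{suffix}_word_forms_collection.xml"


def stemming_dictionary_path(
    stemming_dir: str, language_code: str, folded: bool = False
) -> PurePosixPath:
    """Join the stemming directory and the dictionary file name (UNIX-style path)."""
    return PurePosixPath(stemming_dir) / stemming_dictionary_name(language_code, folded)


# "version" is a placeholder for the installed MDEX version directory.
default_dir = "/usr/local/endeca/MDEX/version/conf/stemming"
print(stemming_dictionary_path(default_dir, "fr"))
print(stemming_dictionary_path(default_dir, "fr", folded=True))
```

Because a folded dictionary is not guaranteed to exist for every language, a caller would typically check that the resolved file is present before relying on it.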

