Rather than supplement a default stemming dictionary, you may choose to replace it entirely with a custom stemming dictionary.
To replace a default stemming dictionary with a custom stemming dictionary:
Create a custom dictionary file with stemming entries. For the expected XML structure, see the XML schema of any default stemming dictionary stored in:
<install path>\MDEX\<version>\conf\stemming
For example, this simplified English stemming dictionary contains one term and one stemmed variant:
<?xml version="1.0"?>
<WORD_FORMS_COLLECTION>
  <WORD_FORMS>
    <WORD_FORM>car</WORD_FORM>
    <WORD_FORM>cars</WORD_FORM>
  </WORD_FORMS>
</WORD_FORMS_COLLECTION>
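If you maintain the word forms somewhere other than hand-written XML, a dictionary with the same shape as the example can be generated. The following is a minimal Python sketch, not part of the product: the function name build_stemming_dictionary is hypothetical, and allowing more than one WORD_FORMS group per file is an assumption that you should check against the schema of a default dictionary.

```python
import xml.etree.ElementTree as ET


def build_stemming_dictionary(groups):
    """Return a WORD_FORMS_COLLECTION document as a string.

    `groups` is a list of lists. Each inner list holds a term followed by
    its stemmed variants and becomes one WORD_FORMS element.
    (Hypothetical helper; more than one group per file is an assumption.)
    """
    root = ET.Element("WORD_FORMS_COLLECTION")
    for forms in groups:
        word_forms = ET.SubElement(root, "WORD_FORMS")
        for form in forms:
            ET.SubElement(word_forms, "WORD_FORM").text = form
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9 or later
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


# Reproduces the single-group example: one term and one stemmed variant.
xml_text = build_stemming_dictionary([["car", "cars"]])
print(xml_text)
```

Building the document with an XML library rather than string concatenation means that characters such as & or < in a word form are escaped automatically.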
When you have created the custom stemming dictionary, save the XML file with one of the following name formats:
If the dictionary contains unaccented characters and you use the Dgidx flag --diacritic-folding, save the file as:
<RFC 3066 Language Code>-x-folded_word_forms_collection.xml
If the dictionary contains accented characters and you are not using the Dgidx flag --diacritic-folding, save the file as:
<RFC 3066 Language Code>_word_forms_collection.xml
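The two name formats can be expressed as a small rule. The following Python sketch is illustrative only: dictionary_file_name is a hypothetical helper, and it assumes the only input that changes the name is whether Dgidx runs with --diacritic-folding.

```python
def dictionary_file_name(lang_code, diacritic_folding):
    """Return the custom stemming dictionary file name for a language.

    lang_code: RFC 3066 language code, for example "en".
    diacritic_folding: True when Dgidx runs with --diacritic-folding.
    (Hypothetical helper that mirrors the two name formats listed above.)
    """
    if diacritic_folding:
        return f"{lang_code}-x-folded_word_forms_collection.xml"
    return f"{lang_code}_word_forms_collection.xml"


print(dictionary_file_name("en", diacritic_folding=False))
# en_word_forms_collection.xml
print(dictionary_file_name("en", diacritic_folding=True))
# en-x-folded_word_forms_collection.xml
```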
For example, the XML above would be saved as en_word_forms_collection.xml, where en is the ISO 639-1 code for English.
Place the XML file in:
<install path>\MDEX\<version>\conf\stemming\custom
Specify the --lang flag to Dgidx with a <lang id> argument that matches the language code of the custom stemming dictionary file. In the example above that uses an English (en) dictionary, you would specify:
dgidx --lang en
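The last two steps, placing the file and matching the --lang argument to the file name, can be checked with a short script. The following Python sketch makes several assumptions: both function names are hypothetical, the install path C:\ExampleInstall and the version 1.0.0 are placeholders rather than real values, and any other arguments that a real Dgidx invocation needs are left out because this procedure does not list them.

```python
from pathlib import PureWindowsPath

# The two name formats from this procedure. The folded suffix is tested
# first because a folded name also ends with the plain suffix.
SUFFIXES = (
    "-x-folded_word_forms_collection.xml",
    "_word_forms_collection.xml",
)


def custom_dictionary_path(install_path, version, file_name):
    """Build the path of a file inside the conf/stemming/custom directory."""
    return PureWindowsPath(
        install_path, "MDEX", version, "conf", "stemming", "custom", file_name
    )


def dgidx_lang_args(file_name):
    """Derive the dgidx --lang arguments from a dictionary file name."""
    for suffix in SUFFIXES:
        if file_name.endswith(suffix):
            lang_id = file_name[: -len(suffix)]
            return ["dgidx", "--lang", lang_id]
    raise ValueError(f"not a stemming dictionary file name: {file_name}")


# Placeholder install path and version, for illustration only.
target = custom_dictionary_path(
    r"C:\ExampleInstall", "1.0.0", "en_word_forms_collection.xml"
)
print(target)
print(" ".join(dgidx_lang_args(target.name)))
```

Deriving the language code from the file name, rather than typing it twice, keeps the --lang argument and the dictionary file from drifting apart.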