With the exception of Chinese, Japanese, and Korean (the CJK languages), you can set the default language analysis for each language to either OLT or Latin-1. CJK languages default to OLT analysis and cannot be configured to use Latin-1.
To change the default language analyzer for other languages:
1. For Dutch, English, English (UK), French, German, Italian, Portuguese, and Spanish, which default to Latin-1 analysis:
2. For Arabic, Czech, Danish, Greek, Hungarian, Polish, and Russian, which default to OLT analysis:
   a. Navigate to the MDEX\<version>\conf\stemming\custom directory.
   b. Create a static stemming dictionary named <lang id>_word_forms_collection.xml.
   c. Open the stemming file for your application. For example, Endeca\apps\<app name>\config\pipeline\<app name>.stemming.xml.
   d. In the entry for the language, set USE_STATIC_WORDFORMS="TRUE". This configures the language for Latin-1 analysis.
Note
The configuration for the stemming.xml file was designed to accept only a limited set of languages. These languages must be enabled explicitly in the file to set the language analyzer. Additional languages, including those listed below in Step 3, are automatically configured based on the presence or absence of a custom stemming dictionary.
3. For Catalan, Croatian, Finnish, Hebrew, Persian (Farsi), Portuguese (Brazil), Norwegian (Bokmal and Nynorsk), Romanian, Serbian, Serbian (Latin), Slovak, Slovenian, Swedish, Thai, and Turkish, which default to OLT analysis:
The presence of the static stemming dictionary is sufficient to change the language analyzer to Latin-1.
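The language groups above can be summarized in a small lookup, which may help when scripting a checklist for several languages. This is only a sketch: the function name steps_to_latin1 and the constant names are hypothetical and are not part of any Endeca tool. The language names and step descriptions are taken from the text above.

```python
from typing import List

# Language groups as listed in this section.
CJK = {"Chinese", "Japanese", "Korean"}
LATIN1_DEFAULT = {
    "Dutch", "English", "English (UK)", "French",
    "German", "Italian", "Portuguese", "Spanish",
}
OLT_DEFAULT_IN_STEMMING_XML = {
    "Arabic", "Czech", "Danish", "Greek", "Hungarian", "Polish", "Russian",
}
OLT_DEFAULT_DICTIONARY_ONLY = {
    "Catalan", "Croatian", "Finnish", "Hebrew", "Persian (Farsi)",
    "Portuguese (Brazil)", "Norwegian (Bokmal and Nynorsk)", "Romanian",
    "Serbian", "Serbian (Latin)", "Slovak", "Slovenian", "Swedish",
    "Thai", "Turkish",
}

# Human-readable step descriptions (placeholders kept exactly as in the text).
CREATE_DICTIONARY = (
    "Create <lang id>_word_forms_collection.xml in "
    "MDEX\\<version>\\conf\\stemming\\custom"
)
SET_STATIC_WORDFORMS = (
    'Set USE_STATIC_WORDFORMS="TRUE" for the language in '
    "<app name>.stemming.xml"
)


def steps_to_latin1(language: str) -> List[str]:
    """Return the steps this section lists for switching a language to Latin-1.

    Hypothetical helper: it only mirrors the prose and does not touch any files.
    """
    if language in CJK:
        raise ValueError(f"{language} cannot be configured to use Latin-1")
    if language in LATIN1_DEFAULT:
        return []  # already defaults to Latin-1 analysis
    if language in OLT_DEFAULT_IN_STEMMING_XML:
        return [CREATE_DICTIONARY, SET_STATIC_WORDFORMS]
    if language in OLT_DEFAULT_DICTIONARY_ONLY:
        return [CREATE_DICTIONARY]  # the dictionary's presence is sufficient
    raise KeyError(f"{language} is not listed in this section")
```

For example, steps_to_latin1("Russian") returns both steps, while steps_to_latin1("Thai") returns only the dictionary step.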
Note
The Dgidx and Dgraph load custom dictionaries for all languages configured in the stemming.xml file.
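Because the analyzer choice can depend on whether a static stemming dictionary is present, it may be useful to check for the file before running Dgidx. The following is a minimal sketch, assuming the directory layout and file naming described above. The function names, and any install root, version string, or language ID you pass in, are hypothetical and not part of the product.

```python
from pathlib import Path


def dictionary_name(lang_id: str) -> str:
    """File name of the static stemming dictionary for a language ID."""
    return f"{lang_id}_word_forms_collection.xml"


def custom_dictionary_dir(mdex_root: Path, version: str) -> Path:
    """Directory for custom dictionaries: MDEX\\<version>\\conf\\stemming\\custom."""
    return mdex_root / version / "conf" / "stemming" / "custom"


def has_custom_dictionary(mdex_root: Path, version: str, lang_id: str) -> bool:
    """True if the static stemming dictionary file exists for the language."""
    path = custom_dictionary_dir(mdex_root, version) / dictionary_name(lang_id)
    return path.is_file()
```

Here mdex_root is the path to the MDEX directory on your system, and the language ID must match whatever value your installation uses for <lang id>.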