This section lists the languages that the MDEX Engine can process, as well as the language analyzer or analyzers that can be used with each language.
The management of the languages originally supported by MDEX can be configured in the stemming.xml file. Every language that can be configured in stemming.xml has a default language analyzer that is used if no analyzer is specified for it in stemming.xml.
You can select a non-default language analyzer for any language configured in stemming.xml except the CJK languages (Chinese, Japanese, and Korean), which can be used only with OLT language analysis. To select a non-default language analyzer, specify a value for the USE_STATIC_WORDFORMS attribute in stemming.xml. Before switching to a non-default analyzer, verify that the change does not have unintended consequences.
Languages that were not originally supported by the MDEX Engine can be used only with OLT language analysis, and must be specified on the dgidx command line using --lang. For example, to specify Bulgarian as your language, use the following command:
dgidx --lang bg
For detailed information about how the MDEX Engine chooses language analyzers for the various supported languages, see Analyzing and Sorting.
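The rules described above can be summarized in a short Python sketch. This is only an illustration of the prose, not the MDEX Engine's actual selection logic: the LanguageSupport class and the helper functions are hypothetical names, and the sketch assumes that "configurable in stemming.xml" is equivalent to "originally supported by the MDEX Engine". Only the --lang flag and the bg code come from this section; any other dgidx arguments are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

# The CJK languages, which the text says work only with OLT analysis.
CJK_LANGUAGES = {
    "Chinese (Simplified)",
    "Chinese (Traditional)",
    "Japanese",
    "Korean",
}


@dataclass(frozen=True)
class LanguageSupport:
    """One row of the language support table (hypothetical structure)."""
    name: str
    code: Optional[str]              # language code, if known
    configurable: bool               # configurable in stemming.xml?
    default_analyzer: Optional[str]  # None where the table shows "--"


def can_select_non_default(lang: LanguageSupport) -> bool:
    """A non-default analyzer can be selected only for non-CJK
    languages that are configurable in stemming.xml."""
    return lang.configurable and lang.name not in CJK_LANGUAGES


def must_use_lang_flag(lang: LanguageSupport) -> bool:
    """Languages that are not configurable in stemming.xml (assumed to
    be those not originally supported) must be passed with --lang."""
    return not lang.configurable


def lang_args(lang: LanguageSupport) -> List[str]:
    """Build only the --lang portion of a dgidx command line."""
    if lang.code is None:
        raise ValueError(f"no language code known for {lang.name}")
    return ["--lang", lang.code]


# Example rows; codes are left as None where this section does not give them.
english = LanguageSupport("English", None, True, "Latin-1")
japanese = LanguageSupport("Japanese", None, True, None)
bulgarian = LanguageSupport("Bulgarian", "bg", False, None)

print(can_select_non_default(english))   # True
print(can_select_non_default(japanese))  # False
print(must_use_lang_flag(bulgarian))     # True
print(lang_args(bulgarian))              # ['--lang', 'bg']
```

Keeping the table rows as data and the rules as small predicates makes it easy to check a deployment's language list against the constraints before indexing.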
The following table lists the languages supported by the MDEX Engine, as well as the language code and language analyzer for each.
Language | Language Code | Configurable in stemming.xml? | Default Language Analyzer * | Only Supported Language Analyzer |
---|---|---|---|---|
Arabic | | yes | OLT | -- |
Basque | | no | -- | OLT |
Belarusian | | no | -- | OLT |
Bosnian | | no | -- | OLT |
Bulgarian | | no | -- | OLT |
Catalan | | no | -- | OLT |
Chinese (Simplified) ** | | yes | -- | OLT * |
Chinese (Traditional) ** | | yes | -- | OLT * |
Croatian | | no | -- | OLT |
Czech | | yes | OLT | -- |
Danish | | yes | OLT | -- |
Dutch | | yes | Latin-1 | -- |
English | | yes | Latin-1 | -- |
English (United Kingdom) | | yes | Latin-1 | -- |
Estonian | | no | -- | OLT |
Finnish | | no | -- | OLT |
French | | yes | Latin-1 | -- |
French (Canadian) | | no | -- | OLT |
Galician | | no | -- | OLT |
German | | yes | Latin-1 | -- |
Greek | | yes | OLT | -- |
Hebrew | | no | -- | OLT |
Hungarian | | yes | OLT | -- |
Indonesian | | no | -- | OLT |
Italian | | yes | Latin-1 | -- |
Japanese ** | | yes | -- | OLT * |
Korean ** | | yes | -- | OLT * |
Latvian | | no | -- | OLT |
Lithuanian | | no | -- | OLT |
Macedonian | | no | -- | OLT |
Malay | | no | -- | OLT |
Norwegian (Bokmål) | | no | -- | OLT |
Norwegian (Nynorsk) | | no | -- | OLT |
Persian (Farsi) | | no | -- | OLT |
Polish | | yes | OLT | -- |
Portuguese | | yes | Latin-1 | -- |
Portuguese (Brazil) | | no | -- | OLT |
Romanian | | no | -- | OLT |
Russian | | yes | OLT | -- |
Serbian | | no | -- | OLT |
Serbian (Latin) | | no | -- | OLT |
Slovak | | no | -- | OLT |
Slovenian | | no | -- | OLT |
Spanish | | yes | Latin-1 | -- |
Swedish | | no | -- | OLT |
Thai | | no | -- | OLT |
Turkish | | no | -- | OLT |
Ukrainian | | no | -- | OLT |
Valencian | | no | -- | OLT |
Vietnamese | vi | no | -- | OLT |
* A default language analyzer for each language that can be configured in