This section lists the languages that the MDEX engine can process, as well as the language analyzer or analyzers that can be used with each language.

Languages that were originally supported by the MDEX Engine are configured in the stemming.xml file. Each language that can be configured in stemming.xml has a default language analyzer, which is used when stemming.xml does not specify an analyzer for that language.

For any language configured in stemming.xml, you can select the non-default language analyzer, with one exception: the CJK languages (Chinese, Japanese, and Korean) can be used only with OLT language analysis. To select a non-default language analyzer, specify a value for the USE_STATIC_WORDFORMS attribute in stemming.xml. Before switching to a non-default analyzer, verify that the change does not have unintended consequences.
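The selection rule described above can be summarized in a short Python sketch. The names DEFAULT_ANALYZER, CJK_CODES, and selectable_analyzers are hypothetical helpers for illustration only and are not part of the MDEX Engine. The sketch covers only a few languages from the table in this section, and it assumes that Latin-1 and OLT, the two analyzers named in this section, are the only choices.

```python
# Illustrative subset of the language table: language code -> default analyzer.
# Hypothetical helper data; not part of any MDEX Engine API.
DEFAULT_ANALYZER = {"en": "Latin-1", "fr": "Latin-1", "da": "OLT"}

# The CJK languages can be used only with OLT language analysis.
CJK_CODES = {"zh-CN", "zh-TW", "ja", "ko"}


def selectable_analyzers(code):
    """Return the analyzers selectable for a language configured in stemming.xml.

    The default analyzer is listed first, followed by the non-default one.
    Assumes Latin-1 and OLT are the only two analyzers.
    """
    if code in CJK_CODES:
        return ["OLT"]
    if code not in DEFAULT_ANALYZER:
        raise KeyError(f"{code} is not in this illustrative subset")
    default = DEFAULT_ANALYZER[code]
    other = "OLT" if default == "Latin-1" else "Latin-1"
    return [default, other]
```

For example, selectable_analyzers("en") lists Latin-1 first because it is the default for English, while selectable_analyzers("ja") offers OLT only.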

Languages that were not originally supported by the MDEX Engine can be used only with OLT language analysis and must be specified on the dgidx command line using --lang. For example, to specify Bulgarian as your language, use the following command:

dgidx --lang bg
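As a rough illustration, a build script could decide whether a language must be passed with --lang and assemble the command shown above. The following Python sketch is not part of the product. The names STEMMING_XML_CODES, must_use_lang_flag, and dgidx_command are hypothetical, and the sketch assumes that the languages marked "no" under "Configurable in stemming.xml?" in the table in this section are the ones that were not originally supported. Any other arguments that your deployment passes to dgidx are omitted.

```python
# Language codes marked "yes" under "Configurable in stemming.xml?" in the
# table in this section. Hypothetical helper data, not an MDEX Engine API.
STEMMING_XML_CODES = {
    "ar", "zh-CN", "zh-TW", "cs", "da", "nl", "en", "en-GB", "fr", "de",
    "el", "hu", "it", "ja", "ko", "pl", "pt", "ru", "es",
}


def must_use_lang_flag(code):
    """Return True if the language can only be specified with --lang.

    Assumption: languages not configurable in stemming.xml are the ones
    that were not originally supported by the MDEX Engine.
    """
    return code not in STEMMING_XML_CODES


def dgidx_command(code):
    """Build the argument list for the command form shown above."""
    # Only the --lang flag comes from this section; other dgidx arguments
    # are deliberately left out of this sketch.
    return ["dgidx", "--lang", code]


if __name__ == "__main__":
    print(" ".join(dgidx_command("bg")))
```

Returning an argument list rather than a single string makes it straightforward to pass the result to subprocess.run without shell quoting concerns.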

For detailed information about how the MDEX Engine chooses language analyzers for the various supported languages, see Analyzing and Sorting.

The following table lists the languages supported by the MDEX Engine, as well as the language code and language analyzer for each.

| Language | Language Code | Configurable in stemming.xml? | Default Language Analyzer * | Only Supported Language Analyzer |
|---|---|---|---|---|
| Arabic | ar | yes | OLT | -- |
| Basque | eu | no | -- | OLT |
| Belarusian | be | no | -- | OLT |
| Bosnian | bs | no | -- | OLT |
| Bulgarian | bg | no | -- | OLT |
| Catalan | ca | no | -- | OLT |
| Chinese (Simplified) ** | zh-CN | yes | -- | OLT * |
| Chinese (Traditional) ** | zh-TW | yes | -- | OLT * |
| Croatian | hr | no | -- | OLT |
| Czech | cs | yes | OLT | -- |
| Danish | da | yes | OLT | -- |
| Dutch | nl | yes | Latin-1 | -- |
| English | en | yes | Latin-1 | -- |
| English (United Kingdom) | en-GB | yes | Latin-1 | -- |
| Estonian | et | no | -- | OLT |
| Finnish | fi | no | -- | OLT |
| French | fr | yes | Latin-1 | -- |
| French (Canadian) | fr-CA | no | -- | OLT |
| Galician | gl | no | -- | OLT |
| German | de | yes | Latin-1 | -- |
| Greek | el | yes | OLT | -- |
| Hebrew | he | no | -- | OLT |
| Hungarian | hu | yes | OLT | -- |
| Indonesian | id | no | -- | OLT |
| Italian | it | yes | Latin-1 | -- |
| Japanese ** | ja | yes | -- | OLT * |
| Korean ** | ko | yes | -- | OLT * |
| Latvian | lv | no | -- | OLT |
| Lithuanian | lt | no | -- | OLT |
| Macedonian | mk | no | -- | OLT |
| Malay | ms | no | -- | OLT |
| Norwegian (Bokmål) | nb | no | -- | OLT |
| Norwegian (Nynorsk) | nn | no | -- | OLT |
| Persian (Farsi) | fa | no | -- | OLT |
| Polish | pl | yes | OLT | -- |
| Portuguese | pt | yes | Latin-1 | -- |
| Portuguese (Brazil) | pt-BR | no | -- | OLT |
| Romanian | ro | no | -- | OLT |
| Russian | ru | yes | OLT | -- |
| Serbian | sr | no | -- | OLT |
| Serbian (Latin) | sr-Latn | no | -- | OLT |
| Slovak | sk | no | -- | OLT |
| Slovenian | sl | no | -- | OLT |
| Spanish | es | yes | Latin-1 | -- |
| Swedish | sv | no | -- | OLT |
| Thai | th | no | -- | OLT |
| Turkish | tr | no | -- | OLT |
| Ukrainian | uk | no | -- | OLT |
| Valencian | ca-ES-valencia | no | -- | OLT |
| Vietnamese | vi | no | -- | OLT |

* A default language analyzer for each language that can be configured in stemming.xml is specified in stemming.dtd. Oracle recommends, however, that you specify language analyzers in stemming.xml, rather than rely on the defaults.

** The CJK languages (Chinese, Japanese, and Korean) can be used only with OLT language analysis; they cannot be used with Latin-1.

