The Dgraph uses a language code to identify a language for a specific attribute.
Afrikaans: af | Danish: da | Indonesian: id | Norwegian Bokmal: nb | Spanish, Latin American: es_lam
Albanian: sq | Divehi: dv | Italian: it | Norwegian Nynorsk: nn | Spanish, Mexican: es_mx
Amharic: am | Dutch: nl | Japanese: ja | Oriya: or | Swahili: sw
Arabic: ar | English, American: en | Kannada: kn | Persian: fa | Swedish: sv
Armenian: hy | English, British: en_GB | Kazakh, Cyrillic: kk | Persian, Dari: prs | Tagalog: tl
Assamese: as | Estonian: et | Khmer: km | Polish: pl | Tamil: ta
Azerbaijani: az | Finnish: fi | Korean: ko | Portuguese: pt | Telugu: te
Bangla: bn | French: fr | Kyrgyz: ky | Portuguese, Brazilian: pt_BR | Thai: th
Basque: eu | French, Canadian: fr_ca | Lao: lo | Punjabi: pa | Turkish: tr
Belarusian: be | Galician: gl | Latvian: lv | Romanian: ro | Turkmen: tk
Bosnian: bs | Georgian: ka | Lithuanian: lt | Russian: ru | Ukrainian: uk
Bulgarian: bg | German: de | Macedonian: mk | Serbian, Cyrillic: sr_Cyrl | Urdu: ur
Catalan: ca | Greek: el | Malay: ms | Serbian, Latin: sr_Latn | Uzbek, Cyrillic: uz
Chinese, simplified: zh_CN | Gujarati: gu | Malayalam: ml | Sinhala: si | Uzbek, Latin: uz_latin
Chinese, traditional: zh_TW | Hebrew: he | Maltese: mt | Slovak: sk | Valencian: vc
Croatian: hr | Hungarian: hu | Marathi: mr | Slovenian: sl | Vietnamese: vn
Czech: cs | Icelandic: is | Nepali: ne | Spanish: es | unknown (i.e., none of the above languages): unknown
The language codes are case insensitive.
Note that an error is returned if you specify an invalid language code.
With the language codes, you can specify the language of the text to the Dgraph during a record search or value search query, so that it can correctly perform language-specific operations.
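The case-insensitivity and invalid-code rules above can be illustrated with a small client-side check. This is only a sketch: the function name normalize_language_code and the abbreviated code list are hypothetical, it does not call the Dgraph, and it does not cover the country locale handling described in the next section.

```python
# Illustrative subset of the codes in the table above; a real check
# would carry the full list.
SUPPORTED_CODES = ["af", "da", "de", "en", "en_GB", "es", "fr", "fr_ca",
                   "pt", "pt_BR", "zh_CN", "zh_TW", "unknown"]

# Index the codes by their lowercase form so lookups ignore case.
_BY_LOWERCASE = {code.lower(): code for code in SUPPORTED_CODES}


def normalize_language_code(code: str) -> str:
    """Return the table spelling of a code, or raise ValueError if unknown.

    Hypothetical helper: it mirrors the documented behavior (codes are
    case insensitive, invalid codes are an error) on the client side.
    """
    key = code.strip().lower()
    if key not in _BY_LOWERCASE:
        raise ValueError(f"invalid language code: {code!r}")
    return _BY_LOWERCASE[key]
```

With this sketch, normalize_language_code("EN_gb") returns en_GB, while a code that is not in the list raises ValueError before any query is sent.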
How country locale codes are treated
A country locale code is a combination of a language code (such as es for Spanish) and a country code (such as MX for Mexico or AR for Argentina). Thus, the es_MX country locale means Mexican Spanish while es_AR is Argentinian Spanish.
If you specify a country locale code for a Language element, the software ignores the country code but accepts the language code part. In other words, a country locale code is mapped to its language code and only that part is used for tokenizing queries or generating search indexes. For example, specifying es_AR is the same as specifying just es. The exceptions to this rule are the codes listed above (such as pt_BR).
Note, however, that if you create a Dgraph attribute and specify a country locale code in the Language field, the attribute will be tagged with the country locale code, even though the country code will be ignored during indexing and querying.
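The mapping described in this section can be sketched as follows. The names LISTED_LOCALE_CODES, effective_language, and describe_attribute are hypothetical, and the set of listed locale-style codes is read off the table above; the sketch models the documented behavior and is not the Dgraph's own implementation.

```python
# Locale-style codes that appear in the table above and are therefore
# assumed to be kept whole rather than reduced to their language part.
LISTED_LOCALE_CODES = {"en_gb", "es_lam", "es_mx", "fr_ca", "pt_br",
                       "sr_cyrl", "sr_latn", "uz_latin", "zh_cn", "zh_tw"}


def effective_language(code: str) -> str:
    """Return the (lowercased) code assumed to drive tokenizing and indexing."""
    key = code.strip().lower()
    if key in LISTED_LOCALE_CODES:
        return key  # listed exception: used as-is
    language, _, _country = key.partition("_")
    return language  # country part is ignored


def describe_attribute(code: str) -> dict:
    """Show the tag stored on the attribute next to the language actually used."""
    return {"tagged": code, "effective": effective_language(code)}
```

For example, describe_attribute("es_AR") yields a tag of es_AR with an effective language of es, while pt_BR stays pt_br because it is one of the listed codes.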
Language-specific dictionaries and Dgraph database
If the Language property in an attribute is set to en, then spelling correction is handled through the English spelling engine (and its English spelling dictionary).
If the Language property is set to any other value, then spelling correction uses the non-English spelling engine (and its language-specific dictionaries).
All dictionaries are generated from the data records in the Dgraph, and therefore require that the attribute definitions be tagged with a language code.
A data set's dictionary files are stored in the Dgraph database directory for that data set.
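The engine selection rule in this section reduces to a single comparison, sketched below. The function name and the returned labels are hypothetical. The sketch takes the text literally, so any value other than en (including en_GB, whose handling is not stated here) selects the non-English engine.

```python
def spelling_engine(language: str) -> str:
    """Return a label for the spelling engine implied by a Language value.

    Hypothetical helper: "english" and "non-english" are illustrative
    labels, not identifiers used by the Dgraph.
    """
    # Language codes are case insensitive, so compare in lowercase.
    if language.strip().lower() == "en":
        return "english"
    return "non-english"
```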
Specifying a language for a data set
The defaultLanguage property in the edp.properties configuration file sets the language.
Note that you cannot set languages on a per-attribute basis.
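As an illustration, the following sketch reads the defaultLanguage value from properties-style text. It assumes that edp.properties uses plain key=value lines with # or ! comments, as Java properties files commonly do; the sample content and the function name read_default_language are hypothetical.

```python
from typing import Optional


def read_default_language(properties_text: str) -> Optional[str]:
    """Return the defaultLanguage value, or None if the key is absent."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "defaultLanguage":
            return value.strip()
    return None


# Hypothetical excerpt; a real edp.properties file contains other settings.
sample = """
# language applied to the whole data set
defaultLanguage = en
"""
print(read_default_language(sample))
```

Because the language applies to the whole data set, a single key is enough; no per-attribute lookup is attempted.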