Supported languages

You use a language code to identify a language.

Language codes must be valid RFC 3066 language code identifiers. The supported languages and their language code identifiers are:
  • Arabic — ar
  • Catalan — ca
  • Chinese, simplified — zh_CN
  • Chinese, traditional — zh_TW
  • Croatian — hr
  • Czech — cs
  • Danish — da
  • Dutch — nl
  • English, American — en
  • English, British — en_GB
  • Finnish — fi
  • French — fr
  • German — de
  • Greek — el
  • Hebrew — he
  • Hungarian — hu
  • Italian — it
  • Japanese — ja
  • Korean — ko
  • Norwegian Bokmal — nb
  • Norwegian Nynorsk — nn
  • Persian — fa
  • Polish — pl
  • Portuguese — pt
  • Portuguese, Brazilian — pt_BR
  • Romanian — ro
  • Russian — ru
  • Serbian, Cyrillic — sr_Cyrl
  • Serbian, Latin — sr_Latn
  • Slovak — sk
  • Slovenian — sl
  • Spanish — es
  • Swedish — sv
  • Thai — th
  • Turkish — tr
  • unknown (i.e., none of the above languages) — unknown

Language codes are case-insensitive.

Note that an error is returned if you specify an invalid language code.
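
The two rules above (case-insensitive matching, and an error for an invalid code) can be sketched as a small client-side check. This is only an illustration, not the product's own validation: the helper name validate_language_code and the choice of ValueError are hypothetical, and the sketch matches against the listed codes only, so it does not handle the country locale codes discussed later in this section.

```python
# Codes copied from the supported-languages list; the helper is hypothetical.
SUPPORTED_CODES = {
    "ar", "ca", "zh_CN", "zh_TW", "hr", "cs", "da", "nl", "en", "en_GB",
    "fi", "fr", "de", "el", "he", "hu", "it", "ja", "ko", "nb", "nn", "fa",
    "pl", "pt", "pt_BR", "ro", "ru", "sr_Cyrl", "sr_Latn", "sk", "sl", "es",
    "sv", "th", "tr", "unknown",
}

# Map each lowercased code to the spelling used in the list.
_CANONICAL = {code.lower(): code for code in SUPPORTED_CODES}


def validate_language_code(code: str) -> str:
    """Return the listed spelling of a supported code, ignoring case."""
    try:
        return _CANONICAL[code.strip().lower()]
    except KeyError:
        # Assumption: a ValueError stands in for the error the server returns.
        raise ValueError(f"invalid language code: {code!r}") from None
```

For example, validate_language_code("ZH_cn") returns "zh_CN", while validate_language_code("xx") raises ValueError.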

Language codes let you tell the Dgraph the language of the text in a record search or value search query, so that it can perform language-specific operations correctly.

How country locale codes are treated

A country locale code is a combination of a language code (such as es for Spanish) and a country code (such as MX for Mexico or AR for Argentina). Thus, the es_MX country locale means Mexican Spanish while es_AR is Argentinian Spanish.

If you specify a country locale code for a Language element, the software ignores the country code but accepts the language code part. In other words, a country locale code is mapped to its language code, and only that part is used for tokenizing queries or generating search indexes. For example, specifying es_MX is the same as specifying just es. The exceptions to this rule are the country locale codes that appear in the list of supported languages (such as pt_BR), which are used as listed.
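
The mapping just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the software's implementation: the function name effective_language_code is hypothetical, the result is lowercased because codes are case-insensitive, and the sketch does not check that the language part is itself a supported code.

```python
# Listed codes that contain an underscore; these are kept whole rather than
# reduced to their language part.
LISTED_LOCALE_CODES = {"zh_cn", "zh_tw", "en_gb", "pt_br", "sr_cyrl", "sr_latn"}


def effective_language_code(code: str) -> str:
    """Return the lowercased code that would apply for indexing and querying."""
    normalized = code.strip().lower()
    if normalized in LISTED_LOCALE_CODES:
        # Exception to the rule: the code is itself a supported language.
        return normalized
    # Otherwise drop everything after the first underscore (the country code).
    language, _, _country = normalized.partition("_")
    return language
```

With this sketch, effective_language_code("es_MX") and effective_language_code("es_AR") both return "es", while effective_language_code("pt_BR") returns "pt_br".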

Note, however, that if you create a standard attribute and specify a country locale code in the mdex-property_Language field, the attribute will be tagged with the country locale code, even though the country code will be ignored during indexing and querying.

Language-specific dictionaries and indices

The Dgraph has two spelling correction engines. If the mdex-property_Language property in a PDR is set to en, spelling correction is handled by the English spelling engine (and its English spelling dictionary). If it is set to any other value, spelling correction uses the non-English spelling engine (and its language-specific dictionaries). All dictionaries are generated from the data records in the Dgraph, so the standard attribute PDRs must be tagged with a language code.
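
The engine choice described above amounts to a single comparison on the attribute's language value. The sketch below only restates that rule: the function name and the returned labels are hypothetical, and sending every value other than en (including en_GB) to the non-English engine is a literal reading of "any other value", not confirmed behavior.

```python
def spelling_engine_for(language_value: str) -> str:
    """Pick a spelling engine label from an mdex-property_Language value."""
    # Codes are case-insensitive, so compare on the lowercased value.
    if language_value.strip().lower() == "en":
        return "english"
    # Assumption: every other value selects the non-English engine.
    return "non-english"
```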

All dictionary files are stored in the data domain's index directory.