International Language Environments Guide

Indic Localization

The phonetic lookup-based input method (Shabdalipi) and the continuous phonetic input method are available for all Indic languages that are supported in the UTF-8 locale. These input methods and the virtual keyboards let you enter Indic text in any CDE application.

The following data flow illustrates the workings of the Indic input process.

Data flow indicating workings of Indic input process

How to Use the Indic Input Methods

Click the input status area to display the input method selection menu.

Select an input method from the menu.

Alternatively, you can press the F6 key to select from among the available input methods.

You can also type the Compose-hi key sequence to select the input method that you used previously.

Press the F5 key to select the Indic script you want to use.
1. For the keyboard-based (Indic INSCRIPT keyboard) input method, use the keyboard images shown in Indic Keyboards.
2. For the phonetic lookup-based input method, type the first English phonetic equivalent character corresponding to the character in the target script.
  
  Select from a list of choices displayed in the lookup window.
3. For the continuous phonetic input method, type in English phonetic equivalents continuously.
  
  The corresponding characters in the target script are displayed in the preedit area. They are committed either when subsequent input makes the preedit text unambiguous or when you commit explicitly. For illustrations of how the continuous phonetic input method maps English tokens to the UTF-8 codepoints of the target script, see the figures in Mapping for the Continuous Phonetic Based Input Method.
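The commit behavior of the continuous phonetic input method can be sketched in Python. This is a minimal illustration, not the actual input method engine: the class name Preedit, the sample token set, and the rule that text is committed as soon as the next keystroke can no longer extend any known token are all assumptions made for this example.

```python
class Preedit:
    """Hypothetical preedit buffer that commits once input is unambiguous."""

    def __init__(self, tokens):
        self.tokens = set(tokens)  # known English tokens (illustrative)
        self.buffer = ""           # uncommitted preedit text

    def _is_prefix(self, text):
        # True if some known token starts with the given text.
        return any(token.startswith(text) for token in self.tokens)

    def feed(self, key):
        """Add one keystroke and return the list of tokens committed by it."""
        committed = []
        # If the new key cannot extend the buffer into any known token,
        # the buffer is no longer ambiguous, so commit it as it stands.
        if self.buffer and not self._is_prefix(self.buffer + key):
            committed.append(self.buffer)
            self.buffer = ""
        self.buffer += key
        return committed

    def commit(self):
        """Explicit commit: return and clear whatever is in the preedit."""
        committed, self.buffer = self.buffer, ""
        return committed


preedit = Preedit(["k", "kh", "a", "aa"])
for key in "khaak":
    print(key, preedit.feed(key), repr(preedit.buffer))
print(preedit.commit())
```

In this sketch, typing k leaves the preedit ambiguous because kh is still possible. Typing a after kh commits kh, because no token in the sample set starts with kha.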

Press Control-spacebar to switch back to English/European input mode.

Alternatively, click in the status area to select the English/European input mode from the input mode selection window.

Indic Keyboards

The following figures show the keyboard layouts that are available for the Indic input method.

The following figure shows the layout of the Bengali keyboard.

The following figure shows the layout of the Devanagari keyboard.

The following figure shows the layout of the Gujarati keyboard.

The following figure shows the layout of the Gurmukhi keyboard.

The following figure shows the layout of the Kannada keyboard.

The following figure shows the layout of the Malayalam keyboard.

The following figure shows the layout of the Tamil keyboard.

The following figure shows the layout of the Telugu keyboard.

Understanding the Mappings

The images in Mapping for the Continuous Phonetic Based Input Method show the mappings between English tokens and their equivalent codepoints in each of the supported target scripts. The CONSONANT category contains the mappings between English tokens and the consonants of the script. The VOWEL category contains the mappings between English tokens and the vowels of the script. The OTHER category contains the mappings for characters that behave as neither consonants nor vowels: their form does not change depending on the surrounding characters.

The keywords CONSONANT, VOWEL, and OTHER also indicate that the characters are part of the Unicode standard. The sections SPECIAL CONSONANT, SPECIAL VOWEL, and SPECIAL OTHER contain characters that in principle behave as consonants, vowels, or others, but that are not officially part of the Unicode standard and are font dependent. These characters are assigned codepoint values in the Unicode Private Use Area. They are supported in Oracle Solaris UTF-8 locales, and the mappings might not work on a different platform.
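Because the SPECIAL characters are platform dependent, it can be useful to check whether text contains them before moving it to another system. The following Python sketch does this under one assumption: that the SPECIAL characters fall in the Basic Multilingual Plane Private Use Area, U+E000 through U+F8FF. The function names are illustrative, and the supplementary private use planes are not checked.

```python
PUA_FIRST = 0xE000  # first codepoint of the BMP Private Use Area
PUA_LAST = 0xF8FF   # last codepoint of the BMP Private Use Area


def is_private_use(char):
    """Return True if char is in the BMP Private Use Area."""
    return PUA_FIRST <= ord(char) <= PUA_LAST


def find_private_use(text):
    """Return the private use codepoints in text, formatted as U+XXXX."""
    return ["U+%04X" % ord(char) for char in text if is_private_use(char)]


# U+0915 is DEVANAGARI LETTER KA; U+E000 is a private use codepoint.
print(find_private_use("\u0915\ue000"))
```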

The mapfiles shown here are not identical to the ones on your system. They have been edited slightly to remove keywords that are not needed for this discussion.

In the VOWEL and SPECIAL VOWEL sections, both an independent form and a dependent form are shown for the same English token. Which form is used depends on the context. See How the Continuous Phonetic Input Method Works.

The Malayalam script contains a special ‘CHILLU’ section, which is actually the SPECIAL OTHER category.

Mapping for the Continuous Phonetic Based Input Method

The following figures show the existing mappings from English to the phonetic equivalent characters in the target Indic scripts. Use these illustrations as a reference until you know all the mappings for the script that you use. Mappings given here are intuitive, so you should be able to input most of the characters without looking up the illustration.

Note –

In these mappings, special characters such as ‘.’ and ‘|’ included as part of the mapping are escaped with a ‘\’ character. If not escaped, the ‘|’ character acts as a separator when more than one token represents the same UTF-8 character.
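The escaping rule described in this note can be sketched in Python. The function name split_tokens and the assumption that the alternative tokens arrive as a single string are illustrative only. The actual mapfile syntax on your system might differ.

```python
def split_tokens(field):
    """Split a token field on unescaped '|' and drop the escaping '\\'."""
    tokens = []
    current = []
    i = 0
    while i < len(field):
        char = field[i]
        if char == "\\" and i + 1 < len(field):
            # Escaped character: keep the next character literally.
            current.append(field[i + 1])
            i += 2
        elif char == "|":
            # Unescaped separator: finish the current token.
            tokens.append("".join(current))
            current = []
            i += 1
        else:
            current.append(char)
            i += 1
    tokens.append("".join(current))
    return tokens


print(split_tokens("k|c"))       # two tokens for the same character
print(split_tokens("\\.|\\|"))   # escaped '.' and escaped '|'
```

In this sketch, an unescaped ‘|’ always ends a token, and an escaped ‘|’ becomes part of the token itself.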

Figure 4–1, Figure 4–2, and Figure 4–3 show the English to Bengali mappings for consonants, vowels, and others.

Figure 4–1 Map for Bengali Consonants

graphical representation of map for Bengali consonants

Figure 4–2 Map for Bengali Vowels

graphical representation of map for Bengali vowels

Figure 4–3 Map for Bengali Others

graphical representation of map for Bengali others

Figure 4–4, Figure 4–5, and Figure 4–6 show the English to Gujarati mappings for consonants, vowels, and others.

Figure 4–4 Map for Gujarati Consonants

graphical representation of map for Gujarati consonants

Figure 4–5 Map for Gujarati Vowels

graphical representation of map for Gujarati vowels

Figure 4–6 Map for Gujarati Others

graphical representation of map for Gujarati others

Figure 4–7, Figure 4–8, and Figure 4–9 show the English to Gurmukhi mappings for consonants, vowels, and others.

Figure 4–7 Map for Gurmukhi Consonants

graphical representation of map for Gurmukhi consonants

Figure 4–8 Map for Gurmukhi Vowels

graphical representation of map for Gurmukhi vowels

Figure 4–9 Map for Gurmukhi Others

graphical representation of map for Gurmukhi others

Figure 4–10, Figure 4–11, and Figure 4–12 show the English to Hindi mappings for consonants, vowels, and others.

Figure 4–10 Map for Hindi Consonants

graphical representation of map for Hindi consonants

Figure 4–11 Map for Hindi Vowels

graphical representation of map for Hindi vowels

Figure 4–12 Map for Hindi Others

graphical representation of map for Hindi others

Figure 4–13, Figure 4–14, and Figure 4–15 show the English to Kannada mappings for consonants, vowels, and others.

Figure 4–13 Map for Kannada Consonants

graphical representation of map for Kannada consonants

Figure 4–14 Map for Kannada Vowels

graphical representation of map for Kannada vowels

Figure 4–15 Map for Kannada Others

graphical representation of map for Kannada others

Figure 4–16, Figure 4–17, and Figure 4–18 show the English to Malayalam mappings for consonants, vowels, and others.

Figure 4–16 Map for Malayalam Consonants

graphical representation of map for Malayalam consonants

Figure 4–17 Map for Malayalam Vowels

graphical representation of map for Malayalam vowels

Figure 4–18 Map for Malayalam Others

graphical representation of map for Malayalam others

Figure 4–19 and Figure 4–20 show the English to Tamil mappings for consonants and vowels.

Figure 4–19 Map for Tamil Consonants

graphical representation of map for Tamil consonants

Figure 4–20 Map for Tamil Vowels

graphical representation of map for Tamil vowels

Figure 4–21, Figure 4–22, and Figure 4–23 show the English to Telugu mappings for consonants, vowels, and others.

Figure 4–21 Map for Telugu Consonants

graphical representation of map for Telugu consonants

Figure 4–22 Map for Telugu Vowels

graphical representation of map for Telugu vowels

Figure 4–23 Map for Telugu Others

graphical representation of map for Telugu others

How the Continuous Phonetic Input Method Works

For each Indic script, a ‘virama’ or equivalent sign combined with a consonant gives the half form (or ready-to-combine form) of the consonant. Whenever a key combination corresponding to a consonant is typed, the consonant + virama form is output, indicating that the character is ready to combine.

On initial input, a consonant assumes its half form. When a vowel follows, the consonant becomes a full syllable or a variation of one.

Two consecutive consonants remain as ready-to-combine half forms. The layout engine can render these half forms as a single combined character, or they can remain as independent forms, which are also syntactically valid in every language.

Any vowel that begins a word or that follows another vowel appears in its independent form. A vowel that immediately follows a consonant assumes its dependent form.

Characters that do not change shapes in any context are called others. These characters are neither consonants nor vowels.

Digits and other punctuation marks that do not form a part of a character are mapped one to one.

Based on these principles, a parser classifies the input into these categories and outputs the language-specific Unicode codepoints. The continuous phonetic input method engine does not handle layout or rendering, which other modules in the system perform.
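The principles in this section can be sketched as a small Python parser for Devanagari. This is an illustration under stated assumptions, not the engine that ships with the system: the token tables below are a small hypothetical subset and might not match the actual mapfiles, the token a is assumed to stand for the inherent vowel (so it removes the virama without adding a sign), and unmapped characters such as digits are passed through unchanged.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Hypothetical token tables; the real mapfiles may differ.
CONSONANTS = {"k": "\u0915", "kh": "\u0916", "g": "\u0917", "t": "\u0924",
              "n": "\u0928", "m": "\u092E", "r": "\u0930", "l": "\u0932"}
# Each vowel token maps to (independent form, dependent form).
VOWELS = {"a": ("\u0905", ""), "aa": ("\u0906", "\u093E"),
          "i": ("\u0907", "\u093F"), "ii": ("\u0908", "\u0940"),
          "u": ("\u0909", "\u0941"), "e": ("\u090F", "\u0947"),
          "o": ("\u0913", "\u094B")}
OTHERS = {"M": "\u0902"}  # anusvara, which does not change shape

MAX_TOKEN = max(len(t) for t in [*CONSONANTS, *VOWELS, *OTHERS])


def transliterate(text):
    """Convert English phonetic tokens to Devanagari codepoints."""
    out = []
    after_consonant = False  # True while the last item in out is a virama
    i = 0
    while i < len(text):
        # Try the longest token first so that 'kh' wins over 'k'.
        for size in range(min(MAX_TOKEN, len(text) - i), 0, -1):
            token = text[i:i + size]
            if token in CONSONANTS:
                # A consonant is output in its ready-to-combine half form.
                out += [CONSONANTS[token], VIRAMA]
                after_consonant = True
                break
            if token in VOWELS:
                independent, dependent = VOWELS[token]
                if after_consonant:
                    out[-1] = dependent  # replace the pending virama
                else:
                    out.append(independent)
                after_consonant = False
                break
            if token in OTHERS:
                out.append(OTHERS[token])
                after_consonant = False
                break
        else:
            # No token matched: pass the character through one to one.
            out.append(text[i])
            after_consonant = False
            size = 1
        i += size
    return "".join(out)


print(transliterate("kamala"))
print(transliterate("kt"))
```

The sketch outputs codepoints only. As the section states, combining half forms into conjuncts and positioning dependent vowel signs is left to the layout and rendering modules.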