Sun Java System Portal Server Mobile Access 7.1 Developer's Guide

Localizing Voice Applications

Voice applications are more locale-dependent than conventional software applications. Not only must the language that the computer uses to communicate with users be adapted to each user locale, but the voice interface must also be modified so that it recognizes the language spoken by users. There might even be differences within the same language based on location. Localizing a voice application therefore requires careful attention to language and regional differences.

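Localization of a voice application involves the following tasks, each described in a section below: re-recording voice prompts, translating grammars, modifying pre-recorded prompts to match grammar changes, updating concatenated phrases, and translating text-to-speech prompts.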

Re-recording Voice Prompts

Mobile Access software uses a naming scheme for recorded prompts, where prompt file names are based on the words in the prompt with underscores between words.

For example, the prompt:

Here are your notes

is named

here_are_your_notes.wav

For long phrases, the file name is truncated to 54 characters: 50 for the base name and 4 for the .wav extension.

The phrase:

Tell me which channel you want to add, or say cancel

would be named:

tell_me_which_channel_you_want_to_add_or_say_cance.wav
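The naming scheme can be sketched as a small helper, which is useful when checking that new prompt files follow the same convention. This is a minimal Python sketch, not part of the Mobile Access software. The function name prompt_file_name is hypothetical, and the lowercasing and punctuation handling are assumptions inferred from the two examples above.

```python
import re

MAX_BASE_LENGTH = 50  # characters kept before the ".wav" extension


def prompt_file_name(phrase):
    """Derive a prompt file name from the words of a prompt phrase.

    Assumption: words are lowercased and punctuation is dropped, as the
    two documented examples suggest. Other cases are not covered by the guide.
    """
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    base = "_".join(words)[:MAX_BASE_LENGTH]
    return base + ".wav"


print(prompt_file_name("Here are your notes"))
print(prompt_file_name("Tell me which channel you want to add, or say cancel"))
```

The second call exercises the truncation rule: the joined words are 51 characters long, so the final letter is dropped.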

Using English phrases for prompt names greatly improves the readability of VoiceXML code for English-speaking developers.

To localize an application that contains prompts, you do not need to change the prompt file names. Instead, the prompts can be re-recorded in the new language and saved with the same file name.

The prompt:

Here are your notes

would become:

Voici vos notes

in French, but the file name would remain here_are_your_notes.wav. This approach allows the language used by the application to be changed without editing any of the VoiceXML <audio> tags.
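Because the localized recordings must reuse the original file names exactly, it can help to check a localized prompt directory against the original one before testing. The following Python sketch assumes that prompts are stored as .wav files in two flat directories. The function name missing_prompts and the directory layout are hypothetical.

```python
from pathlib import Path


def missing_prompts(source_dir, localized_dir):
    """Return the prompt file names present in source_dir but absent
    from localized_dir, sorted alphabetically."""
    source = {p.name for p in Path(source_dir).glob("*.wav")}
    localized = {p.name for p in Path(localized_dir).glob("*.wav")}
    return sorted(source - localized)
```

An empty result means that every original prompt has a localized recording with the same file name, so no <audio> tags need to change.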

Grammar Translation

VoiceXML applications use grammars to define the phrases that users can speak in any given dialog. For example, when logging into the Portal Server 7.1 software, users are prompted to enter an account number. In this case, the grammar allows users to speak a sequence of numbers. Once logged in, users can access a channel by speaking the channel name, such as email.

In VoiceXML applications, grammars can be included inline in the VoiceXML dialog code, for example:

<link next="#goodBye">
    <grammar scope="dialog">
         [
            ( goodbye )
            ( exit )
            ( quit )
         ]
    </grammar>
</link>

The other option is to store the grammars in a file and reference that file from the VoiceXML dialog.

For example:

<form id="channelcommand">
    <field name="action" slot="action">
        <grammar src="grammars/overview.grammar#NextAction"/>
    </field>
</form>

In this case the grammar is located in the file grammars/overview.grammar.

Voice applications, such as Personal Notes and Bulletin Board, generally use inline grammars. However, external grammar files can also be used.

For external grammar files, localization involves replacing the English words or phrases in the grammar file with their equivalents in the new language. Do not simply replace words with their dictionary equivalents. Instead, choose words or phrases that native speakers would use in everyday conversation.

When several ways of saying the same thing are available, include these alternatives in the grammar. For example, in the previous example, users can exit by saying goodbye, exit, or quit.

For inline grammars, you must first identify which files contain them. Search for files that contain a <grammar> opening tag without a src= attribute (a src= attribute indicates an external grammar file). Note that the opening tag can carry other attributes, such as scope. Replace the words or phrases as you would for an external grammar file, but be careful not to inadvertently modify other parts of the VoiceXML code.
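The search for inline grammars can be automated. The following Python sketch reports the line numbers of <grammar> opening tags that have no src attribute. The function name find_inline_grammars is hypothetical, and the regular expression is a simplification that assumes attribute values do not contain the > character.

```python
import re

# Matches a <grammar ...> opening tag and captures its attribute text.
GRAMMAR_TAG = re.compile(r"<grammar\b([^>]*)>")
SRC_ATTRIBUTE = re.compile(r"\bsrc\s*=")


def find_inline_grammars(vxml_text):
    """Return the 1-based line numbers of <grammar> tags without src=."""
    lines = []
    for match in GRAMMAR_TAG.finditer(vxml_text):
        if not SRC_ATTRIBUTE.search(match.group(1)):
            lines.append(vxml_text.count("\n", 0, match.start()) + 1)
    return lines
```

Running this function over each VoiceXML file gives a list of places to review by hand. The translation itself still requires a person who knows the target language.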

Modifying Pre-Recorded Prompts to Match Grammar Changes

Some voice prompts contain instructions on how to interact with the system, such as "To end your session, say goodbye." In this case, the application grammar is defined to recognize the word goodbye.

When localizing a voice application, you must take care to ensure that when you change a grammar, you also modify any prompts that refer to that grammar. Typically, you will find audio prompts that give instructions to users in <noinput>, <nomatch>, and <help> VoiceXML tags.

Before the recording artist re-records these prompts for the new language, make a note of any grammar changes, and update the localized prompt phrase to match the grammar.

Updating Concatenated Phrases

Sentences in voice applications are frequently constructed from individual phrases and words. For example, the phrase "Today is Tuesday April 8th, 2003" might be constructed from the following eight words or phrases: "Today is", "Tuesday", "April", "eighth", "two", "thousand", "and", "three". The VoiceXML code plays these prompts in order.

Localizing the application might require that words be concatenated in a different order, so simply re-recording each prompt in the new language might not produce a correctly structured sentence.

This problem has two solutions:

To Localize the Application:

Test the localized application by interacting with the application to detect any instances where the phraseology is incorrect. This is the simplest approach for localization teams who are not familiar with VoiceXML.

Perform a code review, identifying prompt concatenation in the code, and making changes to the prompts as necessary. In some cases you might need to add new prompts to account for significant changes in sentence structure. You might need to go back to the recording studio to record new prompts.
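To illustrate why concatenation order matters during a code review, the following Python sketch models a spoken date whose parts are played in a locale-specific order. It is only a model of the idea, not Mobile Access code: the table DATE_SLOT_ORDER, the function date_prompt_files, and the prompt file names are all hypothetical. The French ordering reflects the usual weekday, day, month sequence (as in "mardi 8 avril").

```python
# Hypothetical per-locale order of the parts of a spoken date.
DATE_SLOT_ORDER = {
    "en": ["intro", "weekday", "month", "day"],  # Today is / Tuesday / April / eighth
    "fr": ["intro", "weekday", "day", "month"],  # day is spoken before the month
}


def date_prompt_files(locale, slot_files):
    """Return the prompt files in the order they should be played."""
    return [slot_files[slot] for slot in DATE_SLOT_ORDER[locale]]


# File names keep their English form, following the prompt naming scheme.
slot_files = {
    "intro": "today_is.wav",
    "weekday": "tuesday.wav",
    "month": "april.wav",
    "day": "eighth.wav",
}
print(date_prompt_files("fr", slot_files))
```

In the application itself, the equivalent change is made in the VoiceXML code that plays the prompts, by reordering the <audio> elements or selecting an order at run time.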

Concatenated phrases might also suffer from cadence issues. Cadence is the way that individual words and phrases flow within a sentence. In some languages, words flow together without pauses. This could require the removal of silence at the beginning or end of recorded prompts, or in some cases, the recording of a single phrase to replace several concatenated words.

Cadence issues are usually discovered during testing and can often be resolved with careful prompt editing. If you edit or re-record a prompt to work well in one concatenation, the prompt might not work correctly when it is used elsewhere or in a different part of a sentence. If you make a change to a prompt in one dialog, check all other cases where that prompt is used to ensure that the change does not adversely affect them.

Sometimes the pronunciation of a word changes depending on the immediately preceding or following words or, if the language has masculine and feminine forms of words, on the gender of the object. Review the prompt phrases before recording and make notes to the recording artist where a particular pronunciation is required. If the article in the sentence is only known at run time, you might need to add VoiceXML code to select the correct pronunciation depending on the gender of the article.

Translating Text-To-Speech Prompts

In addition to pre-recorded prompts, some voice applications use text-to-speech (TTS) prompts. These prompts appear as English text in VoiceXML code within <prompt> statements. For example:

<prompt>Please say yes or no</prompt>

TTS prompts can also be used in conjunction with pre-recorded prompts:

<prompt><audio src="you_have.wav"/> 5 <audio src="unread_messages.wav"/> </prompt>

In this example, TTS is used for the word "five" in the phrase "you have five unread messages."

Finally, VoiceXML variables can be used in TTS prompts:

<prompt><value expr="num_messages"/></prompt>

In these two examples, the digit 5 and the value of the variable num_messages are spoken using TTS. No localization work is required because the TTS engine for the new locale automatically speaks the number in the new language. However, variables can also be assigned English words or phrases, which the TTS engine does not translate. In such cases, you must identify every place in the VoiceXML code where English-language strings are assigned to variables. Look for <assign> tags such as:

<assign name="prompt" expr="'OK, got it!'"/>

You must change any embedded English-language words that would be spoken using TTS. The easiest way to identify these prompts is to search for <prompt> tags.
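Both searches can be scripted. The following Python sketch lists the literal text found inside <prompt> elements and the string literals assigned through <assign> tags. The function names are hypothetical. The regular expressions are simplifications that assume single-quoted literals without embedded apostrophes and no nested <prompt> elements.

```python
import re

PROMPT_BODY = re.compile(r"<prompt\b[^>]*>(.*?)</prompt>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")
# expr="'some literal'" : a quoted string rather than a variable name
ASSIGN_LITERAL = re.compile(r"""<assign\b[^>]*\bexpr\s*=\s*"\s*'([^']*)'\s*\"""")


def tts_prompt_text(vxml_text):
    """Return the literal text spoken by TTS inside <prompt> elements."""
    results = []
    for body in PROMPT_BODY.findall(vxml_text):
        # Drop child tags such as <audio> and <value>, keep the plain text.
        text = " ".join(ANY_TAG.sub(" ", body).split())
        if text:
            results.append(text)
    return results


def assigned_string_literals(vxml_text):
    """Return string literals assigned to variables in <assign> tags."""
    return ASSIGN_LITERAL.findall(vxml_text)
```

The results are candidates for review rather than a definitive list. For example, a prompt whose only literal text is a digit is reported, even though the TTS engine for the new locale speaks numbers without any translation work.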