Sun Java System Portal Server Mobile Access 6 2004Q2 Developer's Guide 

Chapter 2
Developing Voice Applications

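This chapter provides information about developing voice applications for Sun Java System Portal Server Mobile Access. The chapter contains the following sections:

  • Understanding Voice Applications
  • Porting the Voice Environment and Applications
  • Localizing Voice Applications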

Understanding Voice Applications

Voice applications are developed similarly to other Sun Java System Portal Server applications. The best approach for developing voice applications is to develop them in VoiceXML and then to integrate them with Portal Server software by developing a custom provider. The Notes and Personal Notes voice applications both use this approach.

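This section discusses the following topics:

  • Voice Application Prerequisites
  • A Voice Application Example
  • Building a Voice Application
  • File System Directories for Dialogs, Grammars, and Prompts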

Voice Application Prerequisites

The prerequisites for integrating new voice applications with Portal Server Mobile Access software are:

A Voice Application Example

The best way to start building voice applications is to examine the Notes application included with Portal Server software. The Notes application consists of a provider (NotesProvider) that uses template files for each type of access device (web browser, wireless device, and voice browser). The voice application template files for the NotesProvider are stored in the following directory:


These files contain VoiceXML code in addition to template tags. The tags provide dynamic content when the dialogs are accessed. For example, the content.template file uses the [tag:count] tag to retrieve the number of notes and the [tag:note] tag to speak the notes using text-to-speech:

<prompt bargein="true"> [tag:note]</prompt>

The prompts for this voice application are Microsoft Windows audio (.wav) files, stored in the following directory:


This path is constructed at runtime by concatenating the root directory (/voice), the locale identifier (en_US), a prompts subdirectory (prompts), and a persona (gary). The resulting path, /voice/en_US/prompts/gary, is relative to the web container directory /opt/SUNWwbsvr/docs.
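
The concatenation described above can be sketched in a few lines of Python. This is only an illustration of how the four path elements combine, not the Portal Server implementation; the function and constant names are hypothetical, and the values come from the paragraph above.

```python
import posixpath

# Values taken from the text above; the names themselves are hypothetical.
DOC_ROOT = "/opt/SUNWwbsvr/docs"   # web container directory
VOICE_ROOT = "/voice"              # root directory for voice content


def prompt_dir(locale, persona):
    """Build the prompt path relative to the web container directory."""
    return posixpath.join(VOICE_ROOT, locale, "prompts", persona)


def prompt_dir_on_disk(locale, persona):
    """Resolve the relative prompt path against the web container directory."""
    relative = prompt_dir(locale, persona).lstrip("/")
    return posixpath.join(DOC_ROOT, relative)


print(prompt_dir("en_US", "gary"))
print(prompt_dir_on_disk("en_US", "gary"))
```

Keeping the locale and the persona as separate parameters mirrors the way the path is described: either element can change without affecting the other.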

Finally, each voice application must provide a grammar that allows the application to be selected from the voice desktop channel chooser. The grammar for the Notes application (notes.grammar) is located in the following directory:


The notes.grammar file contains the following Nuance GSL grammar expression:

Notes [ (notes ?channel) ]

This allows users to select the channel by speaking the phrase notes or notes channel.
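
As a rough sketch, a channel-selection rule of this shape could be generated from the channel name. The helper below is hypothetical and assumes a single-word channel name; it simply reproduces the pattern of the grammar shown above.

```python
def channel_grammar(channel):
    """Return a GSL rule that accepts '<name>' or '<name> channel'.

    Hypothetical helper: only single-word, alphabetic channel names
    are handled.
    """
    word = channel.strip().lower()
    if not word.isalpha():
        raise ValueError("expected a single alphabetic word: %r" % channel)
    rule_name = word.capitalize()
    # In GSL, parentheses group a sequence and '?' marks the next word optional.
    return "%s [ (%s ?channel) ]" % (rule_name, word)


print(channel_grammar("notes"))
```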

Building a Voice Application

To create a new voice application, you need to build a Portal Server custom provider, or extend an existing provider. For details on creating a custom provider, see the information on leaf providers in the Portal Server Developer’s Guide.

Most interactive voice applications use dynamically generated content. For example, a weather application might report the weather for a particular region when you speak the name of a city or postal code. The dynamic content (the weather in this case) is retrieved from a weather service at runtime. For this reason, voice application dialogs are typically generated dynamically using techniques such as JavaServer Pages (JSP) technology.

Providers could generate the VoiceXML dialogs programmatically, but the simplest approach is to build template files that contain the static dialog code, and use tags that are interpreted at runtime to retrieve the dynamic content.
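
To illustrate the idea of tags interpreted at runtime, the following Python sketch substitutes [tag:name] markers in a template string. It is not how Portal Server providers are implemented (those are Java classes); the function name and the handler dictionary are assumptions made for this example only.

```python
import re
from xml.sax.saxutils import escape

# Matches template tags of the form [tag:name].
TAG_PATTERN = re.compile(r"\[tag:(\w+)\]")


def render_template(template, tag_handlers):
    """Replace each [tag:name] with the escaped output of its handler.

    tag_handlers maps a tag name to a zero-argument callable. An unknown
    tag raises KeyError, so missing handlers are noticed early.
    """
    def replace(match):
        handler = tag_handlers[match.group(1)]
        # Escape the dynamic text so the result stays well-formed XML.
        return escape(str(handler()))

    return TAG_PATTERN.sub(replace, template)


template = '<prompt bargein="true"> [tag:note]</prompt>'
print(render_template(template, {"note": lambda: "Buy milk & eggs"}))
```

The static VoiceXML stays in the template untouched, and only the tag handlers need to know where the dynamic content comes from.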

The following steps describe how to build a provider that implements a voice-enabled weather application:

  1. Develop a dialog design.

    Most voice applications consist of a set of dialogs. Each dialog is responsible for one part of the user interaction.

    You can use a flow chart to represent the design of your voice application. The flow chart should include phrases spoken by users, shown as transitions between the dialogs. Alternatively, you can develop a script that lists the conversational flow between the voice application and users in chronological order.

    Either way, you must handle cases where the voice application does not understand what users say or does not receive any input.

  2. Build a prototype of your voice application in VoiceXML.

    For dynamic content, begin by including static text as placeholder content. For example, in a weather application, you might always speak the same report using a <prompt> statement:

    <prompt>
    Here's the weather forecast for Santa Clara, California. Today will be mostly sunny, with a high of 75 and a low of 68 degrees Fahrenheit.
    </prompt>

    The static content will be replaced with dynamic content once the prototype is complete.

  3. Test your application, exploring all possible dialog interactions.

  4. Integrate the application with the Portal Server software.

    Adding a voice application to the Portal Server software involves building a custom provider. The simplest approach is to use template files, as the NotesProvider described in the previous section does. The template approach allows you to take the VoiceXML dialogs from the prototype and use them directly with your provider.

  5. Identify static placeholder text in your VoiceXML prototype.

    Review your VoiceXML prototype and identify the places where you use static content as placeholders for dynamic content. In each case, you must build a custom tag that can generate the appropriate dynamic content at runtime. For example, the weather report might be implemented using a custom weather tag.

  6. Build a custom provider using templates.

    See the information on leaf providers in the Portal Server Developer’s Guide for details on building a custom provider that uses templates. Follow the instructions for creating a new custom provider. Implement support for the custom tags required for dynamic content in the voice application.

  7. Edit the VoiceXML files.

    While building the custom provider, you must make some changes to the VoiceXML files:

    • Replace the static content with tags. For example, the static weather report from step 2 would be replaced with the following, assuming you have implemented support for a weather tag in your provider:

      <prompt>[tag:weather]</prompt>

    • Change the file extensions of the VoiceXML files from .vxml to .template. You must also modify references to other dialogs in the VoiceXML code.

  8. Move all files to the appropriate directories.

    The files comprising your voice application must reside in specific directories. For a discussion of these directories, see File System Directories for Dialogs, Grammars, and Prompts.

  9. Complete the installation of the new provider.

  10. Test the new provider.

    Create user accounts with a numeric user name and a numeric PIN, then call into the Portal Server software and enter that information when required.

    At the main menu, speak the add a channel command to add the new channel from the list of available channels.

File System Directories for Dialogs, Grammars, and Prompts

Portal Server software has specific directories for the various provider components. This section discusses the directories for dialogs, grammars, and prompts.


The dialogs for each voice application are stored in a separate directory that includes the name of the application, the presentation format (vxml) and the name of the voice browser vendor (Nuance).

For a weather application, the dialog files would be stored in:


where weather is the name of the new application.


If your application uses external grammar files, they should be stored in the web server’s document root, or in some other well-known location within the Portal Server web application.

To make the application accessible from the voice Portal Desktop, you must create a second grammar file that allows users to select the application. The grammar for the channel must be unique across all of the voice-enabled channels. For consistency, the grammar should allow users to optionally speak the word channel after the name of the channel.

For example, the following grammar allows weather or weather channel:

Weather [ (weather ?channel) ]

Name this file weather.grammar and store the file in the following directory:



The voice prompts are located in the following directory:


The path element gary is the name of the default persona—the person whose voice appears on the recording. If you record new prompts, you should create a new directory for the new persona. This new directory could be named after the person who recorded the prompts.

For example:


Voice prompt file names use the following naming convention:

Porting the Voice Environment and Applications

To provide voice functionality with non-Nuance VoiceXML platforms, the Portal Server voice environment and voice applications must be ported.

The following issues are commonly encountered while porting:

Localizing Voice Applications

Voice applications are more locale-dependent than conventional software applications. Not only must the language that the computer uses to communicate with users be modified for user locales, but the voice interface must also be modified so that the interface recognizes the language spoken by users. There might even be differences within the same language based on location. Localizing a voice application therefore requires careful attention to language and regional differences.

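Localization of a voice application involves:

  • Re-recording Voice Prompts
  • Grammar Translation
  • Modifying Pre-Recorded Prompts to Match Grammar Changes
  • Updating Concatenated Phrases
  • Translating Text-To-Speech Prompts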

Re-recording Voice Prompts

Mobile Access software uses a naming scheme for recorded prompts, where prompt file names are based on the words in the prompt with underscores between words.

For example, the prompt:

Here are your notes

is named


For long phrases, the file name is truncated to 54 characters: 50 for the base name and 4 for the .wav extension.

The phrase:

Tell me which channel you want to add, or say cancel

would be named:


Using English phrases for prompt names greatly improves the readability of VoiceXML code for English-speaking developers.
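
The naming scheme can be approximated with a short Python function. This is a sketch rather than the tool used to name the shipped prompts: lowercasing and dropping punctuation are assumptions inferred from the name here_are_your_notes.wav, and the function name is hypothetical.

```python
import re

MAX_BASE_LENGTH = 50   # characters kept for the base name
EXTENSION = ".wav"     # 4 more characters, for 54 in total


def prompt_file_name(phrase):
    """Derive a prompt file name from the words of the spoken phrase."""
    # Assumption: names are lowercase and punctuation is dropped.
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    base = "_".join(words)[:MAX_BASE_LENGTH]
    return base + EXTENSION


print(prompt_file_name("Here are your notes"))
long_name = prompt_file_name(
    "Tell me which channel you want to add, or say cancel")
print(len(long_name))
```

For the longer phrase, the joined words exceed 50 characters, so the base name is cut off and the complete file name is exactly 54 characters long.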

To localize an application that contains prompts, you do not need to change the prompt file names. Instead, the prompts can be re-recorded in the new language and saved with the same file name.

The prompt:

Here are your notes

would become:

Voici vos notes

in French, but the file name would remain here_are_your_notes.wav. This approach allows the language used by the application to be changed without editing any of the VoiceXML <audio> tags.
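
Because the localized prompts must reuse the original file names, a simple completeness check is to compare the two sets of names. The sketch below assumes you have already listed the file names of the original and the localized prompt directories; the function name is hypothetical.

```python
def missing_prompts(source_files, localized_files):
    """Return names present in the source set but not yet re-recorded."""
    return sorted(set(source_files) - set(localized_files))


english = ["here_are_your_notes.wav", "you_have.wav", "unread_messages.wav"]
french = ["here_are_your_notes.wav", "you_have.wav"]
print(missing_prompts(english, french))
```

In practice the two lists might come from os.listdir() calls on the respective prompt directories.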

Grammar Translation

VoiceXML applications use grammars to define the phrases that users can speak in any given dialog. For example, when logging into the Portal Server software, users are prompted to enter an account number. In this case, the grammar allows users to speak a sequence of numbers. Once logged in, users can access a channel by speaking the channel name, such as email.

In VoiceXML applications, grammars can be included inline in the VoiceXML dialog code, for example:

<link next="#goodBye">
    <grammar scope="dialog">
        [
            ( goodbye )
            ( exit )
            ( quit )
        ]
    </grammar>
</link>

The other option is to store the grammars in a file and reference that file from the VoiceXML dialog.

For example:

<form id="channelcommand">
    <field name="action" slot="action">
        <grammar src="grammars/overview.grammar#NextAction"/>
    </field>
</form>

In this case the grammar is located in the file grammars/overview.grammar.

Voice applications, such as Personal Notes and Bulletin Board, generally use inline grammars. However, external grammar files can also be used.

For external grammar files, localization involves replacing the English words or phrases in the grammar file with their equivalents in the new language. Do not simply replace words with their dictionary equivalents. Instead, choose words or phrases that native speakers would use in everyday conversation.

When several ways of saying the same thing are available, include these alternatives in the grammar. For example, in the previous example, users can exit by saying goodbye, exit, or quit.

For inline grammars, you must first identify which files contain them. Search for files that contain a <grammar> tag without a src= attribute (a src= attribute indicates an external grammar file). Replace the words or phrases as you would for an external grammar file, but be careful not to inadvertently modify other parts of the VoiceXML code.
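
One way to automate this search is a regular expression that matches opening <grammar> tags lacking a src attribute. The following is a minimal sketch that works on the text of a single file; the function name is hypothetical, and a text-based match like this can be fooled by tags inside comments.

```python
import re

# An opening <grammar ...> tag whose attributes do not include src=.
INLINE_GRAMMAR = re.compile(r"<grammar\b(?![^>]*\bsrc\s*=)[^>]*>")


def inline_grammar_tags(vxml_text):
    """Return the opening tags of all inline grammars found in the text."""
    return INLINE_GRAMMAR.findall(vxml_text)


inline = '<link next="#goodBye"><grammar scope="dialog">[ (goodbye) ]</grammar></link>'
external = '<grammar src="grammars/overview.grammar#NextAction"/>'
print(inline_grammar_tags(inline))
print(inline_grammar_tags(external))
```

Running the function over each dialog file and keeping the files with a non-empty result gives the list of files that need manual translation.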

Modifying Pre-Recorded Prompts to Match Grammar Changes

Some voice prompts contain instructions on how to interact with the system, such as "To end your session, say goodbye." In this case, the application grammar is defined to recognize the word goodbye. When localizing a voice application, you must ensure that when you change a grammar, you also modify any prompts that refer to that grammar. Typically, you will find audio prompts that give instructions to users in the <noinput>, <nomatch>, and <help> VoiceXML tags.

Before the recording artist re-records these prompts for the new language, make a note of any grammar changes, and update the localized prompt phrase to match the grammar.
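
To build the list of prompts that need review, you could scan each dialog for audio files referenced inside these three elements. The sketch below uses Python's standard XML parser and assumes the dialog text is well-formed XML; the sample document and its file names are invented for illustration.

```python
import xml.etree.ElementTree as ET

INSTRUCTION_TAGS = {"noinput", "nomatch", "help"}


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def instruction_prompts(vxml_text):
    """List (element, audio src) pairs found in instruction elements."""
    found = []
    for element in ET.fromstring(vxml_text).iter():
        if local_name(element.tag) not in INSTRUCTION_TAGS:
            continue
        for child in element.iter():
            if local_name(child.tag) == "audio" and child.get("src"):
                found.append((local_name(element.tag), child.get("src")))
    return found


# Hypothetical dialog used only to exercise the function.
SAMPLE = """<vxml version="2.0">
  <form id="main">
    <field name="command">
      <prompt><audio src="main_menu.wav"/></prompt>
      <help><audio src="to_end_your_session_say_goodbye.wav"/></help>
      <nomatch><audio src="sorry_i_did_not_understand.wav"/></nomatch>
    </field>
  </form>
</vxml>"""

for tag, src in instruction_prompts(SAMPLE):
    print(tag, src)
```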

Updating Concatenated Phrases

Sentences in voice applications are frequently constructed from individual phrases and words. For example, the phrase Today is Tuesday April 8th, 2003 might be constructed from the following eight words or phrases: "Today is", "Tuesday", "April", "eighth", "two", "thousand", "and", "three". The VoiceXML code plays these prompts in order.

Localizing the application might require that words be concatenated in a different order. Recording the prompts in the localized language might not result in a correctly structured sentence.
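
The ordering problem can be made concrete with a small sketch. The en_US order below follows the date example above; the fr_FR order, the slot names, and the file names are illustrative assumptions, and the year is collapsed into a single prompt for brevity.

```python
# Order in which the date segments are played for each locale.
SLOT_ORDER = {
    "en_US": ("intro", "weekday", "month", "day", "year"),
    "fr_FR": ("intro", "weekday", "day", "month", "year"),  # assumed order
}

# Hypothetical prompt files; the names stay the same in every locale.
PROMPTS = {
    "intro": "today_is.wav",
    "weekday": "tuesday.wav",
    "month": "april.wav",
    "day": "eighth.wav",
    "year": "two_thousand_and_three.wav",
}


def prompt_sequence(locale, prompts):
    """Return the prompt files in the playback order for the locale."""
    return [prompts[slot] for slot in SLOT_ORDER[locale]]


print(prompt_sequence("en_US", PROMPTS))
print(prompt_sequence("fr_FR", PROMPTS))
```

The same recordings are reused, but the order in which the dialog plays them has to change, which is why re-recording alone is not always sufficient.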

This problem has two solutions:

  1. Test the localized application by interacting with the application to detect any instances where the phraseology is incorrect. This is the simplest approach for localization teams who are not familiar with VoiceXML.
  2. Perform a code review, identifying prompt concatenation in the code, and making changes to the prompts as necessary. In some cases you might need to add new prompts to account for significant changes in sentence structure. You might need to go back to the recording studio to record new prompts.

Concatenated phrases might also suffer from cadence issues. Cadence is the way that individual words and phrases flow within a sentence. In some languages, words flow together without pauses. This could require the removal of silence at the beginning or end of recorded prompts, or in some cases, the recording of a single phrase to replace several concatenated words.

Cadence issues are usually discovered during testing and can often be resolved with careful prompt editing. If you edit or re-record a prompt to work well in one concatenation, the prompt might not work correctly if used elsewhere in a different part of the sentence. If you make a change to a prompt in one dialog, check all other cases where that prompt is used to ensure that the change does not adversely affect them.

Sometimes the pronunciation of a word depends on the words immediately before or after it or, in languages with masculine and feminine word forms, on the gender of the object. Review the prompt phrases before recording and note for the recording artist where a particular pronunciation is required. If the article in the sentence is known only at runtime, you might need to add VoiceXML code to select the correct pronunciation depending on the gender of the article.

Translating Text-To-Speech Prompts

In addition to pre-recorded prompts, some voice applications use text-to-speech (TTS) prompts. These prompts appear as English text in VoiceXML code within <prompt> statements. For example:

<prompt>Please say yes or no</prompt>

TTS prompts can also be used in conjunction with pre-recorded prompts:

<prompt><audio src="you_have.wav"/> 5 <audio src="unread_messages.wav"/> </prompt>

In this example, TTS is used for the word five in the phrase you have five unread messages.

Finally, VoiceXML variables can be used in TTS prompts:

<prompt><value expr="num_messages"/></prompt>

In these examples, the digit 5 and the variable num_messages are spoken using TTS. No localization work is required for them because the TTS engine for the new locale automatically speaks the number in the new language. However, variables can also be assigned values that correspond to English words or phrases, which the TTS engine will not translate. In such cases, you must identify every place in the VoiceXML code where English language strings are assigned to variables. Look for <assign> tags such as:

<assign name="prompt" expr="'OK, got it!'"/>

You must change any embedded English language words that would be spoken using TTS. The easiest way to identify these prompts is to search for <prompt> tags.
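
The search for string literals in <assign> tags can be scripted. The sketch below is one possible approach using a regular expression; it only recognizes expr values that consist of a single quoted string, and the function name is hypothetical.

```python
import re

ASSIGN_LITERAL = re.compile(
    r"""<assign\b[^>]*\bexpr\s*=\s*   # an assign tag with an expr attribute
        "\s*'([^']*)'\s*"             # whose value is one quoted string
    """,
    re.VERBOSE,
)


def assigned_string_literals(vxml_text):
    """Return string literals that are assigned to VoiceXML variables."""
    return ASSIGN_LITERAL.findall(vxml_text)


sample = (
    '<assign name="prompt" expr="\'OK, got it!\'"/>\n'
    '<assign name="num_messages" expr="5"/>\n'
)
print(assigned_string_literals(sample))
```

Numeric expressions such as the second assignment are skipped, since the TTS engine for the new locale speaks them without translation.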


Copyright 2004 Sun Microsystems, Inc. All rights reserved.