Chapter 2 Developing Voice Applications

This chapter provides information about Sun Java™ System Portal Server Mobile Access voice application development. The chapter contains the following sections:

Understanding Voice Applications

Voice applications are developed similarly to other Sun Java™ System Portal Server applications. The best approach for developing voice applications is to develop them in VoiceXML and then to integrate them with Portal Server software by developing a custom provider. The Notes and Personal Notes voice applications both use this approach.

Voice Application Prerequisites

See, the A Voice Application Example

Building a Voice Application

Voice Application Prerequisites

The pre-requisites for integrating new voice applications with Portal Server Mobile Access software are:

Voice components of Mobile Access software are certified against the Nuance Voice Platform, which includes a VoiceXML 2.0-compliant voice browser. Because of differences between VoiceXML interpreters, the voice components might need to be ported to run on other platforms (see to Porting the Voice Environment and Applications for further information). Portal Server software and voice applications should use the same VoiceXML browser.

Mobile Access software supports VoiceXML applications only. If you have a legacy voice application that was not built using VoiceXML, you must provide a VoiceXML wrapper for that legacy voice application. This wrapper will integrate with Portal Server software and manage the voice session with the non-VoiceXML application.

If you are not familiar with building voice applications using VoiceXML, consult a book on VoiceXML programming. The W3C VoiceXML 2.0 specification is the definitive reference on the programmatic elements of the language.

See, the A Voice Application Example

The best way to start building voice applications is to examine the Notes application included with Portal Server software. The Notes application consists of a provider (NotesProvider) that uses template files for each type of access device (web browser, wireless device, and voice browser). The voice application template files for the NotesProvider are stored in the following directory:

These files contain VoiceXML code in addition to template tags. The tags provide dynamic content when the dialogs are accessed. For example, the content.template file uses [tag:count] to retrieve the number of notes and [tag:note] tag to speak the notes using text-to-speech:

The prompts for this voice application are Microsoft Windows audio (.wav) files, stored in the following directory:

This path is constructed at runtime by concatenating the root directory (/voice), the locale identifier (en_US), a prompts sub-directory (prompts) and a persona (gary). The resulting path /voice/en_US/prompts/gary is relative to the web container directory /opt/SUNWwbsvr/docs.

Finally, each voice application must provide a grammar that allows the application to be selected from the voice desktop channel chooser. The grammar for the Notes application (notes.grammar) is located in the following directory:

This allows users to select the channel by speaking the phrase notes or notes channel.

Building a Voice Application

To create a new voice application, you need to build a Portal Server custom provider, or extend an existing provider. For details on creating a custom provider, see the information on leaf providers in the Portal Server Developer’s Guide.

Most interactive voice applications use dynamically-generated content. For example, a weather application might report the weather for a particular region when you speak the name of a city or postal code. The dynamic content (the weather in this case) is retrieved from a weather service at runtime. For this reason, voice application dialogs are typically generated dynamically using techniques such as JavaServerTM Pages (JSPTM) technology.

Providers could generate the VoiceXML dialogs programmatically, but the simplest approach is to build template files that contain the static dialog code, and use tags that are interpreted at runtime to retrieve the dynamic content.

The following steps describe how to build a provider that implements a voice-enabled weather application:

Develop a dialog design.

Most voice applications consist of a set of dialogs. Each dialog is responsible for one part of user interaction.

You can use a flow chart to represent the design of your voice application. The flow chart should include phrases spoken by users, shown as transitions between the dialogs. Or, you can develop a script where the conversational flow between the voice application and users is listed in chronological order.

Either way, you must handle cases where users say something that was not understood, or the voice application did not receive any input.

Build a prototype of your voice application in VoiceXML.

For dynamic content, begin by simply including static text as a placeholder content. For example, in a weather application, you might always speak the same report using a <prompt> statement:


<prompt>
Here's the weather forecast for Santa Clara, California.Today will be mostly sunny, with a high of 75 and a low of 68 degrees Fahrenheit.
</prompt>

The static content will be replaced with dynamic content once the prototype is complete.

Test your application, exploring all possible dialog interactions.

Integrate the application with the Portal Server software.

Adding a voice application to the Portal Server software involves building a custom provider. The simplest approach is to use template files like the NotesProvider described in the previous section. The template approach allows you to take the VoiceXML dialogs from the prototype and use them directly with your provider.

Identify static placeholder text in your VoiceXML prototype.

Review your VoiceXML prototype and identify the places where you use static content as placeholders for dynamic content. In each case, you must build a custom tag that can generate the appropriate dynamic content at runtime. For example, the weather report might be implemented using a custom weather tag.

Build a custom provider using templates.

See information on leaf providers in the Portal Server Developer’s Guide for details on building a custom provider that uses templates. Follow the instructions for creating a new custom provider. Implement support for the custom tags required for dynamic content in the voice application.

Edit the VoiceXML files.

While building the custom provider, you must make some changes to the VoiceXML files:

Replace the static content with tags. For example, the static weather report in Build a prototype of your voice application in VoiceXML. would be replaced with the following, assuming you have implemented support for a weather tag in your provider:


<prompt>
[tag:weather]
</prompt>

Change the file extensions of the VoiceXML files from .vxml to .template. You must modify references to other dialogs in the VoiceXML code.

Move all files to the appropriate directories.

The files comprising your voice application must reside in specific directories. For a discussion of these directories, see File System Directories for Dialogs, Grammars, and Prompts.

Complete the installation of the new provider.

Test the new provider.

Create user accounts with a numeric user name and a numeric PIN, then call into the Portal Server software and enter that information when required.

At the main menu, speak the add a channel command to add the new channel from the list of available channels:

User:

Add a channel.

System:

Sure;

Here's the list of channels you can add:

E-Mail, Calendar, Weather

That's it.

Tell me which one you want to add, or say cancel.

User:

Weather

System:

All right;

Weather has been added.

Would you like to go there?

User:

Yes

System:

OK, Weather.

When you're done, say main menu.

Here's the weather forecast for Santa Clara, California:

Today will be mostly sunny, with a high of 75 and a low of 68 degrees Fahrenheit.

OK, we're back at the portal main menu. What’s next?

File System Directories for Dialogs, Grammars, and Prompts

Portal Server software has specific directories for the various provider components. This section discusses the directories for:

Dialogs

Grammars

Prompts

Dialogs

The dialogs for each voice application are stored in a separate directory that includes the name of the application, the presentation format (vxml) and the name of the voice browser vendor (Nuance).

Grammars

If your application uses external grammar files, they should be stored in the web server’s document root, or in some other well-known location within the Portal Server web application.

To make the application accessible from the voice Portal Desktop, you must create a second grammar file that allows users to select the application. The grammar for the channel must be unique across all of the voice-enabled channels. For consistency, the grammar should allow users to optionally speak the word channel after the name of the channel.

Prompts

The path element gary is the name of the default persona—the person whose voice appears on the recording. If you record new prompts, you should create a new directory for the new persona. This new directory could be named after the person who recorded the prompts.

Use the words of the prompt in lower case as the file name.

Use an underscore character between words of the phrase. For example: word_word.

Truncate the file name at 50 characters, leaving an additional 4 characters for the .wav extension

To use this prompt directory in your voice application, prepend the path /voice/en_US/prompts/cheryl (or the path to your prompts) before the prompt file name.

For example, if your prompts were stored in a prompts/ sub-directory, replace this statement:

with:

For localization convenience, you might want to define some VoiceXML variables in your application's root document, and construct the path from this:


<var name="root" expr="'/voice'" />
<var name="locale" expr="'en_US'" />
<var name="prompts_directory" expr="'prompts'" />
<var name="persona" expr="'cheryl'" />
<var name="promptPath" expr="root + '/' + locale + '/' +
prompts_directory + '/' + persona + '/'" />

Then use the promptPath variable as follows, using expr= instead of src=:

Porting the Voice Environment and Applications

To provide voice functionality with non-Nuance VoiceXML platforms, the Portal Server voice environment and voice applications must be ported.

Each VoiceXML dialog file contains an XML DTD header, and the XML DTD header used by your voice application must be correct. To determine the correct XML DTD header, see the documentation for the voice browser you are using.

Voice browser vendors sometimes use different grammar file formats. These could be proprietary, or based on evolving industry standards. Check to see if the grammar used by your voice browser is compatible with the Nuance format used by the Portal Server software, and modify the grammars if necessary. Some grammars might be inline in VoiceXML dialogs, or in external grammar files.

If you develop VoiceXML applications that use client-side scripting, such as ECMAScript or JavaScript, be aware of interpreter differences between voice platforms.

Some voice browsers might not support the full set of VoiceXML tags. The voice browser documentation usually lists which tags the browser supports. You might need to modify VoiceXML code to remove unsupported tags.

Some VoiceXML tags behave differently between voice browsers. Although the code might execute, the behavior could change subtly. Thoroughly test all dialog states in your application, even if the dialog states appear to execute correctly.

If pre-recorded prompts are not played correctly, you might need to change the encoding from the default 8-bit, 8kHz, mulaw sphere encoded WAV audio file format.

Localizing Voice Applications

Voice applications are more locale-dependent than conventional software applications. Not only must the language that the computer uses to communicate with users be modified for user locales, but the voice interface must also be modified so that interface recognizes the language spoken by users. There might even be differences within the same language based on location. Localizing a voice application therefore requires careful attention to language and regional differences.

Re-recording Voice Prompts

Grammar Translation

Modifying Pre-Recorded Prompts to Match Grammar Changes

Updating Concatenated Phrases

Translating Text-To-Speech Prompts

Re-recording Voice Prompts

Mobile Access software uses a naming scheme for recorded prompts, where prompt file names are based on the words in the prompt with underscores between words.

For long phrases, the file name is truncated at 54 characters with 50 for the file name and 4 for the extension .wav.

Using English phrases for prompt names greatly improves the readability of VoiceXML code for English speaking developers.

To localize an application that contains prompts, you do not need to change the prompt file names. Instead, the prompts can be re-recorded in the new language and saved with the same file name.

in French, but the file name would remain here_are_your_notes.wav. This approach allows the language used by the application to be changed without editing any of the VoiceXML <audio> tags.

Grammar Translation

VoiceXML applications use grammars to define the phrases that users can speak in any given dialog. For example, when logging into the Portal Server software, users are prompted to enter an account number. In this case, the grammar allows users to speak a sequence of numbers. Once logged in, users can access a channel by speaking the channel name, such as email.

In VoiceXML applications, grammars can be included inline in the VoiceXML dialog code, for example:

The other option is to store the grammars in a file and reference that file form the VoiceXML dialog.

Voice applications, such as Personal Notes and Bulletin Board, generally use inline grammars. However, external grammar files can also be used as well.

For external grammar files, localization involves replacing the English words or phrases in the grammar file with their equivalents in the new language. Do not simply replace words with their dictionary equivalents. Instead, choose words or phrases that native language speakers would use in every day conversation.

When several ways of saying the same thing are available, include these alternatives in the grammar. For example, in the previous example, users can exit by saying goodbye, exit, or quit.

For inline grammars, you must identify which files contain inline grammars. Search for files that contain the string <grammar> but do not include a src= parameter (which indicates an external grammar file). Replace the words or phrases as you would in the case of external grammar file, but be careful not to inadvertently modify other parts of the VoiceXML code.

Modifying Pre-Recorded Prompts to Match Grammar Changes

Some voice prompts contain instructions on how to interact with the system, such as to end your session say goodbye. In this case, the application grammar is defined to recognize the word goodbye. When localizing a voice application, you must take care to ensure that when you change a grammar, you also modify any prompts that refer to that grammar. Typically, you will find audio prompts that give instructions to users in <noinput>, <nomatch>, and <help> VoiceXML tags.

Before the recording artist re-records these prompts for the new language, make a note of any grammar changes, and update the localized prompt phrase to match the grammar.

Updating Concatenated Phrases

Sentences in voice applications are frequently constructed from individual phrases and words. For example, the phrase Today is Tuesday April 8th, 2003 might be constructed from the following eight words or phrases: Today is, Tuesday, April, eighth, two, thousand, and three. The VoiceXML code plays these prompts in order.

Localizing the application might require that words be concatenated in a different order. Recording the prompts in the localized language might not result in a correctly structured sentence.

Test the localized application by interacting with the application to detect any instances where the phraseology is incorrect. This is the simplest approach for localization teams who are not familiar with VoiceXML.

Perform a code review, identifying prompt concatenation in the code, and making changes to the prompts as necessary. In some cases you might need to add new prompts to account for significant changes in sentence structure. You might need to go back to the recording studio to record new prompts.

Concatenated phrases might also suffer from cadence issues. Cadence is the way that individual words and phrases flow within a sentence. In some languages, words flow together without pauses. This could require the removal of silence at the beginning or end of recorded prompts, or in some cases, the recording of a single phrase to replace several concatenated words.

Cadence issues are usually discovered during testing and can often be resolved with careful prompt editing. If you edit or re-record a prompt to work well in one concatenation, the prompt might not work correctly if used elsewhere in a different part of the sentence. If you make a change to a prompt in one dialog, check all other cases where that prompt is used to ensure that the change does not adversely affect them.

Sometimes the pronunciation of a word changes depending on the immediately preceding or following words, or if the language has masculine and feminine forms of words, depending on the gender of the object. Review the prompt phrases before recording and make notes to the recording artist where a particular pronunciation is required. If the article in the sentence is only known at run-time, you might need to add VoiceXML code to select the correct pronunciation depending on the gender of the article.

Translating Text-To-Speech Prompts

In addition to pre-recorded prompts, some voice applications use text-to-speech (TTS) prompts. These prompts appear as English text in VoiceXML code within <prompt> statements. For example:

In this example, TTS is used for the word five in the phrase you have five unread messages.

In this example, the digit 5 and the variable num_messages are spoken using TTS. No localization work is required because the TTS engine for the new locale automatically speaks the number in the new language. However, variables can also be assigned values that correspond to English words or phrases that the TTS engine will not translate. In such cases you must identify any place in the VoiceXML code where English language strings are assigned to variables. Look for <assign> tags such as:

You must change any embedded English language words that would be spoken using TTS. The easiest way to identify these prompts is to search for <prompt> tags.

Previous Contents Index Next
Sun Java System Portal Server Mobile Access 6 2005Q1 Developer's Guide