Voice

The SDKs for the Oracle Android, Oracle iOS, and Oracle Web channels have been integrated with speech recognition to allow users to talk directly to skills and digital assistants and get the appropriate responses.

When speech recognition is enabled, a microphone button replaces the send button whenever the user input field is empty. Users tap this button to begin recording their voices. The speech is sent to the speech server for recognition, converted to text, and then sent to the skill. If the speech is only partly recognized, then the partial result is displayed in the user input field, allowing the user to clean it up before sending it to the skill.

See General Feature Support by Language for a list of the languages that are supported for voice.

Enable Voice for the Oracle Android Channel

To enable the microphone in chat view:
  • Create the Oracle Android Channel and enable it.
  • Set the enableSpeechRecognition feature flag to true. Speech Recognition describes this and other voice-related properties and methods.

Enable Voice for the Oracle Web Channel

To enable the microphone for the chat widget that renders in a web page:
  • Configure the Oracle Web Channel and enable it.
  • Set the enableSpeech configuration property to true. Voice Recognition describes this and other voice-related properties and methods.
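For the Web channel, the flag is set in the settings object used to initialize the chat widget. A minimal sketch follows; the instance host, channel ID, and the initializer name shown in the comment are placeholders and assumptions, and only the enableSpeech property comes from this section:

```javascript
// Settings object for the Oracle Web chat widget.
// URI and channelId below are placeholder values.
const chatWidgetSettings = {
  URI: 'oda-instance.example.com', // hypothetical ODA instance host
  channelId: '<your-channel-id>',  // hypothetical channel ID
  enableSpeech: true               // shows the microphone button when the input field is empty
};

// Once the SDK script has loaded, the settings object is passed to the
// SDK's initializer (for example, something like new WebSDK(chatWidgetSettings)).
```

With enableSpeech set to true, the send button is replaced by a microphone button whenever the user input field is empty, as described above.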

Enable Voice for the Oracle iOS Channel

To enable the microphone in the iOS chat view:
  • Configure the Oracle iOS Channel.
  • Set the enableSpeechRecognition feature flag to true. Speech Recognition describes this and other voice-recognition properties and methods.

Improve ASR with Enhanced Speech

If your skill's training data contains many application- or skill-specific words or phrases, jargon, proper nouns, or words with unusual spellings or pronunciations, then you can increase the likelihood that they are recognized and transcribed correctly by using an enhanced speech model.
Note

You can only use enhanced speech with English-language skills (with training data in English) that are intended for an English-speaking audience.
To build an enhanced speech model:
  1. Select Enable Enhanced Speech in Settings.
  2. Retrain the skill.
  3. Route an Oracle Web, iOS, or Android client channel to the skill.

    Tip:

    Enhanced speech models are only available for skills developed with Version 20.12 or later. If you want to use enhanced speech models, then you must upgrade the skill to Version 20.12 or later.

When you select this option, the speech recognition system builds an enhanced speech model that's based on the skill's intent and entity data: utterances, entity values, synonyms for both custom and dynamic entity values, and system entities that have been associated with intents. The enhanced speech model is updated each time you retrain your skill, which in the current release includes retraining triggered by a finalized push request from the Dynamic Entity API.

When users issue a speech request through the Oracle Web, iOS, or Android client channels, the speech runtime dynamically pulls in the custom language model for the skill that's routed to the channel. If the channel points to a digital assistant, the runtime pulls the custom language model for each skill that has Enable Enhanced Speech enabled. You can toggle this setting on and off for the individual skills that are registered to a digital assistant.