16 Digital Assistant Testing

Here is a set of best practices for testing your digital assistant before (and after) you deploy it to production.

If you think you are done once you have gone through all of the planning, design, and development steps, you are not! Although you have tested all of your skills in isolation, you will need to retest them in the context of a digital assistant.

Note:

If your skills haven’t been thoroughly tested and optimized, there is no point in seriously testing the digital assistant. Before investing in the creation of batch tests for your digital assistant, make sure each skill is in the best shape it can be. A skill that doesn't perform well when tested in isolation will not perform any better when tested with other skills in a digital assistant.

Utterance Testing

In a nutshell, when you add a skill to a digital assistant and train the digital assistant, all of the utterances that were used to train the skill's intents are also used by the digital assistant to train a classifier for that skill.

At runtime, if the routing engine in a digital assistant is confident that a particular skill matches an incoming user message, it flags that skill as a “candidate skill”. If no other skill resolves within a configured confidence range or better, the digital assistant navigates to the identified candidate skill and its matching intent and starts a conversation.
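To make the routing decision concrete, here is a minimal sketch of how a candidate skill could be selected from per-skill confidence scores using a confidence threshold and a win margin. The function and parameter names and the default values are illustrative assumptions; this is not Oracle Digital Assistant's actual routing implementation.

```python
# Illustrative sketch only: hypothetical names and thresholds, not Oracle
# Digital Assistant's routing code. Selects a candidate skill from per-skill
# confidence scores using a confidence threshold and a win margin.

def route(scores, confidence_threshold=0.7, win_margin=0.1):
    """Return the single candidate skill, or None if routing is ambiguous.

    scores: dict mapping skill name -> classifier confidence for the message.
    """
    # Keep only skills that clear the confidence threshold.
    candidates = {skill: s for skill, s in scores.items() if s >= confidence_threshold}
    if not candidates:
        return None  # no candidate skill; the assistant would ask for clarification

    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    best_skill, best_score = ranked[0]

    # If another skill scores within the win margin, the match is ambiguous
    # and the assistant would typically ask the user to choose.
    if len(ranked) > 1 and best_score - ranked[1][1] < win_margin:
        return None
    return best_skill


if __name__ == "__main__":
    print(route({"PizzaSkill": 0.92, "HRSkill": 0.31}))   # PizzaSkill
    print(route({"PizzaSkill": 0.75, "HRSkill": 0.72}))   # None (ambiguous)
```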

So, utterances matter when routing requests in a digital assistant, which means you need to test whether the utterances that successfully resolved to an intent in a skill still resolve to it. Similarly to how you test your skills in isolation, you will run positive tests, negative tests, and neighbour tests on your skills.

The positive and negative tests use the utterances you used to test the intents of a skill. For positive tests, the results should be well above the confidence threshold, though not necessarily at the same confidence as when testing the skill in isolation.

For neighbour testing, use test utterances from other skills in the digital assistant and configure them to resolve to the skill you are testing. Ideally, when you run the test, all of these tests fail, because the utterances are not intended for the skill being tested.

Oracle Digital Assistant supports batch testing of utterances at the digital assistant level, which you can use to implement the tests explained in this part of the document.
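If you prefer to script such tests, the sketch below shows how positive and neighbour test cases could be organized and evaluated. The classify() function is a hypothetical stand-in for however you query the digital assistant (for example, the built-in test suites or a REST call), and the threshold value and sample utterances are assumptions for illustration.

```python
# Minimal sketch of a positive/neighbour utterance batch test.
# classify() is a placeholder, not a real Oracle Digital Assistant API.

def classify(utterance):
    # Placeholder: return (resolved_skill, resolved_intent, confidence).
    raise NotImplementedError("Wire this up to your digital assistant")

CONFIDENCE_THRESHOLD = 0.7  # example value; use your configured threshold

# Positive tests: utterances that must resolve to the skill under test.
positive_cases = [
    ("I want to order a pizza", "PizzaSkill"),
    ("Get me a large pepperoni", "PizzaSkill"),
]

# Neighbour tests: utterances from other skills that must NOT resolve
# to the skill under test.
neighbour_cases = [
    ("How many vacation days do I have left?", "PizzaSkill"),
]

def run():
    failures = []
    for utterance, expected_skill in positive_cases:
        skill, intent, confidence = classify(utterance)
        if skill != expected_skill or confidence < CONFIDENCE_THRESHOLD:
            failures.append(("positive", utterance, skill, confidence))
    for utterance, skill_under_test in neighbour_cases:
        skill, intent, confidence = classify(utterance)
        if skill == skill_under_test and confidence >= CONFIDENCE_THRESHOLD:
            failures.append(("neighbour", utterance, skill, confidence))
    print(f"{len(failures)} failing test case(s)")
    for failure in failures:
        print("  FAIL:", failure)
```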

Conversation Testing

As soon as you are satisfied with the results of the utterance tests, you can start conversation testing. For this, there is a conversation tester that also explains the decision making that led to a specific skill routing.

As with skills, the conversation tester can be used to record test conversations for later replay. By replaying conversations, you can ensure that changes to a skill still result in the same conversation and do not alter its behavior.
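A recorded conversation can also be replayed from a script as an ordered list of user turns and expected bot replies. In the sketch below, send_message() is a hypothetical placeholder for the channel or API you use to talk to the digital assistant, and the sample conversation is invented for illustration.

```python
# Sketch of replaying a recorded conversation and checking that the bot's
# replies have not changed. send_message() is a placeholder, not a real API.

def send_message(user_message):
    # Placeholder: return the bot's reply text for the given user message.
    raise NotImplementedError("Wire this up to your digital assistant")

# A recorded conversation: (user turn, expected fragment of the bot reply).
recorded_conversation = [
    ("I want to order a pizza", "What size"),
    ("Large", "Which topping"),
    ("Pepperoni", "order has been placed"),
]

def replay(conversation):
    for user_turn, expected_fragment in conversation:
        reply = send_message(user_turn)
        assert expected_fragment.lower() in reply.lower(), (
            f"Turn {user_turn!r}: expected {expected_fragment!r}, got {reply!r}"
        )
    print("Conversation replayed without differences")
```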

User Testing of Digital Assistants

Before signing off on a digital assistant, have real users test it. Give them a minimum of instructions and see how they do. You can use Insights to monitor traffic, identify utterances that don't find a matching intent, identify utterances that match the wrong intent, and learn about the rate of successful vs. unsuccessful conversations.
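If you export conversation data, a small script can surface the same signals that Insights reports: unresolved utterances, wrongly matched utterances, and the share of successful conversations. The record fields used below are illustrative assumptions, not the actual Insights export format.

```python
# Sketch of summarising exported conversation records. The field names
# ("utterance", "matched_intent", "expected_intent", "completed") are
# illustrative assumptions only.

def summarise(records):
    unresolved = [r for r in records if r["matched_intent"] is None]
    mismatched = [
        r for r in records
        if r["matched_intent"] is not None
        and r.get("expected_intent")
        and r["matched_intent"] != r["expected_intent"]
    ]
    completed = sum(1 for r in records if r["completed"])
    total = len(records) or 1
    print(f"Unresolved utterances: {len(unresolved)}")
    print(f"Wrongly matched utterances: {len(mismatched)}")
    print(f"Successful conversations: {completed / total:.0%}")

summarise([
    {"utterance": "order pizza", "matched_intent": "OrderPizza",
     "expected_intent": "OrderPizza", "completed": True},
    {"utterance": "cancel my pto", "matched_intent": None,
     "expected_intent": "CancelPTO", "completed": False},
])
```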

Here are some questions you can use to guide users to what you want them to pay attention to:

  • Is it clear to users that they are interacting with a digital assistant and not a human?
  • Does the digital assistant explain to users what it can do and what it can’t?
  • Is it possible for experienced users to shorten the conversation by providing more information in the initial message?
  • Can users work with the digital assistant without needing to first learn a set of keywords or how to start a conversation?
  • Does the digital assistant handle errors by directing users to contact a human agent when they get stuck?
  • Does the digital assistant offer a help or cancel option in response to users failing to provide a valid input when prompted?
  • Does the digital assistant offer quick selections for common user input options when prompted (e.g. a button to set today's or tomorrow's date when creating a calendar entry)?
  • Is the bot persona (tone and voice) used consistently throughout the digital assistant conversations?
  • Is the digital assistant truly conversational, or does it have areas that are not message driven and instead require users to push a button or select from a list?
  • Is the language used by the digital assistant plain? If expert language and abbreviations are used, will they be understood by the intended audience?
  • Are bot messages concise and meaningful?
  • Do bot messages and prompts provide enough context for users to understand the current status of the conversation?
  • Does the digital assistant use alternating prompts when re-prompting for a piece of information?
  • Does the digital assistant actively help to disambiguate user input when the provided input is not clear (e.g. two sizes entered in a pizza order when only one should be provided)?

Checklist for Digital Assistant Testing

  • ☑ Test natural language understanding (NLU) at the digital assistant level using test suites.
  • ☑ Test intent resolution in different contexts (for example, with a skill set as the assumed current skill).
  • ☑ Review digital assistant configuration settings to adapt the message templates for built-in messages to your needs and the bot persona.
  • ☑ Use digital assistant confidence settings to tune understanding.
  • ☑ Use the conversation tester to ensure your digital assistant provides the correct answers to user messages. 
  • ☑ Monitor the performance and behavior of your digital assistant at runtime.
  • ☑ Implement a feedback loop for users to provide feedback via the conversation.