Create Machine Learning (ML) Entities

Before you Begin

This 15-minute tutorial shows you how to use ML (Machine Learning) entities in a skill in Oracle Digital Assistant.

Background

An ML entity is an entity type that is based on training utterances, similar to the way intents are. For ML entities, these utterances are annotated to show where entity values appear in the utterance. Using the model trained on these annotated utterances, the skill can extract entity values from user input based on the wording of the utterance, even if the training data does not include those particular values. This is especially useful when there are too many possible values to enumerate in a value list entity, or when you don't know all of the possible values at design time.

In this tutorial we'll use ML entities to extract TV show names from user messages.

What Do You Need?

Access to Oracle Digital Assistant.
The ML_Materials_v2.zip file, which includes the starter skill, a finished version of the skill, and some other resources you'll use as part of the tutorial. Download this file and then unzip it to your local system.

Explore the Starter Skill

The first thing you need to do is import the starter skill into your Oracle Digital Assistant instance so that you can see what you're working with.

The skill contains two intents that are set up to extract the TV show name contained in user messages. At the moment, these intents are associated with a value list entity (not an ML entity).

Import the Skill

If you haven't done so already, download ML_Materials_v2.zip and extract its contents.
With the Oracle Digital Assistant UI open in your browser, click to open the side menu.
Click Development and select Skills.
Click again to collapse the side menu.
Click Import Skill (on the upper right part of the page).

Browse to the extracted archive, open the ML_Materials_v2/ML_Materials folder, select MLEntityBasicDemo(1.0).zip and then click Open.
The process of importing the skill might take a few seconds.
Once the skill has finished importing, click the MLEntityBasicDemo tile to open it.


The skill opens on its Intents page. Besides a welcome intent, the skill includes two intents that make use of an entity for TV shows.
In the list of intents on the left side of the page, select tvshow.reg.findActor.


In the Intent Entities section on the right side of the page, you'll notice that the entity list.TVShowNames appears, which means that this entity is associated with the intent.
In the Intents list, select tvshow.reg.getGenre.
You'll notice that this intent also has the list.TVShowNames entity associated with it.
In the skill's left navigation, click the Entities icon.
In the list of entities, select list.TVShowNames.
You should see a list of four TV show names.


Test the Starter Skill

In this step, we'll use the conversation tester to show how entity extraction works for value list entities.

Note:

The skill isn't fully functional, but its intents are written to let you know whether it recognizes an entity.

Click the Preview button to open the Conversation Tester.


In the tester, enter "who is the lead actor in Soap".
As you can see, the skill recognizes that "Soap" is the name of a TV show.

Now click Reset and enter "who is the lead actor in friends".
You'll notice that the skill does not find the TV show name. This is expected, because the list.TVShowNames value list entity does not contain Friends as one of its values.

Close the tester.
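
The behavior seen in the tester can be pictured as a lookup against a fixed set of values. The following is a minimal Python sketch of that idea, not the matching logic that Oracle Digital Assistant actually uses. The function name find_value_list_match is hypothetical, and every list entry other than "Soap" is a placeholder, because the tutorial does not name the other three shows.

```python
import re

# Hypothetical stand-in for the list.TVShowNames value list entity.
# Only "Soap" is known from the tutorial; the other entries are placeholders.
TV_SHOW_VALUES = ["Soap", "Placeholder Show A", "Placeholder Show B", "Placeholder Show C"]


def find_value_list_match(utterance, values):
    """Return the first listed value found in the utterance as a whole phrase, else None."""
    for value in values:
        # Case-insensitive, whole-word match so that "Soap" does not match "soapbox".
        pattern = r"\b" + re.escape(value) + r"\b"
        if re.search(pattern, utterance, flags=re.IGNORECASE):
            return value
    return None


print(find_value_list_match("who is the lead actor in Soap", TV_SHOW_VALUES))     # Soap
print(find_value_list_match("who is the lead actor in friends", TV_SHOW_VALUES))  # None
```

A lookup of this kind can only return values that were enumerated in advance, which is the limitation that the ML entity in the next section addresses.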

Develop an ML Entity

In this part of the tutorial, we'll replace the list.TVShowNames value list entity with an ML entity so that entity values can be extracted from matched intents, even if the entity values are not explicitly added to the entity definition.

Create the ML Entity

The first step is to define the ML entity:

In the skill's left navigation, click the Entities icon.
Click Add Entity.
In the Add Entity dialog:
- For Name, enter ml.tvshownames.
- For Description, enter TV show names.
- For Type, select ML Entity.
- Click Create.

Add Utterances with Annotated Entity Values

Once you have created an ML entity, you need to provide a training dataset for it. As mentioned earlier, this dataset consists of training utterances that are annotated to mark where entity values appear in the utterance.

Your training set should have 500 to 5000 annotated utterances (with 1000 being a typical amount). You can provide these utterances yourself in the UI or create Data Manufacturing jobs to crowdsource the work.

To save time, we have provided a set of annotated utterances as a JSON file.

To import the annotated utterances:

If it isn't already selected, select the ml.tvshownames entity and then select its Dataset tab.
Select + Import, select the corpus-examples.json file that came as part of the zip file that you extracted at the beginning of this tutorial, and click Open.
After the import completes you should see the new utterances added to your dataset.


Take a look at the utterances and note that they are worded in a variety of ways, but all of them contain names of TV shows. As you can see, the names of the shows are annotated as values for the ML entity.
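
One way to think about an annotated utterance is as a piece of text plus character spans that mark where each entity value sits. The Python sketch below illustrates that idea only. It does not reproduce the schema of corpus-examples.json, which this tutorial does not describe. The class names, field names, and the sample utterance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    entity: str  # entity name, for example "ml.tvshownames"
    start: int   # index of the first character of the value
    end: int     # index one past the last character of the value


@dataclass
class AnnotatedUtterance:
    text: str
    annotations: list


def annotated_values(utterance, entity):
    """Return the substrings of the utterance that are annotated for the given entity."""
    values = []
    for ann in utterance.annotations:
        if ann.entity != entity:
            continue
        # Reject spans that are empty or fall outside the utterance text.
        if not (0 <= ann.start < ann.end <= len(utterance.text)):
            raise ValueError(f"annotation {ann} is outside the utterance text")
        values.append(utterance.text[ann.start:ann.end])
    return values


# Illustrative utterance; the span is computed rather than hard-coded.
text = "who is the lead actor in friends"
start = text.index("friends")
example = AnnotatedUtterance(text, [Annotation("ml.tvshownames", start, start + len("friends"))])
print(annotated_values(example, "ml.tvshownames"))  # ['friends']
```

The span check reflects a practical concern with any annotated dataset: an annotation that points outside the text, or at the wrong characters, gives the trainer a misleading example.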

Note:

On the Dataset tab you can add, annotate, and edit utterances. In addition, it is possible to set up Data Manufacturing jobs to crowdsource the creation of utterances, annotation of entities, and validation of the annotations. After completing this tutorial, you might want to explore the Crowdsource Training Data for ML Entities with Data Manufacturing tutorial.

Associate the ML Entity with Intents

Now let's reconfigure the intents to use our new ML entity instead of the value list entity that the skill started with.

In the skill's left navigation, click the Intents icon.
Select the tvshow.reg.findActor intent.
In the Intent Entities section of the page, delete the list.TVShowNames entry.
Click + Add Entity and select ml.tvshownames.

Select the tvshow.reg.getGenre intent.
In the Intent Entities section of the page, delete the list.TVShowNames entry.
Click + Add Entity and select ml.tvshownames.

Update the BotML Code

Now that we have replaced the value list entity with our ML entity, we need to update the dialog flow code accordingly.

In the skill's left navigation, click the icon that opens the dialog flow editor.
Scroll to line 28 of the flow, which is where the text variable of the displayTVShowName state is defined.

Replace the whole line with the following:

          text: "${rb('displayTVShowName.message')} <#if iResult.value.fullEntityMatches['ml.tvshownames']?has_content>${iResult.value.fullEntityMatches['ml.tvshownames'][0].name}<#else>NOT_FOUND</#if>"

After pasting, make sure that the line begins with 10 spaces.
The text property of the displayTVShowName state should now hold the expression shown above.
Click Validate to make sure that there are no errors.
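
The expression in the replacement line reads as follows: if the list of full matches for ml.tvshownames has content, append the name of the first match to the message; otherwise append NOT_FOUND. The Python sketch below mirrors that branching so the logic is easier to follow. It is not BotML or FreeMarker. The function name tv_show_text, the sample message, and the dictionary shape are assumptions inferred from the expression itself.

```python
def tv_show_text(message, full_entity_matches):
    """Mirror the if/else logic of the FreeMarker expression in plain Python."""
    # Equivalent of ?has_content: the key must exist and the list must be non-empty.
    matches = full_entity_matches.get("ml.tvshownames")
    if matches:
        # Equivalent of fullEntityMatches['ml.tvshownames'][0].name
        return f"{message} {matches[0]['name']}"
    return f"{message} NOT_FOUND"


# Hypothetical message text standing in for rb('displayTVShowName.message').
message = "The TV show is:"

print(tv_show_text(message, {"ml.tvshownames": [{"name": "friends"}]}))  # The TV show is: friends
print(tv_show_text(message, {}))                                         # The TV show is: NOT_FOUND
```

Note that only the first match is displayed, so an utterance that mentions two shows would report just one of them.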

Train and Test the Skill

Now we are almost ready to test the updated skill. First we'll need to train it.

Train the Skill

On the right side of the page, click the Train button.
In the dialog, make sure that Trainer Tm and Entity are selected.

Click Submit and wait for the training to complete.
It may take a few minutes.

Test the Machine Learning Entities

Click the Preview button to open the Conversation Tester.
In the tester, enter "who is the lead actor in friends".
As you can see, "friends" is now correctly recognized as a TV show name.


As it happens, "Friends" is included in the new training data, so that's not really a fair test. Let's try a TV show that is not included in the training set. And this time, let's use the Utterance Tester, which graphically illustrates where the entity value is extracted from:

Close the tester.
In the skill's left navigation, click the Intents icon.
Click the Test Utterances link that's near the top of the page.
The Utterance Tester will appear on the right side of the page.
In the Utterance field, enter "Is Mahabharat an epic or history?"
Click Test.
Scroll down to the Detected Entities section of the tester.
You should see that "Mahabharat" is correctly identified as a TV show name.


"Mahabharat" is not among the entity values in the training dataset. It was extracted as an entity solely based on the patterns in the training utterances, which demonstrates that the ml.tvshownames ML entity was successfully implemented!

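A test like this is only meaningful if the value really is absent from the training data. The short Python sketch below shows one way to check that before testing. The function name is_held_out is hypothetical, and the list of training values is a partial placeholder: the tutorial confirms only that "Friends" is in the training data and "Mahabharat" is not.

```python
def is_held_out(candidate, training_values):
    """Return True if the candidate value does not appear among the training values."""
    # Compare case-insensitively so that "friends" and "Friends" count as the same value.
    normalized = {value.casefold() for value in training_values}
    return candidate.casefold() not in normalized


# Placeholder subset of annotated values; the real dataset contains many more.
training_values = ["Friends"]

print(is_held_out("Mahabharat", training_values))  # True
print(is_held_out("friends", training_values))     # False
```
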
Title and Copyright Information

Create Machine Learning (ML) Entities

F49031-02

November 2021

Shows how to create and use ML (machine learning) entities in Oracle Digital Assistant.
