Before You Begin

This 30-minute tutorial shows you how to crowdsource training data with the Data Manufacturing feature in Oracle Digital Assistant. This tutorial is a companion to the Create Machine Learning Entities tutorial, which focuses on how ML entities work.

Background

To create a quality training corpus for your skills, it is useful to get multiple people involved so that you can get a large set of data with a lot of natural variance. Oracle Digital Assistant's Data Manufacturing feature helps you crowdsource the work of creating training sets for both intents and ML entities.

In this tutorial we'll walk through the process of crowdsourcing the creation of training utterances and annotating entity values in those utterances for an ML entity that contains names of TV shows.

This is the general workflow for building an ML entity's dataset:

  1. Collect training utterances that include examples of values for the entity.
  2. For the collected training utterances, mark occurrences of entity values.
  3. Once the crowdsource users complete the entity annotation job, validate and make adjustments to the training data.
  4. Add the validated dataset to the ML entity.
  5. Do a final review of the dataset within Digital Assistant.

What Do You Need?

  • Access to Oracle Digital Assistant.
  • The contents of the DM_Materials.zip file.

Set Up the Starter Skill

We'll start by importing a skill that already has an ML entity defined.

  1. Download the DM_Materials.zip file and extract it to your local system.

    In the extracted folder, you should see the following files:

    • DataManufacturingDemoStarter(1.0).zip
    • 10-tv-show-utterances.csv
  2. With the Oracle Digital Assistant UI open in your browser, click the main menu (main menu icon) to open the side menu.
  3. Click Development and select Skills.
  4. Click the main menu (main menu icon) again to collapse the side menu.
  5. Click Import Skill (on the upper right part of the page).
  6. Browse to the DataManufacturingDemoStarter(1.0).zip file then click Open.

    The process of importing the skill might take a few seconds.

  7. Once the skill has finished importing, click the DataManufacturingDemoStarter tile to open it.
  8. In the skill's left navigation, click Entities (Entities icon).
  9. In the list of entities, select ml.tvshownames.
  10. Click the Dataset tab.

    You should see a list of utterances that are annotated with the ml.tvshownames entity.

    In the following section of the tutorial, we will use the Data Manufacturing feature to crowdsource additional annotated utterances for the ML entity.

Collect Training Utterances

The first step of building the dataset is to collect utterances relevant to your skill, the majority of which should contain values for the ML entity. These utterances should reflect the kinds of phrases you expect users to input into the bot and include variance in structure and wording.

For this tutorial, we'll use the phrases in the 10-tv-show-utterances.csv file.

Note:

For future reference, see Guidelines for ML Entities for detailed guidelines on writing training utterances for ML entities.
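When you collect utterances from several people, it can help to check the file for repeated phrases before you upload it, since duplicates add no variance. The following is a minimal sketch, not part of Digital Assistant. It assumes a CSV layout with one utterance in the first column of each row (the tutorial does not describe the layout of 10-tv-show-utterances.csv), and the function name and sample utterances are made up for illustration.

```python
import csv
import io
from collections import Counter


def check_utterances(csv_text):
    """Summarize a one-column utterance CSV (assumed layout, not confirmed by the tutorial)."""
    rows = csv.reader(io.StringIO(csv_text))
    # Keep the first cell of each non-empty row, with surrounding whitespace trimmed.
    utterances = [row[0].strip() for row in rows if row and row[0].strip()]
    # Compare case-insensitively so phrases that differ only in capitalization count once.
    counts = Counter(u.lower() for u in utterances)
    duplicates = sorted(u for u, n in counts.items() if n > 1)
    return {"total": len(utterances), "unique": len(counts), "duplicates": duplicates}


# Hypothetical sample data; these are not the utterances shipped with the tutorial.
sample = (
    "record the next episode of Lost\n"
    "is Friends on tonight\n"
    "Record the next episode of Lost\n"
)
report = check_utterances(sample)
print(report)
```

To run the same check on a real file, read its contents into a string first and pass that string to check_utterances.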

Annotate the Utterances in the ML Entity's Dataset

Create an Entity Annotation Job

For an ML entity's training model to work, the training utterances need to be annotated to show which words are entity values. We'll create a Data Manufacturing entity annotation job for annotating the utterances that we have collected in the CSV file.

  1. In the skill's left navigation, click Manufacturing (Manufacturing icon).
  2. On the Jobs tab, click + New Job.
  3. In the New Job dialog:
    • For Job Type, select Entity Annotation.
    • For Job Name, enter TVShowEntityAnnotation.
    • For Maximum Number of Tasks per Contributor, enter 10.
    • For Add entities, select ml.tvshownames.
    • Click Upload and select the 10-tv-show-utterances.csv file that you previously extracted from the DM_Materials.zip file.
    • Click the Continue button (which is at the top of the dialog).
  4. Click Launch.

    An item for TVShowEntityAnnotation should now appear in the list of jobs.

  5. Click Copy Link.
  6. In a convenient text file, paste the copied link.

This link points to the interface for the crowdsourcing job.

Note:

For the purposes of this tutorial, you'll act as the sole crowd worker and handle all of the provided utterances. In a typical project, the job would be much larger, you'd rely on multiple crowd workers, and you'd set the number of tasks for each contributor accordingly.
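As a rough planning aid, the relationship between the size of a job and the Maximum Number of Tasks per Contributor setting can be sketched with simple arithmetic. This sketch assumes that each utterance is one task and that each task is completed by exactly one contributor, which the tutorial does not state explicitly. The function name is hypothetical.

```python
def min_contributors(total_tasks, max_tasks_per_contributor):
    """Smallest number of crowd workers needed if nobody exceeds the per-contributor cap."""
    if total_tasks < 0 or max_tasks_per_contributor <= 0:
        raise ValueError("total_tasks must be >= 0 and the cap must be positive")
    # Integer ceiling division: round up so that leftover tasks still get a worker.
    return (total_tasks + max_tasks_per_contributor - 1) // max_tasks_per_contributor


# This tutorial's job: 10 utterances with a cap of 10 tasks needs one worker.
print(min_contributors(10, 10))
# A larger, made-up job: 2500 utterances with a cap of 40 tasks each.
print(min_contributors(2500, 40))
```

Under those assumptions, a lower cap spreads the work across more people, which tends to increase the variety of judgments but requires recruiting more crowd workers.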

Mark the Entity Values

Now it's time to use the newly created entity annotation job to mark where the entity values are in the utterances. For real-world jobs, you'd share the job link with crowd workers. For this tutorial, you'll play the role of a crowd worker yourself. Your job is to select the entity values you find in each utterance and assign them the ml.tvshownames ML entity.
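Conceptually, annotating an utterance means recording which span of characters holds the entity value. The sketch below illustrates that idea only. The Annotation class, the annotate function, and the sample utterances are hypothetical and do not reflect how Digital Assistant stores annotations internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    entity: str  # name of the ML entity assigned to the span
    start: int   # index of the first character of the value
    end: int     # index one past the last character of the value


def annotate(utterance: str, value: str, entity: str) -> Optional[Annotation]:
    """Return the span of the first exact occurrence of value, or None if it is absent."""
    start = utterance.find(value)
    if start == -1:
        # No entity value in this utterance, like choosing "None of these entities apply".
        return None
    return Annotation(entity, start, start + len(value))


utterance = "record the next episode of Lost for me"
span = annotate(utterance, "Lost", "ml.tvshownames")
print(span, utterance[span.start:span.end])
print(annotate("what's on tonight", "Lost", "ml.tvshownames"))
```

A span-based representation keeps the original utterance untouched, so the same text can carry more than one annotation if an utterance mentions several shows.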

Here are the steps for completing the crowdsourcing job:

  1. Open a new browser tab and paste in the link for the entity annotation job.
  2. Enter a user name and email address and click Start.
  3. Dismiss the Help dialog that appears.

    Note:

    You can later redisplay the Help dialog by clicking the Help icon that appears next to the Submit button at the top of the page.

    Once the Help dialog is closed, you should see a page that shows one of the utterances that you uploaded for the job.

  4. If the utterance contains a TV show name, select the name in the utterance and assign it the ml.tvshownames entity.
  5. If the utterance doesn't contain a show name, select the None of these entities apply radio button.
  6. Click Submit.

    Another utterance should appear on the page.

  7. Repeat the process for the remaining utterances.

    After submitting your work for all the utterances, you'll get a confirmation that your work is completed.

Confirm Completion of the Entity Annotation Job

Now, returning to your role as the person who created the entity annotation job, you'll confirm that the job has been completed.

  • Navigate to your browser tab that has the running Digital Assistant instance.

    You should see that the Status field for the entity annotation job has a value of Finished.

Note:

If it appears that the job hasn't been completed, reload the browser page to refresh its contents.

At this point, we could go to the job and review its results and potentially add them to our dataset. However, for a typical entity annotation job, that could involve reviewing thousands of utterances. So instead we'll use the next section of the tutorial to walk through the process of crowdsourcing the validation work.

Validate the Entity Annotation Job Results

Based on the results of the entity annotation job, we'll now create an entity validation job and then act as a crowd worker to complete that job.

Create an Entity Validation Job

  1. Click + New Job.
  2. In the New Job dialog:
    1. For Job Type, select Entity Validation.
    2. For Job Name, enter TVShowEntityValidation.
    3. For Maximum Number of Tasks per Contributor, enter 10.
    4. For Select the source for validation, click Previous Jobs.
    5. In the Add Jobs field, select TVShowEntityAnnotation.
    6. Click the Continue button (which is at the top of the dialog).
  3. Click Launch.

    An item for TVShowEntityValidation should now appear in the list of jobs.

  4. Click Copy Link.

This link points to the interface for the crowdsourcing job.

Validate the Entity Annotations

Now, acting as a crowd worker, do the following:

  1. Open a new browser tab and paste in the link for the entity validation job.
  2. Enter a user name and email address and click Start.
  3. Dismiss the Help dialog that appears.

    You should see a page that shows one of the utterances that were annotated in the entity annotation job.

  4. Review the utterance and its annotation and then click Correct, Incorrect, or Not Sure.

    When Correct is selected, the utterance is marked for addition to the training dataset.

    When Incorrect is selected, the utterance is marked to not be included in the dataset.

    When Not Sure is selected, the utterance remains unevaluated, which means that it is still available for another crowd worker to evaluate.

  5. Click Submit.

    Another utterance should appear on the page.

  6. Repeat the previous two steps for the remaining utterances.

    After submitting your work for all the utterances, you'll get a confirmation that your work is completed.
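The effect of the three validation choices described above can be summarized as a simple triage. The following sketch models only the behavior stated in this tutorial (Correct marks an utterance for the dataset, Incorrect excludes it, Not Sure leaves it unevaluated). The Verdict enum, the triage function, and the sample utterances are hypothetical and are not part of any Digital Assistant API.

```python
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    CORRECT = "Correct"
    INCORRECT = "Incorrect"
    NOT_SURE = "Not Sure"


# Maps each validation choice to the outcome described in the tutorial.
OUTCOME = {
    Verdict.CORRECT: "add_to_dataset",
    Verdict.INCORRECT: "excluded",
    Verdict.NOT_SURE: "still_unevaluated",
}


def triage(verdicts: Dict[str, Verdict]) -> Dict[str, List[str]]:
    """Group utterances by what happens to them after validation."""
    result: Dict[str, List[str]] = {name: [] for name in OUTCOME.values()}
    for utterance, verdict in verdicts.items():
        result[OUTCOME[verdict]].append(utterance)
    return result


# Made-up utterances and verdicts for illustration.
groups = triage({
    "record the next episode of Lost": Verdict.CORRECT,
    "is Friends on tonight": Verdict.NOT_SURE,
    "turn the volume up": Verdict.INCORRECT,
})
print(groups)
```

In this model, anything left in the still_unevaluated group would need another crowd worker's judgment before it could move to one of the other two groups.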

Add the Crowdsourced Data to the Dataset

With the validation job completed, we'll do one final review and add the data to the entity's dataset.

Accept the Crowdsourced Data

  1. In your browser, return to your Digital Assistant instance.

    On the Jobs tab of the Data Manufacturing page, you should see that the TVShowEntityValidation job has a status of Finished.

    Note:

    If it appears that the job hasn't been completed, reload the browser page to refresh its contents.

  2. Click View Result.
  3. Review the results and make sure that they are as you expect (or at least close to what you'd expect).
  4. Click Accept to apply the results to the training dataset.
  5. In the Accept Data dialog, click Yes to confirm.

    Note:

    When you click Accept, all the results are applied. However, it is possible to later adjust the training data on the Dataset tab of the ML entity, so it makes sense to accept the results even when there are a few that you later need to change.

Review the Updated Dataset

  1. In the skill's left navigation, click Entities (Entities icon).
  2. In the list of entities, select ml.tvshownames and select its Dataset tab.

    You should see the utterances from the entity validation job at the top of the list.

  3. Scroll through the entries to make sure that they were added correctly.

If you'd like to make any adjustments to utterances and their annotations, you can do so directly here.

With that, you have completed the crowdsourcing of 10 training utterances for the ml.tvshownames ML entity. When you train the skill, your training model will be updated with those utterances.
