41 Data Manufacturing

As a single developer, it can be difficult, or even impossible for you to create a large, varied set of utterances, especially when you need to provide training data for multiple intents or ML Entities. Rather than trying to come up with training data on your own, you can use Oracle Digital Assistant to crowd source this task. Assigning this to the crowd can be particularly useful when you need utterances that only experts in the application or the domain can provide.

What is a Data Manufacturing Job?

Data manufacturing jobs are collections of tasks assigned to crowd workers. The jobs themselves focus on various ways of improving intents and ML entities.

Annotation Jobs

You can assign an Annotation Job when you have logging data that needs to be classified to an intent, or when a single intent is too broad and needs to be broken down into separate intents. You can also assign crowd workers to annotate the key words and phrases from the training data that relate to an ML Entity.

Validation Jobs

For Validation jobs, crowd workers review utterances to ascertain if they fit the task or action described by the intent, or if the correct ML Entity has been identified. Only utterances that are judged valid by crowd workers get added to the training data.

Paraphrasing Jobs

The Paraphrase Job is how you collect utterances from the crowd. This assignment describes how they should craft their utterances.

The Data Manufacturing Job Workflow

To create a data manufacturing job, you first create a job and monitor its progress. If you want to access the data before the job has officially finished (say, for example, that the crowd workers are no longer working on the job), then you can cancel the job. Finally, you review the results before you add them to the training data by accepting them, or exclude them from the training data by rejecting them.

Create the Job

  1. Click Manufacturing This is an image of the Data Manufacturing icon. in the left navbar.
  2. In the Jobs page, click Add Job.

  3. Select the job type (Paraphrasing, Validation, or Annotation).

  4. Select the language that's used by the crowd workers. The default language is the skill's predominant language, but you can choose from other natively supported languages. You can't, however, choose a language that's enabled by a translation service.
  5. Click Launch. After you launch a job, its status is noted as Running in the Jobs page. You can’t edit a job when it's running. If you need to make a change, you need to first cancel the job, duplicate it, and then edit it before relaunching it.
  6. To send the job to the crowd, click Copy Link. Then paste the link to an email that's broadcast to the crowd.

    Crowd workers accept the job by clicking this link. After a crowd worker accepts the job, he or she reviews general rules for Paraphrasing, Annotation, or Validation jobs.

    Note:

    Crowd workers provide their names and email addresses for tracking purposes. You sort the results by their names to gauge their success in completing the tasks.

Monitor the Progress of the Crowd Workers

You can monitor the crowd workers' progress for the running job in the Jobs page, but you can’t access or view the results when a job's status is Running. You can only access the results after workers have completed the job, or if you’ve canceled a job because you think it's as complete as it can be. If this is the case, click Cancel. The job results will contain all of the completed records up to the point that you canceled it.

Jobs that have finished, or have been canceled, are available for download in the Results page. Typically, you’d use this page to access and modify results by downloading them as a CSV file that you can manipulate in a spreadsheet program like Excel.

As the number of jobs increases, you can filter them as Accepted, Rejected, or Undecided, which is for jobs that have neither been accepted or rejected.

Tip:

Because the Accepted or Rejected status signifies that your work on a job has concluded, you're likely to filter by Undecided most often.

Review the Results

  1. Click View to examine the results.

    If most of the results are uniformly incorrect, then click Reject. You might reject a poorly conceived job that confused or mislead crowd workers, or you might reject a job because it was a test. If you find the results are correct, then you can add them to your training set by clicking Accept. Before you choose this option, keep in mind that you cannot undo this operation, which adds the entire set of results to your training data. Because you may inadvertently add bad utterances that you can only remove by editing the intent, we recommend that you download the results and edit them before accepting them into the training corpus.

    Editing the downloaded CSV files enables you to clean up the results. For Paraphrasing jobs, editing the results before committing them to your training set enables you to change utterances or delete the incorrect ones.

    Note:

    The content in the result column depends on the Job type. For a Paraphrase job, the crowd worker's utterances populate this column. For a Validation job, it's the worker's evaluation of the utterance against the task (Correct, Incorrect, Not sure), and for an Annotation job, it's the intent that matches the utterance.


    To download the job, click Download This is an image of the Download icon. either in Results page or in the View Results dialog. Save the job to your local system and then open the CSV with a spreadsheet program.
  2. After you’ve finished your edits, click Upload in the Results page.

  3. Retrain your skill.

Paraphrasing Jobs

You collect utterances from the crowd workers through Paraphrasing jobs. Workers who accept the Paraphrasing job produce valid utterances using guidelines in the form of prompts and hints that you provide. A prompt captures the essence of what users expect the skill to do for them. A hint, which is optional, provides the crowd worker with further detail, such as wording and entity values. For example, "Create an expense for a merchant using a dollar amount" is a prompt for a Create Expense Report intent. The accompanying hint is "Use the merchant name ACME and a dollar amount of less than $50."

Gathering a collection of utterances that are linguistically diverse, yet semantically correct, starts with the design of the prompt and the hint. Keep the following in in mind when composing your prompts:
  • A prompt is not an utterance. Utterances can stifle crowd worker's creativity because of their specificity. Rather than serve as an example, they instead encourage workers to simply produce slight variations. These redundant phrases add noise to your training corpus and will not improve your skill's cognition.
  • If you have more than one prompt for an intent, vary them. Each prompt in a Paraphrase job is likely to be distributed to different crowd workers. Even if a crowd worker gets more than one prompt from the same Paraphrase job, having a different prompt will change the worker's perspective.
  • Use hints to encourage variation. You can define hints for anything that you want the utterances to include (or not include). As you vary your prompts, you should likewise vary their corresponding hints.
    Prompt Hint
    Create an expense for airport parking Include the airport code (SFO, LAX, etc.), a full date (dd/mm/yyyy), and an amount in US dollars.
    Create an expense for a meal Include the name of the restaurant, a full date (dd/mm/yyyy), and an amount in US dollars.

Create the Paraphrasing Job

You can create a Paraphrasing from a CSV file that contains the intent names and their corresponding prompts and hints, or by adding individual prompts and hints for a selected intent. You can use either method, but you can't combine the two.
  1. If you haven't created any jobs yet, click Add Job (either in the landing page if you haven’t created any jobs yet, or in the Jobs tab if there are existing jobs).
  2. Select Paraphrasing.
  3. Enter a job name.
  4. Select the language that's used by the crowd workers. By default, the skill's predominant language displays, but you can choose from other natively supported langauges that have been set for the skill.
  5. Add your prompts and hints (which are optional) for the intents. You can create these offline in a CSV file with columns named intentName, prompt, and hint, and add them as a batch, or you can them one by one with the New Job dialog. You can add these columns in any order in the CSV file.

    Note:

    If you selected a language other than the predominant language of the skill, then your prompts and hints must also be in that language.
  6. If you've added the prompts, hints, and intent names to a CSV, click Upload, then browse to, and select, the file. Then click Continue.
  7. If needed, add additional prompts This is an image of the Add icon., edit This is an image of the Edit ML Entity icon, or delete This is an image of the Delete icon. prompts where needed, or change the number of utterances per prompt. Click Launch.

  8. To create a Paraphrase job manually, click Select.
  9. Click the Intents field to select and intent from the menu, or click the Select all ... intents option if you want to add prompts to your entire set of intents.

  10. Click Continue.
  11. Click within the Prompt field, or click Edit This is an image of the Edit ML Entity icon, to enter your prompt.
  12. Click within the Hint field, or click Edit This is an image of the Edit ML Entity icon, to enter your prompt.
    • Click Add This is an image of the Add icon. to create another prompt. Keep in mind that each new prompt is potentially a separate job, handled by a different crowd worker.
    • If needed, delete or revise any prompts or hints.
  13. Select the number of paraphrases per prompt.
  14. When you’re finished, click Launch.

    The Paraphrasing job displays as a new row in the Jobs page with its status noted as Running.
  15. Click Copy Link in the row, then paste the link into an email that you broadcast to the crowd. Crowd workers accept the job by clicking this link. After a crowd worker accepts the job, he or she reviews general rules for Paraphrasing jobs.

    Note:

    The locale of the worker's browser is set to the language selected in the Create Job dialog.
    Workers then submit their paraphrases.

    You can monitor the progress of the running in real time from the Jobs page.

Review the Paraphrasing Job

Before adding the utterances to your training set, you probably want to review them for semantics, misspellings, or spam.

To review the Paraphrasing job:
  1. You can wait for a job to complete, or if you believe that all contributions have been made to a running job, click Cancel in the Jobs page.
  2. Click Results. Only canceled or completed jobs display in the Results page.

  3. Click View for a read only view of the utterances. Using the options in this dialog, you can download the job as a CSV, or accept or reject it in its entirety.

    Before committing to rejecting or accepting all of the jobs tasks at once (which can't be undone automatically), you may instead want to download the job and clean up the results before adding them to the training set. If you follow this route, click Download either in this dialog or in the Results page.

    Tip:

    Before you can't easily remove utterances after you've accepted them, you may want to create a new version of your skill, or clone it as a precaution.
  4. Open the CSV file with spreadsheet program.
    Review the utterances in the result column against the IntentName and prompt columns. Update the utterances in the result column where needed or delete an entire row (or rows). If the utterance is beyond repair, you can delete the entire row. If a crowd worker repeatedly enters bad utterances because he didn't understand the prompts or follow the general paraphrasing guidelines, then you can sort the worksheet by the contributor column and then delete rows. If you do delete a row, be sure to delete it completely. You won't be able to upload the file otherwise.

    Tip:

    You only need to focus on the intentName, prompt, result, and contributor columns of the spreadsheet. You can ignore the others.


  5. When you're finished, click Upload in the Results page. Browse to, then select, the CSV file. Select Intent Paraphrasing, enter a name, then click Upload.

  6. If you want to add the utterances to an intent's training data, click Accept in the Results page. Click Reject if you don't want to add them to the training set. You might want to reject a job if it's a test, or if it can't be salvaged because of ill-conceived prompts and hints.

Annotation Jobs

Whenever you have chat data that needs to be mapped to an intent or annotated for ML Entities, you can create an Annotation job. Workers complete annotation jobs for intents by matching an utterance to an intent. For entity annotation jobs, workers label the text in the utterance for an ML Entity. You can create these jobs using a CSV file with an utterance column, previously completed annotation jobs, or by combining the two approaches. You can also create an intent annotation job from the utterances collected in the Retrainer.
Description of annotation_csv.png follows
Description of the illustration annotation_csv.png

Create the Intent Annotation Job

  1. Click + New Job in the Jobs page.
  2. Select Intent Annotation.
  3. Enter a name.
  4. Enter the language that's used by the crowd workers.
  5. Upload the file, click Continue, note the number of items for the job, then click Launch.
  6. Click Copy Link and then paste the link into an email that’s broadcast to crowd workers. Workers accept the job by clicking this link. After they sign in, crowd workers review basic rules on how to classify utterances.

    Active learning helps crowd workers out by ranking all of the skill's intents by their likelihood to match the utterance. The intent that has the highest potential to match the utterance is first. Likewise, the utterances that are currently in the corpus, which crowd workers use as a guide, are also ranked by their likelihood to match the utterance.

    You can monitor the progress in the Jobs page.

Review the Annotation Job

  1. After the job has completed, or when you've clicked Cancel because you think that job is as complete as it can be, click View.

  2. If you disagree with some of the crowd worker's decisions, click Download to download a CSV of the results to your local system.
  3. In the CSV, enter the intent name that you expect in the intentName column.
  4. Override the intent chosen by the crowd worker in the result column by entering the Conversation Name for the intent that you entered in the intentName column.
  5. When you've finished your review, click Upload in the Results page. Then select the file, enter a name, and then click Upload.

    Note:

    You can't remove an entry from the results. The results retain all of the entries, even if you delete a row from the CSV before you upload. If you want to remove bad entries (because you don't want to reject the entire job), then you need to create a separate set of results that do not belong to any job by removing the contents from the jobId and Id columns before you upload the file.
    The results will be merged into the current job.
  6. Retrain the skill.

    Note:

    Only the utterances that match an intent get added to the training data. The ones classified as None of these Intents or I’m not sure are excluded.

Create the Entity Annotation Job

Your skill needs at least one ML Entity for this job. You can't create an Entity Annotation Job with non-ML Entities.

  1. Click + New Job in the Jobs page.
  2. Select Entity Annotation.
  3. Enter a name.
  4. Enter the language that's used by the crowd workers.
  5. Select the ML Entity (or ML Entities) that crowd workers will select from. Ideally, these entities will have helpful names and succinct descriptions.
  6. If this is your first Entity Annotation job, browse to, then select a CSV file. You can provide workers with either annotated or unannotated utterances, depending on the format of this file:
    • For unannotated utterances, upload a CSV that organizes the plain utterances under a single column, utterance:
      utterance
      I want to order a family size pepperoni pizza with thin crust and mozzarella cheese
      I want to order a large supreme pizza with regular crust and provolone cheese
      I want to order a medium size meat-lover pizza with gluten-free crust and goat cheese
      
    • For annotated utterances, upload a CSV with a single column, annotation with each utterance represented as a JSON object. The beginOffset and endOffset properties represent the beginning and end of the text labeled for the ML Entity. Create ML Entities describes the other properties in this object.
      annotation
      "[
         {
            ""Utterance"":{
               ""utterance"":""I want to order a family size pepperoni pizza with thin crust and mozzarella cheese"",
               ""languageTag"":""en"",
               ""entities"":[
                  {
                     ""entityValue"":""family"",
                     ""entityName"":""MLPizzaCrust"",
                     ""beginOffset"":18,
                     ""endOffset"":24
                  },
                  {
                     ""entityValue"":""mozzarella"",
                     ""entityName"":""MLCheeseType"",
                     ""beginOffset"":66,
                     ""endOffset"":76
                  },
                  {
                     ""entityValue"":""pepperoni"",
                     ""entityName"":""MLPizzaType"",
                     ""beginOffset"":30,
                     ""endOffset"":39
                  }
               ]
            }
         }
      ]"
      "[
         {
            ""Utterance"":{
               ""utterance"":""I want to order a large supreme pizza with regular crust and provolone cheese"",
               ""languageTag"":""en"",
               ""entities"":[
                  {
                     ""entityValue"":""supreme"",
                     ""entityName"":""MLPizzaType"",
                     ""beginOffset"":24,
                     ""endOffset"":31
                  },
                  {
                     ""entityValue"":""provolone"",
                     ""entityName"":""MLCheeseType"",
                     ""beginOffset"":61,
                     ""endOffset"":70
                  },
                  {
                     ""entityValue"":""regular"",
                     ""entityName"":""MLPizzaCrust"",
                     ""beginOffset"":43,
                     ""endOffset"":50
                  },
                  {
                     ""entityValue"":""large"",
                     ""entityName"":""MLPizzaSize"",
                     ""beginOffset"":18,
                     ""endOffset"":23
                  }
               ]
            }
         }
      ]"
      Crowd workers will review the existing labels defined by these offsets, and change them when they're incorrect.
    You can combine previously completed annotation jobs into a single job, and also combine CSVs with completed annotation jobs. If you're adding a prior job, then some of the utterances will already be annotated.

  7. Click Continue, verify the number of records, then Launch.
  8. Copy then paste the link into an email that’s broadcast to crowd workers. Workers accept the job by clicking this link. Before they begin labeling the utterances with annotations, they review basic rules on how to label content with annotations. If the utterance includes text that matches one the ML Entities listed in the page, a crowd worker highlights the applicable text and applies the ML Entity label. If the utterance is already annotated, workers can review the labels and adjust them when needed.

  9. When the job completes (either because workers have completed the annotations or because you canceled it), you can view the results and accept it into the ML Entity's training corpus.

    Before adding the results, however, you can have crowd workers verify them by launching an Entity Validation Job. Only the correct results from a validation job are added to the corpus. If needed, you can make additional corrections and additions to the job results in the ML Entity's Dataset tab.

Validation Jobs

For Validation jobs, crowd workers review the results from Paraphrase jobs, Entity Annotation jobs, or Intent Validation jobs generated from the Retrainer. To validate a Paraphrase Job, they compare the utterances (the results of a Paraphrasing job or from ) against a task, the Paraphrasing job's prompt. For Entity Annotation Jobs, they review the utterances to ensure that the correct ML entity has been identified and that the text has been labeled completely.

Create an Intent Paraphrasing Validation Job

  1. Click Add Job in the Jobs page.
  2. Select Intent Paraphrase Validation.
  3. Enter a name.
  4. Enter the language that's used by the crowd workers.
  5. Add Paraphrasing jobs that have not yet been accepted (that is, the Finished or Canceled jobs). You can either upload a CSV file from your local system, select one or more Paraphrasing jobs, or create a job from both.
  6. Click Continue, verify the number of records, then Launch.

  7. After the row for the Validation job is added to the Jobs page (you may need to click Refresh ), click Copy Link.

  8. Paste the link into an email that's broadcast to crowd workers. After the crowd workers accept the job, they review basic rules for evaluating the utterances.

    They then evaluate an utterance.

    You can monitor the worker’s progress from the Jobs page.

Review a Validation Job

After workers complete the Validation job (or if you cancel the job because it's close enough to completion), you can view it, accept or reject the job in its entirety. Even though Validation jobs may contain thousands of results (which is why you'd source them to the crowd), you may still want to review them individually. For example, you might see answers in the View Results dialog that you disagree with. You'll want to change or remove them before committing the results to your training data.
Description of review_validation_job.png follows
Description of the illustration review_validation_job.png

To edit the results one by one:
  1. Download the job as a CSV, either from the View Results dialog, or by clicking Download in the Results page.
  2. Open the CSV in a spreadsheet program.
  3. Compare the entries in the IntentName and prompt columns and then change the results entry when needed. You can edit just this column, or remove an entire row.

    In general, you only need to focus on these three columns. That said, you can sort by the contributor column to isolate the work of a particular crowd worker. If this worker's decisions are consistently unreliable, then you can delete all of the rows for this contributor.

    Note:

    If you delete a row, make sure that you delete it entirely. You can't upload a CSV with a partial row.
  4. When you’re finished, click Upload in the Results page. Browse to, then select, the CSV file. Select Validation, enter a name, then click Upload.
  5. Click Accept or Reject. If you accept the job, only the "correct" utterances are added to the training set. You can't undo this operation. You can only remove these utterances manually.
  6. Retrain the skill.

Create an Entity Annotation Validation Job

  1. Click New Job in the Jobs page.
  2. Select Entity Validation.
  3. Enter a name.
  4. Enter the language that's used by the crowd workers.
  5. You can either upload a CSV file from your local system, select one or more completed Entity Annotation Jobs (which includes jobs that workers have completed or that you canceled), or combine them to create a single job. The CSV has the same format as the one used to add annotated utterances to an Entity Annotation Job: it has the single annotation column and JSON objects for utterances:
    annotation
    "[
       {
          ""Utterance"":{
             ""utterance"":""I want to order a family size pepperoni pizza with thin crust and mozzarella cheese"",
             ""languageTag"":""en"",
             ""entities"":[
                {
                   ""entityValue"":""family"",
                   ""entityName"":""MLPizzaCrust"",
                   ""beginOffset"":18,
                   ""endOffset"":24
                },
                {
                   ""entityValue"":""mozzarella"",
                   ""entityName"":""MLCheeseType"",
                   ""beginOffset"":66,
                   ""endOffset"":76
                },
                {
                   ""entityValue"":""pepperoni"",
                   ""entityName"":""MLPizzaType"",
                   ""beginOffset"":30,
                   ""endOffset"":39
                }
             ]
          }
       }
    ]"
    ...
  6. Click Continue, verify the number of records, then click Launch.

  7. Paste the link into an email that's broadcast to crowd workers.

    After the crowd workers accept the job, they review basic rules for evaluating the annotations. From there, they review the annotations by classifying them as correct, incorrect, or not sure.

    You can monitor the worker’s progress from the Jobs page. When the job completes, you can review the results before accepting or rejecting it.

    By clicking Accept, you add the correct results to the ML Entity's training set. If needed you can edit them further in the Dataset tab.

Create Test Suites

You can create Test Cases from the results of Intent Annotation and Intent Validation jobs.
  1. Select the report in the Results page, then Click menu icon
  2. select Test Suite.
  3. Complete the dialog by giving the test suite and name and selecting the language that the utterances will be tested in. Then click Create.
  4. Open the Utterance Tester to run the test suite.