
Building a Corpus


Building a corpus is one of the most critical parts of building the knowledge base model, because the quality of the corpus determines how well Smart Answer performs.

Building, Preparing, and Importing a Corpus for the Siebel Smart Answer Knowledge Base Model

Use the following procedure to build, prepare, and import a corpus for the Siebel Smart Answer knowledge base model.

To build, prepare, and import a corpus for the Siebel Smart Answer knowledge base model

  1. Carefully select e-mails and any other documents that represent your business domain, so that Siebel Smart Answer has enough content to learn the concept models for the categories you will later use in Siebel Call Center. The documents can be in mixed languages and mixed formats.

    The supported formats include:

    • Raw Text
    • HTML
    • Microsoft Word
    • PDF
    • PostScript
    • XML
    • E-mail
    • CSV
    • ENV

    The supported languages include:

    • English
    • French
    • Italian
    • German
    • Spanish

    NOTE: The corpus can be heterogeneous. Its documents do not have to share the same format or the same language.
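Before importing, it can help to survey the candidate files and see how they are distributed across the supported formats. The following is a minimal sketch, not part of Siebel Smart Answer: the function names `survey_corpus` and `survey_directory` are hypothetical, and the mapping from file extensions to formats is an assumption, because the guide lists format names only.

```python
from collections import Counter
from pathlib import Path

# Assumed extension-to-format mapping. The guide names the formats but not
# their file extensions, so adjust these entries to match your own files.
SUPPORTED_EXTENSIONS = {
    ".txt": "Raw Text",
    ".html": "HTML",
    ".htm": "HTML",
    ".doc": "Microsoft Word",
    ".pdf": "PDF",
    ".ps": "PostScript",
    ".xml": "XML",
    ".eml": "E-mail",
    ".csv": "CSV",
    ".env": "ENV",
}


def survey_corpus(paths):
    """Count files per assumed format and list files with unrecognized extensions."""
    counts = Counter()
    unrecognized = []
    for path in map(Path, paths):
        fmt = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
        if fmt is None:
            unrecognized.append(str(path))
        else:
            counts[fmt] += 1
    return counts, unrecognized


def survey_directory(root):
    """Survey every regular file under a directory tree."""
    return survey_corpus(p for p in Path(root).rglob("*") if p.is_file())
```

Files reported as unrecognized are only candidates for review. An unfamiliar extension does not necessarily mean the import will reject the file.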

  2. Navigate to the Corpus tab, and then click the Import button. A pop-up box appears that asks for the name, the location, and the file type of the corpus. Complete the necessary fields to create the new corpus.

    Some fields are described in the following table:

    Corpus Import Fields    Input Type
    Input knowledge base    Directory or File
    Type                    Drop Down List

    • During the import process, a status screen indicates the number of entries processed. If any errors occurred, the file and the line number within that file are listed for each error when the import is complete. Read the error screen carefully and correct each error, either by removing the affected file or by editing it so that it is compatible with the expected format.
    • After the errors are corrected, you do not need to re-import the entire corpus. Instead, navigate to the Corpus Browse tab and select Add Entry for each entry that had errors. This completes the process of preparing and importing a corpus.
  3. After the corpus is imported, the corpus entries are summarized on the Corpus Browse tab. Each corpus entry is parsed into multiple fields: a unique identifier, the language of the entry, the optional Categories, To, From, and Subject fields, and the Message field. If the entry is an e-mail, the To, From, Subject, and Message fields are all populated. For non-e-mail entries, all content is placed in the Message field.
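
The field layout described in step 3 can be illustrated with a short sketch. This is an illustration only, not Siebel's parser or data model: `CorpusEntry` and `make_entry` are hypothetical names, the caller supplies the identifier and language, and Python's standard `email` module stands in for whatever parsing the product performs.

```python
from dataclasses import dataclass, field
from email import message_from_string
from typing import List, Optional


@dataclass
class CorpusEntry:
    """Hypothetical record mirroring the fields shown on the Corpus Browse tab."""
    entry_id: str
    language: str
    message: str
    categories: List[str] = field(default_factory=list)
    to: Optional[str] = None
    sender: Optional[str] = None  # the "From" field
    subject: Optional[str] = None


def make_entry(entry_id, language, raw_text, is_email=False):
    """Build an entry; only e-mails get To, From, and Subject populated."""
    if not is_email:
        # Non-e-mail content goes entirely into the Message field.
        return CorpusEntry(entry_id, language, raw_text)
    msg = message_from_string(raw_text)
    if msg.is_multipart():
        # Keep only the plain-text parts of a multipart message.
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return CorpusEntry(entry_id, language, body,
                       to=msg["To"], sender=msg["From"], subject=msg["Subject"])
```

For example, passing a raw message with To, From, and Subject headers and `is_email=True` fills all four e-mail fields, while passing the same text with `is_email=False` leaves the headers inside `message` and the other fields empty.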
Siebel Smart Answer Guide Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Legal Notices.