Configuration

You can use the Configuration page to upload and download data sets, create and edit a taxonomy, create and improve a knowledge base, define additional attributes, run classifications, and monitor various activities.

On the Configuration page you can:

  1. Set up the taxonomy you want to use for classifying your spend transactions. This could be your existing purchasing categories, a revised version of those categories, or a brand new taxonomy.
  2. Create a training set. A training set is the sample of correctly classified data that's used by the spend classification machine learning engine to make category predictions based on the data in each individual spend record.
  3. Build a knowledge base from the training data set. A knowledge base uses an algorithm to act on a set of data using patterns identified within the training set.
  4. Classify your data. After the knowledge base is created and tested, you can use Spend Classification to process any number of batches that contain incorrectly categorized or unclassified data.
  5. Review the batch results. Spend Classification tags all predictions with high, medium, or low confidence levels to help during assessment. You can make any required corrections and approve the batch.

    Before you can view the details of a batch, run the scheduled process named ESS job to create index definition and perform initial ingest to OSCS, with fa-prc-poi as the input parameter. This process creates an index for the index-based search engine provided by the Oracle Search Cloud Service (OSCS). After this process is complete, you can use the smart filter for a focused review of the batch. You can also use filter chips to search for transactions with specific categories or a different classification status.

Note: For access to Oracle Spend Classification, you need a configured job role that has these privileges:
  • Administer Spend Classification Application (POI_ADMINISTER_SPEND_CLASSIFICATION_PRIV)
  • Manage Spend Classification Batch (POI_MANAGE_SPEND_CLASSIFICATION_BATCH_PRIV)
  • View Spend Classification Work Area (POI_SPEND_CLASSIFICATION_WORKAREA_PRIV)

Classification Controls

These are the values that can be configured in the Classification Controls tab:

Parameter

Description

Confidence Threshold Percentage

Percentage value against which the confidence of each predicted category is compared to assign a transaction to one of these confidence levels:

  • Classified with high confidence: Transactions where confidence of the predicted category is more than the value specified by Confidence Threshold Percentage, at all levels of the taxonomy.

  • Classified with medium confidence: Transactions where confidence of the predicted category is more than the value specified by Confidence Threshold Percentage, at least at one level of taxonomy but not all levels.

  • Classified with low confidence: Transactions where confidence of the predicted category is less than the value specified by Confidence Threshold Percentage, at all levels of the taxonomy.

The default value is 70.
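
The three confidence levels can be sketched in Python as follows. This is only an illustration of the rules described above, not the product's implementation. The function name confidence_band, the list of per-level confidence percentages, and the handling of a confidence exactly equal to the threshold (treated here as not exceeding it) are assumptions.

```python
from typing import Sequence


def confidence_band(level_confidences: Sequence[float], threshold: float = 70.0) -> str:
    """Return 'high', 'medium', or 'low' for one transaction.

    level_confidences holds the confidence (0-100) of the predicted
    category at each taxonomy level. Hypothetical helper for illustration.
    """
    if not level_confidences:
        raise ValueError("at least one taxonomy level is required")

    # True for each level whose confidence is more than the threshold.
    # Assumption: a value equal to the threshold does not exceed it.
    above = [confidence > threshold for confidence in level_confidences]

    if all(above):
        return "high"    # above the threshold at all levels
    if any(above):
        return "medium"  # above the threshold at some levels, but not all
    return "low"         # not above the threshold at any level
```

For example, with the default threshold of 70, a transaction whose three taxonomy levels have confidences of 90, 60, and 75 falls in the medium band, because only two of the three levels exceed the threshold.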

Lexer Name

Name of the lexer used in the data mining process. The lexer determines how the text strings in a transaction are broken into keywords, which are then used in data mining. You can use:

  • Basic lexer: Use for data in English. It's the default value.
  • World lexer: Use for data in languages other than English.

Lexer Parameters

Parameters that indicate the keywords used in data mining. The defaults are:
  • INDEX_STEMS,ENGLISH, INDEX_TEXT,YES for basic lexer
  • MIXED_CASE,NO for world lexer.
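
The default values above read as comma-separated pairs of a parameter name followed by its value. As a rough sketch under that assumption, a parameter string could be split into pairs as shown here. The helper parse_lexer_parameters is hypothetical and isn't part of Spend Classification.

```python
def parse_lexer_parameters(raw: str) -> dict:
    """Split a string such as 'INDEX_STEMS,ENGLISH, INDEX_TEXT,YES'
    into {'INDEX_STEMS': 'ENGLISH', 'INDEX_TEXT': 'YES'}.

    Assumption: the string alternates name, value, name, value, and
    no value is itself a comma.
    """
    tokens = [token.strip() for token in raw.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of comma-separated items")
    # Pair every even-positioned item (name) with the item after it (value).
    return dict(zip(tokens[0::2], tokens[1::2]))


basic_defaults = parse_lexer_parameters("INDEX_STEMS,ENGLISH, INDEX_TEXT,YES")
world_defaults = parse_lexer_parameters("MIXED_CASE,NO")
```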

Classifying Datasets with a Printjoins-Enabled Knowledge Base

Printjoins characters are nonalphanumeric characters that you want to include in index tokens, so that compound words such as read-only are indexed as read-only, not as two separate words, read and only. You can specify the Printjoins setting in the classification controls at the lexer level.
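
The effect of Printjoins on tokenization can be illustrated with a small Python sketch. It's a simplified stand-in and not the lexer that Spend Classification actually uses: the tokenize function, its lowercasing of tokens, and its dropping of tokens made up only of Printjoins characters are assumptions made for the illustration.

```python
import re


def tokenize(text: str, printjoins: str = "") -> list:
    """Break text into lowercase tokens.

    Letters and digits always belong to a token. Any character listed
    in printjoins is also kept inside a token instead of splitting it.
    Hypothetical helper for illustration only.
    """
    # Build a character class of alphanumerics plus the escaped joiners.
    pattern = rf"[A-Za-z0-9{re.escape(printjoins)}]+"
    tokens = re.findall(pattern, text)
    # Drop tokens that contain no letter or digit, such as a lone hyphen.
    return [t.lower() for t in tokens if any(ch.isalnum() for ch in t)]


without_joins = tokenize("read-only access")        # ['read', 'only', 'access']
with_joins = tokenize("read-only access", "-")      # ['read-only', 'access']
```

Without Printjoins, the hyphen acts as a separator and read-only yields two tokens. With the hyphen passed as a Printjoins character, the compound word stays together as one token.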

Note: This setting should be managed by advanced users, such as expert data scientists. Because it’s an optional setting, you can continue working with Spend Classification without using this advanced feature.

This example explains the use of Printjoins and its impact on the classification process.

  1. Go to the Classification Controls tab and enter or select these values:
    • Confidence Threshold Percentage: 70
    • Lexer Name: Basic
    • Lexer Parameters: INDEX_STEMS,ENGLISH, INDEX_TEXT,YES, PRINTJOINS,-

    In the Lexer Parameters field, we entered PRINTJOINS,- so that the hyphen is not considered as a word separator.

  2. Upload a training data set with transactions that contain special characters that you don't want treated as word separators during tokenization. Here’s a snippet of the Spend001 training data set that uses hyphens in transaction descriptions:

    Source Transaction ID | Data Set Identifier | Data Set Purpose | Transaction Number | Line Number | Transaction Description | Category
    10001 | Spend001 | Training | 30049 | 1 | Laptop purchase for new hire emp long-term onsite assignment | Hardware
    10002 | Spend001 | Training | 89987 | 1 | Consulting services for term sheet prep and year end reporting | Consulting
    10003 | Spend001 | Training | 87782 | 1 | Retainers for a corporate event | Legal Services
    10004 | Spend001 | Training | 99012 | 1 | Travel tickets for a long-term onsite assignment | Project Expenditure

  3. On the Knowledge Base tab, create a new knowledge base with the name KB001.
  4. On the Data Sets tab, select your training data set Spend001, and from the options, click Classify. In the Classify Data Set window, select the KB001 knowledge base to classify the data set and initiate the classification process.

Here's an analysis of the classification result:

Transaction Description | Significant Keywords | Category Prediction without Printjoins | Category Prediction with Printjoins
Per diem for long-term project services (year-long) | term, services, year | Consulting |
Per diem for long-term project services (year-long) | long-term, employee | | Hardware

With the use of the Printjoins lexer parameter, hyphenated words like long-term were not broken down to individual keywords and the category prediction was more accurate.