Configuration

You can use the Configuration page to upload and download data sets, create and edit a taxonomy, create and improve a knowledge base, define additional attributes, run classifications, and monitor various activities.

On the Configuration page you can:

Set up the taxonomy you want to use for classifying your spend transactions. This could be your existing purchasing categories, a revised version of those categories, or a brand new taxonomy.
Create a training set. A training set is the sample of correctly classified data that's used by the spend classification machine learning engine to make category predictions based on the data in each individual spend record.
Build a knowledge base from the training data set. A knowledge base uses an algorithm to act on a set of data using patterns identified within the training set.
After the knowledge base is created and tested, you can use Spend Classification to process any number of batches that contain wrongly categorized data or data that's not classified.
Review the batch results. Spend Classification tags all predictions with high, medium, or low confidence levels to help during assessment. You can make any required corrections and approve the batch.
Before you can view the details of a batch, run the scheduled process named ESS job to create index definition and perform initial ingest to OSCS, with fa-prc-poi as the input parameter. This process creates the index for the index-based search engine provided by the Oracle Search Cloud Service (OSCS). After the process is complete, you can use the smart filter for a focused review of the batch. You can also use filter chips to search for transactions with specific categories or a different classification status.

Note: For access to Oracle Spend Classification, you need a configured job role that has these privileges:

Administer Spend Classification Application (POI_ADMINISTER_SPEND_CLASSIFICATION_PRIV)
Manage Spend Classification Batch (POI_MANAGE_SPEND_CLASSIFICATION_BATCH_PRIV)
View Spend Classification Work Area (POI_SPEND_CLASSIFICATION_WORKAREA_PRIV).

Classification Controls

These are the values that can be configured in the Classification Controls tab:

Parameter	Description
Confidence Threshold Percentage	Percentage threshold used to assign a confidence level to each classified transaction. Classified with high confidence: Transactions where the confidence of the predicted category is more than the Confidence Threshold Percentage at all levels of the taxonomy. Classified with medium confidence: Transactions where the confidence of the predicted category is more than the Confidence Threshold Percentage at one or more levels of the taxonomy, but not at all levels. Classified with low confidence: Transactions where the confidence of the predicted category is less than the Confidence Threshold Percentage at all levels of the taxonomy. The default value is 70.
Lexer Name	Name of the lexer that determines how the text strings in a transaction are broken into keywords for data mining. You can use: Basic lexer: For data in English. This is the default value. World lexer: For data in languages other than English.
Lexer Parameters	Parameters that control how the lexer produces the keywords used in data mining. The defaults are INDEX_STEMS,ENGLISH, INDEX_TEXT,YES for the basic lexer and MIXED_CASE,NO for the world lexer.
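
The three confidence levels described for Confidence Threshold Percentage can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the table, not the application's actual implementation. The function name confidence_level and the list of per-level confidences are hypothetical, and the table doesn't say how a confidence exactly equal to the threshold is handled, so the sketch assumes it doesn't count as exceeding the threshold.

```python
from typing import Sequence


def confidence_level(level_confidences: Sequence[float], threshold: float = 70.0) -> str:
    """Return 'high', 'medium', or 'low' for one transaction.

    level_confidences holds one confidence percentage per taxonomy level.
    Assumption: a confidence equal to the threshold is treated as not
    exceeding it, because the documented rule only covers 'more than'
    and 'less than'.
    """
    if not level_confidences:
        raise ValueError("at least one taxonomy level is required")

    exceeds = [confidence > threshold for confidence in level_confidences]
    if all(exceeds):
        return "high"    # above the threshold at every level
    if any(exceeds):
        return "medium"  # above the threshold at some levels, but not all
    return "low"         # not above the threshold at any level


print(confidence_level([92.0, 85.5, 71.0]))  # high
print(confidence_level([92.0, 64.0, 71.0]))  # medium
print(confidence_level([40.0, 55.0, 69.9]))  # low
```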

Classifying Datasets with a Printjoins-Enabled Knowledge Base

Printjoins characters are nonalphanumeric characters that you want to include in index tokens, so that compound words such as read-only are indexed as read-only rather than as two separate words, read and only. You specify the Printjoins setting at the lexer level in the classification controls.
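
The effect of Printjoins on tokenization can be illustrated with a small Python sketch. It only mimics the behavior described above and isn't the lexer that Spend Classification uses. The tokenize function and its printjoins argument are hypothetical names chosen for this example, and lowercasing the tokens is an assumption made to keep the output easy to compare.

```python
import re


def tokenize(text: str, printjoins: str = "") -> list[str]:
    """Split text into lowercase tokens.

    Letters and digits always belong to a token. Any character listed in
    printjoins is also kept inside tokens instead of acting as a separator.
    """
    pattern = "[A-Za-z0-9" + re.escape(printjoins) + "]+"
    # Drop matches made up only of printjoins characters, such as a lone hyphen.
    return [
        match.lower()
        for match in re.findall(pattern, text)
        if any(char.isalnum() for char in match)
    ]


print(tokenize("A read-only file"))                  # ['a', 'read', 'only', 'file']
print(tokenize("A read-only file", printjoins="-"))  # ['a', 'read-only', 'file']
```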

Note: This setting should be managed by advanced users like expert data scientists. Because it’s an optional setting, you can continue working with Spend Classification without using this advanced feature.

This example explains the use of Printjoins and its impact on the classification process.

Go to the Classification Controls tab and enter or select these values:
- Confidence Threshold Percentage: 70
- Lexer Name: Basic
- Lexer Parameters: INDEX_STEMS,ENGLISH, INDEX_TEXT,YES, PRINTJOINS,-
Entering PRINTJOINS,- in the Lexer Parameters field ensures that the hyphen isn't treated as a word separator.
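
To make the structure of the Lexer Parameters value easier to see, the following Python sketch splits it into name and value pairs. The comma-separated name,value layout is inferred from the default values shown in this topic. How the application itself parses the field isn't documented here, so treat parse_lexer_parameters as a hypothetical helper for checking a value before you enter it.

```python
def parse_lexer_parameters(raw: str) -> dict[str, str]:
    """Split a value such as 'INDEX_STEMS,ENGLISH, INDEX_TEXT,YES' into pairs.

    Assumption: the field is a flat, comma-separated list that alternates
    parameter names and values, with optional spaces after the commas.
    """
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) % 2 != 0:
        raise ValueError("expected an even number of comma-separated items")
    # Even positions are names, odd positions are their values.
    return dict(zip(parts[0::2], parts[1::2]))


settings = parse_lexer_parameters("INDEX_STEMS,ENGLISH, INDEX_TEXT,YES, PRINTJOINS,-")
print(settings)
# {'INDEX_STEMS': 'ENGLISH', 'INDEX_TEXT': 'YES', 'PRINTJOINS': '-'}
```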

Upload a training data set with transactions that contain the special characters you want to keep inside tokens during tokenization. Here’s a snippet of the Spend001 training data set that uses hyphens in transaction descriptions:

Source Transaction ID	Data Set Identifier	Data Set Purpose	Transaction Number	Line Number	Transaction Description	Category
10001	Spend001	Training	30049	1	Laptop purchase for new hire emp long-term onsite assignment	Hardware
10002	Spend001	Training	89987	1	Consulting services for term sheet prep and year end reporting	Consulting
10003	Spend001	Training	87782	1	Retainers for a corporate event	Legal Services
10004	Spend001	Training	99012	1	Travel tickets for a long-term onsite assignment	Project Expenditure
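
Before you decide which characters to list in Printjoins, it can help to check which transaction descriptions actually contain compound words. The Python sketch below scans a tab-separated extract for hyphenated terms. It's a hypothetical helper, not part of the product, and it uses a reduced set of the columns from the snippet above to keep the example short.

```python
import csv
import io
import re

# Reduced copy of the Spend001 snippet: only three of the columns are kept.
SNIPPET = "\n".join([
    "Source Transaction ID\tTransaction Description\tCategory",
    "10001\tLaptop purchase for new hire emp long-term onsite assignment\tHardware",
    "10002\tConsulting services for term sheet prep and year end reporting\tConsulting",
    "10003\tRetainers for a corporate event\tLegal Services",
    "10004\tTravel tickets for a long-term onsite assignment\tProject Expenditure",
])

# Two or more alphanumeric runs joined by hyphens, for example long-term.
COMPOUND = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+")


def hyphenated_terms(tsv_text: str) -> dict[str, list[str]]:
    """Map each source transaction ID to the hyphenated terms in its description."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found = {}
    for row in reader:
        terms = COMPOUND.findall(row["Transaction Description"])
        if terms:
            found[row["Source Transaction ID"]] = terms
    return found


print(hyphenated_terms(SNIPPET))
# {'10001': ['long-term'], '10004': ['long-term']}
```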

On the Knowledge Base tab, create a new knowledge base with the name KB001.
On the Data Sets tab, select your training data set Spend001, and then click Classify from the available options. In the Classify Data Set window, select the KB001 knowledge base and start the classification process.

Here's an analysis of the classification result:

Transaction Description	Significant Keywords	Category Prediction without Printjoins	Category Prediction with Printjoins
Per diem for long-term project services (year-long)	term, services, year	Consulting
Per diem for long-term project services (year-long)	long-term, employee		Hardware

With the Printjoins lexer parameter, hyphenated words like long-term weren't broken down into individual keywords, and the category prediction was more accurate.
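
The shift in significant keywords can be reproduced approximately with a small Python sketch that compares the classified transaction against the training descriptions, with and without the hyphen as a Printjoins character. This is an illustration only. It doesn't run the knowledge base algorithm and doesn't predict a category. The tokenize helper and the small stop word list are assumptions made for this example.

```python
import re

# Assumption: a few common words are ignored so they don't count as keywords.
STOP_WORDS = {"a", "and", "for"}

TRAINING = {
    "Hardware": "Laptop purchase for new hire emp long-term onsite assignment",
    "Consulting": "Consulting services for term sheet prep and year end reporting",
    "Legal Services": "Retainers for a corporate event",
    "Project Expenditure": "Travel tickets for a long-term onsite assignment",
}
NEW_TRANSACTION = "Per diem for long-term project services (year-long)"


def tokenize(text: str, printjoins: str = "") -> list[str]:
    """Lowercase tokens; characters in printjoins stay inside tokens."""
    pattern = "[A-Za-z0-9" + re.escape(printjoins) + "]+"
    return [
        match.lower()
        for match in re.findall(pattern, text)
        if any(char.isalnum() for char in match)
    ]


def shared_keywords(printjoins: str = "") -> dict[str, list[str]]:
    """List the keywords the new transaction shares with each training row."""
    new_tokens = set(tokenize(NEW_TRANSACTION, printjoins)) - STOP_WORDS
    return {
        category: sorted(new_tokens & set(tokenize(description, printjoins)))
        for category, description in TRAINING.items()
    }


print(shared_keywords())
# {'Hardware': ['long', 'term'], 'Consulting': ['services', 'term', 'year'],
#  'Legal Services': [], 'Project Expenditure': ['long', 'term']}
print(shared_keywords(printjoins="-"))
# {'Hardware': ['long-term'], 'Consulting': ['services'],
#  'Legal Services': [], 'Project Expenditure': ['long-term']}
```

Without Printjoins, the fragments term and year pull the transaction toward the Consulting training row. With the hyphen preserved, those fragments no longer match, and long-term becomes the shared keyword instead.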