Oracle® Enterprise Data Quality for Product Data Knowledge Studio Reference Guide Release 5.6.2 Part Number E23610-03
This chapter describes techniques and information related to the classification of data using the Knowledge Studio.
This section describes the various classification functions that you can use to narrow the classification of your data, including an example.
The Addition function is intended to include two or more grammars whose union defines the classification of the item.
For example, consider 'Power Nail Stapler' versus 'Paper Stapler'. The term Stapler alone may be enough to classify the office product, but an additional attribute is needed to correctly classify Power Nail Stapler as a power tool. In that case, you would include Nail with the item Stapler.
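The Addition idea can be illustrated outside Knowledge Studio as a rule that applies only when all of its terms are present. The following Python is a minimal sketch and not the product's grammar engine; the rule table, the category names, and the `classify` function are hypothetical.

```python
# Hypothetical rule table: each rule is (required terms, category).
# A rule with more than one term models Addition: every term must be present.
RULES = [
    ({"stapler", "nail"}, "Power Tools"),
    ({"stapler"}, "Office Products"),
]


def classify(description):
    """Return the category of the most specific rule whose terms all appear."""
    tokens = set(description.lower().split())
    matches = [(terms, category) for terms, category in RULES if terms <= tokens]
    if not matches:
        return None
    # Prefer the rule with the most terms, that is, the most specific match.
    return max(matches, key=lambda match: len(match[0]))[1]


print(classify("Power Nail Stapler"))
print(classify("Paper Stapler"))
```

In this sketch, Stapler alone selects the office product, while Stapler combined with Nail selects the power tool.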
The Masking function is intended to disqualify all grammars below the masked phrase. It is typically used for inclusions that are not part of the primary item to be classified.
For example, in 'Drill with Charger' the item is Drill, not Charger. An example showing the use of Masking and the associated grammars follows.
The Negation function is intended to disqualify all grammars where the inclusion or preposition is implied but not stated.
For example, in 'Toner Cartridge HP Printer' the item is a Toner Cartridge, not a printer.
The Parent function is intended to reference a grammar at a higher level in the classification tree. It applies inheritance from the higher level to a lower level where other discriminating attributes are defined.
For example, resistors include both fixed and variable types. The [product_resistor] term would reside at the resistor level in the schema, and variable or fixed would reside at a lower level in the tree. The connection between [product_resistor] and [attr_variable] is made through the term $parent + [attr_variable], where $parent references [product_resistor]. This is useful for bulk classifying data at a higher level to get an initial classification, and then refining the classification at a later stage.
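The way $parent stands in for the term defined one level up can be sketched with a small lookup. This Python is illustrative only and does not reflect how Knowledge Studio stores grammars; the node names, the `PARENT_OF` and `NODE_TERMS` tables, the `[attr_fixed]` term, and the `resolve` function are assumptions made for the example.

```python
# Hypothetical classification tree: child node -> parent node.
PARENT_OF = {
    "variable_resistor": "resistor",
    "fixed_resistor": "resistor",
}

# Hypothetical grammar terms attached to each node.
# "$parent" is a placeholder for whatever the parent node defines.
NODE_TERMS = {
    "resistor": ["[product_resistor]"],
    "variable_resistor": ["$parent", "[attr_variable]"],
    "fixed_resistor": ["$parent", "[attr_fixed]"],  # [attr_fixed] is assumed
}


def resolve(node):
    """Expand $parent recursively into the parent node's terms."""
    resolved = []
    for term in NODE_TERMS[node]:
        if term == "$parent":
            resolved.extend(resolve(PARENT_OF[node]))
        else:
            resolved.append(term)
    return resolved


print(resolve("variable_resistor"))
```

An item matching only [product_resistor] can be classified at the resistor level first, and the lower-level nodes refine it once the discriminating attribute is recognized.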
This example shows a collection of items that pose complications in classification. Using the previously described functions removes the ambiguities, allowing each item to be uniquely classified.
To add masking, drag the root-level grammar to be masked over the mask icon at the top of the classification tree.
All grammars that appear under this grammar are then hidden from further classification. Any grammar that is hidden under the masked grammar but is visible in other phrase structures can still be used for classification.
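The effect of masking on 'Drill with Charger' can be approximated with a toy model in which everything inside the masked phrase is ignored. This is a simplified sketch under stated assumptions, not the product's behavior: the `CATEGORY_OF` table, the use of the word "with" as the start of the masked phrase, and both functions are hypothetical.

```python
# Hypothetical term-to-category table.
CATEGORY_OF = {"drill": "Power Drills", "charger": "Battery Chargers"}

# Assumption: the masked phrase starts at "with", so everything from
# that word onward is treated as an inclusion and hidden.
MASK_MARKER = "with"


def visible_tokens(description):
    """Return the tokens that remain visible after masking."""
    tokens = description.lower().split()
    if MASK_MARKER in tokens:
        tokens = tokens[: tokens.index(MASK_MARKER)]
    return tokens


def classify(description):
    """Classify on the first visible token that has a known category."""
    for token in visible_tokens(description):
        if token in CATEGORY_OF:
            return CATEGORY_OF[token]
    return None


print(classify("Drill with Charger"))
print(classify("Battery Charger"))
```

The second call shows that a term hidden inside a masked phrase still classifies an item when it appears in a different, unmasked phrase structure.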
You can configure as many classification types as you need using the Classification Type feature. The benefit is that the phrases and terms defined in a single data lens project can be reused to define any number of classification results.
For example, you may want to classify data to a UNSPSC schema and simultaneously to an eCl@ss schema or to a user-defined schema.
For the process of creating a new classification type, see "Classification Type".
You should apply the following considerations when configuring multiple classification schemas:
When creating a name for the new classification type, you should include the classification version number information in the name to enable differentiation. For example, when using UNSPSC 11.1, use a name that is similar, like UNSPSC_11_1.
To reuse the rules already created in a previous classification type, select the 'Base classifications on other classifications' check box, and then select the classification type on which you want to base the new type.
At some point, you may need to upgrade from one classification type to a newer version of that same classification type. For example, you can upgrade from UNSPSC classification version 9.2 to the newer version 11.1. When you upgrade to a newer classification type, you can retain all of the knowledge in the previous version.
Upgrading from one classification type to a new type requires basing the new type on the existing type. When using the process to create a new classification type, ensure that you select the Base classifications on other classifications check box so that you can select the appropriate schema to base the new type on from the Based On: list. For more information, see "Classification Type".
If Knowledge Studio encounters a classification mapping that existed in the previous version but no longer exists in the new version, a message is displayed that indicates the nature of the change in the category structure.
Both classification schemas are loaded and you can toggle between the two by using the black arrows.
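The kind of check that produces such a message can be sketched as a comparison between existing mappings and the categories of the new schema. This Python is a hedged illustration, not Knowledge Studio's upgrade logic; the `find_orphaned_mappings` function, the rule names, and the category codes are invented placeholders and are not real UNSPSC codes.

```python
def find_orphaned_mappings(rule_to_category, new_categories):
    """Return the mappings whose category is absent from the new schema."""
    return {
        rule: category
        for rule, category in rule_to_category.items()
        if category not in new_categories
    }


# Placeholder data: mappings carried over from the previous version,
# and the set of categories defined by the new version.
old_mappings = {"rule_stapler": "OLD-100", "rule_drill": "OLD-200"}
new_categories = {"OLD-100", "NEW-300"}

for rule, category in find_orphaned_mappings(old_mappings, new_categories).items():
    print(f"{rule}: category {category} no longer exists in the new schema")
```

Mappings that survive the comparison carry over unchanged, which corresponds to retaining the knowledge from the previous version.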
The Knowledge Studio allows you to create your own custom schema that can be used to auto-classify; this is known as a User-Defined Classification Type.
The following example creates a Parent/Category/Description schema:
Open Excel and create a new spreadsheet.
In the first row, create a header consisting of the following three columns: Parent, Category, and Description.
Enter your schema hierarchy into each column as appropriate. The Parent column can be left blank for the highest tree nodes; however, it must be entered for all children nodes.
To save the spreadsheet as a comma-delimited file, select Save As… from the File menu.
From the Save as type list, select CSV (comma delimited) (*.csv).
Enter a file name and click Save.
The schema file you have just created is saved as a comma delimited file.
You can use any text editor to create a comma-delimited file that contains the same information as described in the previous section. Ensure that the first line of this file contains the following header information:
Parent, Category, Description
A simpler Parent/Category schema can be created in the same manner if you have a classification with no codes or the category is the code.
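A schema file of this shape can also be generated with a script. The following Python sketch uses the standard `csv` module to write the header and rows, and checks that every non-blank Parent refers to a defined Category. The sample rows and both function names are hypothetical, and the header is written without spaces after the commas, which is an assumption about what the import accepts.

```python
import csv
import io

# Hypothetical schema rows: (Parent, Category, Description).
# Parent is blank for the highest tree node only.
ROWS = [
    ("", "Tools", "All tools"),
    ("Tools", "Power Tools", "Electrically powered tools"),
    ("Power Tools", "Drills", "Handheld drills"),
]


def write_schema(rows, stream):
    """Write the header line followed by one line per schema row."""
    writer = csv.writer(stream)
    writer.writerow(["Parent", "Category", "Description"])
    writer.writerows(rows)


def missing_parents(rows):
    """Return parents that are referenced but never defined as a category."""
    categories = {category for _, category, _ in rows}
    return sorted(
        {parent for parent, _, _ in rows if parent and parent not in categories}
    )


buffer = io.StringIO()
write_schema(ROWS, buffer)
print(buffer.getvalue())
print("Undefined parents:", missing_parents(ROWS))
```

To produce a file instead of an in-memory string, pass a file opened with `open(path, "w", newline="")` as the stream.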
You create a new User-Defined Classification Type as you would any other type, except that you choose the comma-delimited file that you created as the Master Classification File. The following is an example of what a user-defined schema might look like:
For information about creating Classification Types, see "Creating a Classification Type".
When creating a user-defined classification schema, you can make it global. This allows a single user to update the classification schema, and the changes are made available to any data lens that uses that user-defined classification schema.
When adding a new user-defined classification type, ensure that you select the Global Classification File check box.
For information about creating Classification Types, see "Creating a Classification Type".
The Knowledge Studio supports extensions to the UNSPSC part classification system at the vendor specific level (level 5). Consult the Oracle Consulting Services customer training for details on how to create these extensions.
If your company has proprietary or internally developed classification systems, the Knowledge Studio has a format that allows these schemas to be imported. This format allows up to five levels of classification hierarchy. Consult the Oracle Consulting Services customer training for details on how to create these extensions.