Oracle® Enterprise Data Quality for Product Data Knowledge Studio Reference Guide Release 11g R1 (11.1.1.6) Part Number E29134-02
This chapter describes techniques and information related to the classification of data using the Knowledge Studio.
This section describes the various classification functions that you can use to narrow the classification of your data, including an example.
The Addition function is intended to include two or more grammars whose union defines the classification of the item.
For example, consider 'Power Nail Stapler' versus 'Paper Stapler'. The term Stapler alone may be enough to classify the office product, but an additional attribute is needed to correctly classify Power Nail Stapler as a power tool. In that case, you would include Nail with the item Stapler.
The Masking function is intended to disqualify all grammars below the masked phrase. It is typically used for inclusions that are not part of the primary item to be classified.
For example, 'Drill with Charger'. Here the item is Drill and not Charger. An example follows showing the use of Masking and the associated grammars.
The Negation function is intended to disqualify all grammars where the inclusion or preposition is implied but not stated.
For example, 'Toner Cartridge HP Printer'. The item is a toner cartridge, not a printer; the preposition "for" is implied but not stated.
The Parent function is intended to reference a grammar at a higher level in the classification tree. It applies inheritance from the higher level to a lower level where other discriminating attributes are defined.
For example, resistors include both fixed and variable types. The [product_resistor] term would reside at the resistor level in the schema, and variable or fixed would reside at a lower level in the tree. The connection between [product_resistor] and [attr_variable] is through the term $parent + [attr_variable], where $parent references [product_resistor]. This is useful for bulk classifying data at a higher level to get an initial classification, and then refining the classification at a later stage.
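The inheritance described above can be pictured with a small toy model. This is only an illustrative sketch, not the Knowledge Studio API: the `Category` class, the `expand_rule` helper, the node names, and the `[attr_fixed]` term are all hypothetical, and the way Knowledge Studio resolves $parent internally is not documented here.

```python
# Toy model only: class, function, node names, and [attr_fixed] are
# hypothetical and are not part of Knowledge Studio.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    rule: str
    parent: Optional["Category"] = None


def expand_rule(node: Category) -> str:
    """Replace $parent with the fully expanded rule of the parent category."""
    if "$parent" not in node.rule:
        return node.rule
    if node.parent is None:
        raise ValueError(f"{node.name} uses $parent but has no parent")
    return node.rule.replace("$parent", expand_rule(node.parent))


# The resistor level holds the product term; the lower levels add an attribute.
resistor = Category("Resistor", "[product_resistor]")
variable = Category("Variable Resistor", "$parent + [attr_variable]", resistor)
fixed = Category("Fixed Resistor", "$parent + [attr_fixed]", resistor)

if __name__ == "__main__":
    print(expand_rule(variable))  # [product_resistor] + [attr_variable]
```

In this sketch, data that matches only `[product_resistor]` would stay at the resistor level, while data that also matches an attribute term could be refined to a child category later.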
This example shows a collection of items that pose complications in classification. The use of the previously described functions removes ambiguities allowing each item to be uniquely classified.
To add masking, drag the root-level grammar to be masked over the mask icon at the top of the classification tree.
All grammars that appear under this grammar are then hidden from further classification. A grammar that is hidden under the masked grammar but is also visible in other phrase structures can still be used for classification.
You can configure as many classification types as you need using the Classification Type feature.
For example, you may want to classify data to an UNSPSC schema and simultaneously to an eCl@ss schema or to a user-defined schema.
You should apply the following considerations when using several schemas in a single data lens:
When creating a name for the new classification type, you should include the classification version number information in the name to enable differentiation. For example, when using UNSPSC 11.1, use a name that is similar, like UNSPSC_11_1.
To reuse the rules already created in a previous classification type, select the Base classifications on other classifications check box, and then select the classification type on which you want to base the new type.
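The naming consideration above can be applied consistently with a small helper. This is a minimal sketch under stated assumptions: the function name `classification_type_name` is hypothetical, and joining the parts with underscores simply follows the UNSPSC_11_1 example; Knowledge Studio itself does not require this exact pattern.

```python
import re


def classification_type_name(schema: str, version: str) -> str:
    """Join a schema name and version into one identifier, e.g. UNSPSC_11_1."""
    # Split on anything that is not a letter or digit, then rejoin with "_".
    parts = re.split(r"[^0-9A-Za-z]+", f"{schema} {version}")
    return "_".join(part for part in parts if part)


if __name__ == "__main__":
    print(classification_type_name("UNSPSC", "11.1"))  # UNSPSC_11_1
```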
At some point, you may need to upgrade from one classification type to a newer version of that same classification type. For example, you can upgrade from UNSPSC classification version 9.2 to the newer version 11.1. You can upgrade to a newer classification type and retain all of the knowledge in the previous version.
Upgrading from one classification type to a new type requires basing the new type on the existing type. When using the process to create a new classification type, ensure that you select the Base classifications on other classifications check box so that you can select the appropriate schema to base the new type on from the Based On: list. See "Classification Type".
If Knowledge Studio encounters a classification mapping that existed in the previous version but no longer exists in the new version, a message is displayed that indicates the nature of the change in the category structure.
Both classification schemas are loaded and you can toggle between the two by using the black arrows.
The Knowledge Studio allows you to create your own custom schema that can be used to auto-classify; this is known as a User-Defined Classification Type. You can use one of the template schemas as a master classification file when creating your new classification type, or you can modify the examples in Excel directly to add your data. You must use the correct header row as shown in one of the templates. A set of Excel (.csv) templates is delivered with Knowledge Studio as follows:
Defines parent and child Item Definitions by ID.
Defines parent and child Item Definitions by name.
Defines all levels of Item Definitions by name.
Defines all levels of Item Definitions by name then ID.
Defines all levels of Item Definitions by ID.
Defines all levels of Item Definitions by ID then name.
These Excel template files are installed on the user's system in the C:\Users\user_name\AppData\Roaming\DataLens\system\schemas directory.
The following example creates a parent and child ID schema:
Open Excel to a new worksheet or open the UserDefined_Parent_Child_IDs_Template.csv template file.
Ensure that the first row of columns A through C contains Parent, Category, and Description, respectively.
Enter your schema hierarchy into each column as appropriate. The Parent column can be left blank for the highest tree nodes; however, it must be entered for all children nodes as in the template.
To save the spreadsheet as a comma-delimited file, select Save As from the File menu.
From the Save as type list, select CSV (comma delimited) (*.csv).
Enter a file name and click Save.
The schema file you have just created is saved as a comma delimited file. You can use it to create a new classification type. See "Classification Type".
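Before using the saved file as a master classification file, it can be worth checking it against the rules in the steps above. The following is a minimal sketch, not a Knowledge Studio tool: the `check_schema` function and the sample IDs are hypothetical, and the check that every non-blank Parent value matches a Category value elsewhere in the file is an assumption about how the parent and child IDs relate.

```python
import csv
import io

EXPECTED_HEADER = ["Parent", "Category", "Description"]


def check_schema(lines):
    """Return a list of problems found in a Parent/Category/Description file."""
    rows = list(csv.reader(lines, skipinitialspace=True))
    if not rows:
        return ["file is empty"]
    header = [cell.strip() for cell in rows[0]]
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]

    # Keep non-blank data rows, numbered from 2 because row 1 is the header.
    body = [
        (number, [cell.strip() for cell in row])
        for number, row in enumerate(rows[1:], start=2)
        if any(cell.strip() for cell in row)
    ]
    categories = {row[1] for _, row in body if len(row) > 1 and row[1]}

    problems = []
    for number, row in body:
        if len(row) < 2 or not row[1]:
            problems.append(f"row {number}: missing Category")
        elif row[0] and row[0] not in categories:
            # Assumption: a Parent value must name a Category in the same file.
            problems.append(f"row {number}: unknown Parent {row[0]!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data; the last row points at a parent that is absent.
    sample = io.StringIO(
        "Parent,Category,Description\n"
        ",100,Tools\n"
        "100,110,Power Tools\n"
        "999,120,Hand Tools\n"
    )
    print(check_schema(sample))
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`.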
You can use any text editor to create a comma-delimited file that contains the same information as described in the previous section. Ensure that the first line of this file contains the following header information:
Parent, Category, Description
A simpler Parent/Category schema can be created in the same manner if you have a classification with no codes or the category is the code.
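As an alternative to a text editor, the same comma-delimited file could be generated with a short script. This is a sketch only: the `write_schema` function, the output file name, and the sample rows are hypothetical. The header is written without spaces after the commas, which matches what a CSV export from Excel typically produces; whether Knowledge Studio is sensitive to that spacing is not stated in this guide.

```python
import csv

FULL_HEADER = ("Parent", "Category", "Description")
SIMPLE_HEADER = ("Parent", "Category")  # for schemas with no separate codes


def write_schema(path, rows, header=FULL_HEADER):
    """Write a user-defined schema as a comma-delimited file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row in rows:
            if len(row) != len(header):
                raise ValueError(f"expected {len(header)} columns, got {row!r}")
            writer.writerow(row)


# Hypothetical sample data: Parent is blank for the highest tree node.
rows = [
    ("", "100", "Tools"),
    ("100", "110", "Power Tools"),
]

if __name__ == "__main__":
    write_schema("user_defined_schema.csv", rows)
```

For the simpler Parent/Category form, pass `header=SIMPLE_HEADER` and two-column rows.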
You create a new User-Defined Classification Type as you would any other type, except that you choose the comma-delimited file that you created as the Master Classification File. The following is an example of what a user-defined schema might look like:
For information about creating Classification Types, see "Creating a Classification Type".
You can create a user-defined schema that is global. This allows a single user to update the classification schema, and the changes are then made available to any data lens that uses that user-defined classification schema.
When adding a new user-defined classification type, ensure that you select the Global Classification File check box.
For information about creating Classification Types, see "Creating a Classification Type".