Oracle® Enterprise Data Quality for Product Data Knowledge Studio Reference Guide
Part Number E23610-03
A Smart Glossary is a set of semantic knowledge (phrases and terminology) that can be imported into other data lenses.
Importing Smart Glossaries is an effective way to reuse knowledge. When you import a Smart Glossary, your data lens quickly and efficiently gains the phrase and term rules contained in the Smart Glossary. If the Smart Glossary is later modified and you import it again, the term and phrase rules already in your data lens are not changed, whether or not you have modified them since the previous import. Some standardization rules may be affected, depending on the options you select, as explained later in this section.
You can import a Smart Glossary into a new or existing data lens using the following steps:
From the File menu, click Import Smart Glossaries.
Select one or more of the Smart Glossaries listed. Hold down the Ctrl key to select multiple, non-adjacent items in the list.
Choose one of the following Standardization Options:
Import new standardization rules only
This option merges new term and phrase rules and new standardization rules from the Smart Glossary with the rules that are already in your target data lens. If you import the Smart Glossary again, it does not overwrite changes you have made in your target data lens. This is the default.
Merge new standardization productions
This option imports new standardization rules and also adds to your target data lens any standardization productions that have been added to the Smart Glossary since the last import. If you import the Smart Glossary again, changes you have made to standardization productions in your target data lens are preserved.
Replace all standardization rules
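This option replaces all of the standardization rules in your target data lens with the standardization rules from the Smart Glossary, implementing global changes in standardization rules for your target data lens.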
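The difference between the three Standardization Options can be sketched in Python. This is a simplified model under stated assumptions, not how Knowledge Studio stores rules: each standardization rule is represented as a dictionary of productions (input form to standardized output), and "added since the last import" is approximated as "not yet present in the target data lens". The names StandardizationOption and import_standardization, and the sample rule names, are hypothetical.

```python
from copy import deepcopy
from enum import Enum

# Hypothetical model: a standardization rule name maps to its productions,
# where each production maps an input form to a standardized output.
Rules = dict[str, dict[str, str]]


class StandardizationOption(Enum):
    NEW_RULES_ONLY = "Import new standardization rules only"
    MERGE_PRODUCTIONS = "Merge new standardization productions"
    REPLACE_ALL = "Replace all standardization rules"


def import_standardization(target: Rules, glossary: Rules,
                           option: StandardizationOption) -> Rules:
    """Return the target's standardization rules after a simulated import."""
    if option is StandardizationOption.REPLACE_ALL:
        # Global change: the glossary's rules replace the target's rules.
        return deepcopy(glossary)

    result = deepcopy(target)  # never mutate the caller's data lens model
    for name, productions in glossary.items():
        if name not in result:
            # Both merge options add rules the target does not have yet.
            result[name] = dict(productions)
        elif option is StandardizationOption.MERGE_PRODUCTIONS:
            # Add productions missing from the target; keep the target's own edits.
            for source, output in productions.items():
                result[name].setdefault(source, output)
    return result
```

For example, if the target maps cerulean to "Light Blue" and the glossary maps it to "Blue" and also adds crimson, only REPLACE_ALL overwrites the local "Light Blue"; MERGE_PRODUCTIONS keeps it and adds crimson.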
The Smart Glossary is imported into your data lens and the knowledge is applied.
You can create a new Smart Glossary using the following steps:
Note: Creating importable Smart Glossaries is an advanced feature. If you receive error messages that you cannot debug when you import the new Smart Glossary, contact Oracle Consulting Services for assistance.
Open the data lens you want to designate as a Smart Glossary.
On the Data Lens menu, click Data Lens Options.
The Data Lens Options dialog is displayed.
Select the Importable Lens check box so that this Smart Glossary can be imported into other data lenses.
Note: To activate the Importable Lens functionality, you must contact Oracle Consulting Services.
Check in the data lens to the Oracle DataLens Server.
The data lens is now importable into other data lenses as a Smart Glossary.
This section describes the Enterprise DQ for Product Smart Glossaries included in the software release. Smart Glossary files are identified by the DLS_ prefix. Item Definitions are not used in the Smart Glossaries, although the Smart Glossaries can be imported into data lenses that use Item Definitions.
All Smart Glossaries have undergone extensive testing over a large variety of data to enable recognition of the most common relevant forms across the majority of data sets. For your specific data, however, a subject matter expert (SME) should review the recognition output to ensure that the results are correct for your purposes.
The Colors Smart Glossary (DLS_Colors) is designed to help you quickly recognize colors and color families. Colors are organized into color families. A "Basic Colors" standardization type is provided that allows you to standardize color terms to color families. For example, the color cerulean would be standardized to the blue color family.
The "Basic Colors" standardization type standardizes each color to one of 11 color families.
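Conceptually, a standardization type like "Basic Colors" behaves as a lookup from a recognized color term to its color family. The sketch below assumes a plain Python dictionary; apart from cerulean to blue, which the text gives, the entries are hypothetical and do not reproduce the 11 families shipped in DLS_Colors.

```python
from typing import Optional

# Hypothetical excerpt of a "Basic Colors"-style mapping. Only cerulean -> blue
# comes from the documentation; the other entries are illustrative.
BASIC_COLORS = {
    "cerulean": "blue",
    "navy": "blue",
    "crimson": "red",
    "scarlet": "red",
    "olive": "green",
}


def standardize_color(term: str) -> Optional[str]:
    """Return the color family for a color term, or None if it is not mapped."""
    return BASIC_COLORS.get(term.strip().lower())
```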
The Counts Smart Glossary (DLS_Counts) is designed to help you quickly recognize counts of specific items such as legs or outlets.
The Counts Smart Glossary recognizes different types of counted items that appear in domains such as electronic components, retail, lighting, and domestic appliances. This Smart Glossary recognizes integers from small values (such as '2') to large values (such as '12,000'), as well as alphabetic representations of integers from 'one' to 'twelve'.
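The number forms described above can be approximated with a regular expression. This is a sketch only: Knowledge Studio expresses recognition through term and phrase rules, not Python regular expressions, and the counted items listed here (other than legs and outlets) and the parse_count name are hypothetical. Because the pattern ignores context, it also reads 'UPC: 123123123 door' as a count of 123,123,123 doors, which is the kind of part-number ambiguity a phrase rule has to be adjusted for.

```python
import re
from typing import Optional

WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
# Digits with comma grouping ('12,000'), plain digits ('12000'), or 'one'..'twelve'.
NUMBER = r"\d{1,3}(?:,\d{3})+|\d+|" + "|".join(WORD_NUMBERS)

# Hypothetical counted items; "leg" and "outlet" come from the documentation.
COUNT_PATTERN = re.compile(
    rf"\b(?P<qty>{NUMBER})[\s-]+(?P<item>leg|outlet|door|drawer)s?\b",
    re.IGNORECASE,
)


def parse_count(text: str) -> Optional[tuple[int, str]]:
    """Return (quantity, counted item) for the first count found, else None."""
    match = COUNT_PATTERN.search(text)
    if match is None:
        return None
    qty = match.group("qty").lower()
    value = WORD_NUMBERS[qty] if qty in WORD_NUMBERS else int(qty.replace(",", ""))
    return value, match.group("item").lower()
```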
The following are examples of the forms recognized:
This Smart Glossary excludes terms that are found in DLS_Product_Packaging, such as 'pair', 'item', and 'count'.
The variants recognized for the terms in this lens are reasonable abbreviations and likely misspellings.
One known ambiguity has been identified. If the data contains a part number followed by a counted term that appears in DLS_Counts (for example, 'UPC: 123123123 door'), the part number will not be properly recognized. This is easily fixed by removing the improper phrase structure rule from
The Packaging for Sale Smart Glossary (DLS_Product_Packaging) recognizes a set of common packages, quantities, and units used to describe packaging for sale of merchandise. The data lens has been tested against products in a large selection of markets for packaged goods, including office supplies, tissues, biowaste disposal products, toys, paper products, household supplies, food, garden supplies, and hand tools.
The Packaging for Sale Smart Glossary recognizes different types of packaging and all combinations of those packaging types, such as tubes per box, boxes per carton, tubes per carton, boxes per case, and so on. It recognizes numerical quantities with and without comma separators ('12,000' or '12000'), numbers spelled out from 'one' to 'twelve', and quantity words such as 'pair', 'dozen', 'ream', and 'gross'.
This Smart Glossary recognizes two levels of packaging:
Units per package, such as '18 tubes per box'
Packages per container, such as '28 boxes per case'
All units are standardized to numerals.
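The two-level pattern and the normalization to numerals can be sketched as a regular expression plus a lookup table. This is an illustration under assumptions, not the rule set in DLS_Product_Packaging: the package list and the parse_packaging name are hypothetical, and the values for 'ream' (500) and 'gross' (144) are common conventions assumed here.

```python
import re
from typing import Optional

# Quantity words normalized to numerals. "ream" = 500 and "gross" = 144 are
# common conventions assumed here; check them against your own data.
WORD_QUANTITIES = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
    "pair": 2, "dozen": 12, "ream": 500, "gross": 144,
}
QTY = r"\d{1,3}(?:,\d{3})+|\d+|" + "|".join(WORD_QUANTITIES)
# Hypothetical package types; add or remove entries (for example "pad").
PACKAGE = r"(?:tube|box|carton|case|pad|bag)"

PACKAGING_PATTERN = re.compile(
    rf"\b(?P<qty>{QTY})\s+(?P<unit>{PACKAGE})(?:es|s)?\s+per\s+(?P<container>{PACKAGE})\b",
    re.IGNORECASE,
)


def parse_packaging(text: str) -> Optional[tuple[int, str, str]]:
    """Return (numeric quantity, unit, container) for phrases like '18 tubes per box'."""
    match = PACKAGING_PATTERN.search(text)
    if match is None:
        return None
    qty = match.group("qty").lower()
    count = WORD_QUANTITIES.get(qty)
    if count is None:
        count = int(qty.replace(",", ""))  # '12,000' and '12000' both become 12000
    return count, match.group("unit").lower(), match.group("container").lower()
```

With this sketch, '14 ounces per box' is not matched because 'ounces' is not a package type, which mirrors the weight-based limitation of the Smart Glossary.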
If your data requires text-based quantity terms or package types not included in the Smart Glossary rule set, you can easily modify existing rules to accommodate these.
While this Smart Glossary is designed to maximize recognition of packaging units, some items represent packages in one domain and items or products in another. For example, paper products are sometimes produced in sheets that are packaged in pads. If pads represent items rather than packaging, you could easily modify the target data lens to exclude 'pad' as a package type.
This Smart Glossary does not recognize pricing information. Prices are commonly excluded from input data. If you want to accommodate price information in data that includes packaging for sale information, you could add a phrase rule that includes both pricing information and packaging quantity information to differentiate these two types of information. The best practice is to eliminate the price information from data sets.
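A phrase rule that covers both pricing and packaging quantity lets the two be told apart in the same text. The sketch below stands in for such a rule with a single Python regular expression; the pattern, the package list, and the classify_phrases name are hypothetical and only illustrate the idea of labeling each phrase by type.

```python
import re

# Hypothetical combined rule: one alternative for prices, one for packaging
# quantities, so "$4.99 per box" is not mistaken for a packaging quantity.
PHRASE_PATTERN = re.compile(
    r"(?P<price>\$\d+(?:\.\d{2})?)\s*(?:per|/)\s*(?:box|case|carton)\b"
    r"|(?P<qty>\d+)\s+[a-z]+\s+per\s+(?:box|case|carton)\b",
    re.IGNORECASE,
)


def classify_phrases(text: str) -> list[tuple[str, str]]:
    """Label each matched phrase as 'price' or 'packaging'."""
    labeled = []
    for match in PHRASE_PATTERN.finditer(text):
        kind = "price" if match.group("price") else "packaging"
        labeled.append((kind, match.group(0)))
    return labeled
```

Once phrases are labeled this way, the price phrases can also simply be dropped, in line with the best practice of removing price information from the data set.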
This Smart Glossary does not recognize units that are quantified by weight, such as '14 ounces per box'. It also does not generally recognize descriptions written in the style of mathematical formulas, such as 'bags per box [=] 15 boxes per case [=] 12'.
The Units of Measure Smart Glossary (DLS_Units_of_Measure) gives users a quick start at detecting the most common units of measure with minimal effort.
This Smart Glossary recognizes a broad range of common units of measure to serve a large number of target markets, including:
Length and distance
Data and data rates
The Units of Measure Smart Glossary also accommodates unit conversion if you need to convert between units of the same type, such as the following:
Length and distance - Meters to feet or inches
Volume - Liters to gallons or quarts
Power - Kilowatts to watts
Resistance - Ohms to kilohms
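The conversions listed above follow the usual pattern of scaling through a base unit for each dimension. A minimal sketch, assuming US liquid gallons and quarts and hypothetical unit keys; it does not describe how the Smart Glossary itself performs conversion.

```python
# Factors that convert each unit to a base unit of its dimension (meters,
# liters, watts, ohms). Gallons and quarts are assumed to be US liquid measures.
FACTORS = {
    "length": {"m": 1.0, "ft": 0.3048, "in": 0.0254},
    "volume": {"l": 1.0, "gal": 3.785411784, "qt": 0.946352946},
    "power": {"w": 1.0, "kw": 1000.0},
    "resistance": {"ohm": 1.0, "kohm": 1000.0},
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert value between two units of the same dimension."""
    for units in FACTORS.values():
        if from_unit in units and to_unit in units:
            # Scale up to the base unit, then down to the target unit.
            return value * units[from_unit] / units[to_unit]
    raise ValueError(f"cannot convert {from_unit!r} to {to_unit!r}")
```

Requests that mix dimensions, such as meters to watts, raise ValueError rather than returning a meaningless number.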
Some unavoidable conflicts between terms or their abbreviations exist within the Units of Measure Smart Glossary. As a result, after import you might need to delete some rules or augment the rule set (with additional rules or with Item Definitions) to uniquely identify the desired units in your data, as explained in this section.
Some standard abbreviations are excluded from the Smart Glossary, or included for only one of their meanings, to avoid ambiguities with other terms that share the same abbreviation:
M - Used as an abbreviation for meters and for megabytes or megabits; is included as an abbreviation for meters only
W - Used as an abbreviation for 'width' and for 'watts'; is not included
L - Used as an abbreviation for 'length' and 'liters'; is not included
F - Used as an abbreviation for both Fahrenheit and Farad; is included as an abbreviation for Fahrenheit only
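When you augment the rule set to handle one of these excluded abbreviations, the deciding factor is usually context. The sketch below shows one hypothetical context rule for W, written as a Python regular expression rather than as Knowledge Studio rules: W after a number is read as width when it sits in a dimension expression (a standalone x just before or after it) and as watts otherwise. The rule and the resolve_w name are illustrative assumptions, not behavior of DLS_Units_of_Measure.

```python
import re

# Hypothetical context rule: "W" after a number means width when a standalone
# "x" appears immediately before or after it, and watts otherwise.
W_PATTERN = re.compile(
    r"(?P<before>(?<![A-Za-z])[xX×]\s*)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*W\b"
    r"(?P<after>\s*[xX×](?![A-Za-z]))?"
)


def resolve_w(text: str) -> list[tuple[float, str]]:
    """Return (value, meaning) for each number followed by the abbreviation W."""
    results = []
    for match in W_PATTERN.finditer(text):
        in_dimension = match.group("before") or match.group("after")
        meaning = "width" if in_dimension else "watts"
        results.append((float(match.group("value")), meaning))
    return results
```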
Additionally, your data may include product numbers or product codes that could be detected as units of measure. You can correct this with minimal refactoring of the target data lens, using strategies such as removing unused productions from rules, removing line-initial and line-final quotation marks, or using Item Definitions to differentiate items in their context.
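One likely reason quotation marks matter is that a double quote after a number can be read as an inch mark, so a quoted code ending in digits may look like a length. A minimal preprocessing sketch, assuming records arrive as text lines that may be wrapped in double quotes (for example by a CSV export); it strips the quotes only when a line both starts and ends with one, so a genuine trailing inch mark such as '24"' survives. The function name is hypothetical.

```python
def strip_wrapping_quotes(line: str) -> str:
    """Remove a leading and trailing double quote only when both are present."""
    line = line.strip()
    if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
        return line[1:-1]
    return line
```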
In addition, while C and F are recognized in this smart glossary as abbreviations for the temperature scales Celsius and Fahrenheit, this may occasionally cause unintended results. You can correct this easily by such methods as removing the abbreviations where they are not needed or employing value logic within Item Definitions to rule out invalid temperature ranges. For assistance with setting value logic, contact Oracle Consulting Services.
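Value logic is configured in Item Definitions, not in code; as a rough analogue only, the kind of range check it performs can be sketched in Python. The plausibility bounds below, the regular expression, and the plausible_temperatures name are all hypothetical, and a range check does not catch every false positive (a code such as 'AB12F' still yields 12 F).

```python
import re

# Hypothetical plausibility ranges per scale; choose bounds that suit your products.
VALID_RANGES = {"C": (-100.0, 1000.0), "F": (-150.0, 1800.0)}
TEMP_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*°?\s*([CF])\b")


def plausible_temperatures(text: str) -> list[tuple[float, str]]:
    """Return (value, scale) pairs whose values fall inside the assumed ranges."""
    found = []
    for match in TEMP_PATTERN.finditer(text):
        value, scale = float(match.group(1)), match.group(2)
        low, high = VALID_RANGES[scale]
        if low <= value <= high:  # drop values that cannot be real temperatures
            found.append((value, scale))
    return found
```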
The Units of Measure Retail Smart Glossary (DLS_Units_of_Measure_Retail) contains only the units of measure commonly found in retail data, as more fully described in the next section. For recognition of more specialized units of measure, such as farads, picofarads, joules, microhenrys, or AWG values, import the standard Units of Measure Smart Glossary, DLS_Units_of_Measure. Use DLS_Units_of_Measure_Retail to recognize units of measure in retail data without adding extra term and phrase rules for less common units of measure.
Some of the types of units this smart glossary recognizes are:
This Smart Glossary is designed for use without DLS_Units_of_Measure. If you import DLS_Units_of_Measure_Retail into a data lens into which you have previously imported DLS_Units_of_Measure, the hierarchical structure of DLS_Units_of_Measure_Retail may combine with the hierarchical structure of
In addition, while C and F are recognized in this Smart Glossary as abbreviations for the temperature scales Celsius and Fahrenheit, this can occasionally cause unintended results. You can correct this easily by such methods as removing the abbreviations where they are not needed or employing value logic within Item Definitions to rule out invalid temperature ranges. For assistance with setting value logic, contact Oracle Consulting Services.