Oracle® Enterprise Data Quality for Product Data Knowledge Studio Reference Guide
Release 5.6.2

Part Number E23610-03

11 Standardizing Data Further

This chapter explains techniques and information related to the standardization of product data using the Knowledge Studio.

Term Standardization

The following sections explain how to use rules to further standardize your terms.

Case Replacement

You can override the case of string replacements. For example, if all string replacements have been set to lower case and you want to switch them to upper case, you can do this quickly by selecting the Case option on the string replacement tab.

For example, if the term 'resistor' was originally replaced with the proper-cased 'Resistor', you can switch the replacement to upper case using the Case option of the term's Rewrite rule.

Surrounding text describes stanfurcase.png.

The result of changing this setting is that all instances of the 'resistor' term are changed to upper case.

Surrounding text describes stanfurcaseresult.png.
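
The effect of a case override can be pictured with a short Python sketch. This is an illustration only, not Knowledge Studio code: the option names ("upper", "lower", "proper") and the function apply_case are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical option names; the labels in the Knowledge Studio may differ.
CASE_OPTIONS: Dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "lower": str.lower,
    "proper": str.title,
}


def apply_case(replacement: str, case_option: str) -> str:
    """Return the replacement text rewritten in the requested case."""
    try:
        return CASE_OPTIONS[case_option](replacement)
    except KeyError:
        raise ValueError(f"unknown case option: {case_option!r}") from None


# The proper-cased replacement 'Resistor' becomes upper case.
print(apply_case("Resistor", "upper"))
```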

Regular Expression Replacement

Regular expression replacement is meant for complex string replacement when the text being replaced can vary. The example in this section shows how to convert two-digit years to four-digit years and place each year in the correct century based on a predefined boundary.

Suppose that there is a defined rule for two-digit years and you want to convert them to four-digit years. To place the years in the correct century, all two-digit years 29 or below should be placed in the 21st century, and all two-digit years 30 or above should be placed in the 20th century.

First, you would create a terminology rule for year using a regular expression as follows:

Surrounding text describes regexexmp.jpg.

Next, you would create a regular expression string replacement that converts the two-digit years to four-digit years, similar to the following:

Surrounding text describes stanfurregexp.png.

Note:

This example strings together multiple regular expression replacements to achieve the desired result. Individual regular expression replacements are chained by separating each expression with a semicolon (;).

The example string replacement results in the following standardization result of two digit years:

Surrounding text describes stanfuryear.png.

As expected, the number 20 precedes all two-digit years below 30, and the number 19 precedes all two-digit years 30 or above.
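
The century boundary logic can be sketched in Python as follows. This is an illustration under stated assumptions, not the expressions used in the Knowledge Studio: the patterns, the CENTURY_RULES list, and the expand_year function are hypothetical, and the Knowledge Studio's own regular expression syntax may differ.

```python
import re

# Hypothetical pattern/replacement pairs, tried in order.
# Each pattern must match the whole two-digit term.
CENTURY_RULES = [
    (re.compile(r"([0-2][0-9])"), r"20\g<1>"),  # 00-29 -> 2000-2029
    (re.compile(r"([3-9][0-9])"), r"19\g<1>"),  # 30-99 -> 1930-1999
]


def expand_year(term: str) -> str:
    """Expand a two-digit year; return any other text unchanged."""
    for pattern, template in CENTURY_RULES:
        match = pattern.fullmatch(term)
        if match:
            return match.expand(template)
    return term


for year in ("05", "29", "30", "99"):
    print(year, "->", expand_year(year))
```

Requiring a full match keeps the rules from touching text that is already a four-digit year.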

Note:

Unless you are very familiar with regular expressions, the preceding replacements should only be used after training from Oracle Consulting Services.

Note:

For more information, see "Regular Expressions".

Individual Replacement

When performing individual replacements of terms, the case sensitivity of the terminology rule is honored by the replacement rule. For example, suppose the term rule is defined to be case insensitive (the default) and the replacement rule is as follows:

Surrounding text describes stanfurindr1.png.
Surrounding text describes stanfurindr2.png.

The preceding replacement rule replaces both 'RES' and 'res' with the text 'Resistor'.
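
As a rough Python analogy (not Knowledge Studio code), the replacement inherits a case-sensitivity flag from the term rule. The function replace_term and its case_sensitive parameter are hypothetical names introduced for this sketch.

```python
import re


def replace_term(text: str, term: str, replacement: str,
                 case_sensitive: bool = False) -> str:
    """Replace whole-word occurrences of term, honoring case sensitivity."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(rf"\b{re.escape(term)}\b", flags)
    # A function is used as the replacement so the text is inserted literally.
    return pattern.sub(lambda _match: replacement, text)


# Case insensitive (the default): both spellings are replaced.
print(replace_term("RES 10 OHM", "RES", "Resistor"))
print(replace_term("res 10 ohm", "RES", "Resistor"))
```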

Multiple Standardization Types

You can configure as many standardization types as needed using the Standardization Types dialog. The benefit is that the phrases and terms defined in a single data lens project can be reused to define any number of standard results.

For example, you may want to have a standardization type that creates a very readable long form description, without abbreviations, for use on your web site. You may also need a short form standardization type that creates an abbreviated description that conforms to the character length limits imposed by a database table.

For more information, see "Creating a Standardization Type".

Unit of Measure Standardization Types

If your standardization requirements call for the data to be converted to a single unit of measure, use a unit of measure conversion type.

If your data is expressed in multiple units of measure that are defined in terminology and phrase rules, you can configure those rules to convert the different units to a single unit of measure, producing a standard description.

For example, data for [a_ec_dimension_inch] can be in inches or feet, while the required unit of measure for the standardized data is inches. The text is parsed to the phrase and converted from 'foot' to 'inches', as in the following.

Surrounding text describes uomstantype.jpg.

You can create your own standardization schemas for use throughout your data lens, and standardization types can serve many different purposes.

Note:

If you change the default integer, fraction, or decimal rules that the Knowledge Studio creates with each data lens, the unit conversion fails. An example of such a change is adding the text 'one' to the [integer] rule.

Creating a Unit Conversion Type

  1. From the Data Lens menu, select Unit Conversion Types….

  2. Click the Add New button.

    Surrounding text describes stanfurunitcontype.png.
  3. Enter the requested information to create your new unit conversion type that will be added as a selection option to the Unit Conversion Types list.

  4. If you already have a unit conversion type created and you want to reuse that knowledge in a new version of the same standardization, select the Base type on other type check box, and then select the appropriate unit conversion type from the Based On: list.

    Note:

    This check box is not active if there are no other unit conversion types.
  5. Click OK.

Like standardization types, there can be multiple unit of measure conversion types associated with a data lens.

Deleting Unit Conversion Types

You can delete unit conversion types if necessary.

  1. Ensure that you have checked in your latest data lens version.

  2. From the Data Lens menu, select Unit Conversion Types….

  3. Right-click the unit conversion type that you want to delete.

  4. Click Delete Unit Conversion.

    A deletion verification dialog is displayed.

  5. If you want to delete the selected unit conversion type, click OK; otherwise, click Cancel.

  6. Click OK.

Creating a Unit Conversion for a Phrase

When creating a unit conversion for a phrase, every production that you want to convert must contain both a number and a unit; otherwise, the conversion fails.

  1. Select the Standardize tab and the Unit Conversion sub-tab.

  2. Select a phrase production that requires a unit of measure conversion.

    Note:

    Phrases that do not have a Unit of Measure Standardization type associated with them have a round, blue icon next to the phrase. Phrases that have a unit of measure conversion have a round, purple icon. Parent phrases of converted productions change from red boxes to round, yellow icons.
    Surrounding text describes stanfurunitcon2.png.
  3. Click Next to begin creating your unit conversion table.

    Surrounding text describes stanfurunitcon3.png.
  4. Click Next to accept the selection of the rule [number] for Number and [u_length] for the Unit to convert.

  5. Select an existing table name or enter a new table name.

    Surrounding text describes stanfurunitcon4.png.
  6. If this information is correct, click Next. If not, select the correct table for this phrase from the drop-down list, or create a new table.

  7. Click Next to advance the wizard.

    If a table with pre-existing unit of measure conversions is selected, they are displayed in the table; otherwise, a blank table appears.

    Surrounding text describes stanfurunitcon5.png.
  8. If you created a new table, define its unit of measure conversions by building a units table with the Add Row and Remove Row buttons.

  9. Enter a unit name when the dialog is displayed and then provide a conversion factor in the relative size field.

  10. Complete all rows as required and then click Next.

    Surrounding text describes stanfurunitcon6.png.
  11. Select the target unit of measure.

    For example, if the data contains feet or foot and needs to be converted to inches, then select the target [inch].

  12. Click Next.

The icon next to the phrase should change to purple to indicate that a unit of measure standardization type has been associated with the phrase, and a red box precedes the associated phrase rule.

To test the unit of measure standardization, select the Test Standardization sub-tab and enter text that should be converted.

Surrounding text describes stanfurunitcon7.jpg.

The unit of measure conversion should convert the text entered into the correct standardization unit of measure.
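
The relationship between the units table, the relative size of each unit, and the target unit can be sketched in Python. This is a simplified model, not the Knowledge Studio's implementation: the table contents, the LENGTH_TABLE name, and the convert function are assumptions, with inch taken as the base unit (relative size 1).

```python
import re

# Hypothetical units table: unit name -> relative size (inch is the base).
LENGTH_TABLE = {"inch": 1.0, "inches": 1.0, "foot": 12.0, "feet": 12.0}

# A convertible text must contain both a number and a unit.
_NUMBER_UNIT = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def convert(text: str, table: dict, target: str) -> str:
    """Convert 'number unit' text to the target unit using relative sizes."""
    match = _NUMBER_UNIT.fullmatch(text)
    if match is None:
        raise ValueError(f"expected a number and a unit, got {text!r}")
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit not in table or target not in table:
        raise ValueError(f"unit not defined in the units table: {unit!r} or {target!r}")
    converted = value * table[unit] / table[target]
    return f"{converted:g} {target}"


print(convert("2 foot", LENGTH_TABLE, "inch"))
```

Text that lacks either the number or the unit raises an error in this sketch, mirroring the requirement that every converted production contain both.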

Turning on unit conversion allows the use of ranges under standardized attributes, as well as value and search logic in an Item Definition. The unit conversion must be created to realize these benefits, though it does not need to be selected. For example, if you want to output fractions, do not disable Unit Conversion; instead, set it to none so that value logic and ranges still operate properly.

Share Standardizations Within a Data Lens

The effort to create standardization knowledge can be substantial, depending on your data. The ability to share (reuse) your standardization rules saves labor because you avoid entering the same information repeatedly in various Standardization Types and Item Definitions. While creating a Standardization Type based on an existing one is useful, sharing standardization knowledge after it has been refined goes a step further and makes it easier to keep defining your data.

The following sections describe how to copy global standardizations and Item Definition standardizations.

Throughout these sections the following terms are used:

Source

The Standardization Type or Item Definition that contains the knowledge (standardization rules) that you want to copy.

Target

The destination Standardization Type or Item Definition that you want to receive a set of selected knowledge.

Scope

How you want to copy the knowledge from the source to the target: new only, merge, or replace. Each copy operation causes the Knowledge Studio to inspect both the source and target from the highest level (phrase or parent Item Definition) to the lowest possible level (production or attribute) to determine the differences in each rule, production, and table (if relevant).

Contents

The standardization rules that you want to copy from the source to the target, which you select from a list. For example, data originally standardized to lower case is easily changed to upper case by changing the case setting rule in one Standardization Type and then copying it to the other Standardization Types in your data lens.

Copying Global Standardizations

You can copy the global standardization rules that you have refined in one Standardization Type to another in your data lens.

This option can be used on any tab in the Knowledge Studio because the current Standardization Type is the source, and you select the target Standardization Type to which the standardization rules are copied.

  1. From the Standardization Type list on the toolbar, select a source Standardization Type.

  2. From the Data Lens menu, click Copy Global Standardizations.

    Surrounding text describes copy_global_stan.gif.
  3. Select the target Standardization Type from the list.

  4. Select one of the following copy scope options to copy the global standardization rules from the selected Standardization Type:

    • Copy new only

      Only the standardization rules for the contents (selected in the next step) of the source Standardization Type that exist in the source Standardization Type and do not exist in the target Standardization Type are copied. In other words, the standardizations that are 'new' to the target Standardization Type are copied.

    • Merge

      The standardization rules for the contents (selected in the next step) of the source Standardization Type are merged with the target Standardization Type. Only standardization rules that do not exist in the target Standardization Type are copied. For example, new productions and entries in a rewrite table are copied. If conflicts are encountered, the rule is ignored and is not copied.

    • Replace

      All standardizations for the contents (selected in the next step) of the source Standardization Type are copied to the target Standardization Type overwriting existing standardization rules.

      If the Knowledge Studio detects that you are attempting to copy a blank source Standardization Type (contains no rules) to overwrite a target that contains rules, a message is displayed that it is not possible and the copy is terminated. Review the source and target Standardization Types to ensure that you identified them correctly.

  5. Select the contents that you want to copy with the Standardization Type using the check boxes as follows:

    • String Replacement

      All individual string replacement rules in all text replace tables (no replacement, replace all, regular expression replacement, and individual replacement tables), and the rewrite rules that appear in the Standardize Terms sub-tab of the Standardize tab.

    • Case

      All individual case replacement rules that appear in the Standardize Terms sub-tab of the Standardize tab. This does not include the default case setting for a Standardization Type.

    • Phrase Rewrite/Ordering

      All term and phrase ordering rules that are defined in the Standardize Phrases sub-tab of the Standardize tab.

    • Numeric

      All numeric text replacement rules in all value range rewrite rules tables that are defined in the Standardize Terms sub-tab of the Standardize tab including text and conversion tables.

    • Join

      All term joining rules defined at the phrase level (not the production level) in the Standardize Phrases sub-tab of the Standardize tab.

  6. Click OK.

    The global standardization rules are copied from the source to the target Standardization Type using the scope and contents you selected. A confirmation message detailing the changes is displayed upon completion.

Select and review the target Standardization Type to ensure that your rules were copied correctly.
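
The difference between the three copy scopes can be modeled with a short Python sketch. This is an interpretation for illustration only, not the Knowledge Studio's algorithm: rules are represented as a hypothetical mapping of rule name to rewrite table, and the function copy_rules and the scope names are assumptions. In particular, Replace is assumed here to leave target rules that have no counterpart in the source untouched.

```python
from copy import deepcopy


def copy_rules(source: dict, target: dict, scope: str) -> dict:
    """Return a new target after copying rules from source with the given scope."""
    result = deepcopy(target)
    if scope == "new_only":
        # Copy only rules that the target does not have at all.
        for name, table in source.items():
            if name not in result:
                result[name] = dict(table)
    elif scope == "merge":
        # Also add new table entries to existing rules; on a conflict,
        # the target entry is kept and the source entry is ignored.
        for name, table in source.items():
            merged = result.setdefault(name, {})
            for entry, replacement in table.items():
                merged.setdefault(entry, replacement)
    elif scope == "replace":
        if not source and target:
            raise ValueError("a blank source cannot overwrite a target that contains rules")
        # Overwrite every rule that the source defines.
        for name, table in source.items():
            result[name] = dict(table)
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return result


source = {"resistor": {"RES": "Resistor", "RESIS": "Resistor"}, "ohm": {"OHM": "ohm"}}
target = {"resistor": {"RES": "RES."}}
for scope in ("new_only", "merge", "replace"):
    print(scope, copy_rules(source, target, scope))
```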

Sharing Item Definitions Standardizations

You can share Item Definition standardization rules with other Item Definitions both within and across Standardization Types. This is generally performed after you have created and standardized an Item Definition. This functionality also allows you to continue to modify your Item Definition and copy the standardization rules to other Item Definitions and Standardization Types.

This feature relies on the existence of the same attribute, phrase, and production structure in both the source and target Item Definitions. Copying standardization rules between Item Definitions does not create attributes in the target that exist only in the source. In the following example, the Highlighters and Mechanical_Pencils Item Definitions both contain a Size attribute with the production [a_length]. The standardization rules for Size and [a_length] can be copied from one Item Definition to the other.

Surrounding text describes stanfuridcopy2.png.

The standardization rules for the productions of a phrase in an attribute that exist in both Item Definitions can be shared. In the following example, the standardization rules for the productions of [a_length] in Mechanical_Pencils can be shared with Highlighters.

Surrounding text describes stanfuridcopy1.png.

There are three options for copying standardization rules from one Item Definition to another:

Copy to…

Copies standardization rules from a source Item Definition in the currently selected Standardization Type to a target Item Definition in the target Standardization Type. Only the parent Item Definition is copied even if it contains child Item Definitions.

Copy with Children to…

Copies standardization rules from a parent Item Definition and all of its child Item Definitions in the currently selected Standardization Type to the same set of Item Definitions in the target Standardization Type.

Copy to another Item Definition

Copies standardization rules from a source Item Definition to a target Item Definition within the currently selected Standardization Type. Only the standardization rules for those attributes, phrases, and term rules that are common to both Item Definitions are copied.

You select the scope and depth of the standardization rules that you copy.
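
The restriction to common structure can be illustrated with a small Python sketch. It is a simplified model, not Knowledge Studio code: Item Definitions are represented as a hypothetical mapping of attribute name to rules, the rule values are placeholders, and copy_shared_rules is an assumed name. Source rules simply overwrite target rules here.

```python
def copy_shared_rules(source: dict, target: dict) -> dict:
    """Copy rules only for attributes present in both Item Definitions."""
    updated = {attribute: dict(rules) for attribute, rules in target.items()}
    for attribute in source.keys() & target.keys():
        updated[attribute].update(source[attribute])
    return updated


# Placeholder rule values; only the attribute names come from the example above.
mechanical_pencils = {"Size": {"[a_length]": "upper"}, "Color": {"[a_color]": "proper"}}
highlighters = {"Size": {}, "Tip": {"[a_tip]": "lower"}}

# Size is shared, so its rules are copied; Color is not created in the target.
print(copy_shared_rules(mechanical_pencils, highlighters))
```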

Tip:

You can copy inactive Item Definitions and activate them in the target Standardization Type or Item Definition.

Copying Item Definitions Between Standardization Types

  1. From the Standardization Type list on the toolbar, select a source Standardization Type.

  2. Select the Standardize Items tab and the Standardize Attributes sub-tab.

  3. Right-click the Item Definition that you want to copy, and then select the Copy to… or Copy with Children to… option.

    Note:

    If you select an Item Definition that has no children, the Copy with Children to… option is not available; the options are context-sensitive.
    Surrounding text describes copy_options.gif.

    Both dialogs are used in the same way; they differ only in how much of the source Item Definition is shared, as previously described.

  4. Select the target Standardization Type that you want to copy the source Item Definition to from the list of Standardization Types.

  5. Select one of the following options to define how the source Item Definition is copied:

    • Copy new only

      Only the standardization rules for the contents (selected in the next step) that exist in the source and do not exist in the target Item Definition in the target Standardization Type are copied. In other words, the standardization rules that are 'new' to the target Item Definition in the target Standardization Type are copied.

    • Merge

      The standardization rules for the contents (selected in the next step) of the source Item Definition in the source Standardization Type are merged with the target Item Definition in the target Standardization Type. Only standardization rules that do not exist in the target are copied. If conflicts are encountered, the rule is ignored and is not copied.

    • Replace

      All standardization rules for the contents (selected in the next step) of the source Item Definition in the source Standardization Type are copied to the target Item Definition in the target Standardization Type overwriting existing standardization rules.

      Note:

      The Item Definition attribute ordering and match ranking standardizations defined on the Order Attributes and Match Weights sub-tabs of the Standardize Items tab are not included when copying Item Definitions.
  6. Select the contents that you want to copy with the Item Definition using the check boxes as follows:

    • String Replacement

      All individual string replacement rules in all text replace tables (no replacement, replace all, regular expression replacement, and individual replacement tables), and the rewrite rules that appear in the Standardize Attributes sub-tab of the Standardize Items tab.

    • Case

      All case replacement rules that appear in the Standardize Attributes sub-tab of the Standardize Items tab.

    • Phrase Rewrite/Ordering

      All term and phrase ordering rules that are defined in the Standardize Attributes sub-tab of the Standardize Items tab.

    • Numeric

      All numeric text replacement rules in all value range rewrite rules tables that are defined in the Standardize Attributes sub-tab of the Standardize Items tab including text and conversion tables.

    • Join

      All term joining rules defined at the parent phrase level (not the production level) in the Standardize Phrases sub-tab of the Standardize tab.

      This check box is active only when the Copy Global Standardizations check box is selected.

  7. Select the Copy Global Standardizations check box to copy the global standardization rules within the source Item Definition based on your scope and contents selections. This option is "sticky" so your last selection will be redisplayed each time you use this dialog.

    Caution:

    If you select this option and the Item Definition copy makes no changes, the global copy is terminated and the copy fails. This is because it is risky to copy global changes when nothing was changed at the Item Definition level. Review what you intended to achieve by copying the Item Definition standardization rules. To force the global standardizations to copy, see "Copying Global Standardizations".
  8. Click OK.

    The source Item Definition standardization rules in the source Standardization Type are copied to the target Item Definition in the target Standardization Type using the scope and contents you selected. A confirmation message detailing the changes is displayed upon completion.

Select the target Standardization Type and Item Definition, and then review the standardization rules. Verify that the results you wanted have been achieved.

Caution:

When using the Replace option, if you attempt to replace more than 20% of the source standardization rules in the target Standardization Type, a warning is displayed. This warning is intended to help you avoid overwriting existing rules in the target and losing valuable standardization knowledge.

You should closely review your source and target copy results as it is possible to inadvertently overwrite standardization rules that you want to keep.

Copying Between Item Definitions

  1. From the Standardization Type list on the toolbar, select a source Standardization Type.

  2. Select the Standardize Items tab and the Standardize Attributes sub-tab.

  3. Right-click the source Item Definition that you want to copy from, and then select the Copy to another Item Definition option.

    Surrounding text describes copy_to_other.gif.
  4. Select the target Item Definition that you want to copy the source Item Definition to from the list.

  5. Select one of the following options to define how the Item Definition is copied:

    • Copy new only

      Only the standardization rules for the contents (selected in the next step) that exist in the source and do not exist in the target Item Definition are copied. In other words, the standardization rules for a production of an existing phrase in an attribute that are 'new' to the target Item Definition are copied.

    • Merge

      The standardization rules for the contents (selected in the next step) of the source Item Definition are merged with the target Item Definition. Only standardization rules that do not exist in the target are copied. If conflicts are encountered, the rule is ignored and is not copied.

    • Replace

      All standardization rules for the contents (selected in the next step) of the source Item Definition are copied to the target Item Definition, overwriting existing standardization rules.

      Note:

      The Item Definition attribute ordering and match ranking standardizations defined on the Order Attributes and Match Weights sub-tabs of the Standardize Items tab are not included when copying Item Definitions.
  6. Select the contents that you want to copy with the Item Definition, as defined on the Standardize Attributes sub-tab of the Standardize Items tab, using the check boxes as follows:

    • String Replacement

      All individual string replacement rules in all text replace tables (no replacement, replace all, regular expression replacement, and individual replacement tables), and the rewrite rules.

    • Case

      All case replacement rules.

    • Phrase Rewrite/Ordering

      All term and phrase ordering rules.

    • Numeric

      All numeric text replacement rules in all value range rewrite rules tables including text and conversion tables.

  7. Click OK.

    The source Item Definition rules are copied to the target Item Definition using the scope and contents you selected. A confirmation message detailing the changes is displayed upon completion. The Item Definition pane is refreshed and any expanded Item Definitions are collapsed.

Select the target Item Definition and review it to ensure that it was copied correctly.