Oracle® Enterprise Data Quality for Product Data Knowledge Studio Reference Guide
Release 11g R1 (126.96.36.199)
Part Number E29134-02
This guide describes basic and advanced techniques that you can use to maximize the effectiveness of the Enterprise DQ for Product (EDQP) Knowledge Studio. These techniques help refine your knowledge about your data and supply your Subject Matter Experts (SME) with in-depth information on important aspects of the DataLens methodology.
The Knowledge Studio allows you to create data lenses, which are collections of rules that enable the recognition, classification, and standardization of data. There are three main activities required to build a data lens:
Create rules to recognize the data and build variant forms into the lens.
Identify the attributes necessary to accurately define an item.
Create standardization rules for terms, phrases, and Item Definitions.
This reference guide will help you understand the process of building a data lens using writing instruments product data.
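The three activities above can be illustrated with a toy sketch. It is not how a data lens is stored or executed; the dictionaries, attribute names, and the `standardize` function are hypothetical and stand in for recognition rules with variant forms, Item Definition attributes, and standardization rules, using writing instruments data.

```python
import re

# Hypothetical variant forms, each mapped to one standard term (recognition).
TERM_VARIANTS = {
    "pen": "PEN", "pn": "PEN",
    "blk": "BLACK", "black": "BLACK",
    "ballpt": "BALLPOINT", "ballpoint": "BALLPOINT",
}

# Hypothetical attribute that each standard term fills (item definition).
TERM_ATTRIBUTE = {
    "PEN": "item_name",
    "BLACK": "ink_color",
    "BALLPOINT": "tip_type",
}


def standardize(description: str) -> dict:
    """Recognize known terms in a description and return standardized attributes."""
    attributes = {}
    for token in re.findall(r"[a-z]+", description.lower()):
        standard = TERM_VARIANTS.get(token)
        if standard is not None:
            attributes[TERM_ATTRIBUTE[standard]] = standard
    return attributes


print(standardize("Pn, ballpt, BLK"))
```

Unrecognized tokens are simply skipped here; in the Knowledge Studio, unrecognized data is what you would write new rules for.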
You start your Oracle DataLens Server, and then use the Welcome Launch Pad to start the Knowledge Studio by clicking the Knowledge Studio button. For details, see Oracle Enterprise Data Quality for Product Data Getting Started.
The Knowledge Studio graphical user interface (GUI) provides the client workspace used to create and manage a data lens.
Note: Functionality that has not been configured or that the current user is not authorized to use is dimmed.
The Knowledge Studio client workspace frame contains useful information and interactive functions including the following:
Indicates the current application and open project.
Provides the processing status of the data lens one line at a time. This field can be resized, and the scroll arrows on the right-hand side can be used to view all available status information. The status data does not change with the selected tab; rather, it is a compilation of all data.
Controls whether the Status Field is displayed or not.
Returns you to the last Enterprise DQ for Product application used.
This button opens the Oracle Enterprise Data Quality for Product Data Launch Pad so that you can select other applications.
The time is displayed and when you hover over this field, the date displays.
Indicates the amount of memory cache currently used and the total amount allowed. You can dump the memory cache by clicking on the trash can icon in this interactive field.
Note: This feature is only used for system diagnosis and should not be used unless requested by the support team.
The Knowledge Studio toolbar provides easy access to the most frequently used Knowledge Studio functions. Although the set of toolbar buttons remains the same during user interface operation, buttons are enabled or disabled based on the current state of your interface and the options set. Buttons displayed in shades of gray are disabled; full-color buttons are enabled. All toolbar buttons are standard push buttons that require a single mouse click to activate.
The following briefly describes the toolbar buttons from left to right.
The Knowledge Studio GUI menus provide access to most Knowledge Studio functions. Every toolbar button has a corresponding menu command, indicated by the button icon displayed next to the command on the menu. The set of menu commands remains the same during GUI operation.
Menu commands are enabled or disabled based on the current state of the data lens; commands that are dimmed are unavailable. Menu commands that perform more complex functions are indicated by an ellipsis symbol (...). These commands open dialogs to collect the information needed to complete the requested function. Menu commands that toggle user functions are preceded by a check mark.
Tip: Tooltips appear when you rest your mouse pointer on a menu item, button, tab, icon, or similar content.
The following sections briefly describe each of the Knowledge Studio menu commands and corresponding buttons.
|New Data Lens…
Creates a new data lens file for processing data. Data lens files are stored in the following directory:
Open Data Lens…
Opens an existing data lens file and closes any open data lens file.
Provides a list of recently opened data lenses so that you can quickly reopen one.
Select Data File
Opens a sample data file associated with the current data lens and closes the currently open sample data file.
Close Data Lens
Closes the open data lens file.
Saves all contextual changes to disk and creates a version of the data lens that you can revert to.
Save As
Allows you to save the current data lens to a new name.
Delete Data Lens
Allows you to delete the open data lens from your local machine. A warning message is displayed prior to deletion. Only the local copy of the data lens is deleted. If you checked the data lens in to the server, that copy is still present on the server and must be deleted from the server separately. See "Deleting Data Lenses".
|Delete Read-Only Lenses…
Allows you to delete any unwanted 'read only' data lens from your local machine. See "Deleting Read-only Data Lenses".
|Delete Sample Files…
Allows you to delete the sample files associated with the data lens that you are currently editing. You can designate 'All' or a specific sample file for deletion. See "Deleting Sample Files".
|Update Regression Base
Allows you to update the current regression testing base based on contextual changes in the tab currently open.
|Create New Regression Base
Creates a new regression base file, which identifies the effects of your changes as changes are made to terminology and phrases.
Allows you to select a report format for viewing results. See "Complexity Reports".
Allows you to select a report that shows the complexity of the data. See "Complexity Reports".
Allows you to select a report that counts the parsed phrase context of the data within the selected data lens. See "Semantic Reports".
|Export Phrases for Translation
Exports phrases from the translation dictionary. See "Translation Tab".
|Import Translated Phrases
Imports phrases into the translation dictionary.
|Import Current/All Translated Phrases
Imports some or all phrases. See "Translation Tab".
Allows you to create or update a translation glossary on the Oracle DataLens Server.
|Export Data Lens
Exports the entire data lens and creates a data lens export file in the project directory:
|Import Data Lens
Imports an exported data lens from the specified export directory. See "Importing a Data Lens".
Allows you to export term and phrase rules. See "Exporting Rules".
|Export Rules by Domain
Allows you to export term and phrase rules by a domain. See "Exporting Rules".
Allows you to export attributes (from Item Definitions) to an Excel spreadsheet file. The report provides attribute information at the Item Definition level, showing the Attribute Type, Attribute Alias, Attribute Name, the rules defining the attribute, and the order for each Standardization.
|Import Enrichments from server
Allows you to import data enrichment knowledge created in Governance Studio into your data lens directly from your Oracle DataLens Server. See "Importing from a Downloaded File".
|Import Enrichments File
Allows you to import data enrichment knowledge created in Governance Studio into your data lens from a file you download from a task. See "Importing from a Downloaded File".
|Import Phrases and Terms
Allows you to import knowledge (phrase rules, terminology rules, and term variants) into a data lens from an Excel spreadsheet or a tab-delimited file. See "Importing Phrases and Terms".
|Import Item Definitions
Allows you to import Item Definitions into a data lens from a tab-delimited file. See "Exporting and Importing Item Definitions".
|Import Smart Glossaries
Allows you to import foundation data lenses to your current data lens. See "Importing a Smart Glossary".
|New Sample Data
Allows you to create new sample data files to add to the existing set of samples. See "Sample Files".
|Rename Sample Files…
Allows you to rename existing sample files associated with the data lens. See "Sample Files".
|Combine sample data
Allows you to combine selected sample files into a single file to be used for regression testing. See "Sample Files".
|Revert to prior Data Lens
Allows you to revert to a previous version of the current data lens. The data lenses that are listed are local copies only, not those on the Oracle DataLens Server.
Exits the Knowledge Studio application; a prompt is given for unsaved changes.
Deletes the selection and copies it to the clipboard.
Copies the selection to the clipboard.
Pastes contents of the clipboard at the current insertion point.
Searches for and replaces the specified text on the Translate tab.
Allows you to globally rename phrase rules to consolidate them. This feature is only available on the Define Phrase sub-tab of the Phrases tab. See "Global Phrase Rule Renaming".
Allows you to drag and drop rules across Domains (folders) in the hierarchical folder style Move Rules dialog. For example, you can move a rule from a Smart Glossary into the phrase structure of your data lens.
|Delete Unused Terms
Allows you to delete unused terms. An unused term is a term that is not referenced by any rules or phrases. It is denoted by the purple ball with a "u" inside icon.
|Edit Attributes Aliases…
Allows you to edit the attribute aliases of phrases and terminology. See "Aliases".
|Edit Phrase and Term Attributes…
Allows you to edit the attributes of phrases and terminology. See "Editing Multiple Phrases and Terms".
|Edit Lens Description…
Allows you to modify the data lens description. See "Editing a Data Lens Description".
|Edit History Notes
Allows you to enter text regarding the data lens maintenance to provide an audit trail for ongoing support. If Foundation or Domains are imported into the data lens, this information is included with a date and timestamp. See "Editing Data Lens History Notes".
Allows you to specify a search string (regular expression) and attempts to find it. The left-hand tree panes of the Knowledge Studio creation tabs (Phrases, Standardize, and Classify tabs) are searched.
Repeats the last search defined by a Find operation.
Removes any changes that you have made and reverts the data lens to the last saved state.
Displays all rules that could apply to the input data for an individual sample row, based on confidence ratings and the Prediction Threshold, so that you can choose among them; if no predictions are available, a message explains why. Predict Terms works only in the context of Item Definitions, where the sample row has an associated Item Definition. You can select the appropriate rule or reject the predictions. Rejected predictions apply only to the current data lens editing session and are reset when you close the data lens.
|View My Tasks
Allows you to view any tasks that are scheduled or have run. See "Viewing Tasks".
Allows you to filter the displayed data based on text or a text pattern. The filter operation applies only to the currently selected tab. Only the rows that match the text entered in the Filter dialog are displayed in the task pane.
Removes the filter currently applied and all data is displayed.
Redisplays the data including changes that were just applied using the Apply function.
Displays the ID column in tabular panes when selected; selecting again removes the column from the task pane.
Returns to the previous phrase or rule ambiguity.
Advances to the next phrase or rule ambiguity.
Allows you to search the Internet for the text selected in the Input Data field on the Define Phrases or Define Items sub-tabs of the Phrases tab. Your default browser application is launched and a search is performed using the selected text as the search string.
Allows you to search the Internet for the images matching the text selected in the Input Data field on the Define Phrases or Define Items sub-tabs of the Phrases tab. Your default browser application is launched and an image search is performed using the selected text as the search string.
Allows you to search for the selected line of data so that you can select it in a different context. This feature is only available on the Translation tab.
|List Regression Tests
Displays information about the regression tests that are associated with the selected data lens. The display shows the type of regression created and the sample file that the regression test runs against.
|View Lens Information
Displays specific information about the data lens and data file that is currently being used.
|View Attributes for Deployed Lens
Displays attribute information about the currently deployed data lens by Item Definition including attribute use.
|View Server Information
Displays server information for the Oracle DataLens Server.
|View Check-In History
Lists the data lenses that you have checked in including the comments regarding the check-in.
|View My Checkouts
Lists the data lenses that you have checked out.
|View All Checkouts
Lists all data lenses on the Oracle DataLens Server that have been checked out.
|View as Production
Displays the output data from Item Definitions set to inactive. See "Active vs. Inactive Item Definitions".
|Check-In Data Lens…
Allows you to check in a data lens file to your Oracle DataLens Server repository. Each time you check a data lens in to the Oracle DataLens Server, the data lens revision number is incremented. The Oracle DataLens Server maintains all of the previous revisions of a data lens. You can check in a data lens under one of two conditions: it has never been checked in before, or it was previously checked out and locked for editing by you. The Check-In dialog allows you to enter a comment to be stored with this revision of the data lens. If you want to continue to edit the data lens, select the Keep Locked for More Editing check box so that other users can check out the data lens only in 'Read Only' mode. Selecting this option dims the Delete local Data Lens option, which removes the local copy of the data lens from your client. See "Checking In a Data Lens".
|Check-Out Data Lens…
Allows you to select a data lens and the specific revision number to check out from the Oracle DataLens Server repository, and automatically locks it for editing. You can also check out the data lens under a new name, which creates a new data lens from an existing one. See "Checking Out a Data Lens".
|Unlock Data Lens
Unlocks the current data lens from the repository in the Oracle DataLens Server.
|Copy Global Standardizations
Copies the global standardization rules from the current Standardization Type to another. See "Copying Global Standardizations".
Activates the knowledge that you have just created. This option is active only when there is knowledge you have not saved. After you apply your changes, use the Refresh command to see the effect on your sample data.
Performs the translation of phrases (Translate tab) or complete content lines (Test Translations tab). See "Translating Data".
Allows you to edit the source formatting expressions. See "Source Format".
|Standardization Repair Formats
Allows you to enter
|Translation Repair Formats
Allows you to enter
|Open Excel Override File
Starts Excel with a spreadsheet that can be used to enter specific context to be used within this data lens. This feature will be deprecated and should not be used.
Allows you to remove any grammar rules that are not being utilized based on the data within the lens. See "Compacting Grammar".
|Unit Conversion Types…
Allows you to add, select, and activate the Enterprise DQ for Product supplied unit conversions. Unit conversions enable the creation of output with consistent use of units. For example, your data may express resistance in ohms, kilo-ohms, and mega-ohms. With a unit conversion, consistency of output could be maintained by converting each of the preceding to ohms. See "Unit of Measure Standardization Types".
Allows you to add, select, and activate the Enterprise DQ for Product supplied unit conversions. Standardization types also allow you to create your own standardization schemas for use throughout your data lens. See "Standardization Types".
Allows you to add and use schemas to automatically match data. See "Match Type".
Allows you to add and use schemas to automatically classify data. See "Classification Type".
Allows you to select the locales/languages for which you want data translation. This option is not available until your data lens is standardized. Activates the Translation tab. See "Translation Target".
|Data Lens Options
Allows selection of the global data lens parameters including text case sensitivity, whether the data lens can be imported, and the behavior of the Apply functionality. See "Setting Data Lens Options".
|Open Oracle DataLens Governance Studio…
Starts the EDQP Governance Studio. See Oracle Enterprise Data Quality for Product Data Governance Studio Reference Guide.
Open Oracle DataLens Application Studio…
Starts the EDQP Application Studio. See Oracle Enterprise Data Quality for Product Data Application Studio Reference Guide.
|Open Oracle DataLens Task Manager…
Starts the EDQP Task Manager. See Oracle Enterprise Data Quality for Product Data Task Manager Guide.
Open Oracle Enterprise Data Quality for Product Data Launch Pad…
Starts the Oracle Enterprise Data Quality for Product Data Launch Pad so that you can start other applications and the Enterprise DQ for Product Oracle DataLens Server Administration Web pages.
|Open Character Map…
Opens the Windows Character Map dialog to enable character mapping changes. This function is provided as a shortcut way of inserting special characters and symbols not available on the keyboard when translating phrases.
Opens a list of Enterprise DQ for Product documents for your selection in a browser.
Provides information regarding the product including the version number and a link to view third party product licenses.
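The Unit Conversion Types command described above uses resistance expressed in ohms, kilo-ohms, and mega-ohms as its example. The arithmetic behind that kind of conversion can be sketched as follows. This is only an illustration of the idea, not the conversion engine supplied with the product; the `FACTORS_TO_OHMS` table and the `to_ohms` function are hypothetical names.

```python
# Hypothetical conversion factors from each unit to the base unit (ohms).
FACTORS_TO_OHMS = {
    "ohm": 1.0,
    "kilo-ohm": 1_000.0,
    "mega-ohm": 1_000_000.0,
}


def to_ohms(value: float, unit: str) -> float:
    """Convert a resistance value expressed in the given unit to ohms."""
    try:
        return value * FACTORS_TO_OHMS[unit]
    except KeyError:
        raise ValueError(f"unknown resistance unit: {unit!r}") from None


readings = [(470, "ohm"), (4.7, "kilo-ohm"), (1, "mega-ohm")]
for value, unit in readings:
    print(f"{value} {unit} = {to_ohms(value, unit):g} ohms")
```

Converting every reading to one base unit is what makes the output consistent, whichever unit the source data happened to use.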
The following table contains keyboard shortcuts that can help make the Knowledge Studio easier to use.
|New Data Lens||Ctrl+N|
|Open Data Lens||Ctrl+O|
A tab groups related information into easy-to-read, easy-to-access areas that include sub-tabs, panes, and text entry boxes. Tabs are displayed in the Workspace directly under the toolbar and can be selected in any order. Not all tabs are available at all times. For example, the Translate tab and its sub-tabs are not visible until a translation target is activated.
A sub-tab operates like a tab and provides functionality or utilities specific to its parent tab, so the sub-tabs differ from tab to tab.
The tabs and the related sub-tabs included in the Knowledge Studio are as follows:
|Phrases (Chapter 2, "Phrases in Data" )||Define Phrases
|Standardize (Chapter 3, "Standardize Data")||Standardize Terms
Test Global Standardization
|Standardize Items (Chapter 4, "Standardizing Item Definitions")||Standardize Attributes
Test Item Standardization
|Classify (Chapter 5, "Classify Data")||Classify from Data
Classify from Item Definitions
Classify from Rules
|Translate (Chapter 6, "Translating Data")||New Phrases and Known Phrases
New and Known Variable Term Phrases
Test Translated Attributes
Test Item Translation
Test Global Translation
The interactive task panes allow you to perform actions specific to the type of pane and these actions are described throughout this reference. In general, the task panes included in the Knowledge Studio are as follows:
Data is represented in a tree-like structure that shows how nodes are related. In some, though not all, cases you can drag and drop the nodes into other panes. The parent nodes can be expanded to view all related child nodes.
Data is entered into fields and options are selected to build knowledge.
Data is represented with graphical icons that you can drag and drop to change it.
Data is displayed in tabular form similar to a Microsoft Excel spreadsheet.
Data is collected via queries in a step-by-step manner.
The small up/down arrows between the panes allow you to resize the panes. In addition, you can fully expand either pane to see more data by clicking an arrow, which makes the other pane inactive. To redisplay the inactive pane, click the opposite arrow.
Various context-sensitive (shortcut) menus appear in the Knowledge Studio panes when you right-click data within a task pane. The contents of these menus are described throughout this reference and may include the following standard options:
These options filter and un-filter data as previously described.
Explains each of the icons that can appear and is context-sensitive.
Expands all sub-nodes (phrases or terms) of the selected node in a hierarchical manner.
Locates text data as previously described.
Removes the selected category from displaying in the pane. The categories are only removed for the current data lens editing session and are reset when you close the data lens.
Searches the Internet for the selected text, which appears as part of the menu selection name.
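The Find and Filter options referred to above match data against a text pattern (Find is described as taking a regular expression). A minimal sketch of pattern-based row filtering follows. The guide does not state which regular expression dialect or case-sensitivity behavior the Knowledge Studio uses, so Python's `re` module and the case-insensitive default are assumptions, and `filter_rows` is a hypothetical helper.

```python
import re


def filter_rows(rows, pattern, ignore_case=True):
    """Return only the rows whose text matches the regular expression."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [row for row in rows if compiled.search(row)]


sample_rows = [
    "PEN BALLPOINT BLK MED",
    "PENCIL MECHANICAL 0.5MM",
    "MARKER PERMANENT RED",
]

# Keep only rows that contain "pen" as a whole word, not as part of "PENCIL".
print(filter_rows(sample_rows, r"\bpen\b"))
```

A plain substring such as `pen` would also keep the PENCIL row, which is why the word-boundary form is useful when sample data contains overlapping terms.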
If this is the first time you have started the Knowledge Studio, the client workspace appears blank as in the following figure; otherwise, the results from the last job run are displayed.
The status field at the bottom of the Knowledge Studio client workspace shows information about any data lenses you load in the white field; the date, time, and memory usage are displayed in the grey fields. The status field is blank until you create your first Knowledge Studio project, at which time the status of your project is displayed. See "Understanding the Client Workspace".
When you launch the Knowledge Studio, you are prompted to select an existing data lens to open.
Since you are starting a new data lens, click Cancel to close this dialog. From the File menu, click New Data Lens to create your new data lens.
Enter the unique name for this data lens.
Note: Entering a space results in an underscore.
Enter a description and select a Character Encoding from the list.
Click on the Select button adjacent to the Data Source field to select the file that contains your data. The Data Source dialog appears.
Select the MS Excel file option and click the Specify button.
Click Browse, locate your data file, and then select it.
The Excel file field names are displayed.
Select the Id field and click on the right arrow to populate the ID list, and then select the Description field to populate the Description list. The ID corresponds to a part number field in the Excel spreadsheet; the Description is a description of a part, including the item name and several attributes.
Click OK. The Data Source dialog appears, indicating the source file that you specified and the number of lines of data.
You are returned to the New Data Lens dialog.
Click OK. The Knowledge Studio creates your new data lens, including a set of sample files. These sample files are XML representations of the data in your Excel spreadsheet.
Your new project is located in:
C:\Documents and Settings\
Your content and sample files are located in:
data lens name
The sample files have the .xml file extension.
When prompted to select a sample data file, click Browse.
Select your sample data file, and click Open.
Your data lens opens and is now ready for use.
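Two details of the procedure above lend themselves to a short sketch: spaces in a data lens name become underscores, and the ID and Description columns of the spreadsheet end up as XML sample data. The guide does not document the sample file layout, so the `samples` and `row` element names below are purely illustrative, and `lens_name` and `rows_to_xml` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def lens_name(raw: str) -> str:
    """Replace spaces with underscores, as the naming note describes."""
    return raw.replace(" ", "_")


def rows_to_xml(rows) -> str:
    """Serialize (id, description) pairs to a hypothetical XML layout."""
    root = ET.Element("samples")  # element names are illustrative only
    for item_id, description in rows:
        row = ET.SubElement(root, "row", id=str(item_id))
        row.text = description
    return ET.tostring(root, encoding="unicode")


print(lens_name("Writing Instruments"))
print(rows_to_xml([("1001", "PEN BALLPOINT BLK MED")]))
```

The ID corresponds to the part number column and the text to the description column selected in the Data Source dialog.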
There are various global options that you can set to configure how the Knowledge Studio operates. From the Tools menu, select Options. Select the global application options as follows:
|Parse Tree Node Font Size
Allows you to select the font size you want for the display of phrase trees in the Graphical Rule Editor pane on the Define Phrases tab. A smaller font allows you to see more phrases for longer lines.
Number of Apply's before Save
Allows you to automatically save your data lens as you apply knowledge.
Number of Save's before Backup
Allows you to automatically back up your data lens after a set number of Saves.
Maximum number of backups
Allows you to set the maximum number of data lens revisions that will be retained on your Oracle DataLens Server. The default setting is three; the maximum setting is 10. You can control how much disk space your server is using and the speed of data lens check in and check out by setting this number to a lower setting.
Allows you to set the percentage of ghosting used to display terms and phrases that are not associated with an Item Definition. The percentage can range from 10% to 100%. A lower percentage results in the terms and phrases being shown lighter (more ghosted).
|Two-line tool bar
Allows you to choose whether the toolbar is displayed on a single line or on two lines. Choosing a two-line toolbar allows you to see all of the toolbar items even when the Knowledge Studio window is smaller than normal size.
Allows you to 'jump' or switch between views of a selected node by double-clicking an empty area of the pane. This functionality is context-sensitive and changes the active tab.
|Show Source-Formatted Text
Enables the display of text that has been reformatted by the Source Formatting feature so that you can quickly identify this data for further standardization.
Enables the textual display of the predictions for unknown data nodes. Controls whether the prediction options on the Edit menu and the Define Phrases sub-tab Graphical Rule Builder pane context-sensitive menu are active.
|Enable Bidirectional Text Dialog
Activates the Bidirectional Text tab in the Data Lens Options dialog. See "Setting Data Lens Options".
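The interaction between the Number of Apply's before Save, Number of Save's before Backup, and Maximum number of backups options can be pictured as a pair of counters with a retention cap. The sketch below is an interpretation, not the product's implementation: the `AutoSavePolicy` class is hypothetical, the apply and save thresholds are example values, and treating the maximum as a cap on retained backups is an assumption. Only the default of three and the upper limit of 10 come from the guide.

```python
from dataclasses import dataclass, field


@dataclass
class AutoSavePolicy:
    applies_before_save: int = 5   # example value, not a documented default
    saves_before_backup: int = 2   # example value, not a documented default
    max_backups: int = 3           # the guide gives three as the default
    _applies: int = 0
    _saves: int = 0
    backups: list = field(default_factory=list)

    def __post_init__(self):
        # The guide states a maximum setting of 10; the lower bound is assumed.
        if not 1 <= self.max_backups <= 10:
            raise ValueError("max_backups must be between 1 and 10")

    def apply(self, label: str) -> None:
        """Count one Apply; save automatically when the threshold is reached."""
        self._applies += 1
        if self._applies >= self.applies_before_save:
            self._applies = 0
            self._save(label)

    def _save(self, label: str) -> None:
        """Count one Save; back up and trim old backups at the threshold."""
        self._saves += 1
        if self._saves >= self.saves_before_backup:
            self._saves = 0
            self.backups.append(label)
            self.backups = self.backups[-self.max_backups:]


policy = AutoSavePolicy(applies_before_save=2, saves_before_backup=1)
for i in range(10):
    policy.apply(f"rev{i}")
print(policy.backups)
```

With these example thresholds every second Apply triggers a Save and every Save triggers a backup, and only the three most recent backups are kept.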