Oracle® Enterprise Data Quality for Product Data What's New in EDQP Releases 5.0, 5.1, 5.5.03, 5.5.03.02, 5.6, 5.6.0, 5.6.2 Part Number E24155-02
What's New in EDQP
Releases 5.0, 5.1, 5.5.03, 5.5.03.02, 5.6, 5.6.0, 5.6.2
E24155-02
November 2011
Welcome to Oracle Enterprise Data Quality for Product Data (EDQP), formerly known as Oracle Product Data Quality. The following sections describe new and changed functionality in EDQP releases 5.0 through 5.6.2.
The following sections outline the changed and new features in release 5.0 of OPDQ.
The AutoBuild functionality has been enhanced so that you can reverse engineer the semantic model from available metadata and sample data, which shortens initial data lens development time. These enhancements include:
Leveraging existing information for operation immediately upon installation
Term variant creation to increase recognition immediately upon installation
Greater flexibility to import metadata rules
Default Match Types
Incremental processing to augment existing data lenses with additional metadata.
This section describes new features to Governance Studio.
This release introduces the Governance Console feature to enable you to more effectively apply and maintain data governance standards.
Provides the ability to incorporate manual effort and ongoing learning into an integrated process as follows:
Item level validation, approval and edit
Task-specific UIs for exception processing
Matching
Enrichment and correction
Allows manual intervention while maintaining process integrity and efficiency
This release introduces the Governance Dashboard feature to enable you to more effectively apply and maintain data governance standards.
Provides process visibility and control to drive continuous improvement of the following:
Aggregate quality and productivity metrics
Data quality by source
Process effectiveness
Identifies source quality and process improvement opportunities
The following sections outline the changed and new features in release 5.1 of OPDQ.
This section describes changed and new features to Application Studio.
The AutoAbbreviate functionality has been changed to provide the ability to repurpose and publish data into any form. It includes the ability to progressively abbreviate descriptions to fit the fixed-line lengths often required by legacy and many Enterprise Resource Planning systems. This enables you to use a systematic process to create shortened descriptions.
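As a rough illustration of progressive abbreviation, the following Python sketch abbreviates one word at a time until a description fits a fixed length. The abbreviation table, the progressively_abbreviate name, and the left-to-right order are assumptions made for illustration only; they are not taken from EDQP, where the actual abbreviation rules are defined in the data lens.

```python
def progressively_abbreviate(description, max_len, abbreviations):
    """Abbreviate one word at a time until the description fits max_len.

    `abbreviations` is a hypothetical lookup table (full word -> short form).
    Words are tried left to right; the real product may use another order.
    """
    words = description.split()
    for i, word in enumerate(words):
        if len(" ".join(words)) <= max_len:
            break  # already short enough, so stop abbreviating
        words[i] = abbreviations.get(word.upper(), word)
    return " ".join(words)


# Hypothetical sample data.
ABBREVIATIONS = {"RESISTOR": "RES", "CERAMIC": "CER", "TOLERANCE": "TOL"}
text = "RESISTOR CERAMIC 10 OHM TOLERANCE 5%"
print(progressively_abbreviate(text, 30, ABBREVIATIONS))
```

With a 30-character limit, only the first two words need to be abbreviated; a larger limit leaves more of the original wording intact. If every available abbreviation has been applied and the text is still too long, this sketch simply returns the shortest form it reached.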
This release introduces the semantic index functionality. It enables highly-scalable semantic matching and de-duplication of unstructured, non-standard data. This task is often required in Product Information Management (PIM), master data management (MDM), and system consolidation situations.
Semantic indexing allows for faster and more scalable semantic matching.
This section describes changed and new features to Governance Studio.
This release introduces the AutoSuggest functionality, which provides system-based suggestions to expand recognition rules based on real operational data. Using inferences from the underlying semantic model, AutoSuggest makes suggestions that are used to enrich the model, so the system gets "smarter" through use, allowing you to develop knowledge rapidly.
This release introduces the Prediction functionality to provide a design-time version of AutoSuggest so that it is available to data analysts using the Knowledge Studio. This feature allows you to develop knowledge rapidly.
The following sections outline the changed and new features in release 5.5 of OPDQ.
This section describes changed and new features to Application Studio.
A new Semantic Index type was created to provide dynamic semantic matching:
Eliminates index rebuilds and the need for multiple semantic indexes if match rules change, as long as the total set of attributes does not change.
Allows you to experiment with matches.
Dynamically operating algorithm that leverages multiple match types and weights in real time.
This release introduces the ability to specify one or more classifications from a single Item Definition Transformation, which simplifies processing.
This release introduces the ability to use dedicated secondary Data Service Applications (DSAs) for additional processing, which includes the following options:
Apply DSA to process a set of rows.
Re-Run DSA to re-process a set of rows.
Completion DSA to complete a process and save the results.
This section describes changed and new features to Governance Studio.
The application of the rule changes from AutoSuggest to enrich the semantic model has been automated thus allowing you to develop knowledge rapidly.
The following features have been changed to enhance usability and data remediation processing:
Field level edit restrictions
List values
Hidden columns
Normalized view and output column renaming to simplify results
Category filtering on all output to allow you to filter results by a category
Messages
Renaming output columns
Project reuse through saving as a new project from an existing project
The following sections outline the changed and new features in release 5.5.03 of OPDQ.
This section describes new features in Application Studio.
This release introduces a set of DSA widgets to create solutions that handle non-category-specific global data for systems migration and consolidation, as well as clean-up, match, merge, and de-duplication. These features enable statistical (global) processing, which speeds development and is useful when high-confidence match techniques are not appropriate:
Confidence index provides a scoring process that includes exact and fuzzy matches.
Quick Lookup DSA shows the matches in the context of their original descriptions.
The following sections outline the changed and new features in release 5.5.03.02 of OPDQ.
This section describes changes to Application Studio.
Step descriptions are now truncated at run time so that only the first 255 characters appear on the DSA Job Status page. This gives context for which DSA is running while simplifying the input to the internal database. The step descriptions are still displayed in their entirety in the DSA, so you can create long step names and descriptions.
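The run-time truncation can be pictured with a short Python sketch. The function name and the sample text are hypothetical; only the 255-character limit comes from the description above, and the sketch does not reflect how the server actually stores or displays the value.

```python
MAX_STATUS_DESCRIPTION = 255  # limit stated in the release note


def status_description(step_description):
    """Return the first 255 characters; the full text is left untouched."""
    return step_description[:MAX_STATUS_DESCRIPTION]


# Hypothetical long step description (35 characters repeated 20 times).
full = "Standardize supplier descriptions. " * 20
shown = status_description(full)
print(len(full), len(shown))
```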
Db2 9.x is certified as a valid data source in Application Studio in this release.
This section describes changed and new features to Governance Studio.
This release introduces the Job Continuation feature. This feature allows large batch jobs to continue to run, and new API jobs to start and run, even when Transform Servers lose connectivity with the internal database repository (the Oracle DataLens Administration Server is down).
An option to allow you to disable DSA processing on the Oracle DataLens Administration Server is introduced.
Various memory and performance enhancements have been implemented.
The following sections outline the changed and new features in release 5.6 of OPDQ.
This section describes changed and new features to Application Studio.
The automatic abbreviation of Item Definition descriptions (AutoAbbreviate), as defined in Item Definition Transformations, is enhanced. It abbreviates descriptions more intelligently, without reducing readability, by applying an understanding of the important terms in the description. AutoAbbreviate prevents important attributes from being abbreviated, though you must create abbreviations for required attributes. Additionally, it no longer abbreviates units of measure, part numbers, or model numbers because it does not abbreviate any words associated with attributes containing numbers.
The creation of Matching Ngrams is enhanced to provide the following capabilities:
Selection of the type of Ngrams to produce
Avoidance of saving duplicate Ngrams
Truncation of Soundex (RSIndex) algorithms to a specific number of characters
Specification of the minimum number of characters created per Ngram type
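The Ngram options listed above can be sketched in Python as follows. This is a generic character n-gram routine, not the EDQP implementation; the function names, the interpretation of the minimum-characters option as a lower bound on n-gram size, and the sample phonetic key are all assumptions.

```python
def make_ngrams(text, sizes=(2, 3), min_chars=2):
    """Build character n-grams of the requested sizes without duplicates."""
    normalized = "".join(text.upper().split())  # drop whitespace, fold case
    seen = set()
    ngrams = []
    for n in sizes:
        if n < min_chars:
            continue  # skip n-gram types shorter than the minimum
        for start in range(len(normalized) - n + 1):
            gram = normalized[start:start + n]
            if gram not in seen:  # avoid saving duplicate n-grams
                seen.add(gram)
                ngrams.append(gram)
    return ngrams


def truncate_key(key, length):
    """Truncate a phonetic key (for example, a Soundex code) to `length` characters."""
    return key[:length]


print(make_ngrams("banana"))
print(truncate_key("R163", 3))
```

Selecting the n-gram types corresponds to the sizes argument, and the seen set is what prevents the repeated "AN" and "NA" pairs in the sample from being stored twice.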
The ability to restrict the use of DSAs by OPDQ users is introduced. Users can be restricted from using and viewing DSAs on your various types of servers by using these new options in conjunction with the Application Studio options on the Role Administration page in the Oracle DataLens Servers Administration Web pages.
Apache Tomcat version 6.0.29 and Java Development Kit (JDK) 6 Update 21 (1.6.0_21) are certified in this release. The improved native 64-bit support in Tomcat and enhanced memory management in JDK 6 Update 21 result in better overall performance.
This section describes changed and new features to Governance Studio.
The Apply Augmentations feature is changed so that the missing attribute suggestions are written to an Excel file. This file can then be reviewed and changed, and then imported into a given data lens in Knowledge Studio.
The ability to filter data using more than one column is introduced. Filters on two or more columns are combined using 'and' criteria so that the data displayed can be reduced further.
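The effect of combining column filters with 'and' criteria can be shown with a small Python sketch. The row data, column names, and the filter_rows helper are hypothetical and only illustrate the semantics; they do not represent the Governance Studio interface.

```python
def filter_rows(rows, criteria):
    """Keep only rows that satisfy every column criterion ('and' semantics)."""
    return [
        row for row in rows
        if all(row.get(column) == wanted for column, wanted in criteria.items())
    ]


# Hypothetical output rows.
rows = [
    {"category": "Resistor", "source": "ERP", "status": "Approved"},
    {"category": "Resistor", "source": "PIM", "status": "Approved"},
    {"category": "Capacitor", "source": "ERP", "status": "Pending"},
]
one_column = filter_rows(rows, {"category": "Resistor"})
two_columns = filter_rows(rows, {"category": "Resistor", "source": "ERP"})
print(len(one_column), len(two_columns))
```

Each additional column criterion can only keep or shrink the result set, which is why filtering on a second column reduces the displayed data further.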
Governance Studio now conserves memory by populating a tab only when you select it and by removing a tab's results from memory when you select a different tab, if the tab has more rows than a preset limit. The tab data reloads when you select the tab again. The source data can be read directly from disk and streamed directly to the server without having to store the source data in the project.
The ability to open an existing project file without processing and displaying data results in any Output tabs is introduced. The new Open Project - No Results option on the File menu allows you to open a project that previously could not be opened because it exceeded the maximum memory allocation.
This section describes changes to Knowledge Studio.
The ability to use AutoBuild to create an entire Item Definition structure with all Item Definitions designated as inactive (for Production) is introduced. A user in Knowledge Studio can then activate individual Item Definitions as they become ready for production use. The data steward retains the ability to fully test all the interactions across all Item Definitions within the Knowledge Studio.
Copying standardization rules from a source type to a destination within the Item Definition tree either individually or with children is introduced. It includes the ability to merge or replace and the ability to copy global standardizations from one type to another and from one Item Definition to another.
The data parsing functionality is enhanced to eliminate extra parsing by reducing duplicated tokenization, parsing, and item definition, thus improving performance.
Item Definitions no longer automatically display the underlying attributes. You must double-click on an Item Definition to expand the full attribute structure for viewing and standardizing. This change reduces the amount of memory needed thereby improving performance.
The Governance Studio Apply Augmentations feature creates an Excel file that contains missing attribute suggestion knowledge, which is now imported into the open data lens.
The Test Translation sub-tab is deprecated and replaced by the following:
The Test Item Translation sub-tab is introduced. It allows you to review the translated phrases for the Item Definition for your source language sample data to validate your results.
The Test Global Translation sub-tab is introduced. It allows you to review the translated phrases for the Item Definition for your source language sample data to validate your results.
Significantly better performance is achieved by allowing the server to automatically create multiple threads to maximize CPU processing for lens transformation steps if Processor Percentage is set to a value greater than 0. For example, if the server has 8 CPUs and the Processor Percentage is set to 50%, the server automatically creates four processing threads.
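The arithmetic behind the Processor Percentage example can be written out as a minimal Python sketch. Only the 8 CPUs at 50% giving four threads case comes from the text; the function name, the rounding down, and the single-thread fallback are assumptions and may differ from the server's actual behavior.

```python
def processing_threads(cpu_count, processor_percentage):
    """Estimate the thread count for a CPU count and a percentage (0-100)."""
    if processor_percentage <= 0:
        return 1  # assumption: no additional threads when the option is 0
    # Assumption: fractional results round down, with a floor of one thread.
    return max(1, cpu_count * processor_percentage // 100)


print(processing_threads(8, 50))
```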
Improved memory management by leveraging JDK 6 Update 21 and removing explicit garbage collection requests to improve performance and memory handling.
The thread pool size can now be set from the server to allow better thread management in a single server environment with multiple applications.
If the underlying database goes down, the server automatically reconnects the entire database connection pool once the database is available again, eliminating the need to restart the server.
Improved data lens load times by changing and simplifying the data lens load processing steps.
Improved internationalization support through externalized strings, the ability to install and run using the source OS, and 29 newly supported languages in the Knowledge Studio.
Setting the ServerID for Transformation Servers is now automatic and set by the server to ensure each ServerID is unique and sequential.
The following Configuration options are deprecated in this release:
Apply Augmentation options 2 and 3 that enabled the automatic update of data lenses
Unused Transform Map and DGD email addresses
The Oracle DataLens Server options are changed as follows:
Max Batch Jobs is deprecated
Processor Percentage is introduced
Thread Pool Size is introduced
The ability to restrict the use of DSAs by OPDQ users is introduced. Users can be restricted from using and viewing DSAs on your various types of servers by using these new options in conjunction with the Application Studio options on the Role Administration page in the Oracle DataLens Servers Administration Web pages.
The following sections outline the changed and new features in release 5.6.0 of OPDQ.
This section describes changed and new features to AutoBuild.
The AutoBuild Wizard has been modified to streamline the process of building or updating a data lens.
The ability to add to an existing data lens is enhanced.
The ability to activate Item Definitions in generated (new or updated) data lenses is introduced; inactive is the default.
Populating data lens descriptions or DSA comment fields with category and attribute identifiers, which are stored in the system_id and group_id internal fields though are not visible in the Knowledge Studio.
Selecting a Term Library to add new terms and full-form term variants.
Unused options have been removed. The following options are introduced:
Step 1:
Generate a new DataLens
Add to an existing DataLens
Add alternate catalog to Existing DataLens
List of Term Libraries
Step 2:
Include attribute name in phrase for context - Never
Map values directly to each attribute
Step 4:
Add alternate catalog to selected DataLens
Use alias for ids
Activate Item Definitions
For detailed information about these options and the wizard, see the AutoBuild Reference Guide.
This section describes changed and new features to Services for Excel.
The installation and uninstallation processes have been simplified with use of scripts for both 32-bit and 64-bit client systems.
The following files and folder are deprecated:
ExcelServicesResource.xml
system.xml
common folder
Configuration information is now contained in the dlsforexcel_cfg.xml file. It is located in the application_data/DataLens/config directory, where application_data is the user's application directory for the version of Windows that is running. This directory also contains other configuration files necessary to the application, including dlsforexcel_termlib_General_English.xml, which is used to add terminology (term) variants to term libraries.
Log files are located in the application_data/DataLens/log directory, where application_data is the user's application directory for the version of Windows that is running. The information logged includes installation specifics, job status, server interaction, and any errors. These log files can be reviewed to identify errors and for informational purposes.
Translation property files are located in the application_data/DataLens/locale directory, where application_data is the user's application directory for the version of Windows that is running. This directory is examined for existing property files that can be used for translation; any files found populate the Target Locale option, which defaults to English if no translation property files exist.
The ability to restrict the use of DSAs is introduced. The use of DSAs can be restricted to specific users using the Oracle DataLens Server Administration Web pages and Application Studio. All DSAs, including restricted DSAs, are displayed when selecting a DSA. If a restricted DSA is selected and the user does not have permission to run it, an error is displayed.
The toolbar has been enhanced to add a Test menu to allow easier access to regression testing and data comparison utilities.
The ability to select a DataLens Server Group is introduced.
Job processing and the retrieval of status and results have been modified to closely mimic Governance Studio to provide more consistent use in OPDQ.
Logging into the Oracle DataLens Server has been modified to use the OPDQ Launch Pad to provide more consistent use in OPDQ. You must log in to the server for all job processing activities. The login applies to all spreadsheets within the open workbook. You can change the logged-in user or open an entirely new workbook to log in as a different user.
The ability to create or update Term Libraries is introduced to allow you to easily add new variants for use in data lens processing.
The ability to upgrade your Oracle DataLens Server from OPDQ releases 5.0.01, 5.0.02, 5.1, and 5.5.x to 5.6 is introduced. The OPDQ upgrade methodology is to install the 5.6 Oracle DataLens Administration Server while maintaining the ability to roll back to your existing server.
Note:
Assistance for upgrades can be obtained by contacting Oracle Consulting Services. Upgrades should be a planned migration to ensure the retention of your Server Group Topology, data repository, DSAs, data lenses, and data.
This section outlines the changed and new features in release 5.6.2 of EDQP, which is a cumulative release of this and the limited 5.6.1 patch release.
The Product Data Quality Solution is now part of a larger suite of products known as Oracle Enterprise Data Quality. As a result of this transition and to align with the family of data quality capabilities, Oracle Product Data Quality will now be known as "Oracle Enterprise Data Quality for Product Data" with two separate product names on the Oracle Global Pricing list as follows:
Oracle Enterprise Data Quality Product Data Parsing and Standardization
Oracle Enterprise Data Quality Product Data Match and Merge
The product will continue to be packaged as one installation, Oracle Enterprise Data Quality for Product Data, but the Oracle Enterprise Data Quality Product Data Match functionality must be enabled on the server because it is off by default. New customers who have purchased the matching functionality, as well as existing customers who upgrade from the current Oracle Product Data Quality release, must enable it from the Oracle DataLens Administration Server and then restart the server.
This section describes changed and new features to Application Studio.
The ability to output Item Definition IDs from Item Definition Transformation Maps is introduced.
The default Standardization QI option is changed to deselected (though the default value remains at 15).
The Semantic Key 2 scoring calculation is improved so that all attributes, both required and scoring, are included, and fully normalized scores are created for each scoring type: exponential, linear, and equal.
The Count ALL rows option is introduced. You can now count the number of matched sets (duplicates) in the bottom pane of a match output tab rather than the default top pane.
This section describes changed and new features to Governance Studio.
The filtering functionality is enhanced to enable filtering by blank and non-blank values and to clear all filters.
This section describes changes to Knowledge Studio.
Knowledge Studio is modified to ensure that the original source text is displayed in the Source field so that it can be edited (for example, on the Test Global Standardization sub-tab of the Standardize tab).
The Colors Smart Glossary has been enhanced. It now contains a Basic_Colors standardization type that standardizes each color to one of 11 color families. This set of standardization rules can easily be copied into other standardizations using the Copy Global Standardizations option.
Smart Glossary enhancements include:
Increased coverage to better handle health care data
Units of Measure standardized to International System of Units (SI) standards
Increased term coverage
Increased coverage of Units of Measure (for example, range rules like 2D and 3D size were added)
Reduced coverage in Counts Smart Glossary (for example, a smaller set of phrase rules)
Simplified finishes and materials coverage (rules reduced producing a more elegant structure; easy to augment)
Data lens collaboration functionality (checking components in and out) is deprecated.
This section outlines the changed and new features in the patch release 5.6.2 of EDQP, which is a cumulative release of this and the limited 5.6.1 patch release.
This section describes changed and new features to AutoBuild.
Oracle R12 Product Information Management (PIM) support enhancements include:
Attributes can now be added to mid-level Item Definitions that also have child Item Definitions.
Performance is enhanced for data lens lookups for very large source metadata.
Value set options are added that allow the automatic creation of standardization rules for all attribute values that are identified as value sets (for example, a value set column is present in the source metadata and the value was flagged as a value set in that column). So that they can be easily recognized, all PIM value set phrase names that are created are appended with _vs and valid value term names with _vv.
Attribute value columns containing units of measure no longer produce multiple terms when spaces exist; rather, a single term is produced. If, however, a unit of measure column is present in the source metadata (attr_uom), then the unit of measure will be parsed separately from the attribute value.
Fusion PIM support is enhanced as follows:
Language column support for AutoBuild metadata input is added.
UOM base value column support for AutoBuild metadata input is added.
Automatic selection of required settings from either Fusion PIM or R12 PIM metadata import formats when they are recognized is introduced. This applies to creating and updating data lenses from PIM metadata, adding sample data, and adding alternate catalogs to an existing data lens.
New options in Step 1 of the Wizard are introduced as follows:
The Keep Category Name Case option allows you to keep the category name case when creating the corresponding Item Definition name. For example, using this option on the category name "TV Camera" generates the Item Definition name "TV_Camera" rather than "Tv_camera".
The data lens creation options now provide a choice of activities for you to perform: create a data lens, update an existing data lens, or add an alternate catalog to an existing data lens.
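The naming behavior described for the Keep Category Name Case option can be approximated with a short Python sketch. The function and its keep_case parameter are hypothetical; only the "TV Camera" to "TV_Camera" or "Tv_camera" mapping comes from the example above, and other category names may be handled differently by AutoBuild.

```python
def item_definition_name(category_name, keep_case=False):
    """Derive an Item Definition name from a category name."""
    name = category_name.strip().replace(" ", "_")
    # Without the option, only the first letter stays uppercase.
    return name if keep_case else name.capitalize()


print(item_definition_name("TV Camera", keep_case=True))
print(item_definition_name("TV Camera", keep_case=False))
```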
This section describes changes to R12 PIM Connector.
Updated the product name to R12 PIM Connector.
The getversion function is introduced. It returns a static string containing the installed PIM Connector version number. The getversion function can be called from SQL*Plus or through a JDBC-connected program.
The product packaging has been changed to rename the services_for_excel_templates to spreadsheet_templates.
This section describes changes to Services for Excel.
Menus and toolbar have been rearranged to more easily access functions.
The Next Difference button is added to the toolbar to make locating regression testing differences easier; it also remains on the context-sensitive menu.
The Load Meta-Data Files... menu option is introduced to load exported tab-separated PIM metadata files, avoiding the loss of data integrity caused by Excel auto-formatting (for example, the loss of leading zeros on part numbers or codes).
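The leading-zero problem that this option avoids can be reproduced with a small Python sketch. It uses the standard csv module to read tab-separated text; the column names and sample values are hypothetical, and the sketch does not show how Services for Excel itself loads the files.

```python
import csv
import io


def read_metadata(tsv_text):
    """Parse tab-separated metadata, keeping every value as text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


# Hypothetical export with leading zeros in both columns.
sample = "part_number\tcode\n000123\t007\n045600\t010\n"
rows = read_metadata(sample)
print(rows[0]["part_number"])       # value kept as text
print(int(rows[0]["part_number"]))  # numeric conversion drops the zeros
```

Treating every field as text is the key design choice: once a value such as a part number is interpreted as a number, the original formatting cannot be recovered.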
Installing and uninstalling Services for Excel independently of the EDQP Client software is introduced. Additionally, installation is enhanced to upgrade to a newer AutoBuild data lens template and term libraries when a newer version is available.
Use of Secure HTTP is introduced. Logging into an Oracle DataLens Server can use HTTP or HTTPS. One or the other protocol must be used homogeneously throughout EDQP.
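One way to check that a set of server addresses follows the single-protocol requirement is sketched below in Python. The host names, ports, and the uses_single_protocol helper are hypothetical and are not part of EDQP; the sketch only tests that every URL shares the same HTTP or HTTPS scheme.

```python
from urllib.parse import urlsplit


def uses_single_protocol(urls):
    """Return True when every URL uses the same scheme, HTTP or HTTPS."""
    schemes = {urlsplit(url).scheme.lower() for url in urls}
    return len(schemes) == 1 and schemes <= {"http", "https"}


# Hypothetical server addresses.
consistent = ["https://admin.example.com:2229/datalens",
              "https://transform1.example.com:2229/datalens"]
mixed = consistent + ["http://transform2.example.com:2229/datalens"]
print(uses_single_protocol(consistent), uses_single_protocol(mixed))
```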
Regression testing is enhanced to accept all like differences in a column.
The Job Options dialog is simplified to include Oracle DataLens Server and Server Group, and to allow you to make one selection, the desired DSA, in most cases without any additional selections needed. The Details button is introduced for use when additional custom settings are desired. The Insert Headings button is introduced to add headings in the open spreadsheet corresponding to the input and output fields of the selected DSA.
Job Results is enhanced to prompt for a Job ID when no jobs are running.
Unused Smart Glossary phrases and terms are now trimmed when generating a new data lens so that the data lens is cleaner, easier to maintain, and faster at run-time.
Note:
The Services for Excel 5.6.2 release is only compatible with 5.6.x releases; 5.5.x releases are not compatible or supported.
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Copyright © 2010, 2011 Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.