Oracle® Enterprise Data Quality for Product Data Getting Started
Release 11g R1 (11.1.1.6)

Part Number E35635-02
February 2013

This document introduces Oracle Enterprise Data Quality for Product Data and describes how to get started installing and using it.

Introduction to Enterprise DQ for Product

This section provides an introduction to the major concepts and components of Oracle Enterprise Data Quality for Product Data.

Enterprise DQ for Product (EDQP) is built from the ground up to tackle the unique challenges of assessing, improving, and managing product data. The EDQP solution is built on market-leading semantic-based technology and has been proven in a wide range of customer scenarios involving product, item, asset, SKU, and other forms of product or product-like data across a range of industries.

Typical product data is unstructured, non-standard, and often missing important information. With EDQP, you can create consistent and reliable product data by quickly identifying and applying standards to product data across systems, repositories, and processes, including the ability to identify and remediate problem data.

What is Data Quality?

Data quality is foundational to almost any business process because the benefits of consistent and reliable product data are felt in every aspect of the process. Improved data quality typically pays off in the areas of revenue, cost efficiencies, IT projects, and reporting and compliance.

What is Enterprise DQ for Product?

EDQP can be integrated with any application or process, and is pre-integrated at a semantic level with the Oracle Product Hub to reduce the time and cost to deploy and operate your Master Data Management system for product data, or any other system of record while also extending its capability and benefit.

EDQP is designed to handle data that is:

  • Poorly structured—Requires sophisticated semantic parsing capabilities.

  • Non-standard—Requires standards to be applied and data transformed to meet enterprise standards.

  • Highly variable—Requires flexible recognition and transformation capabilities to address nearly infinite combinations of acronyms, spelling, and vocabulary variations.

  • Category-specific—Requires the ability to both categorize an item and apply different rules based on content and context.

  • Variable quality—Requires integrated exception management and remediation capabilities.

  • Duplicated—Requires sophisticated semantic matching capabilities.

  • Poor and sparse data—Requires probabilistic matching approach.

EDQP delivers the following:

  • Semantic-based recognition—Context-based recognition enables accurate parsing, standardization, and matching along with auto-learning to handle the extreme variability and unpredictability of product data.

  • StatSim matching—Probabilistic matching engine designed to provide quick ready-to-use match results even for data poor in attribution.

  • Scalability—Manages millions of items across thousands of categories.

  • Integrated governance—Allows data stewards to monitor overall process effectiveness as well as drive direct data remediation.

  • Business user interface—Code-free interface streamlines use for business users, who best understand the rules and nuances of the data.

  • Enterprise-wide applicability—Standard process easily plugs in to existing systems and processes to enforce product data quality standards in any process or system.

Using Enterprise DQ for Product

EDQP uses data service applications (DSAs) to take incoming data through a customized business task flow, and data lenses to apply semantic knowledge to interpret and standardize unstructured, disparate information. It contains the following product components:

Knowledge Studio

Data lenses enable:

  • Contextual recognition

  • Very precise Semantic Form, Fit, Function match

  • Transformation and standardization to conform to any format or standard

  • Classification to any taxonomy, whether industry-standard or custom

  • Translation for any language to any language, including double-byte languages

Data lenses are designed to be built and maintained by business users who understand the nuances in meaning of product descriptions.

Knowledge Studio includes facilities to 'AutoBuild' data lenses from available metadata (extracted from PIM or legacy systems, rules, or standards).

Application Studio

Data Service Applications (DSAs) enable:

  • Implementation of business rules for imposition of data standards

  • Management of both 'good' items and exceptions through a full workflow process including both automated and manual remediation

  • Real-time or batch integration—take data from any source and return it to any destination

  • Data enrichment using internal and external sources as well as manual effort as required

  • Ability to create a "quick match" application leveraging new statistical match capabilities using EDQP StatSim

Data Service Applications can be called by any system or process in either real-time or batch mode using Web Service or API calls.

Governance Studio

The Governance Studio presents a user interface specifically designed for process governance and data remediation and includes:

  • Dashboard view of process and data quality metrics—so data stewards can monitor and drive continuous process improvements

  • Data transformation review—allowing product specialists to review recognized and transformed data as required

  • Exception management view—for product specialists to review remaining data problems on an exception basis

  • Match review—for product specialists to review system-generated matches (full review or exception-based)

  • Auto-learning—system creates inference rules for unrecognized data

  • Data remediation capabilities - allowing product specialists to fix issues with the data; information captured during the remediation process generates rule augmentation tasks that are reviewed and used to enrich the DQ rules

The DataLens Governance Studio can be used by a broad audience of data stewards and product specialists to monitor and drive data quality for their area of responsibility.

Task Manager

The Task Manager manages tasks:

  • generated from the Task Manager directly

  • generated from the Governance Studio or Knowledge Studio

  • that enrich your data and are created using the enrichment and AutoSuggest functionality in the Application Studio and Governance Studio respectively

AutoBuild and Services for Excel

AutoBuild constructs the initial data lens by examining the structured category and attribute information. Given sufficient information, the AutoBuild application can accomplish the following:

  • Construct a full Item Definition hierarchy, complete with required, scoring, and optional attributes.

  • Construct rich term and phrase recognition rules.

  • Provide an initial set of standardization, classification, and match rules.

The AutoBuild application is included in Services for Excel, a separately installed add-in product that adds a custom toolbar to Excel. Services for Excel interfaces directly with the Oracle DataLens Server to execute Data Service Applications (DSAs) to provide enhanced, tailored, spreadsheet-based transformations of your data.

Oracle DataLens Servers

The EDQP Oracle DataLens Server is configured to run with multiple servers:

  • Oracle DataLens Administration Server

  • Oracle DataLens Transform Server

The administration of all servers in a multi-server configuration is controlled with the Oracle DataLens Administration Server. The Administration Server manages the administrative tasks of the servers in the Server Groups (referred to as Transform Servers), and it can serve as its own Transform Server when installed alone in a single-node configuration. By spreading the data processing load across multiple servers, the Oracle DataLens Server system provides scalability and configuration control over the various functional areas involved in developing, testing, and ultimately executing Oracle data lens jobs.

EDQP Web Pages

Oracle DataLens Servers are configured, monitored, and administered, both locally and remotely, from the EDQP Web application hosted by the Administration Server. These Web pages are installed as part of the Oracle DataLens Server and include:

Welcome

This is the default page when you log into your Oracle DataLens Server.

Administration

Used to administer and monitor all aspects of Oracle DataLens Servers, user access, connections and services, DSAs, data lenses, and jobs.

See "How to Use the Oracle DataLens Web Pages".

PIM Integrations

The integrations of the EDQP solution with the Oracle Fusion PIM Data Hub (PIMDH) and with R12 PIMDH each provide an integrated set of capabilities to categorize, standardize, match, govern, validate, and correct product data being introduced from any source systems or catalogs. For more information about either integration, see Oracle Enterprise Data Quality for Product Data Fusion PIM Integration Implementation and User's Guide or Oracle Enterprise Data Quality for Product Data Fusion R12 PIM Connector User's Guide.

Roadmap to Using EDQP

This section describes the task flows involved in planning, installing, upgrading, configuring, and administering EDQP.

Table 1 Installing and Using EDQP

| Task | Type of User | Detailed Instructions |
| --- | --- | --- |
| Verify the supported hardware and software configurations. | Administrator/IT | Oracle Enterprise Data Quality for Product Data Hardware and Software Specification |
| Install an Administration Server using Console Mode Installation | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Installation Guide |
| Install a Transform Server using Console Mode Installation | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Installation Guide |
| Configure a database or FTP connection | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide |
| Create an initial DSA | Business Owner | Oracle Enterprise Data Quality for Product Data Application Studio Reference Guide |
| Create an initial data lens using AutoBuild or manually | Business Owner | Oracle Enterprise Data Quality for Product Data AutoBuild Reference Guide; Oracle Enterprise Data Quality for Product Data Knowledge Studio Reference Guide |
| Start a client application from the launchpad | All | "How to Use the Oracle DataLens Web Pages" |


How to Use the Oracle DataLens Web Pages

The Welcome and Administration Web pages are available post-installation. The Welcome Web page is described in the following sections. The Administration Web page and how you can manage your Oracle DataLens Server are described in Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide.

Starting the Oracle DataLens Server Web Pages

To access your Oracle DataLens Server Web Pages:

  1. Start your Oracle DataLens Server.

  2. Open one of the supported Web browsers for your environment. See the Oracle Enterprise Data Quality for Product Data Certification Matrix at

    http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html

    Locate Oracle Enterprise Data Quality in the Product Area column and then click the System Requirements and Supported Platforms for Oracle Enterprise Data Quality for Product Data 11gR1 (11.1.1.x) Certification Matrix (xls) link.

  3. Enter the following URL:

    http://hostname:port/datalens

    where hostname is the DNS name or IP address of the Administration Server and port is the port on which the Administration Server listens for requests (port 2229 by default).

    If you configured the Administration Server to use Secure Socket Layer (SSL) you must add s after http as follows:

    https://hostname:port/datalens

  4. When the login page appears, enter your user name and password. Typically, these are the user name and password you specified during the installation process.

    The Oracle DataLens Server Web pages are displayed and default to the Welcome tab.
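
The URL rule in step 3 can be sketched as a small helper. This is an illustration only and not part of EDQP: the function name `datalens_url` and the host name `edqp.example.com` are assumptions, while the default port 2229, the `/datalens` path, and the switch from http to https for SSL come from the steps above.

```python
def datalens_url(hostname: str, port: int = 2229, use_ssl: bool = False) -> str:
    """Build the Oracle DataLens Server Web page URL described in step 3.

    hostname: DNS name or IP address of the Administration Server.
    port:     port on which the Administration Server listens (2229 by default).
    use_ssl:  True if the Administration Server was configured to use SSL.
    """
    if not hostname:
        raise ValueError("hostname must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{hostname}:{port}/datalens"


if __name__ == "__main__":
    # Hypothetical host name, used only for illustration.
    print(datalens_url("edqp.example.com"))
    print(datalens_url("edqp.example.com", use_ssl=True))
```

Keeping the scheme as a flag rather than part of the host name mirrors the instruction to "add s after http" when SSL is configured.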

Elements of the Welcome Page

The Welcome page is the starting point in accessing most product components. It is a centralized location for your data governance workflow and is updated dynamically as the various tasks, jobs, DSAs, and data lenses change. You use it to:

  • Install and start client applications using the Application Launcher panel, which is a centralized launchpad for starting client applications.

  • View and refresh Tasks and Jobs that have been created or run.

  • View and refresh data lenses and DSAs that are checked into your Oracle DataLens Server.

  • Access the EDQP 11g documentation library and previous release documentation.

The Welcome page user interface includes the following panels. Any of these panels can be hidden or expanded by clicking the adjacent arrow button. The informational panels that contain columns allow you to resize the columns. Additionally, the blue, double-arrow Refresh button allows you to refresh the information being displayed with the latest from the server.

Application Launcher

This is the starting point to install or launch the client software applications using the application buttons in this launchpad.

Installing the Client Applications

The AutoBuild button downloads a single executable file that contains the Services for Excel installation files, which include AutoBuild, for 32-bit (Microsoft Office 2003, 2007, and 2010) and 64-bit (Microsoft Office 2010) Windows operating systems. For installation instructions, see Oracle Enterprise Data Quality for Product Data Services for Excel Reference Guide.

EDQP uses Java Web Start to install and launch the remaining four client applications on your client system from your Oracle DataLens Server.

To install and launch an application:

  1. Click a button in the Application Launcher panel.

  2. The EDQP .jnlp file is downloaded from the server.

    Depending on how you have configured your browser, you may have to allow your browser to save the file and set this type of file to run automatically. For example, in Google Chrome you may have to click Keep in the download panel at the bottom of your browser window to save the file, and then right-click the arrow to the right of the file name and select Always open files of this type. Alternatively, you can double-click the downloaded .jnlp file to run it.

    The application download and verification begins. After the verification completes, the installation begins. EDQP Java Web Start files are digitally signed by a trusted source, and a security warning asks you to confirm that you want to run the application.

    Tip:

    To avoid this security dialog in the future, select the Always trust content from this publisher check box.
  3. Click Run.

    The selected application installation completes and you are prompted to log in to your server.

  4. Enter your user name and password. You can avoid entering your password every time you log on by selecting the Remember Password check box.

  5. If you want to change your Oracle DataLens Server or use HTTP Secure (HTTPS), click Change Server.

    The HTTPS option is certified only for an Oracle DataLens Server that uses WebLogic as the application server. Oracle recommends using HTTPS to ensure a secure operating environment. For more information about configuring HTTPS, see Oracle Fusion Middleware Application Security Guide at

    http://docs.oracle.com/cd/E23943_01/core.1111/e10043/toc.htm

  6. To change your Oracle DataLens Server, enter the hostname or IP Address of your Oracle DataLens Administration Server and its port number.

    To use HTTPS to communicate with your Oracle DataLens Server, select the Use HTTPS check box.

    Note:

    Your Oracle DataLens Server and all other servers in the EDQP topology must be configured to use HTTPS for communication outside of the firewall. For more information about configuring and using HTTPS, see Oracle Fusion Middleware Securing a Production Environment for Oracle WebLogic Server 11g.
  7. When all of the information is correct, click OK.

EDQP Documentation

You can click the Documentation Library link to access the EDQP documentation that supports your product in your browser.

Tasks

The tasks that are scheduled or have completed for the user currently logged in to the server are listed in the Tasks panel.

Task

The task number is assigned by the system during job submission and consists of the user name followed by a job number. You can click a task number in the Task column to view further details about the selected task.

Status

This field shows the status of the task. The possible statuses are as follows:

  • Open

  • Fixed

  • Rejected

  • Deleted

  • Closed

Description

The description the user entered when scheduling the task. If the task is created in Governance Studio from enrichment data, the description is automatically populated using the following syntax:

Enrichments for Data Lens: DATA_LENS_NAME, from Project: GOVERNANCE_STUDIO_PROJECT_NAME, Job ID: ##
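
If you need to pick these automatically populated descriptions apart, for example when filtering an exported task list, the syntax above can be matched with a regular expression. The following is a minimal sketch under stated assumptions: the function name `parse_enrichment_description` and the sample values are hypothetical, and `##` is assumed to stand for a run of digits.

```python
import re
from typing import Optional

# Pattern for the automatically populated description syntax shown above.
# Assumption: "##" in the syntax stands for one or more digits.
ENRICHMENT_DESCRIPTION = re.compile(
    r"^Enrichments for Data Lens: (?P<data_lens>.+?), "
    r"from Project: (?P<project>.+?), "
    r"Job ID: (?P<job_id>\d+)$"
)


def parse_enrichment_description(text: str) -> Optional[dict]:
    """Return data lens, project, and job ID, or None if the text does not match."""
    match = ENRICHMENT_DESCRIPTION.match(text.strip())
    if match is None:
        return None  # a description typed by a user, not an automatic one
    fields = match.groupdict()
    fields["job_id"] = int(fields["job_id"])
    return fields


if __name__ == "__main__":
    # Made-up sample values, used only for illustration.
    sample = "Enrichments for Data Lens: Resistors, from Project: Q1_Cleanup, Job ID: 42"
    print(parse_enrichment_description(sample))
```

Descriptions that users enter by hand do not follow this syntax, so the function returns None for them rather than raising an error.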

Date

Shows the date and time the job started in YYYY-MM-DD HH:MM:SS format.

Assigned

This field shows the user name of the person who is assigned to the task.

Jobs

The jobs that are scheduled or have completed for the user currently logged in to the server are listed in the Jobs panel.

Job Id

The Job Id is assigned by the system during job submission. You can click on a job number in the Job ID column to view further details about the job.

Status

This field shows the status of the job. Status definitions are as follows:

  • Running

    The job is currently running.

  • Pending

    The job has not started, but will start as soon as a slot becomes available. Only two jobs can run concurrently on the Oracle DataLens Server. Jobs submitted while two jobs are already running get a status of "Pending" and start in order of submission as the other jobs finish processing.

  • Finished

    The job has successfully finished processing.

  • Cancelled

    The Administrator canceled the job during processing or before processing started.

  • Failed

    The job failed. This status means that something went wrong during the submission or processing of the data. Failed jobs will yield an entry in the Oracle DataLens Server Log.
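
The Running and Pending behavior described above can be modeled with a short sketch. It is a toy illustration under stated assumptions, not the Oracle DataLens Server scheduler: the class name `JobSlots` and the job identifiers are made up, job priority is ignored, and only the two-job limit and the order-of-submission rule come from the text.

```python
from collections import deque

MAX_RUNNING = 2  # the text states that only two jobs can run concurrently


class JobSlots:
    """Toy model of the Running/Pending statuses (hypothetical, for illustration)."""

    def __init__(self):
        self.running = []       # jobs currently occupying a slot
        self.pending = deque()  # jobs waiting, oldest submission first

    def submit(self, job_id):
        """Start the job if a slot is free; otherwise queue it as Pending."""
        if len(self.running) < MAX_RUNNING:
            self.running.append(job_id)
            return "Running"
        self.pending.append(job_id)
        return "Pending"

    def finish(self, job_id):
        """Free the job's slot and start the oldest pending job, if any."""
        self.running.remove(job_id)
        if self.pending:
            self.running.append(self.pending.popleft())


if __name__ == "__main__":
    slots = JobSlots()
    for job in ("job-1", "job-2", "job-3", "job-4"):
        print(job, slots.submit(job))
    slots.finish("job-1")
    print("running:", slots.running, "pending:", list(slots.pending))
```

A first-in, first-out queue is used here because the text says pending jobs start in order of submission.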

Description

The description the user entered when scheduling the job. Additionally, jobs that are run from the Governance Studio or Services for Excel are identified as such.

Duration

This shows the total time in hours/minutes/seconds for a completed job.

Input Line Count

This field shows the number of records processed so far for the DSA job. If a job is in "Running" status, this number will update when you click the Refresh button.

Start

Shows the date and time the job started in YYYY-MM-DD HH:MM:SS format.
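
If you copy Start values out of the Jobs panel into a script, the stated format maps directly onto a standard date pattern. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the sample timestamps are made up, and a 24-hour clock is assumed because the text does not say otherwise.

```python
from datetime import datetime

# YYYY-MM-DD HH:MM:SS as stated above; a 24-hour clock is an assumption.
START_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_start(value: str) -> datetime:
    """Convert a Start column value into a datetime object."""
    return datetime.strptime(value.strip(), START_FORMAT)


def elapsed_seconds(start: str, now: datetime) -> int:
    """Whole seconds between a job's Start value and the given time."""
    return int((now - parse_start(start)).total_seconds())


if __name__ == "__main__":
    # Made-up sample values, used only for illustration.
    print(parse_start("2013-02-14 09:30:05"))
    print(elapsed_seconds("2013-02-14 09:30:05", datetime(2013, 2, 14, 10, 0, 5)))
```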

Server

The server that the job was sent to for processing. In a server group with more than one server, there may be multiple servers handling the request.

Priority

The priority that the job was given.

  • Low priority

    Large batch-type jobs that process tens of thousands to millions of lines of data.

  • Medium priority

    Jobs whose results should be obtained while any low priority job is running.

  • High priority

    Jobs with just a few lines to process, or jobs run from an interactive user environment, where the results need to be returned immediately.

Owner

This field shows the user name of the person who scheduled the job. Other users may be able to run the job if they are allowed access to the specified DSA, but the owner of the job is always displayed.

Data Quality Rules

This panel contains the two tabs described in the following sections.

Data Lenses

When you click the Data Lenses tab, the data lenses listed are those that are locked by the current user or, if a data lens is not locked, those that the current user was the last to check in.

The columns in the Data Lenses tab contain the following information:

  • Data Lens Name—This is the name of the data lens and is a link to the complete history information for the data lens. Click the link to review the data lens details.

  • Days Checked Out—The number of days that the data lens has been checked out of your Oracle DataLens Server.

  • Deployment Status—The status of how the data lens is currently deployed, Development (or the Administration/development), Quality Assurance (QA), or Production Server.

  • Comment—The comment that was entered when the given version of the data lens was checked into the repository.

DSAs

When you click the DSAs tab, the DSAs listed are those that are locked by the current user or, if a DSA is not locked, those that the current user was the last to check in.

The DSAs tab provides the same kind of information as the Data Lenses tab; see "Data Lenses".

Toolbar

The toolbar at the top right of the page includes the following elements:

| Toolbar Element | Description |
| --- | --- |
| Accessibility | Click to select and apply the following accessibility options to your current browser session: Screen Reader (activates the interface to an installed screen reading application); High contrast (activates a higher color contrast between the text and background); Large fonts (increases the size of the displayed font); Enable Animations (activates animated images, enabled by default). |
| About... | Click to view the release information and access the EDQP product website. |
| Logout username | Click to log out of the Launchpad page. |

EDQP Documentation Set

For comprehensive guidelines for installing, configuring, and using EDQP, refer to the documents summarized in the following table.

Table 2 Documentation Summary

| Review this document... | To learn how to... |
| --- | --- |
| Hardware and Software Specification | Identify the supported software and hardware specifications necessary for the EDQP software. |
| What's New in EDQP | Identify the features and changes in each 11g (11.1.1.6) release of EDQP. |
| Glossary | Understand commonly used EDQP terms. |
| Security Guide | Securely install and configure your EDQP environment. |
| Oracle DataLens Server Installation Guide | Install both types of Oracle DataLens Servers, install a WebLogic Server and domain for EDQP, and upgrade an existing server. |
| Oracle DataLens Server Administration Guide | Configure, monitor, and administer Oracle DataLens Servers, DSAs, data lenses, and users. |
| Application Studio Reference Guide | Create, enhance, and maintain Data Service Applications (DSAs). |
| AutoBuild Studio Reference Guide | Create an initial data lens based on existing product information and data lens knowledge, or update an existing data lens. |
| Knowledge Studio Reference Guide | Create, maintain, and enhance data lenses to refine your data. |
| Governance Studio Reference Guide | Build projects to analyze your transformed data, create reports to show the quality of your data, and identify missing attributes. |
| Services for Excel Reference Guide | Install the application and use it in conjunction with DSAs to transform your spreadsheet-based data from within Microsoft Excel. |
| Task Manager Reference Guide | Manage tasks created with the Task Manager or Governance Studio. |
| Java Interface Guide, .Net Interface Guide, and Web Service Access to Oracle DataLens Servers Interface Guide | Install and use the EDQP APIs to interface with Oracle DataLens Servers. |

See the latest version of this and all documents listed at the Oracle Enterprise Data Quality for Product Data Documentation Web site at

http://docs.oracle.com/cd/E35636_01/index.htm

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.


Copyright © 2012, 2013,  Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.