Oracle® Enterprise Data Quality for Product Data Getting Started Release 11g R1 (11.1.1.6) Part Number E35635-02
Getting Started
Release 11g R1 (11.1.1.6)
E35635-02
February 2013
This document introduces Oracle Enterprise Data Quality for Product Data and describes how to get started installing and using it.
This section introduces the major concepts and components of Oracle Enterprise Data Quality for Product Data.
Enterprise DQ for Product (EDQP) is built from the ground up to tackle the unique challenges of assessing, improving, and managing product data. The EDQP solution is built on market-leading semantic-based technology and has been proven in a wide range of customer scenarios involving product, item, asset, SKU, and other forms of product or product-like data across a range of industries.
Typical product data is unstructured, non-standard, and often missing important information. With EDQP, you can create consistent and reliable product data by quickly identifying and applying standards to product data across systems, repositories, and processes, including the ability to identify and remediate problem data.
Data quality is foundational to almost any business process because the benefits of consistent and reliable product data are felt in every aspect of the process. Improved data quality typically benefits revenue, cost efficiency, IT projects, and reporting and compliance.
EDQP can be integrated with any application or process, and is pre-integrated at a semantic level with the Oracle Product Hub to reduce the time and cost to deploy and operate your Master Data Management system for product data, or any other system of record while also extending its capability and benefit.
EDQP is designed to handle data that is:
Poorly structured—Requires sophisticated semantic parsing capabilities.
Non-standard—Requires standards to be applied and data transformed to meet enterprise standards.
Highly variable—Requires flexible recognition and transformation capabilities to address nearly infinite combinations of acronyms, spelling, and vocabulary variations.
Category-specific—Requires the ability to both categorize an item and apply different rules based on content and context.
Variable quality—Requires integrated exception management and remediation capabilities.
Duplicated—Requires sophisticated semantic matching capabilities.
Poor and sparse data—Requires probabilistic matching approach.
EDQP delivers the following:
Semantic-based recognition—Context-based recognition enables accurate parsing, standardization, and matching along with auto-learning to handle the extreme variability and unpredictability of product data.
StatSim matching—Probabilistic matching engine designed to provide quick ready-to-use match results even for data poor in attribution.
Scalability—Manages millions of items across thousands of categories.
Integrated governance—Allows data stewards to monitor overall process effectiveness as well as drive direct data remediation.
Business user interface—Code-free interface streamlines use for business users, who best understand the rules and nuances of the data.
Enterprise-wide applicability—Standard process easily plugs in to existing systems and processes to enforce product data quality standards in any process or system.
EDQP uses data service applications (DSAs) to take incoming data through a customized business task flow and data lenses to use semantic knowledge to interpret and standardize unstructured, disparate information. It contains the following product components:
Data lenses enable:
Contextual recognition
Very precise Semantic Form, Fit, Function match
Transformation and standardization to conform to any format or standard
Classification to any taxonomy, whether industry-standard or custom
Translation for any language to any language, including double-byte languages
Data lenses are designed to be built and maintained by business users who understand the nuances in meaning of product descriptions
Includes facilities to 'AutoBuild' data lenses from available metadata (extracted from PIM or legacy systems, rules or standards)
Data Service Applications (DSAs) enable:
Implementation of business rules for imposition of data standards
Management of both 'good' items and exceptions through a full workflow process including both automated and manual remediation
Real-time or batch integration—take data from any source and return it to any destination
Data enrichment using internal and external sources as well as manual effort as required
Ability to create a "quick match" application leveraging new statistical match capabilities using EDQP StatSim
Data Service Applications can be called by any system or process in either real-time or batch mode using Web Service or API calls.
The Governance Studio presents a user interface specifically designed for process governance and data remediation and includes:
Dashboard view of process and data quality metrics—so data stewards can monitor and drive continuous process improvements
Data transformation review—allowing product specialists to review recognized and transformed data as required
Exception management view—for product specialists to review remaining data problems on an exception basis
Match review—for product specialists to review system-generated matches (full review or exception-based)
Auto-learning—system creates inference rules for unrecognized data
Data remediation capabilities - allowing product specialists to fix issues with the data; information captured during the remediation process generates rule augmentation tasks that are reviewed and used to enrich the DQ rules
The DataLens Governance Studio can be used by a broad audience of Data Stewards and product specialists to monitor and drive data quality for their area of responsibility
The Task Manager manages tasks:
generated from the Task Manager directly
generated from the Governance Studio or Knowledge Studio
that enrich your data and are created using the enrichment and AutoSuggest functionality in the Application Studio and Governance Studio respectively
AutoBuild constructs the initial data lens by examining the structured category and attribute information. Given sufficient information, the AutoBuild application can accomplish the following:
Construct a full Item Definition hierarchy, complete with required, scoring, and optional attributes.
Construct rich term and phrase recognition rules.
Provide an initial set of standardization, classification, and match rules.
The AutoBuild application is included in Services for Excel, a separately installed add-in product that adds a custom toolbar to Excel. Services for Excel interfaces directly with the Oracle DataLens Server to execute Data Service Applications (DSAs) that provide enhanced, tailored, spreadsheet-based transformations of your data.
The EDQP Oracle DataLens Server is configured to run with multiple servers:
Oracle DataLens Administration Server
Oracle DataLens Transform Server
The administration of all servers in a multi-server configuration is controlled with the Oracle DataLens Administration Server. The purpose of the Administration Server is to manage the various administrative tasks of the servers for the Server Groups (referred to as Transform Servers) and can itself serve as its own Transform Server when installed alone in a single node configuration. By spreading the data processing load across multiple servers the Oracle DataLens Server system provides scalability and configuration control over the various functional areas involved in developing, testing, and ultimately executing Oracle data lens jobs.
Oracle DataLens Servers are configured, monitored, and administered, both locally and remotely, from the EDQP Web application hosted by the Administration Server. These web pages are installed as part of the Oracle DataLens Server and include:
Welcome
This is the default page when you log into your Oracle DataLens Server.
Administration
Used to administer and monitor all aspects of Oracle DataLens Servers, user access, connections and services, DSAs, data lenses, and jobs.
The integrations of the EDQP solution with the Oracle Fusion PIM Data Hub (PIMDH) and with R12 PIMDH each provide an integrated set of capabilities to categorize, standardize, match, govern, validate, and correct product data being introduced from any source system(s) or catalog(s). For more information about either integration, see Oracle Enterprise Data Quality for Product Data Fusion PIM Integration Implementation and User's Guide or Oracle Enterprise Data Quality for Product Data Fusion R12 PIM Connector User's Guide.
This section describes the task flows involved in planning, installing, upgrading, configuring, and administering EDQP.
Table 1 Installing and Using EDQP
Task | Type of User | Detailed Instructions |
---|---|---|
Verify the supported hardware and software configurations. | Administrator/IT | Oracle Enterprise Data Quality for Product Data Hardware and Software Specification |
Install an Administration Server using Console Mode Installation | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Installation Guide |
Install a Transform Server using Console Mode Installation | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Installation Guide |
Configure a database or FTP connection | Administrator/IT | Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide |
Create an initial DSA | Business Owner | Oracle Enterprise Data Quality for Product Data Application Studio Reference Guide |
Create an initial data lens using AutoBuild or manually | Business Owner | Oracle Enterprise Data Quality for Product Data AutoBuild Reference Guide; Oracle Enterprise Data Quality for Product Data Knowledge Studio Reference Guide |
Start a client application from the launchpad | All | |
The Welcome and Administration Web pages are available post-installation. The Welcome Web page is described in the following sections. The Administration Web page, and how you can use it to manage your Oracle DataLens Server, are described in Oracle Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide.
To access your Oracle DataLens Server Web Pages:
Start your Oracle DataLens Server.
Open one of the supported Web browsers for your environment. See the Oracle Enterprise Data Quality for Product Data Certification Matrix at
http://www.oracle.com/technetwork/middleware/ias/downloads/fusion-certification-100350.html
Locate Oracle Enterprise Data Quality in the Product Area column and then click the System Requirements and Supported Platforms for Oracle Enterprise Data Quality for Product Data 11gR1 (11.1.1.x) Certification Matrix (xls) link.
Enter the following URL:
http://hostname:port/datalens
where hostname is the DNS name or IP address of the Administration Server and port is the port on which the Administration Server listens for requests (2229 by default).
If you configured the Administration Server to use Secure Socket Layer (SSL), you must add s after http as follows:
https://hostname:port/datalens
When the login page appears, enter your user name and password. Typically, these are the user name and password you specified during the installation process.
The Oracle DataLens Server Web pages are displayed and default to the Welcome tab.
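The URL rule described above can be sketched as a small helper. This is only an illustration, not part of EDQP: the function name `datalens_url` and the example host name are hypothetical, and the default port 2229 and the `/datalens` path are taken from the text above.

```python
def datalens_url(hostname: str, port: int = 2229, use_ssl: bool = False) -> str:
    """Build the Oracle DataLens Server web page URL.

    hostname: DNS name or IP address of the Administration Server.
    port:     listen port of the Administration Server (2229 by default).
    use_ssl:  True if the Administration Server was configured to use SSL,
              in which case the scheme becomes https.
    """
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{hostname}:{port}/datalens"


if __name__ == "__main__":
    # "edqp-admin.example.com" is a placeholder host name.
    print(datalens_url("edqp-admin.example.com"))
    print(datalens_url("edqp-admin.example.com", use_ssl=True))
```

Paste the resulting URL into a supported browser to reach the login page.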
The Welcome page is the starting point in accessing most product components. It is a centralized location for your data governance workflow and is updated dynamically as the various tasks, jobs, DSAs, and data lenses change. You use it to:
Install and start client applications using the Application Launcher panel, which is a centralized launchpad for starting client applications.
View and refresh Tasks and Jobs that have been created or run.
View and refresh data lenses and DSAs that are checked into your Oracle DataLens Server.
Access the EDQP 11g documentation library and previous release documentation.
The Welcome page user interface includes the following panels. Any of these panels can be hidden or expanded by clicking the adjacent arrow button. The informational panels that contain columns allow you to resize the columns. Additionally, the blue, double-arrow Refresh button allows you to refresh the information being displayed with the latest from the server.
This is the starting point to install or launch the client software applications using the application buttons in this launchpad.
Installing the Client Applications
The AutoBuild button downloads Services for Excel, which includes AutoBuild. A single executable file contains the installation files for both 32-bit (Microsoft Office 2003, 2007, and 2010) and 64-bit (Microsoft Office 2010) Windows operating systems. For installation instructions, see Oracle Enterprise Data Quality for Product Data Services for Excel Reference Guide.
EDQP uses Java Web Start to install and launch the remaining four client applications on your client system from your Oracle DataLens Server.
To install and launch an application:
Click a button in the Application Launcher panel.
The EDQP .jnlp file is downloaded from the server.
Depending on how you have configured your browser, you may have to allow your browser to save the file and set this type of file to run automatically. For example, in Google Chrome you may have to click Keep in the download panel at the bottom of your browser window to save the file, and then right-click on the arrow to the right of the file name and select Always open files of this type. Alternatively, you can double-click the downloaded .jnlp file to run it.
The application download and verification begins. After the verification completes, the installation begins. EDQP Java Web Start files are digitally signed by a trusted source, so a security warning is displayed.
Tip:
To avoid this security dialog in the future, select the Always trust content from this publisher check box.
Click Run.
The selected application installation completes and you are prompted to log in to your server.
Enter your user name and password. You can avoid entering your password every time you log on by selecting the Remember Password check box.
If you want to change your Oracle DataLens Server or use HTTP Secure (HTTPS), click Change Server.
The HTTPS option is only certified to run on an Oracle DataLens Server using WebLogic as the application server and its use is recommended by Oracle to ensure a secure operating environment. For more information about configuring HTTPS, see Oracle Fusion Middleware Application Security Guide at
http://docs.oracle.com/cd/E23943_01/core.1111/e10043/toc.htm
To change your Oracle DataLens Server, enter the hostname or IP Address of your Oracle DataLens Administration Server and its port number.
To use HTTPS to communicate with your Oracle DataLens Server, select the Use HTTPS check box.
Note:
Your Oracle DataLens Server and all other servers in the EDQP topology must be configured to use HTTPS for communication outside of the firewall. For more information about configuring and using HTTPS, see Oracle Fusion Middleware Securing a Production Environment for Oracle WebLogic Server 11g.
When all of the information is correct, click OK.
You can click the Documentation Library link to access the EDQP documentation that supports your product in your browser.
The tasks that are scheduled or have completed for the user currently logged in to the server are listed in the Tasks panel.
The task number is assigned by the system during job submission and consists of the user name followed by a job number. You can click a task number in the Task column to view further details about the selected task.
This field shows the status of the task. The possible statuses are as follows:
Open
Fixed
Rejected
Deleted
Closed
The description the user entered when scheduling the task. If the task is created in Governance Studio from enrichment data, the description is automatically populated using the following syntax:
Enrichments for Data Lens: DATA_LENS_NAME, from Project: GOVERNANCE_STUDIO_PROJECT_NAME, Job ID: ##
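As an illustration of the description syntax above, the following sketch builds and parses such a string. It is not part of EDQP: the function names are hypothetical, the exact spacing around the commas is an assumption, and the job ID (shown as ## in the syntax) is assumed to be numeric.

```python
import re
from typing import Optional

# Assumed layout: "Enrichments for Data Lens: <lens>, from Project: <project>, Job ID: <digits>"
ENRICHMENT_PATTERN = re.compile(
    r"^Enrichments for Data Lens:\s*(?P<data_lens>.+?)\s*"
    r", from Project:\s*(?P<project>.+?)\s*"
    r", Job ID:\s*(?P<job_id>\d+)\s*$"
)


def format_enrichment_description(data_lens: str, project: str, job_id: int) -> str:
    """Compose a task description in the documented syntax."""
    return f"Enrichments for Data Lens: {data_lens}, from Project: {project}, Job ID: {job_id}"


def parse_enrichment_description(description: str) -> Optional[dict]:
    """Return the three fields, or None if the text does not follow the syntax."""
    match = ENRICHMENT_PATTERN.match(description)
    if match is None:
        return None
    return {
        "data_lens": match["data_lens"],
        "project": match["project"],
        "job_id": int(match["job_id"]),
    }


if __name__ == "__main__":
    text = format_enrichment_description("Fasteners", "Q1 Review", 12)
    print(text)
    print(parse_enrichment_description(text))
```

A description typed in by a user does not match the pattern, so the parser returns None for it rather than guessing.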
Shows the date and time the job started in YYYY-MM-DD HH:MM:SS format.
This field shows the user name of the person who is assigned to the task.
The jobs that are scheduled or have completed for the user currently logged in to the server are listed in the Jobs panel.
The Job ID is assigned by the system during job submission. You can click a job number in the Job ID column to view further details about the job.
This field shows the status of the job. Status definitions are as follows:
Running
The job is currently running.
Pending
The job has not started, but will start as soon as a slot becomes available. Only two jobs can run concurrently on the Oracle DataLens Server. Jobs submitted while two jobs are already running get a status of "Pending" and start in order of submission as the other jobs finish processing.
Finished
The job has successfully finished processing.
Cancelled
The Administrator canceled the job during processing or before processing started.
Failed
The job failed. This status means that something went wrong during the submission or processing of the data. Failed jobs will yield an entry in the Oracle DataLens Server Log.
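The status transitions described above can be illustrated with a toy model. It is a sketch for reasoning about what the Jobs panel shows, not the server's actual scheduler: the class name `JobSlots` and its methods are hypothetical, and only the two-slot limit, the first-in-first-out start order, and the status names come from the text.

```python
from collections import deque

MAX_RUNNING = 2  # the text states that only two jobs can run concurrently


class JobSlots:
    """Toy model of the Running/Pending/Finished/Failed/Cancelled statuses."""

    def __init__(self, max_running: int = MAX_RUNNING):
        self.max_running = max_running
        self.running = []        # job IDs currently holding a slot
        self.pending = deque()   # job IDs waiting, in order of submission
        self.status = {}         # job ID -> status string

    def submit(self, job_id: int) -> str:
        """Start the job if a slot is free, otherwise queue it as Pending."""
        if len(self.running) < self.max_running:
            self.running.append(job_id)
            self.status[job_id] = "Running"
        else:
            self.pending.append(job_id)
            self.status[job_id] = "Pending"
        return self.status[job_id]

    def _release(self, job_id: int, final_status: str) -> None:
        """Record the final status and hand a freed slot to the oldest pending job."""
        if job_id in self.running:
            self.running.remove(job_id)
        else:
            self.pending.remove(job_id)  # cancelled before processing started
        self.status[job_id] = final_status
        while self.pending and len(self.running) < self.max_running:
            next_job = self.pending.popleft()
            self.running.append(next_job)
            self.status[next_job] = "Running"

    def finish(self, job_id: int, ok: bool = True) -> None:
        self._release(job_id, "Finished" if ok else "Failed")

    def cancel(self, job_id: int) -> None:
        self._release(job_id, "Cancelled")


if __name__ == "__main__":
    slots = JobSlots()
    for job in (1, 2, 3, 4):
        print(job, slots.submit(job))
    slots.finish(1)
    print(slots.status)
```

In this model the third and fourth submissions stay Pending until one of the first two jobs finishes, fails, or is cancelled.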
The description the user entered when scheduling the job. Additionally, jobs that are run from the Governance Studio or Services for Excel are identified as such.
This shows the total time in hours/minutes/seconds for a completed job.
This field shows the number of records processed so far for the DSA job. If a job is in "Running" status, this number will update when you click the Refresh button.
Shows the date and time the job started in YYYY-MM-DD HH:MM:SS format.
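If you copy start times from the Jobs panel into a script, the YYYY-MM-DD HH:MM:SS format maps directly onto Python's `datetime.strptime`. The sketch below is illustrative only: the function names are hypothetical, the end time is something you would supply yourself, and it assumes the end time is not earlier than the start time.

```python
from datetime import datetime

START_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS


def parse_start_time(text: str) -> datetime:
    """Parse a start time as displayed in the panel."""
    return datetime.strptime(text, START_FORMAT)


def elapsed_hms(start_text: str, end_text: str) -> tuple:
    """Return (hours, minutes, seconds) between two displayed timestamps.

    Assumes end_text is not earlier than start_text.
    """
    delta = parse_start_time(end_text) - parse_start_time(start_text)
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


if __name__ == "__main__":
    print(elapsed_hms("2013-02-01 09:00:00", "2013-02-01 10:30:15"))
```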
The server that the job was sent to for processing. In a server group with more than one server, there may be multiple servers handling the request.
The priority that the job was given.
Low priority
Large batch-type jobs that process tens of thousands to millions of lines of data.
Medium priority
Jobs whose results should be obtained while any low priority job is running.
High priority
Jobs with just a few lines to process, or jobs run from an interactive user environment, where the results need to be returned immediately.
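One way to apply these priority descriptions when submitting jobs is a simple rule of thumb like the sketch below. The function name and both thresholds are hypothetical choices made for illustration; the text only says "a few lines" and "tens of thousands", and it defines medium priority by urgency rather than by size, so treating everything in between as medium is an assumption.

```python
def suggest_priority(
    line_count: int,
    interactive: bool = False,
    small_job_lines: int = 100,      # hypothetical cutoff for "just a few lines"
    large_job_lines: int = 10_000,   # hypothetical cutoff for "tens of thousands"
) -> str:
    """Suggest a job priority from the job size and how it is being run."""
    if interactive or line_count <= small_job_lines:
        return "High"    # results need to be returned immediately
    if line_count >= large_job_lines:
        return "Low"     # large batch-type job
    return "Medium"      # assumed default for everything in between


if __name__ == "__main__":
    print(suggest_priority(25))
    print(suggest_priority(2_000_000))
    print(suggest_priority(5_000))
```

Adjust the thresholds to match the job sizes that are typical in your environment.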
This field shows the user name of the person who scheduled the job. Other users may be able to run the job if access to the specified DSA is allowed, though the owner of the job is always displayed.
This panel contains the two tabs described in the following sections.
When you click the Data Lenses tab, it lists the data lenses that are locked by the current user and, for data lenses that are not locked, those that the current user last checked in.
The columns in the Data Lenses tab contain the following information:
Data Lens Name—This is the name of the data lens and is a link to the complete history information for the data lens. Click the link to review the data lens details.
Days Checked Out—The number of days that the data lens has been checked out of your Oracle DataLens Server.
Deployment Status—Where the data lens is currently deployed: Development (the Administration/development server), Quality Assurance (QA), or Production Server.
Comment—The comment that was entered when the given version of the data lens was checked into the repository.
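The rule that decides which data lenses appear for a user can be expressed as a short filter. This is a sketch of the rule as described for this tab, not EDQP code: the `DataLensRecord` type, its field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataLensRecord:
    name: str
    locked_by: Optional[str]     # user holding the lock, or None if not locked
    last_checked_in_by: str      # user who last checked the data lens in


def visible_to(user: str, records: List[DataLensRecord]) -> List[DataLensRecord]:
    """Return the data lenses the tab would list for the given user.

    A data lens is listed if the user holds its lock, or if it is not
    locked and the user was the last one to check it in.
    """
    return [
        record
        for record in records
        if record.locked_by == user
        or (record.locked_by is None and record.last_checked_in_by == user)
    ]


if __name__ == "__main__":
    lenses = [
        DataLensRecord("Fasteners", locked_by="alice", last_checked_in_by="bob"),
        DataLensRecord("Resistors", locked_by=None, last_checked_in_by="alice"),
        DataLensRecord("Bearings", locked_by="bob", last_checked_in_by="alice"),
    ]
    print([record.name for record in visible_to("alice", lenses)])
```

A data lens that another user has locked is not listed, even if the current user was the last to check it in.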
When you click the DSA tab, it lists the DSAs that are locked by the current user and, for DSAs that are not locked, those that the current user last checked in.
The DSA tab provides the same information as described in the previous section; see "Data Lenses".
The tool bar at the top right of the page includes the following elements:
Toolbar Element | Description |
---|---|
Accessibility | Click to select and apply accessibility options to your current browser session. |
About... | Click to view the release information and access the EDQP product website. |
Logout username | Click to log out of the Launchpad page. |
For comprehensive guidelines for installing, configuring, and using EDQP, refer to the documents summarized in the following table:
See the latest version of this and all documents listed at the Oracle Enterprise Data Quality for Product Data Documentation Web site at
Review this document... | To learn how to... |
---|---|
Hardware and Software Specification | Identify the supported software and hardware specifications necessary for the EDQP software. |
What's New in EDQP | Identify the features and changes in each 11g (11.1.1.6) release of EDQP. |
Glossary | Understand commonly used EDQP terms. |
Security Guide | Securely install and configure your EDQP environment. |
Oracle DataLens Server Installation Guide | Install both types of Oracle DataLens Servers, install a WebLogic Server and domain for EDQP, and upgrade an existing server. |
Oracle DataLens Server Administration Guide | Configure, monitor, and administer Oracle DataLens Servers, DSAs, data lenses, and users. |
Application Studio Reference Guide | Create, enhance, and maintain Data Service Applications (DSAs). |
AutoBuild Studio Reference Guide | Create an initial data lens based on existing product information and data lens knowledge, or update an existing data lens. |
Knowledge Studio Reference Guide | Create, maintain, and enhance data lenses to refine your data. |
Governance Studio Reference Guide | Build projects to analyze your transformed data, create reports to show the quality of your data, and identify missing attributes. |
Services for Excel Reference Guide | Install the application and use it in conjunction with DSAs to transform your spreadsheet-based data from within Microsoft Excel. |
Task Manager Reference Guide | Manage tasks created with the Task Manager or Governance Studio. |
Java Interface Guide; .Net Interface Guide; Web Service Access to Oracle DataLens Servers Interface Guide | Install and use the EDQP APIs to interface with Oracle DataLens Servers. |
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.
Copyright © 2012, 2013, Oracle and/or its affiliates. All rights reserved.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:
U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.