Oracle Java CAPS Master Data Management Suite Primer
Java CAPS Master Index Overview
Java CAPS Master Index provides a flexible framework that allows you to create matching and indexing applications, known as enterprise-wide master indexes. A master index uniquely identifies and cross-references the business objects stored in your system databases using data cleansing and matching technology to create a single view of all like objects. Business objects can be any type of entity about which you store information, such as customers, members, vendors, businesses, or inventory items.
The following topics provide information about Java CAPS Master Index and its components.
With Java CAPS Master Index, you can create and configure an enterprise-wide master index for any type of data. The Master Index Wizard guides you through the initial setup steps and special editors are provided so you can further customize the configuration, processing rules, and database structure of the master index. The wizard automatically generates the components you need to implement a master index.
Java CAPS Master Index is highly configurable, allowing you to define the data structure of the information to be indexed and to define the logic that determines how data is updated, standardized, weighted, and matched in the master index database. The master indexes created by Java CAPS Master Index provide accurate identification of objects throughout your organization and cross-reference an object’s local identifiers using an enterprise-wide unique identification number (EUID) assigned by the master index. The master index also ensures accuracy by identifying potential duplicate records and providing the ability to merge or resolve duplicate records. All information is centralized in one shared index. Maintaining a centralized database for multiple systems enables the indexing application to integrate data throughout the enterprise while allowing local systems to continue operating independently.
Java CAPS Master Index provides your business with a powerful assortment of design-time features that you can use to create and configure master index applications. The runtime features allow you to manage the master index system and to perform continuous data cleansing in real time.
The Java CAPS Master Index tools provide your business with flexibility in designing and creating indexing applications. This flexibility allows you to perform the following tasks.
Rapidly develop a master index for any type of business entity using a wizard to create the framework and using a graphical editor to configure the attributes of the index.
Automatically create the primary components of the master index.
Customize the strategies that determine the field values to populate into the single best record (SBR).
Configure the matching algorithm and logic by specifying the fields to standardize, the fields to use for matching, and the matching logic to use for each field.
Incorporate a Java API that is customized to the object structure you define. You can call the operations in this API in the Collaboration Definitions or Business Processes of different Projects.
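As a rough illustration of what a survivorship strategy for the single best record (SBR) does, the following sketch builds an SBR from several local system records by keeping, for each field, the non-empty value from the most recently updated record. The class, field, and method names are hypothetical and are not part of the Java CAPS Master Index API; in the product, survivorship strategies are defined through configuration rather than written this way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Java CAPS Master Index API.
final class SurvivorSketch {

    /** One local system record: source system, last-update time, field values. */
    static final class SystemRecord {
        final String systemCode;
        final long updatedAt;
        final Map<String, String> fields;

        SystemRecord(String systemCode, long updatedAt, Map<String, String> fields) {
            this.systemCode = systemCode;
            this.updatedAt = updatedAt;
            this.fields = fields;
        }
    }

    /**
     * "Most recently modified" strategy: for each field, keep the non-empty
     * value from the record with the latest update time.
     */
    static Map<String, String> buildSbr(List<SystemRecord> records) {
        // Process records oldest first so that newer values overwrite older ones.
        List<SystemRecord> oldestFirst = new ArrayList<>(records);
        oldestFirst.sort(Comparator.comparingLong((SystemRecord r) -> r.updatedAt));

        Map<String, String> sbr = new LinkedHashMap<>();
        for (SystemRecord record : oldestFirst) {
            for (Map.Entry<String, String> e : record.fields.entrySet()) {
                String value = e.getValue();
                if (value != null && !value.trim().isEmpty()) {
                    sbr.put(e.getKey(), value);
                }
            }
        }
        return sbr;
    }
}
```

Other strategies, such as always preferring one source system for a given field, would replace only the ordering step in this sketch.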
The components of the master index application are designed to uniquely identify, match, and maintain information throughout a business enterprise. These components are highly configurable, allowing you to create a custom master index suited to your specific data processing needs. Java CAPS Master Index applications provide the following features.
Centralized Information - The master index maintains a centralized database, enabling the integration of data records throughout the enterprise while allowing local systems to continue operating independently. The index stores copies of local system records and of SBRs, which represent the most accurate and complete data for each object.
Configurability - Before deploying the master index, you define the components and processing capabilities of the system to suit your organization’s processing requirements. You can configure the object structure, matching and standardization rules, survivorship rules, data filters, queries, Master Index Data Manager (MIDM) appearance, and field validation rules.
Cross-referencing - The master index is a global cross-indexing application that automates record matching across disparate source systems, simplifying the process of sharing data between systems. The master index uses the local identifiers assigned by your existing systems as a reference for cross-indexing, allowing you to maintain your current systems and practices.
Data Cleansing - The master index uses configurable matching algorithm logic to uniquely identify object records and to identify duplicate and potential duplicate records. The index provides the functionality to easily merge or resolve duplicates. The index can be configured to automatically merge records that are found to be duplicates of one another.
Data Updates - The master index provides the ability to add, update, deactivate, and delete data in the database tables through messages received from external systems. Records received from external systems are checked for potential duplicates during processing. Updates can be performed in real time or as batch processes.
Identification - The master index employs configurable probabilistic matching technology, using a matching algorithm to formulate a statistical measure of how closely records match. Because the algorithm runs in real time and gives every system a common method of locating records, the index identifies objects consistently across the enterprise.
Matching Algorithm - The matching algorithm and logic used by the master index application is highly configurable. The Master Index Match Engine and Master Index Standardization Engine are used for standardization and matching, but a master index can also be implemented with other match engines that provide a compatible API. The matching and standardization logic is customizable with a framework that allows you to plug in customized logic.
Unique Identifier - Records from various systems are cross-referenced using an enterprise-wide unique identifier (EUID) that the master index assigns to each object record. The index uses the EUID to cross-reference the local IDs assigned to each object by the various computer systems throughout the enterprise.
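The cross-referencing role of the EUID can be pictured as a two-way lookup between local identifiers and EUIDs. The sketch below is a minimal in-memory model of that idea, assuming a local identifier is the pair of a system code and a local ID. All names are hypothetical, the EUID format is invented for the example, and a real master index keeps this information in its database rather than in maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Java CAPS Master Index API.
final class EuidCrossReference {
    // Maps "systemCode|localId" to the EUID of the object it belongs to.
    private final Map<String, String> euidByLocalId = new HashMap<>();
    // Maps each EUID to every local identifier linked to it.
    private final Map<String, List<String>> localIdsByEuid = new HashMap<>();
    // Invented starting value; the real EUID format is configurable.
    private long nextEuid = 1000000000L;

    private static String key(String systemCode, String localId) {
        return systemCode + "|" + localId;
    }

    /** Registers a local ID under a new EUID and returns that EUID. */
    String assignNew(String systemCode, String localId) {
        String euid = Long.toString(nextEuid++);
        link(euid, systemCode, localId);
        return euid;
    }

    /** Links a local ID to an existing EUID (for example, after a match). */
    void link(String euid, String systemCode, String localId) {
        String k = key(systemCode, localId);
        euidByLocalId.put(k, euid);
        localIdsByEuid.computeIfAbsent(euid, e -> new ArrayList<String>()).add(k);
    }

    /** Returns the EUID for a local ID, or null if it is not indexed. */
    String lookupEuid(String systemCode, String localId) {
        return euidByLocalId.get(key(systemCode, localId));
    }

    /** Returns all local IDs cross-referenced by the given EUID. */
    List<String> localIds(String euid) {
        return localIdsByEuid.getOrDefault(euid, new ArrayList<String>());
    }
}
```

With this model, a record that arrives from a second system and matches an existing object is linked to the existing EUID instead of receiving a new one, so each local system keeps its own identifier.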
The Java CAPS Master Index design-time components allow you to define the data structure of the business objects to be stored and cross-referenced and to define the logic that determines how data is processed in the master index application. As shown in the following diagram, the design-time components include a wizard, editors, configuration files, and database scripts. When the master index project is built, a master index application is created that can be deployed to the application server.
Figure 2 Java CAPS Master Index Design-Time Components
Building and deploying the master index application creates the runtime components of Java CAPS Master Index, including components that process and persist data, master index services, and the Master Index Data Manager (a web-based GUI to monitor and maintain master index data). Runtime components also include the master index database. The following diagram illustrates the runtime components of a master index application.
Figure 3 Java CAPS Master Index Runtime Components
The development phase consists of standard tasks for creating an indexing application and advanced tasks for further customizing the applications you create.
The process of creating a master index begins with a thorough analysis of the structure and characteristics of the data you plan to store in the master index database and to share among external systems. The results of this analysis define the structure of the information stored in the master index database and provide information to help you customize the processing and matching logic for the master index.
From this analysis you can design the object structure, matching and standardization logic, any required custom processing, and the connectivity components for the indexing system. Once you have created the master index framework, you can generate custom tools to perform a more in-depth analysis and cleansing of the actual data to be stored in the master index database. For more information, see Data Cleanser and Data Profiler.
The following steps outline the basic procedure for developing a master index using Java CAPS Master Index.
Create a Master Index Application project in NetBeans.
Using the Master Index Wizard, define the data and message structures, the operating environment, and external systems sharing data with the master index application.
Using the Configuration Editor, customize the application.
Generate the application.
Customize the database scripts, and then create the database.
Define the database connection pools and JDBC resources.
Define security.
Build and deploy the fully configured master index application.
You can perform additional tasks during the development phase to customize your indexing application further.
Data Analysis and Cleansing - Generate tools from the master index project to help you analyze and cleanse the initial set of data to be loaded into the master index database. The analysis and cleansing steps are iterative, and each iteration will help you to fine-tune the standardization, matching, and filter logic.
Bulk Loading - Generate the Initial Bulk Match and Load tools to rapidly match, deduplicate, and load the initial data set into the master index database. For more information, see Initial Bulk Match and Load Tool.
Custom Plug-ins - Create Java classes to perform custom processing during the matching process and once the matching process is complete (such as performing additional operations before finalizing a transaction or validating certain field values).
Database Distribution - Before running the predefined scripts against the database, create additional tablespaces to distribute the tables of the master index.
Match Engine Configuration - Customize how weighting is performed by modifying the match engine configuration files included in the master index project.
External System Integration - When you generate the master index project, a set of operations is created that is specifically tailored to the object structure you defined. Use these operations to integrate the master index application using BPEL processes, web services, or Java clients.
The Master Index Wizard takes you through each step of the master index setup process and, based on the information you specify, creates the XML files that define the configuration of the application.
Figure 4 Field Properties on the Master Index Wizard
Once you create the Project files using the wizard, you can further customize the configuration of the master index application using the Configuration Editor.
Figure 5 Normalization Page of the Master Index Configuration Editor
With the Configuration Editor, you can customize the following:
Object structure
Queries
Standardization rules, including field parsing, normalization, and phonetic encoding
Transaction mode (XA or non-XA)
Matching rules and thresholds
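To make the standardization rules in the list above more concrete, the following sketch shows two of the steps in miniature: normalization of a first name through a nickname table, and phonetic encoding of the result. Soundex is used only because it is a widely known phonetic encoder; this is not a statement about which encoders the Master Index Standardization Engine applies. The class name and the nickname table are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Master Index Standardization Engine.
final class StandardizeSketch {
    // Tiny sample nickname table; a real one would be loaded from configuration.
    private static final Map<String, String> NICKNAMES = new HashMap<>();
    static {
        NICKNAMES.put("BOB", "ROBERT");
        NICKNAMES.put("LIZ", "ELIZABETH");
    }

    /** Normalization: trim, upper-case, and replace known nicknames. */
    static String normalize(String rawName) {
        String cleaned = rawName.trim().toUpperCase(Locale.ROOT);
        return NICKNAMES.getOrDefault(cleaned, cleaned);
    }

    /** Phonetic encoding: American Soundex (a letter followed by three digits). */
    static String soundex(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            if (d != '0' && d != last) {
                out.append(d);
            }
            if (c != 'H' && c != 'W') {
                last = d; // H and W do not separate letters with the same code
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    private static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0'; // vowels, H, W, and Y carry no code
        }
    }
}
```

Matching on the normalized and phonetically encoded values, rather than on the raw input, lets records such as "Bob" and "Robert" be compared on equal terms.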
Once all of the analysis, design, and development tasks are complete and the system is running, you can perform any of these maintenance tasks.
Transform and route data between external systems and the master index application (where the matching process occurs)
Monitor and manage activities and alerts in the application server logs
Monitor and maintain the indexed records in the master index database using the MIDM
The Manager Service provides a session bean to all components of the master index, such as the MIDM, Query Builder, and Update Manager. During the runtime phase, the Manager Service performs the following tasks:
Manages connectivity to the master index database
Specifies the query to use for the match process and the system parameters that control the match process
Coordinates the activities of the various components of the master index, including queries, updates, object persistence, and system parameters
The components of a master index connect to the database to provide the following features:
Persistence - The Object Persistence service writes instance data to database tables so that the data persists in the system.
Recoverability - The master index database allows you to recover data from the last state of consistency.
Transaction History - The database stores a description of the changes that occur for each transaction. This allows you to view a complete history of changes to each record in the database.
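The transaction history feature can be thought of as an append-only log of change descriptions that is filtered by record when a history is requested. The sketch below models that idea in memory. The class and field names are hypothetical and do not reflect the actual master index database tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the master index database schema.
final class TransactionHistorySketch {

    /** One recorded change to a single field of a record. */
    static final class Change {
        final long transactionId;
        final String euid;
        final String field;
        final String oldValue;
        final String newValue;

        Change(long transactionId, String euid, String field,
               String oldValue, String newValue) {
            this.transactionId = transactionId;
            this.euid = euid;
            this.field = field;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final List<Change> log = new ArrayList<>();
    private long nextTransactionId = 1;

    /** Appends a change description and returns its transaction number. */
    long record(String euid, String field, String oldValue, String newValue) {
        long id = nextTransactionId++;
        log.add(new Change(id, euid, field, oldValue, newValue));
        return id;
    }

    /** Returns every change made to the given record, oldest first. */
    List<Change> historyFor(String euid) {
        List<Change> result = new ArrayList<>();
        for (Change c : log) {
            if (c.euid.equals(euid)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

Because entries are only ever appended, the earlier state of a record can be reconstructed by reading its changes in order.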
The Master Index Data Manager (MIDM) is a web-based interface that allows you to monitor and maintain the data in your master index database. The appearance and search capabilities of the MIDM are highly configurable to allow you to view and search for information in the way that best suits your business needs. The following figure shows a sample page on the MIDM.
Figure 6 Master Index Data Manager
The MIDM allows you to perform these primary functions to monitor and maintain the data in a master index database.
Transaction History - You can view a complete history of each object for both the local system records and the single best record.
Data Maintenance - You can add new records; view, update, deactivate, or reactivate existing records; and compare records for similarities and differences.
Search - You can perform searches against the database for a specific object or a set of objects. For certain searches, the results are assigned a matching weight indicating the probability of a match.
Potential Duplicate Detection and Handling - Using matching algorithm logic, the master index identifies potential duplicate records and provides the functionality to correct the duplication.
Merge and Unmerge - You can merge records you find to be actual duplicates of one another at either the enterprise-wide unique identifier (EUID) level or the system record level. Merges made in error can easily be unmerged.
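The relationship between matching weights, potential duplicates, and automatic merges can be sketched as follows. The example assumes the standard Fellegi-Sunter formulation of probabilistic matching, in which each field contributes log2(m/u) when two records agree and log2((1-m)/(1-u)) when they disagree, where m and u are the probabilities that the field agrees for true matches and for non-matches. It also assumes two thresholds, here called the duplicate threshold and the match threshold. The names and the threshold semantics are illustrative assumptions, not the Master Index Match Engine API.

```java
// Hypothetical illustration only; not the Master Index Match Engine API.
final class MatchWeightSketch {

    enum Outcome { MATCH, POTENTIAL_DUPLICATE, NO_MATCH }

    /** Weight contributed by a field on which the two records agree. */
    static double agreementWeight(double m, double u) {
        return Math.log(m / u) / Math.log(2);
    }

    /** Weight contributed by a field on which the two records disagree. */
    static double disagreementWeight(double m, double u) {
        return Math.log((1 - m) / (1 - u)) / Math.log(2);
    }

    /** Sums the per-field weights into one composite matching weight. */
    static double compositeWeight(boolean[] agrees, double[] m, double[] u) {
        double total = 0.0;
        for (int i = 0; i < agrees.length; i++) {
            total += agrees[i]
                    ? agreementWeight(m[i], u[i])
                    : disagreementWeight(m[i], u[i]);
        }
        return total;
    }

    /**
     * Classifies a composite weight: at or above the match threshold the
     * records are treated as the same object; between the two thresholds
     * they are flagged for review as potential duplicates.
     */
    static Outcome classify(double weight, double duplicateThreshold,
                            double matchThreshold) {
        if (weight >= matchThreshold) {
            return Outcome.MATCH;
        }
        if (weight >= duplicateThreshold) {
            return Outcome.POTENTIAL_DUPLICATE;
        }
        return Outcome.NO_MATCH;
    }
}
```

Raising the match threshold sends more record pairs to manual review, while lowering it allows more automatic merges at the risk of merging records in error.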