Oracle Java CAPS Master Index Standardization and Matching Process - Oracle Java CAPS Master Index Match Engine Reference

Oracle Java CAPS Master Index Standardization and Matching Process

In a default Oracle Java CAPS Master Index implementation, the master index application uses the Master Index Match Engine and the Master Index Standardization Engine to cleanse data in real time. The standardization engine uses configurable pattern-matching logic to identify data and reformat it into a standardized form. The match engine uses a matching algorithm with a proven methodology to process and weight records in the master index database. Because the application incorporates both standardization and matching capabilities, you can condition data before it is matched. You can also use these capabilities to review legacy data before loading it into the database. This review helps you identify data anomalies, invalid or default values, and missing fields.

In a master index application, both matching and standardization occur when two records are analyzed for the probability of a match. Before matching, certain fields are normalized, parsed, or converted into their phonetic values if necessary. The match fields are then analyzed and weighted according to the rules defined in a match configuration file. The weights for each field are combined to determine the overall matching weight for the two records. After these steps are complete, the master index application determines survivorship based on how the overall matching weight compares to the application's duplicate and match thresholds.
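
The following minimal sketch illustrates how per-field weights could be combined and compared to the two thresholds. It is not the match engine's implementation: the class, method, and outcome names are hypothetical, the field weights and threshold values are made-up sample numbers, and summing the field weights is an assumption about how they are combined.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names belong to the Master Index API.
class MatchWeightSketch {

    // Illustrative outcome labels for the threshold comparison.
    enum Outcome { ASSUMED_MATCH, POTENTIAL_DUPLICATE, NO_MATCH }

    // Assumption: the overall matching weight is the sum of the field weights.
    static double overallWeight(Map<String, Double> fieldWeights) {
        double total = 0.0;
        for (double weight : fieldWeights.values()) {
            total += weight;
        }
        return total;
    }

    // Compares the overall weight to the duplicate and match thresholds.
    // Assumes duplicateThreshold <= matchThreshold.
    static Outcome classify(double weight, double duplicateThreshold, double matchThreshold) {
        if (weight >= matchThreshold) {
            return Outcome.ASSUMED_MATCH;
        }
        if (weight >= duplicateThreshold) {
            return Outcome.POTENTIAL_DUPLICATE;
        }
        return Outcome.NO_MATCH;
    }

    public static void main(String[] args) {
        // Sample weights: agreement on names, disagreement on date of birth.
        Map<String, Double> fieldWeights = new LinkedHashMap<>();
        fieldWeights.put("FirstName", 8.0);
        fieldWeights.put("LastName", 10.0);
        fieldWeights.put("DOB", -3.0);

        double total = overallWeight(fieldWeights);
        System.out.println(total + " -> " + classify(total, 7.25, 29.0));
    }
}
```

In this sketch a disagreeing field contributes a negative weight, so one mismatch can pull a pair of records below the match threshold while leaving it above the duplicate threshold.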

In a master index application, the standardization and matching process includes the following steps:

1. The master index application receives an incoming record.
2. The Master Index Standardization Engine standardizes the fields specified for parsing, normalization, and phonetic encoding. These fields are defined in mefa.xml and the rules for standardization are defined in the standardization engine configuration files.
3. The master index application queries the database for a candidate selection pool (records that are possible matches) using the blocking query specified in master.xml. If the blocking query uses standardized or phonetic fields, the criteria values are obtained from the database.
4. For each possible match, the master index application creates a match string (based on the match columns in mefa.xml) and sends the string to the Master Index Match Engine.
5. The Master Index Match Engine checks the incoming record against each possible match, producing a matching weight for each. Matching is performed using the weighting rules defined in the match configuration file.
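
The steps above can be sketched as a small pipeline. This is an illustration of the control flow only, under several assumptions: every class and method name is hypothetical, standardization is reduced to trimming and uppercasing, the blocking query is simulated by matching the first letter of a `LastName` field against an in-memory list, the match string uses a `|` delimiter chosen for this example, and the weighting rule is a simple +1 or -1 per field. The real behavior is governed by mefa.xml, master.xml, and the match configuration file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the process flow; not the Master Index API.
class MatchPipelineSketch {

    // Step 2 stand-in: trim and uppercase every field.
    static Map<String, String> standardize(Map<String, String> record) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : record.entrySet()) {
            result.put(entry.getKey(), entry.getValue().trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    // Step 3 stand-in: a "blocking query" that keeps stored records whose
    // LastName starts with the same letter as the incoming record's.
    static List<Map<String, String>> selectCandidates(
            Map<String, String> incoming, List<Map<String, String>> database) {
        List<Map<String, String>> pool = new ArrayList<>();
        String key = incoming.getOrDefault("LastName", "");
        if (key.isEmpty()) {
            return pool;
        }
        for (Map<String, String> stored : database) {
            String last = stored.getOrDefault("LastName", "");
            if (!last.isEmpty() && last.charAt(0) == key.charAt(0)) {
                pool.add(stored);
            }
        }
        return pool;
    }

    // Step 4 stand-in: join the match columns into one delimited string.
    static String matchString(Map<String, String> record, List<String> matchColumns) {
        List<String> values = new ArrayList<>();
        for (String column : matchColumns) {
            values.add(record.getOrDefault(column, ""));
        }
        return String.join("|", values);
    }

    // Step 5 stand-in: +1 for each field that agrees exactly, -1 otherwise.
    static double weigh(String incoming, String candidate) {
        String[] a = incoming.split("\\|", -1);
        String[] b = candidate.split("\\|", -1);
        double weight = 0.0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            weight += a[i].equals(b[i]) ? 1.0 : -1.0;
        }
        return weight;
    }

    // Steps 1 through 5: returns one weight per candidate in the pool.
    static List<Double> process(Map<String, String> incoming,
            List<Map<String, String>> database, List<String> matchColumns) {
        Map<String, String> standardized = standardize(incoming);
        String incomingString = matchString(standardized, matchColumns);
        List<Double> weights = new ArrayList<>();
        for (Map<String, String> candidate : selectCandidates(standardized, database)) {
            weights.add(weigh(incomingString, matchString(candidate, matchColumns)));
        }
        return weights;
    }

    // Helper that builds a sample record.
    static Map<String, String> person(String firstName, String lastName) {
        Map<String, String> record = new HashMap<>();
        record.put("FirstName", firstName);
        record.put("LastName", lastName);
        return record;
    }

    public static void main(String[] args) {
        // Stored records are assumed to be standardized already.
        List<Map<String, String>> database = new ArrayList<>();
        database.add(person("JOHN", "SMITH"));
        database.add(person("JANE", "SMYTHE"));
        database.add(person("JOHN", "DOE"));

        List<String> matchColumns = List.of("FirstName", "LastName");
        System.out.println(process(person(" john", "Smith "), database, matchColumns));
    }
}
```

Here the blocking step limits comparison to the two stored records whose last name begins with "S", so the third record is never sent for weighting. That is the purpose of the candidate selection pool: the match engine weighs only plausible matches, not the whole database.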

Copyright © 2009, 2011, Oracle and/or its affiliates. All rights reserved.
