Oracle Java CAPS Master Index Match Engine Reference
The match configuration file, matchConfigFile.cfg, contains the matching logic for each field on which matching is performed. By default, this file defines the matching logic for the three primary data types (person names, business names, and addresses), and can also handle generic data types, such as dates, numbers, social security numbers, and characters.
The Master Index Match Engine provides several comparison functions that you can call in this file to fine-tune the match process. Each comparison function contains the logic to compare a particular type of data in a specific way and arrive at a match weight for the field. You can use these functions with either matching and unmatching probabilities or agreement and disagreement weight ranges for each field. The file also defines how to handle missing fields.
The topics in this section describe the format of the configuration files and provide an overview of the predefined comparison functions, so that you can modify the files directly. You can also modify the match configuration file using the Master Index Configuration Editor, which provides a graphical way to configure matching rules.
The match configuration file is divided into two sections. The first section consists of one line that indicates the matching probability type. The second section consists of the matching rules to use for each match field. In a master index application, this file can be modified from the Matching tab of the Master Index Configuration Editor. For more information, see Configuring the Comparison Functions for a Master Index Application in Oracle Java CAPS Master Index Configuration Guide.
Following is an excerpt from the default match configuration file. This excerpt illustrates the components that are described in the following sections.
ProbabilityType 1
FirstName    15  0  uf  0.99  0.001  15   -5
LastName     15  0  ul  0.99  0.001  15   -5
String       25  0  ua  0.99  0.001  10   -5
DateDays     20  0  dD  0.99  0.001  10  -10  y 15 30
DateMonths   20  0  dM  0.99  0.001  10  -10  n
DateHours    20  0  dH  0.99  0.001  10  -10  y 30 60
DateMinutes  20  0  dm  0.99  0.001  10  -10  y 300 600
DateSeconds  20  0  ds  0.99  0.001  10  -10  y 75 60
Integer      15  0  nI  0.99  0.001  10  -10  n
Real         15  0  nR  0.99  0.001  10  -10  n
Char          1  0  c   0.99  0.001   5   -5
pro          15  0  p   0.99  0.001  10  -10  20 5 5
The first line of the match configuration file defines the probability type to use for matching. Specify “0” (zero) to use m-probabilities and u-probabilities to determine a field’s match weight; specify “1” (one) to use agreement and disagreement weight ranges. If the probability type is set to use agreement and disagreement weight ranges, the m-prob and u-prob columns in the matching rules section are ignored. Likewise, if the probability type is set to use m-probabilities and u-probabilities, the agreement-weight and disagreement-weight columns in the matching rules section are ignored. The default is to use agreement and disagreement weight ranges because they are more intuitive.
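The effect of the probability type can be illustrated with a minimal sketch. The function name `field_weights` is hypothetical, and the formula used for probability type 0 is the textbook Fellegi-Sunter log-ratio, which is an assumption here; this section does not state the exact formula the Master Index Match Engine applies.

```python
import math


def field_weights(probability_type, m_prob, u_prob, agreement, disagreement):
    """Return the (agreement, disagreement) weights that take effect for a row.

    Hypothetical helper for illustration only; not part of the match engine.
    """
    if probability_type == 1:
        # Weight ranges are used as written; m-prob and u-prob are ignored.
        return float(agreement), float(disagreement)
    if probability_type == 0:
        # Assumption: textbook Fellegi-Sunter log2 weights, which may differ
        # from the engine's own formula. The weight columns are ignored.
        return (
            math.log2(m_prob / u_prob),
            math.log2((1 - m_prob) / (1 - u_prob)),
        )
    raise ValueError("ProbabilityType must be 0 or 1")


# Values taken from the FirstName row of the sample file.
direct = field_weights(1, 0.99, 0.001, 15, -5)
derived = field_weights(0, 0.99, 0.001, 15, -5)
print(direct)
print(derived)
```

With probability type 1 the row's weights pass through unchanged, whereas with probability type 0 they are derived from the two probabilities and the configured weights play no part.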
For more information about probabilities and weights, see Probabilities and Direct Weights.
The section after the first line of the match configuration file contains match field rows, with each row defining how a certain data type or field will be matched. These are the rules you specify in the match string you define for a master index application. The syntax for this section is:
match-field size null-field function m-prob u-prob agreement disagreement params data-sources
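As a rough illustration of this row syntax, the following sketch splits a match field row into its columns. The class and function names are hypothetical. It assumes that columns are separated by whitespace, that the first eight columns are always present, and that anything after them (parameters and, where used, data sources) can be kept as raw tokens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchFieldRow:
    """One row of the matching rules section (hypothetical container)."""
    match_field: str
    size: int
    null_field: str      # kept as text; its allowed values are not assumed here
    function: str
    m_prob: float
    u_prob: float
    agreement: float
    disagreement: float
    extras: List[str] = field(default_factory=list)  # params, data sources


def parse_row(line):
    """Split a whitespace-delimited match field row into typed columns."""
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError("expected at least 8 columns, got %d" % len(tokens))
    name, size, null_field, function, m, u, agree, disagree = tokens[:8]
    return MatchFieldRow(
        name, int(size), null_field, function,
        float(m), float(u), float(agree), float(disagree),
        tokens[8:],
    )


# Row copied from the sample file.
row = parse_row("DateDays 20 0 dD 0.99 0.001 10 -10 y 15 30")
print(row.function, row.extras)
```

The trailing tokens are deliberately left uninterpreted, because their meaning depends on the comparison function named in the row.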
The following table describes each element in a match field row.
Table 1 Match Configuration File Columns
Match field comparison functions, or comparators, compare the values of a field in two records to determine whether the fields match. The fields are then assigned a matching weight based on the results of the comparison function. You can use several different types of comparison functions in the match configuration file to define how the Master Index Match Engine should match the fields in the match string. The Master Index Match Engine provides several options to use with each function. You can also define custom comparison functions. For more information, see Creating Custom Comparators for the Master Index Match Engine.
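The general idea of a comparator, comparing two values and turning the result into a weight, can be sketched as follows. Both functions are hypothetical and are not the engine's implementation. The position-by-position similarity and the linear mapping onto the range between the disagreement and agreement weights are assumptions made purely for illustration.

```python
def exact_similarity(a, b):
    """Fraction of positions holding the same character.

    Measured against the longer value, so extra characters count as mismatches.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        # Two empty values are treated as identical in this sketch only;
        # the engine's missing-field handling is configured separately.
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def to_weight(similarity, agreement, disagreement):
    """Map a similarity in [0, 1] linearly onto [disagreement, agreement].

    Assumption: linear interpolation; the real comparators may use other curves.
    """
    return disagreement + similarity * (agreement - disagreement)


# Weights taken from the Char row of the sample file (5 and -5).
score = exact_similarity("ABCD", "ABCX")
weight = to_weight(score, 5, -5)
print(score, weight)
```

Full agreement yields the agreement weight, complete disagreement yields the disagreement weight, and partial matches fall in between.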
The following table summarizes each comparison function. A complete reference of the comparison functions and their parameters is included in Master Index Match Engine Comparison Functions and Options.
Note - The names of these comparison functions are configurable. The following table lists their default names.
Table 2 Comparison Function Summary
The comparator definition list defines each comparator that is included in a master index application. If a comparator is not included in this list, it cannot be used in the application. If you define a comparator in this list that is not provided with the Master Index Match Engine, you need to define the logic of the new comparator in Java classes. For more information, see Creating Custom Comparators for the Master Index Match Engine.
Below is an excerpt from the default comparators list file that defines two numeric comparators, Real Number Comparator and Integer Comparator. Both comparators take two parameters, and are dependent on a second comparator class named CondensedStringComparator.
<comparator description="Numerics comparator">
    <className>NumericsComparator</className>
    <codes>
        <code description="Real Number Comparator" name="n[R, ]"/>
        <code description="Integer Comparator" name="nI" />
    </codes>
    <params>
        <param description="distance/string comparison option" name="switch" type="java.lang.String"/>
        <param description="Spectrum of comparison" name="range" type="java.lang.Integer|java.lang.Double"/>
    </params>
    <data-sources/>
    <dependency-classes>
        <dependency-class matchfield="CSC" name="com.sun.mdm.matcher.comparators.base.CondensedStringComparator"/>
    </dependency-classes>
    <curve-adjust status="false"/>
</comparator>
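To show how such a definition might be inspected outside the product, the following sketch reads the excerpt with Python's standard `xml.etree.ElementTree` module. It treats the single `comparator` element as a standalone fragment; the root element of the full comparators list file is not shown in the excerpt and is not assumed here. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The excerpt above, parsed as a standalone fragment.
COMPARATOR_XML = """
<comparator description="Numerics comparator">
    <className>NumericsComparator</className>
    <codes>
        <code description="Real Number Comparator" name="n[R, ]"/>
        <code description="Integer Comparator" name="nI" />
    </codes>
    <params>
        <param description="distance/string comparison option" name="switch" type="java.lang.String"/>
        <param description="Spectrum of comparison" name="range" type="java.lang.Integer|java.lang.Double"/>
    </params>
    <data-sources/>
    <dependency-classes>
        <dependency-class matchfield="CSC" name="com.sun.mdm.matcher.comparators.base.CondensedStringComparator"/>
    </dependency-classes>
    <curve-adjust status="false"/>
</comparator>
"""

comparator = ET.fromstring(COMPARATOR_XML)

class_name = comparator.findtext("className")
# Map each comparator code to its description.
codes = {c.get("name"): c.get("description") for c in comparator.iter("code")}
# Each parameter lists one or more accepted Java types, separated by "|".
params = [(p.get("name"), p.get("type").split("|")) for p in comparator.iter("param")]
dependencies = [d.get("name") for d in comparator.iter("dependency-class")]
curve_adjust = comparator.find("curve-adjust").get("status") == "true"

print(class_name, sorted(codes), params, dependencies, curve_adjust)
```

A listing like this makes it easy to check that a code used in the match configuration file, such as `nI`, is actually declared in the comparator definition list.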
The comparators are defined in XML format. The following table lists and describes each element in the XML file.
Table 3 Comparator Definition List Elements