Oracle Java CAPS Master Index Match Engine Reference
Configuring the Match String for a Master Index Application
In a master index application, the match string processed by the Master Index Match Engine is defined by the match fields specified in mefa.xml, and the logic for how the fields are matched is defined in the match configuration file (matchConfigFile.cfg). The match engine can process any combination of fields you specify for matching using the predefined comparators or any new comparators you define. Not all fields in a record need to be processed by the Master Index Match Engine. Before you define the match string, analyze your data to determine the fields that are most likely to indicate a match or non-match between two records.
The topics that follow provide additional information about configuring the match string for person data, address data, and business names.
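The data analysis recommended above can start with a simple profile of each candidate field. The following is a minimal sketch, not part of the product: the `profile_fields` helper, the sample records, and the field names are all hypothetical. It reports how often each field is populated and how varied its values are. A field that is rarely populated, or that has very few distinct values, usually says less about whether two records match.

```python
def profile_fields(records, fields):
    """Return the fill rate and distinct-value ratio for each candidate field."""
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat missing keys and empty strings as unpopulated.
        filled = [value for value in values if value not in (None, "")]
        report[field] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "distinct_ratio": len(set(filled)) / len(filled) if filled else 0.0,
        }
    return report


# Hypothetical sample records; the field names are illustrative only.
records = [
    {"FirstName": "Ann", "LastName": "Lee", "Gender": "F", "SSN": "111223333"},
    {"FirstName": "Anne", "LastName": "Lee", "Gender": "F", "SSN": ""},
    {"FirstName": "Bob", "LastName": "Ray", "Gender": "M", "SSN": "444556666"},
    {"FirstName": "Rob", "LastName": "Ray", "Gender": "M", "SSN": "777889999"},
]

report = profile_fields(records, ["FirstName", "LastName", "Gender", "SSN"])
for field, stats in report.items():
    print(f"{field}: fill={stats['fill_rate']:.2f} distinct={stats['distinct_ratio']:.2f}")
```

In this sample, the gender field is always populated but has few distinct values, while the SSN field is highly distinctive but sometimes empty. Both observations would influence whether a field belongs in the match string and how much weight it should carry.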
Configuring the Match String for Person Data
By default, the match configuration file (matchConfigFile.cfg) includes rows specifically for matching on first name, last name, Social Security numbers, and dates (such as a date of birth). It also includes a row for matching a single character with logic specialized for a gender field. You can use any of the existing rows for matching or you can add rows for the fields you want to match. When matching on person names, determine whether you want to use the original field values, the normalized field values, or the phonetic values. The match engine can handle any of these types of fields, but the best comparator for each type might be different. Also determine how much weight you want to give each field type and configure the match configuration file accordingly.
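To reason about how much weight a field should carry, it can help to see how matching (m) and unmatching (u) probabilities translate into agreement and disagreement weights. The sketch below uses the standard log-ratio formulation from probabilistic record linkage. The function names and the probability values are assumptions chosen for illustration, not defaults taken from matchConfigFile.cfg.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1.0 - m_prob) / (1.0 - u_prob))


# Hypothetical (m, u) probabilities; tune these from your own data analysis.
field_probabilities = {
    "FirstName": (0.99, 0.001),  # rarely agrees by chance
    "Gender": (0.99, 0.5),       # agrees by chance about half the time
}

weights = {
    name: (agreement_weight(m, u), disagreement_weight(m, u))
    for name, (m, u) in field_probabilities.items()
}

for name, (agree, disagree) in weights.items():
    print(f"{name}: agreement={agree:.2f} disagreement={disagree:.2f}")
```

With these assumed values, agreement on a first name contributes far more than agreement on gender, because two unrelated records are much more likely to share a gender than a first name.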
Configuring the Match String for Address Data
By default, the match configuration file (matchConfigFile.cfg) includes rows specifically for matching on the fields that are parsed from the street address fields, such as the street number, street direction, and so on. The file also defines several generic match types you can configure for address fields. You can use any of the existing rows for matching or you can add rows for the fields you want to match. If you specify an “Address” match type for any field in the Master Index Wizard, the default fields that store the parsed data are automatically added to the match string in mefa.xml. These fields include the house number, street direction, street type, and street name. You can remove any of these fields from the match string.
When matching on address fields, determine whether you want to use the original field values, the standardized field values, or the phonetic values. The match engine can handle any of these types of fields, but the best comparator for each type might be different. Also determine how much weight you want to give each field type and configure the match configuration file accordingly.
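Removing a parsed address field from the match string, as described above, amounts to deleting one entry from the list of match fields. The following sketch illustrates the idea on a small XML fragment. The element names, field paths, and match type names in the fragment are assumptions made for this example and might not match the exact structure of mefa.xml in your project, so check the generated file before editing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, field paths, and match types are
# assumptions for illustration, not a copy of a generated mefa.xml file.
MATCH_STRING_XML = """
<match-columns>
  <match-column>
    <column-name>Person.Address.HouseNumber</column-name>
    <match-type>HouseNumber</match-type>
  </match-column>
  <match-column>
    <column-name>Person.Address.StreetDir</column-name>
    <match-type>StreetDir</match-type>
  </match-column>
  <match-column>
    <column-name>Person.Address.StreetType</column-name>
    <match-type>StreetType</match-type>
  </match-column>
  <match-column>
    <column-name>Person.Address.StreetName</column-name>
    <match-type>StreetName</match-type>
  </match-column>
</match-columns>
"""


def remove_match_column(root, match_type):
    """Remove every match column with the given match type; return the count."""
    removed = 0
    # Iterate over a copy so that removing children is safe.
    for column in list(root.findall("match-column")):
        if column.findtext("match-type") == match_type:
            root.remove(column)
            removed += 1
    return removed


root = ET.fromstring(MATCH_STRING_XML)
removed = remove_match_column(root, "StreetDir")
remaining = [column.findtext("match-type") for column in root.findall("match-column")]
print(removed, remaining)
```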
Configuring the Match String for Business Names
By default, the match configuration file (matchConfigFile.cfg) includes rows specifically for matching on the fields that are parsed from the business name fields. The file also defines several generic match types you can customize to use with business name fields. You can use any of the existing rows for matching or you can add rows for the fields you want to match. If you specify a “BusinessName” match type for any field in the Master Index Wizard, most of the parsed business name fields are automatically added to the match string in mefa.xml, including the name, organization type, association type, sector, industry, and URL. You can remove any of these fields from the match string.
When matching on business name fields, determine whether you want to use the original field values, the standardized field values, or the phonetic values. The match engine can handle any of these types of fields, but the best comparator for each type might be different. Also determine how much weight you want to give each field type and configure the match configuration file accordingly.
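The choice between original and standardized values can change how similar two business names appear. The sketch below illustrates this with a rough stand-in: the `standardize` function and its suffix table are hypothetical, and `difflib.SequenceMatcher` from the Python standard library is used only as a generic similarity measure. Neither represents the Master Index Standardization Engine or the match engine's own comparators.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffix table; a real standardization engine is far more thorough.
SUFFIXES = {
    "INCORPORATED": "INC",
    "CORPORATION": "CORP",
    "COMPANY": "CO",
    "LIMITED": "LTD",
}


def standardize(name):
    """Uppercase the name, drop punctuation, and abbreviate known suffixes."""
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", name).upper().split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def similarity(a, b):
    """Generic similarity ratio between 0.0 and 1.0 (stand-in comparator)."""
    return SequenceMatcher(None, a, b).ratio()


original_a = "Acme Widgets, Incorporated"
original_b = "ACME WIDGETS INC."

raw_score = similarity(original_a, original_b)
std_score = similarity(standardize(original_a), standardize(original_b))

print(f"original values:     {raw_score:.2f}")
print(f"standardized values: {std_score:.2f}")
```

Here the two original values differ in case, punctuation, and suffix, so a character-based comparison scores them well below a perfect match, while the standardized values are identical. This is one reason the best comparator for a field depends on which form of the value is placed in the match string.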