Oracle Java CAPS Master Index Match Engine Reference


How the Master Index Match Engine Works

The Master Index Match Engine compares records containing similar data types by calculating how closely certain fields in the records match. The resulting comparison weight is a positive or negative numeric value that represents the degree to which the two sets of data are similar. The match engine relies on probabilistic algorithms to compare data of a given type, using a comparison function specific to that type of data. The comparison functions for each matching field are defined in a match configuration file that you can customize for the type of data you are indexing. You can also define custom comparison functions to plug into the match engine. The formula used to determine the matching weight is based either on matching and unmatching probabilities or on agreement and disagreement weight ranges (described in Probabilities and Direct Weights).

The following topics provide additional information about how the Master Index Match Engine works:

Master Index Match Engine Structure

The Master Index Match Engine was designed to be flexible and generic, allowing you to customize existing matching rules and to define additional rules using Java. The match engine framework allows you to create custom matching comparison functions, or comparators, and plug them into the match engine to enable matching against any type of data. The Master Index Match Engine framework includes two main modules. The real-time module stores the predefined and user-defined Java classes that define the matching comparator logic. The design-time module stores the configuration and validation classes for the comparators.

The Master Index Match Engine provides a wide variety of customizable comparators for you to choose from. You can also create comparators in the real-time module, and create new validation and configuration rules in the design-time module. The structure of the design-time module supports validations, weighting curves, and class dependencies. There is also an option that allows you to load information from a data file and use that information to calculate a matching weight.
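The plug-in idea described above can be sketched in a few lines of Java. This is only an illustration of the pattern, not the match engine's real API: the `SketchComparator` interface, the `ExactSketchComparator` class, the `ComparatorRegistrySketch` class, and the `"exact"` code are all hypothetical names invented for this example, and the engine's actual comparator classes define their own methods and signatures.

```java
/** Hypothetical stand-in for a pluggable comparator; not the engine's real interface. */
interface SketchComparator {
    /** Returns a similarity score between 0.0 (no match) and 1.0 (full match). */
    double compare(String valueA, String valueB);
}

/** Example comparator: 1.0 when the values are equal ignoring case, otherwise 0.0. */
final class ExactSketchComparator implements SketchComparator {
    @Override
    public double compare(String valueA, String valueB) {
        return valueA != null && valueA.equalsIgnoreCase(valueB) ? 1.0 : 0.0;
    }
}

/** Hypothetical registry that looks up comparators by a short code. */
final class ComparatorRegistrySketch {
    private final java.util.Map<String, SketchComparator> byCode = new java.util.HashMap<>();

    /** Makes a comparator available under the given code. */
    void register(String code, SketchComparator comparator) {
        byCode.put(code, comparator);
    }

    /** Runs the comparator registered under the code, or fails if none exists. */
    double compare(String code, String valueA, String valueB) {
        SketchComparator comparator = byCode.get(code);
        if (comparator == null) {
            throw new IllegalArgumentException("No comparator registered for code: " + code);
        }
        return comparator.compare(valueA, valueB);
    }

    public static void main(String[] args) {
        ComparatorRegistrySketch registry = new ComparatorRegistrySketch();
        registry.register("exact", new ExactSketchComparator());
        System.out.println(registry.compare("exact", "Smith", "SMITH"));
    }
}
```

Separating the comparison logic (the comparator classes) from the lookup and configuration (the registry) loosely mirrors the split between the real-time and design-time modules, although the real modules carry far more responsibility than this sketch shows.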

Master Index Match Engine Configuration Files

The Master Index Match Engine compares two records and returns a match weight indicating the likelihood of a match between the two records based on information provided in configuration files. In a master index application, the match engine is configured by two files in the Match Engine node of the master index project: the matching configuration file (matchConfigFile.cfg) and the comparators list (comparatorsList.xml). The matching configuration file defines the configuration and parameters for the matching comparator functions, and the comparators list defines each comparator available to the match engine.

Matching criteria and logic are defined in the match configuration file in the master index project (matchConfigFile.cfg). The data fields that are sent to the Master Index Match Engine for matching, known as the match string, are defined in the MatchingConfig section of mefa.xml in the master index project. The match engine configuration files define which matching rules to use to process each match field. The match engine provides a comprehensive set of comparator functions, and you can create custom comparators if needed.

Master Index Match Engine Matching Weight Formulation

The Master Index Match Engine determines the matching weight between two records by comparing the match string fields of the two records using the rules defined in the match configuration file, taking into account the matching logic specified for each field. The Master Index Match Engine can use either matching (m) and unmatching (u) conditional probabilities or agreement and disagreement weight ranges to fine-tune the match process. It uses the underlying algorithm to arrive at a match weight for each match string field. The weight generated for each field in the match string indicates how closely that field matches between the two records. The weights assigned to the fields are then summed to produce a total, composite matching weight between the two records. Agreement and disagreement weight ranges or m-probabilities and u-probabilities are defined in the match configuration file.
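As a rough illustration of how m-probabilities and u-probabilities can turn into field weights that are then summed, the sketch below uses the classic Fellegi-Sunter log-ratio formulation from probabilistic record linkage. This is an assumption made for illustration: the text above does not state the engine's exact formula, and the class name, method names, and sample probabilities are hypothetical.

```java
/** Illustrative Fellegi-Sunter style weights; not the match engine's actual code. */
final class ProbabilityWeightSketch {

    /** Weight contributed by a field that agrees: log2(m / u). */
    static double agreementWeight(double m, double u) {
        requireProbability(m);
        requireProbability(u);
        return log2(m / u);
    }

    /** Weight contributed by a field that disagrees: log2((1 - m) / (1 - u)). */
    static double disagreementWeight(double m, double u) {
        requireProbability(m);
        requireProbability(u);
        return log2((1.0 - m) / (1.0 - u));
    }

    /** Composite weight for a record pair: the sum of the per-field weights. */
    static double compositeWeight(double[] fieldWeights) {
        double total = 0.0;
        for (double weight : fieldWeights) {
            total += weight;
        }
        return total;
    }

    /** Rejects values outside the open interval (0, 1). */
    private static void requireProbability(double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("Probability must be strictly between 0 and 1: " + p);
        }
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // Hypothetical probabilities for two fields of a record pair.
        double firstNameAgrees = agreementWeight(0.9, 0.1);
        double birthDateDisagrees = disagreementWeight(0.95, 0.05);
        System.out.println(compositeWeight(new double[] {firstNameAgrees, birthDateDisagrees}));
    }
}
```

With m greater than u, an agreeing field contributes a positive weight and a disagreeing field a negative one, which is consistent with the composite weight being either positive or negative.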

m-probabilities and u-probabilities are expressed as double values between zero and one (excluding zero and one) and can have up to 16 decimal places. Agreement and disagreement weights are expressed as double values and can have up to 16 decimal places. When using agreement and disagreement weights, the Master Index Match Engine assigns a matching weight to each field that falls between the agreement and disagreement weights specified for the field. Thus, the maximum agreement weight between two records is the sum of the defined agreement weights for each field. The minimum disagreement weight is the sum of the defined disagreement weights for each field. For more information about weight calculation, see Determining the Weight Range.
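The range arithmetic above can be sketched as follows. The linear mapping from a similarity score to a weight is an assumption chosen for simplicity (actual comparators may shape the result differently), and the class name, method names, and sample weights are hypothetical.

```java
/** Illustrative weight-range arithmetic; not the match engine's actual code. */
final class WeightRangeSketch {

    /**
     * Maps a similarity score in [0, 1] linearly onto the range
     * [disagreementWeight, agreementWeight]. The linear mapping is an assumption.
     */
    static double fieldWeight(double similarity, double agreementWeight, double disagreementWeight) {
        if (similarity < 0.0 || similarity > 1.0) {
            throw new IllegalArgumentException("Similarity must be between 0 and 1: " + similarity);
        }
        return disagreementWeight + similarity * (agreementWeight - disagreementWeight);
    }

    /** Sums per-field weights, for example to find the maximum or minimum composite weight. */
    static double sum(double[] weights) {
        double total = 0.0;
        for (double weight : weights) {
            total += weight;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical agreement and disagreement weights for three match fields.
        double[] agreement = {10.0, 8.0, 5.0};
        double[] disagreement = {-5.0, -4.0, -2.0};
        System.out.println("Maximum composite weight: " + sum(agreement));
        System.out.println("Minimum composite weight: " + sum(disagreement));
        System.out.println("Half-similar first field: " + fieldWeight(0.5, agreement[0], disagreement[0]));
    }
}
```

Because every field weight stays inside its own range, the composite weight for any record pair lies between the two sums, which is what makes it possible to reason about thresholds across the whole range.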

Master Index Match Engine Data Types

The Master Index Match Engine is built on a flexible framework that allows you to customize and create matching rules for various types of data. The match engine provides an extensive set of comparison functions for matching on various types of fields, such as numbers, dates, single characters, and so on. The match engine also provides more specialized comparison functions for matching on specific types of data, such as person names, address fields, social security numbers, and genders. You can define custom comparison functions and custom standardization logic for different data types or variants on data types. These customizations are easily incorporated into a master index application, allowing you to completely customize the match and standardization process for your specific data format.

The Master Index Match Engine and the Master Index Standardization Engine

The Master Index Match Engine works with the Master Index Standardization Engine to provide an accurate comparison of two records. The standardization engine reads input data and determines how to parse, normalize, and phonetically encode the data in order to create a standard set of values to use for match comparison. The standardization engine can standardize free-form text fields, such as street address fields or business names, and separate them into their individual parts, such as house numbers, street names, and so on, allowing the match engine to generate a more accurate weight for free-form data.