Master Index Match Engine Reference

Customizing the Match Configuration

Once you determine the fields to use for matching, decide how the weights will be generated for each field. The primary tasks are deciding whether to use probabilities or agreement weight ranges, and then choosing the best comparison function for each match field.

Probabilities or Agreement Weights

The first step in customizing the match configuration is to decide whether to use m-probabilities and u-probabilities or agreement and disagreement weight ranges. Both methods give similar results, but agreement and disagreement weight ranges let you specify the precise maximum and minimum weights that can be applied to each match field. This gives you control over the highest and lowest matching weights that can be assigned to each record.

Defining Relative Value

For each field used for matching, define either the m-probabilities and u-probabilities or the agreement and disagreement weight ranges in the match configuration file. Review the information provided under Master Index Match Engine Matching Weight Formulation to help determine how to configure these values. Remember that a higher m-probability or agreement weight gives the field a higher weight when field values agree.

Determining the Weight Range

To find the initial values to set for the match and duplicate thresholds, you must determine the total range of matching weights that can be assigned to a record. The matching weight of a record is the sum of the weights assigned to each of its match fields. The data analysis tool provided can also help you determine the match and duplicate thresholds.

Weight Ranges Using Agreement Weights

For agreement and disagreement weight ranges, determining the match weight range is straightforward. Total the maximum agreement weights of all match fields to determine the maximum match weight. Then total the minimum disagreement weights of all match fields to determine the minimum match weight. The following table provides a sample agreement/disagreement configuration for matching on person data. With this configuration, the match weights generated for a master index application range from -36 to +38.

Table 16 Sample Agreement and Disagreement Weight Ranges

Field Name              Maximum Agreement Weight    Minimum Disagreement Weight
First Name                                          -8
Last Name                                           -8
Date of Birth                                       -5
Gender                                              -5
SSN                     10                          -10
Maximum Match Weight    38
Minimum Match Weight                                -36
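The totaling described above can be sketched in a few lines of Python. This is only an illustration, not part of the match engine. The SSN maximum and all of the minimum disagreement weights are taken from Table 16. The maximum agreement weights for the other four fields do not appear in the table as reproduced here, so the values 8, 8, 7, and 5 are assumptions chosen to be consistent with the stated maximum match weight of 38. The names `weight_ranges` and `match_weight_range` are hypothetical.

```python
# Per-field (maximum agreement weight, minimum disagreement weight).
# Minimums and the SSN maximum come from Table 16. The other four
# maximums (8, 8, 7, 5) are ASSUMED values that add up, together
# with the SSN maximum of 10, to the documented total of 38.
weight_ranges = {
    "First Name": (8, -8),
    "Last Name": (8, -8),
    "Date of Birth": (7, -5),
    "Gender": (5, -5),
    "SSN": (10, -10),
}


def match_weight_range(ranges):
    """Return (minimum match weight, maximum match weight) for a record."""
    max_weight = sum(agree for agree, _ in ranges.values())
    min_weight = sum(disagree for _, disagree in ranges.values())
    return min_weight, max_weight


min_weight, max_weight = match_weight_range(weight_ranges)
print(f"Match weights range from {min_weight} to {max_weight:+d}")
```

The resulting pair gives the bounds within which the match and duplicate thresholds must fall.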

Weight Ranges Using Probabilities

Determining the match weight ranges when using m-probabilities and u-probabilities is a little more complicated than using agreement and disagreement weights. To determine the maximum weight that will be generated for each field, use the following formula:


LOG2(m_prob/u_prob)

To determine the minimum match weight that will be generated for each field, use the following formula:


LOG2((1-m_prob)/(1-u_prob))

The following table lists sample m-probabilities and u-probabilities, along with the agreement and disagreement weights generated by each combination of probabilities. With this configuration, the match weights generated for a master index application range from -35.93 to +38.

Table 17 Sample m-probabilities and u-probabilities

Field Name              m-probability    u-probability    Max Agreement Weight    Min Disagreement Weight
First Name              .996             .004             7.96                    -7.96
Last Name               .996             .004             7.96                    -7.96
Date of Birth           .97              .007             7.11                    -5.04
Gender                  .97              .03              5.01                    -5.01
SSN                     .999             .001             9.96                    -9.96
Maximum Match Weight                                      38
Minimum Match Weight                                                              -35.93
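As a rough check, the two formulas can be applied to the probabilities in Table 17 with a short Python sketch. The function and variable names here are hypothetical and are not part of the match engine. The probabilities are taken from the table. Because the script sums unrounded weights, its totals can differ from the table's in the second decimal place.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Maximum weight for a field: LOG2(m_prob/u_prob)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Minimum weight for a field: LOG2((1-m_prob)/(1-u_prob))."""
    return math.log2((1 - m_prob) / (1 - u_prob))


# (m-probability, u-probability) per match field, from Table 17.
probabilities = {
    "First Name": (0.996, 0.004),
    "Last Name": (0.996, 0.004),
    "Date of Birth": (0.97, 0.007),
    "Gender": (0.97, 0.03),
    "SSN": (0.999, 0.001),
}

# Print the per-field weights so they can be compared with the table.
for field, (m, u) in probabilities.items():
    print(f"{field:15s} {agreement_weight(m, u):6.2f} {disagreement_weight(m, u):7.2f}")

# The record-level range is the sum of the per-field extremes.
max_weight = sum(agreement_weight(m, u) for m, u in probabilities.values())
min_weight = sum(disagreement_weight(m, u) for m, u in probabilities.values())
print(f"Match weights range from {min_weight:.2f} to {max_weight:+.2f}")
```

Running a sketch like this after changing a probability shows how far the overall weight range shifts, which is useful when revisiting the match and duplicate thresholds.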

Comparison Functions

The match configuration file defines several match types for different types of fields. You can either modify existing rows in this file or create new rows that define custom matching logic. To determine which comparison functions to use, review the information provided in Master Index Match Engine Comparison Functions. Choose the comparison functions that best suit how you want the match fields to be processed.