Understanding the Master Index Match Engine

Determining the Weight Range

To choose initial values for the match and duplicate thresholds, you must first determine the total range of matching weights that can be assigned to a record. A record's matching weight is the sum of the weights assigned to each match field. The data analysis tool provided can also help you determine the match and duplicate thresholds.

Weight Ranges Using Agreement Weights

When you use agreement and disagreement weights, determining the match weight range is straightforward. Total the maximum agreement weights for each match field to determine the maximum match weight, then total the minimum disagreement weights for each match field to determine the minimum match weight. The following table provides a sample agreement/disagreement configuration for matching on person data. The range of match weights generated for a master index application with this configuration is from -36 to +38.

Table 16 Sample Agreement and Disagreement Weight Ranges

Field Name             Maximum Agreement Weight   Minimum Disagreement Weight
First Name                                        -8
Last Name                                         -8
Date of Birth                                     -5
Gender                                            -5
SSN                    10                         -10
Maximum Match Weight   38
Minimum Match Weight                              -36
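
The totals can be checked with a few lines of Python. This is a minimal sketch, not part of the master index product, and the function name `weight_range` is hypothetical. Table 16 shows a maximum agreement weight only for SSN, so the other agreement weights used here (8, 8, 7, and 5) are assumptions, chosen so that the total matches the stated maximum of 38.

```python
# Per-field (maximum agreement weight, minimum disagreement weight).
# The disagreement weights and the SSN agreement weight come from Table 16.
# ASSUMPTION: the agreement weights 8, 8, 7, and 5 are illustrative values
# chosen so that, together with SSN, they total the stated maximum of 38.
FIELD_WEIGHTS = {
    "First Name": (8, -8),
    "Last Name": (8, -8),
    "Date of Birth": (7, -5),
    "Gender": (5, -5),
    "SSN": (10, -10),
}


def weight_range(field_weights):
    """Return (minimum match weight, maximum match weight) for a configuration."""
    maximum = sum(agree for agree, _ in field_weights.values())
    minimum = sum(disagree for _, disagree in field_weights.values())
    return minimum, maximum


minimum, maximum = weight_range(FIELD_WEIGHTS)
print(f"Match weights range from {minimum} to +{maximum}")
# prints: Match weights range from -36 to +38
```

Adding or removing a match field changes both ends of the range, so the thresholds would need to be reviewed whenever the set of match fields changes.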

Weight Ranges Using Probabilities

Determining the match weight range from m-probabilities and u-probabilities is slightly more involved than using agreement and disagreement weights. To determine the maximum match weight that will be generated for each field, use the following formula:


LOG2(m_prob/u_prob)

To determine the minimum match weight that will be generated for each field, use the following formula:


LOG2((1-m_prob)/(1-u_prob))
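
The two formulas translate directly into code. The following is a minimal sketch: the function names are hypothetical, and it assumes both probabilities lie strictly between 0 and 1, since the logarithm is undefined otherwise.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Maximum weight for a field: LOG2(m_prob / u_prob)."""
    _check(m_prob, u_prob)
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Minimum weight for a field: LOG2((1 - m_prob) / (1 - u_prob))."""
    _check(m_prob, u_prob)
    return math.log2((1 - m_prob) / (1 - u_prob))


def _check(m_prob, u_prob):
    # Both logarithms require probabilities strictly between 0 and 1.
    for p in (m_prob, u_prob):
        if not 0 < p < 1:
            raise ValueError(f"probability must be between 0 and 1, got {p}")


# Sample probabilities: m-probability .999, u-probability .001
print(round(agreement_weight(0.999, 0.001), 2))     # 9.96
print(round(disagreement_weight(0.999, 0.001), 2))  # -9.96
```

When the m-probability is higher than the u-probability, the agreement weight is positive and the disagreement weight is negative.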

The following table lists sample m-probabilities and u-probabilities, along with the agreement and disagreement weights generated by each combination of probabilities. The range of match weights generated for a master index application with this configuration is from -35.93 to +38.

Table 17 Sample m-probabilities and u-probabilities

Field Name             m-probability   u-probability   Max Agreement Weight   Min Disagreement Weight
First Name             .996            .004            7.96                   -7.96
Last Name              .996            .004            7.96                   -7.96
Date of Birth          .97             .007            7.11                   -5.04
Gender                 .97             .03             5.01                   -5.01
SSN                    .999            .001            9.96                   -9.96
Maximum Match Weight                                   38
Minimum Match Weight                                                          -35.93
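
The overall range for this configuration can be recomputed from the probabilities in Table 17. This is a minimal sketch with a hypothetical function name, `probability_weight_range`. It sums unrounded per-field weights, so its totals may differ from the table in the second decimal place.

```python
import math

# (m-probability, u-probability) per field, taken from Table 17.
PROBABILITIES = {
    "First Name": (0.996, 0.004),
    "Last Name": (0.996, 0.004),
    "Date of Birth": (0.97, 0.007),
    "Gender": (0.97, 0.03),
    "SSN": (0.999, 0.001),
}


def probability_weight_range(probabilities):
    """Return (minimum match weight, maximum match weight) for a configuration."""
    minimum = 0.0
    maximum = 0.0
    for m_prob, u_prob in probabilities.values():
        # Agreement on a field contributes LOG2(m_prob / u_prob).
        maximum += math.log2(m_prob / u_prob)
        # Disagreement contributes LOG2((1 - m_prob) / (1 - u_prob)).
        minimum += math.log2((1 - m_prob) / (1 - u_prob))
    return minimum, maximum


minimum, maximum = probability_weight_range(PROBABILITIES)
print(f"Match weights range from {minimum:.2f} to +{maximum:.2f}")
```

The result is close to the range of -36 to +38 from the agreement and disagreement weight example, which suggests that the two sample configurations were chosen to be roughly equivalent.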