Once you determine the fields to use for matching, decide how the weights will be generated for each field. The primary tasks are deciding whether to use probabilities or agreement weight ranges, and then choosing the best comparison function for each match field.
The first step in customizing the match configuration is to decide whether to use m-probabilities and u-probabilities or agreement and disagreement weight ranges. Both methods give similar results, but agreement and disagreement weight ranges let you specify the precise maximum and minimum weights that can be applied to each match field, giving you control over the highest and lowest matching weights that can be assigned to each record.
For each field used for matching, define either the m-probabilities and u-probabilities or the agreement and disagreement weight ranges in the match configuration file. Review the information provided under Master Index Match Engine Matching Weight Formulation to help determine how to configure these values. Remember that a higher m-probability or agreement weight gives the field a higher weight when field values agree.
To find initial values for the match and duplicate thresholds, you must determine the total range of matching weights that can be assigned to a record. The matching weight of a record is the sum of the weights assigned to each match field. The data analysis tool provided can also help you determine the match and duplicate thresholds.
For agreement and disagreement weight ranges, determining the match weight ranges is very straightforward. Simply total the maximum agreement weights for each field to determine the maximum match weight. Then total the minimum disagreement weights for each match field to determine the minimum match weight. The following table provides a sample agreement/disagreement configuration for matching on person data. As you can see, the range of match weights generated for a master index application with this configuration is from -36 to +38.
Table 16 Sample Agreement and Disagreement Weight Ranges

| Field Name | Maximum Agreement Weight | Minimum Disagreement Weight |
|---|---|---|
| First Name | 8 | -8 |
| Last Name | 8 | -8 |
| Date of Birth | 7 | -5 |
| Gender | 5 | -5 |
| SSN | 10 | -10 |
| Maximum Match Weight | 38 | |
| Minimum Match Weight | -36 | |
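
The totals in Table 16 can be checked with a few lines of Python. This is a minimal sketch for verifying the arithmetic only: the dictionary layout and the function name `match_weight_range` are illustrative assumptions and are not part of the match configuration file format.

```python
# Weight ranges copied from Table 16 as (maximum agreement, minimum disagreement).
# The dictionary layout is illustrative; it is not the match configuration file format.
WEIGHT_RANGES = {
    "First Name": (8, -8),
    "Last Name": (8, -8),
    "Date of Birth": (7, -5),
    "Gender": (5, -5),
    "SSN": (10, -10),
}


def match_weight_range(ranges):
    """Return (minimum match weight, maximum match weight) for a set of fields."""
    # The maximum match weight is the sum of every field's maximum agreement weight.
    max_weight = sum(agree for agree, _ in ranges.values())
    # The minimum match weight is the sum of every field's minimum disagreement weight.
    min_weight = sum(disagree for _, disagree in ranges.values())
    return min_weight, max_weight


min_weight, max_weight = match_weight_range(WEIGHT_RANGES)
print(f"Match weights range from {min_weight} to +{max_weight}")
```

The match and duplicate thresholds you choose must fall somewhere inside this range, which is why it is worth recomputing whenever a field's weights change.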
Determining the match weight ranges when using m-probabilities and u-probabilities is a little more complicated than using agreement and disagreement weights. To determine the maximum weight that will be generated for each field, use the following formula:
LOG2(m_prob/u_prob)
To determine the minimum match weight that will be generated for each field, use the following formula:
LOG2((1-m_prob)/(1-u_prob))
The following table lists sample m-probabilities and u-probabilities, along with the agreement and disagreement weights generated by each combination of probabilities. As you can see, the range of match weights generated for a master index application with this configuration is from -35.93 to +38.
Table 17 Sample m-probabilities and u-probabilities

| Field Name | m-probability | u-probability | Max Agreement Weight | Min Disagreement Weight |
|---|---|---|---|---|
| First Name | .996 | .004 | 7.96 | -7.96 |
| Last Name | .996 | .004 | 7.96 | -7.96 |
| Date of Birth | .97 | .007 | 7.11 | -5.04 |
| Gender | .97 | .03 | 5.01 | -5.01 |
| SSN | .999 | .001 | 9.96 | -9.96 |
| Maximum Match Weight | 38 | | | |
| Minimum Match Weight | -35.93 | | | |
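
The two formulas can be applied to the probabilities in Table 17 as follows. This is a sketch under stated assumptions: the function names and the dictionary layout are hypothetical, chosen only to show the arithmetic, and the script is not part of the match engine.

```python
import math

# Probabilities copied from Table 17 as (m-probability, u-probability).
# The dictionary layout is illustrative; it is not the match configuration file format.
PROBABILITIES = {
    "First Name": (0.996, 0.004),
    "Last Name": (0.996, 0.004),
    "Date of Birth": (0.97, 0.007),
    "Gender": (0.97, 0.03),
    "SSN": (0.999, 0.001),
}


def agreement_weight(m_prob, u_prob):
    """Maximum weight for a field: LOG2(m_prob/u_prob)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Minimum weight for a field: LOG2((1-m_prob)/(1-u_prob))."""
    return math.log2((1 - m_prob) / (1 - u_prob))


# Print the per-field weights, then the totals that bound the match weight range.
for field, (m, u) in PROBABILITIES.items():
    print(f"{field:<14} {agreement_weight(m, u):6.2f} {disagreement_weight(m, u):7.2f}")

max_weight = sum(agreement_weight(m, u) for m, u in PROBABILITIES.values())
min_weight = sum(disagreement_weight(m, u) for m, u in PROBABILITIES.values())
print(f"Maximum match weight: {max_weight:.2f}")
print(f"Minimum match weight: {min_weight:.2f}")
```

Because the script sums unrounded weights, its totals can differ from the table by a few hundredths, and individual values may differ in the last decimal place.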
The match configuration file defines several match types for different types of fields. You can either modify existing rows in this file or create new rows that define custom matching logic. To determine which comparison functions to use, review the information provided in Master Index Match Engine Comparison Functions. Choose the comparison functions that best suit how you want the match fields to be processed.