Once you have determined which fields to use for matching, decide how the weights will be generated for each field. The primary tasks are deciding whether to use m-probabilities and u-probabilities or agreement and disagreement weight ranges, and then choosing the best comparison function for each match field.
The first step in defining the match configuration is to decide whether to use m-probabilities and u-probabilities or agreement and disagreement weight ranges. Both methods give similar results. However, agreement and disagreement weight ranges let you specify the exact maximum and minimum weights that can be applied to each match field, which gives you control over the highest and lowest matching weights that can be assigned to a record.
For each field used for matching, define either the m-probabilities and u-probabilities or the agreement and disagreement weight ranges in the match configuration file. Review the information provided under Sun Match Engine Matching Weight Formulation to help determine how to configure these values. Remember that a higher m-probability or agreement weight gives the field a higher weight when field values agree.
To find initial values for the match and duplicate thresholds, you must first determine the total range of matching weights that can be assigned to a record. A record's matching weight is the sum of the weights assigned to its individual match fields. Running the Bulk Matcher in match analysis mode can also help you determine the match and duplicate thresholds. For more information about this tool, see Performing a Match Analysis in Loading the Initial Data Set for a Sun Master Index.
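The Python below is a minimal sketch of that summation. It adds up one weight per match field for a pair of records, using the sample weights from Table 36. For simplicity, it assumes that each field either fully agrees or fully disagrees. The actual comparison functions in the match configuration file may return intermediate weights for partial agreement. The names `FIELD_WEIGHTS` and `composite_weight` are hypothetical.

```python
# Sample per-field weights (agreement, disagreement) taken from Table 36.
FIELD_WEIGHTS = {
    "First Name": (8.0, -8.0),
    "Last Name": (8.0, -8.0),
    "Date of Birth": (7.0, -5.0),
    "Gender": (5.0, -5.0),
    "SSN": (10.0, -10.0),
}


def composite_weight(agreements: dict[str, bool]) -> float:
    """Sum one weight per match field to get the overall matching weight.

    Simplifying assumption: each field either fully agrees (add its agreement
    weight) or fully disagrees (add its disagreement weight).
    """
    total = 0.0
    for field, agrees in agreements.items():
        agree_weight, disagree_weight = FIELD_WEIGHTS[field]
        total += agree_weight if agrees else disagree_weight
    return total


# Every field agrees except SSN: 8 + 8 + 7 + 5 - 10
example = {
    "First Name": True,
    "Last Name": True,
    "Date of Birth": True,
    "Gender": True,
    "SSN": False,
}
print(composite_weight(example))  # 18.0
```

When every field agrees, the sketch returns the maximum match weight. When every field disagrees, it returns the minimum. Those two values bound the range in which the thresholds are set.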
How you determine the weight ranges depends on whether you use m-probabilities and u-probabilities or agreement and disagreement weights.
For agreement and disagreement weight ranges, determining the match weight range is straightforward. Total the maximum agreement weights of all match fields to get the maximum match weight, then total the minimum disagreement weights to get the minimum match weight. Table 36 provides a sample agreement and disagreement configuration for matching on person data. The range of match weights generated for a master index application with this configuration is -36 to +38.
Table 36 Sample Agreement and Disagreement Weight Ranges
Field Name | Maximum Agreement Weight | Minimum Disagreement Weight |
---|---|---|
First Name | 8 | -8 |
Last Name | 8 | -8 |
Date of Birth | 7 | -5 |
Gender | 5 | -5 |
SSN | 10 | -10 |
Maximum Match Weight | 38 | |
Minimum Match Weight | | -36 |
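The Python below is a minimal sketch of the same arithmetic, using the values from Table 36. The dictionary layout and the `match_weight_range` name are illustrative assumptions. They are not part of the master index configuration.

```python
# (maximum agreement weight, minimum disagreement weight) per field, from Table 36.
TABLE_36 = {
    "First Name": (8, -8),
    "Last Name": (8, -8),
    "Date of Birth": (7, -5),
    "Gender": (5, -5),
    "SSN": (10, -10),
}


def match_weight_range(weights):
    """Return (minimum match weight, maximum match weight)."""
    max_weight = sum(agree for agree, _ in weights.values())
    min_weight = sum(disagree for _, disagree in weights.values())
    return min_weight, max_weight


print(match_weight_range(TABLE_36))  # (-36, 38)
```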
Determining the match weight ranges when using m-probabilities and u-probabilities is a little more complicated than using agreement and disagreement weights. To determine the maximum weight that will be generated for each field, use the following formula:
$$\text{agreement weight} = \log_2\left(\frac{m_{\text{prob}}}{u_{\text{prob}}}\right)$$
To determine the minimum weight that will be generated for each field, use the following formula:
$$\text{disagreement weight} = \log_2\left(\frac{1 - m_{\text{prob}}}{1 - u_{\text{prob}}}\right)$$
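As a worked example, here are both formulas applied to the Gender field from Table 37 (m-probability .97, u-probability .03):

$$\log_2\left(\frac{0.97}{0.03}\right) = \log_2(32.33) \approx 5.01 \qquad \log_2\left(\frac{1 - 0.97}{1 - 0.03}\right) = \log_2\left(\frac{0.03}{0.97}\right) \approx -5.01$$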
Table 37 illustrates a sample set of m-probabilities and u-probabilities, along with the agreement and disagreement weights generated by each combination. The range of match weights generated for a master index application with this configuration is -35.93 to +38.
Table 37 Sample m-probabilities and u-probabilities
Field Name | m-probability | u-probability | Max Agreement Weight | Min Disagreement Weight |
---|---|---|---|---|
First Name | .996 | .004 | 7.96 | -7.96 |
Last Name | .996 | .004 | 7.96 | -7.96 |
Date of Birth | .97 | .007 | 7.11 | -5.04 |
Gender | .97 | .03 | 5.01 | -5.01 |
SSN | .999 | .001 | 9.96 | -9.96 |
Maximum Match Weight | | | 38 | |
Minimum Match Weight | | | | -35.93 |
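The following Python sketch applies the two formulas to the probabilities in Table 37 and totals the results. It is illustrative only. The names `TABLE_37`, `field_weights`, and `probability_weight_range` are hypothetical. The sketch sums the unrounded per-field weights and rounds only for display, so its output can differ from Table 37 in the second decimal place. For example, it reports the Date of Birth disagreement weight as -5.05 and the overall range as -35.95 to 38.01.

```python
import math

# (m-probability, u-probability) per field, from Table 37.
TABLE_37 = {
    "First Name": (0.996, 0.004),
    "Last Name": (0.996, 0.004),
    "Date of Birth": (0.97, 0.007),
    "Gender": (0.97, 0.03),
    "SSN": (0.999, 0.001),
}


def field_weights(m_prob, u_prob):
    """Return (max agreement weight, min disagreement weight) for one field."""
    agree = math.log2(m_prob / u_prob)
    disagree = math.log2((1 - m_prob) / (1 - u_prob))
    return agree, disagree


def probability_weight_range(probabilities):
    """Return (minimum match weight, maximum match weight) across all fields."""
    pairs = [field_weights(m, u) for m, u in probabilities.values()]
    return sum(d for _, d in pairs), sum(a for a, _ in pairs)


for name, (m, u) in TABLE_37.items():
    agree, disagree = field_weights(m, u)
    print(f"{name}: {agree:.2f} / {disagree:.2f}")

low, high = probability_weight_range(TABLE_37)
print(f"Range: {low:.2f} to {high:.2f}")
```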
The match configuration file defines several match types for different types of fields. You can either modify existing rows in this file or create new rows that define custom matching logic. To determine which comparison functions to use, review the information provided in Match Configuration Comparison Functions for Sun Match Engine (Repository). Choose the comparison functions that best suit how you want the match fields to be processed.