The match configuration file, matchConfigFile.cfg, contains the matching logic for each field on which matching is performed. By default, this file defines the matching logic for the three primary data types (person names, business names, and addresses), and can also handle generic data types, such as dates, numbers, social security numbers, and characters.
The Sun Match Engine provides several comparison functions that you can call in this file to fine-tune the match process. Comparison functions contain the logic to compare different types of data in very specific ways in order to arrive at a match weight for each field. These functions allow you to define how matching is performed for different data types, and can be used in conjunction with either matching and unmatching probabilities or agreement and disagreement weight ranges for each field. This file also defines how to handle missing fields.
The topics that follow describe the format of the configuration file and provide an overview of the predefined comparison functions.
The match configuration file is divided into two sections. The first section consists of one line that indicates the matching probability type. The second section consists of the matching rules to use for each match field.
Following is an excerpt from the default match configuration file. This excerpt illustrates the components that are described in the following sections.
    ProbabilityType 1
    FirstName 15 0 uf 0.99 0.001 15 -5
    LastName 15 0 ul 0.99 0.001 15 -5
    String 25 0 ua 0.99 0.001 10 -5
    DateDays 20 0 dD 0.99 0.001 10 -10 y 15 30
    DateMonths 20 0 dM 0.99 0.001 10 -10 n
    DateHours 20 0 dH 0.99 0.001 10 -10 y 30 60
    DateMinutes 20 0 dm 0.99 0.001 10 -10 y 300 600
    DateSeconds 20 0 ds 0.99 0.001 10 -10 y 75 60
    Numeric 15 0 n 0.99 0.001 10 -10 y 8
    Integer 15 0 nI 0.99 0.001 10 -10 n
    Real 15 0 nR 0.99 0.001 10 -10 n
    Char 1 0 c 0.99 0.001 5 -5
    pro 15 0 p 0.99 0.001 10 -10 20 5 5
The first line of the match configuration file defines the probability type to use for matching. Specify “0” (zero) to use m-probabilities and u-probabilities to determine a field’s match weight; specify “1” (one) to use agreement and disagreement weight ranges. If the probability type is set to use agreement and disagreement weight ranges, the m-prob and u-prob columns in the matching rules section are ignored. Likewise, if the probability type is set to use m-probabilities and u-probabilities, the agreement-weight and disagreement-weight columns in the matching rules section are ignored. The default is to use agreement and disagreement weight ranges because they are more intuitive.
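The effect of the probability type on a single row can be sketched as follows. This is an illustration only: the function name `field_weights` is hypothetical, and the logarithmic formula used for probability type 0 is the standard Fellegi-Sunter log-ratio, which is an assumption; the formula the Sun Match Engine actually applies may differ.

```python
import math


def field_weights(probability_type, m_prob, u_prob,
                  agreement_weight, disagreement_weight):
    """Return the (agreement, disagreement) weights that apply to one row.

    Hypothetical helper, not part of the Sun Match Engine.
    """
    if probability_type == 1:
        # Weight-range mode: the m-prob and u-prob columns are ignored.
        return float(agreement_weight), float(disagreement_weight)
    if probability_type == 0:
        # Probability mode: the two weight columns are ignored.
        # ASSUMPTION: standard Fellegi-Sunter base-2 log ratios.
        agreement = math.log2(m_prob / u_prob)
        disagreement = math.log2((1.0 - m_prob) / (1.0 - u_prob))
        return agreement, disagreement
    raise ValueError("probability type must be 0 or 1")


# The FirstName row from the default file under each probability type.
print(field_weights(1, 0.99, 0.001, 15, -5))
print(field_weights(0, 0.99, 0.001, 15, -5))
```

Under the assumed formula, a high m-probability combined with a low u-probability yields a large positive agreement weight and a negative disagreement weight, which mirrors the sign convention of the explicit weight columns.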
The section after the first line of the match configuration file contains match field rows, with each row defining how a certain data type or field will be matched. The syntax for this section is:
match-type size null-field function m-prob u-prob agreement-weight disagreement-weight parameters
Table 1 describes each element in a match field row.
Table 1 Match Configuration File Columns
| Column Number | Column Name | Description |
|---|---|---|
| 1 | match-type | A value that indicates to the Sun Match Engine how each field should be weighted. Each field included in the match string (the MatchingConfig section of the Match Field file) must have a match type corresponding to a value in this column. |
| 2 | size | The number of characters in the field on which matching is performed, beginning with the first character. For example, to match on only the first four characters in a 10-digit field, the value of this column should be “4”. |
| 3 | null-field | An index, chosen from a set of predefined values, that specifies how to calculate the total weight for null fields or fields that contain only spaces. Note: the agreement and disagreement weights used for null fields are either specified in this file or calculated using a logarithmic formula based on the m-probabilities and u-probabilities (depending on the probability type). |
| 4 | function | The type of comparison to perform when weighting the field. For information about the available comparison functions, see Match Configuration Comparison Functions for Sun Match Engine (Repository). |
| 5 | m-prob | The initial probability that the specified field in two records will match if the records match. The probability is a double value between 0 and 1, and can have up to 16 decimal places. |
| 6 | u-prob | The initial probability that the specified field in two records will match if the records do not match. The probability is a double value between 0 and 1, and can have up to 16 decimal places. |
| 7 | agreement-weight | The matching weight to be assigned to a field given that the fields match between two records. This number can be between 0 and 100 and can have up to 16 decimal places. It represents the maximum match weight for a field. |
| 8 | disagreement-weight | The matching weight to be assigned to a field given that the fields do not match between two records. This number can be between -100 and 0 and can have up to 16 decimal places. It represents the minimum match weight for a field. |
| 9 | parameters | The parameters that correspond to the comparison function specified in column 4. Some comparison functions do not take any parameters and some take multiple parameters. For additional information about parameters, see Match Configuration Comparison Functions for Sun Match Engine (Repository). |
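To make the nine-column layout concrete, the following minimal sketch reads a configuration in this format into Python objects. It assumes one whitespace-separated row per line and no comment syntax, which is inferred from the excerpt rather than stated in the documentation. The names `MatchFieldRow` and `parse_match_config` are hypothetical and are not part of the Sun Match Engine.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MatchFieldRow:
    """One match field row; attribute names follow the column names."""
    match_type: str
    size: int
    null_field: str          # kept as text; its value set is not assumed here
    function: str
    m_prob: float
    u_prob: float
    agreement_weight: float
    disagreement_weight: float
    parameters: List[str] = field(default_factory=list)


def parse_match_config(text: str) -> Tuple[int, List[MatchFieldRow]]:
    """Parse the probability type line and the match field rows.

    ASSUMPTION: rows are whitespace-separated, one per line.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines or len(lines[0]) != 2 or lines[0][0] != "ProbabilityType":
        raise ValueError("first line must be 'ProbabilityType 0' or '... 1'")
    probability_type = int(lines[0][1])
    if probability_type not in (0, 1):
        raise ValueError("probability type must be 0 or 1")

    rows = []
    for tokens in lines[1:]:
        if len(tokens) < 8:
            raise ValueError("row needs at least 8 columns: " + " ".join(tokens))
        rows.append(MatchFieldRow(
            match_type=tokens[0],
            size=int(tokens[1]),
            null_field=tokens[2],
            function=tokens[3],
            m_prob=float(tokens[4]),
            u_prob=float(tokens[5]),
            agreement_weight=float(tokens[6]),
            disagreement_weight=float(tokens[7]),
            parameters=tokens[8:],   # column 9 may be empty or hold several values
        ))
    return probability_type, rows


sample = """ProbabilityType 1
FirstName 15 0 uf 0.99 0.001 15 -5
DateDays 20 0 dD 0.99 0.001 10 -10 y 15 30
"""
ptype, rows = parse_match_config(sample)
for row in rows:
    print(row.match_type, row.function, row.parameters)
```

Column 9 is collected as a list of strings because its meaning depends on the comparison function named in column 4, so the sketch leaves the interpretation of each parameter to the caller.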
Match field comparison functions, or comparators, compare the values of a field in two records to determine whether the fields match. The fields are then assigned a matching weight based on the results of the comparison function. You can use several different types of comparison functions in the match configuration file to define how the Sun Match Engine should match the fields in the match string. The Sun Match Engine provides several options to use with each function.
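The general idea of a comparator producing a weight can be sketched as follows. This is not one of the engine's comparison functions: it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and it assumes that a similarity score is mapped linearly onto the range between the disagreement weight (minimum) and the agreement weight (maximum). The function name `field_match_weight` is hypothetical, and null-field handling is not modeled.

```python
from difflib import SequenceMatcher


def field_match_weight(value_a, value_b, size,
                       agreement_weight, disagreement_weight):
    """Hypothetical comparator: return a weight within the configured range."""
    # Only the first `size` characters take part in matching (column 2).
    a = value_a[:size]
    b = value_b[:size]
    # Stand-in similarity in [0.0, 1.0]; the engine's own functions differ.
    similarity = SequenceMatcher(None, a, b).ratio()
    # ASSUMPTION: linear interpolation between minimum and maximum weight.
    span = agreement_weight - disagreement_weight
    return disagreement_weight + similarity * span


# Identical values reach the agreement weight; unrelated values fall to
# the disagreement weight; near matches land somewhere in between.
print(field_match_weight("SMITH", "SMITH", 15, 15, -5))
print(field_match_weight("ABCD", "WXYZ", 15, 15, -5))
print(field_match_weight("JOHNSON", "JOHNSTON", 15, 15, -5))
```

With `size` set to 4, the last pair compares only "JOHN" with "JOHN" and therefore receives the full agreement weight, which shows how the size column can change a field's result.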
The following table summarizes each comparison function. A complete reference of the comparison functions and their parameters is included in Match Configuration Comparison Functions for Sun Match Engine (Repository).
Table 2 Comparison Functions