Understanding the Sun Match Engine

Sun Match Engine Matching Comparison Functions

Match field comparison functions, or comparators, compare the values of a field in two records to determine whether the fields match. The fields are then assigned a matching weight based on the results of the comparison function. You can use several different types of comparison functions in the match configuration file to define how the Sun Match Engine should match the fields in the match string. The Sun Match Engine provides several options to use with each function.
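The relationship between a comparison result and a matching weight can be illustrated with a short Python sketch. This is not the Sun Match Engine implementation. It assumes that a comparator yields a similarity between 0 and 1 and that the weight is interpolated linearly between a disagreement weight and an agreement weight. The function names, the normalization of the bigram count, and the weight values (10.0 and -5.0) are hypothetical and chosen only for illustration. The comparator shown counts shared pairs of consecutive characters, in the spirit of the bigram description in the table that follows.

```python
from collections import Counter


def bigrams(text):
    """Return every pair of consecutive characters in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_similarity(a, b):
    """Share of bigrams the two strings have in common, from 0.0 to 1.0.

    Dividing by the larger bigram count is an assumption of this sketch.
    """
    grams_a, grams_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = max(sum(grams_a.values()), sum(grams_b.values()))
    if total == 0:
        # Strings shorter than two characters have no bigrams to compare.
        return 1.0 if a == b else 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    shared = sum((grams_a & grams_b).values())
    return shared / total


def field_weight(similarity, agreement=10.0, disagreement=-5.0):
    """Interpolate linearly between the disagreement and agreement weights."""
    return disagreement + similarity * (agreement - disagreement)
```

For example, SMITH and SMYTH share two of their four bigrams (SM and TH), so under these assumptions the similarity is 0.5 and the resulting weight is 2.5.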

The following table summarizes each comparison function. A complete reference of the comparison functions and their parameters is included in Match Configuration Comparison Functions for Sun Match Engine (Repository).

Table 2 Comparison Functions

Comparison Function 

Name 

Description 

b1 

Bigram String Comparator 

Based on the Bigram algorithm, this function compares two strings using all combinations of two consecutive characters and returns the total number of combinations that are the same.

b2 

Advanced Bigram String Comparator 

Similar to the standard Bigram comparison function (b1), but allows for character transpositions.

u 

Generic String Comparator 

Based on the Jaro algorithm, this function compares two strings taking into account uncertainty factors, such as string length, transpositions, and characters in common.

ua 

Advanced Generic String Comparator 

Based on the Jaro algorithm with variants of Winkler/Lynch and McLaughlin, this function is similar to the generic string comparator (u), but increases the agreement weight if the initial characters of each string are exact matches. This comparison function takes into account key punch and visual memory errors.

uf 

Simplified String Comparator - FirstName 

Based on the generic string comparator (u), this function is designed to specifically weight first name values. The string is analyzed and the weight adjusted based on statistical data. 

ul 

Simplified String Comparator - LastName 

Based on the generic string comparator (u), this function is designed to specifically weight last name values. The string is analyzed and the weight adjusted based on statistical data.

un 

Simplified String Comparator - House Numbers 

Based on the generic string comparator (u), this function is designed to specifically weight house number values. The string is analyzed and the weight adjusted based on statistical data.

us 

Simplified String Comparator 

A custom string comparator that compares two strings taking into account such uncertainty factors as string length, transpositions, key punch errors, and visual memory errors. Unlike the generic string comparator (u), this function handles diacritical marks. This function also improves processing speed.

usu 

Language-specific String Comparator 

A custom string comparator similar to the simplified string comparator (us), except that it is based on Unicode to support multiple languages and alphabets. This comparator takes one parameter indicating the language to use.

c 

Exact char-by-char Comparator 

Compares string fields character by character. Each character must match in order for an agreement weight to be assigned.

n 

Generic Number Comparator 

Compares numeric fields using a relative distance value to determine the match weight. As the difference between the two fields increases, the match weight decreases. Once the difference is beyond the relative distance, a disagreement weight is assigned. This comparator takes two parameters: the first indicates whether to use a relative distance or direct string comparison, and the second indicates the relative distance to use.

nI 

Integer Comparator 

Compares integer fields using a relative distance comparison. This comparison function is based on the generic number comparator (n), and accepts the same parameters.

nR 

Real Number Comparator 

Compares fields containing real numbers using a relative distance comparison. This comparison function is based on the generic number comparator (n), and accepts the same parameters.

nS 

Alphanumeric Comparator 

Compares social security numbers or other unique identifiers, taking into account any of these parameters:

  • Field length

  • Character types

  • Invalid values

dY 

Date Comparator - Year only 

Compares year values using relative distance values prior to and following the given year to determine the match weight. As the difference between the two fields increases, the match weight decreases. Once the difference is beyond the relative distance, a disagreement weight is assigned. The date comparison functions handle Gregorian years. This comparator takes up to three parameters: the first indicates whether to use a relative distance or direct string comparison, and the second and third indicate the relative distances before and after the given year.

dM 

Date Comparator - Month-Year 

Compares the month and year using a relative distance as described above for the year comparison function (dY).

dD 

Date Comparator - Day-Month-Year 

Compares the day, month, and year using a relative distance as described above for the year comparison function (dY).

dH 

Date Comparator - Hour-Day-Month-Year 

Compares the hour, day, month, and year using a relative distance as described above for the year comparison function (dY).

dm 

Date Comparator - Min-Hour-Day-Month-Year 

Compares the minute, hour, day, month, and year using a relative distance as described above for the year comparison function (dY).

ds 

Date Comparator - Sec-Min-Hour-Day-Month-Year 

Compares the second, minute, hour, day, month, and year using a relative distance as described above for the year comparison function (dY).

p 

Prorated Comparator 

Prorates the disagreement weight for a date or numeric field based on values you specify. Differences greater than the amount you specify receive the full disagreement weight. This comparator takes three parameters indicating the relative distance and the agreement and disagreement ranges.
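The relative distance behavior described for the generic number comparator and the year comparator (dY) can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's actual code. It assumes the weight falls linearly from the agreement weight to the disagreement weight as the difference approaches the relative distance. The function names, parameter names, and default weights are hypothetical.

```python
def relative_distance_weight(a, b, distance, agreement=10.0, disagreement=-5.0):
    """Weight that decreases as the difference between a and b grows.

    The linear decline is an assumption of this sketch.
    """
    difference = abs(a - b)
    if difference == 0:
        # Identical values receive the full agreement weight.
        return agreement
    if difference > distance:
        # Beyond the relative distance: full disagreement weight.
        return disagreement
    return agreement - (difference / distance) * (agreement - disagreement)


def year_weight(given_year, other_year, before, after,
                agreement=10.0, disagreement=-5.0):
    """Apply a separate relative distance before and after the given year."""
    distance = before if other_year < given_year else after
    return relative_distance_weight(
        given_year, other_year, distance, agreement, disagreement
    )
```

With these hypothetical weights, `relative_distance_weight(100, 105, 10)` yields 2.5, and `year_weight(1980, 1978, before=4, after=2)` also yields 2.5, while `year_weight(1980, 1985, before=4, after=2)` falls outside the allowed range and yields the disagreement weight of -5.0.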