Understanding the Sun Match Engine

Sun Match Engine Common Standardization Files for Person Data

The standardization files described in this section are common to all national domains. These files define special characters to remove from name fields and define hyphenated first names. A patterns file is also common, but is not currently used.

The Hyphenated Name Category File (personFirstNameDash.dat)

The hyphenated name category file defines first names that include hyphens (such as Anne-Marie) to help the Sun Match Engine recognize and process these values as first names. The file also classifies each name into a gender category. This file is used to standardize all domains except Australia, which uses the personFirstNameDashAU.dat file located in the Australia folder, and France, which uses the personFirstNameDashFR.dat file located in the France folder.

The hyphenated name category files use the following syntax:

name gender-class

You can modify or add entries in this file as needed. Table 8 describes the columns in the personFirstNameDash.dat file.

Table 8 Hyphenated Name Category File

Column          Description

name            A hyphenated first name.

gender-class    An indicator of the gender with which the first name corresponds. The possible values are:

  • N - The name is neutral and can be used for either males or females.

  • F - The name is used for females.

  • M - The name is used for males.

Following is an excerpt from the personFirstNameDash.dat file.


ANNE-MARIE          F
JEAN-NOEL           M
JEAN-MARIE          M
JEAN-BAPTISTE       M
JEAN-PIERRE         M
JEAN-YVES           M
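
As an illustration of the name gender-class syntax, the following Python sketch reads entries like those in the excerpt into a lookup table. It is not part of the Sun Match Engine and does not show how the engine loads the file. The function name parse_hyphenated_names is hypothetical, and the sketch assumes that each entry sits on its own line and that whitespace separates the name from the gender class.

```python
def parse_hyphenated_names(lines):
    """Map each hyphenated first name to its gender class (N, F, or M).

    Hypothetical helper for illustration only; assumes one entry per line
    with whitespace between the name and the gender class.
    """
    valid_classes = {"N", "F", "M"}
    names = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split once from the right: name on the left, gender class on the right.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or parts[1] not in valid_classes:
            raise ValueError(f"Unrecognized entry: {raw!r}")
        names[parts[0]] = parts[1]
    return names


# The entries below are copied from the excerpt above.
excerpt = """\
ANNE-MARIE          F
JEAN-NOEL           M
JEAN-MARIE          M
JEAN-BAPTISTE       M
JEAN-PIERRE         M
JEAN-YVES           M
"""

names = parse_hyphenated_names(excerpt.splitlines())
print(names["ANNE-MARIE"])  # prints F
```

A lookup table like this makes it easy to check a new entry for a valid gender class before adding it to the file.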

The Person Name Patterns File (personNamePatt.dat)

The person name patterns file is not currently used, but is designed to standardize free-form text name fields.

The Special Characters Reference File (personRemoveSpecChars.dat)

The special characters reference file lists characters that might appear in person data, but that should be ignored. The Sun Match Engine removes these characters from a field before normalizing the data or making any comparisons. To remove additional characters from person data, add them to the list.

An excerpt from the personRemoveSpecChars.dat file appears below.


[
]
{
}
<
>
/
?
*
^
#
!
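
The effect of this list on a field value can be sketched in Python as follows. This is only an illustration of the behavior described above, not the Sun Match Engine implementation. The function names load_special_chars and remove_special_chars are hypothetical, and the sketch assumes that the file holds one character per line.

```python
def load_special_chars(lines):
    """Collect the characters to remove, assuming one character per line."""
    return {line.strip() for line in lines if line.strip()}


def remove_special_chars(value, special_chars):
    """Return the value with every listed character dropped."""
    return "".join(ch for ch in value if ch not in special_chars)


# The characters below are copied from the excerpt above.
excerpt = ["[", "]", "{", "}", "<", ">", "/", "?", "*", "^", "#", "!"]
chars = load_special_chars(excerpt)

cleaned = remove_special_chars("ANNE-MARIE [#2]!", chars)
print(cleaned)  # prints ANNE-MARIE 2
```

The hyphen is kept because it does not appear in the excerpt, so a hyphenated first name such as ANNE-MARIE is still intact after the listed characters are removed.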