The standardization files described in this section are common to all national domains. These files define hyphenated first names and the special characters to remove from name fields. A patterns file is also common to all domains, but it is not currently used.
The hyphenated name category file, personFirstNameDash.dat, defines first names that contain hyphens (such as Anne-Marie) so that the Sun Match Engine recognizes and processes these values as first names. The file also assigns each name to a gender category. This file is used to standardize data for all national domains except Australia, which uses the personFirstNameDashAU.dat file located in the Australia folder, and France, which uses the personFirstNameDashFR.dat file located in the France folder.
The hyphenated name category files use the following syntax:
name gender-class
You can modify or add entries in this file as needed. Table 8 describes the columns in the personFirstNameDash.dat file.
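Table 8 Hyphenated Name Category File
Column        Description
name          The hyphenated first name, such as ANNE-MARIE.
gender-class  The gender category of the name, such as F (female) or M (male).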
Following is an excerpt from the personFirstNameDash.dat file.
ANNE-MARIE F
JEAN-NOEL M
JEAN-MARIE M
JEAN-BAPTISTE M
JEAN-PIERRE M
JEAN-YVES M
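To illustrate how the "name gender-class" syntax can be read, the following is a minimal Python sketch that loads entries like those in the personFirstNameDash.dat excerpt into a lookup table and returns the gender class of a hyphenated first name. It assumes one whitespace-separated pair per line; the function names parse_dash_names and gender_of are hypothetical and are not part of the Sun Match Engine, which loads this file internally.

```python
from typing import Dict, Optional


def parse_dash_names(text: str) -> Dict[str, str]:
    """Map each hyphenated first name to its gender class.

    Hypothetical helper: assumes one "name gender-class" pair per line,
    separated by whitespace, as in the personFirstNameDash.dat excerpt.
    """
    names: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines instead of failing.
            continue
        name, gender_class = fields
        # Store keys in uppercase so lookups are case-insensitive.
        names[name.upper()] = gender_class.upper()
    return names


def gender_of(first_name: str, names: Dict[str, str]) -> Optional[str]:
    """Return the gender class of a hyphenated first name, or None if unlisted."""
    return names.get(first_name.strip().upper())


if __name__ == "__main__":
    sample = "ANNE-MARIE F\nJEAN-NOEL M\nJEAN-MARIE M\n"
    table = parse_dash_names(sample)
    print(gender_of("Anne-Marie", table))  # F
```

Normalizing keys to uppercase mirrors the uppercase entries in the excerpt; whether the engine itself compares names case-insensitively is an assumption of this sketch.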
The person name patterns file is not currently used, but is designed to standardize free-form text name fields.
The special characters reference file lists characters that might appear in person data but that should be ignored. The Sun Match Engine removes these characters from a field before comparing or normalizing the data. You can define additional characters to remove from person data by adding them to the list.
An excerpt from the personRemoveSpecChars.dat file appears below.
[ ] { } < > / ? * ^ # ! |
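The removal step described above can be sketched in Python as follows. This is an illustration only, assuming the file lists single characters separated by whitespace as in the personRemoveSpecChars.dat excerpt; build_removal_table and remove_special_chars are hypothetical names, not Sun Match Engine APIs.

```python
from typing import Dict, Optional


def build_removal_table(spec_chars_text: str) -> Dict[int, Optional[int]]:
    """Build a str.translate table that deletes every listed character.

    Hypothetical helper: assumes whitespace-separated single characters,
    as in the personRemoveSpecChars.dat excerpt.
    """
    chars = "".join(spec_chars_text.split())
    # The third argument of str.maketrans lists characters mapped to None (deleted).
    return str.maketrans("", "", chars)


def remove_special_chars(value: str, table: Dict[int, Optional[int]]) -> str:
    """Delete the listed characters from a field before comparison."""
    return value.translate(table)


if __name__ == "__main__":
    table = build_removal_table("[ ] { } < > / ? * ^ # ! |")
    print(remove_special_chars("<Anne-Marie>", table))  # Anne-Marie
    print(remove_special_chars("J#ohn!", table))  # John
```

Building the translation table once and reusing it keeps per-field cleanup cheap. Because the hyphen is not in the list, hyphenated first names such as Anne-Marie survive this step intact.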