2.1 Identifier preparation

The following identifiers are prepared for use in the individual matching process.

Table 2-1 Identifier preparation

Identifier | Standard prepared attribute name | Description and summary of preparation logic
Given Names | dnGivenNames | A space-separated list of the first and middle names of the individual, after normalization (see the name normalization section, below).
Family Name | dnFamilyName | A normalized version of the family name (see the name normalization section, below).
Full Name | dnFullName | A concatenation of the given names and family name, separated by spaces.
Original Script Name | dnOriginalScriptName | A whitespace-normalized version of the original script name.
City | dnCity | A pipe-separated list of the cities associated with the individual.
Country Code | dnAllCountryCodes | A space-separated list of standard two-character country codes: a de-duplicated, sorted superset of all country codes provided in dnAddressCountryCode, dnResidencyCountryCode, dnNationalityCountryCodes and dnCountryOfBirthCode.
Date of Birth | dnDOB | A date attribute containing the date of birth of the individual.
Year of Birth | dnYOB | A string attribute containing a space-separated list of possible years of birth, in four-digit format.

The following sections describe the data preparation strategy for each of these identifiers.

Name Normalization

The name identifiers map to the prepared attributes dnGivenNames, dnFamilyName and dnFullName. In all these fields, the following transformations are applied before matching:
  • Standardization of accented characters.
  • Replacement of non-alphabetic characters (anything other than A-Z or a-z) with spaces.

    Note:

    If matching data in its original script against original script names in watch lists, remove the appropriate character ranges from the Name Noise Characters Reference Data so that those characters are not replaced with spaces. If data is transliterated before matching, the transliteration must be done before name normalization.
  • Normalization of whitespace.
  • Conversion to upper case.
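The four transformations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: `normalize_name` and `standardize_accents` are hypothetical names, accent standardization is approximated with Unicode NFKD decomposition (the product may use its own reference data), and the set of characters kept is hard-coded to A-Z and a-z rather than read from the Name Noise Characters Reference Data. Characters with no decomposition, such as ß or ø, are simply replaced by spaces in this sketch.

```python
import re
import unicodedata


def standardize_accents(text: str) -> str:
    # Assumption: approximate accent standardization by decomposing each
    # character (NFKD) and dropping the combining marks, so "à" becomes "a".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(raw: str) -> str:
    """Hypothetical helper applying the transformations in the order listed."""
    text = standardize_accents(raw)
    # Assumption: only A-Z / a-z are kept; every other character becomes a space.
    text = re.sub(r"[^A-Za-z]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    text = " ".join(text.split())
    return text.upper()


if __name__ == "__main__":
    for name in ["Raschellà", "AL-BUSAIDI", "A. Arnaldo G.", "DA COSTA**"]:
        print(repr(name), "->", repr(normalize_name(name)))
```

Because punctuation becomes a space and whitespace is then collapsed, "AL-BUSAIDI" and "AL BUSAIDI" end up identical, which is the consistency these steps aim for.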

The purpose of these transformations is to make names compare consistently, not to create the most ‘correct’ form of a name. For example, hyphens are used in names in several ways, such as in a double-barreled surname, or in place of a space when a surname has a qualifier (common in the World-Check data file).

In the former case, one might ideally want to preserve the hyphen, and in the latter case replace it with a space. In general, however, additional spaces in a name will not cause it to miss a match, whereas different characters could, so replacing the hyphen with a space is the safer choice in both cases.

Examples

Table 2-2 Input data and Identifiers

Input forename | Input surname | dnGivenNames | dnFamilyName | dnFullName
Carmelo | Raschellà | CARMELO | RASCHELLA | CARMELO RASCHELLA
Darwen | MANN`A | DARWEN | MANN A | DARWEN MANN A
Badr bin Saud bin Harib | AL-BUSAIDI | BADR BIN SAUD BIN HARIB | AL BUSAIDI | BADR BIN SAUD BIN HARIB AL BUSAIDI
A. Arnaldo G. | TAVEIRA | A ARNALDO G | TAVEIRA | A ARNALDO G TAVEIRA
Jose Mardônio | DA COSTA** | JOSE MARDONIO | DA COSTA | JOSE MARDONIO DA COSTA
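The rows of Table 2-2 can be reproduced with a short sketch that normalizes the forename and surname separately and joins the results for the full name. The helper approximates the normalization steps described above (NFKD accent stripping, non-A-Z characters replaced by spaces, whitespace collapsed, upper case); both `normalize_name` and `prepare_name_identifiers` are hypothetical names, and skipping an empty name part when building dnFullName is an assumption.

```python
import re
import unicodedata
from typing import Dict


def normalize_name(raw: str) -> str:
    # Approximation of the name normalization steps (see lead-in).
    text = "".join(
        ch for ch in unicodedata.normalize("NFKD", raw) if not unicodedata.combining(ch)
    )
    text = re.sub(r"[^A-Za-z]", " ", text)
    return " ".join(text.split()).upper()


def prepare_name_identifiers(forename: str, surname: str) -> Dict[str, str]:
    given = normalize_name(forename)
    family = normalize_name(surname)
    # dnFullName: given names and family name separated by a space.
    # Assumption: an empty part is skipped to avoid a stray space.
    full = " ".join(part for part in (given, family) if part)
    return {"dnGivenNames": given, "dnFamilyName": family, "dnFullName": full}


# Input data from Table 2-2 as (forename, surname) pairs.
EXAMPLES = [
    ("Carmelo", "Raschellà"),
    ("Darwen", "MANN`A"),
    ("Badr bin Saud bin Harib", "AL-BUSAIDI"),
    ("A. Arnaldo G.", "TAVEIRA"),
    ("Jose Mardônio", "DA COSTA**"),
]

if __name__ == "__main__":
    for forename, surname in EXAMPLES:
        print(prepare_name_identifiers(forename, surname))
```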

City and country identifiers

City and country values are derived from the source data wherever possible. There may be multiple possible cities or countries associated with an individual, perhaps because an individual resides in more than one country, has dual nationality, or resides in a different country from his/her nationality.

Country values are prepared as a space-separated list of two-character country codes in the dnAllCountryCodes attribute.

City values (which may contain spaces, for example, ‘New York’) are prepared as a pipe-separated list of cities in the dnCity attribute.
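A minimal sketch of the country and city preparation follows. It assumes each country-code source (corresponding to dnAddressCountryCode, dnResidencyCountryCode, dnNationalityCountryCodes and dnCountryOfBirthCode) is a possibly empty string of space-separated two-character codes, and that cities arrive as a list. The function names are hypothetical, and upper-casing the codes and skipping blank city values are assumptions rather than documented behavior.

```python
from typing import Iterable, Optional


def prepare_all_country_codes(
    address: Optional[str],
    residency: Optional[str],
    nationalities: Optional[str],
    birth: Optional[str],
) -> str:
    codes = set()  # a set removes duplicates across all sources
    for value in (address, residency, nationalities, birth):
        if value:
            # Assumption: multi-valued inputs (e.g. nationalities) are space-separated.
            codes.update(code.upper() for code in value.split())
    # Sorted, space-separated superset of every code seen.
    return " ".join(sorted(codes))


def prepare_city(cities: Iterable[Optional[str]]) -> str:
    # City names can contain spaces ("New York"), so a pipe is the separator.
    # Assumption: missing or blank values are skipped.
    return "|".join(city.strip() for city in cities if city and city.strip())


if __name__ == "__main__":
    print(prepare_all_country_codes("GB", "FR", "GB US", None))
    print(prepare_city(["New York", "London", None]))
```

Using a space for country codes is safe because two-character codes never contain spaces; the pipe is needed only for free-text city names.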

Date of birth and Year of birth identifiers

A formal Date attribute holds the date of birth, where known. The year of birth is stored as a string and is either derived from the date of birth or may be derived from other data. The year of birth may include several possible years. This is most likely to occur when a reference source lists the age of individuals as of a given date, which may lead to two possible years of birth.

For example, if an individual is listed as 27 years old on 1 May 2007, the year of birth could be either 1980 (if born on or before 1 May) or 1979 (if born after 1 May). In this case, both possible years are derived and added to the list of possible years of birth. The year of birth comparison in matching looks for a common year of birth between the two records being compared.
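The year-of-birth logic can be sketched as below. `years_from_age`, `prepare_yob` and `yob_match` are hypothetical names; the sketch assumes the age is an integer number of completed years on the reference date and that a known date of birth takes precedence over an age. It also handles one edge case the text does not mention: an age stated on 31 December implies a single year.

```python
from datetime import date
from typing import List, Optional


def years_from_age(age: int, as_of: date) -> List[int]:
    # Born in as_of.year - age if the birthday fell on or before as_of,
    # otherwise in the previous year.
    latest = as_of.year - age
    years = [latest]
    # On 31 December every birthday in the year has passed, so only one year fits.
    if (as_of.month, as_of.day) != (12, 31):
        years.insert(0, latest - 1)
    return years


def prepare_yob(
    dob: Optional[date] = None, age: Optional[int] = None, as_of: Optional[date] = None
) -> str:
    years = set()
    if dob is not None:
        years.add(dob.year)  # derived directly from the date of birth
    elif age is not None and as_of is not None:
        years.update(years_from_age(age, as_of))  # derived from other data
    # Space-separated list of four-digit years, like dnYOB.
    return " ".join(f"{year:04d}" for year in sorted(years))


def yob_match(yob_a: str, yob_b: str) -> bool:
    # Two records agree if their lists share at least one year.
    return bool(set(yob_a.split()) & set(yob_b.split()))


if __name__ == "__main__":
    listed = prepare_yob(age=27, as_of=date(2007, 5, 1))
    print(listed)
    print(yob_match(listed, prepare_yob(dob=date(1980, 3, 14))))
```

With the example from the text, the listed individual gets "1979 1980", which shares 1980 with a record whose date of birth is 14 March 1980, so the years of birth are treated as matching.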