2.1 Identifier preparation
Table 2-1 Identifier preparation
Identifier Description | Standard prepared attribute name | Summary of preparation logic |
---|---|---|
Given Names | dnGivenNames | A space-separated list of the first and middle names of the individual, after normalization (see the name normalization section, below). |
Family Name | dnFamilyName | A normalized version of the family name (see the name normalization section, below). |
Full Name | dnFullName | A concatenation of the given names and family name separated using spaces. |
Original Script Name | dnOriginalScriptName | A whitespace-normalized version of the original script name. |
City | dnCity | A pipe-separated list of cities associated with the individual data. |
Country Code | dnAllCountryCodes | A space-separated list of standard 2-character country codes: a de-duplicated and sorted superset of all country codes provided in dnAddressCountryCode, dnResidencyCountryCode, dnNationalityCountryCodes and dnCountryOfBirthCode. |
Date of Birth | dnDOB | A date attribute containing the date of birth of the individual. |
Year of Birth | dnYOB | A string attribute containing a space-separated list of possible years of birth, in a four-digit format. |
The following sections describe the data preparation strategy for each of these identifiers.
Name Normalization
Name normalization is applied to the dnGivenNames, dnFamilyName and dnFullName attributes. In all these fields, the following transformations are applied before matching:
- Standardization of accented characters.
- Replacement of non-alpha (A-Z or a-z) characters with spaces.
  Note: If matching data in the original language against original script names in watch lists, the appropriate character ranges should be removed from the Name Noise Characters Reference Data so that they are not replaced. If transliterating data before matching, transliteration must be done before the name normalization.
- Normalization of whitespace.
- Conversion to upper case.
The purpose of these transformations is not to create the most ‘correct’ name. For example, hyphens may be used in names in a number of ways, such as in a double-barreled surname, or as an alternative for a space when a surname has a qualifier (common in the World-Check data file).
In the former case, one might ideally want to preserve the hyphen, and in the latter case replace it with a space. In general, however, additional spaces in names will not cause names to miss matching, whereas different characters could.
Table 2-2 Input data and Identifiers
Input data: Forename | Input data: Surname | Identifiers: dnGivenNames | Identifiers: dnFamilyName | Identifiers: dnFullName |
---|---|---|---|---|
Carmelo | Raschellà | CARMELO | RASCHELLA | CARMELO RASCHELLA |
Darwen | MANN`A | DARWEN | MANN A | DARWEN MANN A |
Badrbin Saud bin Harib | AL-BUSAIDI | BADRBIN SAUD BIN HARIB | AL BUSAIDI | BADRBIN SAUD BIN HARIB AL BUSAIDI |
A.Arnaldo G. | TAVEIRA | A ARNALDO G | TAVEIRA | A ARNALDO G TAVEIRA |
Jose Mardônio | DA COSTA** | JOSE MARDONIO | DA COSTA | JOSE MARDONIO DA COSTA |
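The transformations listed under Name Normalization can be sketched in Python as follows. This is an illustration only, not the product's implementation: the function names `normalize_name` and `build_full_name` are hypothetical, and the fixed A-Z/a-z pattern stands in for the configurable Name Noise Characters Reference Data.

```python
import re
import unicodedata


def normalize_name(value: str) -> str:
    """Apply the four normalization steps to a single name field."""
    # 1. Standardize accented characters: decompose each character and
    #    drop the combining marks (for example, "à" becomes "a").
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 2. Replace every character outside A-Z and a-z with a space.
    #    Assumption: a fixed pattern is used here instead of reference data.
    alpha_only = re.sub(r"[^A-Za-z]", " ", unaccented)
    # 3. Normalize whitespace: trim the ends and collapse runs of spaces.
    collapsed = " ".join(alpha_only.split())
    # 4. Convert to upper case.
    return collapsed.upper()


def build_full_name(given_names: str, family_name: str) -> str:
    """Concatenate normalized given names and family name with a space."""
    parts = [normalize_name(given_names), normalize_name(family_name)]
    return " ".join(part for part in parts if part)


print(normalize_name("Raschellà"))               # RASCHELLA
print(build_full_name("Carmelo", "Raschellà"))   # CARMELO RASCHELLA
print(normalize_name("AL-BUSAIDI"))              # AL BUSAIDI
```

Characters that have no Unicode decomposition, such as "ø", are treated as noise and replaced with a space by this sketch; a production process would need an explicit mapping for them.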
City and country identifiers
City and country values are derived from the source data wherever possible. There may be multiple possible cities or countries associated with an individual, perhaps because an individual resides in more than one country, has dual nationality, or resides in a different country from his/her nationality.
Country values are prepared as a space-separated list of two-character country codes in the dnAllCountryCodes attribute.
City values (which may contain spaces, for example, ‘New York’) are prepared as a pipe-separated list of cities in the dnCity attribute.
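As a rough illustration of the two list formats, the sketch below builds a space-separated country code list and a pipe-separated city list. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: each source country attribute may itself hold several space-separated codes, and the combined country list is de-duplicated and sorted.

```python
from typing import Iterable, Optional


def prepare_all_country_codes(*code_fields: Optional[str]) -> str:
    """Merge several country code attributes into one space-separated list."""
    codes = set()
    for field in code_fields:
        if field:
            # Assumption: a source attribute may hold several codes
            # separated by whitespace.
            codes.update(field.split())
    # Assumption: the combined list is de-duplicated and sorted.
    return " ".join(sorted(codes))


def prepare_cities(cities: Iterable[str]) -> str:
    """Join city names with a pipe, because a city name may contain spaces."""
    unique = []
    for city in cities:
        cleaned = " ".join(city.split())  # tidy stray whitespace
        if cleaned and cleaned not in unique:
            unique.append(cleaned)
    return "|".join(unique)


print(prepare_all_country_codes("GB", "GB FR", None, "US"))  # FR GB US
print(prepare_cities(["New York", "London"]))                 # New York|London
```

The pipe separator is what allows ‘New York’ to survive as a single value; splitting a space-separated list would break it into two cities.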
Date of birth and Year of birth identifiers
A formal Date attribute holds the date of birth, where known. The year of birth is stored as a string and is either derived from the date of birth or may be derived from other data. The year of birth may include several possible years. This is most likely to occur when a reference source lists the age of individuals as of a given date, which may lead to two possible years of birth.
For example, if an individual is listed as 27 years old on 01/05/2007, the year of birth could either be 1980 (if born before 1st May) or 1979 (if born after 1st May). In this case, both possible years are derived and added to a list of possible years of birth. The year of birth comparison in matching looks for a common year of birth between the two records being compared.
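The age-based derivation and the common-year comparison described above can be sketched as follows. The function names are hypothetical and the logic is a simplified illustration under the assumption that the age is given in whole years as of a known date; it is not the product's matching code.

```python
from datetime import date


def possible_birth_years(age: int, as_of: date) -> list[int]:
    """Return the years in which someone of the given age could have been born."""
    latest = as_of.year - age  # birthday already reached in the as_of year
    if as_of.month == 12 and as_of.day == 31:
        # On 31 December every birthday in the year has been reached,
        # so only one year is possible.
        return [latest]
    # Otherwise the birthday may still be to come, giving the earlier year too.
    return [latest - 1, latest]


def format_years(years: list[int]) -> str:
    """Format years as a space-separated list of four-digit values."""
    return " ".join(f"{year:04d}" for year in years)


def has_common_year(years_a: str, years_b: str) -> bool:
    """True when two space-separated year lists share at least one year."""
    return bool(set(years_a.split()) & set(years_b.split()))


years = format_years(possible_birth_years(27, date(2007, 5, 1)))
print(years)                           # 1979 1980
print(has_common_year(years, "1980"))  # True
print(has_common_year(years, "1981"))  # False
```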