Identifier Preparation

The following identifiers are prepared for use in the Individual and Entity matching process.

Note:

For Identifier preparation, Vessel and Aircraft are considered Entities.

Table 5-1 Individual and Entity Identifier Preparation

Identifier Description | Standard Prepared Attribute Name | Summary of Preparation Logic
Individual Family Name | dnFamilyName | A normalized version of the family name (see the Name Normalization section).
Individual Full Name | dnFullName | A concatenation of the given names and family name, separated using spaces.
Original Script Name | dnOriginalScriptName | A whitespace-normalized version of the original script name.
dnCity | dnCity | A pipe-separated list of cities associated with the individual data.
dnAddressCountryCode | dnAddressCountryCode | A space-separated list of standard 2-character country codes.
dnEntityName | dnEntityName | The original entity name, after Name Normalization.
Individual Given Names | dnGivenNames | A space-separated list of the first and middle names of the individual, after normalization (see the Name Normalization section).

The following sections describe the data preparation strategy for each of these identifiers.

Name Normalization

The individual, entity, vessel, and aircraft names are normalized using the following logic:
  • Standardization of accented characters.
  • Replacement of non-alpha (A-Z or a-z) characters with spaces.

    Note:

    • If data is matched in the original language against original script names in the watch lists, then the appropriate character ranges must be removed from the Name Noise Characters Reference Data so that they are not replaced.
    • If transliteration of data is done before matching, then transliteration must also be done before name normalization.
  • Normalization of whitespace.
  • Conversion to upper case.
Note that the purpose of these transformations is not to create the most ‘correct’ name. For example, hyphens may be used in names in a number of ways, such as in a double-barreled surname, or as an alternative for a space when a surname has a qualifier (common in the World-Check data file).
In the former case, one might ideally want to preserve the hyphen, and in the latter case replace it with a space. In general, however, additional spaces in names will not cause names to mismatch, whereas different characters could, so hyphens are replaced with spaces in both cases. The following table provides some examples.

Table 5-2 Name Normalization

Input Forename | Input Surname | dnGivenNames | dnFamilyName | dnFullName
Darwen | MANN`A | DARWEN | MANN A | DARWEN MANN A
Badr bin Saud bin Harib | AL-BUSAIDI | BADR BIN SAUD BIN HARIB | AL BUSAIDI | BADR BIN SAUD BIN HARIB AL BUSAIDI
A. Arnaldo G. | TAVEIRA | A ARNALDO G | TAVEIRA | A ARNALDO G TAVEIRA
Jose Mardônio | DA COSTA** | JOSE MARDONIO | DA COSTA | JOSE MARDONIO DA COSTA
Carmelo | Raschellà | CARMELO | RASCHELLA | CARMELO RASCHELLA
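
The four normalization steps and the rows of Table 5-2 can be approximated with the short Python sketch below. It is an illustration only, not the product's implementation: the function names normalize_name and prepare_individual are hypothetical, accent standardization is assumed to mean Unicode decomposition followed by removal of combining marks, and the Name Noise Characters Reference Data is approximated by a fixed "anything other than A-Z or a-z" rule.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Hypothetical helper that mirrors the four normalization steps."""
    # 1. Standardize accented characters (assumption: decompose with NFKD,
    #    then drop the combining marks, so "ô" becomes "o").
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 2. Replace every non-alpha (A-Z or a-z) character with a space.
    alpha_only = re.sub(r"[^A-Za-z]", " ", unaccented)
    # 3. Normalize whitespace: collapse runs of spaces and trim both ends.
    collapsed = " ".join(alpha_only.split())
    # 4. Convert to upper case.
    return collapsed.upper()


def prepare_individual(forename: str, surname: str) -> dict:
    """Build the three name identifiers for one individual (illustrative)."""
    given = normalize_name(forename)
    family = normalize_name(surname)
    # dnFullName: given names and family name, separated using spaces.
    full = " ".join(part for part in (given, family) if part)
    return {"dnGivenNames": given, "dnFamilyName": family, "dnFullName": full}


if __name__ == "__main__":
    print(prepare_individual("Jose Mardônio", "DA COSTA**"))
    print(prepare_individual("Darwen", "MANN`A"))
```

Under these assumptions the sketch reproduces the rows of Table 5-2. Characters that have no Unicode decomposition (for example "ø") are simply replaced with spaces here, and matching against original script names would require widening the character class in step 2, in line with the note about the Name Noise Characters Reference Data.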

City and Country Identifiers

City and country values are derived from the source data wherever possible. There may be multiple possible cities or countries associated with an individual, for example because the individual resides in more than one country, has dual nationality, or resides in a country other than their country of nationality.

Country values are prepared as a space-separated list of two-character country codes in the dnAllCountryCodes attribute.

City values (which may contain spaces, for example, ‘New York’) are prepared as a pipe-separated list of cities in the dnCity attribute.
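
The choice of delimiters can be illustrated with a small Python sketch. Country codes have a fixed width and contain no spaces, so a space works as a separator. City names may contain spaces, so a pipe is used instead. The function names are hypothetical. Trimming, upper-casing, de-duplication, and the skipping of values that are not two characters long are assumptions made for the example, not behavior stated in this guide.

```python
from typing import Iterable


def prepare_country_codes(codes: Iterable[str]) -> str:
    """Return a space-separated list of two-character country codes."""
    prepared = []
    for code in codes:
        cleaned = code.strip().upper()  # assumption: trim and upper-case
        # Assumption: skip values that are not two characters long,
        # and keep only the first occurrence of each code.
        if len(cleaned) == 2 and cleaned not in prepared:
            prepared.append(cleaned)
    return " ".join(prepared)


def prepare_cities(cities: Iterable[str]) -> str:
    """Return a pipe-separated list of cities; spaces inside a name survive."""
    prepared = []
    for city in cities:
        cleaned = " ".join(city.split())  # normalize internal whitespace
        if cleaned and cleaned not in prepared:
            prepared.append(cleaned)
    return "|".join(prepared)


if __name__ == "__main__":
    print(prepare_country_codes(["gb", "US", "GB"]))      # GB US
    print(prepare_cities(["New York", "London", " "]))    # New York|London
```

Splitting the prepared city value on the pipe character recovers "New York" intact, which a space-separated list could not do.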