Identifier Preparation

The following identifiers are prepared for use in the individual and Entity matching process.

Note:

For Identifier preparation, Vessel and Aircraft are considered Entities.

Table 5-1 Individual and Entity Identifier Preparation

Identifier Description	Standard Prepared Attribute Name	Summary of Preparation Logic
Individual Family Name	dnFamilyName	A normalized version of the family name (see the Name Normalization section).
Individual Full Name	dnFullName	A concatenation of the given names and family name, separated using spaces.
Original Script Name	dnOriginalScriptName	A whitespace normalized version of the original script name.
dnCity	dnCity	A pipe-separated list of cities associated with the individual data.
dnAddressCountryCode	dnAddressCountryCode	A space-separated list of standard 2-character country codes.
dnEntityName	dnEntityName	The original entity name, after Name Normalization.
Individual Given Names	dnGivenNames	A space-separated list of the first and middle names of the individual, after normalization (see the Name Normalization section).

The following sections describe the data preparation strategy for each of these identifiers.

Name Normalization

The individual, entity, vessel, and aircraft names are normalized using the following logic:

Standardization of accented characters.
Replacement of non-alphabetic characters (any character other than A-Z or a-z) with spaces.
Note:
- If data is matched in the original language against original script names in the watch lists, then the appropriate character ranges must be removed from the Name Noise Characters Reference Data so that they are not replaced.
- If transliteration of data is done before matching, then transliteration must also be done before name normalization.
Normalization of whitespace.
Conversion to upper case.

Note that the purpose of these transformations is not to create the most ‘correct’ name. For example, hyphens may be used in names in a number of ways, such as in a double-barreled surname, or as an alternative for a space when a surname has a qualifier (common in the World-Check data file).

In the former case, one might ideally want to preserve the hyphen, and in the latter case replace it with a space. In general, however, additional spaces in names will not cause names to mismatch, whereas different characters could. The following table provides some examples.

Table 5-2 Name Normalization

Input Data		Identifiers
Forename	Surname	dnGivenNames	dnFamilyName	dnFullName
Darwen	MANN`A	DARWEN	MANN A	DARWEN MANN A
Badr bin Saud bin Harib	AL-BUSAIDI	BADR BIN SAUD BIN HARIB	AL BUSAIDI	BADR BIN SAUD BIN HARIB AL BUSAIDI
A. Arnaldo G.	TAVEIRA	A ARNALDO G	TAVEIRA	A ARNALDO G TAVEIRA
Jose Mardônio	DA COSTA**	JOSE MARDONIO	DA COSTA	JOSE MARDONIO DA COSTA
Carmelo	Raschellà	CARMELO	RASCHELLA	CARMELO RASCHELLA
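
The four normalization steps and the rows of Table 5-2 can be illustrated with a minimal Python sketch. This is not the product's implementation. The function names normalize_name and prepare_name_identifiers are hypothetical, and the accent handling (Unicode decomposition followed by removal of combining marks) is an assumption that stands in for whatever character mapping the product actually applies. It does not cover letters with no decomposed form, such as 'ø' or 'ß'.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Hypothetical helper: apply the four normalization steps in order."""
    # Step 1: standardize accented characters (assumed approach: decompose,
    # then drop the combining marks so that 'ô' becomes 'o').
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 2: replace every character other than A-Z or a-z with a space.
    alpha_only = re.sub(r"[^A-Za-z]", " ", unaccented)
    # Step 3: normalize whitespace (collapse runs, trim both ends).
    collapsed = " ".join(alpha_only.split())
    # Step 4: convert to upper case.
    return collapsed.upper()


def prepare_name_identifiers(forename: str, surname: str) -> dict:
    """Hypothetical helper: build the three name identifiers for an individual."""
    given = normalize_name(forename)
    family = normalize_name(surname)
    # dnFullName is the given names and family name separated by a space;
    # empty parts are skipped so that no stray space is produced.
    full = " ".join(part for part in (given, family) if part)
    return {"dnGivenNames": given, "dnFamilyName": family, "dnFullName": full}


if __name__ == "__main__":
    print(prepare_name_identifiers("A. Arnaldo G.", "TAVEIRA"))
```

Because punctuation is replaced with spaces before whitespace is collapsed, a surname written with a hyphen and the same surname written with a space produce the same dnFamilyName value.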

City and Country Identifiers

City and country values are derived from the source data wherever possible. There may be multiple possible cities or countries associated with an individual, perhaps because an individual resides in more than one country, has dual nationality, or resides in a different country from their nationality.

Country values are prepared as a space-separated list of two-character country codes in the dnAllCountryCodes attribute.

City values (which may contain spaces, for example, ‘New York’) are prepared as a pipe-separated list of cities in the dnCity attribute.
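
The two list formats can be sketched as follows. This is only an illustration: the function names are hypothetical, and the trimming, upper-casing, validation, and de-duplication steps are assumptions that the text above does not specify.

```python
def prepare_country_codes(codes):
    """Hypothetical helper: join two-character country codes with spaces."""
    prepared = []
    for code in codes:
        cleaned = code.strip().upper()  # assumption: trim and upper-case
        # Assumption: keep only two-letter ASCII codes and skip duplicates.
        is_valid = len(cleaned) == 2 and cleaned.isascii() and cleaned.isalpha()
        if is_valid and cleaned not in prepared:
            prepared.append(cleaned)
    return " ".join(prepared)


def prepare_cities(cities):
    """Hypothetical helper: join city names with pipes."""
    prepared = []
    for city in cities:
        cleaned = " ".join(city.split())  # assumption: normalize whitespace
        if cleaned and cleaned not in prepared:
            prepared.append(cleaned)
    # A pipe is used because a city name may itself contain spaces.
    return "|".join(prepared)


def split_cities(value):
    """Hypothetical helper: recover the individual cities from the list."""
    return [city for city in value.split("|") if city]


if __name__ == "__main__":
    print(prepare_country_codes(["gb", " US ", "GB"]))
    print(prepare_cities(["New York", "London"]))
```

Splitting the city list on spaces would break 'New York' into two values, which is why the pipe delimiter is needed for cities while a space is sufficient for fixed-width country codes.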