Identifier Preparation
The following identifiers are prepared for use in the Individual and Entity matching processes.
Note:
For Identifier preparation, Vessel and Aircraft are considered Entities.
Table 5-1 Individual and Entity Identifier Preparation
Identifier Description | Standard Prepared Attribute Name | Summary of Preparation Logic |
---|---|---|
Individual Family Name | dnFamilyName | A normalized version of the family name (see the Name Normalization section). |
Individual Full Name | dnFullName | A concatenation of the given names and family name, separated using spaces. |
Original Script Name | dnOriginalScriptName | A whitespace normalized version of the original script name. |
dnCity | dnCity | A pipe-separated list of cities associated with the individual data. |
dnAddressCountryCode | dnAddressCountryCode | A space-separated list of standard 2-character country codes. |
dnEntityName | dnEntityName | The original entity name, after Name Normalization. |
Individual Given Names | dnGivenNames | A space-separated list of the first and middle names of the individual, after normalization (see the Name Normalization section). |
The following sections describe the data preparation strategy for each of these identifiers.
Name Normalization
Name normalization applies the following processing to name values:
- Standardization of accented characters.
- Replacement of non-alphabetic characters (anything other than A-Z or a-z) with spaces.
  Note:
  - If data is matched in the original language against original script names in the watch lists, then the appropriate character ranges must be removed from the Name Noise Characters Reference Data so that they are not replaced.
  - If transliteration of data is done before matching, then transliteration must also be done before name normalization.
- Normalization of whitespace.
- Conversion to upper case.
Table 5-2 Name Normalization
Forename (Input Data) | Surname (Input Data) | dnGivenNames (Identifier) | dnFamilyName (Identifier) | dnFullName (Identifier) |
---|---|---|---|---|
Darwen | MANN`A | DARWEN | MANN A | DARWEN MANN A |
Badr bin Saud bin Harib | AL-BUSAIDI | BADR BIN SAUD BIN HARIB | AL BUSAIDI | BADR BIN SAUD BIN HARIB AL BUSAIDI |
A. Arnaldo G. | TAVEIRA | A ARNALDO G | TAVEIRA | A ARNALDO G TAVEIRA |
Jose Mardônio | DA COSTA** | JOSE MARDONIO | DA COSTA | JOSE MARDONIO DA COSTA |
Carmelo | Raschellà | CARMELO | RASCHELLA | CARMELO RASCHELLA |
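The normalization steps and the rows of Table 5-2 can be approximated with a short Python sketch. It is illustrative only and is not the product's implementation. The function names `normalize_name` and `prepare_name_identifiers` are hypothetical, accent standardization is assumed to mean stripping diacritics through Unicode decomposition, and the noise-character set is hard-coded as "anything other than A-Z or a-z" instead of being read from the Name Noise Characters Reference Data.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Approximate the name normalization steps (hypothetical helper)."""
    # Assumption: "standardization of accented characters" is modelled by
    # decomposing each character and dropping the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every character outside A-Z / a-z with a space.
    alpha_only = re.sub(r"[^A-Za-z]", " ", unaccented)
    # Normalize whitespace: collapse runs and trim both ends.
    collapsed = " ".join(alpha_only.split())
    # Convert to upper case.
    return collapsed.upper()


def prepare_name_identifiers(forename: str, surname: str) -> dict:
    """Build the three name identifiers from raw input (hypothetical helper)."""
    given = normalize_name(forename)
    family = normalize_name(surname)
    # Full name: given names and family name joined with a single space,
    # skipping either part if it is empty after normalization.
    full = " ".join(part for part in (given, family) if part)
    return {"dnGivenNames": given, "dnFamilyName": family, "dnFullName": full}


if __name__ == "__main__":
    print(prepare_name_identifiers("Jose Mardônio", "DA COSTA**"))
```

Because the sketch replaces every non-Latin letter with a space, it would erase original script names entirely. That mirrors the note above: when matching in the original script, the relevant character ranges have to be excluded from the noise characters.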
City and Country Identifiers
City and country values are derived from the source data wherever possible. More than one city or country may be associated with an individual, for example because the individual resides in more than one country, has dual nationality, or resides in a country other than that of their nationality.
Country values are prepared as a space-separated list of two-character country codes in the dnAllCountryCodes attribute.
City values (which may contain spaces, for example, ‘New York’) are prepared as a pipe-separated list of cities in the dnCity attribute.
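The two list formats can be sketched in Python as follows. This is a minimal illustration, not the product's logic. The helper names `prepare_country_codes` and `prepare_cities` are hypothetical, and the trimming, upper-casing of codes, skipping of empty values, and removal of duplicates are assumptions made for the example.

```python
def _unique_non_empty(values):
    """Trim values, drop empties, and keep the first occurrence of each."""
    seen = []
    for value in values:
        cleaned = value.strip()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return seen


def prepare_country_codes(codes):
    """Return a space-separated list of two-character country codes."""
    # Assumption: codes are upper-cased and anything that is not exactly
    # two characters long is ignored.
    upper = [code.strip().upper() for code in codes]
    valid = [code for code in upper if len(code) == 2]
    return " ".join(_unique_non_empty(valid))


def prepare_cities(cities):
    """Return a pipe-separated list of cities."""
    # A pipe is used because city names can themselves contain spaces.
    return "|".join(_unique_non_empty(cities))


if __name__ == "__main__":
    print(prepare_country_codes(["us", "GB", "us"]))
    print(prepare_cities(["New York", " London ", ""]))
```

Splitting the prepared city value on `|` recovers each city intact, which a space delimiter could not do for a value such as ‘New York’. Country codes contain no spaces, so a space delimiter is unambiguous there.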