Identifier Preparation

3.1 Identifier Preparation

The following identifiers are prepared for use in the entity matching process:

Table 3-1 Identifier Preparation

Identifier	Summary of preparation logic
Original Entity Name	The original entity name, after Name Normalization. See section 3.1.1 "Name Normalization" below.
Standardized Entity Name	A version of the entity name in which common entity name suffixes are standardized. To change the standardization, edit the Reference Data used to standardize tokens (such as LTD) and phrases (such as FIN SERVS).
Original Script Name	A whitespace-normalized version of the original script name.
City	A pipe-separated list of cities.
Country Codes	A space-separated list of standard 2-character country codes.
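
As an illustration of how Reference Data might drive the Standardized Entity Name, the following Python sketch applies phrase replacements first and then token replacements to an already normalized name. The maps `TOKEN_MAP` and `PHRASE_MAP`, their entries, the direction of each mapping, and the function name `standardize_name` are all assumptions made for this example; they are not the product's actual Reference Data or API.

```python
import re

# Hypothetical Reference Data: the real token and phrase lists are
# configurable and are not reproduced here.
TOKEN_MAP = {"LIMITED": "LTD", "CORPORATION": "CORP", "INCORPORATED": "INC"}
PHRASE_MAP = {"FINANCIAL SERVICES": "FIN SERVS"}


def standardize_name(normalized_name: str) -> str:
    """Standardize phrases, then single tokens, in a normalized name."""
    result = normalized_name
    for phrase, standard in PHRASE_MAP.items():
        # Match the phrase only when it is delimited by whitespace
        # or by the start or end of the string.
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        result = re.sub(pattern, standard, result)
    # Replace any remaining token that has a standard form.
    return " ".join(TOKEN_MAP.get(token, token) for token in result.split())


print(standardize_name("ACME FINANCIAL SERVICES LIMITED"))  # ACME FIN SERVS LTD
```

This sketch standardizes a mapped token wherever it occurs in the name, not only in the suffix position. Because both maps are plain data, the behavior can be changed without touching the function, which mirrors the idea of amending the Reference Data.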

3.1.1 Name Normalization

Entity names are normalized using the following logic:

Standardization of accented characters.
Removal of apostrophes.
Replacement of all remaining characters other than alphabetic (A-Z or a-z), numeric (0-9), or ampersand (&) characters with spaces.
Normalization of whitespace.
Conversion to upper case.

Note:
If you match data in its original language against original script names in watch lists, remove the appropriate character ranges from the Name Noise Characters Reference Data so that those characters are not replaced with spaces. In addition, if you transliterate data before matching, perform the transliteration before name normalization.
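
The normalization steps listed in this section can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name `normalize_name` is hypothetical, accented characters are standardized here by Unicode decomposition (an assumption, since the product may use its own mapping), and the set of characters treated as noise is hard-coded rather than read from the Name Noise Characters Reference Data.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Apply the five normalization steps in order."""
    # 1. Standardize accented characters: decompose each character and
    #    drop the combining marks (assumed approach for this sketch).
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # 2. Remove apostrophes (straight and typographic) without
    #    leaving a space behind.
    no_apostrophes = unaccented.replace("'", "").replace("\u2019", "")
    # 3. Replace every character other than A-Z, a-z, 0-9, or &
    #    with a space.
    cleaned = re.sub(r"[^A-Za-z0-9&]", " ", no_apostrophes)
    # 4. Normalize whitespace: collapse runs and trim both ends.
    collapsed = " ".join(cleaned.split())
    # 5. Convert to upper case.
    return collapsed.upper()


print(normalize_name("Société Générale d'Investissement, S.A."))
# SOCIETE GENERALE DINVESTISSEMENT S A
print(normalize_name("  O'Neil & Sons  Ltd. "))
# ONEIL & SONS LTD
```

Because step 3 replaces every character outside the listed ranges, this sketch would blank out names written in a non-Latin script. That is the situation the note addresses: such character ranges would need to be excluded from the noise set, and any transliteration would need to run before this function is called.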