3.1 Identifier Preparation
The following identifiers are prepared for use in the entity matching process:
Table 3-1 Identifier Preparation
Identifier | Summary of preparation logic |
---|---|
Original Entity Name | The original entity name, after Name Normalization. See section 3.1.1 "Name Normalization" below. |
Standardized Entity Name | A version of the entity name in which common entity name suffixes are standardized. The standardization can be adjusted by changing the Reference Data used to standardize tokens (such as LTD) and phrases (such as FIN SERVS). |
Original Script Name | A whitespace-normalized version of the original script name. |
City | A pipe-separated list of cities. |
Country Codes | A space-separated list of standard 2-character country codes. |
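The preparation of the Standardized Entity Name, Original Script Name, City and Country Codes identifiers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: `PHRASE_MAP` and `TOKEN_MAP` are hypothetical stand-ins for the standardization Reference Data, and their entries (for example, LIMITED to LTD) are invented purely to show the mechanism. All function names are likewise hypothetical. The sketch assumes the input name has already been through Name Normalization, and it applies phrase replacements before token replacements so that multi-word entries are matched as a whole.

```python
import re

# Hypothetical stand-ins for the standardization Reference Data.
# The entries are illustrative assumptions, not the product's shipped data.
PHRASE_MAP = {"FINANCIAL SERVICES": "FIN SERVS"}
TOKEN_MAP = {"LIMITED": "LTD", "INCORPORATED": "INC"}


def standardize_entity_name(normalized_name: str) -> str:
    """Standardize phrases, then single tokens, in an already-normalized name."""
    name = normalized_name
    # Longest phrases first, so a longer entry wins over any shorter one it contains.
    for phrase in sorted(PHRASE_MAP, key=len, reverse=True):
        replacement = PHRASE_MAP[phrase]
        # Match the phrase only as whole words (bounded by whitespace or string ends).
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        name = re.sub(pattern, lambda _m, r=replacement: r, name)
    # Then map each remaining whitespace-separated token through the token table.
    return " ".join(TOKEN_MAP.get(tok, tok) for tok in name.split())


def normalize_script_name(script_name: str) -> str:
    """Whitespace-normalize an original script name (collapse runs, trim ends)."""
    return " ".join(script_name.split())


def split_cities(city_field: str) -> list[str]:
    """Split the pipe-separated City identifier into individual city names."""
    return [city.strip() for city in city_field.split("|") if city.strip()]


def split_country_codes(codes_field: str) -> list[str]:
    """Split the space-separated Country Codes identifier into 2-character codes."""
    return codes_field.split()


print(standardize_entity_name("ACME FINANCIAL SERVICES LIMITED"))  # ACME FIN SERVS LTD
print(split_cities("LONDON|PARIS"))  # ['LONDON', 'PARIS']
print(split_country_codes("GB FR"))  # ['GB', 'FR']
```

Because standardization works on whole tokens and phrases, changing the matching behavior only requires editing the two maps, which mirrors the table's point that the process is amended through Reference Data rather than code.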
3.1.1 Name Normalization
Entity names are normalized using the following logic:
- Standardization of accented characters.
- Removal of apostrophes.
- Replacement of all characters other than alphabetic (A-Z or a-z), numeric (0-9) or ampersand (&) characters with spaces.
- Normalization of whitespace.
- Conversion to upper case.
Note:
If matching data in the original language against original script names in watch lists, the appropriate character ranges should be removed from the Name Noise Characters Reference Data so that they are not replaced with spaces. In addition, if transliterating data before matching, transliteration must be done before name normalization.
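The five normalization steps can be sketched in Python as follows, applied in the order listed. This is a hedged illustration, not the product's logic: accent standardization is approximated here with Unicode NFKD decomposition followed by dropping combining marks, the apostrophe set (ASCII `'` and typographic `’`) is an assumption, and the hard-coded `NOISE_PATTERN` character class is a hypothetical stand-in for the Name Noise Characters Reference Data. In this sketch, letters with no decomposition (such as ß or Ø) are not converted and therefore fall through to step 3 as noise. To keep original-script characters, as the note describes, the character class would be widened; any transliteration would run before `normalize_name` is called.

```python
import re
import unicodedata

# Assumed apostrophe set: ASCII apostrophe and typographic right single quote.
APOSTROPHES = {"'", "\u2019"}
# Hypothetical stand-in for the Name Noise Characters Reference Data: anything
# that is not A-Z, a-z, 0-9 or & is treated as noise and replaced with a space.
NOISE_PATTERN = re.compile(r"[^A-Za-z0-9&]")


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(name: str) -> str:
    """Apply the five Name Normalization steps in the documented order."""
    text = strip_accents(name)                                   # 1. standardize accents
    text = "".join(ch for ch in text if ch not in APOSTROPHES)   # 2. remove apostrophes
    text = NOISE_PATTERN.sub(" ", text)                          # 3. noise -> spaces
    text = " ".join(text.split())                                # 4. normalize whitespace
    return text.upper()                                          # 5. convert to upper case


print(normalize_name("Société Générale"))      # SOCIETE GENERALE
print(normalize_name("O'Brien & Sons, Ltd."))  # OBRIEN & SONS LTD
```

Removing apostrophes before the noise replacement matters: it turns "O'Brien" into "OBRIEN" rather than splitting it into the two tokens "O" and "BRIEN".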