Entity Start End Name Tokens (dnClusterStartEndNameTokens)
This clustering method is a looser version of the Entity Name Tokens cluster. It allows for variation in entity names by creating clusters from the first five and last five characters of each name token.
The default logic is as follows:
- Remove initials.
- Remove common name tokens, such as Limited or Corporation.
- Normalize whitespace.
- For each token that is longer than five characters, replace it with two new tokens:
  - The first five characters of the token
  - The last five characters of the token
The following table provides some examples.
Table 5-14 Start/End Name Tokens Cluster
dnEntityName | Name with Initials and Common Name Tokens Stripped | dnClusterStartEndNameTokens |
---|---|---|
HAVANA INTERNATIONAL BANK LTD | HAVANA INTERNATIONAL BANK | HAVAN\| AVANA\| INTER\| IONAL\| BANK |
CIMEX S A | CIMEX | CIMEX |
LA EMPRESA CUBANA DE FLETES | LA EMPRESA CUBANA FLETES | LA\| EMPRE\| PRESA\| CUBAN\| UBANA\| FLETE\| LETES |
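
The default logic described above can be approximated with the following minimal Python sketch. It is an illustration only, not the product's implementation. The function name `start_end_name_tokens`, the `COMMON_TOKENS` stop list, the treatment of single-character tokens as initials, and the `"| "` separator are all assumptions inferred from the examples in the table.

```python
# Hypothetical stop list for illustration; the actual list of common
# name tokens is defined by the product configuration, not by this sketch.
COMMON_TOKENS = {"LTD", "LIMITED", "CORP", "CORPORATION", "DE"}


def start_end_name_tokens(name, common_tokens=COMMON_TOKENS, width=5):
    """Build a start/end name token cluster key (illustrative sketch)."""
    # split() with no arguments collapses runs of whitespace,
    # which covers the "normalize whitespace" step.
    tokens = name.upper().split()

    # Assumption: an "initial" is any single-character token.
    kept = [t for t in tokens if len(t) > 1 and t not in common_tokens]

    result = []
    for token in kept:
        if len(token) > width:
            # Long token: keep its first and last `width` characters.
            result.append(token[:width])
            result.append(token[-width:])
        else:
            # Short token: keep it unchanged.
            result.append(token)

    # Assumption: tokens are joined with "| ", as shown in the table.
    return "| ".join(result)


if __name__ == "__main__":
    for example in (
        "HAVANA INTERNATIONAL BANK LTD",
        "CIMEX S A",
        "LA EMPRESA CUBANA DE FLETES",
    ):
        print(example, "->", start_end_name_tokens(example))
```

Under these assumptions, the sketch produces the same cluster values as the three table rows. Note that a token of exactly five characters, such as CIMEX, is not split, because the rule applies only to tokens longer than five characters.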