Entity Start End Name Tokens (dnClusterStartEndNameTokens)

This clustering method is a looser version of the Entity Name Tokens cluster. It allows for variation in entity names by creating clusters from the first five and last five characters of each name token; tokens of five characters or fewer are kept whole.

The default logic is as follows:

Remove initials.
Remove common name tokens, such as Limited or Corporation.
Normalize whitespace.
For each token that is longer than five characters, replace it with two new tokens:
- The first five characters of the token
- The last five characters of the token
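
The default logic above can be sketched in Python as follows. This is only an illustrative approximation, not the product's implementation: the function name `start_end_name_tokens`, the `COMMON_TOKENS` stop list, and the rule that a single-character token counts as an initial are all assumptions made for the example.

```python
# Hypothetical stop list for illustration only; the real list of common
# name tokens is maintained in the product's reference data.
COMMON_TOKENS = {"LTD", "LIMITED", "CORPORATION", "CORP", "DE"}


def start_end_name_tokens(name, common_tokens=COMMON_TOKENS, width=5):
    """Return start/end cluster tokens for an entity name (sketch)."""
    # str.split() with no arguments also normalizes runs of whitespace.
    tokens = name.upper().split()
    result = []
    for token in tokens:
        # Assumption: a single-character token is treated as an initial.
        if len(token) == 1:
            continue
        # Drop common name tokens such as LTD or CORPORATION.
        if token in common_tokens:
            continue
        if len(token) > width:
            # Long token: keep its first and last `width` characters.
            result.extend([token[:width], token[-width:]])
        else:
            # Short token: keep it unchanged.
            result.append(token)
    return result


print(start_end_name_tokens("HAVANA INTERNATIONAL BANK LTD"))
# prints ['HAVAN', 'AVANA', 'INTER', 'IONAL', 'BANK']
```

Returning a list leaves the choice of separator to the caller, since the delimiter used in the stored cluster value is not specified by the steps themselves.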

The following table provides some examples.

Table 5-14 Start/End Name Tokens Cluster

dnEntityName	Name with Initials and Common Name Tokens Stripped	dnClusterStartEndNameTokens
HAVANA INTERNATIONAL BANK LTD	HAVANA INTERNATIONAL BANK	HAVAN| AVANA| INTER| IONAL| BANK
CIMEX S A	CIMEX	CIMEX
LA EMPRESA CUBANA DE FLETES	LA EMPRESA CUBANA FLETES	LA| EMPRE| PRESA| CUBAN| UBANA| FLETE| LETES