Entity Start End Name Tokens (dnClusterStartEndNameTokens)

This clustering method is a looser version of the Entity Name Tokens cluster. It tolerates variation in entity names by generating cluster values from the first five and last five characters of each name token.

The default logic is as follows:

  1. Remove initials.
  2. Remove common name tokens, such as Limited or Corporation.
  3. Normalize whitespace.
  4. Replace each token that is longer than five characters with two new tokens (tokens of five characters or fewer are kept unchanged):
    • The first five characters of the token
    • The last five characters of the token

The following table provides some examples.

Table 5-14 Start/End Name Tokens Cluster

dnEntityName                   Name with Initials and Common Name Tokens Stripped  dnClusterStartEndNameTokens
HAVANA INTERNATIONAL BANK LTD  HAVANA INTERNATIONAL BANK                           HAVAN|AVANA|INTER|IONAL|BANK
CIMEX S A                      CIMEX                                               CIMEX
LA EMPRESA CUBANA DE FLETES    LA EMPRESA CUBANA FLETES                            LA|EMPRE|PRESA|CUBAN|UBANA|FLETE|LETES
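
The default logic can be illustrated with a minimal Python sketch. This is not the product's implementation. The function name start_end_name_tokens, the COMMON_TOKENS stop list, and the treatment of any single-character token as an initial are assumptions made for illustration. The stop list is inferred only from the examples in this section, including DE, which the table shows being stripped. The pipe-delimited output follows the table.

```python
# Hypothetical stop list: the product's actual list of common name
# tokens is not given in this section.
COMMON_TOKENS = {"LTD", "LIMITED", "CORPORATION", "DE"}


def start_end_name_tokens(name, common_tokens=COMMON_TOKENS, width=5):
    """Build a pipe-delimited start/end token cluster value for a name."""
    # Splitting on whitespace also normalizes it (step 3).
    tokens = name.upper().split()

    # Steps 1 and 2: drop initials (assumed here to be single-character
    # tokens) and common name tokens.
    kept = [t for t in tokens if len(t) > 1 and t not in common_tokens]

    # Step 4: a token longer than `width` characters becomes two tokens,
    # its first `width` and its last `width` characters. Shorter tokens
    # are kept unchanged.
    result = []
    for token in kept:
        if len(token) > width:
            result.append(token[:width])
            result.append(token[-width:])
        else:
            result.append(token)
    return "|".join(result)


if __name__ == "__main__":
    for example in (
        "HAVANA INTERNATIONAL BANK LTD",
        "CIMEX S A",
        "LA EMPRESA CUBANA DE FLETES",
    ):
        print(example, "->", start_end_name_tokens(example))
```

Under these assumptions the sketch reproduces the three rows of the table. A token of exactly six characters, such as HAVANA, yields two overlapping tokens (HAVAN and AVANA).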