Individual Full Name Trim Pairs Cluster (dnClusterFullNameTrim)

On occasion, two names that are close matches do not generate a common cluster key in the Full Name Metaphone Pairs cluster.

Consider the following example records:

Table 5-7 Full Name Trim Pairs Cluster

dnFullName       Name Tokens and     Distinct        dnClusterFullNameMeta
                 Metaphone Values    Cluster Keys
---------------  ------------------  --------------  ---------------------
XIAO JIAN ZHONG  JIAN | JN           JNS             JNS | JNJNK | SJNK
                 XIAO | S            JNJNK
                 ZHONG | JNK         SJNK

ZHONG XIAOJIAN   XIAOJIAN | SJN      SJNJNK          SJNJNK
                 ZHONG | JNK

These two records are a possible name match. However, the Full Name Metaphone Pairs cluster does not produce a common cluster key for the pair, because the tokens ‘Xiao’ and ‘Xiaojian’ yield different Metaphone keys (S and SJN, each limited to three characters).

To match these cases efficiently, a Full Name Trim Pairs cluster is prepared in a similar way to the primary cluster, but without applying a Metaphone transformation. This allows for typos and spacing differences in the names, but is ‘left-biased’; that is, it requires the first few characters of each name token to match.
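
As an illustration of the left bias, the following Python sketch mimics the effect of trimming name tokens to three characters. The trim_token function is a hypothetical helper written for this example and is not the product's Trim Characters processor; the sample tokens are taken from the example records in this section.

```python
def trim_token(token: str, length: int = 3) -> str:
    """Keep at most `length` leading characters of a name token.

    Hypothetical stand-in for the Trim Characters transformation.
    """
    return token[:length]


# Spacing difference: XIAO and XIAOJIAN reduce to the same trimmed value.
print(trim_token("XIAO"), trim_token("XIAOJIAN"))

# Typo after the third character: the trimmed values still agree.
print(trim_token("SULIMAN"), trim_token("SULEIMAN"))

# Difference within the first three characters: the trimmed values diverge.
print(trim_token("SULEIMAN"), trim_token("SOLEIMAN"))
```

The first two pairs share a trimmed value (XIA and SUL respectively), whereas SULEIMAN and SOLEIMAN do not, because they differ in their second character.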

The logic of the cluster is as follows:

  1. Split the normalized full name into name tokens, using space as a delimiter.
  2. Sort the name tokens alphabetically.
  3. Apply the Trim Characters transformation to each name token, outputting a key with a length of (up to) 3 characters.
  4. Concatenate the trimmed values, generating a final key value for each distinct pair of tokens.
  5. Deduplicate the list of keys.
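
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function name full_name_trim_keys and the parameters key_length and ignore_initials are hypothetical, the input is assumed to be an already normalized (upper-case) full name, initials are dropped because the examples in this section note that they are ignored by default, and a name with a single remaining token is assumed to yield that token's trimmed value as its only key (as the S J NKOMO example suggests).

```python
from itertools import combinations


def full_name_trim_keys(full_name: str, key_length: int = 3,
                        ignore_initials: bool = True) -> list[str]:
    """Return the deduplicated trim-pair keys for a normalized full name."""
    # Step 1: split the normalized name into tokens on spaces.
    tokens = [t for t in full_name.split(" ") if t]

    # Assumption: single-character tokens (initials) are ignored by default.
    if ignore_initials:
        tokens = [t for t in tokens if len(t) > 1]

    # Step 2: sort the tokens alphabetically.
    tokens.sort()

    # Step 3: trim each token to at most key_length characters.
    trimmed = [t[:key_length] for t in tokens]

    # Step 4: concatenate the trimmed values for each pair of tokens.
    # Assumption: a lone token produces its trimmed value as the only key.
    if len(trimmed) == 1:
        keys = list(trimmed)
    else:
        keys = [a + b for a, b in combinations(trimmed, 2)]

    # Step 5: deduplicate while preserving the order of first occurrence.
    return list(dict.fromkeys(keys))


keys_a = full_name_trim_keys("XIAO JIAN ZHONG")
keys_b = full_name_trim_keys("ZHONG XIAOJIAN")
print(" | ".join(keys_a))
print(" | ".join(keys_b))
print(set(keys_a) & set(keys_b))  # keys shared by the two records
```

For the two example records, the shared key is XIAZHO, so the records fall into a common cluster even though their Metaphone-based keys differ.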
The following table provides some examples.

Table 5-8 Trim Characters for Full Name Trim Pairs Cluster

dnFullName                      Name Tokens and   Cluster Keys  dnClusterFullNameTrim
                                Trimmed Values
------------------------------  ----------------  ------------  ----------------------------------
XIAO JIAN ZHONG                 JIAN | JIA        JIAXIA        JIAXIA | JIAZHO | XIAZHO
                                XIAO | XIA        JIAZHO
                                ZHONG | ZHO       XIAZHO

ZHONG XIAOJIAN                  XIAOJIAN | XIA    XIAZHO        XIAZHO
                                ZHONG | ZHO

MOHAMMED SANI ABACHE            ABACHE | ABA      ABAMOH        ABAMOH | ABASAN | MOHSAN
                                MOHAMMED | MOH    ABASAN
                                SANI | SAN        MOHSAN

JOSEPH TSANGA ABANDA            ABANDA | ABA      ABAJOS        ABAJOS | ABATSA | JOSTSA
                                JOSEPH | JOS      ABATSA
                                TSANGA | TSA      JOSTSA

ABD AL WAHAB ABD AL HAFIZ       ABD | ABD         ABDABD        ABDABD | ABDAL | ABDHAF | ABDWAH |
                                ABD | ABD         ABDAL         ALAL | ALHAF | ALWAH | HAFWAH
                                AL | AL           ABDHAF
                                AL | AL           ABDWAH
                                HAFIZ | HAF       ALAL
                                WAHAB | WAH       ALHAF
                                                  ALWAH
                                                  HAFWAH

SULIMAN HAMD SULEIMAN AL BUTHE  AL | AL           ALBUT         ALBUT | ALHAM | ALSUL | BUTHAM |
                                BUTHE | BUT       ALHAM         BUTSUL | HAMSUL | SULSUL
                                HAMD | HAM        ALSUL
                                SULEIMAN | SUL    ALSUL
                                SULIMAN | SUL     BUTHAM
                                                  BUTSUL
                                                  HAMSUL
                                                  SULSUL

AL BUTHE SOLEIMAN HAMAD         AL | AL           ALBUT         ALBUT | ALHAM | ALSOL | BUTHAM |
                                BUTHE | BUT       ALHAM         BUTSOL | HAMSOL
                                HAMAD | HAM       ALSOL
                                SOLEIMAN | SOL    BUTHAM
                                                  BUTSOL
                                                  HAMSOL

REGINALD B GOODRIDGE            B | B             GOOREG        GOOREG
                                GOODRIDGE | GOO
                                REGINALD | REG

REGINALD B SR GOODRICH          B | B             GOOREG        GOOREG | GOOSR | REGSR
                                GOODRICH | GOO    GOOSR
                                REGINALD | REG    REGSR
                                SR | SR

STEPHEN JEQE NKOMO              JEQE | JEQ        JEQNKO        JEQNKO | JEQSTE | NKOSTE
                                NKOMO | NKO       JEQSTE
                                STEPHEN | STE     NKOSTE

S J NKOMO                       S | S             NKO           NKO
                                NKOMO | NKO
                                J | J

STEPHEN JEKE N KOMO             JEKE | JEK        JEKKOM        JEKKOM | JEKSTE | KOMSTE
                                KOMO | KOM        JEKSTE
                                N | N             KOMSTE
                                STEPHEN | STE

NOTE: Initials are ignored by default when generating cluster keys, so single-character tokens such as B, S, J, and N do not contribute to the keys above.