6.2.4 Individual Full Name Trim Pairs Cluster (dnClusterFullNameTrim)

On occasion, two names which are close matches may not generate a common cluster key using the Full Name Metaphone Pairs cluster.

The following table shows the Full Name Metaphone Pairs cluster keys generated for two such names.

Table 6-9 Full Name Trim Pairs Cluster

dnFullName        Name token  Metaphone value  Distinct cluster keys  dnClusterFullNameMeta
----------------  ----------  ---------------  ---------------------  ---------------------
XIAO JIAN ZHONG   JIAN        JN               JNS                    JNS|JNJNK|SJNK
                  XIAO        S                JNJNK
                  ZHONG       JNK              SJNK

ZHONG XIAOJIAN    XIAOJIAN    SJN              SJNJNK                 SJNJNK
                  ZHONG       JNK

These two records are a possible name match. However, the Full Name Metaphone Pairs cluster does not produce a common cluster key for the pair, because the tokens ‘Xiao’ and ‘Xiaojian’ yield different three-character Metaphone keys.

To match these cases efficiently, a Full Name Trim Pairs cluster is prepared in a similar way to the primary cluster, but without applying a Metaphone transformation. This allows for typos and spacing differences in the names, but is ‘left-biased’; that is, it requires the first few characters of the names to match.

The logic of the cluster is as follows:
  1. Split the normalized full name into name tokens, using space as a delimiter.
  2. Sort the name tokens alphabetically.
  3. Apply the Trim Characters transformation to each name token, outputting a key with a length of (up to) 3 characters.
  4. Concatenate the trimmed values in pairs, generating a final key value for each distinct pair of tokens.
  5. Deduplicate the list of keys.
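
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation: the function name `full_name_trim_pairs` and its parameters are hypothetical. Two behaviors are assumptions inferred from the examples in Table 6-10 rather than stated rules: single-character tokens (initials) are dropped before pairing, and a name with only one remaining token yields that token's trimmed value as its key.

```python
from itertools import combinations


def full_name_trim_pairs(full_name, trim_length=3, ignore_initials=True):
    """Return the distinct trim-pair cluster keys for a normalized full name.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Step 1: split the normalized name into tokens on whitespace.
    tokens = full_name.split()
    # Assumption: initials (single-character tokens) are ignored by default.
    if ignore_initials:
        tokens = [t for t in tokens if len(t) > 1]
    # Step 2: sort the tokens alphabetically.
    tokens.sort()
    # Step 3: trim each token to at most trim_length characters.
    trimmed = [t[:trim_length] for t in tokens]
    # Assumption: with fewer than two tokens, the trimmed token is the key.
    if len(trimmed) < 2:
        return trimmed
    # Step 4: concatenate the trimmed values for every pair of tokens.
    keys = [a + b for a, b in combinations(trimmed, 2)]
    # Step 5: deduplicate while preserving the order of first appearance.
    return list(dict.fromkeys(keys))


first = full_name_trim_pairs("XIAO JIAN ZHONG")
second = full_name_trim_pairs("ZHONG XIAOJIAN")
print("|".join(first))             # JIAXIA|JIAZHO|XIAZHO
print("|".join(second))            # XIAZHO
print(set(first) & set(second))    # {'XIAZHO'}
```

The two records that shared no Metaphone pair key now share the key XIAZHO, so they fall into a common cluster and can be compared.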

    The following table describes the Trim Characters for Full Name Trim Pairs Cluster.

    Table 6-10 Trim Characters for Full Name Trim Pairs Cluster

    dnFullName                      Name token  Trimmed value  Cluster keys  dnClusterFullNameTrim
    ------------------------------  ----------  -------------  ------------  ---------------------
    XIAO JIAN ZHONG                 JIAN        JIA            JIAXIA        JIAXIA|JIAZHO|XIAZHO
                                    XIAO        XIA            JIAZHO
                                    ZHONG       ZHO            XIAZHO

    ZHONG XIAOJIAN                  XIAOJIAN    XIA            XIAZHO        XIAZHO
                                    ZHONG       ZHO

    MOHAMMED SANI ABACHE            ABACHE      ABA            ABAMOH        ABAMOH|ABASAN|MOHSAN
                                    MOHAMMED    MOH            ABASAN
                                    SANI        SAN            MOHSAN

    JOSEPH TSANGA ABANDA            ABANDA      ABA            ABAJOS        ABAJOS|ABATSA|JOSTSA
                                    JOSEPH      JOS            ABATSA
                                    TSANGA      TSA            JOSTSA

    ABD AL WAHAB ABD AL HAFIZ       ABD         ABD            ABDABD        ABDABD|ABDAL|ABDHAF|ABDWAH|ALAL|ALHAF|ALWAH|HAFWAH
                                    ABD         ABD            ABDAL
                                    AL          AL             ABDHAF
                                    AL          AL             ABDWAH
                                    HAFIZ       HAF            ALAL
                                    WAHAB       WAH            ALHAF
                                                               ALWAH
                                                               HAFWAH

    SULIMAN HAMD SULEIMAN AL BUTHE  AL          AL             ALBUT         ALBUT|ALHAM|ALSUL|BUTHAM|BUTSUL|HAMSUL|SULSUL
                                    BUTHE       BUT            ALHAM
                                    HAMD        HAM            ALSUL
                                    SULEIMAN    SUL            ALSUL
                                    SULIMAN     SUL            BUTHAM
                                                               BUTSUL
                                                               HAMSUL
                                                               SULSUL

    AL BUTHE SOLEIMAN HAMAD         AL          AL             ALBUT         ALBUT|ALHAM|ALSOL|BUTHAM|BUTSOL|HAMSOL
                                    BUTHE       BUT            ALHAM
                                    HAMAD       HAM            ALSOL
                                    SOLEIMAN    SOL            BUTHAM
                                                               BUTSOL
                                                               HAMSOL

    REGINALD B GOODRIDGE            B           B              GOOREG        GOOREG
                                    GOODRIDGE   GOO
                                    REGINALD    REG
                                    Note: Initials are ignored by default when generating cluster keys.

    REGINALD B SR GOODRICH          B           B              GOOREG        GOOREG|GOOSR|REGSR
                                    GOODRICH    GOO            GOOSR
                                    REGINALD    REG            REGSR
                                    SR          SR

    STEPHEN JEQE NKOMO              JEQE        JEQ            JEQNKO        JEQNKO|JEQSTE|NKOSTE
                                    NKOMO       NKO            JEQSTE
                                    STEPHEN     STE            NKOSTE

    S J NKOMO                       J           J              NKO           NKO
                                    NKOMO       NKO
                                    S           S
                                    Note: Initials are ignored by default when generating cluster keys.

    STEPHEN JEKE N KOMO             JEKE        JEK            JEKKOM        JEKKOM|JEKSTE|KOMSTE
                                    KOMO        KOM            JEKSTE
                                    N           N              KOMSTE
                                    STEPHEN     STE
                                    Note: Initials are ignored by default when generating cluster keys.