6.2.4 Individual Full Name Trim Pairs Cluster (dnClusterFullNameTrim)
On occasion, two names which are close matches may not generate a common cluster key using the Full Name Metaphone Pairs cluster.
Table 6-9 Full Name Trim Pairs Cluster
| dnFullName | Name token | Metaphone value | Distinct Cluster Keys | dnClusterFullNameMeta |
|---|---|---|---|---|
| XIAO JIAN ZHONG | JIAN | JN | JNS, JNJNK, SJNK | JNS\|JNJNK\|SJNK |
| | XIAO | S | | |
| | ZHONG | JNK | | |
| ZHONG XIAOJIAN | XIAOJIAN | SJN | SJNJNK | SJNJNK |
| | ZHONG | JNK | | |
These two records are a possible name match. However, the Full Name Metaphone Pairs cluster does not produce a common cluster key for the pair, because the tokens ‘Xiao’ and ‘Xiaojian’ yield different three-character Metaphone keys (S and SJN, respectively).
In order to match these cases efficiently, a Full Name Trim Pairs cluster is prepared in a similar way to the primary cluster, but without applying a Metaphone transformation. This allows for typos and spacing differences in the names, but is ‘left-biased’; that is, it demands that the first few characters of the names match.
- Split the normalized full name into name tokens, using space as a delimiter.
- Sort the name tokens alphabetically.
- Apply the Trim Characters transformation to each name token, outputting a key with a length of (up to) 3 characters.
- Concatenate the trimmed values, generating a final key value for each distinct pair of tokens.
- Deduplicate the list of keys.
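The steps above can be sketched in Python as follows. This is an illustrative sketch only, not the product's implementation: the function names `full_name_trim_pairs` and `cluster_value` are hypothetical, and three behaviors are assumptions inferred from the worked examples in this section: an initial is treated as a single-character token, a name left with a single token uses that trimmed token as its only key, and keys are joined with `|` in first-seen order.

```python
from itertools import combinations
from typing import List


def full_name_trim_pairs(full_name: str, trim_len: int = 3,
                         ignore_initials: bool = True) -> List[str]:
    """Return the distinct trim-pair cluster keys for a normalized full name."""
    # Split on whitespace, then sort the tokens alphabetically.
    tokens = sorted(full_name.split())
    # Assumption: an "initial" is any single-character token.
    if ignore_initials:
        tokens = [t for t in tokens if len(t) > 1]
    # Trim Characters: keep up to trim_len leading characters per token.
    trimmed = [t[:trim_len] for t in tokens]
    # One key per pair of token positions, so a repeated token can pair with itself.
    keys = [a + b for a, b in combinations(trimmed, 2)]
    # Assumption: with a single remaining token, that token is the key.
    if len(trimmed) == 1:
        keys = list(trimmed)
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(keys))


def cluster_value(full_name: str) -> str:
    """Join the distinct keys into a pipe-delimited cluster value."""
    return "|".join(full_name_trim_pairs(full_name))


print(cluster_value("XIAO JIAN ZHONG"))  # JIAXIA|JIAZHO|XIAZHO
print(cluster_value("ZHONG XIAOJIAN"))   # XIAZHO
```

Both names produce the key XIAZHO, so the pair shares a cluster even though the Metaphone-based keys differ.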
The following table describes the Trim Characters for Full Name Trim Pairs Cluster.
Table 6-10 Trim Characters for Full Name Trim Pairs Cluster
| dnFullName | Name token | Trimmed value | Cluster Keys | dnClusterFullNameTrim |
|---|---|---|---|---|
| XIAO JIAN ZHONG | JIAN | JIA | JIAXIA, JIAZHO, XIAZHO | JIAXIA\|JIAZHO\|XIAZHO |
| | XIAO | XIA | | |
| | ZHONG | ZHO | | |
| ZHONG XIAOJIAN | XIAOJIAN | XIA | XIAZHO | XIAZHO |
| | ZHONG | ZHO | | |
| MOHAMMED SANI ABACHE | ABACHE | ABA | ABAMOH, ABASAN, MOHSAN | ABAMOH\|ABASAN\|MOHSAN |
| | MOHAMMED | MOH | | |
| | SANI | SAN | | |
| JOSEPH TSANGA ABANDA | ABANDA | ABA | ABAJOS, ABATSA, JOSTSA | ABAJOS\|ABATSA\|JOSTSA |
| | JOSEPH | JOS | | |
| | TSANGA | TSA | | |
| ABD AL WAHAB ABD AL HAFIZ | ABD | ABD | ABDABD, ABDAL, ABDHAF, ABDWAH, ALAL, ALHAF, ALWAH, HAFWAH | ABDABD\|ABDAL\|ABDHAF\|ABDWAH\|ALAL\|ALHAF\|ALWAH\|HAFWAH |
| | ABD | ABD | | |
| | AL | AL | | |
| | AL | AL | | |
| | HAFIZ | HAF | | |
| | WAHAB | WAH | | |
| SULIMAN HAMD SULEIMAN AL BUTHE | AL | AL | ALBUT, ALHAM, ALSUL, ALSUL, BUTHAM, BUTSUL, HAMSUL, SULSUL | ALBUT\|ALHAM\|ALSUL\|BUTHAM\|BUTSUL\|HAMSUL\|SULSUL |
| | BUTHE | BUT | | |
| | HAMD | HAM | | |
| | SULEIMAN | SUL | | |
| | SULIMAN | SUL | | |
| AL BUTHE SOLEIMAN HAMAD | AL | AL | ALBUT, ALHAM, ALSOL, BUTHAM, BUTSOL, HAMSOL | ALBUT\|ALHAM\|ALSOL\|BUTHAM\|BUTSOL\|HAMSOL |
| | BUTHE | BUT | | |
| | HAMAD | HAM | | |
| | SOLEIMAN | SOL | | |
| REGINALD B GOODRIDGE | B | B | GOOREG | GOOREG |
| | GOODRIDGE | GOO | | |
| | REGINALD | REG | | |
| REGINALD B SR GOODRICH | B | B | GOOREG, GOOSR, REGSR | GOOREG\|GOOSR\|REGSR |
| | GOODRICH | GOO | | |
| | REGINALD | REG | | |
| | SR | SR | | |
| STEPHEN JEQE NKOMO | JEQE | JEQ | JEQNKO, JEQSTE, NKOSTE | JEQNKO\|JEQSTE\|NKOSTE |
| | NKOMO | NKO | | |
| | STEPHEN | STE | | |
| S J NKOMO | S | S | NKO | NKO |
| | NKOMO | NKO | | |
| | J | J | | |
| STEPHEN JEKE N KOMO | JEKE | JEK | JEKKOM, JEKSTE, KOMSTE | JEKKOM\|JEKSTE\|KOMSTE |
| | KOMO | KOM | | |
| | N | N | | |
| | STEPHEN | STE | | |

NOTE: Initials (for example, B, S, J, and N in the rows above) are ignored by default when generating cluster keys.
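To illustrate how the dnClusterFullNameTrim values bring likely matches together, the following sketch groups records that share at least one cluster key. It is a hedged illustration, not the product's clustering engine: the `candidate_pairs` helper and the `records` dictionary are hypothetical, and the cluster values are copied from Table 6-10.

```python
from collections import defaultdict
from itertools import combinations

# dnClusterFullNameTrim values taken from Table 6-10.
records = {
    "XIAO JIAN ZHONG": "JIAXIA|JIAZHO|XIAZHO",
    "ZHONG XIAOJIAN": "XIAZHO",
    "SULIMAN HAMD SULEIMAN AL BUTHE": "ALBUT|ALHAM|ALSUL|BUTHAM|BUTSUL|HAMSUL|SULSUL",
    "AL BUTHE SOLEIMAN HAMAD": "ALBUT|ALHAM|ALSOL|BUTHAM|BUTSOL|HAMSOL",
    "REGINALD B GOODRIDGE": "GOOREG",
    "REGINALD B SR GOODRICH": "GOOREG|GOOSR|REGSR",
    "STEPHEN JEQE NKOMO": "JEQNKO|JEQSTE|NKOSTE",
    "S J NKOMO": "NKO",
}


def candidate_pairs(cluster_values):
    """Map each pair of names to the set of cluster keys they share."""
    # Inverted index: cluster key -> names that carry it.
    index = defaultdict(set)
    for name, value in cluster_values.items():
        for key in value.split("|"):
            index[key].add(name)
    # Any two names under the same key become a candidate pair.
    pairs = defaultdict(set)
    for key, names in index.items():
        for a, b in combinations(sorted(names), 2):
            pairs[(a, b)].add(key)
    return dict(pairs)


for (a, b), shared in sorted(candidate_pairs(records).items()):
    print(a, "<->", b, "via", sorted(shared))
```

With these values, three candidate pairs appear: the two ZHONG records via XIAZHO, the two AL BUTHE records via ALBUT, ALHAM, and BUTHAM, and the two REGINALD records via GOOREG. S J NKOMO and STEPHEN JEQE NKOMO share no key in this cluster, because the single key NKO does not equal any of the pair keys.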