Individual Full Name Trim Pairs Cluster (dnClusterFullNameTrim)
On occasion, two names which are close matches may not generate a common cluster key using the Full Name Metaphone Pairs cluster.
Table 5-7 Full Name Trim Pairs Cluster
dnFullName | Name Tokens and Metaphone Values | Distinct Cluster Keys | dnClusterFullNameMeta |
---|---|---|---|
XIAO JIAN ZHONG | JIAN → JN, XIAO → S, ZHONG → JNK | JNS JNJNK SJNK | JNS \| JNJNK \| SJNK \| |
ZHONG XIAOJIAN | XIAOJIAN → SJN, ZHONG → JNK | SJNJNK | SJNJNK \| |
These two records are a possible name match. However, the Full Name Metaphone Pairs cluster does not produce a common cluster key for the pair, because the tokens ‘Xiao’ and ‘Xiaojian’ yield different three-character Metaphone keys (S and SJN, respectively).
To match these cases efficiently, a Full Name Trim Pairs cluster is prepared in a similar way to the primary cluster, but using a Trim Characters transformation in place of the Metaphone transformation. This allows for typos and spacing differences in the names, but is ‘left-biased’; that is, it requires the first few characters of the name tokens to match.
The logic of the cluster is as follows:
- Split the normalized full name into name tokens, using space as a delimiter.
- Sort the name tokens alphabetically.
- Apply the Trim Characters transformation to each name token, outputting a key with a length of (up to) 3 characters.
- Concatenate the trimmed values, generating a final key value for each distinct pair of tokens.
- Deduplicate the list of keys.
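The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation: the function name `full_name_trim_keys` and its parameters are hypothetical. Two behaviors are assumptions inferred from the examples in Table 5-8 rather than from the steps themselves: single-character tokens (initials) are dropped before pairing, and a name with only one remaining token yields that token's trimmed value as its key.

```python
from itertools import combinations


def full_name_trim_keys(full_name, key_length=3, ignore_initials=True):
    """Return the deduplicated trim-pair keys for a normalized full name.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Split on whitespace and sort the tokens alphabetically.
    tokens = sorted(full_name.split())

    # Assumption (from the NOTE in Table 5-8): initials are ignored by default.
    if ignore_initials:
        tokens = [t for t in tokens if len(t) > 1]

    # Trim each token to at most key_length characters.
    trimmed = [t[:key_length] for t in tokens]

    # Assumption (from the S J NKOMO example): a lone token is its own key.
    if len(trimmed) == 1:
        return trimmed

    # Concatenate every pair of trimmed values, keeping sorted order.
    keys = [a + b for a, b in combinations(trimmed, 2)]

    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(keys))


print(full_name_trim_keys("XIAO JIAN ZHONG"))  # ['JIAXIA', 'JIAZHO', 'XIAZHO']
print(full_name_trim_keys("ZHONG XIAOJIAN"))   # ['XIAZHO']
print(full_name_trim_keys("S J NKOMO"))        # ['NKO']
```

The first two names share the key XIAZHO, which is the common cluster key that the Metaphone-based cluster failed to produce for this pair.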
Table 5-8 Trim Characters for Full Name Trim Pairs Cluster
dnFullName | Name Tokens and Trimmed Values | Cluster Keys | dnClusterFullNameTrim |
---|---|---|---|
XIAO JIAN ZHONG | JIAN → JIA, XIAO → XIA, ZHONG → ZHO | JIAXIA JIAZHO XIAZHO | JIAXIA \| JIAZHO \| XIAZHO \| |
ZHONG XIAOJIAN | XIAOJIAN → XIA, ZHONG → ZHO | XIAZHO | XIAZHO \| |
MOHAMMED SANI ABACHE | ABACHE → ABA, MOHAMMED → MOH, SANI → SAN | ABAMOH ABASAN MOHSAN | ABAMOH \| ABASAN \| MOHSAN \| |
JOSEPH TSANGA ABANDA | ABANDA → ABA, JOSEPH → JOS, TSANGA → TSA | ABAJOS ABATSA JOSTSA | ABAJOS \| ABATSA \| JOSTSA \| |
ABD AL WAHAB ABD AL HAFIZ | ABD → ABD, ABD → ABD, AL → AL, AL → AL, HAFIZ → HAF, WAHAB → WAH | ABDABD ABDAL ABDHAF ABDWAH ALAL ALHAF ALWAH HAFWAH | ABDABD \| ABDAL \| ABDHAF \| ABDWAH \| ALAL \| ALHAF \| ALWAH \| HAFWAH \| |
SULIMAN HAMD SULEIMAN AL BUTHE | AL → AL, BUTHE → BUT, HAMD → HAM, SULEIMAN → SUL, SULIMAN → SUL | ALBUT ALHAM ALSUL ALSUL BUTHAM BUTSUL HAMSUL SULSUL | ALBUT \| ALHAM \| ALSUL \| BUTHAM \| BUTSUL \| HAMSUL \| SULSUL \| |
AL BUTHE SOLEIMAN HAMAD | AL → AL, BUTHE → BUT, HAMAD → HAM, SOLEIMAN → SOL | ALBUT ALHAM ALSOL BUTHAM BUTSOL HAMSOL | ALBUT \| ALHAM \| ALSOL \| BUTHAM \| BUTSOL \| HAMSOL \| |
REGINALD B GOODRIDGE | B → B, GOODRIDGE → GOO, REGINALD → REG | GOOREG | GOOREG \| |
REGINALD B SR GOODRICH | B → B, GOODRICH → GOO, REGINALD → REG, SR → SR | GOOREG GOOSR REGSR (NOTE: Initials are ignored by default when generating cluster keys.) | GOOREG \| GOOSR \| REGSR \| |
STEPHEN JEQE NKOMO | JEQE → JEQ, NKOMO → NKO, STEPHEN → STE | JEQNKO JEQSTE NKOSTE | JEQNKO \| JEQSTE \| NKOSTE \| |
S J NKOMO | S → S, NKOMO → NKO, J → J | NKO (NOTE: Initials are ignored by default when generating cluster keys.) | NKO \| |
STEPHEN JEKE N KOMO | JEKE → JEK, KOMO → KOM, N → N, STEPHEN → STE | JEKKOM JEKSTE KOMSTE (NOTE: Initials are ignored by default when generating cluster keys.) | JEKKOM \| JEKSTE \| KOMSTE \| |
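
To see which of the example records above would fall into a common cluster, the pipe-delimited dnClusterFullNameTrim values can be compared directly. The following is a minimal sketch, assuming that two records are candidates for comparison when they share at least one key; the helper names `parse_keys` and `shared_keys` are hypothetical and are not part of the product.

```python
def parse_keys(cluster_value):
    """Split a pipe-delimited cluster value into a set of keys."""
    return {key.strip() for key in cluster_value.split("|") if key.strip()}


def shared_keys(value_a, value_b):
    """Return the sorted keys that two cluster values have in common."""
    return sorted(parse_keys(value_a) & parse_keys(value_b))


# XIAO JIAN ZHONG vs. ZHONG XIAOJIAN
print(shared_keys("JIAXIA | JIAZHO | XIAZHO |", "XIAZHO |"))
# ['XIAZHO']

# SULIMAN HAMD SULEIMAN AL BUTHE vs. AL BUTHE SOLEIMAN HAMAD
print(shared_keys(
    "ALBUT | ALHAM | ALSUL | BUTHAM | BUTSUL | HAMSUL | SULSUL |",
    "ALBUT | ALHAM | ALSOL | BUTHAM | BUTSOL | HAMSOL |",
))
# ['ALBUT', 'ALHAM', 'BUTHAM']

# STEPHEN JEQE NKOMO vs. S J NKOMO
print(shared_keys("JEQNKO | JEQSTE | NKOSTE |", "NKO |"))
# []
```

The last pair shares no key under this cluster, because ignoring the initials leaves S J NKOMO with the single key NKO.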