Family Name Cluster (dnClusterFamilyName)
The Family Name cluster provides a backup to the full name clusters. This is especially important where the given name data is incomplete, making it difficult to form a complete cluster key for two names.
Table 5-3 Family Name Cluster
dnFullName | Name Tokens and Trimmed Values | Cluster Keys | dnClusterFullNameTrim |
---|---|---|---|
STEPHEN JEQE NKOMO | JEQE → JEQ, NKOMO → NKO, STEPHEN → STE | JEQNKO JEQSTE NKOSTE | JEQNKO\|JEQSTE\|NKOSTE |
S J NKOMO | S → S, NKOMO → NKO, J → J | NKO | NKO |
STEPHEN JEKE N KOMO | JEKE → JEK, KOMO → KOM, N → N, STEPHEN → STE | JEKKOM JEKSTE KOMSTE | JEKKOM\|JEKSTE\|KOMSTE |
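The rows of Table 5-3 can be reproduced with a short sketch. The rule used here is inferred from the table alone and is an assumption, not the documented behavior of the cluster builder: single-letter tokens are ignored, the remaining tokens are trimmed to three characters and sorted, and every pair is joined into a key. The function name `full_name_trim_keys` and the `width` parameter are hypothetical.

```python
from itertools import combinations
from typing import List


def full_name_trim_keys(full_name: str, width: int = 3) -> List[str]:
    """Build pair keys from trimmed name tokens (rule inferred from Table 5-3)."""
    # Ignore initials, trim each remaining token to `width` characters, then sort.
    trimmed = sorted(tok[:width] for tok in full_name.split() if len(tok) > 1)
    if len(trimmed) < 2:
        # A lone remaining token is used as the key on its own.
        return trimmed
    # Join every pair of trimmed tokens, keeping the sorted order.
    return [a + b for a, b in combinations(trimmed, 2)]


for name in ("STEPHEN JEQE NKOMO", "S J NKOMO", "STEPHEN JEKE N KOMO"):
    print(name, "->", "|".join(full_name_trim_keys(name)))
# STEPHEN JEQE NKOMO -> JEQNKO|JEQSTE|NKOSTE
# S J NKOMO -> NKO
# STEPHEN JEKE N KOMO -> JEKKOM|JEKSTE|KOMSTE
```

Under this reading, no two of the three records share a key, so none of them would be compared with each other on the basis of this cluster alone.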
Clustering only on the family name circumvents this issue, but results in large clusters and a concomitant increase in the processing required to cross-check all the records.
The Family Name cluster builder counters spacing and punctuation differences by generating Metaphone keys for every token of the family name and for the whole of the family name with all white space removed. This ensures that family names such as those in the last two records of Table 5-4 (NKOMO and N KOMO) are clustered together despite the spacing difference. The cluster is built as follows:
- Trim all white space from the normalized family name.
- Apply the Metaphone transformation to the result, outputting a key with a length of up to 4 characters.
- Strip common name qualifiers, such as Abd and Al, from the normalized family name.
- Split the family name into several name tokens, using a space delimiter.
Note:
Many other punctuation and noise characters are normalized to spaces before generating the cluster. For more information see Name Normalization.
- Apply the Metaphone transformation to each name token, outputting a key with a length of up to 4 characters. If no tokens remain after stripping common name qualifiers, apply the Metaphone transformation to each name token of the original normalized family name.
- Concatenate all the generated Metaphone keys.
- Deduplicate the list of keys.
Table 5-4 Metaphone Transformations for Family Name Cluster
dnFamilyName | Tokens Derived from dnFamilyName | Metaphone Transformations | dnClusterFamilyName |
---|---|---|---|
ZHONG | ZHONG | JNK | JNK |
XIAOJIAN | XIAOJIAN | SJN | SJN |
ABACHE | ABACHE | APX | APX |
ABANDA | ABANDA | APNT | APNT |
ABD AL HAFIZ | HAFIZ ABDALHAFIZ | HFS APTL | HFS\|APTL |
AL BUTHE | BUTHE ALBUTHE | P0 ALP0 | P0\|ALP0 |
AL | AL | AL | AL |
SOLEIMAN HAMAD | SOLEIMAN HAMAD SOLEIMANHAMAD | SLMN HMT SLMN | SLMN\|HMT |
GOODRIDGE | GOODRIDGE | KTRJ | KTRJ |
GOODRICH SR | GOODRICH SR GOODRICHSR | KTRX SR KTRK | KTRX\|SR\|KTRK |
NKOMO | NKOMO | NKM | NKM |
N KOMO | N KOMO NKOMO | N KM NKM | N\|KM\|NKM |
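The build steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function name `family_name_cluster` is hypothetical, the qualifier set contains only the two examples named above (Abd, Al), and the Metaphone function is passed in as a parameter. For the demonstration, a lookup of the transformations listed in Table 5-4 stands in for a real Metaphone implementation. Keys are emitted in the order shown in the table (token keys first, then the key for the whole trimmed name).

```python
from typing import Callable, FrozenSet, List

# Assumption: only the qualifiers named in the text; the real list is longer.
QUALIFIERS: FrozenSet[str] = frozenset({"ABD", "AL"})

# Stand-in for Metaphone: the transformations listed in Table 5-4.
TABLE_5_4 = {
    "ZHONG": "JNK", "XIAOJIAN": "SJN", "ABACHE": "APX", "ABANDA": "APNT",
    "HAFIZ": "HFS", "ABDALHAFIZ": "APTL", "BUTHE": "P0", "ALBUTHE": "ALP0",
    "AL": "AL", "SOLEIMAN": "SLMN", "HAMAD": "HMT", "SOLEIMANHAMAD": "SLMN",
    "GOODRIDGE": "KTRJ", "GOODRICH": "KTRX", "SR": "SR", "GOODRICHSR": "KTRK",
    "NKOMO": "NKM", "N": "N", "KOMO": "KM",
}


def table_metaphone(word: str) -> str:
    """Look up a Table 5-4 value; raises KeyError for words not in the table."""
    return TABLE_5_4[word]


def family_name_cluster(
    family_name: str,
    metaphone: Callable[[str], str],
    qualifiers: FrozenSet[str] = QUALIFIERS,
    max_len: int = 4,
) -> str:
    all_tokens = family_name.split()      # split on white space
    whole = "".join(all_tokens)           # family name with white space trimmed
    tokens = [t for t in all_tokens if t not in qualifiers]
    if not tokens:
        tokens = all_tokens               # nothing left: fall back to the original tokens
    keys: List[str] = []
    for word in tokens + [whole]:
        key = metaphone(word)[:max_len]   # keys are at most max_len characters
        if key and key not in keys:       # deduplicate, keeping first occurrence
            keys.append(key)
    return "|".join(keys)


for name in ("ABD AL HAFIZ", "AL", "SOLEIMAN HAMAD", "NKOMO", "N KOMO"):
    print(name, "->", family_name_cluster(name, table_metaphone))
# ABD AL HAFIZ -> HFS|APTL
# AL -> AL
# SOLEIMAN HAMAD -> SLMN|HMT
# NKOMO -> NKM
# N KOMO -> N|KM|NKM
```

NKOMO and N KOMO both produce the key NKM, which is what places the two records in the same cluster despite the spacing difference.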