6.2.1 Family Name Cluster (dnClusterFamilyName)
Table 6-5 Family Name Cluster details
| dnFullName | Name tokens | Trimmed values | Identifiers dnFamilyName | dnClusterFullNameTrim |
|---|---|---|---|---|
| STEPHEN JEQE NKOMO | JEQE | JEQ | JEQNKO JEQSTE NKOSTE | JEQNKO\|JEQSTE\|NKOSTE |
| - | NKOMO | NKO | - | - |
| - | STEPHEN | STE | - | - |
| S J NKOMO | S | S | NKO | NKO |
| - | NKOMO | NKO | - | - |
| - | J | J | - | - |
| STEPHEN JEKE N KOMO | JEKE | JEK | JEKKOM JEKSTE KOMSTE | JEKKOM\|JEKSTE\|KOMSTE |
| - | KOMO | KOM | - | - |
| - | N | N | - | - |
| - | STEPHEN | STE | - | - |
Clustering only on the family name circumvents this issue, but results in large clusters and a concomitant increase in the processing required to cross-check all the records.
The Family Name cluster builder counters spacing and punctuation differences by generating Metaphone keys for every token of the family name and for the whole family name with all white space removed. This ensures that family names such as those in the last two records of Table 6-6 are clustered together despite the spacing differences.
Many other punctuation and noise characters are normalized to spaces before the cluster is generated. For further information see Name Normalization.
The cluster is built as follows:
- Strip common name qualifiers from the normalized family name, e.g. Abd, Al.
- Split the family name into name tokens, using a space delimiter.
- Apply the Metaphone transformation to each name token, outputting a key of up to 4 characters. If no tokens remain after stripping common name qualifiers, apply the Metaphone transformation to each name token of the original normalized family name instead.
- Trim all white space from the normalized family name.
- Apply the Metaphone transformation to the result, outputting a key of up to 4 characters.
- Concatenate all the generated Metaphone keys.
- Deduplicate the list of keys.
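The steps in this list can be sketched in Python as follows. This is an illustrative sketch only, not the product's implementation. The function name `build_family_name_cluster`, the `QUALIFIERS` set (limited to the two qualifiers named above) and the `table_metaphone` stand-in are all assumptions made for the example. The stand-in simply looks up the Metaphone keys published in Table 6-6; a real build would pass a genuine Metaphone encoder instead.

```python
from typing import Callable, Iterable

# Assumption: only the two qualifiers given as examples in the text.
QUALIFIERS = frozenset({"ABD", "AL"})


def build_family_name_cluster(
    family_name: str,
    metaphone: Callable[[str], str],
    qualifiers: Iterable[str] = QUALIFIERS,
    max_len: int = 4,
) -> str:
    """Return a pipe-delimited cluster value for a normalized family name."""
    qualifier_set = frozenset(qualifiers)
    all_tokens = family_name.split()
    if not all_tokens:
        return ""
    # Strip qualifiers; fall back to every token if nothing is left.
    tokens = [t for t in all_tokens if t not in qualifier_set] or all_tokens
    # The whole normalized family name with all white space removed.
    trimmed = "".join(all_tokens)
    # Token keys first, then the key for the trimmed whole name.
    keys = [metaphone(t)[:max_len] for t in tokens + [trimmed]]
    # Deduplicate while keeping first-seen order, then concatenate.
    return "|".join(dict.fromkeys(k for k in keys if k))


# Hypothetical stand-in encoder: key values copied from Table 6-6.
TABLE_6_6_KEYS = {
    "HAFIZ": "HFS", "ABDALHAFIZ": "APTL", "AL": "AL",
    "SOLEIMAN": "SLMN", "HAMAD": "HMT", "SOLEIMANHAMAD": "SLMN",
    "NKOMO": "NKM", "N": "N", "KOMO": "KM",
}


def table_metaphone(token: str) -> str:
    return TABLE_6_6_KEYS[token]


for name in ["ABD AL HAFIZ", "AL", "SOLEIMAN HAMAD", "NKOMO", "N KOMO"]:
    print(name, "->", build_family_name_cluster(name, table_metaphone))
```

With the stand-in encoder, NKOMO and N KOMO both receive the key NKM, which is what allows the two spellings to fall into the same cluster.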
Table 6-6 Metaphone Transformations for Family Name Cluster
| dnFamilyName | Tokens derived from dnFamilyName | Metaphone transformations | dnClusterFamilyName |
|---|---|---|---|
| ZHONG | ZHONG | JNK | JNK |
| XIAOJIAN | XIAOJIAN | SJN | SJN |
| ABACHE | ABACHE | APX | APX |
| ABANDA | ABANDA | APNT | APNT |
| ABD AL HAFIZ | HAFIZ ABDALHAFIZ | HFS APTL | HFS\|APTL |
| AL BUTHE | BUTHE ALBUTHE | P0 ALP0 | P0\|ALP0 |
| AL | AL | AL | AL |
| SOLEIMAN HAMAD | SOLEIMAN HAMAD SOLEIMANHAMAD | SLMN HMT SLMN | SLMN\|HMT |
| GOODRIDGE | GOODRIDGE | KTRJ | KTRJ |
| GOODRICH SR | GOODRICH SR GOODRICHSR | KTRX SR KTRK | KTRX\|SR\|KTRK |
| NKOMO | NKOMO | NKM | NKM |
| N KOMO | N KOMO NKOMO | N KM NKM | N\|KM\|NKM |
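To see why the last two rows of Table 6-6 end up together, the following sketch groups a few rows of the table by individual key. It assumes that two records are treated as members of the same cluster whenever they share at least one key; that rule, and the helper name `group_by_key`, are assumptions made for illustration rather than details confirmed by this section.

```python
from collections import defaultdict

# dnFamilyName -> dnClusterFamilyName, copied from Table 6-6.
records = {
    "GOODRIDGE": "KTRJ",
    "GOODRICH SR": "KTRX|SR|KTRK",
    "NKOMO": "NKM",
    "N KOMO": "N|KM|NKM",
}


def group_by_key(cluster_values):
    """Map each individual Metaphone key to the names that carry it."""
    groups = defaultdict(list)
    for name, value in cluster_values.items():
        for key in value.split("|"):
            groups[key].append(name)
    return dict(groups)


groups = group_by_key(records)
# Keep only the keys that bring more than one record together.
shared = {key: names for key, names in groups.items() if len(names) > 1}
print(shared)  # {'NKM': ['NKOMO', 'N KOMO']}
```

Under that assumption, NKOMO and N KOMO meet on the key NKM, while GOODRIDGE and GOODRICH SR share no key in this table and so would not be cross-checked on the family name cluster alone.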