6.2.2 Individual Full Name Metaphone Pairs Cluster (dnClusterFullNameMeta)
- Split the normalized full name into several name tokens, using space as a delimiter.
  Many other punctuation and noise characters are normalized to spaces before the cluster is generated. For further information, see Name Normalization.
- Sort the name tokens alphabetically.
- Apply the Metaphone transformation (the standard double-metaphone algorithm) to each name token, outputting a key with a length of up to three characters.
- Concatenate the Metaphone values of each distinct pair of tokens, generating one key value per pair.
- Deduplicate the list of keys.
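The steps above can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name metaphone_pair_keys, its parameters, and the TABLE_CODES lookup are hypothetical. The lookup stands in for a real double-metaphone routine and holds only a few token values copied from Table 6-7. Two behaviors are assumptions inferred from that table rather than stated rules: single-letter tokens are treated as initials and skipped, and a name with one remaining token yields that token's value as its only key.

```python
from itertools import combinations

# Hypothetical stand-in for a double-metaphone function: values copied from Table 6-7.
TABLE_CODES = {
    "JIAN": "JN", "XIAO": "S", "ZHONG": "JNK",
    "AL": "AL", "BUTHE": "P0", "HAMD": "HMT",
    "SULEIMAN": "SLMN", "SULIMAN": "SLMN",
    "NKOMO": "NKM",
}


def table_metaphone(token):
    """Look up the Metaphone value of a token (illustration only)."""
    return TABLE_CODES[token]


def metaphone_pair_keys(full_name, metaphone, key_len=3, skip_initials=True):
    """Return the deduplicated Metaphone pair keys for a normalized full name."""
    # Split on whitespace and sort the tokens alphabetically.
    tokens = sorted(full_name.split())
    if skip_initials:
        # Assumption: a single-letter token is an initial and is ignored.
        tokens = [t for t in tokens if len(t) > 1]
    # Transform each token and keep at most key_len characters.
    codes = [metaphone(t)[:key_len] for t in tokens]
    if len(codes) == 1:
        # Assumption: one remaining token produces a single unpaired key.
        keys = codes
    else:
        # Concatenate the values of every pair, in sorted token order.
        keys = [a + b for a, b in combinations(codes, 2)]
    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(keys))


print("|".join(metaphone_pair_keys("XIAO JIAN ZHONG", table_metaphone)))
# prints JNS|JNJNK|SJNK
```

Passing the Metaphone function as a parameter keeps the sketch independent of any particular phonetic library; a real double-metaphone implementation could be supplied in place of table_metaphone.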
The following table shows examples of the Full Name Metaphone Pairs Cluster.
Table 6-7 Full Name Metaphone Pairs Cluster
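dnFullName: XIAO JIAN ZHONG
    Name tokens and Metaphone values: JIAN (JN), XIAO (S), ZHONG (JNK)
    Distinct Cluster Keys: JNS, JNJNK, SJNK
    dnClusterFullNameMeta: JNS|JNJNK|SJNK

dnFullName: ZHONG XIAOJIAN
    Name tokens and Metaphone values: XIAOJIAN (SJN), ZHONG (JNK)
    Distinct Cluster Keys: SJNJNK
    dnClusterFullNameMeta: SJNJNK

dnFullName: MOHAMMED SANI ABACHE
    Name tokens and Metaphone values: ABACHE (ABX), MOHAMMED (MHMT), SANI (SN)
    Distinct Cluster Keys: APXMHM, APXSN, MHMSN
    dnClusterFullNameMeta: APXMHM|APXSN|MHMSN

dnFullName: JOSEPH TSANGA ABANDA
    Name tokens and Metaphone values: ABANDA (APNT), JOSEPH (JSF), TSANGA (TSNK)
    Distinct Cluster Keys: APNJSF, APNTSN, JSFTSN
    dnClusterFullNameMeta: APNJSF|APNTSN|JSFTSN

dnFullName: ABD AL WAHAB ABD AL HAFIZ
    Name tokens and Metaphone values: ABD (APT), ABD (APT), AL (AL), AL (AL), HAFIZ (HFS), WAHAB (AHP)
    Distinct Cluster Keys: APTAPT, APTAL, APTHFS, APTAHP, ALAL, ALHFS, ALAHP, HFSAHP
    dnClusterFullNameMeta: APTAPT|APTAL|APTHFS|APTAHP|ALAL|ALHFS|ALAHP|HFSAHP

dnFullName: SULIMAN HAMD SULEIMAN AL BUTHE
    Name tokens and Metaphone values: AL (AL), BUTHE (P0), HAMD (HMT), SULEIMAN (SLMN), SULIMAN (SLMN)
    Distinct Cluster Keys: ALP0, ALHMT, ALSLM, P0HMT, P0SLM, HMTSLM, SLMSLM
    dnClusterFullNameMeta: ALP0|ALHMT|ALSLM|P0HMT|P0SLM|HMTSLM|SLMSLM

dnFullName: AL BUTHE SOLEIMAN HAMAD
    Name tokens and Metaphone values: AL (AL), BUTHE (P0), HAMAD (HMT), SOLEIMAN (SLMN)
    Distinct Cluster Keys: ALP0, ALHMT, ALSLM, P0HMT, P0SLM, HMTSLM
    dnClusterFullNameMeta: ALP0|ALHMT|ALSLM|P0HMT|P0SLM|HMTSLM

dnFullName: REGINALD B GOODRIDGE
    Name tokens and Metaphone values: B (P), GOODRIDGE (KTRJ), REGINALD (RJNLT)
    NOTE: Initials are ignored by default when generating cluster keys.
    Distinct Cluster Keys: KTRRJN
    dnClusterFullNameMeta: KTRRJN

dnFullName: REGINALD B SR GOODRICH
    Name tokens and Metaphone values: B (P), GOODRIDGE (KTRJ), REGINALD (RJNLT), SR (SR)
    NOTE: Initials are ignored by default when generating cluster keys.
    Distinct Cluster Keys: KTRRJN, KTRSR, RJNSR
    dnClusterFullNameMeta: KTRRJN|KTRSR|RJNSR

dnFullName: STEPHEN JEQE NKOMO
    Name tokens and Metaphone values: JEQE (JK), NKOMO (NKM), STEPHEN (STFN)
    Distinct Cluster Keys: JKNKM, JKSTF, NKMSTF
    dnClusterFullNameMeta: JKNKM|JKSTF|NKMSTF

dnFullName: S J NKOMO
    Name tokens and Metaphone values: J (J), NKOMO (NKM), S (S)
    NOTE: Initials are ignored by default when generating cluster keys.
    Distinct Cluster Keys: NKM
    dnClusterFullNameMeta: NKM

dnFullName: STEPHEN JEKE N KOMO
    Name tokens and Metaphone values: JEKE (JK), KOMO (KM), N (N), STEPHEN (STFN)
    Distinct Cluster Keys: JKKM, JKSTF, KMSTF
    dnClusterFullNameMeta: JKKM|JKSTF|KMSTF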