Original Script Name (dnClusterOriginalScript)

The Original Script Name cluster provides a clustering method for matching names represented in non-Latin writing systems. The cluster builder generates a key for each token in the name.

Note:

A single cluster value of "Myanmar" is generated for every original script name written in the Burmese alphabet, irrespective of the name itself. This is necessary because the Myanmar writing system does not use a space character between words, so token splitting is not possible. As a result, all original script names in Burmese script are compared with one another during matching. This should not cause performance issues during screening, provided that only a small number of customer records use this writing system.
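
The special case in the note can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names are hypothetical, and detecting Burmese script by checking for characters in the basic Unicode Myanmar block (U+1000 to U+109F) is an assumption made for illustration only.

```python
# Basic Unicode "Myanmar" block, U+1000..U+109F.
# Assumption: script detection by code point range; the product's actual
# detection method is not documented in this section.
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def is_burmese_script(name: str) -> bool:
    """Return True if any character falls in the basic Myanmar block."""
    return any(ord(ch) in MYANMAR_BLOCK for ch in name)


def myanmar_cluster_key(name: str):
    """Hypothetical helper: return the fixed key "Myanmar" for Burmese
    script names, or None when the name uses another writing system."""
    return "Myanmar" if is_burmese_script(name) else None
```

Because every Burmese script name receives the same key, all such records fall into one cluster, which is why the number of records in this writing system affects matching performance.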

The default logic of the cluster builder is as follows:

  1. Split the original script name into several name tokens, using a space character as the delimiter.
  2. Trim each name token to a maximum of 5 characters.
  3. Concatenate all of the trimmed token values with a pipe separator.
  4. Deduplicate the list of keys.
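
The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the cluster builder itself: the function name `cluster_original_script` and the `max_len` parameter are hypothetical, the " | " separator is taken from the example in Table 5-10, and the sketch deduplicates before joining (keeping first-seen order) purely for simplicity.

```python
def cluster_original_script(name: str, max_len: int = 5) -> str:
    """Hypothetical sketch of the default cluster-building steps."""
    # Step 1: split on the space character.
    # Assumption: empty tokens produced by repeated spaces are dropped.
    tokens = [t for t in name.split(" ") if t]

    # Step 2: trim each token to at most max_len characters.
    trimmed = [t[:max_len] for t in tokens]

    # Step 4: deduplicate the keys.
    # Assumption: first-seen order is preserved.
    unique = list(dict.fromkeys(trimmed))

    # Step 3: concatenate the keys with a pipe separator.
    return " | ".join(unique)


print(cluster_original_script("Iван Антонавiч Шчурок"))
```

For the first name in Table 5-10, this sketch yields the three keys shown there. A name with a repeated token yields that key only once.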

The following table provides some examples.

Table 5-10 Original Script Name Cluster

dnOriginalScriptName            dnClusterOriginalScript
Iван Антонавiч Шчурок           Iван | Антон | Шчуро
Chinese characters              Chinese characters
Myanmar in original script      Myanmar
Arabic characters               Arabic characters