Evaluation Logic Used by Matching

The Customer Screening Matching Service uses evaluation logic to determine whether individuals and entities match the watch list.

Evaluation logic is the foundation for a sub-rule. A sub-rule combines evaluation logic checks with an AND condition. The overall score for an individual or entity is the weighted average of all the individual attribute scores.
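
The weighted average described above can be sketched as follows. This is a minimal illustration, not the Matching Service's implementation: the function name, attribute names, scores, and weightages are all hypothetical.

```python
def overall_score(attribute_scores, weights):
    """Return the weighted average of per-attribute match scores."""
    total_weight = sum(weights[name] for name in attribute_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(score * weights[name]
                       for name, score in attribute_scores.items())
    return weighted_sum / total_weight


# Hypothetical attribute scores (0-100) and user-configured weightages.
scores = {"last_name": 90.0, "full_name": 80.0}
weights = {"last_name": 1.0, "full_name": 3.0}
print(overall_score(scores, weights))  # 82.5
```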

Consider two source attributes available for matching individuals: customer last name and customer full name. The customer last name is matched with a watch list Family Name record, and the customer full name is matched with a watch list Full Name & Alias Name record using fuzzy matching. The threshold score and the weightage are both configured by the user. When the batch is run, a JSON is generated and passed to the Matching Service.

The Entity rules work the same way as the Individual rules, except that the entity rules or logic apply only to companies and corporations. The following table provides some examples of evaluation logic for SAN, PEP, and EDD.

Table 8-4 Customer Screening Evaluation Logic

Logic Used Description Example
Exact Considers two values and determines whether or not they match exactly. Applies only if Exact Match is selected. It does not apply when using Fuzzy Match. If the source attribute is “John smith” and target attribute is “John smith”, then the match is an exact match.
Character Edit Distance (CED) Considers two String tokens and determines how closely they match each other by calculating the minimum number of character edits (deletions, insertions and substitutions) needed to transform one value into the other.

For entities, stop words are not considered.

If the source attribute is “John smith” and target attribute is “Jon smith”, then the CED is 1 because the letter 'h' in the source attribute is missing from the target attribute.

If the entity names are “Oracle Financial Corporation” and “Finance Orcl Pvt. Ltd.”, then only “Oracle Financial” and “Finance Orcl” are considered for matching, because Corporation, Pvt., and Ltd. are stop words.

The CED between Oracle and Orcl is 2 and the CED between Financial and Finance is 3, so the overall CED is 3.
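
The character edit distance described here corresponds to the classic Levenshtein distance. The sketch below is one way to compute it and to drop stop words for entity names. The stop-word list and the helper names are assumptions made for illustration, and lowercasing before comparison is also an assumption.

```python
def char_edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Hypothetical stop-word list; the real list is defined by the product.
STOP_WORDS = {"corporation", "pvt.", "ltd."}


def strip_stop_words(name):
    """Lowercase an entity name and drop stop words (assumed behavior)."""
    return [word for word in name.lower().split() if word not in STOP_WORDS]


print(char_edit_distance("John smith", "Jon smith"))      # 1
print(strip_stop_words("Oracle Financial Corporation"))   # ['oracle', 'financial']
print(strip_stop_words("Finance Orcl Pvt. Ltd."))         # ['finance', 'orcl']
print(char_edit_distance("oracle", "orcl"))               # 2
print(char_edit_distance("financial", "finance"))         # 3
```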

Character Match Percentage (CMP) Determines how closely two values match each other by calculating the Character Edit Distance between two String tokens and considering the length of the shorter of the two tokens, by character count. If the source attribute is “John smith” and target attribute is “Jon smith”, then the CMP is calculated using the formula (length of shorter string - CED) * 100 / length of longer string. Ignoring the space, the shorter string has 8 characters and the longer string has 9, so the CMP is (8 - 1) * 100 / 9 = 77.77%.
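
As a rough sketch, the CMP formula can be expressed in code as follows. It assumes that spaces are ignored when counting characters, which gives lengths of 9 and 8 for the two example values; the function names are hypothetical.

```python
def char_edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def char_match_percentage(source, target):
    """(length of shorter - CED) * 100 / length of longer."""
    # Assumption: spaces do not count toward the character length.
    s = source.replace(" ", "")
    t = target.replace(" ", "")
    shorter, longer = sorted((len(s), len(t)))
    if longer == 0:
        return 100.0  # two empty values are treated as identical
    return (shorter - char_edit_distance(s, t)) * 100 / longer


print(round(char_match_percentage("John smith", "Jon smith"), 2))  # 77.78
```
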
Word Edit Distance (WED) Determines how well multi-word String values match each other by calculating the minimum number of word edits (word insertions, deletions and substitutions) required to transform one value to another. If the source attribute is “John smith” and target attribute is “Jon smith”, then the WED is the number of words in the source attribute that do not match a word in the target attribute after allowing for character tolerance.

For example, the source string is Yohan Russel Smith and target string is Smith Johann Rusel. First, we determine the CED for each word:

  • Yohan matches with Johann with a CED of 2
  • Russel matches with Rusel with a CED of 1
  • Smith matches with Smith with a CED of 0

If we consider a character tolerance of 1, we can observe the following:

  • Russel, with a CED of 1, matches with Rusel.
  • Smith, with a CED of 0, matches with Smith.
  • Yohan, with a CED of 2, does not match with Johann because the CED exceeds the character tolerance of 1.

Based on these observations, we can conclude that one word does not match. This means that the WED is 1.
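
The word-by-word comparison above can be sketched in code. This is an illustrative approximation only: it pairs each source word with its closest unused target word (a greedy strategy chosen for brevity), lowercases both values, and uses hypothetical function names.

```python
def char_edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def word_edit_distance(source, target, tolerance=1):
    """Count source words with no target word within the character tolerance."""
    remaining = target.lower().split()
    unmatched = 0
    for word in source.lower().split():
        # Closest target word that has not been paired yet (greedy choice).
        best = min(remaining,
                   key=lambda candidate: char_edit_distance(word, candidate),
                   default=None)
        if best is not None and char_edit_distance(word, best) <= tolerance:
            remaining.remove(best)
        else:
            unmatched += 1
    return unmatched


print(word_edit_distance("Yohan Russel Smith", "Smith Johann Rusel", tolerance=1))  # 1
print(word_edit_distance("Yohan Russel Smith", "Smith Johann Rusel", tolerance=2))  # 0
```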

Word Match Percentage (WMP) Determines how closely, by percentage, two multi-word values match each other by calculating the Word Edit Distance between two Strings, and also taking into account the length of the longer or the shorter of the two values, by word count. The WMP is calculated using the formula (WMC / word count of the shorter value) * 100.

If the source attribute is “John smith” and target attribute is “Jon smith”, then both words match (WMC = 2) and each value contains two words, so the WMP is calculated as (2/2) * 100 = 100%.

Word Match Count (WMC) Determines how closely two multi-word values match each other by calculating the Word Edit Distance between two Strings, and also taking into account the length of the longer or the shorter of the two values, by word count.

The WMC is like the WED, with the difference being that the WMC gives the number of words that matched between the two strings and the WED gives the number of words that did not match.

If the source attribute is “John smith” and target attribute is “Jon smith”, then the WMC is 2 as two words have matched (allowing for the character tolerance).
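
A minimal sketch of WMC, and of a WMP derived from it, is given below. It assumes the WMP denominator is the word count of the shorter value, pairs words greedily, and lowercases both values; none of these details are confirmed by the product, and the function names are hypothetical.

```python
def char_edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def word_match_count(source, target, tolerance=1):
    """Count source words that match a target word within the tolerance."""
    remaining = target.lower().split()
    matched = 0
    for word in source.lower().split():
        best = min(remaining,
                   key=lambda candidate: char_edit_distance(word, candidate),
                   default=None)
        if best is not None and char_edit_distance(word, best) <= tolerance:
            remaining.remove(best)
            matched += 1
    return matched


def word_match_percentage(source, target, tolerance=1):
    """WMC divided by the smaller word count, as a percentage (assumed reading)."""
    fewest_words = min(len(source.split()), len(target.split()))
    if fewest_words == 0:
        return 0.0
    return word_match_count(source, target, tolerance) / fewest_words * 100


print(word_match_count("John smith", "Jon smith"))                   # 2
print(word_match_count("Yohan Russel Smith", "Smith Johann Rusel"))  # 2
print(round(word_match_percentage("Yohan Russel Smith", "Smith Johann Rusel"), 2))  # 66.67
```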

Exact String Match Considers two String values and determines whether or not they match exactly.
Abbreviation Checks whether the first character of the source value matches the first character of the target value.
Starts With Compares two values and determines whether either value starts with the whole of the other value. It therefore matches both exact matches and matches where one of the values starts the same as the other but contains extra information.
Jaro Winkler The Jaro Winkler similarity measures how closely two strings match, on a scale from 0 (no similarity) to 1 (identical). If the source string is Mohammed Ali and the target string is Mohammed Ali, then the similarity = 1.
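
For reference, the standard Jaro-Winkler calculation can be sketched as follows. This is the textbook formulation with the conventional prefix scale of 0.1 and a maximum prefix of 4 characters. Whether the Matching Service uses these exact parameters is an assumption.

```python
def jaro_similarity(s1, s2):
    """Textbook Jaro similarity, between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, flag in zip(s1, matched1) if flag]
    seq2 = [c for c, flag in zip(s2, matched2) if flag]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(jaro_winkler("Mohammed Ali", "Mohammed Ali"))  # 1.0
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))    # 0.9611
```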