Matching Rules Widget

The Matching Rules widget enables you to define the matching configuration for a set of data. The data that each widget matches depends on the source and target set in the Entity widgets linked to the Matching Rules widget. The source and target data can be filtered if the matching configuration is to apply only to a subset of the data. This allows you to provide different matching configurations for different types of watchlist records and for different jurisdictions and domains. Each matching ruleset contains a name, a description, the scoring aggregation used, the threshold value for the overall ruleset, and one or more rules.

Rules are configured using the Matching Ruleset window. Matches are generated based on a defined set of attributes for each rule. A weighted average of the score is generated for each of the attribute level matches.

There are two types of matching services:

·        Real-Time query processing

·        Bulk query processing

In Real-Time query processing, a string value entered in the UI is matched against a column in the target table. Customer Screening passes the strings to be matched as values in the request, and they are matched against all the values in the named column. Then, based on the matches that the search engine returns for the source string, the score and the feature vector are generated for each pair of matched strings (source and target). Matches with scores that exceed the configured thresholds are collected.
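As a rough illustration of this flow, the following sketch scores one source string against every value in a target column and keeps the candidates whose score exceeds a threshold. The function name, the sample column values, and the use of Python's difflib.SequenceMatcher as the scorer are assumptions made for illustration only; they do not represent the product's search engine or its scoring logic.

```python
from difflib import SequenceMatcher


def screen_value(source, target_column, threshold):
    """Return (target, score) pairs whose score exceeds the threshold.

    Scores are on a 0-100 scale. SequenceMatcher is only a stand-in
    scorer for this sketch, not the product's scoring method.
    """
    matches = []
    for target in target_column:
        ratio = SequenceMatcher(None, source.lower(), target.lower()).ratio()
        score = 100 * ratio
        if score > threshold:
            matches.append((target, round(score, 2)))
    # Highest-scoring candidates first
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical values of a watchlist name column
watchlist_names = ["Jon Smith", "Jane Smythe", "Oracle Corp"]
print(screen_value("John Smith", watchlist_names, threshold=80))
```

With these sample values, only the close spelling variant is expected to clear a threshold of 80.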

Provide the following values for each rule:

·        Source attribute

·        Target attribute

·        Match type. The following table provides some examples:

Match Types – Descriptions and Examples

Logic Used

Description

Example

Exact

Considers two values and determines whether or not they match exactly. Applies only if Exact Match is selected. It does not apply when using Fuzzy Match. 

If the source attribute is “John smith” and target attribute is “John smith”, then the match is an exact match.

Character Edit Distance (CED)

Considers two String tokens and determines how closely they match each other by calculating the minimum number of character edits (deletions, insertions and substitutions) needed to transform one value into the other.

For entities, stop words are not considered.

If the source attribute is “John smith” and target attribute is “Jon smith”, then the CED is 1 because the letter 'h' in the source attribute is missing from the target attribute.

If the entity names are Oracle Financial Corporation and Finance Orcl Pvt. Ltd., then only Oracle Financial and Finance Orcl are considered for matching because Corporation, Pvt., and Ltd. are stop words.

The CED between Oracle and Orcl is 2 and the CED between Financial and Finance is 3, so the overall CED is 3.
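The character edit counts in these examples can be checked with a short sketch. The function below is a standard dynamic-programming edit distance; the function name is illustrative, and stop-word removal is assumed to have been done before the call.

```python
def character_edit_distance(source, target):
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(character_edit_distance("John smith", "Jon smith"))  # 1
print(character_edit_distance("Oracle", "Orcl"))           # 2
print(character_edit_distance("Financial", "Finance"))     # 3
```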

Character Match Percentage (CMP)

Determines how closely two values match each other by calculating the Character Edit Distance between two String tokens and considering the length of the shorter of the two tokens, by character count. 

If the source attribute is “John smith” and target attribute is “Jon smith”, then the CMP is calculated using the formula (length of shorter string – CED) * 100 / length of longer string. In this case, not counting the space, the shorter string has 8 characters and the longer string has 9, so the CMP is (8-1) * 100/9 = 77.78%.
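The CMP formula can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes that spaces are not counted toward the string lengths (the reading under which this example evaluates to roughly 77.78%), and it clamps negative results to 0, which is also an assumption.

```python
def character_edit_distance(source, target):
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def character_match_percentage(source, target):
    """(length of shorter string - CED) * 100 / length of longer string."""
    # Assumption: whitespace is not counted toward the string length
    s = source.replace(" ", "")
    t = target.replace(" ", "")
    shorter, longer = sorted((len(s), len(t)))
    if longer == 0:
        return 100.0
    ced = character_edit_distance(s, t)
    # Assumption: a negative result is reported as 0
    return max(0.0, (shorter - ced) * 100 / longer)


print(round(character_match_percentage("John smith", "Jon smith"), 2))  # 77.78
```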

Word Edit Distance (WED)

Determines how well multi-word String values match each other by calculating the minimum number of word edits (word insertions, deletions and substitutions) required to transform one value to another. 

If the source attribute is “John smith” and target attribute is “Jon smith”, then the WED is the number of words in the source attribute that do not match a word in the target attribute after allowing for the character tolerance.

For example, the source string is Yohan Russel Smith and the target string is Smith Johann Rusel. First, we determine the CED for each word:

·        Yohan matches with Johann with a CED of 2

·        Russel matches with Rusel with a CED of 1

·        Smith matches with Smith with a CED of 0

If we consider a character tolerance of 1, we can observe the following:

·        Russel matches with Rusel because the CED of 1 is within the character tolerance.

·        Smith matches with Smith because the CED of 0 is within the character tolerance.

·        Yohan does not match with Johann because the CED of 2 exceeds the character tolerance of 1.

Based on these observations, we can conclude that one word does not match. This means that the WED is 1.
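The word-level walk-through above can be sketched in code. The pairing strategy used here (each source word is paired with its closest unused target word) and the case-insensitive comparison are assumptions made for illustration; the function names are hypothetical.

```python
def character_edit_distance(source, target):
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def word_edit_distance(source, target, tolerance=1):
    """Count source words with no target word within the character tolerance."""
    remaining = target.lower().split()
    unmatched = 0
    for word in source.lower().split():
        # Closest target word that has not been paired yet (assumed strategy)
        best = min(remaining,
                   key=lambda t: character_edit_distance(word, t),
                   default=None)
        if best is not None and character_edit_distance(word, best) <= tolerance:
            remaining.remove(best)
        else:
            unmatched += 1
    return unmatched


print(word_edit_distance("Yohan Russel Smith", "Smith Johann Rusel", tolerance=1))  # 1
```

Raising the tolerance to 2 in this sketch lets Yohan pair with Johann, so no words remain unmatched.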

Word Match Percentage (WMP)

Determines how closely, by percentage, two multi-word values match each other by calculating the Word Edit Distance between two Strings, and also taking into account the length of the longer or the shorter of the two values, by word count.

The WMP is calculated using the formula (WMC / word count of the shorter value) * 100.

If the source attribute is “John smith” and target attribute is “Jon smith”, then the WMC is 2 and each value contains two words, so the WMP is (2/2) * 100 = 100%.

Word Match Count (WMC)

Determines how closely two multi-word values match each other by calculating the Word Edit Distance between two Strings, and also taking into account the length of the longer or the shorter of the two values, by word count. 

The WMC is like the WED, except that the WMC gives the number of words that match between the two strings, whereas the WED gives the number of words that do not match.

If the source attribute is “John smith” and target attribute is “Jon smith”, then the WMC is 2 as two words have matched (allowing for the character tolerance).
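A minimal sketch of the word match count and word match percentage follows. It assumes that the denominator of the WMP is the word count of the shorter value and that each source word is paired with its closest unused target word; both are assumptions made for illustration, and the function names are hypothetical.

```python
def character_edit_distance(source, target):
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def word_match_count(source, target, tolerance=1):
    """Number of source words that match a target word within the tolerance."""
    remaining = target.lower().split()
    matched = 0
    for word in source.lower().split():
        candidates = [t for t in remaining
                      if character_edit_distance(word, t) <= tolerance]
        if candidates:
            # Pair with the closest candidate so it cannot be reused
            best = min(candidates, key=lambda t: character_edit_distance(word, t))
            remaining.remove(best)
            matched += 1
    return matched


def word_match_percentage(source, target, tolerance=1):
    """WMC as a percentage of the shorter value's word count (assumed)."""
    shorter = min(len(source.split()), len(target.split()))
    if shorter == 0:
        return 0.0
    return word_match_count(source, target, tolerance) * 100 / shorter


print(word_match_count("John smith", "Jon smith"))                    # 2
print(word_match_count("Yohan Russel Smith", "Smith Johann Rusel"))   # 2
print(round(word_match_percentage("Yohan Russel Smith", "Smith Johann Rusel"), 2))  # 66.67
```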

Exact String Match

Considers two String values and determines whether or not they match exactly.

 

Abbreviation

Checks whether the first characters of the source and target values match.

 

Starts With

Compares two values and determines whether either value starts with the whole of the other value. It therefore matches both exact matches and matches where one of the values starts the same as the other but contains extra information. 

 

Jaro Winkler or Reverse Jaro Winkler

The Jaro Winkler similarity is the measure of the edit distance between two strings.

In the Reverse Jaro Winkler, matches are generated even if the string is reversed. For example, if the source string is Mohammed Ali and the target string is Ali Mohammed, then the similarity = 1.

If the source string is Mohammed Ali and the target string is Mohammed Ali, then the similarity = 1.
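The following sketch implements the commonly published Jaro Winkler formula and one possible reading of the reverse variant. Treating Reverse Jaro Winkler as the better of the scores for the original and the word-reversed source string is an assumption made for illustration, as is the prefix scale of 0.1; the product's implementation may differ.

```python
def jaro(s, t):
    """Jaro similarity between two strings, from 0 to 1."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_flags = [False] * len(s)
    t_flags = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_flags[j] and t[j] == ch:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order
    s_matched = [c for c, flag in zip(s, s_flags) if flag]
    t_matched = [c for c, flag in zip(t, t_flags) if flag]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def reverse_jaro_winkler(s, t):
    """Assumed reading: also score the source with its word order reversed."""
    reversed_s = " ".join(reversed(s.split()))
    return max(jaro_winkler(s, t), jaro_winkler(reversed_s, t))


print(jaro_winkler("Mohammed Ali", "Mohammed Ali"))          # 1.0
print(reverse_jaro_winkler("Mohammed Ali", "Ali Mohammed"))  # 1.0
```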

Levenshtein

The Levenshtein Distance (LD) or edit distance provides the distance, or the number of edits (deletions, insertions, or substitutions) needed to transform the source string into the target string.

 For example, if the source string is Mohamed and the target string is Mohammed, then the LD = 1, because there is one edit (insertion) required to match the source and target strings.

·         Scoring method. This can be one of the following:

§       Levenshtein: The Levenshtein Distance (LD) or edit distance provides the distance, or the number of edits (deletions, insertions, or substitutions) needed to transform the source string into the target string. For example, if the source string is Mohamed and the target string is Mohammed, then the LD = 1, because there is one edit (insertion) required to match the source and target strings.

§       Jaro Winkler: The Jaro Winkler similarity is the measure of the edit distance between two strings. For example, if the source string is Mohammed Ali and the target string is Mohammed Ali, then the similarity = 1.

§       Reverse Jaro Winkler: In the Reverse Jaro Winkler, matches are generated even if the string is reversed. For example, if the source string is Mohammed Ali and the target string is Ali Mohammed, then the similarity = 1.

§       Individual SAN: The details are provided in the Matching Guide.

§       Entity SAN: The details are provided in the Matching Guide.

§       Individual PEP: The details are provided in the Matching Guide.

§       Entity PEP: The details are provided in the Matching Guide.

§       Individual EDD: The details are provided in the Matching Guide.

§       Entity EDD: The details are provided in the Matching Guide.

·        Threshold value. If the attribute score exceeds this value, the attribute is considered for matching.

·        Weightage assigned to the attribute. The weightages of all attributes within a rule must total 1.

·        Must check box (optional). If this check box is selected, then there must be a match on this attribute; if not, no matches are generated for this rule.

Each combination of attributes in the match rule is scored. If the score for an attribute is greater than the specified attribute-level threshold, then the score contributes to the overall score. If data is null for either the source or the target attribute, a score of 50 is given. Attribute-level scores are multiplied by the weightage and then added to get the weighted average score for the customer and watchlist record. If the score is greater than the rule threshold, then the record is considered for matching.
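The rule-level scoring can be sketched as follows. The class and function names are hypothetical, and two details are assumptions made for illustration: the null score of 50 is compared with the attribute threshold like any other score, and a rule whose Must attribute does not match returns no score.

```python
import math
from dataclasses import dataclass
from typing import Optional

NULL_SCORE = 50.0  # score given when source or target data is null


@dataclass
class AttributeResult:
    score: Optional[float]  # None when source or target data is null
    threshold: float        # attribute-level threshold
    weightage: float        # weightages within a rule must total 1
    must: bool = False      # Must check box


def rule_score(attributes):
    """Weighted score for one rule, or None if a Must attribute fails."""
    total_weight = sum(attr.weightage for attr in attributes)
    if not math.isclose(total_weight, 1.0):
        raise ValueError("weightages within a rule must total 1")
    total = 0.0
    for attr in attributes:
        score = NULL_SCORE if attr.score is None else attr.score
        if score > attr.threshold:
            total += score * attr.weightage
        elif attr.must:
            # No match on a Must attribute: the rule generates no match
            return None
    return total


name = AttributeResult(score=90, threshold=80, weightage=0.7, must=True)
date_of_birth = AttributeResult(score=None, threshold=40, weightage=0.3)
score = rule_score([name, date_of_birth])  # 90 * 0.7 + 50 * 0.3
print(score)
print(score > 75)  # compared with a hypothetical rule threshold of 75
```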

If there are two or more rules in the ruleset, then the maximum score is taken. If this score is greater than the threshold defined for the ruleset, then the two records are a match.
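The Maximum aggregation across rules can be sketched as follows. The function name and the sample scores are hypothetical, and the sketch assumes that only rules whose score exceeds their own rule threshold take part in the aggregation.

```python
def ruleset_match(rule_results, ruleset_threshold):
    """Apply Maximum aggregation to a list of (score, rule_threshold) pairs.

    Returns (is_match, best_score). A score of None stands for a rule
    that generated no match.
    """
    qualifying = [score for score, rule_threshold in rule_results
                  if score is not None and score > rule_threshold]
    if not qualifying:
        return False, None
    best = max(qualifying)
    return best > ruleset_threshold, best


# Hypothetical scores for three rules and a ruleset threshold of 85
results = [(78.0, 75), (92.5, 80), (60.0, 70)]
print(ruleset_match(results, ruleset_threshold=85))  # (True, 92.5)
```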

To add a ruleset, follow these steps:

1.     In the Pipeline Designer page, select the pipeline you want to edit. The Pipeline Designer window appears. 

2.      Hover over the Matching Rules widget and click Edit. Provide details as described in the following table:

Matching Rules Widget Fields and their Descriptions

Field

Description

Ruleset Name

Enter the name for your ruleset. This is a mandatory field.

Description

Enter the description of the ruleset. This is a mandatory field.

Scoring Aggregation Type

Select the scoring type. Currently, only Maximum is available.

Set Threshold

Enter the threshold value for the ruleset.

Source

Select Filter to add values for the source entity in the Add Source Entity Filters window. To add a value, click Add and provide the required attribute, operator, and value.

Attributes can be Business Domain Code, Customer Type Code, or Jurisdiction Code. Enter the value based on the attribute. For example, a value for jurisdiction code can be JC1.

Click Save to save the values or click Close to go back to the Matching Ruleset window.

Target

Select Filter to add values for the target entity in the Add Source Entity Filters window. To add a value, click Add and provide the required attribute, operator, and value.

Attributes can be Entity Type, WatchList Type, or WatchList Sub Type. Enter the value based on the attribute. For example, a value for watchlist type can be SAN.

Click Save to save the values or click Close to go back to the Matching Ruleset window.

Rules

Select Add to add a rule for the ruleset.

Name

Enter the rule name.

Description

Enter the description of the rule. This is a mandatory field.

Rule Threshold

Enter the threshold value for the rule.

Mappings

Select Add to add a matching configuration for the rule.

Source Attribute

Select one or more source attributes from the customer record that must be matched.

Target Attribute

Select one or more attributes from the watchlist against which matching is performed.

Match Type

Select the matching type. The following match types are available:

·        Exact

·        Fuzzy

·         Date

Scoring Method

Select the scoring method if you have selected the match type as Fuzzy. The following scoring methods are available:

·        Levenshtein: The Levenshtein Distance (LD) or edit distance provides the distance, or the number of edits (deletions, insertions, or substitutions) needed to transform the source string into the target string. For example, if the source string is Mohamed and the target string is Mohammed, then the LD = 1, because there is one edit (insertion) required to match the source and target strings.

·        Jaro Winkler: The Jaro Winkler similarity is the measure of the edit distance between two strings. For example, if the source string is Mohammed Ali and the target string is Mohammed Ali, then the similarity = 1.

·        Reverse Jaro Winkler: In the Reverse Jaro Winkler, matches are generated even if the string is reversed. For example, if the source string is Mohammed Ali and the target string is Ali Mohammed, then the similarity = 1.

·        Individual SAN: The details are provided in the Matching Guide.

·        Entity SAN: The details are provided in the Matching Guide.

·        Individual PEP: The details are provided in the Matching Guide.

·        Entity PEP: The details are provided in the Matching Guide.

·        Individual EDD: The details are provided in the Matching Guide.

·        Entity EDD: The details are provided in the Matching Guide.

Threshold

Enter the threshold score.

Weightage

Enter the weightage.

Condition

If this check box is selected, then this condition must be met for matching.

 

3.       Click Save to save the changes. The rule is created and is visible on the canvas. It is also available for use in the Matching Ruleset window.

When you have finished reviewing the fields and want to go back to the Pipeline Designer window, click Close to close the window. Finally, click Save to save the updates made.