8 Managing Rulesets

A ruleset is a set of rules applied to the defined source and target entities. It compares the attributes of the entities to derive a match. For information on matching rulesets, see the Financial Crime Graph Model Matching Guide.

Accessing the Rulesets

To access the Rulesets page, follow these steps:

1.       Navigate to the FCC Studio workspace.

2.      Click the Navigation Menu icon (Menu_icon.png) in the upper-left corner.

The menu items are listed.

3.      Click Ruleset.

The Ruleset page is displayed with all the out-of-the-box rulesets.

Figure 1:   Ruleset Page

ER_page.png 

Creating Rulesets

To create a ruleset, follow these steps:

1.       Navigate to the Ruleset page.

2.      Click Create.

The Ruleset Details page is displayed.

ER_page2.png 

3.      Enter the following details.

 

Field: Description

Name: Indicates the name of the ruleset.

Description: Indicates the additional description given for the ruleset.

Scoring Aggregation Type: Indicates the scoring aggregation method. Select one of the following options:

·        Maximum: Considers the highest score obtained out of all the rules created for a ruleset.

·        Minimum: Considers the lowest score obtained out of all the rules created for a ruleset.

Set Threshold: Indicates the threshold value set for a ruleset. A Similarity Edge is generated only when the maximum score obtained for a ruleset is equal to or higher than the threshold value.

Source: Indicates the source entity (node). The values are auto-populated from the metadata table that contains the elastic search index names generated as a result of running the Sqoop job.

Target: Indicates the target entity (node). The values are auto-populated from the metadata table that contains the elastic search index names generated as a result of running the Sqoop job.
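The interaction between Scoring Aggregation Type and Set Threshold can be illustrated with a minimal sketch. The function names, the 0 to 1 score scale, and the sample scores are hypothetical and are not part of the product. The sketch assumes that the aggregated ruleset score is the value compared against the threshold.

```python
from typing import Optional, Sequence


def aggregate_ruleset_score(rule_scores: Sequence[float], aggregation: str) -> Optional[float]:
    """Combine per-rule scores using the chosen aggregation type (hypothetical helper)."""
    if not rule_scores:
        return None  # no rule produced a score
    if aggregation == "Maximum":
        return max(rule_scores)
    if aggregation == "Minimum":
        return min(rule_scores)
    raise ValueError(f"Unknown aggregation type: {aggregation}")


def creates_similarity_edge(rule_scores: Sequence[float], aggregation: str, threshold: float) -> bool:
    """Assumption: an edge is generated when the aggregated score >= threshold."""
    score = aggregate_ruleset_score(rule_scores, aggregation)
    return score is not None and score >= threshold


scores = [0.62, 0.91, 0.78]  # illustrative rule scores on an assumed 0-1 scale
print(creates_similarity_edge(scores, "Maximum", 0.8))  # True: 0.91 >= 0.8
print(creates_similarity_edge(scores, "Minimum", 0.8))  # False: 0.62 < 0.8
```

With the same rule scores, Maximum aggregation passes the threshold as soon as one rule scores highly, whereas Minimum aggregation requires every rule to do so.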

Creating Rules in a Ruleset

To create rules in a ruleset, follow these steps:

1.       Navigate to a Ruleset Details page.

2.      Click Create to add a new rule.

A New Rule section is displayed.

3.      Enter the following details:

 

Field: Description

Name: Indicates the name of the rule.

Description: Indicates the description of the rule.

Rule Threshold: Indicates the threshold value set for a rule. This rule contributes to the matching, only when the maximum score obtained for a rule is equal to or higher than the threshold value.

4.      Click Create to add new mappings, and enter the following details:

 

Field: Description

Source Attribute: Indicates the source attribute.

Target Attribute: Indicates the target attribute.

Match Type: Indicates the match type. Select one of the following options:

·        Exact: To obtain the matches that are 100% perfect when finding the entities in a database.

·        Fuzzy: To obtain the matches that are less than 100% perfect when finding the entities in a database.

Scoring Method: The scoring methods used are as follows:

·        Default

·        Jaro Winkler

For more information, see Scoring Method.

Threshold: Indicates that a score below the mentioned value does not generate a result from the elastic search.

Weightage: Indicates the weightage given for the attributes in the rule.

Condition: Indicates that this attribute cannot have a null value. This attribute must be populated and must return a value for the matching.
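The mapping fields above can be pictured with a small sketch of how a rule might evaluate one source and target pair. This is an illustration only. The class, the function names, the weighted-average combination, and the handling of optional null attributes are all assumptions, because this guide does not specify how the product combines mapping scores. The fuzzy score uses Python's difflib as a stand-in, not the product's scoring methods.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Mapping:
    source_attribute: str
    target_attribute: str
    match_type: str      # "Exact" or "Fuzzy"
    threshold: float     # scores below this value produce no result
    weightage: float     # relative weight of this attribute in the rule
    required: bool       # Condition: the attribute must not be null


def similarity(a: str, b: str) -> float:
    """Stand-in fuzzy score in [0, 1]; not the product's scoring method."""
    return SequenceMatcher(None, a, b).ratio()


def rule_score(mappings: List[Mapping], source: dict, target: dict) -> Optional[float]:
    total, weight_sum = 0.0, 0.0
    for m in mappings:
        s, t = source.get(m.source_attribute), target.get(m.target_attribute)
        if s is None or t is None:
            if m.required:
                return None      # a required attribute is null: no match
            continue             # assumption: optional null attributes are skipped
        if m.match_type == "Exact":
            score = 1.0 if s == t else 0.0
        else:
            score = similarity(s, t)
        if score < m.threshold:
            score = 0.0          # below the mapping threshold: no result
        total += m.weightage * score
        weight_sum += m.weightage
    # Assumption: mapping scores are combined as a weighted average.
    return total / weight_sum if weight_sum else None


mappings = [
    Mapping("name", "name", "Fuzzy", 0.5, 0.7, True),
    Mapping("country", "country", "Exact", 1.0, 0.3, False),
]
source = {"name": "John Smith", "country": "US"}
target = {"name": "John Smith", "country": "UK"}
print(round(rule_score(mappings, source, target), 2))  # 0.7
print(rule_score(mappings, source, {"name": None}))    # None
```

In this sketch, the name attribute matches fully and the country attribute does not, so the rule score equals the share of the total weightage carried by the name mapping.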

Scoring Method

The scoring methods used in the entity resolution component are as follows:

·        Default Method

The default method uses the Levenshtein (edit) distance, which is the minimum number of single-character edits needed to transform one string into another. The transformations allowed are as follows:

§        Insertion: Adding a new character

§        Deletion: Deleting a character

§        Substitution: Replacing one character with another

By performing these operations, the algorithm modifies the first string until it matches the second one. The smallest number of operations required is the edit distance.

For example:

a.      textdistance.levenshtein('arrow', 'arow')

1

b.     textdistance.levenshtein.normalized_similarity('arrow', 'arow')

0.8

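Here, inserting a single 'r' into string 2 ('arow') makes it identical to string 1 ('arrow'). Hence, the edit distance is 1. As with Hamming distance, the edit distance can be normalized into a bounded similarity score between 0 and 1 by dividing it by the length of the longer string and subtracting the result from 1. Here, 1 - 1/5 = 0.8, so the similarity score obtained is 80%.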
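The edit-distance computation described above can be sketched in plain Python with the standard dynamic-programming approach. This is an illustrative sketch, not the implementation used by textdistance or by the product. The function names are chosen here for illustration, and the normalization by the longer string's length is the assumption used to reproduce the 0.8 score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the edit distance to a similarity between 0 and 1."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - levenshtein(a, b) / longest


print(levenshtein("arrow", "arow"))                      # 1
print(round(normalized_similarity("arrow", "arow"), 2))  # 0.8
```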

·        Jaro Winkler

This algorithm gives high scores to strings that meet the following conditions:

a.      The strings contain the same characters, within a certain distance of one another.

b.     The matching characters appear in the same order.

To be precise, two identical characters count as a match only if their positions differ by no more than one less than half the length of the longer string (rounded down). For example, if the longer string has a length of five, the allowed distance is floor(5/2) - 1 = 1, so a character at the start of string 1 must be found at or before the 2nd position in string 2 to be considered a valid match. In addition, the Winkler adjustment raises the score of strings that share a common prefix. Hence, the algorithm is directional and gives a higher score when the strings match from the beginning.

For example:

a.      textdistance.jaro_winkler("mes", "messi")

0.86

b.     textdistance.jaro_winkler("crate", "crat")

0.96

c.      textdistance.jaro_winkler("crate", "atcr")

0.0

In the first case, the strings match from the beginning, so a high score is given. Similarly, in the second case, only one character is missing, and it is missing at the end of string 2, so a very high score is given. In the third case, the last two characters of string 2 have been moved to the front. Every shared character is therefore displaced by two positions, which is farther than the allowed matching distance, and the result is 0% similarity.
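The matching distance and the prefix adjustment described above can be sketched in plain Python. This is a minimal sketch of the textbook Jaro and Jaro-Winkler formulas, not the implementation used by textdistance or by the product. The function names, the prefix weight of 0.1, and the maximum prefix length of 4 are the commonly used defaults and are assumptions here.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between 0 and 1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if their positions differ by at most this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(jaro("mes", "messi"), 2))           # 0.87
print(round(jaro_winkler("crate", "crat"), 2))  # 0.96
print(jaro_winkler("crate", "atcr"))            # 0.0
```

Implementations differ in when they apply the prefix boost (for example, some skip it for very short strings), so a library may report a value close to the plain Jaro score for a pair such as "mes" and "messi", while this sketch's jaro_winkler returns a higher value for that pair.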