Master Data

Dice

You can select the engine to be used for screening by specifying it in Master Data > Power Data > Configurations > Service Preference.

This engine compares the bigrams (sequences of two adjacent characters) that occur in the party and the restricted party data and computes a coefficient value, which in turn determines the match percentage. The output match factor ranges from 0 to 1. Dice can also be used to compare double-byte characters.
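As an illustration of what a bigram is, the following minimal Python sketch splits a word into its overlapping two-character sequences. The function name `bigrams` is hypothetical and is not part of GTM. Because Python strings are indexed by character rather than by byte, the same function also handles double-byte characters.

```python
def bigrams(word: str) -> list[str]:
    """Return the overlapping two-character sequences of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


print(bigrams("ZIDAN"))  # ['ZI', 'ID', 'DA', 'AN']
print(bigrams("東京都"))  # ['東京', '京都']
```

A word of n characters yields n - 1 bigrams, so a single-character word has none.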

Matching Factor Calculation by Dice Engine

The Dice engine performs screening using the following process:

  1. Simplify abbreviations.
    Abbreviations can contain spaces or full stops (.).
    Abbreviations containing a space will have the space removed. For example, "U S A" will be changed to "USA".
    If the optional feature "RPLS - AVOID SPACE IN PLACE OF ABBREVIATION" is enabled and the list of punctuation marks specified in the property gtm.rpsservice.punctuationmarks contains a full stop (.), then abbreviations having a full stop will have it removed. For example, "U.S.A." will be changed to "USA".
  2. Remove punctuation based on the property gtm.rpsservice.punctuationmarks. All punctuation marks listed in this property are converted to a space.
  3. Split the party and the restricted party details into words.
  4. Since there can be multiple words for the party and the restricted party, GTM prepares all possible combinations of two words (one from the party detail and the other from the restricted party detail), computes their bigrams, and then checks their matching percentage using the Dice Coefficient.

    GTM calculates the Dice Coefficient between two words (one from the party and the other from the restricted party) using the following formula:

    Dice Coefficient Formula = [2 * NC/{NC + Maximum(N1 , N2)}]

    where,

    Number of Bigrams of Party word = N1

    Number of Bigrams of Restricted Party word = N2

    Number of common Bigrams of Party and Restricted Party word = NC

  5. Remove all combinations that are not a match: two words with a Dice Coefficient of zero are not a match.
  6. For every party word, identify the best target word match.
  7. Calculate the match factor using the best matches.
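The coefficient in step 4 can be sketched in Python as shown below. This is an illustration only, not GTM's implementation: the function names are hypothetical, and counting distinct common bigrams with a set intersection is an assumption, since this section does not say how repeated bigrams within a word are counted. The sketch uses the formula exactly as given in this section, which differs from the textbook Sørensen-Dice form 2 * NC / (N1 + N2).

```python
def bigrams(word: str) -> list[str]:
    """Return the overlapping two-character sequences of a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_coefficient(party_word: str, restricted_word: str) -> float:
    """Compute 2 * NC / (NC + Maximum(N1, N2)) for a pair of words."""
    party_bigrams = bigrams(party_word)
    restricted_bigrams = bigrams(restricted_word)
    n1 = len(party_bigrams)
    n2 = len(restricted_bigrams)
    # Assumption: NC counts distinct bigrams that occur in both words.
    nc = len(set(party_bigrams) & set(restricted_bigrams))
    if nc == 0:
        # No common bigrams: not a match (also avoids dividing by zero).
        return 0.0
    return 2 * nc / (nc + max(n1, n2))


print(round(dice_coefficient("ZIDAN", "ZIDA"), 3))  # 0.857
print(dice_coefficient("EMAD", "ZIDA"))             # 0.0
```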

Detailed Example of Match Factor Calculation Using Dice Engine (Forward and Backward Configuration):

Consider a party with the name "ZIDAN EMAD ABDELHADIE" and a restricted party with the name "ZIDA ABDELHADI", both with the same address. Assume that a match threshold of 0.85 is configured for the name parameter match. The following steps show how the forward and backward match factors are calculated.

  1. Split the party and the restricted party full name into words.
    1. Source Tokens = {ZIDAN, EMAD, ABDELHADIE}
    2. Target Tokens = {ZIDA, ABDELHADI}
  2. Create bigrams of every party and restricted party word.

    | Party | Restricted Party |
    |---|---|
    | ZIDAN = ZI, ID, DA, AN [4 BIGRAMS] | ZIDA = ZI, ID, DA [3 BIGRAMS] |
    | EMAD = EM, MA, AD [3 BIGRAMS] | ABDELHADI = AB, BD, DE, EL, LH, HA, AD, DI [8 BIGRAMS] |
    | ABDELHADIE = AB, BD, DE, EL, LH, HA, AD, DI, IE [9 BIGRAMS] | |

  3. Prepare every combination of two words (one from the party and the other from the restricted party) and calculate the Dice Coefficient for each combination.

    | Party: Name | Restricted Party: Name | Common Bigrams | Number of Common Bigrams | Dice Coefficient Formula = [2 * NC/{NC + Maximum(N1 , N2)}] |
    |---|---|---|---|---|
    | ZIDAN = ZI, ID, DA, AN [4 BIGRAMS] | ZIDA = ZI, ID, DA [3 BIGRAMS] | ZI, ID, DA | 3 | 2*3/[3 + Max(4, 3)] = 6/7 = 0.85 |
    | ZIDAN = ZI, ID, DA, AN [4 BIGRAMS] | ABDELHADI = AB, BD, DE, EL, LH, HA, AD, DI [8 BIGRAMS] | Nil | 0 | 2*0/[0 + Max(4, 8)] = 0 |
    | EMAD = EM, MA, AD [3 BIGRAMS] | ZIDA = ZI, ID, DA [3 BIGRAMS] | Nil | 0 | 2*0/[0 + Max(3, 3)] = 0 |
    | EMAD = EM, MA, AD [3 BIGRAMS] | ABDELHADI = AB, BD, DE, EL, LH, HA, AD, DI [8 BIGRAMS] | AD | 1 | 2*1/[1 + Max(3, 8)] = 2/9 = 0.22 |
    | ABDELHADIE = AB, BD, DE, EL, LH, HA, AD, DI, IE [9 BIGRAMS] | ZIDA = ZI, ID, DA [3 BIGRAMS] | Nil | 0 | 2*0/[0 + Max(9, 3)] = 0 |
    | ABDELHADIE = AB, BD, DE, EL, LH, HA, AD, DI, IE [9 BIGRAMS] | ABDELHADI = AB, BD, DE, EL, LH, HA, AD, DI [8 BIGRAMS] | AB, BD, DE, EL, LH, HA, AD, DI | 8 | 2*8/[8 + Max(9, 8)] = 16/17 = 0.94 |

  4. Remove all non-matches (that is, combinations with a Dice Coefficient of zero) and arrange the remaining matches in descending order of Dice Coefficient.

    The remaining combinations are:

    | Party: Name | Restricted Party: Name | Dice Coefficient |
    |---|---|---|
    | ABDELHADIE | ABDELHADI | 0.94 |
    | ZIDAN | ZIDA | 0.85 |
    | EMAD | ABDELHADI | 0.22 |

  5. Identify the best matches among the remaining combinations. ABDELHADI from the restricted party is a better match for ABDELHADIE from the party (94%) than for EMAD from the party (22%). Hence, EMAD vs ABDELHADI (22%) is discarded. The following combinations remain:

    | Party: Name | Restricted Party: Name | Dice Coefficient |
    |---|---|---|
    | ABDELHADIE | ABDELHADI | 0.94 |
    | ZIDAN | ZIDA | 0.85 |

  6. Calculate the number of characters for each party and restricted party word, and then calculate the forward and backward match factors:
    Party = ZIDAN [5 characters - 85% matching] + EMAD [4 characters - 0% matching] + ABDELHADIE [10 characters - 94% matching] = 19 characters

    Restricted Party = ZIDA [4 characters - 85% matching] + ABDELHADI [9 characters - 94% matching] = 13 characters

    Forward Match Factor = Total Number of Matching Characters of Party/Total Number of Characters of Party
    = (0.85 * 5 + 0 * 4 + 0.94 * 10)/ (5 + 4 + 10)
    = (4.25 + 0 + 9.4)/19
    = 13.65/19
    = 0.71

    Backward Match Factor = Total Number of Matching Characters of Restricted Party /Total Number of Characters of Restricted Party
    = (0.85 * 4 + 0.94 * 9)/ (4 + 9)
    = (3.4 + 8.46)/13
    = 11.86/13
    = 0.91

    Here, the party is considered a match to the restricted party in the backward direction, because the match factor of 0.91 is greater than the 0.85 threshold. It is not considered a match in the forward direction, because 0.71 is less than the 0.85 threshold. If the party and restricted party data were the other way around, the forward match factor would be 0.91 and the backward match factor would be 0.71. You can use the 'Both' option as the match direction to perform both forward and backward matching and take the better of the two results.
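The worked example above can be reproduced with the following Python sketch. It is a sketch under stated assumptions, not GTM's implementation: all function names are hypothetical, names are split on spaces only (the abbreviation and punctuation steps are omitted), each restricted party word is paired with at most one party word in descending order of coefficient (as in the EMAD case above), and values are truncated to two decimal places because that reproduces the figures shown here (6/7 appears as 0.85). Whether GTM itself truncates or rounds is not stated in this section.

```python
import math


def bigrams(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]


def trunc2(value):
    # Assumption: truncate to two decimals to mirror the example's figures.
    return math.floor(value * 100) / 100


def dice(party_word, restricted_word):
    b1, b2 = bigrams(party_word), bigrams(restricted_word)
    nc = len(set(b1) & set(b2))  # assumption: distinct common bigrams
    if nc == 0:
        return 0.0
    return trunc2(2 * nc / (nc + max(len(b1), len(b2))))


def match_factors(party, restricted):
    p_words, r_words = party.split(), restricted.split()
    # Score every word pair, best coefficient first.
    pairs = sorted(
        ((dice(p, r), i, j)
         for i, p in enumerate(p_words)
         for j, r in enumerate(r_words)),
        reverse=True,
    )
    p_best, r_best = {}, {}
    for coef, i, j in pairs:
        # Keep a pair only if it is a match and neither word is taken yet.
        if coef > 0 and i not in p_best and j not in r_best:
            p_best[i] = r_best[j] = coef
    # Weight each word's coefficient by its character count.
    forward = (sum(p_best.get(i, 0.0) * len(w) for i, w in enumerate(p_words))
               / sum(len(w) for w in p_words))
    backward = (sum(r_best.get(j, 0.0) * len(w) for j, w in enumerate(r_words))
                / sum(len(w) for w in r_words))
    return forward, backward


forward, backward = match_factors("ZIDAN EMAD ABDELHADIE", "ZIDA ABDELHADI")
print(trunc2(forward), trunc2(backward))  # 0.71 0.91
print(max(forward, backward) >= 0.85)     # True ('Both' direction)
```

With a threshold of 0.85, only the backward factor passes, so taking the larger of the two factors, as the 'Both' option does, reports a match.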
