Matching

Individual and entity matching is centered on individual and entity names respectively. Other items of data, such as associated countries and cities, are used to strengthen a possible match.

Match rule groups are placed in the following order:

  • Individual name match groups
  • Aircraft name match groups
  • Vessel name match groups
  • Entity name match groups

The following general notes describe the approach to matching:

  • Matches are ranked by how well the names match. An exact name match ranks highest. The lowest rank is a pair of names that match only loosely and have different name structures. Matches are ranked further by how well additional information (such as city, country, and date of birth) agrees between the records.
  • TBAML allows for various levels of name match, including, but not limited to:
    • Name variation recognition. This is carried out by name standardization. For example, more than 20 variations of ‘Mohammed’ (Muhamad, Mohammad, Mohamed and so on) are recognized as the same name and are substituted with ‘Mohammed’ when matching. This is used mainly for given names, but is also applied when matching whole names.
    • Allowances for name abbreviation and initials. For example, ‘Pete’ is a possible match to ‘Peter’, and ‘J’ is a possible match to ‘John’.
    • Allowances for typographical errors and transliteration differences. For example, ‘Abdool’ is a possible match to ‘Abdul’, even if the variants are not standardized.
    • Allowances for names being out of order or structured differently. For example, ‘Mohammed Abbas Al-Tikriti’ can be matched with ‘Mohammed Al-Tikriti Abbas’.
    • Allowance for additional names. For example, ‘Juan Carlos Ferreira’ can be matched with ‘Juan Ferreira’.
    • Allowance for names being split differently. For example, ‘Xiao Jian’ is a match to ‘Xiaojian’.
  • TBAML attempts to prevent false positives by various means, including, but not limited to, the following methods:
    • Backing up typo tolerance with Metaphone (phonetic) matching. For example, ‘Mary’ and ‘Mark’ are not considered a match because they sound different, although they differ by only one character.
    • Backing up typo tolerance with consideration of the percentage of characters that are different. For example, the initials ‘A’ and ‘E’ are not considered a match: they differ by only one character, but that character is the whole of each name.
    • Considering the different significance and commonality of name tokens. For example, if name qualifiers such as ‘Al’ are shared between two Arabic names, this is not as significant as if an uncommon name such as ‘Abbas’ is shared.
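
The name-matching allowances and false-positive guards described above can be illustrated with a minimal Python sketch. This is not TBAML's implementation. The variant table, the function names, the 0.34 difference threshold, and the consonant-skeleton check (a crude stand-in for Metaphone) are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names, thresholds, and data are assumptions.

# Hypothetical variant table. A real reference list would be far larger.
NAME_VARIANTS = {
    "muhamad": "mohammed",
    "mohammad": "mohammed",
    "mohamed": "mohammed",
}


def standardize(token: str) -> str:
    """Lowercase a name token and replace known variants with one form."""
    token = token.lower()
    return NAME_VARIANTS.get(token, token)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def consonant_skeleton(token: str) -> str:
    """Crude phonetic key (a stand-in for Metaphone, not Metaphone itself):
    keep the first letter, drop later vowels, collapse repeated letters."""
    out = [token[0]]
    for ch in token[1:]:
        if ch in "aeiouy":
            continue
        if ch != out[-1]:
            out.append(ch)
    return "".join(out)


def tokens_match(a: str, b: str, max_diff_ratio: float = 0.34) -> bool:
    """Compare two single name tokens."""
    a, b = standardize(a), standardize(b)
    if a == b:
        return True
    # Abbreviations and initials: one token is a prefix of the other.
    if a.startswith(b) or b.startswith(a):
        return True
    # Typo tolerance, guarded by the share of differing characters
    # and by the phonetic key.
    ratio = edit_distance(a, b) / max(len(a), len(b))
    return ratio <= max_diff_ratio and consonant_skeleton(a) == consonant_skeleton(b)


def names_match(name_a: str, name_b: str) -> bool:
    """Compare whole names, ignoring token order and extra names."""
    a = [standardize(t) for t in name_a.split()]
    b = [standardize(t) for t in name_b.split()]
    if not a or not b:
        return False
    # Names split differently, such as 'Xiao Jian' and 'Xiaojian'.
    if "".join(a) == "".join(b):
        return True
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    remaining = list(longer)
    for token in shorter:
        hit = next((c for c in remaining if tokens_match(token, c)), None)
        if hit is None:
            return False
        remaining.remove(hit)  # each token may be used only once
    return True
```

In this sketch, names_match("Juan Carlos Ferreira", "Juan Ferreira") and tokens_match("Abdool", "Abdul") return True, while tokens_match("Mary", "Mark") and tokens_match("A", "E") return False. The sketch does not weight tokens by significance or commonality, so a shared qualifier such as ‘Al’ counts the same as a shared uncommon name.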

Note:

It may be advisable to tune the set of match rules that are activated. In particular, you may wish to activate or deactivate some of the rules lower in the list, which produce the weakest name matches. Factors affecting the usefulness of these rules include:
  • The policies of the organization.
  • The quality of the transaction data.
  • The provenance of the transaction data.

For example, Asian and Arabic names may be subject to more typographical and name ordering issues than other names. Where the data contains many of these names, the lower-strength rules may identify more possible matches. The organization may want to review some or all of these matches as a matter of policy, or it may consider them too weak to review.

The required rules are easily activated or deactivated as needed in TBAML.
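
As a conceptual illustration of ordered match rules that can be switched on and off, the following Python sketch evaluates rules from strongest to weakest and reports the first active rule that fires. It does not show TBAML's configuration interface. The rule names, the MatchRule class, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MatchRule:
    name: str                               # hypothetical rule label
    predicate: Callable[[str, str], bool]   # returns True when the rule fires
    enabled: bool = True


def _norm(name: str) -> str:
    """Lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


# Ordered from strongest to weakest, mirroring the idea that rules
# lower in the list give the weakest matches.
RULES: List[MatchRule] = [
    MatchRule("exact_name", lambda a, b: _norm(a) == _norm(b)),
    MatchRule(
        "same_tokens_any_order",
        lambda a, b: sorted(_norm(a).split()) == sorted(_norm(b).split()),
    ),
    MatchRule(
        "shared_token",
        lambda a, b: bool(set(_norm(a).split()) & set(_norm(b).split())),
    ),
]


def strongest_match(a: str, b: str, rules: List[MatchRule]) -> Optional[str]:
    """Return the name of the first enabled rule that fires, or None."""
    for rule in rules:
        if rule.enabled and rule.predicate(a, b):
            return rule.name
    return None


def set_rule_enabled(rules: List[MatchRule], name: str, enabled: bool) -> None:
    """Activate or deactivate a rule by name."""
    for rule in rules:
        if rule.name == name:
            rule.enabled = enabled
            return
    raise KeyError(name)
```

With all rules enabled, strongest_match("Juan Carlos", "Juan Ferreira", RULES) returns "shared_token". After set_rule_enabled(RULES, "shared_token", False), the same call returns None, which corresponds to an organization deciding that the weakest matches are not worth reviewing.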