3.1 Jaro-Winkler

This algorithm gives a high score to a pair of strings when:
  • The strings contain the same characters, and each matching character lies within a certain distance of its counterpart in the other string.
  • The matching characters appear in the same order.

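To be precise, a character in one string counts as a match only if the same character occurs in the other string at most floor(n/2) - 1 positions away, where n is the length of the longer string. So if the longer string has a length of five, the allowed distance is floor(5/2) - 1 = 1, and a character at the start of string 1 must be found on or before the 2nd position of string 2. This is considered a good match. In addition, the Winkler modification raises the score of pairs that share a common prefix (conventionally up to four characters). Hence, the algorithm is directional and gives a high score when the strings match from the beginning.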
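The matching rule can be sketched in a few lines of plain Python. This is only an illustration of the distance rule described here, not the code of any particular library; the helper names match_window and matching_characters are made up for this example.

```python
def match_window(s1, s2):
    """Largest index offset at which two equal characters still count as a match."""
    longer = max(len(s1), len(s2))
    return max(longer // 2 - 1, 0)


def matching_characters(s1, s2):
    """Return (index in s1, index in s2) pairs of characters matched within the window."""
    window = match_window(s1, s2)
    used = [False] * len(s2)  # each character of s2 may be matched only once
    pairs = []
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                pairs.append((i, j))
                break
    return pairs


print(match_window("crate", "atcr"))         # 1
print(matching_characters("crate", "crat"))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(matching_characters("crate", "atcr"))  # []
```

With a longer string of length five the window is 1, so "crate" and "atcr" share four letters but none of them fall close enough to count as matches.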

For example:
  • textdistance.jaro_winkler("mes", "messi") 0.86
  • textdistance.jaro_winkler("crate", "crat") 0.96
  • textdistance.jaro_winkler("crate", "atcr") 0.0

In the first case, the strings match from the beginning, so a high score is given. Similarly, in the second case, only one character is missing, and it is at the end of string 2, so a very high score is given. In the third case, string 2 is "crat" with its last two characters moved to the front. Every shared character now lies two positions away from its counterpart, which is outside the allowed distance of one for a longest string of length five, so no characters match and the similarity is 0%.
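To see where these numbers come from, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler formulas. It assumes the usual prefix weight of 0.1 and a maximum prefix length of 4, and it is not the implementation used by textdistance; the function names jaro and jaro_winkler are local to this example.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between 0.0 and 1.0."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0  # number of matching characters
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro score plus a bonus for a shared prefix (assumed defaults: 0.1 and 4)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(jaro_winkler("crate", "crat"), 2))  # 0.96
print(round(jaro_winkler("crate", "atcr"), 2))  # 0.0
print(round(jaro("mes", "messi"), 2))           # 0.87
print(round(jaro_winkler("mes", "messi"), 2))   # 0.91
```

The second and third pairs reproduce the scores listed in the examples. The first pair comes out near 0.91 with this sketch rather than 0.86; the plain Jaro score for that pair is about 0.867, so the library may be skipping the prefix bonus for very short strings. That is an inference from the numbers, not something verified against the library's source.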