3.1 Jaro-Winkler
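This algorithm gives a high score to two strings when:
- They contain the same characters, and each matching character lies within a certain distance of its counterpart in the other string.
- The order of the matching characters is the same.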
To be precise, the distance within which a matching character is searched for is one less than half the length of the longer string, rounded down. So if the longer string has a length of five, the distance is floor(5/2) - 1 = 1, and a character at the start of string 1 must be found at the first or second position of string 2. This is considered a good match. In addition, the Winkler variant adds a bonus for a shared prefix, so the algorithm is directional and gives a higher score when the strings match from the beginning.
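The distance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular library: the helper names `match_window` and `count_matches` are hypothetical, and the sketch assumes the commonly used window of floor(longest / 2) - 1.

```python
def match_window(s1: str, s2: str) -> int:
    """Largest index offset at which two equal characters still count as a match."""
    # Assumption: window = floor(longest length / 2) - 1, never below zero.
    return max(max(len(s1), len(s2)) // 2 - 1, 0)


def count_matches(s1: str, s2: str) -> int:
    """Count characters of s1 that find an unused equal character in s2 inside the window."""
    window = match_window(s1, s2)
    used = [False] * len(s2)  # each character of s2 may be matched only once
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                matches += 1
                break
    return matches


print(match_window("crate", "atcr"))   # window for a longest length of five
print(count_matches("crate", "crat"))  # characters that line up closely
print(count_matches("crate", "atcr"))  # same letters, shifted too far
```

With a window of 1, "crate" and "crat" share four matching characters, while "crate" and "atcr" share none, because every common letter sits two positions away from its counterpart.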
textdistance.jaro_winkler("mes", "messi")  # 0.86
textdistance.jaro_winkler("crate", "crat")  # 0.96
textdistance.jaro_winkler("crate", "atcr")  # 0.0
In the first case, the strings match from the beginning, so a high score is given. Similarly, in the second case only one character is missing, and it is at the end of string 2, so a very high score is given. In the third case, string 2 is the second-case string with its last two characters moved to the front ("crat" becomes "atcr"). No character then lies within the allowed distance of its counterpart, resulting in 0% similarity.
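To see where such scores come from, here is a rough pure-Python sketch of the textbook Jaro-Winkler computation. It is an illustration under stated assumptions, not the `textdistance` implementation: the function name and `prefix_weight` parameter are chosen here, the prefix bonus uses the usual scaling of 0.1 over at most four characters, and the bonus is applied unconditionally, whereas some libraries apply it only above a similarity threshold.

```python
def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Textbook Jaro similarity plus the Winkler common-prefix bonus (sketch)."""
    if not s1 or not s2:
        return 0.0

    # Matching characters must lie within this index distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    flags1 = [False] * len(s1)
    flags2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not flags2[j] and s2[j] == ch:
                flags1[i] = flags2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    m1 = [c for c, f in zip(s1, flags1) if f]
    m2 = [c for c, f in zip(s2, flags2) if f]
    transpositions = sum(a != b for a, b in zip(m1, m2)) // 2

    jaro = (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3

    # Winkler bonus: reward a shared prefix of up to four characters.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1 - jaro)


print(round(jaro_winkler("crate", "crat"), 2))  # 0.96
print(jaro_winkler("crate", "atcr"))            # 0.0
```

For ("mes", "messi") the plain Jaro part of this sketch is about 0.87 and the prefix bonus raises it to about 0.91, so exact figures can differ between implementations, versions, and settings.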