1.3.4.8.4 Comparison: Character Match Percentage

The Character Match Percentage comparison determines how closely two values (String or String Array) match each other. It calculates the Character Edit Distance between the two String values and relates that distance to the length, by character count, of either the longer or the shorter of the two values.

Use the Character Match Percentage comparison to find matches where values are of varying lengths (such as names), and there might be spelling mistakes in the original values. For example, when matching company names, the values "ABC" and "BBC" have a Character Edit Distance of 1, and might be deemed a close match by other comparisons. However, their Character Match Percentage is only 66% (2 of 3 characters), whereas the Character Match Percentage of "Oracle" and "Oracles", which also have a Character Edit Distance of 1, is 85% (6 of 7 characters), indicating a stronger match.
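The calculation described above can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes that Character Edit Distance behaves like the standard Levenshtein distance, and that the percentage is (longer length minus edit distance) divided by the longer length, truncated to a whole number. That formula is inferred from the "ABC"/"BBC" figures. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def character_match_percentage(a: str, b: str) -> int:
    """Assumed formula: (longer - distance) / longer, truncated to a whole percent."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 100  # assumption for this sketch: two empty strings are identical
    return (longer - edit_distance(a, b)) * 100 // longer


print(character_match_percentage("ABC", "BBC"))  # 66
```

Because the denominator grows with the length of the values, a single edit costs a short value far more than a long one, which is why this comparison suits values of varying lengths.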

This comparison supports the use of result bands.

The following table describes the configuration options:

| Option | Type | Description | Default Value |
|---|---|---|---|
| Match No Data pairs? | Yes/No | This option determines the result of a comparison when it compares two No Data (Null, or containing only whitespace characters) values for an identifier. If set to No, the comparison will give a 'no data' result when comparing a No Data value against another No Data value. If set to Yes, the comparison will give a full match (a Character Match Percentage of 100%) when comparing a No Data value against another No Data value. A 'no data' result will only be returned if a No Data value is compared against a populated value. | No |
| Ignore case? | Yes/No | Sets whether or not to ignore case when comparing values. For example, if case is ignored, "Oracle Corporation" will match "ORACLE CORPORATION" with a Character Match Percentage of 100%. | Yes |
| Relate to shorter input? | Yes/No | This option drives the calculation made by the Character Match Percentage comparison. If set to Yes, the result is calculated as the percentage of characters from the shorter of the two inputs (by character count) that match the longer input. If set to No, the result is calculated as the percentage of characters from the longer of the two inputs (by character count) that match the shorter input. | No |
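The effect of the Match No Data pairs? and Ignore case? options can be sketched as below. This is an illustrative sketch under stated assumptions: the `compare` function and its parameter names are hypothetical, `None` stands in for a 'no data' result, the edit distance is assumed to be Levenshtein distance, and Relate to shorter input? is left at No (the longer input is the denominator), since the exact calculation for the other setting is not given here.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def is_no_data(value: Optional[str]) -> bool:
    """No Data: Null, or containing only whitespace characters."""
    return value is None or value.strip() == ""


def compare(a: Optional[str], b: Optional[str],
            match_no_data_pairs: bool = False,
            ignore_case: bool = True) -> Optional[int]:
    """Return a whole-number percentage, or None to represent a 'no data' result."""
    if is_no_data(a) and is_no_data(b):
        # Two No Data values: full match only when the option is set to Yes.
        return 100 if match_no_data_pairs else None
    if is_no_data(a) or is_no_data(b):
        # Assumption: No Data against a populated value is always 'no data'.
        return None
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    longer = max(len(a), len(b))
    return (longer - edit_distance(a, b)) * 100 // longer


print(compare("Oracle Corporation", "ORACLE CORPORATION"))  # 100
print(compare(None, "   "))                                 # None
print(compare(None, "   ", match_no_data_pairs=True))       # 100
```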

Example

In this example, the Character Match Percentage comparison is used to match company names. The following options are specified:

  • Match No Data pairs? = No

  • Ignore case? = Yes

  • Relate to shorter input? = No

The following transformations are added:

  1. Trim Whitespace, to remove all whitespace from values before comparing them

  2. Strip Words, using *Business Suffix Map (which includes the words 'Ltd' and 'Limited')

The following table illustrates some example comparison results using the above configuration:

Table 1-37 Example Results: Character Match Percentage

| Value A | Value B | Comparison Result |
|---|---|---|
| ABC ltd | ABC limited | 100% |
| ABC ltd | BBC | 66% |
| Fast track systems | Fastrack systems | 93% |
| BT | BTAT | 50% |
| Gemini Partners | Gemmini Partners | 93% |
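The results in Table 1-37 can be reproduced with a short sketch. It rests on several assumptions: the two-word set below is a hypothetical stand-in for the *Business Suffix Map reference data, the edit distance is Levenshtein distance, the percentage is truncated to a whole number, and suffix words are stripped before whitespace is removed so that word boundaries are still visible. The product may apply the transformations differently.

```python
# Hypothetical stand-in for the *Business Suffix Map reference data.
BUSINESS_SUFFIXES = {"ltd", "limited"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def prepare(value: str) -> str:
    """Strip suffix words, remove all whitespace, and fold case (Ignore case? = Yes)."""
    words = [w for w in value.split() if w.casefold() not in BUSINESS_SUFFIXES]
    return "".join(words).casefold()


def match_percentage(value_a: str, value_b: str) -> int:
    """Percentage relative to the longer input (Relate to shorter input? = No)."""
    a, b = prepare(value_a), prepare(value_b)
    longer = max(len(a), len(b))
    return (longer - edit_distance(a, b)) * 100 // longer


pairs = [
    ("ABC ltd", "ABC limited"),
    ("ABC ltd", "BBC"),
    ("Fast track systems", "Fastrack systems"),
    ("BT", "BTAT"),
    ("Gemini Partners", "Gemmini Partners"),
]
for value_a, value_b in pairs:
    print(f"{value_a!r} vs {value_b!r}: {match_percentage(value_a, value_b)}%")
```

For example, "Fast track systems" and "Fastrack systems" become 16 and 15 characters after preparation and differ by one edit, giving 15/16, which truncates to 93%.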