1.3.4.9.7 Match Transformation: Denoise

The Denoise transformation strips 'noise' characters from values, either when clustering or when comparing them, in the same way as the main Denoise processor. This improves matching accuracy, because noise characters make it harder to find matching records. For example, the values "Castle (Investments) Ltd" and "Castle Investments Ltd" are a strong match, but unless the parentheses are removed from the former value, the two values have a character edit distance of 2.
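The edit distance figure above can be checked with a short sketch. This is not the product's implementation; it assumes "character edit distance" means the standard Levenshtein distance, and the names `edit_distance` and `NOISE` are hypothetical, introduced only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


raw_a = "Castle (Investments) Ltd"
raw_b = "Castle Investments Ltd"

# Hypothetical noise set for this illustration: the two parentheses.
NOISE = "()"
strip_noise = str.maketrans("", "", NOISE)

distance_before = edit_distance(raw_a, raw_b)
distance_after = edit_distance(raw_a.translate(strip_noise),
                               raw_b.translate(strip_noise))

print(distance_before)  # 2
print(distance_after)   # 0
```

With the parentheses removed, the two values become identical, so a comparison that would otherwise see two edits sees none.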

Use the Denoise transformation when matching records on an identifier whose values were entered in a free text field. Free text fields allow the same data to be entered in many formats, and they invite typographical errors, including the insertion of 'noise' characters such as ( and ). The Denoise transformation allows matching to overcome such errors.

The following table describes the configuration options:

Configuration Description

Options

Specify the following options:

  • Noise characters Reference Data: list of noise values (characters or text Strings). Type: Reference Data. Default value: *Noise Characters.

  • Noise characters: additional noise characters. Type: Free text. Default value: None.

    Note: Each character entered here is treated as an additional, individual noise character. The value is not treated as a whole text String to be removed where it appears.
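The difference between the two options can be sketched as follows. This is a minimal illustration, not the product's code: the function `denoise` and its parameters are hypothetical, and the sketch assumes that Reference Data entries are removed as whole values, that removal is case-sensitive, and that longer entries are removed before shorter ones.

```python
def denoise(value: str, reference_noise: list[str], extra_chars: str = "") -> str:
    """Strip noise from a value (illustrative sketch only).

    reference_noise: stands in for the Reference Data list; each entry
        (a single character or a longer string) is removed as a whole.
    extra_chars: stands in for the free text option; every character in
        it is removed individually, wherever it occurs.
    """
    # Assumption: remove longer entries first so that "***" is handled
    # before a shorter overlapping entry such as "*".
    for noise in sorted(reference_noise, key=len, reverse=True):
        if noise:
            value = value.replace(noise, "")
    if extra_chars:
        value = value.translate(str.maketrans("", "", extra_chars))
    return value


# "Ltd" supplied as a Reference Data entry: removed as one string.
as_reference = denoise("Castle Ltd", ["Ltd"])

# "Ltd" supplied as free text: 'L', 't' and 'd' are each removed.
as_free_text = denoise("Castle Ltd", [], "Ltd")

print(repr(as_reference))  # 'Castle '
print(repr(as_free_text))  # 'Casle '
```

The second call shows why the note matters: entering a word in the free text option removes each of its letters everywhere in the value, not the word itself.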

Example

Example configuration

In this example, the Denoise transformation is used to strip noise characters from company names when matching. The following noise characters are used: & + ( ) - *

Example transformations

The following table shows examples of transformations using the above configuration:

Table 1-79 Example Transformations for Denoise

Value                                    Transformed Value
Castle (Investments) Ltd                 Castle Investments Ltd
Castle Investments Ltd                   Castle Investments Ltd
Ipswich & Norwich Co-op                  Ipswich Norwich Coop
Ipswich + Norwich Co-operative           Ipswich Norwich Cooperative
Barclays Bank - Cambridge                Barclays Bank Cambridge
Barclays Bank (Cambridge)                Barclays Bank Cambridge
George & Sons ***in administration***    George Sons in administration
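The example transformations can be reproduced with a short sketch using the six noise characters from the example configuration. The function `denoise_for_match` is hypothetical. Note one assumption: removing a character such as & that sits between two spaces leaves a double space, so the sketch also collapses runs of whitespace in order to match the values as printed in the table. Whether the product does this itself, or the table simply displays the values with collapsed spacing, is not stated here.

```python
import re

# Noise characters from the example configuration.
NOISE_CHARS = "&+()-*"
_STRIP_TABLE = str.maketrans("", "", NOISE_CHARS)


def denoise_for_match(value: str) -> str:
    """Remove each noise character, then tidy whitespace (assumed step)."""
    stripped = value.translate(_STRIP_TABLE)
    # Assumption: collapse repeated whitespace so the result matches
    # the single-spaced values shown in the table.
    return re.sub(r"\s+", " ", stripped).strip()


examples = {
    "Castle (Investments) Ltd": "Castle Investments Ltd",
    "Castle Investments Ltd": "Castle Investments Ltd",
    "Ipswich & Norwich Co-op": "Ipswich Norwich Coop",
    "Ipswich + Norwich Co-operative": "Ipswich Norwich Cooperative",
    "Barclays Bank - Cambridge": "Barclays Bank Cambridge",
    "Barclays Bank (Cambridge)": "Barclays Bank Cambridge",
    "George & Sons ***in administration***": "George Sons in administration",
}

results = {raw: denoise_for_match(raw) for raw in examples}
for raw, transformed in results.items():
    print(f"{raw!r} -> {transformed!r}")
```

After the transformation, the first two values are identical, as are the two Barclays Bank values, which is what allows them to cluster and compare as matches.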