1.3.4.9.10 Match Transformation: Last N Words

The Last N Words transformation restricts matching to the last N words of a value, either when clustering or when performing comparisons.

Use the Last N Words transformation when you are matching on an identifier whose values contain many words, but where the words towards the end of the value are more useful for matching than the words at the beginning. It is often used when matching company names, so that branch names or other subsidiary words appended to a company name are considered when matching.

The following table describes the configuration options:

Configuration    Description

Options          Specify the following options:

  • Delimiters Reference Data: specifies a standard set of characters used to split the value into words before the last N words are taken. Type: Reference Data. Default value: *Delimiters.

  • Delimiter characters: specifies an additional set of characters used to split the value into words before the last N words are taken. Type: Free text. Default value: Space.

  • Number of words: the number of words (counted from the right) that you want to keep when transforming values for the identifier. Type: Integer. Default value: None.
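
The options above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name last_n_words, its parameter names, and the contents of ASSUMED_REFERENCE_DELIMITERS (a stand-in for the *Delimiters Reference Data, whose actual contents are not listed here) are all assumptions made for illustration. The sketch also assumes that the retained words are rejoined with a single space.

```python
import re

# Hypothetical stand-in for the *Delimiters Reference Data.
# The real contents of that reference data are not given in this section.
ASSUMED_REFERENCE_DELIMITERS = " ,.;:-"


def last_n_words(value, n,
                 reference_delimiters=ASSUMED_REFERENCE_DELIMITERS,
                 extra_delimiters=""):
    """Return the last n words of value, counted from the right."""
    if n < 1:
        raise ValueError("n must be a positive integer")

    # Both delimiter sources are combined into one set of split characters.
    delimiters = set(reference_delimiters) | set(extra_delimiters)
    if not delimiters:
        return value

    # Build a character class such as [\ ,\-\.:;]+ and split on runs of it.
    pattern = "[" + re.escape("".join(sorted(delimiters))) + "]+"
    words = [word for word in re.split(pattern, value) if word]

    # Values with n words or fewer are kept whole.
    # Assumption: retained words are rejoined with a single space.
    return " ".join(words[-n:])


print(last_n_words("Barclays Bank Plymouth Branch", 2))  # Plymouth Branch
print(last_n_words("Henkel Loctite", 2))                 # Henkel Loctite
```

Passing extra_delimiters plays the role of the Delimiter characters option: any characters given there are added to the reference set before the value is split.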

Example Configuration

In this example, the Last N Words transformation is used within a Character Edit Distance comparison (see Comparison: Character Edit Distance) to match company names, where values frequently contain extra words not required for matching.

Delimiters Reference Data: *Delimiters

Delimiter characters: None

Number of words: 2

Example Transformation

The following table shows examples of transformations using the above configuration:

Table 1-82 Example Transformations for Last N Words

Value                               Transformed Value

Barclays Bank Plymouth Branch       Plymouth Branch

Barclays Bank Coventry Branch       Coventry Branch

Henkel Loctite                      Henkel Loctite

Henkel Loctite Adhesives Limited    Adhesives Limited

Wingford Confectioners              Wingford Confectioners
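
To show why the transformation matters inside an edit distance comparison, the following Python sketch compares two values before and after keeping the last two words. It rests on several assumptions: keep_last_words is a hypothetical helper that splits on whitespace only, edit_distance is a generic textbook Levenshtein distance rather than the product's Character Edit Distance comparison, and the second input value is invented for illustration.

```python
def keep_last_words(value, n):
    """Hypothetical helper: keep the last n whitespace-separated words."""
    return " ".join(value.split()[-n:])


def edit_distance(a, b):
    """Textbook Levenshtein distance (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


left = "Henkel Loctite Adhesives Limited"
right = "Loctite Adhesives Limited"  # invented value for illustration

# Raw values differ by the leading word, so the distance is non-zero.
print(edit_distance(left, right))

# After the transformation both values reduce to "Adhesives Limited".
print(edit_distance(keep_last_words(left, 2), keep_last_words(right, 2)))  # 0
```

With Number of words set to 2, differences in the leading words no longer contribute to the distance, while differences in the trailing words (for example Plymouth Branch against Coventry Branch) still do.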