1.3.4.9.9 Match Transformation: First N Words

The First N Words transformation restricts matching to the first N words of a value, either when clustering or when performing comparisons.

Use the First N Words transformation when you are matching on an identifier whose values contain many words, but where the words towards the end of the value are less useful for matching than the words at the beginning. It is often used when matching company names, so that branch names or other subsidiary words appended to a company name are ignored when matching, even though the same words may be useful for company identification in other cases (and are therefore not stripped from the value using a Strip Words transformation). For example, keeping only the first two words allows "Barclays Bank Coventry" to match "Barclays Bank Leicester Branch".

The following table describes the configuration options:

Configuration    Description

Options          Specify the following options:

  • Delimiters Reference Data: specifies a standard set of characters used to split the value into words before the first N words are taken. Type: Reference Data. Default value: *Delimiters.

  • Delimiter characters: specifies an additional set of characters used to split the value into words before the first N words are taken. Type: Free text. Default value: Space.

  • Number of words: the number of words (counted from the left) to keep when transforming values for the identifier. Type: Integer. Default value: None.
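The effect of these options can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name first_n_words is hypothetical, the delimiters argument stands in for the combined contents of the Delimiters Reference Data and the Delimiter characters option, and treating runs of delimiters as one separator and rejoining the kept words with a single space are assumptions.

```python
def first_n_words(value, n, delimiters=" "):
    """Return the first n words of value (hypothetical helper).

    delimiters: every character in this string splits words. It stands in
    for the Delimiters Reference Data plus the Delimiter characters option.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    delimiter_set = set(delimiters)
    words = []
    current = []
    for ch in value:
        if ch in delimiter_set:
            # Assumption: consecutive delimiters produce no empty words.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))

    # Assumption: the kept words are rejoined with a single space.
    return " ".join(words[:n])


if __name__ == "__main__":
    for name in ("Barclays Bank Plymouth Branch", "Henkel Loctite"):
        print(repr(name), "->", repr(first_n_words(name, 2)))
```

A value with fewer than N words passes through unchanged apart from delimiter normalization, which is the behavior this sketch assumes for short values.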

Example configuration

In this example, the First N Words transformation is used within a Character Edit Distance comparison (see Comparison: Character Edit Distance) to match company names, where values frequently contain extra words not required for matching.

Delimiters Reference Data: *Delimiters

Delimiter characters: None

Number of words: 2
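To illustrate why this configuration helps, the following Python sketch compares two company names before and after keeping the first two words. It assumes that a character edit distance behaves like a standard Levenshtein distance (insertions, deletions and substitutions) and that words are split on whitespace. The helper names edit_distance and first_n_words are hypothetical and are not part of the product.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (illustrative assumption)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def first_n_words(value, n):
    """Keep the first n whitespace-separated words (hypothetical helper)."""
    return " ".join(value.split()[:n])


if __name__ == "__main__":
    left = "Barclays Bank Coventry"
    right = "Barclays Bank Leicester Branch"

    raw = edit_distance(left, right)
    transformed = edit_distance(first_n_words(left, 2), first_n_words(right, 2))

    print("distance on raw values:        ", raw)
    print("distance on transformed values:", transformed)
```

With Number of words set to 2, both values reduce to the same two words, so the distance between the transformed values is zero even though the raw values differ.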

Example transformations

The following table shows examples of transformations using the above configuration:

Table 1-81 Example Transformations for First N Words

Value                                                                 Transformed Value
Barclays Bank Plymouth Branch                                         Barclays Bank
Barclays Bank Coventry                                                Barclays Bank
Henkel Loctite                                                        Henkel Loctite
Henkel Loctite Adhesives Limited                                      Henkel Loctite
Wingford Confectioners                                                Wingford Confectioners
Wingford Confectioners (in administration) - contact Mr J Alexander   Wingford Confectioners