1.3.4.9.8 Match Transformation: First N Characters

The First N Characters transformation allows matching to ignore the tail end of values when performing comparisons, by truncating each value to its first N characters, counted from the left of the value.

This is similar to using the main Trim Characters processor, trimming to the first few characters of a value.

Use the First N Characters transformation when you want to cluster using the first few characters of an identifier, or if you are matching on an identifier where the tail end of the value might be 'noise'. This is often used in a secondary match rule, using an Exact String Match comparison (see Comparison: Exact String Match), to find possible matches where the key part of the identifier is the same, but where the remaining parts are very different, and therefore hard to find using other comparisons. For example, when matching addresses, if the first 8 characters of the first line of an address are the same, there is quite a strong possibility of a match, even if one of the values may contain much more data than the other.
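The clustering use described above can be illustrated with a minimal Python sketch. This is not how the product implements clustering. The function name cluster_by_prefix and the product codes are hypothetical and chosen only for illustration. The sketch groups identifiers that share the same first N characters, so that values with very different tails land in the same cluster.

```python
from collections import defaultdict


def cluster_by_prefix(values, n):
    """Group values by their first n characters (hypothetical helper)."""
    clusters = defaultdict(list)
    for value in values:
        # The cluster key is the value truncated to its first n characters.
        clusters[value[:n]].append(value)
    return dict(clusters)


# Hypothetical identifiers whose tails differ but whose key part is the same.
codes = ["AB1234-X", "AB1234-RED-L", "CD9876-01"]
clusters = cluster_by_prefix(codes, 6)

for key, members in clusters.items():
    print(key, members)
```

Under these assumptions, the first two codes share the key AB1234 and fall into one cluster, while the third code forms a cluster of its own.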

The following table describes the configuration options:

Configuration Description

Options

Specify the following options:

  • Number of characters: The number of characters (counted from the left) that you want to keep and use when transforming values for an identifier. Type: Integer. Default value: 1.

  • Characters to ignore: An optional number of characters (counted from the left of the value) to skip before counting the characters to keep in the transformed value. This allows you to skip over common prefixes before transforming values. Type: Integer. Default value: 0.

Note:

Whitespace characters such as spaces and carriage returns are counted as characters like any others, if they exist in the values. You may want to use a Trim Whitespace transformation before using this transformation, in order to ensure that you are selecting data characters.
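As a rough illustration of how the two options and the whitespace note interact, the following Python sketch models the transformation as a simple slice. It is an assumption based on the option descriptions above, not the product's actual implementation. The function name first_n_characters, its parameter names, and the sample values are hypothetical.

```python
def first_n_characters(value, number_of_characters=1, characters_to_ignore=0):
    """Sketch of the transformation: skip a prefix, then keep N characters.

    Assumes the ignored characters are skipped first and the kept
    characters are counted from that point, as the options describe.
    """
    start = characters_to_ignore
    return value[start:start + number_of_characters]


# Skip a hypothetical 4-character prefix ("ACC-"), then keep 5 characters.
account_key = first_n_characters("ACC-12345-UK", 5, 4)

# Leading whitespace is counted like any other character...
untrimmed = first_n_characters("  Homesteads", 8)

# ...so trimming whitespace first selects data characters instead.
trimmed = first_n_characters("  Homesteads".strip(), 8)

print(repr(account_key), repr(untrimmed), repr(trimmed))
```

In this sketch the untrimmed value yields two spaces followed by "Homest", whereas the trimmed value yields "Homestea", which is why trimming whitespace beforehand may be worthwhile.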

Example configuration

In this example, the First N Characters transformation is used to match the first line of addresses, where it is known that some of these first lines contain more information than just the premise name.

Number of characters: 8

Characters to ignore: 0

Example transformations

The following table shows examples of transformations using the above configuration:

Table 1-80 Example Transformations for First N Characters

Value                                 Transformed Value
------------------------------------  -----------------
Homesteads, 145 Herring Way           Homestea
Homesteads                            Homestea
135 Burbage Road, Minster, MI5 6DF    135 Burb
135 Burbage Road                      135 Burb
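The example transformations can be reproduced with a short Python sketch that keeps the first 8 characters and ignores none, followed by an exact comparison of the transformed values. The helper names first_n and is_possible_match are hypothetical, and the exact comparison is only a stand-in for an Exact String Match comparison, not the product's own code.

```python
def first_n(value, n=8):
    """Keep the first n characters of a value (no characters ignored)."""
    return value[:n]


def is_possible_match(a, b, n=8):
    """Compare two values exactly after truncating both to n characters."""
    return first_n(a, n) == first_n(b, n)


examples = [
    "Homesteads, 145 Herring Way",
    "Homesteads",
    "135 Burbage Road, Minster, MI5 6DF",
    "135 Burbage Road",
]

transformed = [first_n(value) for value in examples]
for value, result in zip(examples, transformed):
    print(f"{value!r} -> {result!r}")
```

With this sketch, the two Homesteads values compare as equal after transformation, as do the two Burbage Road values, even though one value in each pair contains much more data than the other.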