1.3.4.9.17 Match Transformation: Replace

The Replace transformation uses a Reference Data Map to standardize data for clustering or matching.

This transformation works in exactly the same way as the main Replace processor in EDQ. This help page describes the typical use of the Replace transformation when matching.

The Replace transformation is useful for overcoming the variations of the same value that users often enter in free-text fields.

The same value is often represented in a number of different ways. For example, "Bill", "William", and "Billy" are all forms of the same first name, and "Hse", "House", and "Ho." are all forms of the word "House". A Reference Data Map allows such synonyms to be standardized to a single representation.

For example, the following entries could exist in a Reference Data Map, and be used to standardize first names for the purposes of matching:

Table 1-89 Reference Data Map Values

Value     Standardization
Bill      William
Billy     William
Willy     William
Mike      Michael
Mick      Michael
Mickey    Michael
Dave      David
Jim       James
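The lookup described above can be sketched in Python. This is only an illustration of the idea, not how EDQ implements it: the names NAME_MAP and standardize are hypothetical, and the sketch assumes whole-value matching with an optional case-insensitive comparison.

```python
# Entries copied from Table 1-89; the dict name is hypothetical.
NAME_MAP = {
    "Bill": "William",
    "Billy": "William",
    "Willy": "William",
    "Mike": "Michael",
    "Mick": "Michael",
    "Mickey": "Michael",
    "Dave": "David",
    "Jim": "James",
}


def standardize(value, ref_map, ignore_case=True):
    """Return the standardized form of value, or value itself if unmapped."""
    if ignore_case:
        # Compare on lower-cased keys so "billy" and "Billy" both match.
        folded = {key.lower(): std for key, std in ref_map.items()}
        return folded.get(value.lower(), value)
    return ref_map.get(value, value)


print(standardize("billy", NAME_MAP))   # mapped value
print(standardize("Robert", NAME_MAP))  # not in the map, returned unchanged
```

Values that do not appear in the lookup column pass through unchanged, so the map only needs to list the variants, not the standard forms themselves.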

The following table describes the configuration options:

Configuration   Description
Options         Specify the following options:

  • Reference data: matches the attribute values against the lookup column in the map. Where there is a match, the matching value is replaced by the value in the right-hand column. Type: Reference Data. Default value: None.

  • Match longest value first?: controls which replacement to perform when more than one entry in the map matches the value in a Starts With, Ends With, or Contains replacement. Type: Yes/No. Default value: No.

  • Ignore case?: determines whether or not to ignore case when matching the lookup column of the map. Type: Yes/No. Default value: Yes.

  • Match by: drives how to match the map, and therefore which part of the original value to replace. Type: Selection (Whole Value/Starts With/Ends With/Contains/Delimiter Match). Default value: Whole Value.

  • Delimiters: when matching values to the map by splitting the data using delimiters, this allows you to specify the delimiter characters to use. Type: Free text entry. Default value: Space.
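To illustrate why Match longest value first? can matter, the following Python sketch models a Starts With replacement. It is an assumption-laden illustration, not EDQ's implementation: the helper replace_starts_with is hypothetical, and the sketch assumes that entries are otherwise tried in map order and that only the matched prefix is replaced.

```python
def replace_starts_with(value, ref_map, longest_first=False, ignore_case=True):
    """Replace the first map key found at the start of value (hypothetical helper)."""
    keys = list(ref_map)  # assumption: entries are tried in map order by default
    if longest_first:
        # Try the longest lookup values first so the most specific entry wins.
        keys.sort(key=len, reverse=True)
    probe = value.lower() if ignore_case else value
    for key in keys:
        candidate = key.lower() if ignore_case else key
        if probe.startswith(candidate):
            # Assumes lower-casing preserves length (true for ASCII input).
            return ref_map[key] + value[len(key):]
    return value


ref_map = {"Will": "William", "Willy": "William"}
short_first = replace_starts_with("Willy", ref_map)
longest = replace_starts_with("Willy", ref_map, longest_first=True)
print(short_first, longest)
```

Under these assumptions, both "Will" and "Willy" match the start of "Willy". Trying "Will" first leaves a stray "y" after the replacement, whereas trying the longest entry first replaces the whole value.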

Example

In this example, the Replace transformation is used to standardize first names for the purpose of matching.

Example configuration

Reference Data:

Table 1-90 Example Reference Data

Value     Map       Active
Bill      William   Yes
Billy     William   Yes
Willy     William   Yes
Will      William   Yes
Mike      Michael   Yes
Micheal   Michael   No
Mickey    Michael   Yes
Dave      David     Yes
Steven    Stephen   Yes
Steve     Stephen   Yes
Jim       James     Yes

Match longest value first?: No

Ignore case?: Yes

Match by: Delimiter Match

Delimiters: <space>

Example transformations

The following table shows example Replace transformations using the above configuration:

Table 1-91 Example Transformations for Replace

Value           Transformed Value
Steven Lewis    Stephen Lewis
Stephen Lewis   Stephen Lewis
David Stevens   David Stevens
David Steven    David Stephen
Mike Davis      Michael Davis
Micheal Lewis   Micheal Lewis
Mickey Lewis    Michael Lewis
Jim Jones       James Jones
James Jones     James Jones
Bill Taylor     William Taylor
Will Taylor     William Taylor
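The transformations in Table 1-91 can be reproduced with a short Python sketch. It is illustrative only and rests on two assumptions drawn from the example data rather than from a statement of EDQ's behavior: each space-delimited token is looked up separately (as in a Delimiter Match), and map rows whose Active value is No are ignored, which is why "Micheal Lewis" is left unchanged. The names REFERENCE_DATA, build_lookup, and replace_tokens are hypothetical.

```python
# (value, map, active) rows copied from Table 1-90.
REFERENCE_DATA = [
    ("Bill", "William", True),
    ("Billy", "William", True),
    ("Willy", "William", True),
    ("Will", "William", True),
    ("Mike", "Michael", True),
    ("Micheal", "Michael", False),
    ("Mickey", "Michael", True),
    ("Dave", "David", True),
    ("Steven", "Stephen", True),
    ("Steve", "Stephen", True),
    ("Jim", "James", True),
]


def build_lookup(rows, ignore_case=True):
    """Build a dict from the active rows only (assumption: inactive rows are skipped)."""
    return {
        (value.lower() if ignore_case else value): mapped
        for value, mapped, active in rows
        if active
    }


def replace_tokens(value, lookup, delimiter=" ", ignore_case=True):
    """Split value on the delimiter and replace each token found in the lookup."""
    result = []
    for token in value.split(delimiter):
        key = token.lower() if ignore_case else token
        result.append(lookup.get(key, token))
    return delimiter.join(result)


lookup = build_lookup(REFERENCE_DATA)
for name in ["Steven Lewis", "David Stevens", "David Steven", "Micheal Lewis"]:
    print(name, "->", replace_tokens(name, lookup))
```

Because each token is compared as a whole, "Stevens" does not match the "Steven" entry, while "Steven" is replaced wherever it appears in the value, including as a last name.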