1.3.4.9.14 Match Transformation: Make Array from String

The Make Array from String transformation splits a single text value into a variable number of distinct values. This is useful when creating clusters for matching, because a cluster is created for each distinct value produced. Values that share a common word are therefore placed in the same cluster, regardless of where that word appears within the value. For example, where a Name identifier has the values 'John Simpson' and 'Simpson, J', making an array using comma and space delimiters ensures that the two records are in the same cluster ('Simpson').
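The Name example above can be sketched in a few lines of Python. This is only an illustration of the splitting idea, not the product's implementation; the function name `make_array_from_string` and its `delimiters` parameter are assumptions made for this sketch.

```python
import re

def make_array_from_string(value, delimiters=", "):
    """Split value on any of the given delimiter characters, dropping empty pieces."""
    # Build a character class so that a run of delimiters counts as one break.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part for part in re.split(pattern, value) if part]

first = make_array_from_string("John Simpson")   # ['John', 'Simpson']
second = make_array_from_string("Simpson, J")    # ['Simpson', 'J']

# Each array element acts as a cluster key, so a shared element means a shared cluster.
shared_keys = set(first) & set(second)           # {'Simpson'}
```

Because both arrays contain 'Simpson', the two records end up with at least one cluster key in common, whatever the word order in the original values.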

The Make Array from String transformation is functionally the same as the main Make Array from String processor, but is used specifically when clustering to split values into several words to use as cluster keys.

Note that Make Array from String cannot be used within comparisons.

Use the Make Array from String transformation as the last transformation when clustering to ensure that records are brought together into the same cluster if they have any word in common.

The following table describes the configuration options:

| Configuration | Description |
| --- | --- |
| Options | Specify the following options: |

  • Delimiter Reference Data: provides a way of specifying a standard, reusable set of delimiter characters or strings for breaking up data, and allows you to use control characters as delimiters. Type: Reference Data. Default value: *Delimiters.

  • Delimiter characters: allows you to specify delimiters to use without having to create reference data for simple delimiters such as space or comma. Note that if these are used in addition to a reference list, all delimiters from both options will be used to break up the data. Type: Free text. Default value: Space.
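The way the two delimiter options combine can be sketched as follows. This is a minimal Python illustration, assuming that reference data entries may be multi-character strings or control characters and that every character in the free-text option is a delimiter in its own right. The names `build_splitter`, `reference_delimiters`, and `delimiter_characters`, and the sample input, are hypothetical and not part of the product.

```python
import re

def build_splitter(reference_delimiters, delimiter_characters=" "):
    """Return a function that splits on the delimiters from both sources combined."""
    # Union of both options: reference entries are whole strings,
    # while each character of the free-text option is its own delimiter.
    delimiters = set(reference_delimiters) | set(delimiter_characters)
    delimiters.discard("")
    if not delimiters:
        # Nothing to split on: return the value as a single-element array.
        return lambda value: [value] if value else []
    # Try longer delimiters first so a multi-character delimiter wins over its parts.
    ordered = sorted(delimiters, key=lambda d: (-len(d), d))
    pattern = re.compile("|".join(re.escape(d) for d in ordered))
    return lambda value: [part for part in pattern.split(value) if part]

# Hypothetical reference list with a tab (control character) and a string delimiter,
# used together with comma and space as free-text delimiter characters.
split_value = build_splitter(reference_delimiters=["\t", " - "], delimiter_characters=", ")
parts = split_value("Flat 2 - Rose Court,\tLeeds")
# parts == ['Flat', '2', 'Rose', 'Court', 'Leeds']
```

Taking the union of both sources mirrors the note above that all delimiters from both options are used to break up the data.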

Example

In this example, the Make Array from String transformation is included in the configuration of a cluster on an Address1 identifier.

Example configuration

The following transformations are added to the Address1 identifier to form a cluster:

  1. Upper Case

  2. Strip Numbers

  3. Strip Words (to remove very common words such as The, House, Road, Street, Avenue, and Lane)

  4. Normalize Whitespace

  5. Make Array from String
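The five steps above can be approximated in Python as a chain of small functions. This is a rough sketch rather than the product's behavior: the function names, the contents of `COMMON_WORDS` (including the abbreviation RD), and the choice of comma and space as delimiters are all assumptions made for illustration.

```python
import re

# Illustrative word list only; a real Strip Words list would be configured separately.
COMMON_WORDS = {"THE", "HOUSE", "ROAD", "RD", "STREET", "AVENUE", "LANE"}

def upper_case(value):
    return value.upper()

def strip_numbers(value):
    return re.sub(r"\d", "", value)

def strip_words(value):
    # Remove whole words found in COMMON_WORDS, leaving punctuation in place.
    return re.sub(r"[A-Z]+", lambda m: "" if m.group(0) in COMMON_WORDS else m.group(0), value)

def normalize_whitespace(value):
    return re.sub(r"\s+", " ", value).strip()

def make_array(value):
    # Assumed delimiters: comma and space.
    return [part for part in re.split(r"[ ,]+", value) if part]

def cluster_keys(value):
    """Apply the transformations in order, with Make Array from String last."""
    for step in (upper_case, strip_numbers, strip_words, normalize_whitespace):
        value = step(value)
    return make_array(value)

keys = cluster_keys("The Maltings, 14 Appletree Lane")
# keys == ['MALTINGS', 'APPLETREE']
```

Placing the array step last means the earlier steps only ever see a single string, and each surviving word becomes its own cluster key.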

Example transformations

The following table shows examples of transformations using the above configuration:

Table 1-86 Example Transformations for Make Array from String

| Value | Value after first 4 transformations | Value after Make Array from String transformation |
| --- | --- | --- |
| The Maltings, 14 Appletree Lane | MALTINGS, APPLETREE | 1 - MALTINGS; 2 - APPLETREE |
| 14 Appletree Lane | APPLETREE | 1 - APPLETREE |
| The Maltings | MALTINGS | 1 - MALTINGS |
| 32 Rushton Road, Coventry | RUSHTON, COVENTRY | 1 - RUSHTON; 2 - COVENTRY |
| 32 Rushton Rd | RUSHTON | 1 - RUSHTON |
| 15 Stroud Green Road | STROUD GREEN | 1 - STROUD; 2 - GREEN |
| 14 Green End Avenue | GREEN END | 1 - GREEN; 2 - END |

All records that share a common value after transformation will be in the same cluster. For example, the first two records above will be in the 'APPLETREE' cluster, and the first and third records will be in the 'MALTINGS' cluster.
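How records fall into clusters can be sketched by indexing each record under every one of its array values. The Python below uses the values from Table 1-86; the dictionary layout and variable names are assumptions for illustration, not a description of how the product stores clusters.

```python
from collections import defaultdict

# Array values per record, taken from Table 1-86.
keys_by_record = {
    "The Maltings, 14 Appletree Lane": ["MALTINGS", "APPLETREE"],
    "14 Appletree Lane": ["APPLETREE"],
    "The Maltings": ["MALTINGS"],
    "32 Rushton Road, Coventry": ["RUSHTON", "COVENTRY"],
    "32 Rushton Rd": ["RUSHTON"],
    "15 Stroud Green Road": ["STROUD", "GREEN"],
    "14 Green End Avenue": ["GREEN", "END"],
}

# A record is added to one cluster per array value, so it can belong to several clusters.
clusters = defaultdict(list)
for record, keys in keys_by_record.items():
    for key in keys:
        clusters[key].append(record)

appletree = clusters["APPLETREE"]
# ['The Maltings, 14 Appletree Lane', '14 Appletree Lane']
maltings = clusters["MALTINGS"]
# ['The Maltings, 14 Appletree Lane', 'The Maltings']
```

In the same way, '15 Stroud Green Road' and '14 Green End Avenue' share the 'GREEN' cluster even though the word appears in a different position in each value.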