1.3.11.38 Soundex

The Soundex processor generates a soundex code for each value in a specified attribute. A soundex code is an abstract key that represents similar-sounding names with the same code. Soundex is designed for family names (surnames), although it is sometimes used, with care, in other domains.

Soundex codes are useful where names that sound the same are spelled or transcribed differently. Having created a soundex code, you would often use it instead of the raw data value in a duplicate check.
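As an illustration of how such a key behaves in a duplicate check, the following is a minimal Python sketch. It assumes the standard American Soundex rules (first letter kept, three digits, H and W ignored, padded with zeros); the exact variant used by the processor is not stated here, and the function names soundex and group_by_soundex are hypothetical.

```python
from collections import defaultdict

# Standard American Soundex letter groups (assumed variant, not confirmed
# to be the one the processor uses).
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return a four-character soundex code, or "" if name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so the same code may be added again
    # Pad with zeros and truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


def group_by_soundex(names):
    """Group names by soundex code; groups with several names are duplicate candidates."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for code, names in group_by_soundex(["ALLAN", "ALLEN", "AHMED"]).items():
        print(code, names)
```

Grouping on the code rather than the raw value places ALLAN and ALLEN in the same group, so they can be reviewed as possible duplicates.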

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String or String Array attributes from which you want to create a soundex code.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.

Options

None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • Soundex: a new attribute with the soundex code derived from each input attribute.

Flags

None.

The Soundex transformer presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new Soundex attribute to its right.

Output Filters

None. All records input are output.

Example

This example uses the Soundex transformation on a Surname attribute. The Surname attribute was derived from the NAME attribute in the Customers table: a Make Array from String processor split NAME using a space separator, and a Select Array Element processor output the second element of the array as the Surname:

Surname (asc)    Surname.Soundex
ADAMSKI          A352
AHMED            A530
AITKEN           A325
ALLAN            A450
ALLEN            A450

Note that values that are probably the same name but differ because of typos or spelling variants, such as ALLAN/ALLEN, generate the same soundex code.
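The Surname derivation described in this example (split NAME on a space, then select the second array element) can be sketched in Python as follows. This is only an approximation of the two processors: the function name surname_from_name and the sample values are hypothetical, and returning None when there is no second element is an assumption.

```python
from typing import Optional


def surname_from_name(name: str, separator: str = " ") -> Optional[str]:
    """Split name on the separator and return the second element, if any."""
    elements = name.split(separator)  # analogous to Make Array from String
    # Analogous to selecting the second array element (index 1 in Python).
    return elements[1] if len(elements) > 1 else None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the Customers table.
    for name in ["JOHN ALLEN", "ALLEN"]:
        print(name, "->", surname_from_name(name))
```

A fixed position works only when every NAME value has the form "first-name surname"; a value with a middle name would yield the middle name instead.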