1.3.11.26 Metaphone

The Metaphone processor converts the values for a String attribute into a code which represents the phonetic pronunciation of the original string, using the Double Metaphone algorithm.

The Double Metaphone algorithm is a more general phonetic technique than Soundex (which is specifically designed for people's names), and is more sophisticated and context-sensitive than the original Metaphone algorithm.

Note: The remainder of this documentation refers to 'Metaphone codes'. However, it is the Double Metaphone algorithm that is used throughout.

Metaphone codes are particularly useful where spelling discrepancies may occur in words that sound the same, for example, where information has been captured over the telephone. By considering the pronunciation of the string instead of the exact string value, many minor variances can be overcome. A Metaphone code is therefore a good alternative to the raw data value when performing a duplicate check, making it easier to identify possible duplicate or 'equivalent' values.

The processor allows you to specify the maximum length of the Metaphone code (up to 12 characters). A shorter maximum focuses the code on the first few syllables or words of complex data rather than the entire value, and lets you control how sensitive the phonetic matching between values is: the shorter the code, the more values share it.
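The effect of the maximum length can be sketched in a few lines of Python. This is an illustration only, not the processor's implementation: it assumes that the limit acts as a prefix cut on the full code, which this page does not state explicitly. The helper names `limit_code` and `distinct_codes` are hypothetical, and the codes are copied from the Example later in this section.

```python
# Full-length codes copied from the Example in this section
# (produced with the default maximum length of 12).
FULL_CODES = {
    "Jane MCCULLOCH": "JNMKLK",
    "Jane MCLACHAN": "JNMKLKN",
    "Jane MCWILLAIM": "JNMKLM",
    "Jane MILLIGAN": "JNMLKN",
}


def limit_code(code: str, max_length: int) -> str:
    """Hypothetical helper: keep at most max_length characters of a code.

    Assumption: the maximum result length behaves like a prefix cut.
    The 1-12 range mirrors the option described on this page.
    """
    if not 1 <= max_length <= 12:
        raise ValueError("maximum result length must be between 1 and 12")
    return code[:max_length]


def distinct_codes(max_length: int) -> set:
    """Return the set of distinct codes left after applying the limit."""
    return {limit_code(code, max_length) for code in FULL_CODES.values()}


# Shorter limits leave fewer distinct codes, so more values match.
for n in (12, 5, 3):
    print(n, sorted(distinct_codes(n)))
```

Under that assumption, the four names keep four distinct codes at the default length, fall into two groups at a length of 5, and share a single code at a length of 3.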

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String or String Array attributes.

Note that if you input an Array attribute, the transformation will be applied to all array elements, and an Array attribute will be output.

Options

Specify the following options:

  • Maximum result length: allows you to vary the maximum length of the Metaphone code to be produced. Specified as a Number from 1-12. Default value: 12.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • [Attribute Name].Metaphone: a new attribute containing the Metaphone code of the original attribute value.

Flags

None.

The Metaphone transformation processor presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new derived Metaphone attribute to the right.

Output Filters

None. All records input are output.

Example

This example uses the Metaphone processor to transform the NAME attribute in the Customers table. In this case, the default maximum length of 12 characters was used:

NAME (asc)           NAME.Metaphone
James TODTENHAUPT    JMSTTNPT
James WYLIE          JMSL
James WYLLIE         JMSL
Jane MCCULLOCH       JNMKLK
Jane MCLACHAN        JNMKLKN
Jane MCWILLAIM       JNMKLM
Jane MILLIGAN        JNMLKN

Note that James WYLIE and James WYLLIE have the same Metaphone code.
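As a sketch of how such codes support a duplicate check, the following Python groups the example records by their Metaphone code and reports any code shared by more than one record. It works on the codes already shown in the example rather than computing them, and the function name `candidate_duplicates` is hypothetical, not part of the product.

```python
from collections import defaultdict

# (NAME, NAME.Metaphone) pairs copied from the example above.
RECORDS = [
    ("James TODTENHAUPT", "JMSTTNPT"),
    ("James WYLIE", "JMSL"),
    ("James WYLLIE", "JMSL"),
    ("Jane MCCULLOCH", "JNMKLK"),
    ("Jane MCLACHAN", "JNMKLKN"),
    ("Jane MCWILLAIM", "JNMKLM"),
    ("Jane MILLIGAN", "JNMLKN"),
]


def candidate_duplicates(records):
    """Return {code: [names]} for every code shared by more than one record."""
    groups = defaultdict(list)
    for name, code in records:
        groups[code].append(name)
    return {code: names for code, names in groups.items() if len(names) > 1}


print(candidate_duplicates(RECORDS))
# prints {'JMSL': ['James WYLIE', 'James WYLLIE']}
```

A shared code only marks records as candidates for review; whether they are true duplicates still has to be decided from the underlying data.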