1.3.4.9 List of Matching Transformations

Transformations may be used within match processors, both when clustering values and when comparing them, to improve matching results by transforming the source values first. This allows you to use transformations for matching purposes without needing to configure a chain of transformation processors before matching.

Several transformations may be applied, in order, within each cluster configuration or comparison. Each transformation must be compatible with the data type of the identifier it receives, although some transformations (such as Convert String to Number) change the data type for the transformations that follow.
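Conceptually, the transformations in a cluster or comparison configuration behave like a small pipeline: the output of each transformation is the input to the next, and a conversion step can change the type seen by later steps. The Python sketch below illustrates only that idea; `apply_chain` and the example chains are hypothetical and are not part of EDQ.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# A transformation takes one value and returns a (possibly re-typed) value.
Transform = Callable[[Any], Any]


def apply_chain(value: Any, transforms: Sequence[Transform]) -> Any:
    """Apply each transformation in order, feeding each output into the next."""
    return reduce(lambda acc, step: step(acc), transforms, value)


# Hypothetical String chain: upper case, drop non-alphanumerics, keep 4 characters.
name_chain = [
    str.upper,
    lambda s: "".join(c for c in s if c.isalnum()),
    lambda s: s[:4],
]
print(apply_chain("Simpson, J.", name_chain))  # SIMP

# A chain can change the data type part-way: String -> Number -> Number.
print(apply_chain(" -42 ", [str.strip, int, abs]))  # 42
```

Because each step only sees the output of the previous one, the order of the transformations matters: for example, taking the first four characters before removing punctuation could give a different cluster key.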

The following matching transformations are provided as part of EDQ. These are similar to the main transformation processors, but designed for quick use when clustering or comparing values in a match processor.

Matching Transformations

Each entry below gives the transformation name, its compatible identifier type(s), a description, and example transformations.

Absolute Value

Number, Number Array

Converts number values into absolute values; that is, converts negative values to positive values and removes unnecessary digits, such as leading zeros.

"-1.5" -> "1.5"

"1.5" -> "1.5"

"0001908" -> "1908"

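The following Python sketch mirrors the Absolute Value examples. EDQ applies this transformation to Number identifiers; the sketch works on the string representations used in the examples, and parsing through `Decimal` (an assumption, not EDQ's implementation) is what discards the leading zeros without introducing floating-point error.

```python
from decimal import Decimal


def absolute_value(text: str) -> str:
    """Return the absolute value of a numeric string.

    Decimal("0001908") is the number 1908, so leading zeros disappear
    when the result is converted back to a string.
    """
    return str(abs(Decimal(text)))


for sample in ["-1.5", "1.5", "0001908"]:
    print(sample, "->", absolute_value(sample))
```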
Character Replace

String, String Array

Replaces individual characters in String values.

"é" -> "e"

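A minimal Python sketch of character-level replacement, such as the é to e example. The replacement map here is hypothetical and stands in for whatever character replacements are configured in the match processor.

```python
# Hypothetical character map; each key is one character, each value its substitute.
REPLACEMENTS = str.maketrans({"é": "e", "è": "e", "ç": "c"})


def character_replace(text: str) -> str:
    """Replace each mapped character; all other characters are left unchanged."""
    return text.translate(REPLACEMENTS)


print(character_replace("Hélène"))  # Helene
```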
Convert Date to String

Date

Converts date values to Strings, using a date format.

Using the format dd/MM/yyyy:

"23-Mar-2001 00:00:00" (date) -> "23/03/2001" (String)

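EDQ date formats such as dd/MM/yyyy and dd-MMM-yyyy appear to follow Java-style date patterns. The Python sketch below shows the same conversion using `strftime`, where `%d/%m/%Y` corresponds to dd/MM/yyyy and `%d-%b-%Y` to dd-MMM-yyyy; `date_to_string` is a hypothetical helper, not an EDQ API.

```python
from datetime import datetime


def date_to_string(value: datetime, fmt: str = "%d/%m/%Y") -> str:
    """Format a date as a String; the default is equivalent to dd/MM/yyyy."""
    return value.strftime(fmt)


print(date_to_string(datetime(2001, 3, 23)))               # 23/03/2001
print(date_to_string(datetime(2001, 3, 23), "%d-%b-%Y"))   # 23-Mar-2001 in an English locale
```

Note that Python's `%b` month abbreviation depends on the current locale, much as MMM does in Java.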
Convert Number to String

Number

Converts number values to Strings, using a number format.

Using the format 0.0:

"175.66" (number) -> "175.6" (String)

"175.00" (number) -> "175.0" (String)

Convert String to Date

String

Converts String values to dates, using a date format.

Using the format dd/MM/yyyy:

"01/11/2001" (String) -> "01-Nov-2001 00:00:00" (date)

"10/04/1975" (String) -> "10-Apr-1975 00:00:00" (date)

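The reverse conversion can be sketched in Python with `strptime`, again using `%d/%m/%Y` as the equivalent of dd/MM/yyyy. Strings that do not match the pattern raise `ValueError` in this sketch; how EDQ treats unparseable values is not covered here.

```python
from datetime import datetime


def string_to_date(text: str, fmt: str = "%d/%m/%Y") -> datetime:
    """Parse a String into a date; the default pattern is equivalent to dd/MM/yyyy."""
    return datetime.strptime(text, fmt)


print(string_to_date("01/11/2001"))  # 2001-11-01 00:00:00
print(string_to_date("10/04/1975"))  # 1975-04-10 00:00:00
```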
Convert String to Number

String

Converts String values to numbers, using a number format.

Using the format 0.0:

"28" (String) -> "28.0" (number)

"68.22" (String) -> "68.2" (number)

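A Python sketch of the String to Number conversion, treating the 0.0 format as "quantize to one decimal place". How EDQ breaks ties is not stated; this sketch assumes `ROUND_HALF_EVEN`, which is the default rounding mode of Java's `DecimalFormat`. The helper name `string_to_number` is hypothetical.

```python
from decimal import ROUND_HALF_EVEN, Decimal


def string_to_number(text: str, quantum: str = "0.1") -> Decimal:
    """Parse a String and keep the precision implied by the format (0.0 -> one decimal place)."""
    return Decimal(text).quantize(Decimal(quantum), rounding=ROUND_HALF_EVEN)


print(string_to_number("28"))     # 28.0
print(string_to_number("68.22"))  # 68.2
```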
Denoise

String, String Array

Strips 'noise' characters, such as # ' < > , / ? * % +, from String values. As the examples show, brackets and full stops are also removed.

"Oracle (U.K.)" -> "Oracle UK"

"A+D Engineering" -> "AD Engineering"

"John#Davison" -> "JohnDavison"

"SIMPSON, David" -> "SIMPSON David"

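A minimal Python sketch of Denoise. The exact set of noise characters EDQ removes is not listed in full, so the set below is an assumption: the characters named in the description plus the brackets and full stops removed in the examples.

```python
# Assumed noise set: # ' < > , / ? * % + plus brackets and full stops.
NOISE_CHARS = "#'<>,/?*%+()."
_DENOISE_TABLE = str.maketrans("", "", NOISE_CHARS)


def denoise(text: str) -> str:
    """Delete every noise character, leaving all other characters (including spaces) untouched."""
    return text.translate(_DENOISE_TABLE)


for sample in ["Oracle (U.K.)", "A+D Engineering", "John#Davison", "SIMPSON, David"]:
    print(f"{sample!r} -> {denoise(sample)!r}")
```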
Deduplicate Date Array

Date Array

Removes duplicate dates from an array.

Input: {Jun 22, 2015 10:14:22 AM}{Feb 17, 1986 12:00:00 AM}{Jun 22, 2015 10:14:22 AM}

Output: {Jun 22, 2015 10:14:22 AM}{Feb 17, 1986 12:00:00 AM}

Deduplicate Number Array

Number Array

Removes duplicate numbers from an array.

Input: {32}{14}{2}{32}

Output: {32}{14}{2}

Deduplicate String Array

String Array

Removes duplicate String elements from an array.

Input: {A}{B}{A}

Output: {A}{B}

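The three Deduplicate transformations can be sketched with one Python function. The examples keep the first occurrence of each value and preserve the original order, so this sketch assumes that behavior; `deduplicate` is a hypothetical name.

```python
from datetime import datetime
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def deduplicate(values: Sequence[T]) -> List[T]:
    """Remove repeated elements, keeping the first occurrence of each in original order."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


print(deduplicate([32, 14, 2, 32]))  # [32, 14, 2]
print(deduplicate(["A", "B", "A"]))  # ['A', 'B']
print(deduplicate([
    datetime(2015, 6, 22, 10, 14, 22),
    datetime(1986, 2, 17),
    datetime(2015, 6, 22, 10, 14, 22),
]))
```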
First N Characters

String, String Array

Strips String values down to the first n characters in the value.

Where Number of characters = 4:

"Simpson" -> "Simp"

"Simposn" -> "Simp"

"Robertson" -> "Robe"

First N Words

String, String Array

Strips String values down to the first n words in the value.

Where Number of words = 2:

"Barclays Bank (Sheffield)" -> "Barclays Bank"

"Balfour Beatty Construction" -> "Balfour Beatty"

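A Python sketch of First N Characters and First N Words. It assumes that words are separated by whitespace and rejoined with single spaces; EDQ's own word-splitting rules may differ.

```python
def first_n_characters(text: str, n: int) -> str:
    """Keep only the first n characters (the whole value if it is shorter)."""
    return text[:max(n, 0)]


def first_n_words(text: str, n: int) -> str:
    """Keep only the first n whitespace-separated words."""
    return " ".join(text.split()[:max(n, 0)])


print(first_n_characters("Simposn", 4))                 # Simp
print(first_n_words("Barclays Bank (Sheffield)", 2))    # Barclays Bank
```

Truncating to four characters is a common way to make slightly misspelled values, such as "Simposn", fall into the same cluster as the correct spelling.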
Generate Initials

String, String Array

Generates initials from String values.

Where "Ignore words of less than" = 4:

"IBM" -> "IBM"

"International Business Machines" -> "IBM"

"Price Waterhouse Coopers" -> "PWC"

"PWC" -> "PWC"

"Aj Smith" -> "AS"

"A j Smith" -> "AJS"

Last N Words

String, String Array

Strips String values down to the last n words in the value.

Where Number of words = 2:

"(Sheffield) Barclays Bank" -> "Barclays Bank"

"Balfour Beatty Construction" -> "Beatty Construction"

Last N Characters

String, String Array

Strips String values down to the last n characters in the value.

Where Number of characters = 5:

"01223 421630" -> "21630"

"07771 821630" -> "21630"

"01223 322766" -> "22766"

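A Python sketch of Last N Words and Last N Characters, under the same whitespace-splitting assumption as the first-N sketch. The explicit `n > 0` checks matter because a Python slice such as `text[-0:]` returns the whole string rather than an empty one.

```python
def last_n_words(text: str, n: int) -> str:
    """Keep only the last n whitespace-separated words."""
    return " ".join(text.split()[-n:]) if n > 0 else ""


def last_n_characters(text: str, n: int) -> str:
    """Keep only the last n characters (the whole value if it is shorter)."""
    return text[-n:] if n > 0 else ""


print(last_n_words("(Sheffield) Barclays Bank", 2))  # Barclays Bank
print(last_n_characters("07771 821630", 5))          # 21630
```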
Lower Case

String, String Array

Converts String values into lower case.

"ORACLE" -> "oracle"

"Oracle" -> "oracle"

"OraCle" -> "oracle"

Make Array from String

String

Converts a String value into an array of values, where each value in the array forms a separate index key.

Using comma and space delimiters:

"John Simpson" -> "John", "Simpson"

"John R Adams" -> "John", "R", "Adams"

"Adams, John" -> "Adams", "John"

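A Python sketch of Make Array from String, splitting on any run of the configured delimiter characters (here comma and space, as in the examples) and discarding empty pieces. The helper name and its `delimiters` parameter are hypothetical.

```python
import re
from typing import List


def make_array(text: str, delimiters: str = ", ") -> List[str]:
    """Split on runs of any delimiter character; empty pieces are dropped."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part for part in re.split(pattern, text) if part]


print(make_array("John R Adams"))  # ['John', 'R', 'Adams']
print(make_array("Adams, John"))   # ['Adams', 'John']
```

Treating a comma followed by a space as a single separator is what stops "Adams, John" from producing an empty element.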
Metaphone

String, String Array

Generates a metaphone value from a String.

"John Murray" -> "JNMR"

"John Moore" -> "JNMR"

"Joan Muir" -> "JNMR"

Normalize Whitespace

String, String Array

Converts all sequences of whitespace characters to a single space.

"10  Harwood   Road" -> "10 Harwood Road"

"3   Perse  Row" -> "3 Perse Row"

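A Python sketch of Normalize Whitespace using a regular expression. Following the description, each run of whitespace (spaces, tabs, line breaks) becomes one space; leading and trailing runs are therefore reduced to a single space rather than removed.

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_whitespace(text: str) -> str:
    """Replace each run of whitespace characters with a single space."""
    return _WHITESPACE_RUN.sub(" ", text)


print(repr(normalize_whitespace("3\tPerse\n Row")))  # '3 Perse Row'
```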
Replace

String, String Array

Standardizes values using a reference data map, for example to standardize common synonyms.

Where the reference data map contains the appropriate replacements:

"Bill" -> "William"

"Billy" -> "William"

"William" -> "William"

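A Python sketch of Replace as a lookup in a reference data map. The map below is hypothetical, and the sketch assumes the whole value is matched against the map keys (values with no entry pass through unchanged); EDQ's matching rules for the map may differ.

```python
from typing import Mapping

# Hypothetical reference data map of first-name synonyms.
FIRST_NAME_MAP = {"Bill": "William", "Billy": "William"}


def standardize(value: str, mapping: Mapping[str, str]) -> str:
    """Return the standardized form of value, or value itself if it is not in the map."""
    return mapping.get(value, value)


for name in ["Bill", "Billy", "William"]:
    print(name, "->", standardize(name, FIRST_NAME_MAP))
```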
Round

Number, Number Array

Rounds number values to a given number of decimal places.

Rounding to two decimal places:

"175.853" -> "175.85"

"180.658" -> "180.66"

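A Python sketch of Round using `Decimal.quantize`. The examples are consistent with ordinary rounding to the nearest value; the tie-breaking rule EDQ uses is not stated, so `ROUND_HALF_UP` is an assumption. Parsing from strings avoids binary floating-point surprises such as 2.675 rounding down.

```python
from decimal import ROUND_HALF_UP, Decimal


def round_places(text: str, places: int = 2) -> Decimal:
    """Round a numeric string to the given number of decimal places."""
    quantum = Decimal(1).scaleb(-places)  # 0.01 when places == 2
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_places("175.853"))  # 175.85
print(round_places("180.658"))  # 180.66
```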
Round Extra

Number

Rounds number values and outputs multiple rounded values: the rounded value followed by neighbouring values at the same rounding interval.

Rounding to the nearest 10, outputting 3 numbers:

"45" -> "50", "40", "60"

"23" -> "20", "10", "30"

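A Python sketch of Round Extra. Rounding 45 up to 50 implies that ties are rounded up, so the sketch uses `ROUND_HALF_UP`. The output order (rounded value, then the next lower and next higher multiples, alternating) is taken from the examples; how EDQ orders more than three values is an assumption. The `step` and `count` parameters are hypothetical names.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import List


def round_extra(value: str, step: int = 10, count: int = 3) -> List[Decimal]:
    """Round to the nearest multiple of step, then add neighbouring multiples."""
    size = Decimal(step)
    rounded = (Decimal(value) / size).quantize(Decimal(1), rounding=ROUND_HALF_UP) * size
    results = [rounded]
    offset = 1
    while len(results) < count:
        results.append(rounded - offset * size)      # next lower multiple
        if len(results) < count:
            results.append(rounded + offset * size)  # next higher multiple
        offset += 1
    return results


print([str(v) for v in round_extra("45")])  # ['50', '40', '60']
print([str(v) for v in round_extra("23")])  # ['20', '10', '30']
```

Emitting neighbouring values lets records whose numbers fall either side of a rounding boundary (for example 44 and 46) still share at least one cluster key.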
Script

Any

Allows the use of a custom scripted match transformation.

Transformation determined by the custom script.

Select Array Element

Any

Allows you to select an individual array element from any position in an array, to use when clustering or comparing values.

"11 Grange Road, Cambridge" -> "Cambridge"

Soundex

String, String Array

Generates a soundex value from a String.

"Smith" -> "S530"

"Snaith" -> "S530"

"Clark" -> "C462"

"Clarke" -> "C462"

"Clarke-Jones" -> "C462"

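For reference, here is a Python sketch of the standard American Soundex algorithm. It reproduces all of the examples above, but it is an illustration of the general algorithm, not EDQ's implementation, which may differ in edge cases. Non-letters (such as the hyphen in "Clarke-Jones") are ignored, vowels separate repeated codes, and H and W do not.

```python
# Letter-to-digit table for American Soundex; vowels, H, W and Y have no code.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                       # the first letter is kept as-is
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = _CODES.get(letter, "")
        if digit and digit != previous:     # skip adjacent duplicate codes
            code += digit
            if len(code) == 4:
                break
        if letter not in "HW":              # H and W do not separate equal codes
            previous = digit
    return code.ljust(4, "0")               # pad short codes with zeros


for name in ["Smith", "Snaith", "Clark", "Clarke", "Clarke-Jones"]:
    print(name, "->", soundex(name))
```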
Strip Numbers

String, String Array

Strips all digits from a String.

"CB37XL" -> "CBXL"

"7 Harwood Drive" -> " Harwood Drive"

"Lemonade 300ML" -> "Lemonade ML"

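A Python sketch of Strip Numbers. As the "7 Harwood Drive" example shows, only the digits are removed; any space next to them is kept. Note that `\d` in Python also matches non-ASCII digits, which may or may not match EDQ's behavior.

```python
import re


def strip_numbers(text: str) -> str:
    """Remove every digit character; surrounding spaces are left in place."""
    return re.sub(r"\d", "", text)


print(repr(strip_numbers("7 Harwood Drive")))  # ' Harwood Drive'
```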
Strip Words

String, String Array

Strips words from String values, using a reference data list of words.

Where the reference data list contains company suffixes:

"ORACLE CORP" -> "ORACLE"

"VODAFONE GROUP PLC" -> "VODAFONE GROUP"

"ORACLE CORPORATION" -> "ORACLE"

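A Python sketch of Strip Words. The reference list of company suffixes is hypothetical, and the sketch assumes case-sensitive matching of whole whitespace-separated words, so "GROUP" survives because it is not in the list.

```python
from typing import AbstractSet

# Hypothetical reference data list of company suffixes.
COMPANY_SUFFIXES = {"CORP", "CORPORATION", "PLC", "LTD", "INC"}


def strip_words(text: str, words: AbstractSet[str]) -> str:
    """Drop every whitespace-separated word that appears in the reference list."""
    return " ".join(w for w in text.split() if w not in words)


print(strip_words("VODAFONE GROUP PLC", COMPANY_SUFFIXES))  # VODAFONE GROUP
```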
Trim Whitespace

String, String Array

Strips whitespace (spaces and non-printing characters) from a String.

"Nigel Lewis" -> "NigelLewis"

"Nigel  Lewis" -> "NigelLewis"

" Nigel Lewis " -> "NigelLewis"

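A Python sketch of Trim Whitespace. Unlike Normalize Whitespace, every whitespace character is removed, wherever it occurs. The sketch treats "non-printing characters" as those for which Python's `str.isprintable()` is false (for example tabs or zero-width spaces); that interpretation is an assumption.

```python
def trim_whitespace(text: str) -> str:
    """Remove all whitespace and non-printing characters from a String."""
    return "".join(c for c in text if c.isprintable() and not c.isspace())


print(trim_whitespace(" Nigel\tLewis\u200b "))  # NigelLewis
```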
Upper Case

String, String Array

Converts String values into upper case.

"Oracle" -> "ORACLE"

"OraCle" -> "ORACLE"

"oracle" -> "ORACLE"