1.3.11.28 Normalize Whitespace

The Normalize Whitespace processor normalizes the whitespace in String values so that multiple spaces between words are reduced to a single space character. It also removes leading and trailing whitespace.

Whitespace is defined in EDQ as:

  • Spaces

  • Non-printable characters, such as carriage returns, line feeds and tabs (and all other ASCII characters 0-31)
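
As a rough illustration of the behavior described above, the following Python sketch collapses runs of whitespace and trims both ends. It is not EDQ's implementation: the function name normalize_whitespace is hypothetical, and the sketch assumes that any run of spaces or ASCII characters 0-31 is replaced by a single space.

```python
import re

# Assumption: "whitespace" is the space character (ASCII 32) plus the
# ASCII control characters 0-31, following the definition above.
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")


def normalize_whitespace(value: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends."""
    collapsed = _WHITESPACE_RUN.sub(" ", value)
    return collapsed.strip(" ")


print(normalize_whitespace("  Medway House   , Bridge Street\t\n"))
# prints "Medway House , Bridge Street"
```

A single regular expression keeps the definition of whitespace in one place, so the character range can be adjusted if the definition differs.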

Normalize Whitespace is often used before parsing free text fields, to ensure that all values have regular spacing. It is also often useful after other transformations that may leave extra spaces. For example, stripping words or numbers from text fields may leave additional spaces between words.

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String or String Array type attributes where you wish to normalize whitespace. Number and Date attributes are not valid inputs.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.

Options

None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • [Attribute Name].WhitespaceNormalized: A new attribute with normalized spacing between words. Value is derived from the original attribute value, with whitespace normalized.

Flags

None.

The Normalize Whitespace processor presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new derived attribute, which contains the value with whitespace normalized, to its right.

Output Filters

None.
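
The input and output behavior described above can be sketched as follows. This is a minimal illustration, not how EDQ works internally: the record is modeled as a plain dictionary, the function names are hypothetical, and the whitespace rule assumes that any run of spaces or ASCII characters 0-31 becomes a single space.

```python
import re
from typing import Dict, List, Union

Value = Union[str, List[str]]


def _normalize(text: str) -> str:
    # Assumption: whitespace is the space character plus ASCII 0-31.
    return re.sub(r"[\x00-\x20]+", " ", text).strip(" ")


def add_normalized_attributes(
    record: Dict[str, Value], inputs: List[str]
) -> Dict[str, Value]:
    """Return a copy of record with one derived attribute per input.

    Each derived attribute is named '<input>.WhitespaceNormalized'.
    String inputs yield a string; array inputs yield an array in which
    every element has been normalized.
    """
    result = dict(record)
    for name in inputs:
        value = record[name]
        derived = f"{name}.WhitespaceNormalized"
        if isinstance(value, list):
            result[derived] = [_normalize(item) for item in value]
        elif isinstance(value, str):
            result[derived] = _normalize(value)
        else:
            # Number and Date attributes are not valid inputs.
            raise TypeError(f"{name} is not a String or String Array")
    return result
```

The original attributes are left unchanged, which mirrors the description of a new attribute being added alongside each input.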

Example

In this example, the Normalize Whitespace processor is used to normalize the spaces between words in an attribute containing the first line of an address:

Address1                                         | Address1.WhitespaceNormalized
Medway House[space][space][space], Bridge Street | Medway House[space], Bridge Street
Monarch Mill[space][space], Jones Street         | Monarch Mill[space], Jones Street
Unit 1[space][space], Barnard Road               | Unit 1[space], Barnard Road
Alston Street[space][space][space][space],       | Alston Street[space],