1.3.11.29 Pattern Transform

The Pattern Transform processor provides a simple and user-friendly alternative to the use of regular expressions for transforming the format of data in a string or string array attribute.

Pattern Transform uses a Reference Data set (Patterns Map) with two columns to reformat data matching a number of patterns (in the first 'lookup' column) to a new format or formats (in the second 'map' column).

Use Pattern Transform to standardize the format of data in attributes where data ought to, but does not, conform to a small number of standard formats, for example in Postcode, Account Number, or Product Serial Number attributes.

It is useful to use the Patterns Profiler (with the same Character Pattern Map, that is, the same way of generating character patterns) to find the invalid patterns that may exist in your data. These can then be added to the Patterns Map used in this processor, with each invalid pattern mapped to a valid pattern.

Note that Pattern Transform aims to provide as much flexibility as possible within the confines of using a simple map from one character pattern to another. There are some types of text transformation that will require the extra complexity of regular expressions (see Regex Replace).

The full logic of the processor is as follows:

Step Action

1

Map all characters in the input value to pattern characters using the configured Character Pattern Map, to generate a pattern for the value (such as 'AB1243-ZX' -> 'aaNNNN-aa').

2

Match the generated pattern against the Lookup column of the Patterns Map.

  • If there is no match, do not transform the data and output the original value for the output attribute (End)

  • If there is a match (exact only), go to Step 3.

3

Where a pattern character appears in the Lookup column but not in the Map column, the underlying character is stripped from the value (for example, using a map of NN-a to NNa, the value 12-A is transformed to 12A)

Where a pattern character does not appear in the Lookup column of the map, but does appear in the Map column, it is added as a literal character in the output value (for example, using a map of NNaa to NN-aa, the value 12AB is transformed to 12-AB)

Where a pattern character appears in both the Lookup and Map columns of the Patterns Map:

  • Work from left to right and map each underlying character with the same pattern character to the output value, in order (for example, using a map of NaaN to NNaa, the value 1AB2 is transformed to 12AB).

  • If there are more characters of the same type in the Lookup column than in the Map column, strip the right-most underlying characters from the value (for example, using a map of NaaN to Naa, the value 1AB2 is transformed to 1AB).

  • If characters appear in the Map column but not in the Lookup column, these are added to the value as literal characters (for example, using a map of NNaa to EDB-NNaa, the value 12AB is transformed to EDB-12AB).

  • If there are more characters of the same type in the Map column than in the Lookup, the right-most Map characters act as literal characters in the output value (for example, using a map of NNaa to NNaaN, the value 12AB must be transformed to 12ABN).

  • If there are characters in the Map column that are enclosed in single quotes, these are added as literal characters in the transformed value (for example, using a map of NNNa to 'EDN-'NNNa, the value 123N is transformed to EDN-123N).

Note:

All characters mapping to the same pattern character must always appear in the same order in the final pattern (for example, it is possible to transform AB123CD to 123ABCD or 1ABC23, but NOT to transform AB123CD to BA123CD or AB213CD, using the default Character Pattern Map).

The following table describes the configuration options:

Configuration Description

Inputs

Specify a one or more String or String Array attributes from which you wish to replace values using a patterns map.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.

Options

Specify the following options:

  • Character Pattern Map: to map characters in the input value to pattern characters, in order to generate a pattern to match against the Patterns list. Default value: *Character Pattern Map.

  • Patterns Map: the map of character patterns used to transform data. Specified as Reference Data (Pattern Generation Category). Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • [Attribute Name].PatternTransformed: a new attribute with values replaced according to the Patterns Map. Value is derived from the original attribute value, transformed according to the Patterns Map.

Flags

The following data attributes are output:

  • [Attribute Name].PatternTransformedFlag: stores the result of the PatternTransform operation on each record - that is, whether or not data was transformed. Possible values are Y - transformed, or N - not transformed.

The following table describes the statistics produced by the profiler:

Statistic Description

Transformed

The number of records that were transformed using the Patterns Map.

Untransformed

The number of records that were not transformed using the Patterns Map.

Output Filters

The following output filters are available from the Pattern Transform processor:

  • Records with transformed values

  • Records with untransformed values

Example

In this example, the default Character Pattern Map is used to generate patterns, and the following Patterns Map is used to fix common format problems with UK Postcodes:

Lookup Map

aaN-_Naa

aaN Naa

aaN._Naa

aaN Naa

aaNN._Naa

aaNN Naa

aaNNaa

aaN Naa

aaN__Naa

aaN Naa

This transforms values as illustrated below:

Postcode Postcode.PatternTransformed

OL6 9HX

OL6 9HX

CW96HF

CW9 6HF

PR7 3RB

PR7 3RB

CH7 6DZ

CH7 6DZ

CH7 6BD

CH7 6BD

CH40BE

CH4 0BE

SK87NG

SK8 7NG