1.3.11.32 RegEx Replace

The RegEx Replace processor performs advanced text replacements. It matches String or String Array attributes against a regular expression and replaces the matched value either with a specific value, or with a value derived from the matched text, for example replacing the whole of a string that matched a regular expression with only the first group in the expression.

Use RegEx Replace for advanced text transformations, for example where you need to replace a String that matches a specific regular expression with a specific value, or where you need to consider the context of a piece of text before deciding whether or not to standardize it.

For example, for an attribute with a fixed number of valid values, you may want to transform to 'Other' all values that are more than a few alphabetic characters in length and that do not match the list of valid values. You can do this by running a List Check, and transforming the unmatched values using RegEx Replace.
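The 'Other' scenario can be sketched outside the product as follows. This is only an illustration: the set of valid values, the four-letter threshold, and the class and method names are all hypothetical, and the set lookup merely stands in for the List Check processor. It also assumes java.util.regex syntax, which the text does not state.

```java
import java.util.Set;

class OtherValueSketch {
    // Hypothetical valid values; stands in for the List Check reference data.
    static final Set<String> VALID = Set.of("Gold", "Silver", "Bronze");

    static String standardize(String value) {
        if (VALID.contains(value)) {
            return value; // matched by the list check, so left untouched
        }
        // Unmatched values made up of four or more letters become "Other";
        // anything else (short codes, values with digits) is carried forward.
        return value.replaceAll("^[A-Za-z]{4,}$", "Other");
    }

    public static void main(String[] args) {
        System.out.println(standardize("Gold"));
        System.out.println(standardize("Platinum"));
        System.out.println(standardize("N/A"));
    }
}
```

Anchoring the expression with ^ and $ means that only values consisting entirely of letters are replaced, so unexpected values such as codes or punctuation remain visible for review.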

Note that backslashes (\) and dollar signs ($) are special characters in the replacement String. Dollar signs are used as references to groups within the regular expression used to match against. Backslashes are used to escape literal characters in the replacement String.
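The behavior described in this note matches the replacement syntax of java.util.regex. The following sketch assumes the processor follows those semantics, which is an assumption rather than something stated here. The class name, method names, and sample values are hypothetical.

```java
import java.util.regex.Matcher;

class ReplacementEscapingSketch {
    // "$1" in the replacement refers to the first group of the regular expression.
    static String keepFirstGroup(String input) {
        return input.replaceAll("^(\\w+)@.*$", "$1");
    }

    // The replacement text here is \$$1: an escaped (literal) dollar sign
    // followed by a reference to group 1.
    static String prefixWithDollar(String input) {
        return input.replaceAll("^(\\d+)$", "\\$$1");
    }

    // Matcher.quoteReplacement escapes every backslash and dollar sign,
    // so the given text is inserted exactly as written.
    static String replaceWithLiteral(String input, String literal) {
        return input.replaceAll("^N/A$", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(keepFirstGroup("jsmith@example.com"));
        System.out.println(prefixWithDollar("250"));
        System.out.println(replaceWithLiteral("N/A", "$0\\unknown"));
    }
}
```

In the processor's Replacement option, the equivalent of the second method would be to type \$$1 directly. The doubled backslashes above are only needed because the text appears inside a Java string literal.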

Regular Expressions

Regular expressions are a standard technique for expressing patterns and manipulating Strings that is very powerful once mastered.

Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.

There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.

The following table describes the configuration options:

Configuration Description

Inputs

Specify one or more String or String array attributes.

Options

Specify the following options:

  • Regular expression: the regular expression to be matched. Specified as a regular expression. Default value: None.

  • Replacement: the replacement String used to replace the matched values. Specified as any value. Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • [Attribute Name].RegExReplaced: a new attribute containing the result of the RegEx replace. If the regular expression was not matched, the original input attribute value is carried forward.

Flags

The following flags are output:

  • [Attribute Name].RegExReplaceSuccess: indicates whether the RegEx Replace was successful or not. Possible values are Y/N.
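The relationship between the two outputs can be sketched as follows. The Result class and apply method are hypothetical names used for illustration. The sketch assumes java.util.regex semantics and assumes that a match anywhere in the value counts as a match; the text above does not say whether the whole value must match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegExReplaceOutputSketch {
    // Hypothetical holder for the data attribute and the flag described above.
    static final class Result {
        final String regExReplaced;       // replaced value, or the original value
        final String regExReplaceSuccess; // "Y" or "N"

        Result(String regExReplaced, String regExReplaceSuccess) {
            this.regExReplaced = regExReplaced;
            this.regExReplaceSuccess = regExReplaceSuccess;
        }
    }

    static Result apply(String regex, String replacement, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            // replaceAll resets the matcher before replacing every match.
            return new Result(m.replaceAll(replacement), "Y");
        }
        // No match: the original input value is carried forward.
        return new Result(input, "N");
    }

    public static void main(String[] args) {
        Result r = apply("^(\\d{3}) (.*)$", "$2 $1", "123 24ACB");
        System.out.println(r.regExReplaced + " " + r.regExReplaceSuccess);
    }
}
```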

The following table describes the statistics produced by the processor:

Statistic Description

Transformed

The number of records which matched the regular expression, and therefore underwent a transformation.

Untransformed

The number of records which did not match the regular expression, and therefore did not undergo a transformation.
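As a rough illustration of how the two statistics relate, the sketch below counts matching and non-matching values in a list. The class and method names are hypothetical, and the same assumptions apply as elsewhere: java.util.regex semantics, and a match anywhere in the value counting as a match.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegExReplaceStatsSketch {
    // Returns {transformed, untransformed} for the given values.
    static int[] count(String regex, List<String> values) {
        Pattern pattern = Pattern.compile(regex);
        int transformed = 0;
        for (String value : values) {
            if (pattern.matcher(value).find()) {
                transformed++; // the value matched, so it would be transformed
            }
        }
        return new int[] {transformed, values.size() - transformed};
    }

    public static void main(String[] args) {
        int[] c = count("^(\\d{3}) (.*)$", List.of("123 24ACB", "ABC", "789 X"));
        System.out.println("Transformed: " + c[0] + ", Untransformed: " + c[1]);
    }
}
```

The two counts always add up to the number of input records, because every record either matches the expression or does not.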

Output Filters

The following output filters are available:

  • Records with transformed values

  • Records with untransformed values

Example

In this example, RegEx Replace is used to replace three digits followed by a space and <anything> with <anything><space><the three digits>.

  • Regular expression: ^(\d{3}) (.*)$

  • Replacement: $2 $1

  • Results (successful replacements):

String          Replacement
123 24ACB       24ACB 123
435 GBRSDF      GBRSDF 435
789 X           X 789
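The example can be reproduced outside the product with a short Java sketch. It assumes the processor follows java.util.regex semantics, which is not stated here. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class RegExReplaceExampleSketch {
    // Group 1: the three leading digits. Group 2: everything after the space.
    static final Pattern THREE_DIGITS_THEN_REST = Pattern.compile("^(\\d{3}) (.*)$");

    static String swap(String input) {
        // "$2 $1" writes group 2, a space, then group 1.
        return THREE_DIGITS_THEN_REST.matcher(input).replaceAll("$2 $1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123 24ACB", "435 GBRSDF", "789 X"}) {
            System.out.println(s + " -> " + swap(s));
        }
    }
}
```

A value that does not start with three digits and a space, such as AB 123, does not match the expression and is returned unchanged.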