RegEx Replace
The RegEx Replace processor performs advanced text replacements by matching String or String Array attributes against a regular expression and replacing each matching value either with a specific value or with a value derived from the matched text. For example, it can replace the whole of a string that matched a regular expression with only the first group in the expression.
Use RegEx Replace for advanced text transformations, for example where you need to replace a String that matches a specific pattern with a specific value, or where you need to consider the context of a piece of text before deciding whether or not to standardize it.
For example, for an attribute with a fixed number of valid values, you may want to transform all values over a few alphabetic characters in length that do not match the list of specific valid values to 'Other'. You can do this by running a List Check, and transforming the unmatched values using RegEx Replace.
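The List Check and RegEx Replace combination described above can be sketched in plain Java. This is only an illustration under stated assumptions: the `VALID` set is a hypothetical stand-in for the List Check reference data, "over a few alphabetic characters" is read as three or more letters, and the class and method names are invented for the example rather than taken from the product.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class OtherValueSketch {
    // Hypothetical list of valid values; stands in for the List Check reference data.
    static final Set<String> VALID = Set.of("Red", "Green", "Blue");

    // Assumption: "over a few alphabetic characters" means three or more letters.
    static final Pattern ALPHA = Pattern.compile("^[A-Za-z]{3,}$");

    static String standardize(String value) {
        if (VALID.contains(value)) {
            return value; // passed the list check, so it is left untouched
        }
        // Unmatched values that fit the pattern are replaced wholesale;
        // values that do not fit the pattern come back unchanged.
        return ALPHA.matcher(value).replaceAll("Other");
    }

    public static void main(String[] args) {
        for (String v : List.of("Red", "Purple", "X1", "ab")) {
            System.out.println(v + " -> " + standardize(v));
        }
    }
}
```

Values that are neither valid nor long enough to match the pattern (such as `X1` or `ab` here) pass through unchanged, which mirrors the idea of only standardizing text that fits a given context.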
Note that backslashes (\) and dollar signs ($) are special characters in the replacement String. Dollar signs refer to groups within the regular expression being matched; for example, $1 refers to the first group. Backslashes are used to escape literal characters in the replacement String.
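The special handling of dollar signs and backslashes can be illustrated with Java's `java.util.regex` classes, whose replacement syntax follows the same conventions. Whether the processor behaves identically in every edge case is an assumption; the pattern, sample values, and class name below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplacementEscapingSketch {
    // Hypothetical pattern: an amount followed by the text " USD".
    static final Pattern PRICE = Pattern.compile("^(\\d+) USD$");

    // "$1" refers to the first group, so the whole match is
    // replaced with just the captured digits.
    static String firstGroupOnly(String s) {
        return PRICE.matcher(s).replaceAll("$1");
    }

    // The replacement text is \$$1 : an escaped, literal dollar sign
    // followed by a reference to group 1.
    static String withLiteralDollar(String s) {
        return PRICE.matcher(s).replaceAll("\\$$1");
    }

    // Matcher.quoteReplacement escapes every \ and $ so that arbitrary
    // text is inserted literally, with no group references.
    static String literal(String s, String text) {
        return PRICE.matcher(s).replaceAll(Matcher.quoteReplacement(text));
    }

    public static void main(String[] args) {
        System.out.println(firstGroupOnly("25 USD"));
        System.out.println(withLiteralDollar("25 USD"));
        System.out.println(literal("25 USD", "$1"));
    }
}
```

When the replacement text comes from data rather than being typed by hand, escaping it first (as `quoteReplacement` does here) avoids accidental group references.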
Regular Expressions
Regular expressions are a standard technique for expressing patterns and manipulating Strings that is very powerful once mastered.
Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.
There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs | Specify one or more String or String array attributes. |
Options | Specify the following options: |
Outputs | Describes any data attribute or flag attribute outputs. |
Data Attributes | The following data attributes are output: |
Flags | The following flags are output: |
The following table describes the statistics produced by the processor:
Statistic | Description |
---|---|
Transformed | The number of records which matched the regular expression, and therefore underwent a transformation. |
Untransformed | The number of records which did not match the regular expression, and therefore did not undergo a transformation. |
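As a rough sketch of how these two counts relate to the input, the Java snippet below tallies records that do and do not match a pattern. It assumes a record counts as matched when the pattern is found anywhere in the value (`find()`); the source does not say whether the processor requires a full match. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class ReplaceStatisticsSketch {
    // Returns a two-element array: {transformed, untransformed}.
    static int[] count(List<String> records, Pattern pattern) {
        int transformed = 0;
        int untransformed = 0;
        for (String record : records) {
            // Assumption: a match anywhere in the value counts as "matched".
            if (pattern.matcher(record).find()) {
                transformed++;
            } else {
                untransformed++;
            }
        }
        return new int[] {transformed, untransformed};
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^(\\d{3}) (.*)$");
        int[] c = count(List.of("123 24ACB", "ABC", "789 X"), p);
        System.out.println("Transformed: " + c[0] + ", Untransformed: " + c[1]);
    }
}
```

The two counts always add up to the number of input records, since every record falls into exactly one of the two groups.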
Output Filters
The following output filters are available:
- Records with transformed values
- Records with untransformed values
Example
In this example, RegEx Replace is used to replace three digits followed by a space and <anything> with <anything><space><the three digits>.
- Regular expression: ^(\d{3}) (.*)$
- Replacement String: $2 $1
- Results (successful replacements):
String | Replacement |
---|---|
123 24ACB | 24ACB 123 |
435 GBRSDF | GBRSDF 435 |
789 X | X 789 |
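The same regular expression and replacement String can be tried outside the processor. The sketch below uses Java's `java.util.regex`, on the assumption that it interprets `$1` and `$2` the same way the processor does; the class and method names are invented for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

final class SwapExampleSketch {
    // Three digits, a space, then anything (captured as groups 1 and 2).
    static final Pattern PATTERN = Pattern.compile("^(\\d{3}) (.*)$");

    // Group 2, a space, then group 1.
    static final String REPLACEMENT = "$2 $1";

    static String replace(String value) {
        // Values that do not match the pattern are returned unchanged.
        return PATTERN.matcher(value).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        for (String s : List.of("123 24ACB", "435 GBRSDF", "789 X")) {
            System.out.println(s + " -> " + replace(s));
        }
    }
}
```

A value such as `12 AB`, which has only two leading digits, does not match and so would come back untransformed.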