1.3.11.31 RegEx Match

The RegEx Match processor matches the data in an attribute against a regular expression, and outputs the matching data in a new attribute. It also adds an attribute with an array of all the matched groups within the regular expression.

Use RegEx Match as a simple way to extract data that matches a regular expression. It is particularly useful where you want to create an array of groups.

Note that a group in a regular expression is contained between parentheses. A single regular expression may have many groups.
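As a minimal illustration of groups, the following Python sketch uses a hypothetical date pattern (not part of the product) with three parenthesized groups. It uses Python's re module, whose syntax may differ in places from the regular expression engine used by the processor.

```python
import re

# Hypothetical pattern with three groups, one per pair of parentheses.
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = pattern.fullmatch("2024-05-17")

# The value matched by the whole regular expression.
full_value = match.group(0)

# One entry per parenthesized group, in left-to-right order.
groups = list(match.groups())
```

Here full_value holds the whole matched string and groups holds the three captured parts as separate array elements.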

RegEx Match adds two attributes: one containing the value that matched the whole regular expression, and another containing an array of the values that matched each group within the regular expression. If there is no match, both new attributes are null.

Regular Expressions

Regular expressions are a standard technique for expressing patterns in, and manipulating, Strings. The technique is very powerful once mastered.

Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.

There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.

The following table describes the configuration options:

Configuration Description

Inputs

Specify a single String attribute.

Options

Specify the following options:

  • Regular expression: the regular expression to be matched. Specified as a regular expression. Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

The following data attributes are output:

  • RegExMatchFull: stores the value that matched the whole regular expression. Value is the original input value, where it matched the regular expression, or a null value, where it did not match the regular expression.

  • RegExMatchGroups: stores an array of the values that matched each group within the regular expression, or a null value where the input did not match the regular expression.

Flags

The following flags are output:

  • RegExMatchSuccess: indicates whether the RegEx Match was successful or not. Possible values are Y/N.
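The three outputs described above can be modelled with a short Python sketch. This is not the product's implementation. The function name regex_match and the RegExMatchResult type are hypothetical, and the sketch assumes that the whole input value must match the expression and that a null input counts as unmatched.

```python
import re
from typing import List, NamedTuple, Optional


class RegExMatchResult(NamedTuple):
    """Hypothetical container mirroring the three documented outputs."""
    full: Optional[str]           # models RegExMatchFull
    groups: Optional[List[str]]   # models RegExMatchGroups
    success: str                  # models RegExMatchSuccess: "Y" or "N"


def regex_match(value: Optional[str], pattern: str) -> RegExMatchResult:
    # Assumption: the whole value must match, and None never matches.
    m = re.fullmatch(pattern, value) if value is not None else None
    if m is None:
        # No match: both data attributes are null and the flag is N.
        return RegExMatchResult(None, None, "N")
    return RegExMatchResult(m.group(0), list(m.groups()), "Y")
```

For example, regex_match("AB12", r"([A-Z]+)(\d+)") yields the full value AB12, the groups AB and 12, and a success flag of Y, while a non-matching input yields two null values and a flag of N.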

The following table describes the statistics produced by the processor:

Statistic Description

Matched

The number of records which matched the regular expression.

Unmatched

The number of records which did not match the regular expression.

Output Filters

The following output filters are available from the RegEx Match processor:

  • Records that matched the regular expression

  • Records that did not match the regular expression
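The Matched and Unmatched statistics and the two output filters amount to partitioning the input records by whether they match. A rough Python sketch follows, with a hypothetical helper split_by_match and made-up sample values. It again assumes that the whole value must match and that null values are unmatched.

```python
import re


def split_by_match(values, pattern):
    """Partition values into (matched, unmatched) lists."""
    compiled = re.compile(pattern)
    matched, unmatched = [], []
    for value in values:
        # Assumption: null values are treated as unmatched.
        if value is not None and compiled.fullmatch(value):
            matched.append(value)
        else:
            unmatched.append(value)
    return matched, unmatched


# Made-up sample data, for illustration only.
sample = ["A1", "B22", "none", None]
matched, unmatched = split_by_match(sample, r"([A-Z])(\d+)")

# Counts corresponding to the Matched and Unmatched statistics.
stats = {"Matched": len(matched), "Unmatched": len(unmatched)}
```

Each list corresponds to one output filter, and the two counts always sum to the number of input records.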

Example

In this example, the values in an ADDRESS3 attribute are matched against the following UK Postcode regular expression:

([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z]) +([0-9][A-Z]{2})

Matched values    Unmatched values
170               1831

Drilldown on Matched values:

Where values match, an array is created with the values matching each distinct group; that is, Outcode and Incode:

ADDRESS3    RegExMatchFull    RegExMatchGroups
SP7 9QJ     SP7 9QJ           {SP7}{9QJ}
BA16 0BB    BA16 0BB          {BA16}{0BB}
LA9 7BT     LA9 7BT           {LA9}{7BT}
E16 2AG     E16 2AG           {E16}{2AG}
SN1 5BB     SN1 5BB           {SN1}{5BB}
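The Outcode and Incode split in this example can be reproduced outside the product with the following Python sketch. It applies the same UK Postcode regular expression using Python's re module and assumes that the whole value must match. The helper name outcode_incode and the unmatched value "Somerset" are hypothetical additions for illustration.

```python
import re

# The UK Postcode regular expression from this example.
UK_POSTCODE = re.compile(
    r"([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z]) +([0-9][A-Z]{2})"
)


def outcode_incode(value):
    """Return [outcode, incode] if the whole value matches, else None."""
    m = UK_POSTCODE.fullmatch(value)
    return list(m.groups()) if m else None


# Three values from the drilldown plus one hypothetical unmatched value.
rows = {
    value: outcode_incode(value)
    for value in ["SP7 9QJ", "BA16 0BB", "E16 2AG", "Somerset"]
}
```

The first group captures the Outcode and the second captures the Incode. The value "Somerset" does not match, so its entry is None.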