1.3.7.14 RegEx Patterns Profiler

The RegEx Patterns Profiler analyzes a number of attributes for matches against a list of regular expressions.

Use the RegEx Patterns Profiler to find data that matches a commonly recognized format, wherever it occurs across a number of attributes. This is useful when values with distinctive patterns, such as Postcodes or National Insurance Numbers, have been entered into the wrong fields.

Regular Expressions

Regular expressions are a standard technique for expressing patterns in Strings and for manipulating Strings. The technique is very powerful once mastered.

Tutorials and reference material about regular expressions are available on the Internet and in books, including Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly UK; ISBN 0-596-00289-0).

There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.

The following table describes the configuration options:

Configuration Description

Inputs

Specify any String attributes that you want to search for data that matches a list of regular expressions.

Options

Specify the following options:

  • Pattern list: the list of regular expressions that you want to match values against. Specified as Reference Data (Regular Expressions Category). Default value: None.

  • Regular expression: a single regular expression, entered directly instead of using a reference list. If both options are set, all regular expressions (the one entered here and those in the reference list) are used. Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

The following flags are output:

  • RegExPatternMatch: indicates which data matches the Patterns listed in the Reference Data. Possible values are Y or N.
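The way the two options combine and feed the flag can be sketched in Python. This is only an illustration under stated assumptions, not the profiler's implementation: the function names, the two sample patterns, and the choice of whole-value matching (`fullmatch`) are all hypothetical.

```python
import re

def effective_patterns(pattern_list=None, single_regex=None):
    """Combine the reference list and the single expression into one list.

    Mirrors the documented rule that, when both options are set,
    all regular expressions are used.
    """
    patterns = list(pattern_list or [])
    if single_regex:
        patterns.append(single_regex)
    return [re.compile(p) for p in patterns]

def regex_pattern_match_flag(value, compiled_patterns):
    """Return 'Y' if the value matches any pattern, otherwise 'N'.

    Assumption: the whole value must match (fullmatch); the real
    profiler's matching mode is not stated in this section.
    """
    if value is None:
        return "N"
    return "Y" if any(p.fullmatch(value) for p in compiled_patterns) else "N"

patterns = effective_patterns(
    pattern_list=[r"[A-Z]{2}[0-9]{6}[A-D]"],  # hypothetical reference-list entry
    single_regex=r"[0-9]{5}",                 # hypothetical single expression
)
flags = [regex_pattern_match_flag(v, patterns)
         for v in ["AB123456C", "12345", "High Street"]]
print(flags)
```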

The following table describes the statistics produced by the profiler for each input attribute:

Statistic Description

Matched

The number of records in the attribute that matched one of the regular expressions in the reference list.

Drill-down to see a breakdown of matches by the matched regular expression.

Unmatched

The number of records in the attribute that did not match any of the regular expressions in the reference list.
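As a rough sketch of how these statistics could be derived, the following Python counts Matched and Unmatched values per attribute and keeps a per-pattern breakdown for the drill-down. The function name, the sample records, and the two patterns are hypothetical. Crediting a value to the first pattern it matches, and requiring the whole value to match, are assumptions rather than documented behavior.

```python
import re
from collections import Counter

def profile_attribute(values, patterns):
    """Count matched and unmatched values, with a breakdown by pattern."""
    compiled = [(src, re.compile(src)) for src in patterns]
    by_pattern = Counter()
    unmatched = 0
    for value in values:
        # Assumption: a value is credited to the first pattern it fully matches.
        hit = next(
            (src for src, rx in compiled
             if value is not None and rx.fullmatch(value)),
            None,
        )
        if hit is None:
            unmatched += 1
        else:
            by_pattern[hit] += 1
    return {
        "matched": sum(by_pattern.values()),
        "unmatched": unmatched,
        "by_pattern": dict(by_pattern),
    }

# Hypothetical patterns and records, for illustration only.
patterns = [
    r"[A-Z]{1,2}[0-9][A-Z] +[0-9][A-Z]{2}",
    r"[A-Z]{3} +[0-9][A-Z]{2}",
]
records = [
    {"ADDRESS3": "SW1A 1AA", "POSTCODE": ""},
    {"ADDRESS3": "Cambridge", "POSTCODE": "GIR 0AA"},
    {"ADDRESS3": "London", "POSTCODE": "EC1A 1BB"},
]
summary = {
    attr: profile_attribute([r[attr] for r in records], patterns)
    for attr in ("POSTCODE", "ADDRESS3")
}
for attr, stats in summary.items():
    print(attr, stats["matched"], stats["unmatched"])
```

Each input attribute is profiled independently, so every attribute's Matched and Unmatched counts sum to the number of records.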

Example

In this example, the RegEx Patterns Profiler is used to look for UK Postcodes in a number of Address attributes. The summary data is as follows:

Attribute    Matched (desc)    Unmatched
POSTCODE     1696              305
ADDRESS3     169               1832
ADDRESS1     0                 2001
ADDRESS2     0                 2001

Drill down on the number of records in each attribute that matched one of the regular expressions in the list to see a breakdown by the matched regular expression. In this case, only one regular expression was matched, so drilling down on the 169 records in ADDRESS3 that matched will reveal the following view:

Pattern                                                           Count    %
([A-Z]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( +)([0-9][A-Z]{2})     169      8.4%
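The figures in this example can be checked with a few lines of Python. The sketch below uses only the numbers and the pattern shown above. The variable names and the sample value "SW1A 1AA" are illustrative, and whole-value matching is an assumption.

```python
import re

# (matched, unmatched) per attribute, copied from the summary data above.
summary = {
    "POSTCODE": (1696, 305),
    "ADDRESS3": (169, 1832),
    "ADDRESS1": (0, 2001),
    "ADDRESS2": (0, 2001),
}

# Every attribute covers the same number of records.
totals = {attr: matched + unmatched
          for attr, (matched, unmatched) in summary.items()}

# Share of ADDRESS3 records that matched, as shown in the drill-down view.
address3_pct = round(100 * summary["ADDRESS3"][0] / totals["ADDRESS3"], 1)

# The pattern from the drill-down view, applied to an illustrative value.
pattern = re.compile(
    r"([A-Z]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( +)([0-9][A-Z]{2})"
)
sample = pattern.fullmatch("SW1A 1AA")

print(totals)
print(address3_pct)
print(sample.groups() if sample else None)
```

The 8.4% in the drill-down is therefore 169 matches out of the 2001 records profiled in ADDRESS3.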