RegEx Check
The RegEx Check processor checks the data in an attribute against reference lists of valid and invalid regular expressions for the attribute. It takes string, multiple strings, or string array as an input.
The case-sensitivity and matching technique (Whole Value / Contains / Starts With / Ends With) of the check can be controlled.
The RegEx Check processor is a powerful tool, allowing you to validate data according to its exact content, using the position of data, partial and exact values, and wild cards.
The RegEx Check is useful in order to check any data that should be in a consistent structure, for example, UK National Insurance Numbers.
Regular Expressions
Regular expressions are a standard technique for expressing patterns and manipulating Strings that is very powerful once mastered.
Tutorials and reference material about regular expressions are available on the Internet, and in books, including: Mastering Regular Expressions by Jeffrey E. F. Friedl published by O'Reilly UK; ISBN: 0-596-00289-0.
There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.
The following are some examples of regular expressions that could check data:
Regular Expression | Pattern Meaning |
---|---|
^\d{5}$ |
5 integer US zip code. |
([A-Z]{1,2}[0-9]{1,2}|[A-Z]{3}|[A-Z]{1,2}[0-9][A-Z])( |-)[0-9][A-Z]{2} |
Valid UK Postcode. |
^[A-CEGHJ-PR-TW-Z]{1}[A-CEGHJ-NPR-TW-Z]{1}[0-9]{6}[A-DFM]{0,1}$ |
Valid UK National Insurance number. |
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$ |
Valid email address. |
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify a single, multiple, or array attributes that you want to check based on lists of valid and invalid regular expressions (or both). |
Options |
Specify the following valid pattern options:
Specify the following invalid pattern options:
Specify the following match options:
|
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
None. |
Flags |
The following flags are output:
A single summary flag is also output:
|
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Valid records |
The records that were categorized as Valid by the RegEx Check. |
Unknown records |
The records that were categorized as Unknown by the RegEx Check. |
Invalid records |
The records that were categorized as Invalid by the RegEx Check. |
Drilling down on any of the above statistics reveals a count of the distinct patterns that were found to be Valid, Unknown or Invalid. You can then drill down again to see the records themselves.
Output Filters
The following output filters are available:
-
Valid records
-
Unknown records
-
Invalid records
Example
In this example, a RegEx Check is used to check the format of an Account Number attribute (CU_ACCOUNT), using a Whole Value match against the following regular expression:
^([0-9]{2})(-)([0-9]{4,5})(-)([a-zA-Z]{2})
This regular expression dictates that values must start with exactly 2 digits, following by a hyphen, followed by either 4 or 5 digits, followed by another hyphen, followed by two letters.
A summary view:
Valid Records | Unknown Records | Invalid Records |
---|---|---|
1997 |
0 |
14 |
A drill down on invalid records:
Value | Count | % |
---|---|---|
[Null] |
4 |
0.2 |
OO-24077-SH |
1 |
<0.1 |
OO-24282-LR |
1 |
<0.1 |
OO-24276-LR |
1 |
<0.1 |
0975t3487263 |
1 |
<0.1 |
OI-25057-JD |
1 |
<0.1 |
97-19671-5H |
1 |
<0.1 |
97-19601- |
1 |
<0.1 |
02-999-ZZ |
1 |
<0.1 |
00-0-XX |
1 |
<0.1 |