Pattern Check
The Pattern Check processor checks the pattern of data in an attribute against reference lists of valid and invalid patterns. It takes multiple single or array inputs.
Use the Pattern Check processor to ensure that the data in an attribute conforms to one of the valid patterns for that attribute. Data may need to conform to a set of valid patterns for either technical or business reasons. For example, when migrating data, the target system may require that all data for a given attribute consists of numeric characters only, and with minimum and maximum length restrictions. Alternatively, for business reasons, it may be that you want to tag as invalid records that have bad data, or data in the wrong attributes, for example, numeric values in Name fields, malformed product codes etc.
You can create the lists of valid and invalid patterns from the data itself using the Patterns Profiler.
Pattern Check allows the use of up to two reference lists - a list of Valid patterns for the attribute, and a list of Invalid patterns.
You may choose only to use one of these two lists. For example, if you discover from profiling that there are many different valid patterns for an attribute, you may want only to check the attribute for invalid patterns, and consider the non-matching values as either Valid, or Unknown.
If, however, the attribute has a small number of valid patterns, you may want simply to check the data against a list of valid patterns, and consider the non-matching values as Invalid, or Unknown.
Finally, you can use both lists, and recognize both valid and invalid patterns, with values that do not match either list categorized as Unknown.
The following table describes the configuration options:
Configuration | Description |
---|---|
Inputs |
Specify one or more single or array attributes that you want to check for valid or invalid patterns (or both). |
Options |
Specify the following valid value options:
Specify the following valid pattern options:
Specify the following invalid pattern options:
|
Outputs |
Describes any data attribute or flag attribute outputs. |
Data Attributes |
None. |
Flags |
The following flags are output:
A single summary flag is also output for the whole record:
|
The following table describes the statistics produced by the profiler:
Statistic | Description |
---|---|
Valid records |
The records that were categorized as Valid by the Pattern Check. |
Unknown records |
The records that were categorized as Unknown by the Pattern Check. |
Invalid records |
The records that were categorized as Invalid by the Pattern Check. |
Drilling down on any of the above statistics reveals a count of the distinct patterns that were found to be Valid, Unknown or Invalid. You can then drill down again to see the records themselves.
Output Filters
The following output filters are available:
-
Valid records
-
Unknown records
-
Invalid records
Example
In this example, Pattern Check is used to verify values in an Account Number attribute (CU_ACCOUNT) using lists of valid and invalid patterns collated from Patterns Profiling.
Values that did not match either list of pattens were categorized as Unknown.
A summary view:
Valid Records | Unknown Records | Invalid Records |
---|---|---|
1991 |
1 |
9 |
A drill down on invalid records:
Value | Count | % |
---|---|---|
aa-NNNNN-aa |
4 |
0.2 |
NN-NNN-aa |
2 |
<0.1 |
NN-NNNNN-Na |
1 |
<0.1 |
NN-NNNNN- |
1 |
<0.1 |
NN-N-aa |
1 |
<0.1 |