1.3.2.8 List Check

The List Check processor checks the data in an attribute against reference lists of valid and invalid values for the attribute.

The List Check allows case sensitive or insensitive matching, and can match the reference lists in one of a number of different ways:

  • Contains (the value must contain a matching list entry)

  • Whole value (the value must match the list exactly)

  • Starts with (the value must begin with a matching list entry)

  • Ends with (the value must end with a matching list entry)

  • Delimiter match (match the list after splitting the data using the specified delimiters)

Note:

The List Check processor does not support the use of External Reference Data for the valid or invalid values. Attempting to do so will result in an error message during processing.

The List Check is an important processor, used in auditing to find valid and invalid values in a data attribute. Use the results of the Frequency Profiler or Phrase Profiler to create lists of valid and invalid values, and use them in a List Check to audit the data against the lists on an ongoing basis.

List Check allows the use of up to two reference lists - a list of Valid values for the attribute, and a list of Invalid values.

You may choose only to use one of these two lists. For example, if you discover from profiling that there are many different valid values for an attribute, you may want only to check the attribute for invalid values, and consider the non-matching values as either Valid, or Unknown - for example simply to look for suspect words such as 'Test' in a Surname attribute.

If, however, the attribute has a small number of valid values, you may want simply to check the data against a list of valid values, and consider the non-matching values as Invalid, or Unknown - for example to check Title values against a small set of valid titles.

Finally, you can use both lists, and recognize both valid and invalid values, with values that do not match either list categorized as Unknown.

The following table describes the configuration options:

Configuration Description

Inputs

Specify one or more attributes of any type (string, date, number, string array, date array, number array) that you want to check based on lists of valid or invalid values (or both).

Options

Specify the following options for checking against valid values:

  • Valid values Reference Data: list of valid values for the attribute. Specified as Reference Data (does not support the use of External Reference Data). Default value: None.

  • Categorize unmatched as: how to categorize values that do not match the list of valid values. Specified as a selection (Unknown/Invalid). Default value: Unknown.

Specify the following options for checking against invalid values:

  • Invalid values Reference Data: list of invalid values for the attribute. Specified as Reference Data (does not support the use of External Reference Data). Default value: None.

  • Categorize unmatched as: how to categorize values that do not match the list of invalid values. Specified as a selection (Unknown/Valid). Default value: Unknown.

Specify the following match options:

  • Ignore case?: drives whether or not to ignore case when matching the list(s). Possible values: Yes/No. Default value: Yes.

  • Match list by: drives how to match against the list(s). Specified as a selection (Whole Value/Contains/Starts With/Ends With/Delimiter Match) Default value: Whole Value.

  • Delimiters: when matching values to the list(s) by splitting the data using delimiters (Delimiter Match), this field is used to specify the delimiter characters to use. Specified as a free text entry. Default value: [space].

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

The following flags are output:

  • ListValid: indicates which data passes the List Check. A value of '-' means that the outcome was Unknown. Possible values are Y/N/-.

The following table describes the statistics produced by the profiler:

Statistic Description

Valid records

The records that were categorized as Valid by the List Check.

Unknown records

The records that were categorized as Unknown by the List Check.

Invalid records

The records that were categorized as Invalid by the List Check.

You can drill down the results to see the records themselves.

Output Filters

The following output filters are available from a List Check:

  • Valid records

  • Unknown records

  • Invalid records

Example

In this example, the List Check is used to check the values in a Business attribute using lists of valid and invalid Business values collated from Frequency Profiling.

Note that values that do not match either list are categorized as Unknowns.

A summary view:

Valid Records Unknown Records Invalid Records

1665

332

4

A drill down on unknown records:

Value Count %

[Null]

331

16.5

Fields

1

<0.1

A drill down on invalid records:

Value Count %

Test

3

0.1

Test Ltd

1

<0.1