1.3.2.6 Invalid Character Check

The Invalid Character Check processor provides a quick and easy way to find values that contain odd characters.

Use the Invalid Character Check to find unusual characters. This is particularly useful when analyzing free text fields, which may contain 'data cheats', where data entry users have worked around mandatory fields by entering dummy characters such as #. The Invalid Character Check is also useful for finding typos.

If the invalid characters do not signify anything, they can simply be removed by adding a Denoise processor.
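As a rough illustration of what removing meaningless characters amounts to, the sketch below strips a set of noise characters from a value. It is not the Denoise processor's implementation; the function name `remove_chars` and its parameters are hypothetical.

```python
def remove_chars(value: str, noise: str) -> str:
    """Return value with every character listed in noise removed.

    Hypothetical helper for illustration only; the real Denoise
    processor may behave differently (for example, in how it
    treats surrounding white space).
    """
    noise_set = set(noise)
    return "".join(ch for ch in value if ch not in noise_set)


cleaned = remove_chars("# RAE", "#")
print(repr(cleaned))
```

Note that the space following the removed character is left in place, so a further trimming step may be needed.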

The following table describes the configuration options:

Configuration Description

Inputs

Specify a single attribute or an array to check for invalid characters.

Options

Specify the following options:

  • Ignore case?: determines whether upper case and lower case characters are treated as the same, for example to find any value containing either an upper case or lower case 'x'. Possible values: Yes/No. Default value: Yes.

  • Disallowed characters Reference Data: a reference list of invalid characters. Allows a standard list of invalid characters to be used in a number of different checks, and allows control characters to be used. Default value: *Noise Characters.

  • Disallowed characters: provides a quick way of adding small numbers of invalid characters to search for. These characters act in addition to any characters in the Reference Data. Specified as a free text entry. Default value: None.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

For each attribute input, a new attribute is created in the following format:

  • [Attribute Name].CharValid: Indicates whether the data passes the Invalid Character Check; that is, does the value consist only of valid characters? Possible values are Y or N.

  • [Attribute Name].CharValidDetail: Indicates which elements of the data pass the Invalid Character Check. Possible values are Y or N.

A single summary flag is also output:

  • CharValidSummary: Indicates whether the inputs collectively pass the Invalid Character Check. Possible values are Y or N.
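The options and flags described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the function names are hypothetical, the reference data is modeled as a plain string of characters, and the summary flag is assumed to be Y only when every input attribute passes.

```python
def has_disallowed(value, disallowed, ignore_case=True):
    """Return True if value contains any disallowed character."""
    if ignore_case:
        # Compare in lower case so that 'x' also matches 'X'.
        value = value.lower()
        disallowed = {c.lower() for c in disallowed}
    return any(ch in disallowed for ch in value)


def invalid_character_check(record, reference_chars, extra_chars="",
                            ignore_case=True):
    """Return (per-attribute flags, summary flag) for one record.

    record: dict mapping attribute name to string value.
    reference_chars: stands in for the Reference Data list.
    extra_chars: stands in for the free text 'Disallowed characters'
    option, which acts in addition to the Reference Data.
    """
    disallowed = set(reference_chars) | set(extra_chars)
    flags = {}
    for name, value in record.items():
        bad = has_disallowed(value, disallowed, ignore_case)
        flags[name + ".CharValid"] = "N" if bad else "Y"
    # Assumption: the summary passes only if every attribute passes.
    summary = "Y" if all(f == "Y" for f in flags.values()) else "N"
    return flags, summary


flags, summary = invalid_character_check(
    {"NAME": "# RAE", "CITY": "LEEDS"}, reference_chars="#%")
print(flags, summary)
```

With Ignore case? set to Yes, a disallowed 'x' flags a value such as "MAX"; with it set to No, only a lower case 'x' is flagged.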

The following table describes the statistics produced by the processor:

Statistic Description

Valid records

The records that were categorized as valid by the Invalid Character Check.

Invalid records

The records that were categorized as invalid by the Invalid Character Check.

Output Filters

The following output filters are available from an Invalid Character Check:

  • Valid records

  • Invalid records

Example

In this example, a NAME attribute is checked for invalid characters such as ()#%^*$£"!'. A number of records are found containing the # character, and one record is found containing the ' character.

Valid    Invalid

1988     14

You can drill down on invalid values:

This list describes the elements in the Summary page:

Name

  • # MCAULEY

  • # RAE

  • # WILLIAM

  • # SWAN

  • # HAWKES

  • # BARKER

  • # PALMER

  • # SNOWDON

  • # DOONAN

  • # MCCLEMENTS

  • # SHIELDS

  • # SEADEN

  • {O'CONNAL}
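The split into valid and invalid records in this example can be reproduced with a short sketch. The disallowed characters are taken from the example text; three of the names come from the list above, while SMITH and JONES are hypothetical clean values added for contrast. The helper name `is_valid` is also an assumption.

```python
# Disallowed characters as listed in the example text.
DISALLOWED = set("()#%^*$£\"!'")

names = [
    "# MCAULEY",
    "# RAE",
    "{O'CONNAL}",
    "SMITH",  # hypothetical clean value
    "JONES",  # hypothetical clean value
]


def is_valid(value, disallowed=DISALLOWED):
    """A value is valid if it contains no disallowed character."""
    return not any(ch in disallowed for ch in value)


# Mirror the 'Valid records' and 'Invalid records' output filters.
valid = [n for n in names if is_valid(n)]
invalid = [n for n in names if not is_valid(n)]
print(len(valid), len(invalid))  # prints: 2 3
```

The value {O'CONNAL} is flagged here because of the ' character; the braces are not in the disallowed list used in this sketch.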