1.3.2.14 Suspect Data Check

The Suspect Data Check processor checks an attribute value for a variety of common data entry 'cheats', such as entering 'aaa' in a name field.

Specifically, it can check for any or all of the following:

  • Repeating alphabetic characters (for example, 'aaa')

  • Repeating numeric characters (for example, '111')

  • Repeating non-alphanumeric characters (for example, '>>>')

  • Repeating patterns (for example, 'abcabc')

  • Minimum character length (for example, to catch short values such as 'x')
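As a rough illustration of the checks listed above, the following Python sketch tests each category with a regular expression. It is not the processor's implementation: the function name `triggered_checks`, the ASCII character classes, the fixed thresholds (three repeats, a pattern of at least three characters, a minimum length of two), and the assumption that the whole value must consist of the repeat are all chosen for illustration only.

```python
import re

# Hypothetical patterns; each must match the whole value (an assumption).
CHECKS = {
    "alphabetic repeat": re.compile(r"([A-Za-z])\1{2,}"),
    "numeric repeat": re.compile(r"([0-9])\1{2,}"),
    "non-alphanumeric repeat": re.compile(r"([^A-Za-z0-9])\1{2,}"),
    "repeating pattern": re.compile(r"(.{3,}?)\1+"),
}
MIN_LENGTH = 2  # illustrative threshold, not a product default


def triggered_checks(value):
    """Return the names of the illustrative checks that the value trips."""
    hits = [name for name, rx in CHECKS.items() if rx.fullmatch(value)]
    if len(value) < MIN_LENGTH:
        hits.append("short value")
    return hits


for sample in ["aaa", "111", ">>>", "abcabc", "x", "Smith"]:
    print(repr(sample), triggered_checks(sample))
```

Whether the processor also flags repeats embedded in longer values is not stated here, so the sketch takes the stricter whole-value reading.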

Use the Suspect Data Check to detect common user ruses that result in suspect data in compulsory columns.

Where an empty value cannot be entered at the point of data entry, the user may enter a single character - for example a space, a full stop, or a random single character - to get past that point.

Alternatively, a single but repeating character may be entered - for example '9999' - or perhaps a repeating pattern of characters - for example 'asdasd'.

Note that this may not be the fault of the user, but of the business process and/or supporting applications. The user may not know or have available the complete set of data that the data entry application is requesting, but is required to enter the data into the system.

The following table describes the configuration options:

Configuration Description

Inputs

Specify a single attribute that you want to check for suspect data entries.

Options

Specify the following repeating alphabetic character options:

  • Check: determines whether to check for repeating alphabetic characters. Specified as Yes/No. Default value: Yes.

  • Minimum repeat: the minimum number of alphabetic characters that must be repeated for the check to identify a suspect entry. Specified as a number. Minimum value: 2. Default value: 3.

Specify the following repeating numeric character options:

  • Check: determines whether to check for repeating numeric characters. Specified as Yes/No. Default value: Yes.

  • Minimum repeat: the minimum number of numeric characters that must be repeated for the check to identify a suspect entry. Specified as a number. Minimum value: 2. Default value: 2.

Specify the following repeating non-alphanumeric character options:

  • Check: determines whether to check for repeating non-alphanumeric characters. Specified as Yes/No. Default value: Yes.

  • Minimum repeat: the minimum number of non-alphanumeric characters that must be repeated for the check to identify a suspect entry. Specified as a number. Minimum value: 2. Default value: 2.

Specify the following repeating patterns options:

  • Check: determines whether to check for repeating patterns of characters. Specified as Yes/No. Default value: Yes.

  • Minimum pattern length: the minimum number of characters in the pattern that must be repeated for the check to identify a suspect entry. Specified as a number. Minimum value: 2. Default value: 3.

  • Minimum pattern repeat: the minimum number of times a pattern must occur for the data to be identified as suspect. Specified as a number. Minimum value: 2. Default value: 2.

Specify the following minimum length options:

  • Minimum length: the minimum length, in characters, for values in this attribute. Specified as a number. Default value: 0.
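To show how these options might interact, here is a minimal Python sketch that models them as a settings object and applies each enabled check. The class `SuspectOptions`, its field names, and the function `suspect_reasons` are hypothetical; only the default values are taken from the option descriptions above. The sketch also assumes that a value is suspect only when the whole value is the repeat, and that a pattern must contain more than one distinct character (so that a value such as 'aaaaaa' counts as a character repeat rather than a pattern repeat). Neither assumption is confirmed by this documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuspectOptions:
    # Hypothetical field names; defaults mirror the documented option defaults.
    check_alpha: bool = True
    alpha_min_repeat: int = 3
    check_numeric: bool = True
    numeric_min_repeat: int = 2
    check_non_alnum: bool = True
    non_alnum_min_repeat: int = 2
    check_pattern: bool = True
    pattern_min_length: int = 3
    pattern_min_repeat: int = 2
    min_length: int = 0


def _is_run(value, min_repeat, belongs):
    # True when the whole value is one character repeated at least min_repeat times.
    return len(value) >= min_repeat and len(set(value)) == 1 and belongs(value[0])


def _is_pattern(value, min_len, min_repeat):
    # True when the whole value is a multi-character unit repeated min_repeat+ times.
    n = len(value)
    for size in range(min_len, n // min_repeat + 1):
        if n % size == 0:
            unit = value[:size]
            # Assumption: single-character units are left to the run checks.
            if len(set(unit)) > 1 and unit * (n // size) == value:
                return True
    return False


def suspect_reasons(value, opts=None):
    """Return the names of the enabled checks that flag the value."""
    opts = opts or SuspectOptions()
    reasons = []
    if opts.check_alpha and _is_run(value, opts.alpha_min_repeat, str.isalpha):
        reasons.append("Alpha repeats")
    if opts.check_numeric and _is_run(value, opts.numeric_min_repeat, str.isdigit):
        reasons.append("Numeric repeats")
    if opts.check_non_alnum and _is_run(
        value, opts.non_alnum_min_repeat, lambda c: not c.isalnum()
    ):
        reasons.append("Non-alpha repeats")
    if opts.check_pattern and _is_pattern(
        value, opts.pattern_min_length, opts.pattern_min_repeat
    ):
        reasons.append("Pattern repeats")
    if len(value) < opts.min_length:
        reasons.append("Short values")
    return reasons


print(suspect_reasons("abcabc"))
print(suspect_reasons("x", SuspectOptions(min_length=2)))
```

With the default minimum length of 0, the short-value check never fires in this sketch; it only takes effect once a larger minimum is configured.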

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

The following flags are output:

  • SuspectData: indicates the result of the Suspect Data Check for each record: suspect data, valid data, or null. Possible values are Y/N/-.

The following table describes the statistics produced by the processor:

Statistic Description

Suspect records

Records that were identified as having a suspect value in the attribute checked.

Drill down to see a breakdown of the suspects by the check that identified them. Drill down again to see the records.

Valid records

Records that did not have a suspect value in the attribute checked.

Null records

Records with a null value in the attribute checked.
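The relationship between the flag and these statistics can be sketched in Python as follows. This is an illustration under stated assumptions, not the processor's logic: the functions `flag_for` and `summarize` are hypothetical, the mapping of 'Y' to suspect and 'N' to valid is inferred from the order in which the flag values are listed, only Python `None` is treated as null, and the `is_suspect` predicate stands in for whichever checks are enabled.

```python
from collections import Counter


def flag_for(value, is_suspect):
    # Assumed mapping: '-' for null, 'Y' for suspect, 'N' for valid.
    if value is None:
        return "-"
    return "Y" if is_suspect(value) else "N"


def summarize(values, is_suspect):
    """Tally the three statistics from the per-record flags."""
    flags = Counter(flag_for(v, is_suspect) for v in values)
    return {
        "Suspect records": flags["Y"],
        "Valid records": flags["N"],
        "Null records": flags["-"],
    }


def single_char_repeat(value):
    # Stand-in predicate: the whole value is one character repeated 3+ times.
    return len(value) >= 3 and len(set(value)) == 1


print(summarize(["aaa", "Smith", None, "111"], single_char_repeat))
```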

Output Filters

The following output filters are available:

  • Valid records

  • Suspect records

  • Records that were null in the attribute checked

Example

In this example, the Suspect Data Check is used to check a NAME attribute for suspect data entries.

A summary view:

Alpha repeats  Numeric repeats  Non-alpha repeats  Pattern repeats  Short values
1              1                1                  0                0

A drill down on Alpha repeats:

NAME       CU_NUM
aaaaaaaaa  87581