1.3.2.7 Length Check

The Length Check processor provides a quick way to check that the values in an attribute are of an appropriate length. The input can be a single string attribute, multiple string attributes, or a string array attribute.

The Length Check can check either, or both, of the following:

  • The total length in characters (including whitespace and control characters)

  • The number of words

You can choose the way 'words' are counted using options on the Length Check. By default, words are separated by spaces. For example, the word count of 'Oracle Limited' is 2.
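The two measurements described above can be sketched in Python as follows. This is an illustration only, not the processor's implementation: the function names are hypothetical, and the treatment of consecutive delimiters (counted here as a single separator) is an assumption.

```python
def count_chars(value: str) -> int:
    # Total length in characters, including whitespace and control characters.
    return len(value)


def count_words(value: str, delimiters: str = " ") -> int:
    # Count runs of non-delimiter characters.
    # Assumption: consecutive delimiters do not produce empty words.
    words = 0
    in_word = False
    for ch in value:
        if ch in delimiters:
            in_word = False
        elif not in_word:
            in_word = True
            words += 1
    return words


print(count_chars("Oracle Limited"), count_words("Oracle Limited"))
```

Passing a different `delimiters` string, for example `" -"`, changes how a value such as 'Smith-Jones' is split.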

Use the Length Check to ensure that the data within the attribute will meet either its technical or business purpose. For example, if migrating an attribute's data to a shorter attribute in a target system, you may choose to truncate the data, and then check that it conforms to the character length restrictions of the target field before migrating. Alternatively, there may be a business reason why a value should not be over a set number of characters, or words. For example, you might want to check a Surname attribute for all values over 2 distinct words in length, as this might indicate misuse of the attribute - for example to store a Company Name value.

The following table describes the configuration options:

Configuration Description

Inputs

Specify a single string attribute, multiple string attributes, or a string array attribute that you want to check for values that are too short or too long.

Options

Specify the following options:

  • Valid character count: specifies the allowed number of characters (inclusive). Specified as a number range (for example, 10-11), or open ended range (for example, 10-). Default value: None.

  • Valid word count: specifies the allowed number of words (inclusive). Specified as a number range (for example, 1-2) or open ended range (for example, 3-). Default value: None.

  • Word delimiters Reference Data: specifies a list of characters that are used to split up words before counting them. Specified as Reference Data. Default value: *Delimiters.

  • Word delimiters: specifies an additional set of characters that are used to split up words before counting them. Specified as a free text entry. Default value: None.

  • Valid Values in: specifies how to categorize a record that has multiple inputs, or array inputs, based on how many of the values are categorized as Valid. Specified as a selection (All Values/Any Value). Default value: All Values.
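As a rough sketch of how the range options and the Valid Values in option could be interpreted, the Python below parses the two documented range forms (for example, 10-11 and 10-) and combines per-value results. The function names are hypothetical, and only the two documented forms are handled; anything else raises an error.

```python
from typing import List, Optional, Tuple


def parse_range(spec: str) -> Tuple[int, Optional[int]]:
    # Accepts "10-11" (closed range) or "10-" (open-ended range).
    low, sep, high = spec.strip().partition("-")
    if not sep:
        raise ValueError("expected a range such as '10-11' or '10-'")
    return int(low), (int(high) if high else None)


def in_range(count: int, spec: str) -> bool:
    # Both bounds are inclusive, as stated in the option descriptions.
    low, high = parse_range(spec)
    return count >= low and (high is None or count <= high)


def record_is_valid(value_results: List[bool], mode: str = "All Values") -> bool:
    # "All Values": every input must be valid; "Any Value": one is enough.
    if mode == "All Values":
        return all(value_results)
    return any(value_results)
```

For example, `in_range(11, "10-11")` holds while `in_range(12, "10-11")` does not, and a record with results `[True, False]` is valid only under Any Value.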

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

The following flag is output for each input:

  • [Attribute Name].LengthValid: indicates which data passes the Length Check. Possible values are Y (valid length), NC (invalid character length), NW (invalid word length), N (invalid character and word length).

Additionally there is a single summary output:

  • LengthValidSummary: indicates whether the record passes the Length Check. Possible values are Y (valid length), NC (invalid character length), NW (invalid word length), N (invalid character and word length).
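The flag values listed for [Attribute Name].LengthValid can be read as a combination of the two checks. A minimal sketch, using a hypothetical helper name:

```python
def length_flag(char_count_ok: bool, word_count_ok: bool) -> str:
    # Map the two check results onto the documented flag values.
    if char_count_ok and word_count_ok:
        return "Y"   # valid length
    if word_count_ok:
        return "NC"  # invalid character length only
    if char_count_ok:
        return "NW"  # invalid word length only
    return "N"       # invalid character and word length
```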

The following table describes the statistics produced by the profiler:

Statistic                    Description

Both counts good             The number of records with valid character and word counts.

Bad char, good word count    The number of records with an invalid character count, but a valid word count.

Good char, bad word count    The number of records with a valid character count, but an invalid word count.

Both counts bad              The number of records with invalid character and word counts.

Click the Additional Data button to see these statistics as percentages of the number of records analyzed.
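To illustrate how the four statistics and their percentages relate to per-record results, the sketch below tallies a list of record-level flags. The correspondence between flag values and statistic names is inferred from their descriptions, and the function and variable names are hypothetical.

```python
from collections import Counter

# Assumed correspondence between record-level flags and statistics.
STATISTIC_BY_FLAG = {
    "Y": "Both counts good",
    "NC": "Bad char, good word count",
    "NW": "Good char, bad word count",
    "N": "Both counts bad",
}


def summarize(flags):
    # Return {statistic: (count, percentage of records analyzed)}.
    counts = Counter(STATISTIC_BY_FLAG[f] for f in flags)
    total = sum(counts.values())
    return {
        name: (counts[name], 100.0 * counts[name] / total if total else 0.0)
        for name in STATISTIC_BY_FLAG.values()
    }


stats = summarize(["Y", "Y", "NC", "N"])
for name, (count, pct) in stats.items():
    print(f"{name}: {count} ({pct:.1f}%)")
```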

Output Filters

The following output filters are available from a Length Check:

  • Valid (records where both counts were valid)

  • Invalid (records where both counts were invalid)

  • Invalid character count (records with an invalid character count, but a valid word count)

  • Invalid word count (records with an invalid word count, but a valid character count)

Example

In this example, Length Check is used to check the length of an Account Number attribute (CU_ACCOUNT) for any values with a character count not in the range 10-11, and any values that do not consist of a single word:

Both counts good    Bad char, good word count    Good char, bad word count    Both counts bad

2002                4                            0                            4

You can drill down on the Bad char, good word count statistic to view the records concerned.

Note that the CU_ACCOUNT value in each of these records is too short.

CU_ACCOUNT    CU_NUMBER

97-19601-     10944

02-999-ZZ     99999

00-000-ZZ

00-0-XX       0
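The drilldown result can be reproduced outside the product with a few lines of Python. This is a sketch under stated assumptions: the `check` helper is hypothetical, words are split on spaces only, and the limits mirror the example (character count 10-11, exactly one word).

```python
accounts = ["97-19601-", "02-999-ZZ", "00-000-ZZ", "00-0-XX"]


def check(value, min_chars=10, max_chars=11, max_words=1):
    # Returns (character count, word count, char count valid, word count valid).
    chars = len(value)
    words = len([w for w in value.split(" ") if w])
    return chars, words, min_chars <= chars <= max_chars, 1 <= words <= max_words


results = {account: check(account) for account in accounts}
for account, (chars, words, char_ok, word_ok) in results.items():
    print(account, chars, words, char_ok, word_ok)
```

Each of the four values is a single word but has fewer than 10 characters, which is why they are counted under Bad char, good word count.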