1.3.2.3 Data Type Check

The Data Type Check processor checks that the values in String or String Array attributes conform to a consistent data type, and categorizes as invalid any records with values that are not of the expected data type.

Note that Number and Date attributes are by definition 100% consistent with regard to their data type, and so cannot be checked.

The Data Type Check is a quick way of finding values that have been entered into the wrong fields in a user application, typically numbers or dates that have been entered into fields that expect text values only.

Note that it is possible to 'expect' dates or numbers in a String attribute, and categorize as invalid any values that are not of the expected type. This is provided because dates and numbers are not always held in attributes with a controlled data type that can be read from the schema of the data source.

The following table describes the configuration options:

Configuration Description

Inputs

Specify one or more String or String Array attributes that you want to check for data type consistency.

Options

Specify the following options:

  • Expected Data Type: specifies the expected (valid) data type for the input data. Any data found that is not of the expected type is categorized as invalid. Specified as a Selection (Text/Number/Date). Default value: None.

  • Interpret Nulls as Valid: controls whether Null values are treated as valid in the check. Possible values: Yes/No. Default value: Yes.

  • List of recognized date formats: recognizes dates in a variety of different formats. Specified as Reference Data (Date Formatting Category). Default value: Yes.
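The way these three options combine can be illustrated with a minimal sketch. It is not the processor's implementation: the function names, the number pattern, and the rule that Text means "neither a number nor a date" are assumptions made for illustration, and the date formats are Python strptime stand-ins for the SimpleDateFormat patterns held in the Date Formats Reference Data.

```python
import re
from datetime import datetime

# Hypothetical stand-ins for the Date Formats Reference Data.
# The product uses Java SimpleDateFormat patterns; these are
# rough Python strptime equivalents chosen for illustration.
DATE_FORMATS = ["%d-%b-%Y", "%Y-%m-%d", "%d/%m/%Y"]

# Assumed definition of a number: optional sign, digits, optional decimals.
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def is_number(value):
    """Return True if the string matches the assumed number pattern."""
    return NUMBER_PATTERN.fullmatch(value.strip()) is not None


def is_date(value, date_formats=DATE_FORMATS):
    """Return True if the string parses with any recognized date format."""
    for fmt in date_formats:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def detect_type(value, date_formats=DATE_FORMATS):
    """Classify a string as Number, Date, or Text (assumed precedence)."""
    if is_number(value):
        return "Number"
    if is_date(value, date_formats):
        return "Date"
    return "Text"


def check_value(value, expected_type, nulls_valid=True,
                date_formats=DATE_FORMATS):
    """Return True if the value is valid for the expected data type."""
    if value is None:
        # Mirrors the Interpret Nulls as Valid option.
        return nulls_valid
    return detect_type(value, date_formats) == expected_type


print(check_value("2012-08-19", "Date"))             # True
print(check_value("Michael", "Date"))                # False
print(check_value(None, "Date", nulls_valid=False))  # False
```

In this sketch a value such as 20120819 is classified as a Number because the number test runs first; the order in which the real processor applies its tests is not stated here.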

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

For each attribute input, a new attribute is created in the following format:

  • [Attribute Name].DateTypeValidDetail: Indicates which elements of the data pass the Data Type Check. Possible values are Y or N.

  • [Attribute Name].DateTypeValid: Indicates whether the data passes the Data Type Check. Possible values are Y or N.
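The relationship between the two flags can be sketched as follows for a String Array input. This is an assumption-laden illustration: the function name build_flags is hypothetical, the detail flag is modeled as a list with one Y or N per array element, and the overall flag is assumed to be Y only when every element passes. The actual storage format and aggregation rule are not described in this section.

```python
def build_flags(attribute_name, element_results):
    """Build the two flag attributes from per-element check results.

    element_results is a list of booleans, one per array element.
    Assumption: the overall flag is Y only if every element passed.
    """
    detail = ["Y" if ok else "N" for ok in element_results]
    overall = "Y" if all(element_results) else "N"
    return {
        f"{attribute_name}.DateTypeValidDetail": detail,
        f"{attribute_name}.DateTypeValid": overall,
    }


# Hypothetical attribute with two elements, the second failing the check.
flags = build_flags("PHONES", [True, False])
print(flags)
```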

The Date Formats Reference Data used by the Data Type Check must conform to the standard SimpleDateFormat API of Java 1.6.0 or later.

The following table describes the statistics produced by the profiler:

Statistic   Description

Valid       Records with data of the expected data type in the input attribute.

Invalid     Records with data not of the expected data type in the input attribute.

Click the Additional Information button to show the above statistics as percentages of the total number of records analyzed.
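The percentage view is simple arithmetic over the two counts. A minimal sketch, assuming a hypothetical summarize function that takes one boolean per record and rounds to one decimal place (the rounding used by the product is not stated):

```python
def summarize(results):
    """Return (count, percentage) for the Valid and Invalid statistics.

    results is a list of booleans, one per analyzed record.
    """
    total = len(results)
    valid = sum(1 for ok in results if ok)
    invalid = total - valid

    def pct(count):
        # Guard against division by zero when no records were analyzed.
        return round(100.0 * count / total, 1) if total else 0.0

    return {
        "Valid": (valid, pct(valid)),
        "Invalid": (invalid, pct(invalid)),
    }


# Four records, two of which passed the check.
print(summarize([True, True, False, False]))
```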

Output Filters

The Data Type Check produces the following output filters:

  • Valid records

  • Invalid records

Example

In this example, the Data Type Check is used to check whether all values for a NAME attribute are text. In this case, null values are treated as invalid.

Input Attribute   Valid/Invalid

Michael           Valid

John Smith        Valid

<Null>            Invalid

19-Aug-2012       Invalid
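The example results can be reproduced with a short sketch. The function name looks_like_text and its rules are assumptions for illustration: text is taken to be any non-null value that is neither a number nor a date, and a single strptime format stands in for the recognized date formats.

```python
import re
from datetime import datetime


def looks_like_text(value):
    """Assumed rule: text is a non-null value that is not a number or date."""
    if value is None:
        return False  # nulls are treated as invalid in this example
    stripped = value.strip()
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", stripped):
        return False  # value is a number
    try:
        # Stand-in for one recognized date format, e.g. 19-Aug-2012.
        datetime.strptime(stripped, "%d-%b-%Y")
        return False  # value is a date
    except ValueError:
        return True


names = ["Michael", "John Smith", None, "19-Aug-2012"]
results = [(n, "Valid" if looks_like_text(n) else "Invalid") for n in names]

for name, verdict in results:
    print("<Null>" if name is None else name, verdict)
```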