1.3.2 Audit Processors

Audit processors, or checks, test input data against business rules to assess whether it is fit for its business purpose.

The audit processors used to check data, and the rules used by them, are normally determined from the results of Profiling.
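As a rough illustration of how profiling results can suggest a rule, the following Python sketch derives an allowed length range from the length frequencies observed in an attribute. The function names, the `min_share` threshold, and the sample data are hypothetical. This is not how EDQ's profilers or rule generation work; it only shows the general idea of turning a profile into a check.

```python
from collections import Counter


def profile_lengths(values: list[str]) -> Counter:
    """Count how often each value length occurs, as a length profile might report it."""
    return Counter(len(v) for v in values)


def propose_length_rule(values: list[str], min_share: float = 0.1) -> tuple[int, int]:
    """Propose an allowed (min, max) length from the lengths seen in at least
    min_share of the values; rarer lengths are treated as probable errors.
    min_share is a hypothetical tuning parameter, not an EDQ setting."""
    counts = profile_lengths(values)
    total = sum(counts.values())
    common = [length for length, n in counts.items() if n / total >= min_share]
    if not common:
        raise ValueError("no length is common enough to propose a rule")
    return min(common), max(common)


# Hypothetical CU_ACCOUNT sample: mostly 10 or 11 characters, plus one stray short value.
accounts = ["1234567890"] * 10 + ["12345678901"] * 8 + ["123"]
print(propose_length_rule(accounts))  # (10, 11)
```

With this sample, the single three-character value falls below the 10% threshold, so the proposed range is 10 to 11 characters, the same range used in the CU_ACCOUNT length example later in this section.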

Audit processors classify each input record as valid or invalid according to the check. Using an Output Filter, invalid records can be handled separately from valid records in downstream processing. For example, you might attempt to clean only the records that failed your check. Some audit processors, such as List Check, have three output filters: valid (the record passed the check), invalid (the record was positively identified as a failure), and unknown (the record could not be identified as either definitely valid or definitely invalid).
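The following Python sketch illustrates the three-way routing that a check such as List Check performs. It is not EDQ's implementation: the `list_check` function, the trim-and-upper-case normalization applied before comparison, and the sample title lists are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Iterable


def list_check(values: Iterable[str], valid: set[str], invalid: set[str]) -> dict[str, list[str]]:
    """Route each value to a 'valid', 'invalid' or 'unknown' output,
    mirroring the three output filters described above (hypothetical helper)."""
    outputs: dict[str, list[str]] = {"valid": [], "invalid": [], "unknown": []}
    for value in values:
        key = value.strip().upper()  # assumed normalization before comparison
        if key in valid:
            outputs["valid"].append(value)
        elif key in invalid:
            outputs["invalid"].append(value)
        else:
            # Neither list recognizes the value, so it is neither
            # definitely valid nor definitely invalid.
            outputs["unknown"].append(value)
    return outputs


VALID_TITLES = {"MR", "MRS", "MS", "DR"}  # hypothetical reference list
INVALID_TITLES = {"XX", "N/A"}            # hypothetical known-bad values

result = list_check(["Mr", "dr", "XX", "Sir"], VALID_TITLES, INVALID_TITLES)
print(result)  # {'valid': ['Mr', 'dr'], 'invalid': ['XX'], 'unknown': ['Sir']}
```

In a pipeline built this way, only `result["invalid"]` (and perhaps `result["unknown"]`) would be passed on to a cleansing step, while valid records flow straight through.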

Audit processors implicitly use the business rules that you apply to a data attribute during profiling. The following table lists the audit processor that corresponds to each type of business rule you can apply.

| Type of Rule | Example Business Rule | Audit Processor |
|---|---|---|
| Whether the attribute is allowed to contain null values | The CU_NO attribute must not be null | No Data Check |
| The allowed or expected length of the data in the attribute | The CU_ACCOUNT attribute must be 10 or 11 characters long and must not contain spaces | Length Check |
| Data type consistency within the attribute | There must be no numeric values in the NAME attribute | Data Type Check |
| The validity of values in the attribute | Values in the TITLE attribute must match a list of valid titles | List Check |
| Conformity to a standard character pattern | Values in the TEL_NO attribute must conform to a standard pattern | Pattern Check |
| Conformity to a standard pattern, expressed as a regular expression | UK National Insurance Numbers must match a standard regular expression | RegEx Check |
| The validity of specific characters in the attribute | Values in the NAME attribute must not contain characters such as #~@;:/?.>,<%$£!^* | Invalid Character Check |
| Duplication of values in the attribute | There must be no duplicate CU_NO values | Duplicate Check |
| Whether the attribute contains common user-entry workarounds for mandatory fields | There must be no values such as 'aaa' in the FORENAME attribute | Suspect Data Check |
| One attribute's value checked against another | The DATE_OF_BIRTH attribute must be before the DATE_OF_DEATH attribute | Cross-attribute Check |
| Related data in a reference table | There must be at least one active Contact record for each Customer | Lookup Check |
| Data that satisfies a logical expression | The record must have a valid DATE_OF_BIRTH, a valid postcode, and a valid email address | Logic Check |
| A specific value or range of values | All male Customers must have a Gender value of 'M' | Value Check |
| Conformity to a set of business rules defined independently of EDQ | If the customer is based in England, the postcode must be present and in a valid format | Business Rules Check |
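To make several rows of the table more concrete, the following Python sketch expresses their example rules as plain predicates over a record and reports which checks each record fails. It models only the rule logic, not EDQ processors or their configuration. The sample records, the `N`/`a` character-pattern notation, the accepted telephone pattern, and the simplified National Insurance Number expression are all hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sample records; attribute names follow the examples in the table.
records = [
    {"CU_NO": "1001", "CU_ACCOUNT": "1234567890", "NAME": "Smith",
     "TEL_NO": "01632 960123", "NI_NO": "AB123456C",
     "DATE_OF_BIRTH": date(1950, 1, 1), "DATE_OF_DEATH": None},
    {"CU_NO": "", "CU_ACCOUNT": "12345 6789", "NAME": "J0nes",
     "TEL_NO": "960123", "NI_NO": "123456",
     "DATE_OF_BIRTH": date(1980, 5, 2), "DATE_OF_DEATH": date(1979, 1, 1)},
    {"CU_NO": "1001", "CU_ACCOUNT": "12345678901", "NAME": "Brown#",
     "TEL_NO": "01632 960456", "NI_NO": "CD654321A",
     "DATE_OF_BIRTH": date(1940, 3, 3), "DATE_OF_DEATH": date(2010, 6, 6)},
]


def char_pattern(value: str) -> str:
    """Generalize a value to a character pattern: digits -> 'N', letters -> 'a'.
    This notation is hypothetical, chosen only for the sketch."""
    return "".join("N" if c.isdigit() else "a" if c.isalpha() else c for c in value)


VALID_TEL_PATTERNS = {"NNNNN NNNNNN"}          # hypothetical accepted pattern
NI_REGEX = re.compile(r"[A-Z]{2}\d{6}[A-D]")   # simplified, not the full NINO rules
INVALID_CHARS = set("#~@;:/?.>,<%$£!^*")
# Duplicate Check needs the whole data set, so duplicates are found up front.
duplicate_cu_nos = {v for v, n in Counter(r["CU_NO"] for r in records).items() if n > 1}

# Each check takes a whole record, so cross-attribute rules fit the same shape.
CHECKS = {
    # Assumption: empty or whitespace-only values count as null.
    "No Data Check": lambda r: bool(r["CU_NO"].strip()),
    "Length Check": lambda r: 10 <= len(r["CU_ACCOUNT"]) <= 11 and " " not in r["CU_ACCOUNT"],
    "Data Type Check": lambda r: not any(c.isdigit() for c in r["NAME"]),
    "Pattern Check": lambda r: char_pattern(r["TEL_NO"]) in VALID_TEL_PATTERNS,
    "RegEx Check": lambda r: NI_REGEX.fullmatch(r["NI_NO"]) is not None,
    "Invalid Character Check": lambda r: not (set(r["NAME"]) & INVALID_CHARS),
    "Duplicate Check": lambda r: r["CU_NO"] not in duplicate_cu_nos,
    "Cross-attribute Check": lambda r: (
        r["DATE_OF_DEATH"] is None or r["DATE_OF_BIRTH"] < r["DATE_OF_DEATH"]
    ),
}


def failed_checks(record: dict) -> list[str]:
    """Return the names of the checks that the record fails."""
    return [name for name, check in CHECKS.items() if not check(record)]


for record in records:
    print(record["CU_NO"] or "<null>", failed_checks(record))
```

Running it reports that the first record fails only the Duplicate Check, the second fails six checks (including the Cross-attribute Check, because its death date precedes its birth date), and the third fails the Invalid Character and Duplicate Checks. Rules such as Lookup, Logic, or Value checks could be added to `CHECKS` in the same record-level form.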

In addition to the general-purpose audit processors above, EDQ comes with a number of checks for specific attributes; for example, the Email Check.

Note:

If you cannot create the check you need using the general-purpose processors provided, you can either write your own check in JavaScript or extend EDQ by adding a new processor. See Extending EDQ for more information.