1.3.2.4 Duplicate Check

The Duplicate Check processor provides a simple way of checking for duplicate values across either one or many attributes.

Use the Duplicate Check to identify any duplicate values that may cause a problem for a data migration (for example, in key attributes), or as an initial check for duplicate records in the data.

The following table describes the configuration options:

Configuration Description

Inputs

Specify all attributes that you want to consider in the duplicate check. Records will be identified as duplicates if they are the same in all input attributes.

Options

Specify the following options:

  • Consider all no data as duplicates?: controls whether records that have no data in all input attributes are treated as duplicates of one another. Possible values: Yes/No. Default value: Yes.

  • Ignore case?: controls whether the duplicate check ignores differences in letter case when comparing values. Possible values: Yes/No. Default value: No (the check is case sensitive).

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

The following flag is output:

  • DateTypeValid: indicates which data passes the Data Type Check. Possible values are Y or N.
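The matching rule and the two options described above can be sketched in Python. This is an illustrative sketch only, not the processor's actual implementation: the function names, the parameter names, and the treatment of None and blank strings as "no data" are all assumptions made for the example.

```python
from collections import defaultdict


def is_no_data(value):
    # Assumption: None and blank or whitespace-only strings count as "no data".
    return value is None or (isinstance(value, str) and value.strip() == "")


def duplicate_check(records, inputs, ignore_case=False, no_data_as_duplicates=True):
    """Split records into (duplicates, non_duplicates) using all input attributes."""
    groups = defaultdict(list)
    non_duplicates = []
    for record in records:
        values = [record.get(name) for name in inputs]
        if all(is_no_data(v) for v in values) and not no_data_as_duplicates:
            # "Consider all no data as duplicates?" = No: never flag these records.
            non_duplicates.append(record)
            continue
        # Build the comparison key; "Ignore case?" = Yes lower-cases string values.
        key = tuple(
            None if is_no_data(v)
            else (v.lower() if ignore_case and isinstance(v, str) else v)
            for v in values
        )
        groups[key].append(record)
    duplicates = []
    for members in groups.values():
        # A record is a duplicate only if another record shares its whole key.
        (duplicates if len(members) > 1 else non_duplicates).extend(members)
    return duplicates, non_duplicates
```

With the defaults, "Zircom" and "zircom" would be treated as different values, while two records with no data in every input attribute would be reported as duplicates of each other.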

The Duplicate Check assesses duplication across a batch of records. It must therefore run to completion before its results are available, and is not suitable for a process that requires a real-time response.

When executed against a batch of transactions from a real-time data source, it finishes processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned indicate the number of duplicates within that batch of transactions only.
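The effect of batch boundaries can be illustrated with a small Python sketch. It is a simplified model, not the product's behavior in detail: the commit point is represented by a hypothetical batch_size (a transaction limit only), and the function name is invented for the example.

```python
from collections import Counter
from itertools import islice


def duplicated_per_batch(values, batch_size):
    """Yield, for each batch, how many values occur more than once in that batch."""
    iterator = iter(values)
    while True:
        # batch_size stands in for a transaction-count commit point (assumption).
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        counts = Counter(batch)
        # Count every occurrence of a value that appears at least twice in this batch.
        yield sum(c for c in counts.values() if c > 1)
```

For the values A, B, A, C, B, C and a batch size of 3, each batch reports 2 duplicated values. The two occurrences of B fall into different batches, so they are not reported as duplicates of each other.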

The following table describes the statistics produced by the processor:

Statistic Description

Duplicated

The records that were duplicated in the input attributes. Drill down to see each distinct value, and the number of times it occurred. Drill down again to see the records.

Not duplicated

The records that were not duplicated in the input attributes.

Output Filters

The following output filters are available from a Duplicate Check:

  • Duplicate records

  • Non-duplicate records

Example

In this example, the Duplicate Check processor is used to look for duplicate company names in a BUSINESS attribute:

Duplicated    Not duplicated
41            1970

You can drill down on duplicated values:

Business                    Count
Test                        3
Zircom                      2
Darwins                     2
Tamlite Group               2
BSA Guns (UK) Limited       2
Permanent Pest Control      2
Gemini Visuals              2
Northern Water Utilities    2
Attitude Flooring           2
N S News & Confectionery    2
Send Group                  2
Press Patterns              2
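A drill-down view of this kind, listing each duplicated value with the number of times it occurred, could be approximated outside the product with a short Python sketch. The function name and the sample values in the usage note are hypothetical, and the ordering (most frequent first) is an assumption rather than documented behavior.

```python
from collections import Counter


def drill_down(values):
    """Return (value, count) pairs for values that occur more than once."""
    counts = Counter(values)
    # most_common() orders by descending count; ties keep first-seen order.
    return [(value, count) for value, count in counts.most_common() if count > 1]
```

For example, drill_down(["Test", "Zircom", "Test", "Acme", "Zircom", "Test"]) returns [("Test", 3), ("Zircom", 2)], and the value that occurs only once is left out.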