1.3.2.4 Duplicate Check

The Duplicate Check processor provides a simple way of checking for duplicate values across either one or many attributes.

Use the Duplicate Check to identify any duplicate values that may cause a problem for a data migration (for example, in key attributes), or as an initial check for duplicate records in the data.

The following table describes the configuration options:

Configuration	Description
Inputs	Specify all attributes that you want to consider in the duplicate check. Records will be identified as duplicates if they are the same in all input attributes.
Options	Specify the following options: `Consider all no data as duplicates?`: determines whether records that have no data in any of the input attributes are treated as duplicates of each other. Possible values: `Yes`/`No`. Default value: `Yes`. `Ignore case?`: determines whether the duplicate check ignores case differences (`Yes`) or is case sensitive (`No`). Possible values: `Yes`/`No`. Default value: `No`.
Outputs	Describes any data attribute or flag attribute outputs.
Data Attributes	None.
Flags	The following flag is output: `DateTypeValid`: indicates which data passes the Data Type Check. Possible values are `Y` or `N`.
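
The effect of the two options can be illustrated with a minimal Python sketch. This is not the processor's implementation. The function names `duplicate_key` and `flag_duplicates`, the dictionary record layout, and the treatment of `None` and the empty string as "no data" are all assumptions made for illustration.

```python
from collections import Counter


def duplicate_key(record, inputs, ignore_case=False):
    """Build the comparison key for a record from the input attributes."""
    key = []
    for name in inputs:
        value = record.get(name)
        if value is None or value == "":
            # Assumption: None and the empty string both count as "no data".
            key.append(None)
        elif ignore_case and isinstance(value, str):
            # Mirrors "Ignore case?" = Yes.
            key.append(value.casefold())
        else:
            key.append(value)
    return tuple(key)


def flag_duplicates(records, inputs, no_data_as_duplicates=True, ignore_case=False):
    """Return one boolean per record: True if the record is a duplicate."""
    keys = [duplicate_key(r, inputs, ignore_case) for r in records]
    counts = Counter(keys)
    flags = []
    for key in keys:
        all_empty = all(part is None for part in key)
        if all_empty and not no_data_as_duplicates:
            # Mirrors "Consider all no data as duplicates?" = No.
            flags.append(False)
        else:
            flags.append(counts[key] > 1)
    return flags


records = [
    {"BUSINESS": "Zircom"},
    {"BUSINESS": "zircom"},
    {"BUSINESS": ""},
    {"BUSINESS": None},
    {"BUSINESS": "Darwins"},
]

# Default options: case sensitive, empty records count as duplicates.
default_flags = flag_duplicates(records, ["BUSINESS"])
# "Ignore case?" = Yes: "Zircom" and "zircom" now match.
ignore_case_flags = flag_duplicates(records, ["BUSINESS"], ignore_case=True)
# "Consider all no data as duplicates?" = No: empty records are never flagged.
strict_flags = flag_duplicates(records, ["BUSINESS"], no_data_as_duplicates=False)
```

With the default options, only the two records without data are flagged. Setting `ignore_case=True` additionally flags the two spellings of Zircom.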

The Duplicate Check assesses duplication across a batch of records. It must therefore run to completion before its results are available, and it is not suitable for a process that requires a real-time response.

When executed against a batch of transactions from a real-time data source, it finishes processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned indicate the number of duplicates within that batch of transactions only.
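
The batch-scoped behavior described above can be sketched as follows. This is a simplified illustration, not the product's mechanism. It models only a transaction-count commit point, and the names `batches`, `duplicates_per_batch`, and `transaction_limit` are hypothetical.

```python
from collections import Counter
from itertools import islice


def batches(stream, transaction_limit):
    """Yield consecutive lists of at most transaction_limit items."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, transaction_limit))
        if not batch:
            return
        yield batch


def duplicates_per_batch(stream, transaction_limit):
    """Count duplicated values separately within each batch."""
    results = []
    for batch in batches(stream, transaction_limit):
        counts = Counter(batch)
        # A value is duplicated only if it repeats inside this batch.
        results.append(sum(n for n in counts.values() if n > 1))
    return results


stream = ["A", "B", "A", "C", "A", "C"]

# Commit point after 4 transactions: batches are [A, B, A, C] and [A, C].
per_batch = duplicates_per_batch(stream, transaction_limit=4)
# One batch covering the whole stream, for comparison.
whole = duplicates_per_batch(stream, transaction_limit=6)
```

In the second batch, `A` and `C` each occur once, so they are not counted as duplicates even though both appeared in the first batch.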

The following table describes the statistics produced by the profiler:

Statistic	Description
Duplicated	The records that were duplicated in the input attributes. Drill down to see each distinct value, and the number of times it occurred. Drill down again to see the records.
Not duplicated	The records that were not duplicated in the input attributes.
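
As a rough illustration of how these two statistics and the drill-down views relate, the following Python sketch groups records by their input attribute values. The function name `duplicate_statistics` and the result layout are assumptions. It also assumes that `Duplicated` counts records rather than distinct values, as the description above suggests.

```python
from collections import defaultdict


def duplicate_statistics(records, inputs):
    """Group records by input attribute values and summarize duplication."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(name) for name in inputs)
        groups[key].append(record)

    # Keep only the values shared by more than one record.
    duplicated = {key: recs for key, recs in groups.items() if len(recs) > 1}
    duplicated_count = sum(len(recs) for recs in duplicated.values())

    return {
        "Duplicated": duplicated_count,
        "Not duplicated": len(records) - duplicated_count,
        # First drill-down: each distinct value and how often it occurred.
        "drilldown": {key: len(recs) for key, recs in duplicated.items()},
        # Second drill-down: the records behind each duplicated value.
        "records": duplicated,
    }


sample = [
    {"ID": 1, "BUSINESS": "Test"},
    {"ID": 2, "BUSINESS": "Zircom"},
    {"ID": 3, "BUSINESS": "Test"},
    {"ID": 4, "BUSINESS": "Acme"},
    {"ID": 5, "BUSINESS": "Zircom"},
    {"ID": 6, "BUSINESS": "Test"},
]

stats = duplicate_statistics(sample, ["BUSINESS"])
```

Here `stats["Duplicated"]` covers the three `Test` records and the two `Zircom` records, while the single `Acme` record is counted as not duplicated.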

Output Filters

The following output filters are available from a Duplicate Check:

Duplicate records
Non-duplicate records
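
Conceptually, the two filters split the input into two record streams. A minimal Python sketch of such a split follows. The function name `split_by_duplication` and the record layout are hypothetical.

```python
from collections import Counter


def split_by_duplication(records, inputs):
    """Return (duplicate_records, non_duplicate_records), preserving order."""
    keys = [tuple(record.get(name) for name in inputs) for record in records]
    counts = Counter(keys)
    duplicate_records = []
    non_duplicate_records = []
    for record, key in zip(records, keys):
        if counts[key] > 1:
            duplicate_records.append(record)
        else:
            non_duplicate_records.append(record)
    return duplicate_records, non_duplicate_records


rows = [
    {"ID": 1, "BUSINESS": "Darwins"},
    {"ID": 2, "BUSINESS": "Send Group"},
    {"ID": 3, "BUSINESS": "Darwins"},
]

dupes, uniques = split_by_duplication(rows, ["BUSINESS"])
```

Every input record lands in exactly one of the two lists, so their lengths always sum to the number of input records.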

Example

In this example, the Duplicate Check processor is used to look for duplicate company names in a BUSINESS attribute:

Duplicated	Not duplicated
41	1970

You can drill down on duplicated values:

Business	Count
Test	3
Zircom	2
Darwins	2
Tamlite Group	2
BSA Guns (UK) Limited	2
Permanent Pest Control	2
Gemini Visuals	2
Northern Water Utilities	2
Attitude Flooring	2
N S News & Confectionery	2
Send Group	2
Press Patterns	2