1.3.4.5 Group and Merge

The Group and Merge processor provides a simple way to deduplicate records. It groups records by one or more attributes and merges each group, outputting records that are distinct across the selected grouping attributes. Unlike other matching processors, it does not offer the ability to configure complex matching: records are simply grouped by an exact match on the selected grouping attributes.

Use Group and Merge as a simple and efficient way to output the distinct values for an attribute or attributes.

For example, a data extract used in EDQ may in fact have been generated as a join across a number of database tables. A key column with many duplicate values indicates this. In this case, it may be useful to 'unjoin' the data by creating a set of data with distinct key values.
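The 'unjoin' idea can be illustrated outside EDQ with a short Python sketch. This is not how EDQ is implemented or configured; the function name `group_and_merge`, the attribute names and the sample rows are all hypothetical. The sketch compares each grouping attribute separately, and it keeps only the grouping attributes plus a group size in the merged output, which is an assumed merge rule.

```python
def group_and_merge(records, group_by):
    """Group records by exact match on `group_by` and emit one record per group.

    Illustrative only: the merged record keeps the grouping attributes and a
    MatchGroupSize count; real merge rules are configured in the Merge
    sub-processor.
    """
    groups = {}
    for rec in records:
        key = tuple(rec[attr] for attr in group_by)
        groups.setdefault(key, []).append(rec)

    merged = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        out["MatchGroupSize"] = len(members)
        merged.append(out)
    return merged


# Hypothetical extract produced by joining customers to orders:
# the customer key repeats once per order.
extract = [
    {"CustomerId": "C001", "Name": "Anna Lee", "OrderId": "O-1"},
    {"CustomerId": "C001", "Name": "Anna Lee", "OrderId": "O-2"},
    {"CustomerId": "C002", "Name": "Raj Patel", "OrderId": "O-3"},
    {"CustomerId": "C001", "Name": "Anna Lee", "OrderId": "O-4"},
]

customers = group_and_merge(extract, ["CustomerId", "Name"])
for row in customers:
    print(row)
```

Each customer appears once in `customers`, with `MatchGroupSize` recording how many joined rows were collapsed into it.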

Group and Merge is also very useful when generating Reference Data in an EDQ process. For example, it might be useful to create a set of data with all the distinct Forename values that have passed a number of checks. The records that pass the checks can be fed into Group and Merge, with the Forename attribute used to group records. The output distinct Forename values can then be written to staged data and converted to Reference Data, or used directly in lookups. Note that the output MatchGroupSize attribute will act as a count of how many times each value occurred.
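As a rough illustration of the Forename case, the following Python sketch counts distinct values in the same way that MatchGroupSize is described above. The sample records and the `reference_rows` name are hypothetical, and the sketch stands in for the EDQ process rather than reproducing it.

```python
from collections import Counter

# Hypothetical records that have already passed the upstream checks.
passed_checks = [
    {"Forename": "Anna"},
    {"Forename": "John"},
    {"Forename": "Anna"},
    {"Forename": "Maria"},
    {"Forename": "John"},
    {"Forename": "Anna"},
]

# Group on Forename; the count per value plays the role of MatchGroupSize.
counts = Counter(rec["Forename"] for rec in passed_checks)

# One row per distinct Forename, ready to be written out as a lookup list.
reference_rows = [
    {"Forename": name, "MatchGroupSize": size}
    for name, size in sorted(counts.items())
]

for row in reference_rows:
    print(row)
```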

There are sometimes other reasons to group records, for example to sum all records with the same attribute value. Group and Merge can be used to do this, in conjunction with the ability to create custom output selectors.
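Conceptually, a custom output selector is a function applied to the values of one attribute across a group. The Python sketch below models that idea only; `merge_with_selector`, the attribute names and the data are hypothetical and do not correspond to EDQ's selector API.

```python
from collections import defaultdict


def merge_with_selector(records, group_attr, value_attr, selector):
    """Group on `group_attr` and reduce each group's `value_attr` with `selector`."""
    grouped_values = defaultdict(list)
    for rec in records:
        grouped_values[rec[group_attr]].append(rec[value_attr])
    return [
        {group_attr: key, value_attr: selector(values)}
        for key, values in grouped_values.items()
    ]


# Hypothetical transactions to be totalled per account.
transactions = [
    {"Account": "A-10", "Amount": 25},
    {"Account": "A-20", "Amount": 40},
    {"Account": "A-10", "Amount": 15},
    {"Account": "A-20", "Amount": 5},
]

# Passing the built-in sum as the selector totals each group.
totals = merge_with_selector(transactions, "Account", "Amount", sum)
print(totals)
```

Swapping `sum` for `max` or `min` would give a different merged value from the same grouping, which is the flexibility a selector provides.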

Sub-processor    Description

Input            Select the attributes from the data stream to be grouped.

Group            Select the attributes to group records by.

Merge            Use rules to merge grouped records.

The following table describes the configuration options:

Configuration Description

Inputs

The Group and Merge processor accepts input attributes of any type, except Arrays. As with other matching processors, only attributes that are input will be output.

The inputs are configurable in the Input sub-processor.

Options

All options are configured within the sub-processors above.

Note that Group and Merge groups records using a simple concatenation of the attributes selected for grouping, placing the values consecutively without a separator. This means that records that have the same data across the grouping attributes, but split differently between those attributes, will be grouped together.

If you want Group and Merge only to group records with exactly the same data values in all the attributes you are using to group by, it is best to use the Concatenate processor to create a grouping key attribute, separating data attributes with a delimiter character such as pipe which does not occur in the data values. You can then use this key attribute to group records in Group and Merge.
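The effect of concatenating without a separator, and the delimiter workaround, can be shown with a small Python sketch. The two records and the helper names `naive_key` and `delimited_key` are hypothetical; in EDQ the delimited key would be built with the Concatenate processor rather than with code.

```python
def naive_key(values):
    """Concatenate grouping values with no separator, as described above."""
    return "".join(values)


def delimited_key(values, delimiter="|"):
    """Concatenate grouping values with a delimiter that must not occur in the data."""
    if any(delimiter in value for value in values):
        raise ValueError("delimiter occurs in the data; choose another character")
    return delimiter.join(values)


# Two hypothetical records whose grouping attributes hold different values
# but the same characters overall.
record_a = ("JO", "ANNE")
record_b = ("JOAN", "NE")

# Without a separator both records produce the key "JOANNE" and would be grouped.
print(naive_key(record_a) == naive_key(record_b))

# With a pipe delimiter the keys are "JO|ANNE" and "JOAN|NE", so they stay apart.
print(delimited_key(record_a) == delimited_key(record_b))
```

The check inside `delimited_key` reflects the condition stated above: the delimiter only guarantees distinct keys if it never appears in the data values.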

Outputs

The merged output data stream is configured in the Merge sub-processor.

It is possible to use a Group and Merge processor in a Real time Response process, provided the process contains only one match processor. However, it will only group and merge records within the same input message.

The Group and Merge processor produces a number of views of results as follows.

Groups View

The Groups view summarizes the groups by size.

Statistic     Description

Group size    The size of the group (number of records).

Count         The number of groups of the listed size. Drill down on the Count to see the merged records for each group.
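The relationship between the two statistics can be sketched in Python as a count of counts. The grouping keys below are hypothetical, and the sketch only mirrors how the summary is derived, not how EDQ computes or displays it.

```python
from collections import Counter

# Hypothetical grouping key for each input record.
record_keys = ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k5"]

# Step 1: number of records in each group.
size_per_group = Counter(record_keys)

# Step 2: number of groups that share each size (the Groups view).
groups_view = Counter(size_per_group.values())

for group_size, count in sorted(groups_view.items()):
    print(f"Group size {group_size}: {count} group(s)")
```

Here three keys occur twice and two occur once, so the summary lists three groups of size 2 and two groups of size 1.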

Merged Output View

The Merged Output is a Data View of the merged output from the Group and Merge processor; that is, the record set after grouped records have been merged together. The records that are output, and their attributes, will vary depending on the options set in the Merge sub-processor.

Output Filters

The Group and Merge processor has a single output filter, Merged, which corresponds to the Merged Output described above.

Example

In this example, Group and Merge is used to group and merge all records with the same Name, Date of Birth and Email address. Three groups of two records are created and merged. Drilling down on the three groups of two records shows the merged records for each group.