1.3.4.11.1 Input

The Input sub-processor of a match processor maps attributes from the input data streams into the matching process.

The Input sub-processor is a required part of matching. It controls which data is used in the matching process.

Normally, all attributes from each input data stream are included in a matching process. However, you may want to vary the attributes used in matching, and only include those that you need either to match on, use in the review of possible matches, or use in making output selections.

Note:

For versions of EDQ older than 7.0, it was also necessary to configure the selection of input attributes carefully as all input attributes would be included in the Decision Key used to re-apply ('remember') manual match decisions. However, it is now possible to configure which of the input attributes to use in the Decision Key - see Advanced options for match processors.

For example, from a typical Customer table, the following attributes might be included in a matching process:

Purpose - Attributes

Needed for matching - First_name, Surname, Birth_date, Address_1, Postcode, Email, Home_tel_number

Needed for the review of possibly matching records - Title, Address_2, Town, County, Customer_type

Needed to identify specific records for data updates - Customer_ID

Needed to make output decisions (for example, to choose the most recent record) - Last_modified_date, Has_active_account

A number of other attributes in the source data might be excluded from the matching process.
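The selection in the example above can be pictured as a simple filter over each record. The following Python sketch is an illustration only: it is not EDQ configuration or an EDQ API, and the names `ATTRIBUTES_BY_PURPOSE`, `select_match_inputs`, and the `Marketing_segment` attribute are hypothetical.

```python
# Illustrative only: this is not EDQ configuration or an EDQ API.
# Attribute names are taken from the example Customer table in the text.
ATTRIBUTES_BY_PURPOSE = {
    "matching": ["First_name", "Surname", "Birth_date", "Address_1",
                 "Postcode", "Email", "Home_tel_number"],
    "review": ["Title", "Address_2", "Town", "County", "Customer_type"],
    "data_updates": ["Customer_ID"],
    "output_decisions": ["Last_modified_date", "Has_active_account"],
}


def select_match_inputs(record, purposes=ATTRIBUTES_BY_PURPOSE):
    """Return only the attributes of `record` that serve a listed purpose."""
    wanted = [name for names in purposes.values() for name in names]
    return {name: record[name] for name in wanted if name in record}


record = {
    "Customer_ID": 1001,
    "First_name": "Ann",
    "Surname": "Lee",
    "Postcode": "CB1 2AB",
    "Marketing_segment": "B",  # hypothetical attribute with no matching purpose
}
selected = select_match_inputs(record)
print(sorted(selected))
```

In this sketch, `Marketing_segment` is dropped because it serves none of the four purposes, which mirrors excluding an attribute from the matching process.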

To input data into matching, first connect the data stream(s) to the match processor on the canvas. The number and type of data streams that the processor accepts depend on the type of processor, as follows:

Match processor type - Accepted input data streams

Group and Merge - A single working data stream

Deduplicate - A single working data stream

Enhance - A single working data stream, and any number of reference data streams

Link - Any number of working and reference data streams

Consolidate - Any number of working data streams

Advanced match - Any number of working and reference data streams
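The limits listed above can be written down as a small lookup. The Python sketch below is an illustration only and not an EDQ API: `ACCEPTED_STREAMS` and `streams_accepted` are hypothetical names, and the check assumes that the listed values act as upper limits on the number of connected streams.

```python
# Illustrative only: encodes the list above; not an EDQ API.
# Each entry is (max working streams, max reference streams).
# None means "any number". Assumption: the values are upper limits only.
ACCEPTED_STREAMS = {
    "Group and Merge": (1, 0),
    "Deduplicate": (1, 0),
    "Enhance": (1, None),
    "Link": (None, None),
    "Consolidate": (None, 0),
    "Advanced match": (None, None),
}


def streams_accepted(processor_type, working, reference):
    """Check stream counts against the limits for the given processor type."""
    max_working, max_reference = ACCEPTED_STREAMS[processor_type]
    working_ok = max_working is None or working <= max_working
    reference_ok = max_reference is None or reference <= max_reference
    return working_ok and reference_ok


print(streams_accepted("Enhance", working=1, reference=3))
print(streams_accepted("Deduplicate", working=2, reference=0))
```

Under these assumptions, the first call reports that an Enhance processor accepts one working stream with three reference streams, and the second reports that a Deduplicate processor does not accept two working streams.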

Data streams are connected to match processors either directly from Readers, or from output filters of other processors.

Once the data streams are connected, you can use the Inputs dialog to select attributes, in the same way as for all processors.

Two additional options are available when configuring the inputs of any match processor other than Group and Merge:

Compare against self - this option controls whether the match processor looks for matches within the data stream (rather than between data streams). It is set to the most likely default for the type of match processor. Note that working data streams are always compared with each other, and reference data streams are never compared with each other.

Enabled - this option lets you retain the configuration of an input data stream while switching its use in the match process on or off, for example to run a match of some working data against some, but not all, of the configured reference data streams.
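The combined effect of these two options on which streams are compared can be sketched as follows. This Python example is an illustration only, not an EDQ API: `InputStream` and `comparison_pairs` are hypothetical names, and the sketch assumes that Compare against self is honored only for working data streams and that working and reference streams are compared with each other.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class InputStream:  # hypothetical model, not an EDQ class
    name: str
    kind: str  # "working" or "reference"
    compare_against_self: bool = False
    enabled: bool = True


def comparison_pairs(streams):
    """List the pairs of stream names in which matches would be sought."""
    # Disabled streams keep their configuration but take no part in matching.
    active = [s for s in streams if s.enabled]
    pairs = []
    # Within-stream comparison, controlled by "Compare against self".
    # Assumption: only applied to working streams in this sketch.
    for s in active:
        if s.kind == "working" and s.compare_against_self:
            pairs.append((s.name, s.name))
    # Between-stream comparison: skipped only when both streams are reference.
    for a, b in combinations(active, 2):
        if a.kind == "reference" and b.kind == "reference":
            continue
        pairs.append((a.name, b.name))
    return pairs


streams = [
    InputStream("customers", "working", compare_against_self=True),
    InputStream("watchlist", "reference"),
    InputStream("old_watchlist", "reference", enabled=False),
]
print(comparison_pairs(streams))
```

Here the disabled `old_watchlist` stream is left out of the comparison without being removed from the list, which corresponds to keeping its configuration while excluding it from a particular run.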