1.3.4.11.4 Match

Match is a sub-processor of all matching processors except Group and Merge. The purpose of the Match stage of match processor configuration is to configure the main matching process, that is, how to compare records, and how to interpret the results of those comparisons (as automatically match, do not match, or assign for manual review).

It is also possible to configure how to output the results of the matching process - that is, the sets of matching records, and the relationships created between records.

The tabs of match configuration are:

  • Comparisons

  • Match Rules

  • Relationships

  • Match Groups

The set of comparisons and match rules needed to match records accurately will depend on the requirements of the matching process, and also on the quality of the data being matched.

In general, when first developing a match process, the following tips are useful:

  • Start by looking for definite matches (normally records that match exactly across your key identifiers). To do this, add Exact Match comparisons to each identifier, and a rule that expects exact matches in each. Note that the Exact Match comparison could still contain transformations to resolve minor discrepancies between records (such as case differences, or extra filler words appearing in the identifier value).

  • Widen out the matching process by adding further rules, below the exact match rule, with degrees of fuzzy matching (for example, using a Character Edit Distance comparison allowing matches with an edit distance of 1 or 2), and run matching to see how effective each rule is (that is, if it finds any matches, and if there are any false positives; that is, records that were matched but which do not represent the same entity).

  • Create the loosest match rule you can imagine that might yield a positive match (perhaps amongst many non-matches), and set the initial match decision to Review. This will allow you to review the characteristics of records matched by the rule, and create new 'stronger' rules to match the positive matches only.

  • When developing a match process, the general aim is to minimize the amount of manual review that will be required to check possible matches. However, on some occasions, there is no way to distinguish automatically between records that should, and should not, match. When you have a rule where it is not obvious whether or not each of the match pairs of records should match, this should be a Review rule.

For more high-level information about matching in EDQ, see the Matching concept guide.

Configuration

There are four steps of configuration of the Match sub-processor, with different tabs on the configuration dialog for each.

The main configuration of matching is encapsulated in the Comparisons and Match Rules tabs. The two types of output have default configuration settings, which will often not need to be changed, or may only need changing when the development of the match process is nearing completion.

Comparisons

Comparisons are matching functions that determine how well two records match each other, for a given identifier.

EDQ comes with a library of comparisons (see List of Comparisons) to cover the majority of matching needs. New comparisons may also be scripted and added into EDQ.

Comparisons compare all the records within a cluster group with each other, and produce a comparison result. The possible comparison results depend on the comparison, and the type of identifier (such as String, number or date) being compared.

For example, the Exact String Match comparison (see Comparison: Exact String Match) delivers one of the following results for each comparison it performs:

  • True - the pair of identifier values match

  • False - the pair of identifier values do not match

  • No Data - no value was found in one or both of the identifier values

So, the exact String match comparison simply determines whether or not a pair of records match.

By contrast, the Character Edit Distance comparison (see Comparison: Character Edit Distance) attempts to find how well a pair of records match, by calculating how many character edits it would take to get from one value to the other. For example, the values 'test' and 'test' match exactly, giving a character edit distance of 0; the values 'test' and 'tast' have a character edit distance of 1, because a single character is different; and the values 'test' and 'mrtest' have a character edit distance of 2, because two characters must be added.
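The edit distance idea can be illustrated with a minimal Python sketch. It assumes the standard Levenshtein definition (insertions, deletions and substitutions each count as one edit); the function name `edit_distance` is hypothetical and this is not EDQ's own implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(edit_distance("test", "test"))    # 0
print(edit_distance("test", "tast"))    # 1
print(edit_distance("test", "mrtest"))  # 2
```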

When used with array attributes, comparisons will, in general, compare all array element values in the first record with all array element values in the second record, and output the strongest match result. For example, if record A has an array with elements 'John' and 'Jon', and record B has an array with elements 'J' and 'Jon', an Exact Match comparison will return 'True', and a Character Edit Distance comparison will return '0', because 'Jon' matches 'Jon' exactly.
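The array behavior described above can be sketched as follows. This is an illustration only: it assumes a Levenshtein-style edit distance, treats "strongest" as "any pair equal" for the exact comparison and "smallest distance" for the edit distance comparison, and uses hypothetical function names.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed definition)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def best_array_results(first, second):
    """Compare every element of first with every element of second
    and return the strongest (exact match, edit distance) results."""
    pairs = list(product(first, second))
    exact = any(a == b for a, b in pairs)
    # default=None covers the case where either array is empty.
    distance = min((edit_distance(a, b) for a, b in pairs), default=None)
    return exact, distance


# Record A holds 'John' and 'Jon'; record B holds 'J' and 'Jon'.
print(best_array_results(["John", "Jon"], ["J", "Jon"]))  # (True, 0)
```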

Adding and Configuring Comparisons

Comparisons are added to each identifier using the Add Comparison button at the bottom of the dialog. It is also possible to copy and paste comparisons (for example, if you want the same comparison configuration on another identifier), by copying (Ctrl + C) with a comparison selected and pasting (Ctrl + V) with an identifier selected. Note that comparisons can also be copied between matching processors, so you can reuse comparison configurations that you have used in other match processors.

Each comparison is configured using the right-hand side of the dialog.

Adding Transformations to Comparisons

Adding transformations on comparisons allows identifiers to be transformed before they are compared.

For example, you might want to strengthen a match rule where identifier values (such as names) are similar, but do not match exactly, with a comparison that ensures that the two values sound the same. To do this, use an Exact String match comparison, but add a Metaphone transformation to the comparison, so that you are comparing the metaphone key for each identifier rather than the individual value - this would mean (for example) that 'Jhon' and 'John' would match.
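The effect of transforming values before an exact comparison can be sketched in Python. Metaphone is too long to reproduce here, so the sketch substitutes the simpler Soundex phonetic key as a stand-in; it is not EDQ's Metaphone transformation, and the function names are hypothetical.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no code.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(value: str) -> str:
    """Return the four-character Soundex key for value ('' if no letters)."""
    letters = [c for c in value.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            key += code
        if c not in "HW":  # H and W do not separate repeated codes
            previous = code
    return (key + "000")[:4]


def exact_match_with_transformation(first, second, transform):
    """Apply the transformation to both values, then compare exactly."""
    return transform(first) == transform(second)


print(exact_match_with_transformation("Jhon", "John", str))      # False
print(exact_match_with_transformation("Jhon", "John", soundex))  # True
```

Without the transformation the two spellings differ; with the phonetic key applied to both sides they compare as equal.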

Comparison transformations may themselves require configuration, depending on the transformation used. See the help pages for the individual transformations for a full guide.

Comparison Options

The comparison options vary depending on the comparison used. For a full guide to the available options, see the help pages for the individual comparisons. For example, the following options are available for the Exact String match comparison (see Comparison: Exact String Match):

  • Match No Data pairs - determines if matching two values that contain No Data (nulls, empty Strings, or only non-printing characters) should return a "True" result (that is, the two values match), or a "No Data" result (as no data was found).

  • Ignore case? - determines if matching will be case sensitive or not. For example, if set, the values "John" and "JOHN" will match; if not set, they will not match.
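The interaction of these two options can be sketched as below. The function and parameter names are hypothetical, the default values are assumptions, and whitespace-only values are assumed to count as No Data alongside nulls and empty strings.

```python
from typing import Optional


def has_no_data(value: Optional[str]) -> bool:
    """True for None, empty strings, and strings with no visible characters
    (whitespace is treated as No Data here, which is an assumption)."""
    if value is None:
        return True
    return not any(ch.isprintable() and not ch.isspace() for ch in value)


def exact_string_match(first: Optional[str], second: Optional[str],
                       ignore_case: bool = False,
                       match_no_data_pairs: bool = False) -> str:
    """Return 'True', 'False' or 'No Data' for a pair of identifier values."""
    first_empty, second_empty = has_no_data(first), has_no_data(second)
    if first_empty and second_empty:
        # Two No Data values match only if the option asks for it.
        return "True" if match_no_data_pairs else "No Data"
    if first_empty or second_empty:
        # A value with data against a value without data is always No Data.
        return "No Data"
    if ignore_case:
        first, second = first.casefold(), second.casefold()
    return "True" if first == second else "False"


print(exact_string_match("John", "JOHN"))                        # False
print(exact_string_match("John", "JOHN", ignore_case=True))      # True
print(exact_string_match(None, "", match_no_data_pairs=True))    # True
print(exact_string_match("John", None, match_no_data_pairs=True))  # No Data
```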

Result Bands

Comparisons that yield numeric results (such as a match percentage, or an edit distance between two identifier values) have result bands, which allow you to configure distinct comparison results (to drive whether or not to match records automatically) for bands of results. Default result bands for each comparison are provided to illustrate this, and so that you do not always have to configure the result bands from scratch.

You can change the result bands for a comparison if you want to band results differently. For example, when using a Character edit distance comparison, you might simply want a rule that matches that identifier if the edit distance is 2 or less.
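As an illustration of banding, the sketch below maps a numeric edit distance onto named bands. The band labels, the `Other` fallback and the function name are hypothetical; EDQ's default bands for any given comparison may differ.

```python
def band_result(distance, bands, fallback="Other"):
    """Return the label of the first band whose upper bound covers distance.

    bands is a list of (upper_bound, label) pairs in ascending order.
    """
    for upper_bound, label in bands:
        if distance <= upper_bound:
            return label
    return fallback


# Hypothetical bands: 0 edits is exact, 1 or 2 edits is close.
bands = [(0, "Exact"), (2, "Close")]

for distance in (0, 1, 2, 3):
    print(distance, band_result(distance, bands))
```

A match rule can then refer to the band label (for example, "Close") rather than to individual distances.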

Note also the colors on the right-hand side of each result band. These are used in the Match rules pane (visible with the match processor open on the canvas, and the Match sub-processor selected) to provide a quick guide to the strength of the comparison result, and therefore a quick visual guide to the configuration of each match rule across several comparisons. Use the Invert Colors tick box to change the direction of the colors. Use Green to indicate a strong match, and Red to indicate a weak match, with various gradients in between.

Compound Comparison

Compound Comparisons allow more complicated configurations to be made by creating separate groups within the match configuration. Comparisons and scores can be configured separately on these groups, and overall scores and other data are able to be calculated from results from these groups. This allows matches to be created in a more efficient and flexible way.

The main benefits are:

  • Ease of setting up match configuration - far fewer rules have to be specified explicitly, so much less configuration time is required

  • Flexibility - rules can take into account the matching or non-matching of all groups within the rule, so more accurate information can be returned on a match

  • Externalization - allowing weightings across the new groups, and allowing these to be externalized, helps external configuration of match. For example, giving a higher weighting to a logical group can be used to increase the contribution that this group's score makes towards an overall score. It is also possible to turn off a logical group completely.

Scoring

You can define a Match Rule to take its score from the outputs of a Compound Comparison, or a combination of outputs from several Compound Comparisons.

For example, you can combine the outputs from several Compound Comparisons to create a score and a rule name that combine the results from all of them, giving a "best match" result across all included Compound Comparisons. If combining Compound Comparisons of Name, Address, and Phone, two records could be compared with a score of 99 and a Rule Name of "Name Exact, Address Exact, Phone Last N".

If you then define a Rule that takes the combined outputs of these Compound Comparisons and requires the score to be greater than 90, you can define the Rule's score to be the "Score" of the combination. In the example, this gives a Rule name such as "Score > 90" and a score of 99.

There are two methods for calculating an aggregate score from the score outputs of multiple Compound Comparisons:

Weighted Average

When using the weighted average score, the contribution of each Compound Comparison to the overall score is proportional to its weighting. The overall score is a proportion of the maximum possible score that could have been obtained if all the contributing Compound Comparisons obtained the maximum possible score themselves.

However, if the Include if "No Data" result option is deselected for a particular Compound Comparison and it has a Category result of "No Data", then that comparison does not contribute towards the overall score.

The configuration for a weighted average score requires configuring Compound Comparisons. Each Compound Comparison provides a weighting and a score between -100 and 100. The options that can be configured for each Compound Comparison are described in the following table:

| Option | Type | Description | Default Value |
|---|---|---|---|
| Weighting | Numeric > 0 | Defines how much (relative to other compound comparisons) a match on this should contribute to the overall score | 1 |
| Enabled | Checkbox | Defines whether the compound comparison should contribute to the overall score. Deselecting this checkbox is equivalent to removing the compound comparison from the score. | Checked |
| Include if "No Data" result | Checkbox | Defines whether the compound comparison should contribute to the score if it has a "No Data" result | Checked |

The following configuration options are provided for normalizing the minimum and maximum score for the score's result:

| Option | Type | Description | Default Value |
|---|---|---|---|
| Normalize result between: Minimum | Numeric | Minimum score, which the result should be normalized between. If this is set to 0, the minimum resulting score is 0. If less than zero, the score is normalized between this negative value and the maximum score, but any resulting negative scores are returned as zero. | 0 |
| Normalize result between: Maximum | Numeric; must be greater than the minimum score | Maximum score, which the result should be normalized to. If this is set to 100, the maximum resulting score is 100. If any resulting scores produced are more than 100, then the value is returned as 100. | 100 |

To calculate the weighted average score, the following algorithm is applied:

If the i-th Compound Comparison has score s_i (which will be between -100 and 100) and weighting w_i, and the "Normalize result between" settings are Min (Minimum) and Max (Maximum), the overall score is:

$$\text{Score} = \text{Min} + (\text{Max} - \text{Min}) \times \frac{\sum_i w_i s_i + 100 \sum_i w_i}{200 \sum_i w_i}$$

In this equation, each sum is taken only over those comparisons that have a result which is not "No Data", or that are set to Include if "No Data".

The value of 200 in the equation is the width of the range in which each compound comparison's result can fall (since each compound comparison can have a result between -100 and 100). Adding 100 times the sum of the weightings to the weighted sum of the scores shifts the numerator so that the fraction expresses the result as a proportion of that range.

Example

The following example provides configuration and resulting score for the Compound Comparison of Name, Address, Phone, E-mail, and Tax Number. The weighting and Include If "No Data" options are configurations while the score is the result of the Compound Comparison.

| Compound | Rule | Score | Weighting | Include if "No Data" |
|---|---|---|---|---|
| Name | Name Exact | 100 | 5 | Y |
| Address | Address Exact | 100 | 8 | Y |
| Phone | Phone Last N | 80 | 6 | N |
| E-mail | E-mail Conflict | -5 | 7 | N |
| Tax Number | No data | 0 | 10 | N |

Normalize result between: Minimum = -20

Normalize result between: Maximum = 120

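Observe that Tax Number is ignored as it has a "No Data" result and the Include if "No Data" option set to N. The weighted sum of the contributing scores is therefore 100*5 + 100*8 + 80*6 + (-5)*7 = 1745, and the sum of the contributing weightings is 5 + 8 + 6 + 7 = 26:

$$\text{Score} = -20 + (120 - (-20)) \times \frac{1745 + 100 \times 26}{200 \times 26}$$

Based on this equation (Tax Number not contributing) the result is:

-20 + (120 - (-20)) * (1745 + 2600) / 5200 = 97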

As another example, if the E-mail Compound Comparison matched on the "No Data" rule instead of E-mail Conflict, the result would change significantly. Since E-mail is no longer contributing, the weighted sum of the scores is 100*5 + 100*8 + 80*6 = 1780 and the sum of the weightings is 5 + 8 + 6 = 19, so the result would be calculated as:

-20 + (120 - (-20)) * (1780 + 1900) / 3800 = 116

This is cut off to return a score of 100.
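The weighted average calculation can be checked with a short Python sketch. The class and function names are hypothetical. The sketch also assumes that a "No Data" result which is included contributes a score of 0, that the result is rounded to the nearest integer, and that a score with no contributing comparisons is 0; none of these is confirmed as EDQ's exact behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompoundResult:
    name: str
    score: Optional[float]        # None models a "No Data" result
    weighting: float = 1.0
    enabled: bool = True
    include_if_no_data: bool = True


def weighted_average_score(results: List[CompoundResult],
                           minimum: float = 0.0,
                           maximum: float = 100.0) -> int:
    # Keep enabled comparisons that have data, or that opt in when they have none.
    contributing = [r for r in results
                    if r.enabled and (r.score is not None or r.include_if_no_data)]
    total_weight = sum(r.weighting for r in contributing)
    if total_weight == 0:
        return 0  # assumption: nothing contributes, so report 0
    weighted_sum = sum(r.weighting * (r.score or 0.0) for r in contributing)
    proportion = (weighted_sum + 100 * total_weight) / (200 * total_weight)
    raw = minimum + (maximum - minimum) * proportion
    # Negative results are returned as 0 and results above 100 as 100.
    return round(min(100.0, max(0.0, raw)))


example = [
    CompoundResult("Name", 100, 5, include_if_no_data=True),
    CompoundResult("Address", 100, 8, include_if_no_data=True),
    CompoundResult("Phone", 80, 6, include_if_no_data=False),
    CompoundResult("E-mail", -5, 7, include_if_no_data=False),
    CompoundResult("Tax Number", None, 10, include_if_no_data=False),
]
print(weighted_average_score(example, minimum=-20, maximum=120))  # 97

# Second case: E-mail now has a "No Data" result and stops contributing.
example[3] = CompoundResult("E-mail", None, 7, include_if_no_data=False)
print(weighted_average_score(example, minimum=-20, maximum=120))  # 100
```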

Geometric Average

The Geometric Average score is an alternative to the Weighted Average score. Because the score is derived from a product of the scores from the component Compound Comparisons, it gives a less linear distribution. As a result, a match between a small number of Compound Comparisons can produce a high score faster. Extra matching comparisons beyond this contribute less, and the score increases slowly. This is similar to the way scoring is done in Customer Data Services at the moment. For example, only two matching fields are required for a very high score, regardless of the contents of the other fields.

For calculating the score using Geometric Average, add any number of Compound Comparisons to the score's configuration. Each Compound Comparison has the configuration options described in the following table:

| Option | Type | Description | Default |
|---|---|---|---|
| Weighting | Numeric > 0 | Defines how much a match on this should contribute to the overall score | 1 |
| Enabled | Checkbox | Defines whether the compound comparison should contribute to the overall score. Turning this off is equivalent to removing the compound comparison from the score. | Checked |

Note:

The Include if "No Data" result option is not relevant for Geometric Average score.

The Geometric Average score applies the following algorithm, given that the score provided for each Compound Comparison is between -100 and 100. Take each compound comparison i, which has a weighting (w_i) and a score (s_i).

First convert these values to a contribution (c_i) for the compound comparison:

$$c_i = \begin{cases} 1 + \dfrac{w_i s_i}{100} & \text{if } s_i \ge 0 \\[1ex] \dfrac{1}{1 - \dfrac{w_i s_i}{100}} & \text{if } s_i < 0 \end{cases}$$

For s_i >= 0, results range between 1 and 1 + w_i. For s_i < 0, results range between 1/(1 + w_i) and 1.

The overall score is calculated as:

$$\text{Score} = 100 \times \frac{\left(\prod_i c_i\right)^{x} - 1}{\left(\prod_i c_i\right)^{x}}$$

This equation gives a score which tends towards 100 as the product of the contributions becomes greater, and tends towards -infinity as the product becomes smaller. If all scores are "No Data" (giving c_i = 1) then the score would be zero. The lowest possible score is set at 0. The resulting score is rounded to the nearest integer.

In the equation, x is the Score Factor that indicates if the score would be higher or lower.

You can choose the Score Factor from a drop down list available when configuring the Geometric Average score. It provides five options with predefined values for each of them, as described in the following table:

| Option | Value |
|---|---|
| Typical | 0.5 (Default) |
| High | 1 |
| Higher | 1/3 |
| Lower | 1/4 |
| Lowest | 1/5 |

Example

The following table shows the results for a pair of records for five Compound Comparisons:

| Compound | Rule | Score | Weighting | Calculated Contribution |
|---|---|---|---|---|
| Name | Name Exact | 100 | 5 | 1 + 1*5 = 6 |
| Address | Premise and Postal Code | 80 | 8 | 1 + 0.8*8 = 7.4 |
| Phone | Phone Last N | 80 | 6 | 1 + 0.8*6 = 5.8 |
| E-mail | E-mail conflict | -5 | 7 | 1/(1 + 0.05*7) = 0.74 |
| Tax Number | No Data | 0 | 10 | 1 |

The product of the contributions is 190.7556.

Using x = 0.5, the product raised to the power x is 13.81143, which gives a result of:

100 * (13.81143 - 1) / 13.81143 = 92.76, which rounds to 93
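The geometric average example can be reproduced with a short Python sketch. The function names are hypothetical, and a "No Data" result is modeled as `None` with a contribution of 1; the formula follows the contribution rules and the worked figures above rather than any published EDQ source code.

```python
import math
from typing import Iterable, Optional, Tuple


def contribution(score: Optional[float], weighting: float) -> float:
    """Convert one compound comparison's score and weighting to a contribution."""
    if score is None:          # "No Data" contributes a neutral factor of 1
        return 1.0
    if score >= 0:
        return 1 + weighting * score / 100
    return 1 / (1 - weighting * score / 100)


def geometric_average_score(results: Iterable[Tuple[Optional[float], float]],
                            score_factor: float = 0.5) -> int:
    """results is an iterable of (score, weighting) pairs."""
    product = math.prod(contribution(s, w) for s, w in results)
    powered = product ** score_factor
    raw = 100 * (powered - 1) / powered
    return max(0, round(raw))  # the lowest possible score is 0


example = [
    (100, 5),   # Name: Name Exact
    (80, 8),    # Address: Premise and Postal Code
    (80, 6),    # Phone: Phone Last N
    (-5, 7),    # E-mail: E-mail conflict
    (None, 10), # Tax Number: No Data
]
print(geometric_average_score(example, score_factor=0.5))  # 93
```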

Match Rules

A match rule determines how a set of comparison results is interpreted during the matching process.

Each match rule results in a decision. There are three possible decisions:

  • Match

  • No Match

  • Review

These decisions are interpretations of a number of comparison results - for example if all comparison results match, this might be categorized as a Match. If only some comparisons match, you might prefer to review the matching records manually in order to decide whether records linked by the rule are matches or not.

Match rules are processed in a logical order, from top to bottom as they are displayed in the Match Rules pane. If match rule groups are in use, the match rules in the first match rule group are processed first, from top to bottom, before any of the rules in the next group are processed.

The complete set of match rules forms a decision table for the comparison results.

If a pair of records meets the criteria for the top match rule (for example, Comparison 1 = True, Comparison 2 = Close match), the match rule's decision will be applied to that pair of records. Match rules that are lower down in the decision table will not apply to pairs of records already linked (or not linked, in the case of rules with No Match decisions) by a higher rule.

Normally, it is best to use the strongest match rules (with Match decisions) at the top of the table. For example, a complete duplicate across all identifiers would be considered a very strong (exact) match, and would meet the criteria of the top rule. The match rules will then get 'looser' as you move down the table.
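The top-to-bottom processing of the decision table can be sketched in Python. The rule names, comparison names and result labels below are hypothetical, `*` stands for any result, and returning `None` when no rule applies is an assumption of the sketch.

```python
ANY = "*"

# Each rule: (rule name, required result per comparison, decision).
# Rules are listed strongest first, as in the decision table.
RULES = [
    ("Exact on both", {"Name": "True", "Postcode": "True"}, "Match"),
    ("Close name, exact postcode", {"Name": "Close", "Postcode": "True"}, "Match"),
    ("Postcode conflict", {"Name": ANY, "Postcode": "False"}, "No Match"),
    ("Loose name", {"Name": "Loose", "Postcode": ANY}, "Review"),
]


def decide(results, rules=RULES):
    """Return (rule name, decision) for the first rule whose criteria are met."""
    for name, criteria, decision in rules:
        if all(expected == ANY or results.get(comparison) == expected
               for comparison, expected in criteria.items()):
            return name, decision
    return None  # no rule applied, so the pair is not linked


# The No Match rule is higher in the table, so the Review rule never sees this pair.
print(decide({"Name": "Loose", "Postcode": "False"}))
print(decide({"Name": "Loose", "Postcode": "True"}))
```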

After matching has run, the links (termed 'relationships') formed by each match rule are available in the Rules view of the Results Browser, and you can drill down to the Relationships Output to see the related records.

Adding and Configuring Match Rules

Match rules are managed using the buttons underneath the match rules list.

Rules are added using the plus sign and deleted using the minus sign. Their position in the list is adjusted using the arrow buttons at the right hand side.

The check box to the left of the match rule allows you to temporarily disable a match rule for the next run of the match processor, without losing the rule altogether (as you may want to reinstate it later). This is particularly useful for pre-configured match processors, as some of the rules provided may not be required for your specific data, and so can quickly be disabled without deleting the rule.

Each match rule is configured on the right-hand side of the dialog. Each comparison is listed, and you need to decide which comparison results you want to interpret with a match decision (MATCH, NO MATCH, or REVIEW).

It is also very useful to create new rules by copying and pasting other rules, and making minor changes to the configuration on the right - for example because you want to create a new rule that varies only slightly from an existing rule. Standard keyboard shortcuts (<Ctrl> C and <Ctrl> V) can be used, and a right-click menu is also available.

The pasted rule will be added immediately below the original rule. You can then edit the rule name, change its configuration, and move it to the appropriate place in the table of rules.

Rules can be copied from one match rule group and pasted into another.

Configuring Comparison Results in Match Rules

For each configured comparison, it is possible to select a comparison result for the match rule. As different comparisons offer different results, the possible results for a comparison vary. For example, the Exact String Match comparison may return one of the following results:

  • True (that is, the strings match)

  • False (that is, the strings do not match)

  • No data (that is, one or both of the values being compared contained No data)

When selecting the result of the comparison in a match rule, you can therefore choose any of the above results, or you may choose *, meaning 'Any result'.

Note that all comparisons offer a 'No data' result. Comparing a value containing data with a Null or empty string value will always give a No data result. Comparing two Null or empty strings gives a No data result only if the Match No Data Pairs option (also on all comparisons) is set to No. If Match No Data Pairs is set to Yes, the two No Data values will be matched with the maximum result for the comparison (for example, True, for the case of an Exact String Match).

Match Rule Groups

Match Rules are collated into groups. A match rule group consists of a set of match rules that perform a similar function. Match rules in a match rule group can be managed as a unit, including:

  • Enabling or disabling the rules in the group;

  • Changing the decision for all the rules in the group;

  • Changing the comparison used by all the rules in the group;

  • Moving the position of the group in the decision table.

The rules in a match rule group form a contiguous set of rules in the decision table. That is, for any given group, it is not possible for rules that are not part of the group to be interspersed with the rules in that group.

The match rules displayed are those which are associated with the selected group.

It is possible to ignore match rule groups completely. By default, every match processor has a 'default' match rule group, into which all the match rules are placed. If you do not create any other match rule groups, then grouping will have no effect on the match rule configuration.

Match Rule Group Controls

Match rule groups are managed in a similar way to the match rules themselves. Again, buttons underneath the list are used to add, remove and reorder the match rule groups.

Deleting a match rule group deletes all the match rules within the group.

Bulk changes to the rules in a group can be made via the match rules group right-click menu.

For example, you can disable all the rules in the selected group. You can also change the match decision of all the rules in a group, or apply a comparison to all the rules in a group, via this mechanism.

Relationships Output

The Relationships tab allows you to configure the Relationships output from the matching process.

Relationships are links between two records, created by automatic match rules and manual decisions. The same record may be related to more than one other record, and therefore might exist in more than one relationship, but each relationship is always for a distinct pair of records.

The relationships output is available as an output from each matching processor, and can be used for writing and exporting to an external database or file, or for further processing, such as profiling. It is also available as a Data View in the Results Browser. Finally, it is used in the drilldowns from the Rules and Review Status summary views for a match processor.

The relationships output has a default set of attributes, and a default set of output records (one for each relationship formed in matching). However, you can change the set of attributes that form the output, and you can change the set of relationships to output.

Changing the Attributes

The attributes that make up the default relationships data are listed on the left-hand side of the configuration dialog.

The relationships data outputs a single record for each relationship created by your matching process. Each record in the output data therefore contains information from two matching records.

The default format includes the following attributes, as shown on the left-hand side of the screen:

Table 1-113 Attributes in Default Relationships Data

| Attribute Name | Description | Attribute Value |
|---|---|---|
| ReviewGroup [Match Review only] | Review Group Id | The generated Id of the review group that each relationship belongs to. Review groups are complete groups of inter-related records. Each record in a relationship must therefore be in the same review group. |
| MatchGroup [Match Review only] | Match Group Id | The internal Id of the match group that the first record in the relationship belongs to. Match groups do not consider review relationships, by default. Two records in a review relationship will therefore be in different match groups. |
| InternalId [Match Review only] | Internal Record Id | The internal record Id of the first record in the relationship. |
| DataStreamName [Match Review only] | Record's Data Stream Name | The name of the input data stream for the first record in the relationship. |
| RelatedMatchGroup [Match Review only] | Match Group Id | The internal Id of the match group that the second (related) record in the relationship belongs to. |
| RelatedInternalId [Match Review only] | Internal Record Id | The internal record Id of the second (related) record in the relationship. |
| RelatedDataStreamName [Match Review only] | Record's Data Stream Name | The name of the input data stream for the second (related) record in the relationship. |
| Rule | Match Rule Name | The name of the match rule that created the relationship. |
| RuleDecision | Relationship Decision Value | The match decision of the relationship. |
| ReviewStatus | Relationship Review Status | The review status of the relationship (No Review Required, Awaiting Review, or User Reviewed). |
| [identifier name] | Value from identifier: [Identifier name] | An attribute for each identifier value from the first record in the relationship. |
| related_[identifier name] | Value from identifier: [Identifier name] | An attribute for each identifier value from the second (related) record in the relationship. |
| [ComparisonName]_Element | Array of same type as attribute being matched | The element from the first attribute mapped to the identifier that contributed to the "best" result. If there are multiple pairs of elements contributing to the "best" match, this will contain multiple values, which pair in order with those in the attribute below. |
| [ComparisonName]_RelatedElement | Array of same type as attribute being matched | The element from the second attribute mapped to the identifier that contributed to the "best" result. If there are multiple pairs of elements contributing to the "best" match, this will contain multiple values, which pair in order with those in the attribute above. |
| [ComparisonName]_Index | Number Array | The index of the element from the first attribute mapped to the identifier that contributed to the "best" result (1-indexed). If there are multiple pairs of elements contributing to the "best" result, this will contain multiple values, which pair in order with those in the attribute below. |
| [ComparisonName]_RelatedIndex | Number Array | The index of the element from the second attribute mapped to the identifier that contributed to the "best" result (1-indexed). If there are multiple pairs of elements contributing to the "best" result, this will contain multiple values, which pair in order with those in the attribute above. |

To keep the default format for the relationships output, keep the Auto Attribute Selection option ticked at the bottom of the dialog. Note that the attributes in the output may still change, as attributes are included for each identifier. Adding or removing identifiers will change the attributes in the default output.

If you want to customize the output, you can choose to untick this box, and add or remove attributes. A number of attributes are available to add. You can add values from any of the input attributes to the match process, for either or both records in the relationship, and also a number of additional attributes that are made available from the matching process, such as the REVIEW_USER (the user that made the last manual decision on the relationship, if any), the REVIEW_DATE (the date of the last manual decision), COMMENT (the last comment made on the relationship during the review process), COMMENT_USER (the user that made the last comment) and Case Management Extended Attributes (if Case Management is in use).

Note that if you change the output to a custom format, for example, by adding attributes, the Auto Attribute Selection option is automatically de-selected. This means that adding identifiers will not automatically add attributes to the output, though you can still add them manually if required.

Changing the Set of Relationships

There are a number of options available for changing the set of relationships that are output:

Table 1-114 Options for Changing the Set of Relationships

| Option | Description | Default Setting |
|---|---|---|
| Generate relationships output | Determines whether or not to generate the relationships output at all. For example, if you have fully developed the matching process, and you are not using the relationships output, you can improve performance by not generating it. | Selected |
| Output match relationships | Determines whether or not to output relationships with a Match decision. | Selected |
| Output review relationships | Determines whether or not to output relationships with a Review decision. | Selected |
| Output automatically reviewed relationships | Determines whether or not to output relationships that have been reviewed by an automatic rule. | Selected |
| Output manually reviewed relationships | Determines whether or not to output relationships that were manually reviewed. | Selected |
| Output relationship awaiting review | Determines whether or not to output relationships that are awaiting review. | Selected |
| Output manual no match relationships | Determines whether or not to output 'relationships' that initially had a Review decision (by automatic rule) but which were given a No Match decision during review. For example, if you want to output a full audit trail of the decisions made during the review process, you might select this option, and de-select the options above. | Not selected |
| Match rules to include | Allows you to select whether or not to output relationships created by individual match rules | All rules selected |

Changing the Set of Match Groups

There are a number of options available for changing the set of match groups that are output:

Table 1-115 Options for Changing the Set of Match Groups

| Option | Description | Default Setting |
|---|---|---|
| Generate Match Groups report | Determines whether or not to generate the match groups output at all. For example, if you have fully developed the matching process, and you are not using the match groups output, you can improve performance by not generating it. | Selected |
| Output related records | Determines whether or not to output groups of related records. | Selected |
| Output unrelated records | Determines whether or not to output groups of unrelated records. | Selected, for Deduplicate and Consolidate processors. Not Selected, for Enhance, Link and Advanced Match processors. |

Match Groups Output [Match Review only]

The Match groups tab allows you to configure the match groups output from the matching process.

Match groups are the final groups of records from the matching process. Each working record that is input to the matching process is output in a match group, possibly with other matched records. The groups consist of records that are related via Match decisions. A group may contain a single record, if that record has not been matched to any others. An option controls whether or not these unrelated records (groups of 1) are output.
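One way to picture how match groups arise is to treat each Match decision as a link between two records and to collect linked records into groups. The sketch below assumes that grouping is transitive (if A matches B and B matches C, all three share a group), which the text implies but does not state outright. The function name `build_match_groups` and its parameters are hypothetical, and the union-find technique is used here for illustration only.

```python
from collections import defaultdict


def build_match_groups(record_ids, match_pairs, output_unrelated=True):
    """Group records connected by Match decisions (illustrative union-find)."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent links to the group's representative, shortening the path.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)

    result = [sorted(members) for members in groups.values()]
    if not output_unrelated:
        # Drop groups of 1, that is, records not matched to any other record.
        result = [g for g in result if len(g) > 1]
    return sorted(result)
```

With records `a`, `b`, `c`, `d` and Match decisions between `a`-`b` and `b`-`c`, this yields one group of three records plus a single-record group for `d`, which is dropped when `output_unrelated` is `False`.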

The match groups output is available as an output from each matching processor. It can be used for writing and exporting to an external database or file, or for further processing, such as profiling. It is also available as a Data View in the Results Browser. Finally, it is used in the drilldowns from the Matching and Match Groups summary views for a match processor.

The match groups output has a default set of attributes, and a default set of output records. However, you can change the set of attributes that form the output, and you can change the set of groups to output.

Changing the Attributes

The attributes that make up the default match groups data are listed on the left-hand side of the configuration dialog.

The match groups data outputs the working records input into the matching process, organized into groups according to the way that they were matched to other records.

Note:

Records from reference data streams are only included in match groups if they are related to working records. Where a match group contains a single record, that record is always from a working data stream.

The default format includes the following attributes, as shown on the left-hand side of the screen:

Table 1-116 Attributes in Default Match Groups Data

Attribute Name Description Attribute Value

MatchGroup

Match Group Id

The internal Id of the match group that each record belongs to

Note: By default, match groups do not take Review relationships into account. This can be changed using an Advanced option.

InternalId

Internal Record Id

The internal record Id of each record.

InputName

Record's Input Name

The name of the input data stream for the record.

MatchGroupSize

Match Group Size

The total number of records in the match group of the record.

[identifier name]

Value from identifier: [Identifier name]

An attribute for each identifier value from the first record in the relationship.

To keep the default format for the match groups output, check the Auto Attribute Selection option at the bottom of the dialog. Note that the attributes in the output may still change, as attributes are included for each identifier. Adding or removing identifiers will change the attributes in the default output.

If you want to customize the output, uncheck this box and add or remove attributes. You can add values from any of the attributes that are input to the match process.

Note that if you change the output to a custom format, for example, by adding attributes, the Auto Attribute Selection option is automatically de-selected. This means that adding identifiers will not automatically add attributes to the output, though you can still add them manually if required.
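The effect of the Auto Attribute Selection option can be sketched as follows. This is a simplified model rather than the product's logic: `output_attributes` and its parameters are hypothetical names, and the fixed attribute list is taken from the default attributes table above.

```python
# Attributes that are always part of the default match groups format.
FIXED_ATTRIBUTES = ["MatchGroup", "InternalId", "InputName", "MatchGroupSize"]


def output_attributes(identifiers, auto_selection=True, custom_attributes=None):
    """Return the attribute list for the match groups output.

    With auto selection on, one attribute per identifier is appended to the
    fixed attributes, so the list follows any change to the identifiers.
    With auto selection off, the custom list is used exactly as given.
    """
    if auto_selection:
        return FIXED_ATTRIBUTES + list(identifiers)
    return list(custom_attributes or [])
```

In this model, adding an identifier changes the result only while `auto_selection` is `True`; a custom list stays as it is until it is edited by hand.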

Changing the Set of Match Groups

There are a number of options available for changing the set of match groups that are output:

Table 1-117 Options for Match Groups

Option Description Default Setting

Generate Match Groups report

Determines whether to generate the match groups output at all. For example, if you have fully developed the matching process and you are not using the match groups output, you can improve performance by not generating it.

Selected

Output related records

Determines whether or not to output groups of related records.

Selected

Output unrelated records

Determines whether or not to output groups of unrelated records.

Selected, for Deduplicate and Consolidate processors.

Not Selected, for Enhance, Link and Advanced Match processors.
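As a rough illustration of how the related and unrelated options and their per-processor defaults interact, consider the sketch below. It assumes that a "related" group is one with more than one record and an "unrelated" group is a group of 1. The names `UNRELATED_DEFAULTS` and `filter_groups` are hypothetical and do not correspond to any product API.

```python
# Default for "Output unrelated records" by processor type, per the table above.
UNRELATED_DEFAULTS = {
    "Deduplicate": True,
    "Consolidate": True,
    "Enhance": False,
    "Link": False,
    "Advanced Match": False,
}


def filter_groups(groups, processor, output_related=True, output_unrelated=None):
    """Keep the groups selected by the related/unrelated output options."""
    if output_unrelated is None:
        # Fall back to the default for this processor type.
        output_unrelated = UNRELATED_DEFAULTS[processor]
    kept = []
    for group in groups:
        is_related = len(group) > 1
        if (is_related and output_related) or (not is_related and output_unrelated):
            kept.append(group)
    return kept
```

Under these assumptions, a Link processor with default settings would output only groups that contain matched records, while a Deduplicate processor would also output the single-record groups.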

Alert Groups Output [Case Management only]

The Alert groups tab allows you to configure the alert groups output from the matching process.

The alert groups output is available from each matching processor. It can be used for writing and exporting to an external database or file, or for further processing, such as profiling. It is also available as a Data View in the Results Browser. Finally, it is used in the drilldowns from the Matching and Match Groups summary views for a match processor.

Alert groups are the sets of records from the matching process that are collected together to form alerts for use in the review process. Each working record that is included in a relationship by the matching process is output in an alert group, possibly with other matched records. The records in a group are related by sharing the same Alert Key.

Any records which have not been matched to any others will not be included in any alert groups, and will not be assigned an Alert Key. These singleton records can optionally be included in the Alert Groups output.
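The grouping described here can be sketched as collecting records by their Alert Key, with unmatched records (which have no Alert Key) kept aside and included only on request. The record layout (dictionaries with `AlertKey` and `InternalId` entries) and the function name `build_alert_groups` are assumptions made for this example.

```python
from collections import defaultdict


def build_alert_groups(records, include_unrelated=False):
    """Collect records into alert groups by Alert Key.

    Returns (groups, singletons): groups maps each Alert Key to the internal
    Ids of its records; singletons lists unmatched records and is empty
    unless include_unrelated is True.
    """
    groups = defaultdict(list)
    singletons = []
    for rec in records:
        key = rec.get("AlertKey")
        if key is None:
            # Unmatched record: no Alert Key was assigned.
            if include_unrelated:
                singletons.append(rec["InternalId"])
        else:
            groups[key].append(rec["InternalId"])
    return dict(groups), singletons
```

Records that share an Alert Key end up in the same group, and a record without a key appears in the output only when `include_unrelated` is set.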

The alert groups output is pre-configured with a default set of output attributes and a default selection of output groups. These default configurations can be changed on the Alert Groups tab of the Match processor dialog.

Changing the Attributes

The attributes that are output in the alert group data are listed on the left-hand side of the configuration dialog.

Alert groups contain the working records input into the matching process, organized into groups by their Alert Key.

Note:

Records from reference data streams are only included in alert groups if they are related to working records.

The default format includes the following attributes, as shown on the left-hand side of the screen:

Table 1-118 Attributes in Default Alert Groups Data

Attribute Name Description Attribute Value

CaseKey

Case Key

The Case Key of the records in the alert group.

AlertKey

Alert Key

The Alert Key used to collect the records into the alert group.

InputName

Record's Input Name

The name of the input data stream for the record.

InternalId

Internal record ID

The internal identifier of the record.

MatchGroupSize

Match Group Size

The total number of records in the alert group of the record.

[identifier name]

Value from identifier: [Identifier name]

An attribute for each identifier value from the first record in the relationship.

To keep the default format for the alert groups output, check the Auto Attribute Selection option at the bottom of the dialog. Note that the attributes in the output may still change, because attributes are included for each identifier. Adding or removing identifiers will change the attributes in the default output.

If you want to customize the output, uncheck this box and add or remove attributes. You can add values from any of the attributes that are input to the match process.

Note that if you change the output to a custom format, for example, by adding attributes, the Auto Attribute Selection option is automatically de-selected. This means that adding identifiers will not automatically add attributes to the output, though you can still add them manually if required.

Changing the Output Set of Alert Groups

There are a number of options available for specifying which alert groups will be output:

Table 1-119 Options for Alert Groups

Option Description Default Setting

Generate Alert Groups report

Determines whether or not to generate any alert groups output at all. Once you have finished developing the matching process, you can improve the performance of the process by disabling the alert groups output.

Selected

Output related records

Determines whether or not to include records which are found in alert groups (that is, they have been matched with other records).

Selected

Output unrelated records

Determines whether or not to output records which are not part of any alert groups (that is, they have not been matched with any other records).

Selected, for Deduplicate and Consolidate processors.

Not Selected, for Enhance, Link and Advanced Match processors.