Frequently Asked Questions about Matching and Merging

How are match scores calculated and how do I use them?

The exact mechanism for calculating a match score depends on several factors, such as the data type (string, integer, date, and so on), the match type (contains, similar to, between), and whether the match operations are combined. In general, a higher match score indicates that a candidate has met more of the matching criteria.

Let's look at some examples to better understand how a match score is calculated.

Example 1

In this example, the match rule is matching on the Name property, which is a string:


[Figure: match score example 1]

The target name that we are matching to is "Atkins Pearson International", and the source name that we are trying to match is "Baker H. International".

In this example, there are 28 characters in the target name, and the source name matches 17 of them ("a", "k", two spaces, and all of "International"). Therefore, approximately 61% (17 of 28) of the characters match, giving a match score of 61.
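The arithmetic in Example 1 can be reproduced with a short sketch. It takes the matched-character count (17) directly from the example rather than computing it, because the product's actual character-matching algorithm is not described here. The function name match_score and the rounding to the nearest whole number are assumptions made for illustration.

```python
def match_score(matched_chars: int, target_chars: int) -> int:
    """Hypothetical helper: percentage of target characters matched,
    rounded to the nearest whole number (rounding is an assumption)."""
    return round(100 * matched_chars / target_chars)


target = "Atkins Pearson International"

# 17 matched characters is the count stated in Example 1:
# "a", "k", two spaces, and the 13 characters of "International".
matched = 1 + 1 + 2 + len("International")

score = match_score(matched, len(target))
print(len(target), matched, score)
```

With the counts from the example (17 of 28), this yields a score of 61.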

Example 2

In the second example, we are matching on two string properties, Name and Industry:


[Figure: match score example 2]

The target name we are matching is "Andrews Corporation", and the target industry is "Diagnostics & Research". The source name is "Andrews", and the source industry is "Diagnostics & Research".

In this case, the source Name matches 37% (7 of 19) of the characters in the target Name, and the source Industry matches 100% of the target Industry. Because this is a combined match, the two scores are averaged, (37 + 100) / 2, giving a match score of 68.
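The combined calculation in Example 2 can be sketched the same way. The helper names below are hypothetical, and the final rounding step is an assumption: averaging the unrounded percentages gives about 68.4, which is consistent with the score of 68 in the example.

```python
def percent_matched(matched_chars: int, target_chars: int) -> float:
    """Hypothetical helper: share of target characters matched, as a percentage."""
    return 100 * matched_chars / target_chars


def combined_score(scores: list[float]) -> int:
    """Average the per-property scores; rounding to a whole number is an assumption."""
    return round(sum(scores) / len(scores))


# Name: the source "Andrews" covers 7 of the 19 characters in the target.
name_pct = percent_matched(len("Andrews"), len("Andrews Corporation"))

# Industry: source and target are identical, so every character matches.
industry = "Diagnostics & Research"
industry_pct = percent_matched(len(industry), len(industry))

combined = combined_score([name_pct, industry_pct])
print(round(name_pct), round(industry_pct), combined)
```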

Other data types and match operators perform similar calculations to determine the match score.

Should I use a code or a data source name in a load file?

When a request file that includes data source information is processed, the data source for each node can be identified either by its code or by its name.

Because data source names can change over time, it is a best practice to always configure a code for your data sources and to use that code in your request files instead of the data source name.

Can I create a survivorship rule for a registered data source?

Survivorship rules determine which properties and relationships from an unregistered data source get merged from an accepted match candidate into a matching node in a node type. For registered data sources, you instead use a node type converter to determine how the properties and relationships from a match candidate are merged into a matching target node. See Working with Node Type Converters. You do not need to create survivorship rules for registered data sources.

Tip:

While node type converters for registered data sources determine which properties are available to be merged from an accepted match candidate into a matching node in a node type, you can still decide which of those properties get merged. Use the Source Node and Target Node radio buttons in the Match Results panel to choose which values to keep. See Selecting the Properties to Keep During a Merge.

When creating match rules, is it better to add multiple criteria to a rule or to create separate rules?

Whether to match on identifying properties in separate match rules or as multiple criteria within a single rule is best determined through experimentation by the implementing organization. When tuning rules in a test environment, stewards can evaluate which approach generates fewer false positives.

In principle, combining identifying properties in a single match rule performs an "AND" operation: all of the criteria contribute to one match score. Using a separate rule for each identifying property evaluates that property on its own as a match determinant. Because all of the rules for a given combination of node type and data source are evaluated, separate rules can act as an "OR" operation across those rules.

One scenario where it might make sense to create separate match rules instead of adding multiple criteria to a single combined rule is if you are auto-accepting matches above a certain match score threshold and you expect that some criteria will meet that threshold while others will not.

For example, consider a scenario where you automatically accept matches above 90%, and you have two criteria for matching, with one matching at 100% and the other at 50%:

  • If you have two separate match rules, the match from the rule that scores 100% is automatically accepted.
  • If you have one match rule that contains both criteria, the average match score is 75%, which is below your threshold of 90% for automatically accepting the match. The match will not be auto-accepted.

So, in this example the decision to combine the criteria or to create separate match rules would depend on whether or not you wanted to automatically accept some matches above a certain threshold.
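The comparison above can be expressed as a minimal sketch. The names AUTO_ACCEPT_THRESHOLD and is_auto_accepted are hypothetical and are not product settings or APIs. The sketch assumes that "above 90%" means strictly greater than 90 and that a combined rule averages its criteria scores, as in Example 2.

```python
AUTO_ACCEPT_THRESHOLD = 90  # hypothetical threshold, in percent


def is_auto_accepted(score: float, threshold: float = AUTO_ACCEPT_THRESHOLD) -> bool:
    """Assumes a match is auto-accepted only when its score is strictly above the threshold."""
    return score > threshold


criteria_scores = [100, 50]

# Separate rules: each criterion is scored and checked on its own ("OR" behavior).
separate_results = [is_auto_accepted(s) for s in criteria_scores]
accepted_by_any_rule = any(separate_results)

# One combined rule: the criteria scores are averaged first ("AND" behavior).
combined_score = sum(criteria_scores) / len(criteria_scores)
accepted_by_combined_rule = is_auto_accepted(combined_score)

print(separate_results, accepted_by_any_rule)
print(combined_score, accepted_by_combined_rule)
```

Under these assumptions, the separate rule scoring 100 clears the threshold, while the combined rule averages to 75 and does not.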

I accidentally accepted a match that I did not mean to. Can I review my previously accepted matches and undo them?

After a request with a matched and merged item has been completed and closed, you cannot undo that match and merge operation on the existing node. Before the request is completed and closed, you can undo the match to an existing node in the following ways:

  • Before applying changes (by clicking Reject or Skip in the matching workbench)
  • After applying changes but before the request is completed (by deleting the request item and recreating it separately)

After the request has been completed and closed, the only way to remove the stored match information is to delete and re-add the existing target node.

When are node links established between nodes?

Node links are established between a source and a target node when an existing target node is updated by an incoming source node that has a defined data source. For details, see Understanding Node Links and Data Sources.