Enterprise Data Quality Matching Configurations

Enterprise Data Quality (EDQ) matching configurations comprise attributes and parameters for real-time and batch matching of entities to prevent duplicate entries and identify existing duplicates. EDQ real-time and batch matching are available for account and contact entities.

You can either use the predefined, ready-to-use configurations or copy and adapt them to your address matching requirements. The predefined EDQ matching configurations applicable to both real-time and batch matching are:

  • Account Duplicate Identification

  • Contact Duplicate Identification

These configurations are used to identify duplicate account and contact entries. You can review and edit these predefined matching configurations to optimize the matching functionality to meet your needs.

EDQ Matching Process

In the EDQ matching process, the record that is added to or updated in the application and submitted for comparison is called the driver record. The records that are compared with the driver record are called candidate records. Driver records are compared with each other, but candidate records are never compared with other candidates. The EDQ real-time matching process compares a single driver record against many candidates and returns possible duplicate records based on the matching attributes and threshold. The batch matching process compares all driver records of the same type, such as account or contact, and identifies all possible matches within these sets of records.

The batch matching process runs in two modes: full batch and incremental batch. Full batch mode matches all records against each other, whereas incremental mode matches a subset of records against all of their selected candidates. For batch matching, separate matching templates are provided that let you specify different match rules. For example, you may want to minimize user intervention when customers are added in front-end applications, and perform an exhaustive match on a regular basis.
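The difference between the two batch modes can be sketched as the set of record pairs that each mode compares. This is only an illustration of the description above, not how EDQ is implemented. The function names and the `candidates_for` callback are hypothetical.

```python
from itertools import combinations


def full_batch_pairs(records):
    """Full batch: every record is compared with every other record once."""
    return list(combinations(records, 2))


def incremental_batch_pairs(subset, candidates_for):
    """Incremental batch: each record in the subset is compared only
    with the candidates selected for it (hypothetical callback)."""
    pairs = []
    for driver in subset:
        for candidate in candidates_for(driver):
            if candidate != driver:
                pairs.append((driver, candidate))
    return pairs


records = ["A", "B", "C", "D"]

# Full batch over four records compares six pairs.
full = full_batch_pairs(records)

# Incremental batch for the newly changed record D only.
incremental = incremental_batch_pairs(
    ["D"], lambda driver: [r for r in records if r != driver]
)
```

In this sketch, the incremental run compares D with A, B, and C and leaves the existing records uncompared with each other.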

The EDQ matching process for real-time and batch matching runs the EDQ Cluster Key Generation service and the EDQ matching service for duplicate identification. The EDQ Cluster Key Generation service is called whenever a record is added or updated in an application. This service generates keys for both new and updated records. The generated keys are stored in the application and are then used to select the candidate records that may match the driver record.

The selected candidate records, along with the driver record, are passed to the EDQ matching service. This service examines the records and decides which of the candidate records are a good match for the driver record. Once the EDQ matching service arrives at the best match, it assigns a score to every identified duplicate record based on the strength of the match.
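The two-step flow described above (generate and store cluster keys, then select candidates by key and score them against the driver record) can be sketched as follows. This is a simplified illustration under stated assumptions: the key format (a name prefix plus postal code), the record fields, and the use of Python's `difflib.SequenceMatcher` for scoring are all stand-ins and don't reflect the actual EDQ key generation or scoring logic.

```python
from difflib import SequenceMatcher


def cluster_keys(record):
    """Hypothetical key: first four alphanumeric characters of the name plus postal code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    postal = record["postal_code"].replace(" ", "")
    return [f"{name[:4]}|{postal}"]


def build_key_index(records):
    """Store the generated keys so that candidates can be looked up later."""
    index = {}
    for rec in records:
        for key in cluster_keys(rec):
            index.setdefault(key, []).append(rec)
    return index


def select_candidates(driver, index):
    """Select stored records that share at least one cluster key with the driver."""
    seen, candidates = set(), []
    for key in cluster_keys(driver):
        for rec in index.get(key, []):
            if rec["id"] != driver["id"] and rec["id"] not in seen:
                seen.add(rec["id"])
                candidates.append(rec)
    return candidates


def score(driver, candidate):
    """Stand-in scoring: name similarity on a 0 to 100 scale."""
    ratio = SequenceMatcher(
        None, driver["name"].lower(), candidate["name"].lower()
    ).ratio()
    return round(ratio * 100)


def match(driver, index):
    """Compare the driver with each candidate; candidates are never compared with each other."""
    scored = [(c["id"], score(driver, c)) for c in select_candidates(driver, index)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


stored = [
    {"id": 1, "name": "Acme Corporation", "postal_code": "94065"},
    {"id": 2, "name": "Acme Corp", "postal_code": "94065"},
    {"id": 3, "name": "Zenith Ltd", "postal_code": "10001"},
]
driver = {"id": 99, "name": "ACME Corporation", "postal_code": "94065"}

results = match(driver, build_key_index(stored))
```

Here only records 1 and 2 share a key with the driver, so record 3 is never scored. Narrowing the candidate set before scoring is the purpose of the stored keys.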

For more information about the EDQ matching process, see the Oracle Enterprise Data Quality Customer Data Services Pack Matching Guide at

http://docs.oracle.com/cd/E48549_01/doc.11117/e40737/toc.htm

Match Attributes

Match attributes define the attributes that are used for real-time and batch matching of the account and contact entities to identify duplicate entries. You use two types of attributes for matching:

  • Match Identifier: Specifies the EDQ attribute that you want to use for matching

  • Application Attributes: Specifies the application attribute that you want to use for matching

You can map the attributes in the application to the corresponding EDQ attributes to create an attribute mapping. For example, for the Name EDQ attribute, you can select Org.OrganizationName as the corresponding Organization attribute to create a mapping. You can define such attribute mappings for real-time matching, batch matching, or both.

When you map the attributes in the application with the corresponding EDQ attributes, you create a matching configuration setting for identifying duplicate entries. These settings are stored as matching keys in the application. Whenever you change the attribute mappings, you must regenerate matching key values for the new or updated accounts and contacts. You can regenerate matching key values using the Rebuild Keys option in the Edit Matching Configuration page.

Match Configuration Parameters

Matching configuration parameters are system-level parameters that control aspects of the data quality matching services.

The following parameters control matching operations that identify duplicate entries, such as accounts and contacts, within the database, between the database and sets of data such as import batches, or within sets of data, so that the duplicates can be resolved by merging or linking.

Score Threshold

  • Parameter Value: Between 0 and 100. Default Value: 90

  • Parameter Description: Specifies the minimum score that a matched record must have to be returned by the matching service. Records with scores equal to or greater than the threshold are considered matches, and records with scores less than the threshold are rejected.
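As a minimal sketch of the threshold rule, assuming scored matches are available as (record, score) pairs (a hypothetical representation, not the matching service's actual output format):

```python
def apply_score_threshold(scored_matches, score_threshold=90):
    """Keep matches whose score is equal to or greater than the threshold."""
    if not 0 <= score_threshold <= 100:
        raise ValueError("score_threshold must be between 0 and 100")
    return [m for m in scored_matches if m[1] >= score_threshold]


scored = [("A", 95), ("B", 90), ("C", 89)]
kept = apply_score_threshold(scored)  # default threshold of 90
```

With the default value of 90, the record that scores exactly 90 is kept and the record that scores 89 is rejected.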

Match Results Display Threshold

Note: This match configuration parameter is enabled only for real-time matching.

  • Parameter Value: Between 0 and 100. Default Value: 10

  • Parameter Description: Controls the number of matched records that are returned by the real-time matching.
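The effect of this parameter can be sketched as a cap on the result list. The sketch assumes that the highest-scoring matches are kept when the list is cut off. The description above doesn't state the ordering, so treat that as an assumption, along with the (record, score) pair representation.

```python
def limit_displayed_matches(scored_matches, display_threshold=10):
    """Return at most display_threshold matches, highest score first (assumed ordering)."""
    ranked = sorted(scored_matches, key=lambda m: m[1], reverse=True)
    return ranked[:display_threshold]


# Twelve hypothetical matches with scores 88 through 99.
scored = [(f"R{i}", 88 + i) for i in range(12)]
displayed = limit_displayed_matches(scored)  # default value of 10
```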

Preview Configuration

The Preview Configuration option lets you enter the following parameters to identify and view the duplicate matching records in real time without rebuilding the keys.

  • Cluster Key Level: Returns records based on the cluster key level. This parameter has three options:
    • Limited: helps to identify a unique record. Example: exact name, phone number, address, and postal code.
    • Typical: helps narrow down a record among many records. Example: address and city, name and postal code.
    • Exhaustive: helps loosely identify a record. Example: postal code.
  • Score Threshold: Returns records based on score threshold.

  • Maximum Candidates: Returns records based on maximum candidates.

  • Match Results Display Threshold: Returns records based on the match results display threshold value.

Review Configuration Results

The Review Configuration Results option lets you check whether the account or contact that you enter for matching on the Edit Matching Configuration page returns the expected matched account or contact after the keys are rebuilt. Alternatively, on the Review Configuration Results page, you can enter the attribute information for one or more of the following matching configuration parameters that you want to match:

  • Cluster Key Level: Returns records based on the cluster key level.
    • Limited: helps to identify a unique record. You must specify an option that will uniquely identify a record. Here are some examples:
      • exact name
      • phone number
      You must specify either the exact name or the phone number.
    • Typical: helps narrow down a record among many records. You must specify a combination of options that will narrow down a record. Note that a name alone isn't sufficient for matching. You must provide an additional value, such as an email, phone number, or address, to find matches. Here are some examples:
      • a combination of address and city
      • a combination of name and postal code
    • Exhaustive: helps loosely identify a record. For example, postal code.
  • Score Threshold: Returns records based on score threshold.

  • Maximum Candidates: Returns records based on maximum candidates.

  • Match Results Display Threshold: Returns records based on the match results display threshold value.

How You Manage Level of Indirection

You can control the level and number of indirect duplicates for a driver record using the user-defined profile option ORA_ZCQ_LEVEL_OF_INDIRECTION. This profile option lets you include indirect duplicates. For example, consider a duplicate set with possible drivers A and C and candidates B and D, as follows:

A - B

A - C

C - D

Here, the duplicate pair C-D is deleted because its winner, C, is already a matched record in the pair A-C. As a result, D is lost as a potential duplicate and would possibly be identified as a duplicate only in the next batch run. However, D is an indirect duplicate of A. If you set the value of the profile option ORA_ZCQ_LEVEL_OF_INDIRECTION to 1, D is considered a matched record in the first batch run itself. Therefore, the duplicate sets would be as follows:

A - B

A - C

A - D (because D is now an indirect duplicate of A).

To understand how the profile option ORA_ZCQ_LEVEL_OF_INDIRECTION controls the level of indirect duplicates, consider another example with the duplicate pairs A-B, A-C, C-D, D-E, and E-F. In this case, setting the profile option value to 1 means that only the first level of indirect duplicates, which is C-D, is considered part of A's duplicate set, causing the A-D pair to be formed. However, if you set the profile option value to 2, matching also extends to the second level of indirection. Therefore, both A-D and A-E are identified as duplicate pairs because of the A-C, C-D, and D-E indirection sequence.
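The example with the pairs A-B, A-C, C-D, D-E, and E-F can be reproduced with a short sketch. It illustrates the behavior described above and isn't the application's implementation. Two assumptions are made: each pair is read as (winner, matched record), and a level of 0 is taken to mean direct matches only. The function name `duplicate_set` is hypothetical.

```python
def duplicate_set(driver, pairs, level_of_indirection=0):
    """Collect the driver's direct matches plus indirect matches up to the given level."""
    # Map each winner to the records matched against it.
    matches_of = {}
    for winner, matched in pairs:
        matches_of.setdefault(winner, []).append(matched)

    found = []
    seen = {driver}
    frontier = [driver]
    # One pass for the direct matches, plus one pass per level of indirection.
    for _ in range(level_of_indirection + 1):
        next_frontier = []
        for record in frontier:
            for matched in matches_of.get(record, []):
                if matched not in seen:
                    seen.add(matched)
                    found.append(matched)
                    next_frontier.append(matched)
        frontier = next_frontier
    return found


pairs = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

level_0 = duplicate_set("A", pairs, 0)  # direct matches only
level_1 = duplicate_set("A", pairs, 1)  # adds D through A-C, C-D
level_2 = duplicate_set("A", pairs, 2)  # adds E through A-C, C-D, D-E
```

At level 2, F is still excluded because reaching it through E-F would require a third level of indirection.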

To control the level, and thereby the number, of indirect duplicates for a driver record using the profile option ORA_ZCQ_LEVEL_OF_INDIRECTION, perform the following steps:

  1. In the Setup and Maintenance work area, go to the Manage Administrator Profile Values task.

  2. On the Manage Administrator Profile Values page, search for and select the profile option.

  3. In the Profile Values section, click Add. A new row is added for you to specify the following conditions:

    • Profile Level: Specify the level at which the profile value is to be set. Select Site.

    • Profile Value: Select or enter the value, such as 1 or 2, depending on the required level of indirection.

  4. Click Save and Close.

    Note: Changes in the profile values take effect for a user on the next sign in.