High Volume Batch Deduplication

Oracle Customer Data Management Cloud Service deduplicates account or contact records in batch through duplicate identification and duplicate resolution.

Batch deduplication consists of the following two steps:

  • Duplicate Identification: This step includes the identification of duplicate records by submitting a Duplicate Identification Batch job.

    You can define and submit this job from the Duplicate Identification page.

  • Duplicate Resolution: This step includes the resolution of the duplicates, typically by merging each set of duplicate records.

    You can resolve the duplicates either automatically, by submitting the Duplicate Identification Batch job with the Create Merge Request option (called Automerge), or manually, by submitting records in bulk from the Duplicate Identification Batch results review page.

For more details on these steps and for configuration of Automerge, see Merge Requests, Implementing Customer Data Management.

Both of these jobs are data-intensive operations that can read or update millions of rows in various Oracle Applications Cloud tables. This document provides guidelines and best practices for planning the data sets and applying appropriate configurations to achieve optimal throughput for high volume deduplication in Oracle Customer Data Management Cloud Service. Because each customer's data set is unique, the time required to process a duplicate identification batch varies with the shape of the data.

Best Practices for High Volume Batch Deduplication

Customer Data Management merge is a data-intensive process that scans and updates a large number of tables in Oracle Applications Cloud to correctly merge two or more accounts or contacts.

This section describes how you can use the following profile options to optimize the merge process:

  • Scope of Merge Process (ORA_ZCH_MERGE_SCOPE): You can use this profile option to define the scope of the merge process.

  • Master Record Selection Method (ORA_ZCH_SETMASTER): You can use this profile option to specify the method for selecting the master record in a merge request.

  • Create Automerge with Review (ORA_ZCH_AUTOMERGE_REVIEW): You can use this profile option to select an appropriate processing option for Automerge.

  • Maximum Number of Concurrent Merge Jobs (ORA_ZCH_MERGE_MAX_REQUEST_LIMIT): You can use this profile option to specify the maximum number of merge jobs to be processed at a time. If you don't set a maximum limit, all merge jobs are submitted for concurrent processing.

You can set these profile options in the Setup and Maintenance work area using the following:

  • Offering: Customer Data Management

  • Functional Area: Customer Hub

  • Task: Manage Customer Hub Profile Options

How You Define the Scope of the Merge Process

When you merge two or more records, the application scans hundreds of transactional and reference tables across all modules in Oracle Applications Cloud, such as Core Customer Data Management, CRM, Financials, and Manufacturing. This can make merge a data-intensive and time-consuming process. However, you can use the Scope of Merge Process (ORA_ZCH_MERGE_SCOPE) profile option to define and limit the scope of the merge process in an implementation so that the application scans only the necessary business areas. This optimizes the memory usage and execution profile of the merge.

The following options are supported by the Scope of Merge Process profile option:

  • All Functional Areas (ALL): This is the default option and scans across all areas of Oracle Applications Cloud. Use this option for a global implementation running various modules of Oracle Applications Cloud, such as Core Customer Data Management, CRM, Financials, and Manufacturing.

  • All Customer Relationship Management Related Areas (CRM): This option limits the scope of the process to CRM entities such as Opportunities and Leads, core customer data, common entities such as Notes and Activities, and custom objects. Use this option for a CRM implementation that also uses Customer Data Management functionality.

  • Customer Data Management Specific Areas: This option limits the scope of the process to core customer data, common entities such as Notes and Activities, and custom objects. Use this option during initial customer data consolidation and to achieve the best performance for Customer Data Management implementations.

Note: You can change this profile option at any time, for example when additional modules are turned on in the instance. The Customer Data Management option might be used during initial consolidation and cleanup of customer data, and then changed to the CRM or ALL option if other modules are implemented later.
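The three scope options are nested: each wider option scans everything the narrower one does, plus more. The following minimal sketch models that relationship using only the area descriptions above. It isn't an Oracle API; the key "CDM" is a hypothetical label for the Customer Data Management Specific Areas option (its lookup code isn't given here), and ALL lists only two example modules out of all areas of Oracle Applications Cloud.

```python
# Hypothetical model of the areas a merge scans for each
# Scope of Merge Process (ORA_ZCH_MERGE_SCOPE) value, based only on the
# option descriptions. "CDM" is an illustrative key, not a documented code.
CDM_AREAS = frozenset({"Core Customer Data", "Common Entities", "Custom Objects"})
CRM_AREAS = CDM_AREAS | {"CRM Entities"}
# ALL covers every area of Oracle Applications Cloud; only two of the
# additional modules are listed here as examples.
ALL_AREAS = CRM_AREAS | {"Financials", "Manufacturing"}

SCOPES = {"CDM": CDM_AREAS, "CRM": CRM_AREAS, "ALL": ALL_AREAS}


def areas_scanned(scope="ALL"):
    """Return the illustrative set of areas scanned for a scope value (default ALL)."""
    try:
        return SCOPES[scope]
    except KeyError:
        raise ValueError(f"unknown merge scope: {scope!r}") from None
```

For example, areas_scanned("CRM") excludes "Financials", which is why a CRM-only implementation can avoid scanning tables that belong to modules it doesn't use.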

How You Define the Master Record Selection Method

The performance of the merge process also depends on the method used to select the master record. You can use the Master Record Selection Method (ORA_ZCH_SETMASTER) profile option to specify an appropriate option for selecting the master party automatically during merge. The following options are supported by the Master Record Selection Method profile option:

  • Select master record using survivorship rule (RULE): This is the default master selection option. This option selects the master record based on the Set Master rules defined in the Manage Survivorship task. These rules are applied using the Oracle Business Rules component. Use this option when complex business rules are required to pick the master.

  • Select the oldest record as master (OLDEST): This option selects the party with the earliest creation date as the master.

  • Select the newest record as master (NEWEST): This option selects the party with the latest creation date as the master.

  • Select master based on duplicate identification results (ANY): This option randomly selects one of the parties in the set as the master.
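The OLDEST, NEWEST, and ANY options are simple selections that need no rule evaluation, which is why they can be cheaper than RULE. The following minimal sketch shows those three selections over a duplicate set. The Party class and select_master function are hypothetical illustrations, not Oracle APIs, and RULE isn't modeled because it depends on survivorship rules evaluated by Oracle Business Rules.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Party:
    """Hypothetical stand-in for a party in a duplicate set."""
    party_id: int
    creation_date: date


def select_master(parties, method, rng=random):
    """Pick a master party using one of the Master Record Selection Method values."""
    if not parties:
        raise ValueError("a duplicate set needs at least one party")
    if method == "OLDEST":
        # Earliest creation date wins (first one found on a tie).
        return min(parties, key=lambda p: p.creation_date)
    if method == "NEWEST":
        # Latest creation date wins (first one found on a tie).
        return max(parties, key=lambda p: p.creation_date)
    if method == "ANY":
        # Any party in the set can become the master.
        return rng.choice(parties)
    if method == "RULE":
        raise NotImplementedError("RULE uses survivorship rules; not modeled here")
    raise ValueError(f"unknown selection method: {method!r}")
```

With a set whose parties were created in 2018, 2020, and 2023, OLDEST returns the 2018 party and NEWEST returns the 2023 party.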

How You Configure the Automerge Action

Automerge is the process of automatically merging identified duplicate sets that exceed the automerge threshold. The process is initiated by creating a duplicate identification batch with the Create Merge Request option. You can use the Create Automerge with Review (ORA_ZCH_AUTOMERGE_REVIEW) profile option that has Yes and No values to select an appropriate processing option for Automerge:

  • Create merge requests only for duplicate sets exceeding the automerge threshold: To enable this processing option, select No as the value for the Create Automerge with Review (ORA_ZCH_AUTOMERGE_REVIEW) profile option. If you select this option, the application processes duplicate sets as follows:

    • The application preprocesses the duplicate sets exceeding the automerge threshold and merges them in a single job. This option is ideal for processing high volumes of merge requests when the duplicate sets require no review or further action.

    • Duplicate sets not exceeding the automerge threshold remain in Not Reviewed status in the Duplicate Identification page, from where they can be manually converted to merge requests, or rejected, if needed.

  • Create merge requests for all duplicate sets: To enable this processing option, select Yes as the value for the Create Automerge with Review (ORA_ZCH_AUTOMERGE_REVIEW) profile option. If you select this option, merge requests are created for all duplicate sets. All requests are first preprocessed. Each request is then either merged, if it exceeds the automerge threshold, or put in New status for review, if it doesn't.
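The two processing options differ only in what happens to duplicate sets below the threshold. The following minimal sketch models that decision. It's a hypothetical illustration, not Oracle code: DuplicateSet, its score field, and plan_automerge are invented names, and "exceeding the threshold" is assumed to mean strictly greater than it.

```python
from dataclasses import dataclass


@dataclass
class DuplicateSet:
    set_id: str
    score: float  # hypothetical match score compared against the automerge threshold


def plan_automerge(sets, threshold, automerge_review):
    """Return (merge request set IDs, {set_id: outcome}).

    automerge_review mirrors ORA_ZCH_AUTOMERGE_REVIEW: True for Yes, False for No.
    """
    merge_requests = []
    outcomes = {}
    for s in sets:
        exceeds = s.score > threshold  # assumption: "exceeding" means strictly greater
        # With No, only sets above the threshold get merge requests;
        # with Yes, every set gets one.
        if exceeds or automerge_review:
            merge_requests.append(s.set_id)
        if exceeds:
            outcomes[s.set_id] = "merged"
        elif automerge_review:
            outcomes[s.set_id] = "New"           # merge request awaiting review
        else:
            outcomes[s.set_id] = "Not Reviewed"  # no merge request created
    return merge_requests, outcomes
```

For example, with sets scoring 95 and 70 against a threshold of 90, selecting No creates one merge request and leaves the second set Not Reviewed, while selecting Yes creates two requests and puts the second in New status.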

How You Control the Concurrency of Merge Processes

Each merge request executes as a single batch process in Oracle Enterprise Scheduler Service (ESS). The number of merge requests that can run concurrently is therefore limited by the number of batch processes ESS can run at the same time. If other ESS processes are competing for threads while a large number of merge requests are queued, the merge jobs can be delayed.

During initial consolidation of customer data, it's advantageous to use the maximum available threads. However, in steady state when there are other processes running in the background, it may be necessary to limit and control the number of concurrent merge ESS jobs.

To achieve this, set the following profile option to an appropriate value:

  • Profile Option Name: Maximum Number of Concurrent Merge Jobs

  • Profile Option Code: ORA_ZCH_MERGE_MAX_REQUEST_LIMIT

    • When the profile option value is left blank, ESS allocates merge requests according to the available threads. This is recommended during initial high volume data processing.

    • After initial data load, set the profile option value to ten or lower if other processes such as Web services or other ESS jobs are running.
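The effect of the profile option can be pictured as a cap on a worker pool. The following minimal sketch uses Python's ThreadPoolExecutor as a stand-in for ESS threads. run_merge_jobs and its parameters are hypothetical, the sleep stands in for real merge work, and this doesn't reflect how ESS schedules jobs internally. A value of None plays the role of a blank profile value (use every available thread), and a number caps concurrency at that value.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_merge_jobs(job_ids, available_threads, max_concurrent=None):
    """Run hypothetical merge jobs; return the peak number running at once."""
    # Blank profile value: use every available thread.
    # Otherwise: cap at the profile value, never above the available threads.
    if max_concurrent is None:
        workers = available_threads
    else:
        workers = max(1, min(max_concurrent, available_threads))

    lock = threading.Lock()
    running = 0
    peak = 0

    def merge(job_id):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for the actual merge work
        with lock:
            running -= 1
        return job_id

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(merge, job_ids))
    return peak
```

Capping the pool leaves the remaining threads free for Web services and other ESS jobs, at the cost of a longer total run time for a large merge backlog.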