10 Identify Duplicates

Identify Duplicate Customer Information

You can identify potential duplicates in your database by creating and running duplicate identification batches. These batches, when complete, contain sets of two or more potential duplicate records, which are identified by the application for your review. You can analyze each of these duplicate sets and determine whether you want to resolve them by creating a merge or link resolution request.

While managing duplicate sets you can:

  • Use the Test Merge duplicate resolution request type to test your merge configurations, such as survivorship rules and agreement rules, without any impact on your application data.

  • Resolve duplicates within each identified duplicate set through a merge or link request. A merge request combines duplicate records and a link request joins duplicate records. You can also create a generic request to select the resolution later.

  • Remove records that you don't want to include in the duplicate resolution request. You can also restore the previously removed records back to the set from the removed records table.

  • Mark a pair of records as nonduplicate, so that these records aren't identified as potential matches. You can remove the pair from the nonduplicate list by changing the end date.

  • Change the master record.

  • Submit the duplicate sets as merge or link requests for resolution.

You can create a duplicate identification batch using the following options:

  • Create the duplicate identification batch from scratch for a single or periodic run.

  • Create the duplicate identification batch from a copy of an existing duplicate identification batch.

You can choose between the following two batch match modes while creating a duplicate identification batch:

  • Within the batch: Runs the duplicate identification process within the batch.

  • Against the registry: Runs the duplicate identification process against the registry.

Batch Creation for Single or Periodic Run

You can create duplicate identification batches for a single or periodic run. You are likely to run a batch periodically for an ongoing or a repetitive task, such as the execution of registry duplicate identification.

You can schedule the periodic batch to run at a specific time or at a specific interval. For example, you can schedule the batch to run daily, every night at 9 PM, weekly, or every Sunday at 10 PM.

Create New or Use a Copy of an Existing Duplicate Identification Batch

You can use a copy of an existing duplicate identification batch to quickly create a new one. You can edit the batch details, such as the subset rules (batch selection criteria rules), duplicate identification rules, and schedule of the batch process, before you submit the batch.

However, if your duplicate identification requirements are unique, create a new batch instead of copying an existing one.

Identify Duplicates within the Batch

You can look for duplicates in a batch by using the Within the Batch Match mode. In this mode, the application includes records in a batch based on configured subset rule conditions or the batch selection criteria rules and checks for duplicates only among the records within that batch.

Identify Duplicates Against the Registry

You can identify duplicates across the database by using the Against the Registry Batch Match mode. In this mode, the records that meet the subset rule conditions are included in the duplicate identification batch. The application matches these records against one another as well as against other records in the database.
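The difference between the two batch match modes can be illustrated with a small Python sketch. This is not the application's matching engine: the function name candidate_pairs, the mode constants, and the integer record IDs are assumptions made purely for illustration. The sketch only shows which record pairs are candidates for comparison in each mode.

```python
from itertools import combinations


def candidate_pairs(subset_ids, registry_ids, mode):
    """Return the record pairs that would be compared in the given mode.

    subset_ids   -- records selected into the batch by the subset rules
    registry_ids -- all records in the database (hypothetical IDs)
    mode         -- "WITHIN_BATCH" or "AGAINST_REGISTRY" (illustrative names)
    """
    subset = sorted(set(subset_ids))
    # In both modes, batch records are compared with one another.
    pairs = set(combinations(subset, 2))
    if mode == "WITHIN_BATCH":
        return pairs
    if mode == "AGAINST_REGISTRY":
        # Batch records are also compared with every other registry record.
        others = sorted(set(registry_ids) - set(subset))
        pairs.update((s, o) for s in subset for o in others)
        return pairs
    raise ValueError(f"unknown mode: {mode}")


registry = [1, 2, 3, 4]
batch = [1, 2]
within = candidate_pairs(batch, registry, "WITHIN_BATCH")
against = candidate_pairs(batch, registry, "AGAINST_REGISTRY")
```

In this sketch, records 3 and 4 are never compared with each other in either mode, because neither of them was selected into the batch.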

You define subset rules as a part of creating a duplicate identification batch. These rules, also known as batch selection criteria rules, retrieve a subset of records from the database in the duplicate identification batch for matching them against one another as well as against other records in the database. You can define these rules on the Create Duplicate Identification Batch page using the available objects, attributes, and operators, followed by an AND or OR condition.

If the attributes available out of the box on this page don't meet your business requirements, you can add standard or custom account (organization) or contact (person) attributes.

To add new attributes to the batch selection criteria, you must add the required attributes to the lookup types ZCH_MATCH_OBJ_PERSON and ZCH_MATCH_OBJ_ORGANIZATION, as lookup codes. Before you update the lookup types, you must identify the display name and API (internal) name for each attribute that you want to add.

Identify Display Name and API Name for an Attribute

You must identify the display name and the API (internal) name of the attributes that you want to add as subset rule selection criteria.

You can find the list of standard attributes for accounts and contacts, along with their lookup code, display name (meaning), and API name (tag), in the topic "List of Additional Attributes for Subset Rule". Use the following steps to identify the display name and the API name of the custom attributes of an account (organization) or contact (person) using Application Composer.

Note: You can view and update objects in Application Composer only if you're in an active sandbox.

  1. Sign in as a setup user, such as Master Data Management Application Administrator, and navigate to Application Composer.

  2. Expand the Standard Objects list and navigate to Account or Contact.

  3. Click Fields under Account or Contact. The Fields page appears.

  4. Select the Custom tab to see the list of available attributes.

  5. Click the Display Label link of the attribute that you want to add. The Edit Custom Field page appears.

  6. Copy the Display Name and the API Name.

Update Attribute Lookups with New Attributes

You can add new attributes to the subset rule selection criteria by adding them as lookup codes to the attribute lookup type of the relevant object: ZCH_MATCH_OBJ_PERSON for the person (contact) object, or ZCH_MATCH_OBJ_ORGANIZATION for the organization (account) object.

You can follow these steps to add the required attributes to the attribute lookup types as lookup codes:

  1. Sign in as a setup user, such as Master Data Management Application Administrator. In the Setup and Maintenance work area, go to the following:

    • Offering: Sales

    • Functional Area: Sales Foundation

    • Task: Manage Standard Lookups

  2. Search for and navigate to the task Manage Standard Lookups.

  3. Enter the required Lookup Type.

    • Use the lookup type ZCH_MATCH_OBJ_ORGANIZATION to add new custom or standard attributes to the subset rule selection criteria of the organization (account) object.

    • Use the lookup type ZCH_MATCH_OBJ_PERSON to add new attributes to the subset rule selection criteria of the person (contact) object.

  4. Click Search.

  5. Click New on the Lookup Codes section of the Manage Standard Lookups page.

  6. Add a Lookup Code for each attribute.

    • Lookup Code: An internal name for the attribute. This field is case-sensitive. The value should be in all capital letters, with spaces replaced by underscores. For example, CUSTOM_ATTR can be a lookup code for a custom attribute.

      Note: For date fields, you must suffix the lookup code with _DATE. This enables the application to identify the attribute value as a date and display the date picker.

    • Display Sequence: The order in which the lookup code must be displayed. This isn't a mandatory field.

    • Meaning: The display name of the attribute. This field is mandatory and case-sensitive.

    • Description: The description of the attribute that you see in the Duplicate Identification work area.

    • Tag: The API (Internal) Name of the attribute. This field is mandatory and case-sensitive.

  7. Click Save and Close.

To hide or remove any attribute, you must delete the lookup code.

Note: You must publish the sandbox to activate the newly added attributes. The newly added attributes appear in the Subset Rules for Identifying Duplicates section of the Create Duplicate Identification Batch page only after you publish the sandbox. If the sandbox isn't published, the tag value with the attribute name is considered invalid.
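The lookup code conventions described in step 6 (all capital letters, underscores instead of spaces, and a _DATE suffix for date fields) can be sketched as a small helper. The functions to_lookup_code and lookup_type_for, and the sample attribute names passed to them, are hypothetical and aren't part of the application. The helper only suggests a code for a custom attribute. It doesn't reproduce the codes of the standard attributes, which don't always derive from their display names.

```python
import re

# Lookup types named in this topic, keyed by party type.
LOOKUP_TYPES = {
    "PERSON": "ZCH_MATCH_OBJ_PERSON",
    "ORGANIZATION": "ZCH_MATCH_OBJ_ORGANIZATION",
}


def to_lookup_code(display_name: str, is_date: bool = False) -> str:
    """Suggest a lookup code for a custom attribute (illustrative only)."""
    # Replace each run of whitespace with one underscore, then uppercase.
    code = re.sub(r"\s+", "_", display_name.strip()).upper()
    # Date fields need the _DATE suffix so that the date picker is shown.
    if is_date and not code.endswith("_DATE"):
        code += "_DATE"
    return code


def lookup_type_for(party_type: str) -> str:
    """Return the lookup type to update for a person or an organization."""
    return LOOKUP_TYPES[party_type.upper()]


custom_code = to_lookup_code("Custom Attr")
renewal_code = to_lookup_code("Renewal", is_date=True)
contact_code = to_lookup_code("Last Contact Date", is_date=True)
```

The suffix check avoids producing a doubled suffix such as LAST_CONTACT_DATE_DATE when the display name already ends in Date.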

List of Additional Attributes for Subset Rule

You can find a detailed description of the standard account and contact attributes available for use in Subset Rule Selection Criteria in this topic.

Note: For date-related attributes, you must suffix the lookup code with _DATE.

Standard Attributes for Accounts

This table lists the standard attributes for account (organization) along with their lookup code, display name (meaning), and API name (tag) that you can add as subset rule selection criteria for identifying duplicates:

Lookup Code | Meaning (Attribute Display Name) | Tag (API Name)
ANALYSIS_FY | Analysis Year | AnalysisFy
BANK_CODE | Bank Code | BankCode
BANK_OR_BRANCH_NUMBER | Bank or Branch Number | BankOrBranchNumber
BRANCH_CODE | Branch Code | BranchCode
BRANCH_FLAG | Branch Indicator | BranchFlag
BUSINESS_SCOPE | Business Scope | BusinessScope
CEO_NAME | Chief Executive Name | CeoName
CEO_TITLE | Chief Executive Title | CeoTitle
CERTIFICATION_LEVEL | Certification Level | CertificationLevel
CERTREASON_CODE | Certification Reason | CertReasonCode
CLEANLINESS_SCORE | Cleanliness | CleanlinessScore
COMMENTS | Comments | Comments
COMPLETENESS_SCORE | Completeness | CompletenessScore
CONG_DIST_CODE | Congressional District | CongDistCode
CONTROL_YR | Organization Control Year | ControlYr
CORPORATION_CLASS | Corporation Class | CorporationClass
CREATION_DATE | Creation Date | CreationDate
CURRENT_FY_POTENTIAL_REV | Current Fiscal Year's Potential Revenue | CurrFyPotentialRevenue
DATA_CLOUD_STATUS | Enrichment Status | DataCloudStatus
DATA_CONFIDENCE_SCORE | Data Confidence | DataConfidenceScore
DB_RATING | D&B Credit Rating | DbRating
DIS_ADV_IND | Disadvantaged Indicator | DisadvantagedIndicator
DOM_ULTIMATE_DUNSNUM_C | Domestic D-U-N-S number | DomesticUltimateDunsNumC
DO_NOT_CONFUSE_WITH | Do Not Confuse With | DoNotConfuseWith
DUNS_NUMBER_C | D-U-N-S number | DunsNumberC
DUPLICATE_INDICATOR | Duplicate Type | DuplicateIndicator
DUPLICATE_SCORE | Duplication | DuplicateScore
EMPS_AT_PRIMARY_ADDRESS | Number of Employees at Identifying Address | EmpAtPrimaryAdr
EMPS_AT_PRIMARY_ADDR_EST | Number of Employees at Identifying Address Estimated Qualifier | EmpAtPrimaryAdrEstInd
EMPS_AT_PRIMARY_ADDR_MIN | Number of Employees at Identifying Address Minimum Qualifier | EmpAtPrimaryAdrMinInd
EMPLOYEES_TOTAL | Number of Employees | EmployeesTotal
ENQUIRY_DUNS | ENQUIRY_DUNS | EnquiryDuns
ENRICHMENT_SCORE | Enrichment | EnrichmentScore
EXPORT_IND | Exporter Indicator | ExportInd
FISCAL_YEAREND_MONTH | Fiscal Year End Month | FiscalYearendMonth
GLOBAL_ULTIMATE_DUNSNUM_C | Global Ultimate D-U-N-S Number | GlobalUltimateDunsNumC
GROWTH_STRATEGY_DESC | Growth Strategy Description | GrowthStrategyDesc
GSA_INDICATOR_FLAG | GSA Indicator | GsaIndicatorFlag
HOME_COUNTRY | Home Country | HomeCountry
HQ_BRANCH_INDICATOR | HQ branch indicator | HqBranchInd
IMPORT_IND | Importer Indicator | ImportInd
INCORP_YEAR | Year Incorporated | IncorpYear
INTERNAL_FLAG | Internal | InternalFlag
JGZZ_FISCAL_CODE | Taxpayer identification number | JgzzFiscalCode
LABOR_SURPLUS_IND | Labor Surplus Indicator | LaborSurplusInd
LAST_ASSIGNMENT_DATE | Last Assigned Date | LastAssignedDate
LAST_ENRICHMENT_DATE | Last Enrichment Date | LastEnrichmentDate
LAST_SCORE_UPDATE_DATE | Last Score Date | LastScoreUpdateDate
LAST_SOURCE_UPDATE_DATE | Last Source Update Date | LastSourceUpdateDate
LAST_UPDATE_DATE | Last modified date | LastUpdateDate
LAST_UPDATED_BY | Last modified by | LastUpdatedBy
LAST_UPDATE_SOURCE_SYSTEM | Last Update Source System | LastUpdateSourceSystem
LEGAL_STATUS | Legal Status | LegalStatus
LINE_OF_BUSINESS | Line of Business | LineOfBusiness
LOCAL_ACTIVITY_CODE | Local Activity Code | LocalActivityCode
LOCAL_ACTIVITY_CODE_TYPE | Local Activity Code Type | LocalActivityCodeType
LOCAL_BUS_IDENTIFIER | Common Business Identifier | LocalBusIdentifier
LOCAL_BUS_IDEN_TYPE | Common Business Identifier Type | LocalBusIdenType
MINORITY_OWNED_IND | Minority-Owned Indicator | MinorityOwnedInd
MINORITY_OWNED_TYPE | Type of Minority-Owned Organization | MinorityOwnedType
MISSION_STATEMENT | Mission Statement | MissionStatement
NAMED_FLAG | Named Account | NamedFlag
NEXT_FY_POTENTIAL_REVENUE | Next Fiscal Year's Potential Revenue | NextFyPotentialRevenue
OOB_IND | Out of Business Indicator | OobInd
ORGANIZATION_NAME | Name | OrganizationName
OWNER_PARTY_ID | Owner ID | OwnerPartyId
PARENT_DUNS_NUMBER_C | Parent D-U-N-S Number | ParentDunsNumC
PARENT_SUB_IND | Subsidiary Indicator | ParentSubInd
PARTY_ID | Party Id | PartyId
PARTY_NUMBER | Registry ID | PartyNumber
PREF_CONTACT_METHOD | Preferred Contact Method | PreferredContactMethod
PREF_CONTACT_PERSON_ID | Preferred Contact Person Id | PreferredContactPersonId
PREF_FUNCTIONAL_CUR | Preferred Functional Currency | PrefFunctionalCurrency
PRINCIPAL_NAME | Principal Name | PrincipalName
PRINCIPAL_TITLE | Principal Title | PrincipalTitle
PUBLIC_PRIVATE_OWNER_FLAG | Private Ownership | PublicPrivateOwnershipFlag
RECENCY_SCORE | Recency | RecencyScore
REGISTRATION_TYPE | Registration Type | RegistrationType
RENT_OWN_IND | Rent or Own Indicator | RentOwnInd
SALES_PROFILE_STATUS | Sales Profile Status | SalesProfileStatus
SALES_PROFILE_TYPE | Type | SalesProfileType
SMALL_BUS_IND | Small Business Indicator | SmallBusInd
STOCK_SYMBOL | Stock Symbol | StockSymbol
TOTAL_EMP_EST_IND | Number of Employees Estimated Qualifier | TotalEmpEstInd
TOTAL_EMPLOYEES_IND | Number of Employees Includes Subsidiaries | TotalEmployeesInd
TOTAL_EMP_MIN_IND | Number of Employees Minimum Qualifier | TotalEmpMinInd
TOTAL_EMP_TEXT | Total Number of Employees | TotalEmployeesText
TOTAL_PAYMENT_AMOUNT | Total Payments | TotalPayments
UNIQUE_NAME_ALIAS | Organization Name | UniqueNameAlias
UNIQUE_NAME_SUFFIX | Name Suffix | UniqueNameSuffix
VALIDITY_SCORE | Validity | ValidityScore
WOMAN_OWNED_IND | Woman-Owned Indicator | WomanOwnedInd
YEAR_ESTABLISHED | Year Established | YearEstablished

Standard Attributes for Contact

This table lists the standard attributes for contact (person) along with their lookup code, display name (meaning), and API name (tag) that you can add as subset rule selection criteria for identifying duplicates:

Lookup Code | Meaning (Attribute Display Name) | Tag (API Name)
CERTIFICATION_LEVEL | Certification Level | CertificationLevel
CERT_REASON_CODE | Certification Reason | CertReasonCode
CLEANLINESS_SCORE | Cleanliness | CleanlinessScore
COMMENTS | Comments | Comments
COMPLETENESS_SCORE | Completeness | CompletenessScore
CREATED_BY | Created by | CreatedBy
CREATION_DATE | Creation date | CreationDate
DATA_CLOUD_STATUS | Enrichment Status | DataCloudStatus
DATA_CONFIDENCE_SCORE | Data Confidence | DataConfidenceScore
DATE_OF_BIRTH_DATE | Date of Birth | DateOfBirth
DATE_OF_DEATH_DATE | Date of Death | DateOfDeath
DECEASED_FLAG | Person Deceased | DeceasedFlag
DECLARED_ETHNICITY | Declared Ethnicity | DeclaredEthnicity
DEPARTMENT | Department | Department
DEPARTMENT_CODE | Department Code | DepartmentCode
DO_NOT_CALL_FLAG | Do not call | DoNotCallFlag
DO_NOT_CONTACT_FLAG | Do not contact | DoNotContactFlag
DO_NOT_EMAIL_FLAG | Do not e-mail | DoNotEmailFlag
DO_NOT_MAIL_FLAG | Do not mail | DoNotMailFlag
DUPLICATE_INDICATOR | Duplicate Type | DuplicateIndicator
DUPLICATE_SCORE | Duplication | DuplicateScore
ENRICHMENT_SCORE | Enrichment | EnrichmentScore
GENDER | Gender | Gender
INTERNAL_FLAG | Internal | InternalFlag
JGZZFISCAL_CODE | Taxpayer identification number | JgzzFiscalCode
JOB_TITLE | Job Title | JobTitle
JOB_TITLE_CODE | Job Title Code | JobTitleCode
LAST_ASSIGNMENT_DATE | Last Assigned Date | LastAssignedDate
LAST_CONTACT_DATE | Last Contact Date | LastContactDate
LAST_ENRICHMENT_DATE | Last Enrichment Date | LastEnrichmentDate
LAST_KNOWN_GPS | Last Known Location | LastKnownGPS
LAST_SCORE_UPDATE_DATE | Last Score Date | LastScoreUpdateDate
LAST_SOURCE_UPDATE_DATE | Last Source Update Date | LastSourceUpdateDate
LAST_UPDATE_DATE | Last modified date | LastUpdateDate
LAST_UPDATED_BY | Last modified by | LastUpdatedBy
LAST_UPDATE_SOURCE_SYS | LastUpdateSourceSystem | LastUpdateSourceSystem
MARITAL_STATUS | Marital Status | MaritalStatus
MARITAL_STATUS_EFF_DATE | Marital Status Effective Date | MaritalStatusEffectiveDate
NAMED_FLAG | Named Contact | NamedFlag
OWNER_PARTY_ID | Owner ID | OwnerPartyId
PARTY_ID | PartyId | PartyId
PARTY_NUMBER | Registry ID | PartyNumber
PERSON_ACADEMIC_TITLE | Academic Title | PersonAcademicTitle
PERSONAL_INCOME | Annual Income | PersonalIncome
PERSON_FIRST_NAME | First name | PersonFirstName
PERSON_INITIALS | Initials | PersonInitials
PERSON_LAST_NAME | Last name | PersonLastName
PERSON_LAST_NAME_PREFIX | Last Name Prefix | PersonLastNamePrefix
PERSON_MIDDLE_NAME | Middle Name | PersonMiddleName
PERSON_NAME_SUFFIX | Suffix | PersonNameSuffix
PERSON_PRENAME_ADJUNCT | Prefix | PersonPreNameAdjunct
PERSON_PREV_LASTNAME | Previous Last Name | PersonPreviousLastName
PERSON_SECOND_LASTNAME | Second Last Name | PersonSecondLastName
PERSON_TITLE | Title | PersonTitle
PLACE_OF_BIRTH | Place of Birth | PlaceOfBirth
PREF_CONTACT_METHOD | Preferred Contact Method | PreferredContactMethod
PREF_FUNCTIONAL_CUR | Preferred Functional Currency | PrefFunctionalCurrency
RECENCY_SCORE | Recency | RecencyScore
RENT_OWN_IND | Rent or Own Indicator | RentOwnInd
SALES_AFFINITY_CODE | Affinity | SalesAffinityCode
SALES_BUYING_ROLE_CODE | Buying Role | SalesBuyingRoleCode
SALES_PROFILE_STATUS | Sales Profile Status | SalesProfileStatus
SALES_PROFILE_TYPE | Type | SalesProfileType
SALUTATION | Salutation | Salutation
UNIQUE_NAME_SUFFIX | Name Suffix | UniqueNameSuffix
VALIDITY_SCORE | Validity | ValidityScore

You can create a duplicate identification batch and define subset rules to retrieve a subset of the records to identify duplicates within the batch or in the database.

Subset rules, also known as batch selection criteria rules, specify the criteria for retrieving a subset of records in the duplicate identification batch. The data quality engine identifies potential duplicates from this subset of records based on one of these rules:

  • Match all keywords: Select this option to perform an AND operation.

  • Match any keyword: Select this option to perform an OR operation.

Now that you have an overview of the task, let's first create a duplicate identification batch to identify duplicate persons in the registry, and then create a rule to retrieve a subset of records where the person name starts with John or the address contains Redwood.

  1. Navigate to the Duplicate Identification work area as follows: Navigator > Customer Data Management > Duplicate Identification.

  2. Click the Create menu option or button. The Create Duplicate Identification Batch page appears.

  3. Enter a batch name and description.

    Note: Alternatively, you can copy an existing duplicate identification batch to quickly create a new batch. You can modify the details for this batch before submitting it.

  4. Specify the Batch Match Mode, such as Against the Registry or Within the Batch.

    In the Within the Batch Match mode, the duplicate identification is limited to the records in a batch that meet the subset rule conditions. In the Against the Registry Batch Match mode, the process aggregates the records that meet the subset rule conditions in a batch, and these records are matched against one another as well as against other records in the database.

  5. Specify the Party Type as Person.

  6. Specify the Automatic Processing option as Create Merge Request to merge the duplicate persons.

  7. Provide the Batch Options. The available batch options depend on the selected Automatic Processing Option. The following options are available when Create Merge Request is selected as the Automatic Processing Option:

    • Select an appropriate value for Cluster Key Level such as Typical.

    • Enter a value between 1 and 101, such as 70 for Match Threshold.

    • Enter a value between 1 and 101, such as 75 for Automerge Threshold.

      Note: The Automerge Threshold and Autolink Threshold values that you provide in the Batch Options area override the values set on the Manage Customer Hub Profile Options page.

    • Select Send Notifications to notify all interested parties, such as the initiator or submitter, of the status of the duplicate identification batch. For more information about these statuses, see How You Merge Duplicate Records in the Related Topics section. The default value for the Send Notifications field is set using the Merge Request Notifications option in the Manage Customer Data Management Options Setup and Maintenance task. You can't set the value of the Send Notifications field if the Merge Request Notifications option is set to disable all notifications. For more information about Merge Request Notifications, see the Duplicate Resolution Simplified Profile Options topic in the Related Topics section.

  8. Click the Add menu option or button under Duplicate Identification Batch: Selection Criteria.

  9. Specify the Apply Rules option as Match any keyword.

  10. Enter the following sample information in the Duplicate Identification Batch: Selection Criteria table:

    Object | Attribute | Operator | Value
    Person | Name | Starts with | John
    Address | Address Line 1 | Contains | Redwood

  11. Click Save and Close or Schedule per your requirement.
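The sample selection criteria from step 10, combined with the Apply Rules option from step 9, can be sketched in Python as follows. This is only an illustration of the AND and OR semantics of the two Apply Rules options, not the data quality engine itself. The matches function, the case-insensitive comparison, and the three sample records are assumptions invented for this example.

```python
# Operators used in the sample criteria. Case-insensitive comparison is an
# assumption of this sketch, not documented behavior.
OPERATORS = {
    "Starts with": lambda actual, expected: actual.lower().startswith(expected.lower()),
    "Contains": lambda actual, expected: expected.lower() in actual.lower(),
}


def matches(record, criteria, apply_rules):
    """Check one record against (attribute, operator, value) criteria."""
    results = [
        OPERATORS[operator](record.get(attribute, ""), value)
        for attribute, operator, value in criteria
    ]
    if apply_rules == "Match all keywords":
        return all(results)  # AND operation
    if apply_rules == "Match any keyword":
        return any(results)  # OR operation
    raise ValueError(f"unknown Apply Rules option: {apply_rules}")


# Criteria from the sample table in step 10.
criteria = [
    ("Name", "Starts with", "John"),
    ("Address Line 1", "Contains", "Redwood"),
]

# Hypothetical records, invented for illustration.
records = [
    {"Name": "John Smith", "Address Line 1": "12 Main Street"},
    {"Name": "Mary Jones", "Address Line 1": "500 Redwood Avenue"},
    {"Name": "Ann Lee", "Address Line 1": "7 Oak Road"},
]

selected_any = [r["Name"] for r in records if matches(r, criteria, "Match any keyword")]
selected_all = [r["Name"] for r in records if matches(r, criteria, "Match all keywords")]
```

With these invented records, Match any keyword selects John Smith and Mary Jones, while Match all keywords selects no record, because none of them satisfies both conditions.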

After you have identified potential duplicates in your database through a duplicate identification batch, you can resolve these duplicate sets by creating and submitting a duplicate resolution request.

To create and submit a resolution request:

  1. Navigate to the Duplicate Identification work area as follows: Navigator > Customer Data Management > Duplicate Identification.

  2. Click the Batch ID of the batch for which you want to review and resolve duplicate sets. The application displays a list of sets containing duplicate records.

  3. Click on the Duplicate ID of the set that you want to review for resolution.

  4. Review the duplicate records in the selected set and determine whether you want to merge or link these records.

  5. Click Create Request and specify an appropriate duplicate resolution request type, such as Merge (to combine duplicate records), Link (to join duplicate records), Test Merge (to test the merge configuration without impacting the application data) or Generic (to select the resolution later). Once the request is submitted, the application generates a Request ID, which you can use to track the status of the duplicate resolution process.

    Note: The master record designation within the duplicate set is a preliminary value. The final determination of the master record occurs within the duplicate resolution process, based on the duplicate resolution Master Record Selection Setting option.

View Nonduplicate Mapping

You can mark a pair of records as nonduplicate, so that this pair isn't identified as a potential match. You can view such nonduplicate records on the Manage Nonduplicate Records page using these steps:

  1. Navigate to the Duplicate Identification work area.

  2. Click Tasks.

  3. Click Manage Nonduplicate Records. The Manage Nonduplicate Records page appears.

  4. Search for the nonduplicate records.

You can modify the period for which a record is a nonduplicate by changing the end date. You can also delete the nonduplicate status of a record by changing the end date.
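The role of the end date can be pictured with a short Python sketch. It assumes that a nonduplicate mapping stays in effect up to and including its end date and that the order of the two records in a pair doesn't matter. The NonduplicatePair class, the should_skip function, and the sample record IDs are hypothetical and don't reflect the application's internal data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NonduplicatePair:
    record_a: str
    record_b: str
    end_date: date

    def is_active(self, today: date) -> bool:
        # Assumption: the mapping applies through the end date, inclusive.
        return today <= self.end_date


def should_skip(pair_ids, mappings, today):
    """Return True if an active nonduplicate mapping covers this pair."""
    key = frozenset(pair_ids)  # record order within the pair is ignored
    return any(
        frozenset((m.record_a, m.record_b)) == key and m.is_active(today)
        for m in mappings
    )


mappings = [NonduplicatePair("A", "B", end_date=date(2030, 12, 31))]
skipped_before_end = should_skip(("B", "A"), mappings, date(2025, 1, 1))
skipped_after_end = should_skip(("B", "A"), mappings, date(2031, 1, 1))
```

Under these assumptions, moving the end date to a past date has the same effect as removing the pair from the nonduplicate list, because the mapping is no longer active.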

FAQs for Duplicate Identification

Your ability to assign or reject a duplicate identification batch depends on your role in the application. If you're a Data Steward Manager, you can assign batches to data stewards and reject any batch. If you're a Data Steward, you can't assign batches to others, and you can reject only the batches that are assigned to you. As a Data Steward Manager, you can assign a duplicate identification batch by using the Assign button on the Duplicate Identification Batch page. You can choose the assignees from the Assign To list on the Assign Duplicate Identification Batch page. Only users who are assigned the Data Steward role appear in this list.
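The role rules described above can be summarized as a minimal Python sketch. The function names, the plain-string role names, and the sample user names are assumptions for illustration only. The application enforces these rules through its own security model.

```python
MANAGER = "Data Steward Manager"
STEWARD = "Data Steward"


def can_assign(role: str) -> bool:
    """Only a Data Steward Manager can assign batches to data stewards."""
    return role == MANAGER


def can_reject(role: str, user: str, batch_assignee: str) -> bool:
    """A manager can reject any batch; a steward only batches assigned to them."""
    if role == MANAGER:
        return True
    if role == STEWARD:
        return user == batch_assignee
    return False


def eligible_assignees(users_with_roles):
    """Only users with the Data Steward role are listed as assignees."""
    return [user for user, role in users_with_roles if role == STEWARD]


# Hypothetical users, invented for illustration.
users = [("alice", MANAGER), ("bob", STEWARD), ("carol", STEWARD)]
assignees = eligible_assignees(users)
```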

How can I cancel a batch process?

You can cancel a batch process on the Scheduled Processes Overview page. You can navigate to this page as follows: Navigator > Tools > Scheduled Processes. You must search for the batch process by its process ID, and then cancel the batch process request.

You can schedule a saved duplicate identification batch by searching for it on the Duplicate Identification work area and editing its processing options.

You use the Create Single Request option on an individual duplicate set in a batch to specify duplicate processing options at the set level. You typically use this option when a duplicate set needs processing options that differ from the ones selected at the batch level. You can specify different duplicate processing options for different duplicate sets.