Deduplicate and Merge Nodes in Viewpoints
Data managers can deduplicate and merge existing nodes in viewpoints that contain similar nodes from one or more data sources. A deduplicate operation runs for a selected viewpoint and node type, using the matching rules configured for deduplication. Deduplication can be run in two modes: Cluster Key and Time Based. Cluster Key deduplication uses an indexed property to group nodes into clusters and performs matching on one cluster at a time. Time Based deduplication matches nodes that were created on or after a specified date and time.
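To make the difference between the two modes concrete, here is a minimal conceptual sketch in Python. It is not the product's API: the record layout, the field names (`cluster_key`, `created`), and both function names are assumptions made purely for illustration. It only shows how an indexed property could partition nodes into clusters, and how a date cutoff could select the nodes to evaluate.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical node records; the field names are illustrative only.
nodes = [
    {"name": "ACME Corp", "cluster_key": "US-10001", "created": datetime(2024, 1, 5)},
    {"name": "Acme Corporation", "cluster_key": "US-10001", "created": datetime(2024, 3, 10)},
    {"name": "Globex", "cluster_key": "US-20002", "created": datetime(2024, 3, 12)},
]


def clusters_for_matching(nodes, key):
    """Cluster Key style: group nodes by one property value.

    Only clusters with more than one node can contain duplicates,
    so single-node clusters are dropped.
    """
    clusters = defaultdict(list)
    for node in nodes:
        clusters[node[key]].append(node)
    return {value: members for value, members in clusters.items() if len(members) > 1}


def nodes_created_since(nodes, since):
    """Time Based style: select nodes created on or after a cutoff."""
    return [node for node in nodes if node["created"] >= since]


clusters = clusters_for_matching(nodes, "cluster_key")
recent = nodes_created_since(nodes, datetime(2024, 3, 1))
```

In this sketch, the matching rules would then be applied within each cluster, or to each recently created node, rather than to every pair of nodes in the viewpoint.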
Deduplication results are presented in the matching workbench in the same manner as matching request items. Each deduplicate operation returns results that identify match candidates and compare property values between the evaluated nodes and their match candidates. Survivorship rules determine which properties and relationships are merged from the duplicate nodes, but these can be overridden as necessary. Once accepted matches are applied, data from the duplicate nodes is merged and the duplicate nodes are deleted.
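The merge step described above can be pictured with a small Python sketch. Everything here is hypothetical: the rule names (`keep_survivor`, `take_duplicate`), the `merge_nodes` function, and the dictionary layout are assumptions for illustration and do not reflect how survivorship rules are actually configured or applied in the product.

```python
def merge_nodes(survivor, duplicate, survivorship, overrides=None):
    """Merge a duplicate into a surviving node, property by property.

    survivorship maps a property name to a hypothetical rule:
      "keep_survivor"  -> keep the surviving node's value
      "take_duplicate" -> copy the value from the duplicate node
    overrides holds values chosen manually during review; they are
    applied last, so they win over the rules.
    """
    merged = dict(survivor)
    for prop, rule in survivorship.items():
        if rule == "take_duplicate" and prop in duplicate:
            merged[prop] = duplicate[prop]
    merged.update(overrides or {})
    return merged


survivor = {"name": "ACME Corp", "phone": None, "city": "Austin"}
duplicate = {"name": "Acme Corporation", "phone": "555-0100", "city": "Dallas"}
rules = {"name": "keep_survivor", "phone": "take_duplicate", "city": "keep_survivor"}

# A reviewer overrides the rule for "city" before accepting the match.
merged = merge_nodes(survivor, duplicate, rules, overrides={"city": "Dallas"})
```

After a merge like this, only the merged record remains, which mirrors the deletion of duplicate nodes once accepted matches are applied.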
Business Benefit: Deduplication allows existing records from multiple data sources to be matched and merged, based on predefined rules and user review, to create master nodes and prevent data duplication.
Key Resources
- Understanding Deduplication in Administering and Working with Enterprise Data Management
- Deduplicating Nodes in a Viewpoint in Administering and Working with Enterprise Data Management