27 Working with Matching and Deduplication

Matching and deduplication is the process of comparing nodes in various contexts, identifying nodes that are the same, and then merging the matched nodes. This helps you prevent duplicate data in your system.

Oracle Fusion Cloud Enterprise Data Management provides two mechanisms for preventing duplicate data:

  • Matching and Merging Request Items: Prevent duplication before it occurs by matching incoming nodes in a request to existing nodes in a viewpoint in order to identify and merge nodes that are the same. See Understanding Matching and Merging Request Items.
  • Deduplication: Correct duplication in your system after it occurs by matching nodes that already exist in a viewpoint in order to identify and merge nodes that are the same. See Understanding Deduplication.

For both mechanisms, you create matching rules, which specify how nodes are matched with other nodes based on their property values, and survivorship rules, which specify how the properties and relationships from the matched nodes are merged.
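The division of labor between the two rule types can be illustrated with a small conceptual sketch. It is not Cloud EDM code and does not reflect how the product implements or configures rules; the `Node` class, the `matches` and `merge` functions, and the sample property names are all hypothetical. The sketch assumes a matching rule that compares selected property values (ignoring case and surrounding whitespace) and a survivorship rule that copies selected properties from the source node to the target node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a node with a name and property values."""
    name: str
    properties: dict = field(default_factory=dict)


def matches(incoming, existing, match_properties):
    """Hypothetical matching rule: two nodes match when every listed property
    is present on both and equal after trimming whitespace and ignoring case."""
    def norm(value):
        return str(value).strip().lower()

    return all(
        p in incoming.properties
        and p in existing.properties
        and norm(incoming.properties[p]) == norm(existing.properties[p])
        for p in match_properties
    )


def merge(source, target, surviving_properties):
    """Hypothetical survivorship rule: copy only the listed properties
    from the source node into a copy of the target node."""
    merged = Node(target.name, dict(target.properties))
    for p in surviving_properties:
        if p in source.properties:
            merged.properties[p] = source.properties[p]
    return merged


# Sample data (invented for illustration).
existing = Node("CC100", {"Description": "Marketing", "Manager": "Lee"})
incoming = Node("NEW-1", {"Description": " marketing ", "Manager": "Patel"})

is_same = matches(incoming, existing, ["Description"])
result = merge(incoming, existing, ["Manager"]) if is_same else existing
```

In this sketch the matching step decides only whether two nodes are the same, and the survivorship step decides only which values carry over. Keeping the two concerns separate mirrors the way the rules are described above.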

Terminology

The following terms can help you understand the matching process:

  • Data source: An object that represents the source for the incoming data to be matched and linked in Cloud EDM. This can be either another Cloud EDM application (called a registered data source) or an external system whose data is not being managed in Cloud EDM (called an unregistered data source). See Understanding Data Sources.

    Note:

    You can match request items for any data source. You can deduplicate data for registered data sources only.
  • Matching rule: Controls how nodes are matched, either from an incoming data source to nodes that already exist in a node type (for matching and merging) or among nodes that already exist in a viewpoint (for deduplication). See Creating, Editing, and Deleting Matching Rules.
  • Survivorship rule: Specifies which properties and relationships from the source node are merged into the target nodes in a viewpoint after a match has been confirmed. See Creating, Editing, and Deleting Survivorship Rules.
  • Matching workbench: Enables you to review match candidates based on the criteria from the matching rules and accept the ones that you want to merge into the existing nodes. See Matching and Deduplicating.
  • Clustering property (Deduplication only): A property that you identify to group nodes into clusters so that you can run matching on them in order to identify and combine duplicate nodes.
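The role of a clustering property can be pictured with a short conceptual sketch. This is an illustration only, not Cloud EDM code or its actual algorithm; the functions `cluster_nodes` and `candidate_pairs`, the dictionary-based node representation, and the `Country` and `TaxId` properties are hypothetical. The sketch assumes that nodes are first grouped by the value of the clustering property and that matching then compares nodes only within each group.

```python
from collections import defaultdict
from itertools import combinations


def cluster_nodes(nodes, clustering_property):
    """Group nodes by the value of the (hypothetical) clustering property."""
    clusters = defaultdict(list)
    for node in nodes:
        clusters[node.get(clustering_property)].append(node)
    return dict(clusters)


def candidate_pairs(clusters, is_match):
    """Compare nodes only within the same cluster and collect matching pairs."""
    pairs = []
    for members in clusters.values():
        for a, b in combinations(members, 2):
            if is_match(a, b):
                pairs.append((a["name"], b["name"]))
    return pairs


# Sample data (invented for illustration).
nodes = [
    {"name": "V1", "Country": "US", "TaxId": "11-111"},
    {"name": "V2", "Country": "US", "TaxId": "11-111"},
    {"name": "V3", "Country": "DE", "TaxId": "11-111"},
    {"name": "V4", "Country": "US", "TaxId": "22-222"},
]

clusters = cluster_nodes(nodes, "Country")
pairs = candidate_pairs(clusters, lambda a, b: a["TaxId"] == b["TaxId"])
```

Here V1 and V2 are reported as match candidates, while V3 is never compared with them because it falls into a different cluster. Grouping first keeps the number of comparisons small when a viewpoint contains many nodes.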