Bookshelf v8.0: Overview of Merge Algorithm

Siebel Data Quality Administration Guide > Using Siebel Data Quality > Merge Algorithm in the Object Manager Layer >

Overview of Merge Algorithm

The following topic provides a brief overview of what happens during the merge process (and uses the example of merging accounts A1 and A2 into A1).

The merge process starts by enumerating through all link definitions that might be relevant, for example, in the case of the example, where the source business component is accounts.

One-to-Many Relationship

A one-to-many relationship defines the destination field, which is the foreign key in the detail table that points to a row in the parent table. Only links where the source field is "Id", that is, where the foreign key in the detail table stores the ROW_ID of the parent table row, are considered.

To make children of A2 point to A1, the merge must update the destination field in the detail table to now point to the ROW_ID of A1.

User property name: Use Literals for Merge
Use Literals For Merge: S_BU
Value: TRUE

When merging two records, the child records of the loser record point to the survivor record and the LAST_UPD and LAST_UPD_By columns of those child records are also updated. For example, account A2 is merged to account A1. Account A2 has service request SR1, and SR2. The columns LAST_UPD, and LAST_UPD_BY of SR1 and SR2 are updated during merge process.

From the example, link account/quote foreign key in S_DOC_Quote is account Id (TARGET_OU_ID). TARGET_OU_ID stored the ROW_ID of the A2. It is now updated to point to ROW_ID of A1.

SQL generated:

UPDATE S_DOC_QUOTE set TARGET_OU_ID = 'Row Id of A1'

where:

TARGET_OU_ID is equal to 'Row Id of A2'

While the merge is processing the link account or quote, it also checks to see if there are other foreign keys from quote pointing to account using the join definitions. These keys are also updated.

An optimization is used to ensure that there are no redundant update statements. For example, if there are two links defined (account or quote and account or quote with primary with the same destination field Account Id), the process would update TARGET_OU_ID of S_DOC_QUOTE twice to point to A1. To avoid this scenario, a map of table name or column name of the processed field is maintained. The update is skipped if the column has been processed before.

After the update you might have duplicate children for an account. For example, if the unique key for a quote is the name of the quote, merging two accounts with quotation marks of the same name will result in duplicates. The CONFLICT_ID column of children that will become duplicates after the merge is updated. This operation is performed before the actual update.

The user must examine duplicate children (identified by CONFLICT_ID being set) to make sure that they are true duplicates. For example, if the merged account has child quotation marks named Q1 and Q1, it is possible that these refer to distinct quotation marks. If this is the case, the name of one of the quotation marks must be updated and the children must be merged.

Many-to-Many Relationship

The many-to-many relationship (Accounts-Contacts) differs slightly from the one-to-many relationship in that it is implemented using an intersection table that stores the ROW_IDs of parent-child records. On a merge, the associations must be updated. The Contacts associated with the old Account must now be associated with the new Account.

The Inter parent column of the intersection table is updated to point to the new parent. As in the one-to-many case, to avoid redundant updates, a map of intersection tables that have been processed is maintained. Therefore, if the source and target business components have the same base table, both child and parent columns are updated.

The CONFLICT_ID column of intersection table entries that become duplicates after the merge is updated.

In contrast to the one-to-many link case, duplicates in the intersection table imply that the same child is being associated with the parent two or more times. However, there might be cases where the intersection table has entries besides the ROW_ID of the parent and child rows that store information specific to the association.

The duplicate association records are only preserved when records are determined as unique, according to the intersection table unique key. This means those duplicate association records might have some unique attributes and these attributes are part of a unique key of the intersection table. CONFLICT_ID does not account for uniqueness among records.

Siebel Data Quality Administration Guide		Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Legal Notices.