7Administering Data Quality

Data Quality Modes of Operation

Data cleansing and data matching operates in real-time or in batch mode.

In real-time mode, data quality functionality is called whenever a user attempts to save a new or modified account, contact, or prospective contact record to the database.

For data cleansing, the fields configured for data cleansing are standardized before the record is committed.

For data matching, when data quality detects a possible match with existing data, all probable matching candidates are displayed in real time. This helps to prevent duplication of records because:

  • When entering data initially, users can select an existing record to continue their work, rather than create a new one.

  • When modifying data, users can identify duplicates resulting from their changes.

In batch mode, you can use either the Administration - Server Management screen or the srvrmgr command-line utility to submit server component batch jobs. You run these batch jobs at intervals depending on business requirements and the amount of new and changed records.

For data cleansing, a batch run standardizes and corrects a number of account, contact, prospect, or business address fields. You can cleanse all of the records for a business component or a subset of records. For more information about data cleansing batch tasks, see Cleansing Data Using Batch Jobs.

For data matching, a batch run identifies potential duplicate record matches for account, contact, and prospect records. You can perform data matching for all of the records for a business component, or a subset of records. Potential duplicate records are presented to the data administrator for resolution in the Administration-Data Quality views. The duplicates can be resolved over time by a data steward (a person whose job is to monitor the quality of incoming and outgoing data for an organization.) For more information about data matching batch tasks, see Matching Data Using Batch Jobs.

Real-Time Data Cleansing and Data Matching

In real-time mode, data quality is called when you save a new or modified record. If both data cleansing and data matching are enabled for the same object manager, data cleansing runs first.

If data cleansing is enabled, a set of fields preconfigured to use data cleansing are standardized before the record is committed.

If data matching is enabled, and the new record is a potential duplicate, one of the following dialog boxes appears:

  • Duplicate Accounts

  • Duplicate Contacts

  • Duplicate Prospects

You must then decide the fate of the new record, as follows:

  • If you think the record is not a duplicate, close the dialog box or click Ignore All.

    The new record remains saved in the database and no change takes place.

  • If you think the record is a duplicate, select the best-matching record from the dialog box using the Pick button.

    The duplicate record that you choose becomes the remaining record and the new record gets deleted after a sequenced merge with the remaining record as described in Sequenced Merges.

    In real-time mode, if you enter two new records that have the same Name and Location, then an error message displays similar to the following: The same values for (Name, Location) already exist. To enter a new record, make sure that field values are unique. Real-time data matching prevents creation of a duplicate record in the following ways:

  • If you are in the process of creating a new record, that record is not saved.

  • If you are in the process of modifying a record, the change is not made to the record.

Note: Only certain fields are configured to support data matching and data cleansing. If you do not enter values in these fields when you create a new record, or you do not modify the values in these fields when changing a record, data cleansing and data matching are not triggered. For more information about which fields are preconfigured for different business components, see Preconfigured Field Mappings for Oracle Data Quality Matching Server and Preconfigured Field Mappings for Oracle Data Quality Address Validation Server.

Batch Data Cleansing and Data Matching

Batch processing provides a means to cleanse and match a large number of records at one time. You can run batch jobs as stand-alone tasks or schedule batch tasks to run on a recurring basis.

After the Data Quality Manager server component (DQMgr) is enabled and you have restarted the Siebel Server, you can start your data quality tasks.

You can start and monitor tasks for the Data Quality Manager server component in one of the following ways:

  • Using the Siebel Server Manager command-line interface, the srvrmgr program.

  • Running Data Quality Manager component jobs from the Administration - Server Management screen, Jobs view in the application.

You can specify a data quality rule in the batch job parameters. This is a convenient way of consolidating and reusing batch job parameters and also of overriding vendor parameters. For more information, see Data Quality Rules.

For more information about using the Siebel Server Manager and administering component jobs, see Siebel System Administration Guide. In particular, read the chapters about the Siebel Enterprise Server architecture, using the Siebel Server Manager GUI, and using the Siebel Server Manager command-line interface.

You must run batch mode key generation on all existing records before you run real-time data matching. The Universal Connector requires generated keys in the key tables first before you can run real-time data matching. The key generation is done within the deduplication task (which is the reason for running deduplication on all existing records first).

Caution: If you write custom Siebel CRM scripting on business components used for data matching (such as Account, Contact, List Mgmt Prospective Contact, and so on), the modifications to the fields by the script execute in the background and might not trigger logic that activates user interface features. For example, the scripting might not trigger UI features such as windows that show potential matching records.

Data Quality Rules

In the Administration - Data Quality screen, Rules view, you can define rules for each of the data quality operations that are performed in real-time and in batch mode.

The data quality rules specify the parameters used when a data quality operation is performed in real-time or in batch mode. For example, you can create a rule for the batch mode Data Cleansing operation on the Account business component for a particular vendor. The parameters used are the vendor parameters defined for the applicable vendor, but you can override these parameters by specifying the equivalent rule parameters. However, the value set for Match Threshold in the User Preferences data quality settings override the equivalent rule parameters.

You can only create rules for business components for which data cleansing or data matching are supported. This includes the preconfigured business components and any additional business components that you configure for data cleansing and data matching. Also, you can only create rules for operations that are supported for a particular vendor. For each vendor, the supported operations and business components are defined in the Administration - Data Quality screen, Third Party Administration view.

You can create only one real time rule for each combination of vendor, business component, and operation name. However, you can create any number of batch rules for each combination of vendor business component, and operation name.

When you define a rule for real time mode, the rule is applied each time data cleansing or data matching is performed for the business component. When you define a rule for batch mode, the rule is applied if you specify the name of the rule in the batch job parameters, see Data Quality Batch Job Parameters. Using rules in this way allows you to consolidate batch job parameters into a reusable rule.

You can specify a search specification, business object name, business component name, threshold, and operation Type in a rule or in the job parameters when you submit a job in batch mode. The values in the job parameters override any value in the rules.

Note: Do not confuse data quality rules with the matching rules that are used by the third-party software.

Creating a Data Quality Rule

Use the following procedure to create a data quality rule.

To create a data quality rule

  1. Navigate to the Administration - Data Quality screen, then the Rules view.

  2. Create a new record. Some of the fields are shown in the following table:

    Field Comment

    Name

    Enter a unique name for the rule.

    Search Specification

    Enter a search specification.

    Applicable for Operation Type Batch only.

    Vendor Name

    Select a vendor name, for example, ISS.

    Operation Type

    Select Batch or Real Time.

    Operation Name

    Select one of the following:

    • Data Cleansing

    • DeDuplication

    • Key Generate (batch mode only)

    • Key Refresh (batch mode only)

    Threshold

    Enter a value between 50 and 100. This value overrides the value in the Data Quality settings.

    Applicable for Operation Name DeDuplication only.

    Source Business Component

    Select a business component name.

    Source Business Object

    Select the business object name corresponding to the business components

    An example of a rule is shown in the following table. This is a rule for DeDuplication operations for all Account records whose name starts with Aa.

    Field Value

    Name

    Rule_Batch_Account_Dedup

    Search Specification

    [Name] LIKE 'Aa*'

    Vendor Name

    ISS

    Operation Type

    Batch

    Operation Name

    DeDuplication

    Threshold

    60

    Source Business Component

    Account

    Source Business Object

    Account

  3. (Optional) Specify rule parameters.

    1. Click the Rule Parameter view tab.

    2. Create rule parameters by selecting a parameter and entering the required value.

Data Quality Batch Job Parameters

The following table shows the parameters used in Data Quality batch jobs. The names of the parameters for both Data Quality Manager component jobs and srvrmgr commands are given.

Job Parameter or Server Manager Parameter

Required Description

Buscomp name

bcname

Yes The name of the business component: Possible values include:
  • Account

  • Contact

  • List Mgmt Prospective Contact

  • Business Address - applicable to Data Cleansing operations only. For Siebel Industry Applications, CUT Address is used instead of Business Address.

Business Object Name

bobjname

Yes The name of the business object. Possible values include:
  • Account

  • Contact

  • List Mgmt

  • Business Address - applicable to Data Cleansing operations only

Operation Type

opType

Yes The type of operation: Possible values are:
  • Data Cleansing - cleanses data

  • Key Generate - generates match keys

  • Key Refresh - refreshes match keys

  • DeDuplication - performs data matching.

Object Sorting Clause

objsortclause

No

Applicable to Data Matching operations only.

Indicates how candidate records are sorted for optimal processing by the data matching software. The default value is Dedup Token.

Object Where Clause

objwhereclause

No

Limits the number of records processed by a data quality task. Typically, you use the account's name or the contact's first name to split up large data quality batch tasks using the first letter of the name.

For example, the following object WHERE clause selects only French account records where the account name begins with A:

[Name] like 'A*' AND [Country] = 'France'

As another example, the following object WHERE clause selects all records where Name begins with Paris or ends with london:

[Name] like 'Paris*' or [Name] like '*london'

Data Quality Setting

DQSetting

No Specifies data quality settings for data cleansing and data matching jobs. This parameter has three values separated by commas:
  • First value. If this value is set to Delete, existing duplicates are deleted. Otherwise, existing duplicates are not deleted. This is the only usage for this value.

  • Second value. Applicable to the Universal Connector only. It specifies whether the job is a full or incremental data matching job.

  • Third value. This is obsolete. Enter an empty string.

For more information about the use of DQSetting, see Matching Data Using Batch Jobs.

Threshold

No Specifies a value for the Threshold data quality parameter.

Rule Name

No Specifies the name of a data quality rule. A rule with the specified name must have been created in the Administration - Data Quality screen, Rules view. For example:
RuleName="Rule_Batch_Account_Dedup"
For more information, see Data Quality Rules.

Cleansing Data Using Batch Jobs

The following procedure describes how to use a batch job to perform data cleansing on records in a selected business component.

To effectively exclude selected records when running data cleansing tasks, you must add the following command to your object WHERE clause:

[Disable DataCleansing] <> 'Y'
Caution: When you run a process in batch mode, any visibility limitation against your targeted data set is ignored. It is recommended that you allow only a small group of people to access the Siebel Server Manager to run your data quality tasks, otherwise you run the risk of corrupting your data.

To perform batch mode data cleansing

  1. Start the Server Manager Program.

  2. At the srvrmgr prompt, enter a command like one of those in the following table to perform data cleansing.

    Business Component Example of Server Manager Command

    Account

    run task for comp DQMgr with bcname=Account, bobjname= Account, opType="Data Cleansing", objwhereclause="[field_name] LIKE 'search_string*'", DqSetting="'','',''"

    Business Address

    run task for comp DQMgr with bcname= "Business Address", bobjname="Business Address", opType="Data Cleansing", objwhereclause="[field_name] LIKE 'search_string*'", DqSetting="'','', 'business_address_datacleanse.xml'"

    Contact

    run task for comp DQMgr with bcname=Contact, bobjname=Contact, opType="Data Cleansing", objwhereclause LIKE "[field_name]='search_string*'", DqSetting="'','','contact_datacleanse.xml'"

    List Mgmt Prospective Contact

    run task for comp DQMgr with bcname= "List Mgmt Prospective Contact", bobjname="List Mgmt", opType="Data Cleansing", objwhereclause LIKE "[field_name]='search_string*'", DqSetting="'','','prospect_datacleanse.xml'"

Matching Data Using Batch Jobs

Depending on your business requirements, you might want to use batch jobs to perform data matching on some or all of the records in the supported business components. If you run a data matching batch job on all the records in a business component, the work can often be completed more quickly by splitting the work into a number of smaller batch jobs (not more than 50,000 to 75,000 records at a time). When data matching has been performed on all of the records in the business component, you can run future data matching batch jobs on just the new or changed records.

If you want to perform data matching for some number of mutually-exclusive subsets of the records in a business component, such as all the records where a field name starts with a given letter, use a separate job to specify each subset, with WHERE clauses as follows:

objwhereclause="[field_name] LIKE 'A*'"
objwhereclause="[field_name] LIKE 'B*'"
...
objwhereclause="[field_name] LIKE 'Z*'"
objwhereclause="[field_name] LIKE 'a*'"
...
objwhereclause="[field_name] LIKE 'z*'"

The following example further describes batch data matching.

Example of Batch Data Matching Using the Universal Connector

You must run batch mode key generation on all existing records before you run real-time data matching. The Universal Connector requires generated keys in the key tables first before you can run real-time data matching. The key generation is done within the deduplication task, which is the reason for running deduplication on all existing records first. For more information about batch data cleansing and matching, see Batch Data Cleansing and Data Matching.

The following procedure describes how to start a data matching batch job.

To perform batch mode data matching

  1. Follow the instructions in Generating or Refreshing Keys Using Batch Jobs.

  2. At the srvrmgr prompt, enter commands like those in the following table to perform data matching.

    Business Component Example of Server Manager Command

    Account

    run task for comp DQMgr with DqSetting="'Delete'", bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"

    Contact

    run task for comp DQMgr with DqSetting="'Delete'", bcname=Contact, bobjname=Contact, opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"

    List Mgmt Prospective Contact

    run task for comp DQMgr with DqSetting="'Delete'", bcname="List Mgmt Prospective Contact", bobjname="List Mgmt", opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"

Full Data Matching Jobs

In a full data matching job, the records for which you want to locate duplicates and the candidate records that can include those duplicates are defined by the same search specification. A full data matching job is specified with the value Yes in the DQSetting parameter, see Incremental Data Matching Jobs.

Full data matching jobs are useful when:

  • You want to perform data matching on a whole database table.

  • You are setting up the data quality installation.

  • You perform data matching for the customer data for a particular business component for the first time.

A typical example of a command for a full data matching job is as follows:

run task for comp DQMgr with DqSetting="'','Yes','account_match.xml'", 
bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Name] LIKE 
'A*'"

Jobs like this that perform data matching for a subset of records are still considered to be full data matching jobs because the data to be checked does not depend on earlier data matching.

Incremental Data Matching Jobs

If you want to perform data matching for some number of nonexclusive subsets of the records in a business component, such as all the records that have been created or updated since you last ran data matching, use a WHERE clause that includes an appropriate timestamp, and also adjust the DqSetting clause of the command as shown in the following table.

DqSetting Parameter Sequence Valid Values Comments

First section

Leave blank

Specify as two adjacent quotation marks.

Second section

(Enforce Search Spec on Candidate Records)

Yes or No (default)

Specifies whether or not the same search specification is used for both the records whose duplicates are of interest and the candidate records that can include those duplicates. Use Yes for full data matching batch jobs. Use No for incremental data matching batch jobs.

Third section

Leave blank

None.

This kind of job is considered an incremental data matching job, because data matching was done earlier and does not need to be redone at this time. In an incremental data matching batch job, the records for which you want to locate duplicates are defined by the search specification, but the candidate records that can include those duplicates can be drawn from the whole applicable database table. Incremental data matching batch jobs are useful if you run them regularly, such as once a week. A typical example of a command for an incremental data matching job is as follows:

run task for comp DQMgr with DqSetting="'','No',''",
bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Updated] >
'08/18/2005 20:00:00'
Note: If you do not specify the DQSetting parameter, or leave the second value of the DQSetting parameter blank, the job will be an incremental data matching job.

Generating or Refreshing Keys Using Batch Jobs

The following procedure describes how to start a batch job to generate or refresh keys for data matching.

To start a batch job to generate or refresh keys

  1. Start the Server Manager Program.

  2. At the srvrmgr prompt, enter one of the commands in the following table to generate or refresh keys.

    Substitute values of your own choosing in the WHERE clauses, as needed.

    Business Component Generate or Refresh Keys? Example of Server Manager Command

    Account

    Generate

    run task for comp DQMgr with bcname=Account, bobjname=Account, opType="Key Generate", objwhereclause="[Updated] > '07/18/2005 16:00:00'"

    Account

    Refresh

    run task for comp DQMgr with bcname=Account, bobjname=Account, opType="Key Refresh", objwhereclause="[Name] LIKE 'search_string*'"

    Contact

    Generate

    run task for comp DQMgr with bcname=Contact, bobjname=Contact, opType="Key Generate", objwhereclause="[Updated] > '07/01/2005 14:10:00'"

    Contact

    Refresh

    run task for comp DQMgr with bcname=Contact, bobjname=Contact, opType="Key Refresh", objwhereclause="[Last Name] LIKE 'search_string*'"

    List Mgmt Prospective Contact

    Generate

    run task for comp DQMgr with bcname="List Mgmt Prospective Contact", bobjname="List Mgmt", opType="Key Generate", objwhereclause="[Updated] > '07/18/2005 16:00:00'"

    List Mgmt Prospective Contact

    Refresh

    run task for comp DQMgr with bcname="List Mgmt Prospective Contact", bobjname="List Mgmt", opType="Key Refresh", objwhereclause="[Last Name] LIKE 'search_string*'"

    The examples in the table show slightly different WHERE clauses for key generation and key refresh operations, as follows:

    • The generation commands generate keys for all records in the business component that have been updated since the specified date and time.

    • The refresh commands refresh keys for all records in the business component that match the search string in the specified field.

    You can use either of these two types of WHERE clauses for both generation and refresh operations.

    If you want to generate or refresh keys for all records in the business component, use a WHERE clause containing a wildcard character (*) to match all records, as follows:

    objwhereclause="[field_name] LIKE '*'"
    

Customizing Data Quality Server Component Jobs for Batch Mode

Rather than specifying parameters each time you start a data quality batch job, you can customize the Data Quality Manager server component with the parameters that you require. This is mainly for ease of use when starting tasks using the srvrmgr program.

You use the Administration - Server Configuration views to create customized components according to the Data Quality Manager Server component. You specify Data Quality Manager as the Component Type.

  • For more information about creating custom component definitions, see Siebel System Administration Guide.

  • For information about component customization for your third-party data quality vendor software, consult your third-party vendor.

You must enable new custom Data Quality Manager components before you can use them. And, if you change parameters of running components, you must shut down and restart the components or restart the Siebel Server for the changes to take effect.

Note: You can also set specific parameters for a data quality task and save the configuration as a template by using the Administration - Server Configuration screen, Job Templates view. The benefit in doing so is that there is no need to copy component definitions. For more information about Siebel CRM templates, see Configuring Siebel Business Applications.

Merge Algorithm in the Object Manager Layer

The Merge Records functionality is used by customers to enhance data quality. For example, duplicate accounts might be merged to a target account or you might want to merge duplicate opportunities. To call the feature, select two or more records and choose Edit, then  Merge Records from the application-level menu. For more information about the Merge Records menu option, see Siebel Fundamentals.

Example of the Merge Records Process

You want to merge accounts A1 and A2 into A1. These accounts might have child quotation marks or associated contacts. These relationships are defined using one-to-many or many-to-many links. The following describes what happens after the merge:

  • Account A2 is deleted after the merge.

  • The contacts associated with A2 are associated with A1 after the merge.

The links defined between the business components are used to implement the merge algorithm. The algorithm used by the merge process at the OM layer is explained in the following topics for one-to-many and many-to-many links.

Overview of Merge Algorithm

The following section provides a brief overview of what happens during the merge process using the example of merging accounts A1 and A2 into A1.

The merge process starts by enumerating through all link definitions that might be relevant, for example, in the case of the example, where the source business component is accounts.

One-to-Many Relationship

A one-to-many relationship defines the destination field, which is the foreign key in the detail table that points to a row in the parent table. Only links where the source field is "Id", that is, where the foreign key in the detail table stores the ROW_ID of the parent table row, are considered.

To make children of A2 point to A1, the merge must update the destination field in the detail table to now point to the ROW_ID of A1.

User property name: Use Literals for Merge

Use Literals For Merge: S_BU

Value: TRUE

When merging two records, the child records of the loser record point to the remaining record and the LAST_UPD and LAST_UPD_By columns of those child records are also updated. For example, account A2 is merged to account A1. Account A2 has service request SR1, and SR2. The columns LAST_UPD, and LAST_UPD_BY of SR1 and SR2 are updated during merge process.

From the example, link account or quote foreign key in S_DOC_Quote is account Id (TARGET_OU_ID). TARGET_OU_ID stored the ROW_ID of the A2. It is now updated to point to ROW_ID of A1.

SQL generated:

UPDATE S_DOC_QUOTE set TARGET_OU_ID = 'Row Id of A1' 

where:

TARGET_OU_ID is equal to 'Row Id of A2'

While the merge is processing the link account or quote, it also checks to see if there are other foreign keys from quote pointing to account using the join definitions. These keys are also updated.

An optimization is used to ensure that there are no redundant update statements. For example, if there are two links defined (account or quote and account or quote with primary with the same destination field Account Id), the process would update TARGET_OU_ID of S_DOC_QUOTE twice to point to A1. To avoid this scenario, a map of table name or column name of the processed field is maintained. The update is skipped if the column has been processed before.

After the update you might have duplicate children for an account. For example, if the unique key for a quote is the name of the quote, merging two accounts with quotation marks of the same name will result in duplicates. The CONFLICT_ID column of children that will become duplicates after the merge is updated. This operation is performed before the actual update.

The user must examine duplicate children (identified by CONFLICT_ID being set) to make sure that they are true duplicates. For example, if the merged account has child quotation marks named Q1 and Q1, it is possible that these refer to distinct quotation marks. If this is the case, the name of one of the quotation marks must be updated and the children must be merged.

Many-to-Many Relationship

The many-to-many relationship (Accounts-Contacts) differs slightly from the one-to-many relationship in that it is implemented using an intersection table that stores the ROW_IDs of parent-child records. On a merge, the associations must be updated. The Contacts associated with the old Account is now associated with the new Account.

The Inter parent column of the intersection table is updated to point to the new parent. As in the one-to-many case, to avoid redundant updates, a map of intersection tables that have been processed is maintained. Therefore, if the source and target business components use the same base table, both child and parent columns are updated.

The CONFLICT_ID column of intersection table entries that become duplicates after the merge is updated.

In contrast to the one-to-many link case, duplicates in the intersection table imply that the same child is being associated with the parent two or more times. However, there might be cases where the intersection table has entries besides the ROW_ID of the parent and child rows that store information specific to the association.

The duplicate association records are only preserved when records are determined as unique (according to the intersection table unique key). This means those duplicate association records might have some unique attributes and these attributes are part of a unique key of the intersection table. CONFLICT_ID does not account for uniqueness among records.

Merging of Duplicate Records

After you run data matching in batch mode, duplicate records are displayed in the Duplicate Accounts, Duplicate Contacts, and Duplicate Prospects views in the Administration - Data Quality screen. You can then determine which records you want to retain and which records you want to merge with the retained record.

Caution: Merging records is an irreversible operation. You must review all records carefully before using the following procedure and initiating a merge.

You can merge duplicate records in the following ways:

  • Merge Records option (Edit, Merge Records). Performs the standard merge functionality available in Siebel CRM for merging records. That is, this action keeps the record you indicate and associates all child records from the record to be deleted before that record is actually deleted. For more information about the Merge Records menu option, see Siebel Fundamentals.

  • Merge button (from appropriate Duplicate Resolution View). Performs a sequenced merge of the records selected in the sequence specified. This includes populating currently empty fields in the remaining record with values from the records to be deleted, as described in Sequenced Merges. This action also performs a cleanup in the appropriate Deduplication Results table to remove the unnecessary duplicate records. This is the preferred method for deduplicating account, contact, and prospect records.

Sequenced Merges

You use a sequenced merge to merge multiple records into one record. You assign sequence numbers to the records so that the record with the lowest sequence number becomes the remaining record, and the other records, the records to be deleted, are merged with the remaining record.

When records are merged using a sequence merge, the following rules apply:

  • All non-NULL fields from the remaining record are kept.

    Any fields that were NULL in the remaining record are populated by information (if any) from the deleted records. Missing fields in the remaining record are populated in ascending sequence number order from corresponding fields in the deleted records.

  • The children and grandchildren (for example, activities, orders, assets, service requests, and so on) of the deleted records are merged by associating them to the remaining record.

    Sequenced merge is especially useful if many fields are empty, such as when a contact record with a Sequence of 2 has a value for Email address, but its Work Phone # field is empty, and a contact record with a Sequence number of 3 has a value of Work Phone #. If the field Email address and Work Phone # in the remaining record (sequence number 1) are empty, the value of Email address is taken from the records with sequence number 2, and the value of Work Phone # is taken from the record of sequence number 3.

    A sequence number is required for each record even if there are only two records.

Field Characteristics for Sequenced Merges

A field must have the following specific characteristics to be eligible for use in a sequenced merge:

  • The field cannot be a calculated field and must reside on a physical database column.

  • The field must be active, that is designated as Active in the respective business component.

Process of Merging Duplicate Records

When you run a batch process, and depending on the number of duplicates in your system, you might find there are hundreds of rows in the Duplicate Accounts, Duplicate Contacts, and Duplicate Prospect views (in the Administration - Data Quality screen). In this case, it is recommended that you use the following process to filter and merge duplicate records:

  1. Filtering Duplicate Records

    This involves creating a query to find a subset of the duplicate records and then review the query results. For example, you might want to create a query that includes a subset of all duplicate records where the Name field starts with the letter A.

  2. Merging Duplicate Records

    After the query results appear, you merge duplicate records using either the Merge button or the Merge Records option.

Caution: You must perform batch data matching first before trying to resolve duplicate records. For more information about batch data matching, see Batch Data Cleansing and Data Matching.

Filtering Duplicate Records

Use the following procedure to filter duplicate records. This task is a step in Process of Merging Duplicate Records.

Note: You can use either standard or fuzzy query methods, depending on your needs. For more information about using fuzzy query, see Using Fuzzy Query.

To filter duplicate records

  1. Navigate to the Administration - Data Quality screen in your Siebel application.

  2. Click one of the following links:

    • Duplicate Accounts

    • Duplicate Contacts

    • Duplicate Prospects

  3. Click Query, enter your search criteria, and then click Go.

    The search results appear.

You now decide what you want to do with the duplicate records.

Merging Duplicate Records

You must follow a slightly different procedure to merge child duplicate records. If you do not follow the correct procedure, orphan records can be created. Use the following procedure to merge duplicate records. This task is a step in Process of Merging Duplicate Records.

To merge duplicate records

  1. In the Administration - Data Quality screen, click the Duplicate XXX view for the type of record you have selected, where XXX is either Accounts, Contacts, or Prospects.

    For example, click the Duplicate Accounts view.

  2. In the Duplicate view, drill down on one of the duplicate records.

    The appropriate Duplicate XXX Resolution view appears. The child applet shows the list of duplicate rows with the parent record appearing as the first row.

  3. If two or more records appear to be duplicates, enter a sequence number in the Sequence field for each record.

  4. Edit the records, if necessary.

    For example, you might want to keep some values from fields in records to be deleted. In this case, you can make fields NULL in what will be the remaining records. The values from the corresponding fields in the deleted records are then used to populate the NULL fields after the sequenced merge.

  5. Select the records to be merged.

  6. Click Merge.

    The records are merged to produce one new record. The record with the lowest sequence number assigned is retained after the merge. Missing fields in the retained record are populated from corresponding fields in the deleted records, as described in Sequenced Merges.

Merging Child Duplicate Records

Use the following procedure to merge child duplicate records.

To merge child duplicate records
  1. In the appropriate Duplicate XXX Resolution view, enter 1 in the Sequence field for the parent record.

  2. Enter 2 and so on in the Sequence field for each of the child duplicate records.

  3. Select the records to be merged, and select the parent records last.

  4. Click Merge.

Using Fuzzy Query

To run a query using fuzzy query, this facility must be enabled and several conditions must be met as described in Enabling and Disabling Fuzzy Query.

In particular:

  • The query must not use wildcards.

  • The query must specify values in fields designated as fuzzy query mandatory fields. For information about identifying the mandatory fields, see Identifying Mandatory Fields for Fuzzy Query.

  • The query must leave optional fields blank.

If the conditions for fuzzy query are not satisfied, then any queries you make use standard query functionality.

Using Fuzzy Query for Accounts

Use the following procedure to use fuzzy query for accounts.

To use fuzzy query for accounts

  1. Navigate to Accounts screen, then the Accounts List view.

  2. Click the Query button.

  3. Enter your query, and then click Go.

    The query results contain fuzzy matches in addition to regular query matches.

Using Fuzzy Query for Contacts

Use the following procedure to use fuzzy query for contacts.

To use fuzzy query for contacts

  1. Navigate to Contacts screen, then the Contacts List view.

  2. Click the Query button.

  3. Enter your query, and then click Go.

    The query results contain fuzzy matches in addition to regular query matches.

Using Fuzzy Query for Prospects

Use the following procedure to use fuzzy query for prospects.

To use fuzzy query for prospects

  1. Navigate to List Management screen, then the Prospects view.

  2. Click the Query button.

  3. Enter your query, and then click Go.

    The query results contain fuzzy matches in addition to regular query matches.

Example of Enabling and Using Fuzzy Query with Accounts

This topic gives an example of enabling and using fuzzy query. You might use this feature differently, depending on your business model.

In the following example, you enable fuzzy query for accounts, and then enter the query criteria. The query results contain fuzzy matches from the DeDuplication business service in addition to regular query matches.

Note: EAI Siebel Adapter does not support fuzzy queries. In addition, scripting does not support fuzzy queries.

To enable and use fuzzy query for accounts

  1. Perform the steps in Enabling and Disabling Fuzzy Query

  2. Perform the steps in Enabling Data Quality at the User Level.

    Note: For this example, set the Fuzzy Query - Max Returned data quality setting to 10.
  3. Navigate to the Accounts screen, then the Account list view.

  4. Enter your query, and then click Go.

    For this example, in the Name field, enter Symphony.

    Up to 10 records having Name set to Symphony are displayed.

    Note: If the number of Symphony account records is fewer than 10, then the fuzzy query results includes records where symphony is lowercase (as well as uppercase). For example, if four records for Symphony and 100 records for symphony are found in the database, the fuzzy query result shows four Symphony records and six symphony records. However, if fuzzy query is disabled, only the four Symphony records appear.

Calling Data Matching and Data Cleansing from Scripts or Workflows

This topic provides information about calling data matching and data cleansing methods from external callers such as scripts or workflows.

You can call data quality from external callers to perform data matching. You can use the Value Match method of the Deduplication business service to:

  • Match data in field or value pairs against the data within Siebel business components

  • Prevent duplicate data from getting into the Siebel application through non-UI data streams

You can also call data quality from external callers to perform data cleansing. There are preconfigured Data Cleansing business service methods—Get Siebel Fields and Parse. Using an external caller, such as scripting or a workflow process, you first call the Get Siebel Fields method, and then call the Parse method to cleanse contacts and accounts.

The following scenarios provide more information about calling data quality from external callers:

Scenario for Data Matching Using the Value Match Method

This topic gives one example of how you can call the Value Match business service method using Siebel Workflow. You can use the Value Match method differently, depending on your business model.

In this scenario, a company must add contacts into the Siebel application from another application in the enterprise. To avoid introducing duplicate contacts into the Siebel application, the implementation uses a workflow process that includes steps that call EAI adapters and a step that calls the Value Match method.

In this case, the implementation calls the Value Match method as a step in the workflow process that adds the contact. This step matches incoming contact information against the contacts within the Siebel database. To prevent the introduction of duplicate information into the Siebel application, the implementation adds processing logic to the script using the results returned in the Match Info property set. The company can either reject potential duplicates with a high score, or it can include additional steps to add likely duplicates as records in the DeDuplication Results Business Component, so that they immediately become visible in the appropriate Duplicate Record Resolution view.

For information about how to call and use the Value Match method, see Value Match Method.

Scenario for Data Cleansing Using Data Cleansing Business Service Methods

This topic gives one example of how you can call the Data Cleansing business service methods using Siebel Workflow. You might use the methods differently, depending on your business mode.

A system administrator or data steward in an enterprise wants to cleanse data before it enters the data through EAI or EIM interfaces. To do this, the system administrator or data steward uses a script or workflow that cleanses the data. The script or workflow calls the Get Siebel Fields method, which returns a list of cleansed fields for the applicable business component. Then the script or workflow calls the Parse method, which returns the data for the cleansed fields. For information about how to call and use the Get Siebel Fields and Parse methods, see Data Cleansing Business Service Methods.

Deduplication Business Service Methods

This topic describes the following Deduplication business service method: Value Match Method. Scenario for Data Matching Using the Value Match Method gives one example of how you can call the Deduplication business service Value Match method.

Note: For information about other deduplication business service methods that are available, see Siebel Tools Online Help.

Value Match Method

You can use the Value Match method of the Deduplication business service to find potential matching records in the Siebel application or when you want to prevent duplicate data from getting into the Siebel application through non-UI data streams. For more information about business services and methods, see Siebel Developer's Reference.

Arguments

The Value Match method consists of input and output arguments, some of which are property sets. The following table describes the input arguments, and the next table describes the output arguments.

Caution: The Value Match method arguments are specialized. Do not configure these components.

Name Type Property Name Description Comments

Adapter Settings

Property Set

Threshold

The threshold score for a duplicate record. A match is considered only if the score exceeds this value.

Optional. The value Override can be specified to override the corresponding setting information obtained by the service from the administration screens, vendor properties, and so on.

Match Values

Property Set

Business component field names, and value pairs:

<Name1><Value1>, <Name2><Value2>, <Name3><Value3>, ...

The matched business component's field name and the corresponding field value:

 (Last Name, 'Smith') 

 (First Name, 'John'),

 and so on ...
Note: Each pair must be a child property set of Match Values.

These name-value pairs are used as the matched value rather than the current row ID of the matched business component. The vendor field mappings for the matched business component are used to map the business component field names to vendor field names.

BC Name

Property

BC Name

The name of the matched business component.

Required.

Update Modification Date

Property

Update Modification Date

If set to N, the match modification date is not updated.

Optional. The default is Y.

Use Result Table

Property

Use Result Table

If set to N, matches are not added to the result table. Instead, matches are determined by the business service.

Optional. The default is Y.

Note: Adapter Settings and Match Values are child property sets of the input property set.
Return Value

For each match, a separate child property set called Match Info is returned in the output with properties specific to the match (such as Matchee Row ID and Score), as well as some general output parameters as shown in the following table.

Caution: The Value Match method arguments are specialized. Do not configure these components.

Name Type Property Name Description Comments

End Time

Property

End Time

The run end time.

None

Match Info

Note: Match Info is a child property set of the output property set.

Property Set

Matchee Row ID

The row ID of a matching record.

If you match against existing records, the record ROW_IDs are found and returned in the Match Info property set.

Score

The score of a matching record.

Num Results

Property

Num Results

The number of actual matches.

None

Start Time

Property

Start Time

The run start time.

None

Called From

Any means by which you can call business service methods, such as with Siebel e Script or from a workflow process.

Example

The following is an example of using Siebel eScript to call the Value Match method. This script calls the Value Match method to look for duplicates of John Smith from the Contact business component and then returns matches, if any. After the script finishes, determine what you want to do with the duplicate records, that is, either merge or remove them.

function Script_Open ()
{
TheApplication().TraceOff();
TheApplication().TraceOn("sdq.log", "Allocation", "All");
TheApplication().Trace("Start of Trace");

// Create the Input property set and a placeholder for the Output property set
var svcs;
var sInput, sOutput, sAdapter, sMatchValues;
var buscomp; 

svcs = TheApplication().GetService("DeDuplication");
sInput = TheApplication().NewPropertySet();
sOutput = TheApplication().NewPropertySet();
sAdapter = TheApplication().NewPropertySet();
sMatchValues = TheApplication().NewPropertySet();

// Set Generic Settings input property parameters
sInput.SetProperty("BC Name", "Contact");
sInput.SetProperty("Use Result Table", "N");
sInput.SetType("Generic Settings");

// Set Match Values child input property parameters
sMatchValues.SetProperty("Last Name", "Smith");
sMatchValues.SetProperty("First Name", "John");
sMatchValues.SetType("Match Values");
sInput.AddChild(sMatchValues);

// Set Adapter Settings child input property parameters
sAdapter.SetProperty("Search Level", "Narrow");
sAdapter.SetProperty("Population", "Default");
sAdapter.SetType("Adapter Settings");
sInput.AddChild(sAdapter);

// Invoke the "Value Match" business service
TheApplication().Trace("Property set created, ready to call Match method");
svcs.InvokeMethod("Value Match", sInput, sOutput);

// Get the Output property set and its values
TheApplication().Trace("Value Match method invoked");
var propName = "";
var propVal = "";
propName = sOutput.GetFirstProperty();
while (propName != "")
{
  propVal = sOutput.GetProperty(propName);
  TheApplication().Trace(propName);
  TheApplication().Trace(propVal);
  propName = sOutput.GetNextProperty()

}
TheApplication().Trace("End Of Trace");
TheApplication().TraceOff();

}        

Data Cleansing Business Service Methods

This topic describes the following data cleansing business service methods: Get Siebel Fields Method and Parse Method. Scenario for Data Cleansing Using Data Cleansing Business Service Methods gives one example of how you can call the data cleansing business service methods.

Get Siebel Fields Method

Get Siebel Fields is one of the methods of the Data Cleansing business service. This method returns a list of cleansed fields for a given business component. For more information about business services and methods, see Siebel Developer's Reference.

Arguments

Get Siebel Fields arguments are listed in the following table.

Argument Name Display Name Input or Output Data Type Description Required?

BusComp Name

Bus Comp Name

Input

String

The name of the business component.

No

Field Names

Field Names

Output

Hierarchy

The name of the hierarchy.

Yes

Return Value

Child values: Name of the properties are Field 1, Field 2, and so on and corresponding values are Field Name.

Usage

This method is used with the Parse method in the process of cleansing data in real time, and it is used with the Parse All function in the process of using a batch job to cleanse data.

Called From

Any means by which you can call business service methods, such as with Siebel Workflow or Siebel eScript.

Parse Method

Parse is one of the methods of the Data Cleansing business service. This method returns the cleansed field data. For more information about business services and methods, see Siebel Developer's Reference.

Arguments

Parse arguments are listed in the following table.

Argument Name Display Name Input or Output Data Type Description Required?

BusComp Name

Bus Comp Name

Input

String

The name of the business component.

No

Input Field Values

Input Field Values

Input

Hierarchy

A list of field values.

Yes

Output Field Values

Output Field Values

Output

Hierarchy

A list of field values.

Yes

Return Value

Child name values are Field Name and Field Date.

Usage

This method is used following the Get Siebel Fields method in the process of cleansing data in real time.

Called From

Any means by which you can call business service methods, such as with Siebel Workflow or Siebel eScript. For more information about Siebel Workflow, see Siebel Business Process Designer Administration Guide.

Troubleshooting Data Quality

If data cleansing or data matching is not working properly in real-time mode, check the following:

If you have configured new business components for data cleansing or data matching, also check the following:

  • Business component Class property. Verify that the business component Class property is CSSBCBase.

  • Vendor Properties. Verify that the vendor parameters and vendor field mappings have the correct values and that the values are formatted correctly. For example, there must be a space after a comma in vendor properties that have a compound value.

Tip: Check My Oracle Support regularly for updates to troubleshooting and other important information. For more information, see Information about Data Quality on My Oracle Support.