8Optimizing Data Quality Performance

Optimizing Data Quality Performance

This chapter provides recommendations for optimizing Siebel CRM and Oracle Customer Hub performance. It includes the following topics:

Optimizing Data Cleansing Performance

The following are recommendations for achieving good performance with data cleansing when working with large volumes of data:

  • Include only new or recently modified records in the batch data cleansing process.

  • Cleansing all records in the Siebel database each time a data cleansing is performed can cause performance issues. Include an Object WHERE clause when you submit your batch job, as shown in the following table. Split the tasks into smaller tasks and run them concurrently.

To Cleanse Use This in Your Object WHERE Clause

Updated records

[Last Clnse Date] < [Updated]

New records

[Last Clnse Date] IS NULL

Updated and new records

[Last Clnse Date] < [Updated] OR [Last Clnse Date] IS NULL

To speed up the data cleansing task for large databases, run batch jobs to cleanse a smaller number of records at a time using an Object WHERE clause. For more information about data cleansing for large batches, see Cleansing Data Using Batch Jobs.

Optimizing Data Matching Performance

The following are recommendations for achieving good performance with data matching when working with large volumes of data:

  • Work with a database administrator to verify that the table space is large enough to hold the records generated during the data matching process.

    During the batch data matching process, the information on potential duplicate records is stored in the S_DEDUP_RESULT table as a pair of row IDs of the duplicate records and the match scores between them. The number of records in the results table S_DEDUP_RESULT can include up to six times the number of records in the base tables combined. Remember that:

    • If the base tables contain many duplicates, more records are inserted in the results table.

    • If different search types are used, a different set of duplicate records might be found and will be inserted into the results table.

    • If you use a low match threshold, the matching process generates more records to the results table.

  • Remove obsolete result records manually from the S_DEDUP_RESULT table by running SQL statements directly on this table.

    When a duplicate record is detected, the information about the duplicate is automatically placed in the S_DEDUP_RESULT table, whether or not the same information exists in that table. Running multiple batch data matching tasks therefore results in a large number of duplicate records in the table. Therefore, it is recommended that you manually remove the existing records in the S_DEDUP_RESULT table before running a new batch data matching task. You can remove the records using any utility that allows you to submit SQL statements.

    Note: When truncating the S_DEDUP_RESULT table, all potential duplicate records found for all data matching business components are deleted.
  • Match performance is slow when the SOH string is set as follows:

    S_POSTN_CON.CON_LAST_NAME = 'SOH'
    

    Setting the following values improves match performance:

    • Navigate to the Contact Business Component.

    • Add the following user property and value under the Last Name field:

      User Property Value

      Sort Search Optimization

      FALSE

For more information about running batch data matching, see Matching Data Using Batch Jobs.