Siebel Data Quality Administration Guide > Data Quality Performance Considerations >

Optimizing Data Matching Performance


The following recommendations for data matching should help you achieve good performance when working with large volumes of data:

  • Work with your database administrator to verify that the SIEBEL_4K table space is large enough to hold the records generated during the deduplication process.

    During batch deduplication, each detected match is stored in the S_DEDUP_RESULT table as a pair of row IDs identifying the duplicate records, together with the match score between them. The S_DEDUP_RESULT table can hold up to six times as many records as your account and contact tables combined. Consider the following:

    • If the base tables include many duplicates, more records are inserted into the results table.
    • If different search types are used, a different number of duplicate records may be found and inserted into the results table.
    • If you use a low match threshold (a value toward the lower end of the range up to 100), the matching process inserts more records into the results table.
  • You can remove obsolete matching results records manually.

    When a duplicate record is detected, it is automatically inserted into the S_DEDUP_RESULT table, whether or not the same duplicate record already exists in that table. Running multiple batch deduplication tasks therefore results in a large number of duplicate records in this table. It is recommended that you manually remove the existing records from the S_DEDUP_RESULT table before running a new batch deduplication task. You can remove the records using any utility that allows you to submit SQL statements. For more information about running batch deduplication, see Working with Data Cleansing and Data Matching in Real-Time and Batch Modes.

    NOTE:  Removing the records from the S_DEDUP_RESULT table does not cause a loss of data, because the table is repopulated when a new batch deduplication task is run.
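
The manual cleanup recommended above can be sketched as a short script. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the Siebel database, and the column names (ROW_ID_A, ROW_ID_B, MATCH_SCORE) and the helper clear_dedup_results are hypothetical, not the actual Siebel schema or a Siebel utility. In practice you would submit the equivalent DELETE statement through whatever SQL utility your database administrator uses.

```python
import sqlite3


def clear_dedup_results(conn):
    """Remove all rows from S_DEDUP_RESULT and return how many were removed.

    Hypothetical helper: run something like this before starting a new
    batch deduplication task so that old match results do not accumulate.
    """
    (before,) = conn.execute("SELECT COUNT(*) FROM S_DEDUP_RESULT").fetchone()
    conn.execute("DELETE FROM S_DEDUP_RESULT")
    conn.commit()
    return before


# Stand-in database; the column names below are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE S_DEDUP_RESULT (ROW_ID_A TEXT, ROW_ID_B TEXT, MATCH_SCORE INTEGER)"
)
conn.executemany(
    "INSERT INTO S_DEDUP_RESULT VALUES (?, ?, ?)",
    [("1-A1", "1-B7", 92), ("1-A1", "1-C3", 85), ("1-D4", "1-E9", 78)],
)
conn.commit()

removed = clear_dedup_results(conn)
(remaining,) = conn.execute("SELECT COUNT(*) FROM S_DEDUP_RESULT").fetchone()
print(f"removed {removed} rows, {remaining} remaining")
```

Counting the rows before deleting gives you a figure to log or compare against the next run, which can help you judge how quickly the results table grows between cleanups.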

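The sizing guidance for the S_DEDUP_RESULT table can be turned into a rough upper-bound estimate to discuss with your database administrator. The sketch below simply applies the "up to six times" figure stated in this topic; the function name and the sample row counts are hypothetical, and the actual number of result rows depends on how many duplicates exist, the search types used, and the match threshold.

```python
def max_dedup_result_rows(account_rows, contact_rows, factor=6):
    """Upper-bound estimate of rows in S_DEDUP_RESULT.

    Applies the guideline that the results table can hold up to `factor`
    times the combined number of account and contact records.
    """
    if account_rows < 0 or contact_rows < 0:
        raise ValueError("row counts must be non-negative")
    return factor * (account_rows + contact_rows)


# Hypothetical volumes: 2 million accounts and 5 million contacts.
estimate = max_dedup_result_rows(2_000_000, 5_000_000)
print(estimate)  # 42000000
```

Multiplying this row count by the average row size in your database gives a starting figure for checking the SIEBEL_4K table space.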