Siebel Data Quality Administration Guide > Optimizing Siebel Data Quality Performance >
Optimizing Data Matching Performance
The following are recommendations for achieving good performance with data matching when working with large volumes of data:
- Work with a database administrator to verify that the table space is large enough to hold the records generated during the data matching process.
During the batch data matching process, the information on potential duplicate records is stored in the S_DEDUP_RESULT table as a pair of row IDs of the duplicate records and the match scores between them. The number of records in the results table S_DEDUP_RESULT can include up to six times the number of records in the base tables combined. Remember that:
- If the base tables contain many duplicates, more records are inserted in the results table.
- If different search types are used, a different set of duplicate records may be found and will be inserted into the results table.
- If you use a low match threshold, the matching process generates more records to the results table.
- Remove obsolete result records manually from the S_DEDUP_RESULT table by running SQL statements directly on this table.
When a duplicate record is detected, the information about the duplicate is automatically placed in the S_DEDUP_RESULT table, whether or not the same information exists in that table. Running multiple batch data matching tasks therefore results in a large number of duplicate records in the table. Therefore, it is recommended that you manually remove the existing records in the S_DEDUP_RESULT table before running a new batch data matching task. You can remove the records using any utility that allows you to submit SQL statements.
NOTE: When truncating the S_DEDUP_RESULT table, all potential duplicate records found for all data matching business components are deleted.
For more information about running batch data matching, see Matching Data Using Batch Jobs.