Siebel Data Quality Administration Guide > Data Quality Performance Considerations >

Optimizing SDQ Matching Server Performance


The three key tables S_PER_DEDUP_KEY, S_PRSP_DEDUPKEY, and S_ORG_DEDUP_KEY may include six times more records than their corresponding base tables, depending on the key type used in the key generation stage. For limited key type, they may include two to four times more records. For standard key type, they may include at least six times more records. Work with a database administrator and follow recommendations for the Siebel Data Quality Matching Server to optimize the parameters for your database:

  • You can execute concurrent Data Quality Manager server tasks to deduplicate the data. Query the base tables to find the search specifications such that each query result set contains the preferred number of records, between 50,000 and 75,000 per server task. Start concurrent server tasks using these search specifications and continue the deduplication operation until the entire table is completely processed.
  • When you want to run key generation, you can remove all keys in the key tables first.
  • When you want to run deduplication, you can remove all records in the result table first.
  • You can allocate space for the SIEBEL_4K table space following the sizing recommendations in Table 21.
    Table 21.  Table Size Recommendations
    Table
    Sizing Recommendation

    S_PER_DEDUP_KEY S_ORG_DEDUP_KEY S_PRSP_DEDUPKEY

    These tables may include between two and six times more records than their corresponding base tables, depending on the key type used during the key generation stage, as follows:

    • Limited key type. May include between two and four times more records than the corresponding base table.
    • Standard key type. May include up to an estimated six times more records than the corresponding base table.

    S_DEDUP_RESULT

    This table may include between five and six times the number of records in the three base tables combined. Use the following guidelines to help determine table size:

    • If a Typical or Exhaustive search type is used, more records are inserted into the results table.
    • If a low match threshold is used, such as a threshold in the lower 100 range, the matching process generates a larger number of records that are inserted into the results table.
  • For the DB2 platform, you can use the following REORG utility commands on the DEDUP_KEY column after key generation on the key tables. The DEDUP_KEY column is based on the M1 clustered index.
    • reorgchk current statistics on table siebel.S_PER_DEDUP_KEY

      This command checks F4 for the M1 index to see whether a reorganization is needed.

    • reorg table siebel.S_PER_DEDUP_KEY

      This command reorganizes the table and usually takes about 30 minutes to run on 1 to 2 million records.

    • reorgchk update statistics on table siebel.S_PER_DEDUP_KEY

      This command updates the statistics.

  • For the DB2 platform, if your performance seems degraded, you can run the following command on all tables associated with SDQ. (This includes tables such as: S_PER_DEDUP_KEY, S_ORG_DEDUP_KEY, S_ORG_EXT, S_PRSP_CONTACT, S_CONTACT, S_PRSP_CONTACT, S_PARTY, S_PARTY_PER, and S_DEDUP_RESULT).

    runstats on table siebel.S_PER_DEDUP_KEY

    If the above command returns an error message, use this one instead:

    runstats on table Siebel.S_CONTACT with distribution indexes all

  • After your initial deduplication or key generation, you can include only new and updated records in deduplication and key generation processes.

    TIP:   If you have more than 100,000 records in your base tables, it is time consuming to reprocess all of them.

  • When performing deduplication and key generation, you can exclude records that are up to date. You use the DeDup Key Modification Date and DeDup Last Match Date business component fields in your search specifications to exclude records. For example:
    • You can add one of the following to the search specification (Object WHERE Clause) for key generation:
      • For updated records: ([DeDup Key Modification Date] < [Updated])
      • For new records: ([DeDup Key Modification Date] IS NULL)
      • For updated and new records (same as key refresh):

    ([DeDup Key Modification Date] < [Updated]) OR ([DeDup Key Modification Date] IS NULL)

    • You can add one of the following to the search specification (Object WHERE Clause) for deduplication:
      • For updated records: ([DeDup Last Match Date] < [Updated])dd
      • For new records: ([DeDup Last Match Date] IS NULL))
      • For updated and new records:

    (([DeDup Last Match Date] < [Updated]) OR ([DeDup Last Match Date] IS NULL))

  • You can set the Data Quality Settings values as shown in Table 22.
    Table 22.  Performance Improvement Suggestions
    Improvement for...
    Comments

    Data Quality Settings

    From the application-level menu, choose Navigate > Site Map > Administration - Data Quality > Data Quality Settings to set the following parameters:

    • Key Type (key generation). Set to Limited.
    • Search Type (deduplication). Set to Narrow.
    • Match Threshold (deduplication). Set to a number greater than or equal to 75. The higher the threshold, the faster the deduplication process runs.

    Object sort clause

    Set this parameter based on the key generation parameters for deduplication. For example, use:

    • Person (contact or prospect). Use Last Name, First Name, Middle Name
    • Company (account). Use Name or Name, Location
  • You can set the object sort clause based on the key generation parameters (deduplication) described in Table 22.
  • You can execute concurrent Data Quality Manager server tasks to deduplicate data.

    For more information, see the following procedure.

To execute concurrent Data Quality Manager server tasks to deduplicate data

  1. Query the base tables to find the search specifications so that each query result set contains between 50,000 and 75,000 records.

    NOTE:  For users who have over 100,000 records in their base tables, it can be time consuming to reprocess all the records. For that reason, the recommended record set for subsequent matching and key generation jobs consists only of outdated and new records. Outdated records are records that are modified after they are tagged. In SQL terms, outdated records are those in which [Updated] > (DQ tag).

    For example, the following table provides search specification solutions using the Object WHERE Clause to run key generation or deduplication.

    To
    Query
    for...
    Key Generation Example
    Deduplication Example

    Updated
    records

    ([DeDup Key Modification Date]<[Updated])

    ([DeDup Last Match Date]<[Updated])

    New
    records

    ([DeDup Key Modification Date] IS NULL)

    ([DeDup Last Match Date] IS NULL)

    Updated
    and new
    records

    ([DeDup Key Modification Date]<[Updated])
    OR
    ([DeDup Key Modification Date] IS NULL)

    (([DeDup Last Match Date]<[Updated])
    OR
    ([DeDup Last Match Date] IS NULL))

  2. Start concurrent server tasks using the search specifications provided in Step 1 until the entire table is completely processed.
Siebel Data Quality Administration Guide