Bookshelf v8.0: Matching Data Using Batch Jobs

Matching Data Using Batch Jobs

Depending on your business requirements, you might want to use batch jobs to perform data matching on some or all of the records in the supported business components. If you must run a data matching batch job on all the records in a business component, the work can often be completed more quickly by splitting the work into a number of smaller batch jobs (not more than 50,000 to 75,000 records at a time). When data matching has been performed on all of the records in the business component, you can run future data matching batch jobs on just the new or changed records.

If you want to perform data matching on several mutually exclusive subsets of the records in a business component, such as all the records where the value of a given field starts with a particular letter, run a separate job for each subset and specify the subset with a WHERE clause, as follows:

objwhereclause="[field_name] LIKE 'A*'"
objwhereclause="[field_name] LIKE 'B*'"
...
objwhereclause="[field_name] LIKE 'Z*'"
objwhereclause="[field_name] LIKE 'a*'"
...
objwhereclause="[field_name] LIKE 'z*'"
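The list of clauses above can be generated rather than typed by hand. The following Python sketch only builds the clause strings; it does not connect to Server Manager or run any job. The function name where_clauses and the sample field name Name are hypothetical and used for illustration only.

```python
import string

# Template mirroring the objwhereclause lines shown above.
CLAUSE_TEMPLATE = 'objwhereclause="[{field}] LIKE \'{letter}*\'"'


def where_clauses(field_name):
    """Return one objwhereclause per leading letter: A-Z, then a-z."""
    letters = string.ascii_uppercase + string.ascii_lowercase
    return [CLAUSE_TEMPLATE.format(field=field_name, letter=letter) for letter in letters]


if __name__ == "__main__":
    # Print the 52 clauses so that each can be pasted into its own batch job.
    for clause in where_clauses("Name"):
        print(clause)
```

Field values that begin with a digit or punctuation mark match none of these 52 clauses, so records of that kind would need a job of their own.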

The following example further describes batch data matching.

Example of Batch Data Matching Using the Universal Connector

You must run batch mode key generation on all existing records before you run real-time data matching, because the Universal Connector requires generated keys to be present in the key tables. Key generation is done within the deduplication task, which is the reason for running deduplication on all existing records first. For more information about batch data cleansing and matching, see Batch Data Matching and Data Cleansing.

The following procedure describes how to start a data matching batch job.

To perform batch mode data matching

1. Follow the instructions in Generating or Refreshing Keys Using Batch Jobs.

2. At the srvrmgr prompt, enter commands like those in the following table to perform data matching.


Business Component	Example of Server Manager Command
Account	run task for comp DQMgr with DqSetting="'Delete'", bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"
Contact	run task for comp DQMgr with DqSetting="'Delete'", bcname=Contact, bobjname=Contact, opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"
List Mgmt Prospective Contact	run task for comp DQMgr with DqSetting="'Delete'", bcname="List Mgmt Prospective Contact", bobjname="List Mgmt", opType=DeDuplication, objwhereclause="[Name] like 'search_string*'"

Full Data Matching Jobs

In a full data matching job, the records for which you want to locate duplicates and the candidate records that can include those duplicates are defined by the same search specification. You specify a full data matching job by setting the second section of the DqSetting parameter to Yes (see Table 19).

Full data matching jobs are useful when:

You want to perform data matching on a whole database table.
You are setting up the SDQ installation.
You perform data matching for the customer data for a particular business component for the first time.

A typical example of a command for a full data matching job is as follows:

run task for comp DQMgr with DqSetting="'','Yes','account_match.xml'", bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Name] LIKE 'A*'"

Jobs like this that perform data matching for a subset of records are still considered to be full data matching jobs because the data to be checked does not depend on earlier data matching.

Incremental Data Matching Jobs

If you want to perform data matching for some number of nonexclusive subsets of the records in a business component, such as all the records that have been created or updated since you last ran data matching, use a WHERE clause that includes an appropriate timestamp, and also adjust the DqSetting clause of the command as shown in Table 19.

Table 19. DqSetting Parameter Details and Sample Values
DqSetting Parameter Sequence	Valid Values	Comments
First section	Leave blank	Specify as two adjacent quotation marks.
Second section (Enforce Search Spec on Candidate Records)	Yes, No (default)	Specifies whether or not the same search specification is used for both the records whose duplicates are of interest and the candidate records that can include those duplicates. Use Yes for full data matching batch jobs. Use No for incremental data matching batch jobs.
Third section	Leave blank	None.

This kind of job is considered an incremental data matching job, because data matching was done earlier and does not need to be redone at this time. In an incremental data matching batch job, the records for which you want to locate duplicates are defined by the search specification, but the candidate records that can include those duplicates can be drawn from the whole applicable database table. Incremental data matching batch jobs are useful if you run them regularly, such as once a week. A typical example of a command for an incremental data matching job is as follows:

run task for comp DQMgr with DqSetting="'','No',''", bcname=Account, bobjname=Account, opType=DeDuplication, objwhereclause="[Updated] > '08/18/2005 20:00:00'"

NOTE: If you do not specify the DqSetting parameter, or if you leave the second section of the DqSetting parameter blank, the job runs as an incremental data matching job.
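If incremental jobs run on a schedule, the timestamp in the WHERE clause has to change for every run. The following Python sketch builds the command text from the time of the previous run; it does not execute anything. The function name incremental_command is hypothetical, and the MM/DD/YYYY HH:MM:SS format is an assumption taken from the sample command above, so confirm the format that your deployment expects.

```python
from datetime import datetime


def incremental_command(bcname, bobjname, last_run):
    """Build an incremental DeDuplication command for records updated after last_run."""
    # Assumed timestamp format, copied from the sample command in this topic.
    stamp = last_run.strftime("%m/%d/%Y %H:%M:%S")
    # Second DqSetting section is No: candidates are not limited by the search spec.
    dq_setting = "'','No',''"
    return (
        f'run task for comp DQMgr with DqSetting="{dq_setting}", '
        f"bcname={bcname}, bobjname={bobjname}, opType=DeDuplication, "
        f"objwhereclause=\"[Updated] > '{stamp}'\""
    )


if __name__ == "__main__":
    # Reproduces the sample command for a last run at 8 PM on August 18, 2005.
    print(incremental_command("Account", "Account", datetime(2005, 8, 18, 20, 0, 0)))
```

This helper does not add quotation marks around names that contain spaces, such as "List Mgmt Prospective Contact"; pass such names already quoted.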

Siebel Data Quality Administration Guide		Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Legal Notices.