Example of Batch Data Matching Using the Universal Connector


This topic provides an example of batch data matching using the Universal Connector in which Firstlogic is used as the matching engine.

If you are using Firstlogic software, you can run full data matching jobs or incremental data matching jobs, as described in the following topics.

Full Data Matching Jobs

In a full data matching job, the records for which you want to locate duplicates and the candidate records that can include those duplicates are defined by the same search specification. To specify a full data matching job, set the second value of the DqSetting parameter to Yes (see Table 14).

Full data matching jobs are useful when:

  • You want to perform data matching on a whole database table.
  • You are setting up the SDQ installation.
  • You perform data matching for the customer data for a particular business component for the first time.

A typical example of a command for a full data matching job is as follows:

run task for comp DQMgr with DqSetting="'','Yes','account_match.xml'", bcname=Account, bobjname=Account, opType=DeDuplication, objSortClause="Dedup Token", objwhereclause="[Name] LIKE 'A*'"

Jobs like this, which perform data matching for a subset of records, are still considered full data matching jobs because the records to be checked do not depend on earlier data matching.
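The structure of the command above can be sketched in Python as a small string builder. This is only an illustration of how the parameters fit together, not part of Siebel: the function name build_dq_command and its arguments are hypothetical, and the resulting string would still have to be entered in the Server Manager command line.

```python
def build_dq_command(dq_setting, bc_name, bobj_name, where_clause,
                     op_type="DeDuplication", sort_clause="Dedup Token"):
    """Assemble a 'run task' command string for the DQMgr component.

    dq_setting is a tuple of the three DqSetting sections
    (hypothetical helper; mirrors the sample command in this topic).
    """
    # Each DqSetting section is wrapped in single quotes; the sections
    # are joined with commas and the whole value sits in double quotes.
    setting = ",".join(f"'{part}'" for part in dq_setting)
    return (
        f'run task for comp DQMgr with DqSetting="{setting}", '
        f"bcname={bc_name}, bobjname={bobj_name}, opType={op_type}, "
        f'objSortClause="{sort_clause}", objwhereclause="{where_clause}"'
    )


# Full data matching job for accounts whose name starts with A.
full_job = build_dq_command(
    ("", "Yes", "account_match.xml"), "Account", "Account", "[Name] LIKE 'A*'"
)
print(full_job)
```

With these arguments the builder reproduces the sample full data matching command shown above.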

Incremental Data Matching Jobs

If you want to perform data matching for some number of nonexclusive subsets of the records in a business component, such as all the records that have been created or updated since you last ran data matching, use a WHERE clause that includes an appropriate timestamp. Also adjust the DqSetting parameter of the command as shown in Table 14.

Table 14. DqSetting Parameter Details and Sample Values for Firstlogic

DqSetting Parameter Sequence: First section
Valid Values: Leave blank
Comments: This section is not used by Firstlogic. Specify as two adjacent single quotes.

DqSetting Parameter Sequence: Second section (Enforce Search Spec on Candidate Records)
Valid Values:

  • Yes
  • No (default)

Comments: Specifies whether or not the same search specification is used for both the records whose duplicates are of interest and the candidate records that can include those duplicates.

  • Use Yes for full data matching batch jobs.
  • Use No for incremental data matching batch jobs.

DqSetting Parameter Sequence: Third section
Valid Values:

  • account_match.xml
  • account_incremental_match.xml
  • contact_match.xml
  • contact_incremental_match.xml
  • prospect_match.xml
  • prospect_incremental_match.xml

Comments: Specifies the name of the data flow associated with the current batch job. Choose the name to specify based on the business component for which you are doing data matching. If your batch job will examine only records created or changed since the last batch job, use a data flow name that includes "incremental"; otherwise, use the shorter data flow name.
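The choices in Table 14 can be summarized in a short Python sketch that derives the DqSetting value from the kind of job. The function name dq_setting_value and the lowercase keys account, contact, and prospect are assumptions made for this illustration; only the six data flow file names come from the table.

```python
# Data flow prefixes taken from the file names listed in Table 14.
KNOWN_PREFIXES = ("account", "contact", "prospect")


def dq_setting_value(prefix, incremental):
    """Return the DqSetting value for a Firstlogic batch job (hypothetical helper)."""
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"no data flow listed in Table 14 for {prefix!r}")
    # Second section: Yes for full jobs, No for incremental jobs.
    enforce = "No" if incremental else "Yes"
    # Third section: the incremental data flows carry 'incremental' in the name.
    data_flow = f"{prefix}_incremental_match.xml" if incremental else f"{prefix}_match.xml"
    # First section is not used by Firstlogic: two adjacent single quotes.
    return f"'','{enforce}','{data_flow}'"


print(dq_setting_value("account", incremental=False))
print(dq_setting_value("contact", incremental=True))
```

Keeping the second and third sections in step this way avoids pairing a full job with an incremental data flow name, or the reverse.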

A job that checks only the records created or updated since the last run is considered an incremental data matching job, because data matching for the other records was done earlier and does not need to be redone at this time. In an incremental data matching batch job, the records for which you want to locate duplicates are defined by the search specification, but the candidate records that can include those duplicates can be drawn from the whole applicable database table. Incremental data matching batch jobs are useful if you run them regularly, such as once a week. A typical example of a command for an incremental data matching job is as follows:

run task for comp DQMgr with DqSetting="'','No','account_incremental_match.xml'", bcname=Account, bobjname=Account, opType=DeDuplication, objSortClause="Dedup Token", objwhereclause="[Updated] > '08/18/2005 20:00:00'"
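For a job that runs on a schedule, the timestamp in the WHERE clause has to be regenerated each time. The following Python sketch shows one way to do that. It assumes that the MM/DD/YYYY HH:MM:SS format of the sample command is the format your installation expects, and the function name updated_since_clause is hypothetical.

```python
from datetime import datetime


def updated_since_clause(last_run):
    """Build the objwhereclause value for records updated after last_run."""
    # Assumption: same timestamp layout as the sample command in this topic.
    stamp = last_run.strftime("%m/%d/%Y %H:%M:%S")
    return f"[Updated] > '{stamp}'"


# Time at which the previous data matching job was run.
last_run = datetime(2005, 8, 18, 20, 0, 0)
where_clause = updated_since_clause(last_run)
print(where_clause)
```

The time of the previous run has to be recorded somewhere outside this sketch, for example by the script that submits the job.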

NOTE:  If you do not specify the DqSetting parameter, or if you leave the second value of the DqSetting parameter blank, the job runs as an incremental data matching job.
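The defaulting rule in the note can be expressed as a small check that is useful when reviewing job scripts. This Python sketch is an illustration only: the function name job_type is hypothetical, the comma split is naive, and it assumes the value Yes is written exactly as shown in Table 14.

```python
def job_type(dq_setting=None):
    """Classify a DqSetting value as a 'full' or 'incremental' job."""
    if not dq_setting:
        # Parameter not specified: the job defaults to incremental.
        return "incremental"
    # Split the three sections and drop the surrounding single quotes.
    parts = [part.strip().strip("'") for part in dq_setting.split(",")]
    second = parts[1] if len(parts) > 1 else ""
    # Only an explicit Yes makes the job a full data matching job;
    # No or a blank second value means incremental.
    return "full" if second == "Yes" else "incremental"


print(job_type("'','Yes','account_match.xml'"))
print(job_type("'','','account_incremental_match.xml'"))
print(job_type())
```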

Siebel Data Quality Administration Guide Copyright © 2006, Oracle. All rights reserved.