Bookshelf v7.8: Data Cleansing and Data Matching

Data Cleansing and Data Matching

The data stored in account, contact, prospect, and business address records in Oracle's Siebel Business Applications represents your existing and potential customers. Because of the importance of this data, maintaining its quality is essential. To ensure data quality, Siebel Data Quality (SDQ) provides functionality to cleanse this data and to identify duplicate records so that they can be merged.

In SDQ, data cleansing corrects data and makes it consistent in new or modified customer records. It typically consists of the following functions:

Automatic population of fields in addresses. If a user enters valid values for Zip Code, City, and Country, SDQ automatically supplies a State field value. Likewise, if a user enters valid values for City, State, and Country, SDQ automatically supplies a Zip Code value.
Address correction. SDQ stores street address, city, state, and postal code information in a uniform and consistent format, as mandated by U.S. postal requirements. For recognized U.S. addresses, address correction provides ZIP+4 data correction and stores the data in certified U.S. Postal Service format. For example, 100 South Main Street, San Mateo, CA 94401 becomes 100 S. Main St., San Mateo, CA 94401-3256.
Capitalization. SDQ converts account, contact, and prospect names to mixed case (initial capitals). Address fields can be converted to mixed case, all lowercase, or all uppercase.
Standardization. SDQ ensures account, contact, and prospect information is stored in a uniform and consistent format. For example, IBM Corporation becomes IBM Corp.
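
The cleansing functions listed above can be illustrated with a minimal Python sketch. It is not SDQ's implementation: the lookup tables, field names, and function names are hypothetical, and the tables hold only the entries needed to reproduce the examples in this topic. A real cleansing engine relies on much larger reference data.

```python
# Illustrative only: these tables and function names are hypothetical,
# not part of SDQ or any Siebel API.
STATE_BY_ZIP_CITY_COUNTRY = {("94401", "San Mateo", "USA"): "CA"}
STREET_ABBREVIATIONS = {"south": "S.", "street": "St."}
NAME_ABBREVIATIONS = {"corporation": "Corp."}


def fill_state(address):
    """Return a copy of the address with State populated when the lookup succeeds."""
    key = (address.get("Zip Code"), address.get("City"), address.get("Country"))
    filled = dict(address)
    if not filled.get("State") and key in STATE_BY_ZIP_CITY_COUNTRY:
        filled["State"] = STATE_BY_ZIP_CITY_COUNTRY[key]
    return filled


def abbreviate(text, table):
    """Replace whole words found in the table (case-insensitive) with their standard form."""
    return " ".join(table.get(word.lower(), word) for word in text.split())


def convert_case(text, mode):
    """Convert text to 'mixed' (initial capitals), 'lower', or 'upper' case."""
    if mode == "mixed":
        return " ".join(word.capitalize() for word in text.split())
    if mode == "lower":
        return text.lower()
    if mode == "upper":
        return text.upper()
    raise ValueError(f"unknown case mode: {mode}")
```

For example, abbreviate("100 South Main Street", STREET_ABBREVIATIONS) produces the street line shown in the address correction example, and abbreviate("IBM Corporation", NAME_ABBREVIATIONS) produces the standardized account name. The sketch does not attempt ZIP+4 correction, which requires postal reference data. Note also that this naive mixed-case conversion would turn an acronym such as IBM into Ibm, so a practical implementation needs an exception list.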

Data cleansing is supported for the Account, Business Address, Contact, and List Mgmt Prospective Contact business components. For each business component, a particular set of fields is used in data cleansing, and this set is configurable.

Data matching is the identification of potential duplicates for account, contact, and prospect records. Potential duplicate records are displayed in the Siebel application, where you can manually merge them into a single record.

Data matching is supported for the Account, Contact, and List Mgmt Prospective Contact business components. For each business component, a set of fields is used for comparisons in the data matching process. The set of fields is configurable, and you can also specify other matching preferences such as the degree of matching required for records to be identified as potential duplicates.
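
The idea of comparing a configurable set of fields against a required degree of matching can be sketched as follows. This is a simplified illustration, not the matching algorithm used by SDQ: the function names and the 0 to 100 score scale are assumptions, and the string similarity comes from Python's standard difflib module.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity of two strings, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(record_a, record_b, fields):
    """Average similarity over the configured fields, scaled to 0-100."""
    scores = [similarity(record_a.get(f, ""), record_b.get(f, "")) for f in fields]
    return 100 * sum(scores) / len(scores)


def find_potential_duplicates(new_record, existing_records, fields, threshold):
    """Return existing records whose score meets or exceeds the threshold."""
    return [
        record
        for record in existing_records
        if match_score(new_record, record, fields) >= threshold
    ]


# Hypothetical sample data for the sketch.
existing = [
    {"Name": "IBM Corp", "City": "San Mateo"},
    {"Name": "Acme Inc.", "City": "Boston"},
]
new = {"Name": "IBM Corp.", "City": "San Mateo"}
candidates = find_potential_duplicates(new, existing, ["Name", "City"], 90)
```

Here only the first existing record is returned as a candidate. Raising the threshold to 100 would return no candidates, because the two names differ by a trailing period. In this sketch the candidates are only reported; as described above, merging remains a manual decision.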

TIP: The term deduplication is often used as a synonym for data matching, particularly in the names of user properties and other system parameters.

In SDQ, you can enable and use data cleansing and data matching together, or you can use either one on its own.
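
The independence of the two features can be pictured as two optional steps applied to a record. The sketch below is hypothetical: process_record and its parameters are not SDQ settings, and running cleansing before matching is an assumption made for the illustration.

```python
def process_record(record, existing_records, cleanse=None, match=None):
    """Apply whichever of the two optional steps is enabled.

    cleanse: function(record) -> cleansed record, or None when disabled.
    match: function(record, existing_records) -> list of potential
           duplicates, or None when disabled.
    """
    if cleanse is not None:
        record = cleanse(record)
    duplicates = match(record, existing_records) if match is not None else []
    return record, duplicates


# Hypothetical stand-ins for the two features.
def uppercase_name(record):
    return {**record, "Name": record["Name"].upper()}


def exact_name_match(record, existing_records):
    return [e for e in existing_records if e["Name"] == record["Name"]]


existing = [{"Name": "ACME"}]
both = process_record({"Name": "acme"}, existing, uppercase_name, exact_name_match)
match_only = process_record({"Name": "acme"}, existing, match=exact_name_match)
cleanse_only = process_record({"Name": "acme"}, existing, cleanse=uppercase_name)
```

With both steps enabled, the cleansed name matches the existing record. With matching alone, the uncleansed name does not match, and with cleansing alone no duplicates are sought.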

Siebel Data Quality Administration Guide		Copyright © 2006, Oracle. All rights reserved.