Overview of Data Quality

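This chapter provides an overview of data quality functionality and products for Siebel CRM and Oracle Customer Hub. It includes the following topics:

  • Data Profiling

  • Data Parsing and Standardization

  • Data Matching and Data Cleansing

  • Data Quality Products for Data Matching and Data Cleansing

  • Oracle Data Quality Matching Server

  • Oracle Data Quality Address Validation Server

  • Universal Connector

  • How Data Quality Relates to Other Entities in Siebel CRM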

Data Profiling

Data profiling capabilities are typically delivered in an application designed to put control of data quality processes in the hands of business information owners, such as data analysts and data stewards. The solution also provides data analysis, reporting, and monitoring capabilities.

When data quality is measured, it can be effectively managed. Data profiling provides the metrics and reports that business information owners need to continuously measure, monitor, track, and improve data quality at multiple points across the organization. Data profiling also enables business information owners and IT (information technology) to work together to deploy lasting data quality programs. Business information owners use data profiling to build data quality rules and define data quality targets together with the IT team, which then manages deployment enterprise-wide.

You can use data profiling to:

  • Analyze and rank data according to completeness, conformity, consistency, duplication, integrity, and accuracy (you must use rules and reference data to analyze and rank data).

  • Identify, categorize, and quantify low-quality data
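The following Python sketch illustrates how such measurements might be computed. It is not part of any Oracle product: the function name profile, the sample field names, and the ZIP Code rule are assumptions made for this illustration. It scores a small record set for completeness, conformity, and duplication.

```python
import re
from collections import Counter

# Hypothetical conformity rule: a US ZIP or ZIP+4 code.
RULES = {"zip": re.compile(r"\d{5}(-\d{4})?")}

def _value(record, field):
    """Return the field as a stripped string; missing or None becomes ''."""
    return str(record.get(field) or "").strip()

def profile(records, fields, rules=RULES):
    """Compute simple completeness, conformity, and duplication metrics."""
    total = len(records)
    completeness, conformity = {}, {}
    for field in fields:
        filled = [v for v in (_value(r, field) for r in records) if v]
        # Completeness: share of records with a non-empty value.
        completeness[field] = len(filled) / total if total else 0.0
        # Conformity: share of non-empty values that satisfy the rule.
        if field in rules and filled:
            ok = sum(1 for v in filled if rules[field].fullmatch(v))
            conformity[field] = ok / len(filled)
    # Duplication: share of records whose normalized key occurs more than once.
    keys = Counter(tuple(_value(r, f).lower() for f in fields) for r in records)
    duplicated = sum(n for n in keys.values() if n > 1)
    return {
        "completeness": completeness,
        "conformity": conformity,
        "duplication": duplicated / total if total else 0.0,
    }

SAMPLE = [
    {"name": "Acme", "zip": "94401"},
    {"name": "acme ", "zip": "94401"},
    {"name": "Globex", "zip": "9440"},
    {"name": "Initech", "zip": None},
]
report = profile(SAMPLE, ["name", "zip"])
```

In this sketch, low-quality data is quantified per field, so a data steward could rank fields by score and set a target for each one.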

For more information about data profiling and Oracle data profiling offerings, see Oracle Fusion Middleware Upgrade Guide for Oracle Data Integrator 11g Release 1 on Oracle Technology Network (http://www.oracle.com/technetwork/indexes/documentation/index.html).

Data Parsing and Standardization

Data parsing and standardization typically provides data standardization capabilities, enabling data analysts and data stewards to standardize and validate their customer data. An interface is usually included which can be used to design, build, and manage data quality efforts.

The solution offers data parsing and standardization capabilities that can be used to:

  • Standardize, validate, enhance, and enrich your customer data

  • Standardize and validate mailing addresses for a wide range of countries

  • Parse and standardize freeform text data elements (you must use rules and reference data dictionaries to do so)

For more information about data parsing and standardization and Oracle offerings in this area, see Oracle Fusion Middleware Upgrade Guide for Oracle Data Integrator 11g Release 1 on Oracle Technology Network (http://www.oracle.com/technetwork/indexes/documentation/index.html).

Data Matching and Data Cleansing

The data stored in account, contact, and prospect records in Oracle’s Siebel CRM represents your existing and potential customers. Because of the importance of this data, maintaining its quality is essential. To ensure data quality, functionality is provided to clean this data and to remove duplicated data.

Data Cleansing

Data cleansing is used to correct data and make data consistent in new or modified customer records and typically consists of the following functions:

  • Automatic population of fields in addresses. If a user enters valid values for Zip Code, City, and Country, data quality automatically supplies a State field value. Likewise, if a user enters valid values for City, State, and Country, data quality automatically supplies a Zip Code value.

  • Address correction. Data quality stores street address, city, state, and postal code information in a uniform and consistent format, as mandated by U.S. postal requirements. For recognized U.S. addresses, address correction provides ZIP+4 data correction and stores the data in certified U.S. Postal Service format. For example, 100 South Main Street, San Mateo, CA 94401 becomes 100 S. Main St., San Mateo, CA 94401-3256.

  • Capitalization. Based on configuration, data quality converts fields for account, contact, prospect, and address to mixed case, all lowercase, or all uppercase.

  • Standardization. Data quality ensures account, contact, and prospect information is stored in a uniform and consistent format. For example, IBM Corporation becomes IBM Corp.

Data cleansing is supported for the Account, Business Address, Contact, and List Mgmt Prospective Contact business components. For each business component, particular fields are used in data cleansing and this set of fields is configurable.
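As an illustration only, the following Python sketch mimics the capitalization and standardization functions on a configurable set of fields. The dictionary SUFFIXES and the functions standardize_name, apply_case, and cleanse are hypothetical; they do not reflect the actual data quality engine or its reference data.

```python
# Hypothetical reference dictionary mapping long forms to standard abbreviations.
SUFFIXES = {"corporation": "Corp.", "incorporated": "Inc.", "limited": "Ltd."}

def standardize_name(name):
    """Replace known long-form words with their standard abbreviation."""
    return " ".join(
        SUFFIXES.get(word.lower().rstrip("."), word) for word in name.split()
    )

def apply_case(value, mode):
    """Convert a value to mixed case, all lowercase, or all uppercase."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "mixed":
        return " ".join(word.capitalize() for word in value.split())
    raise ValueError(f"unknown case mode: {mode}")

def cleanse(record, fields, mode=None):
    """Return a copy of record with only the configured fields cleansed."""
    cleansed = dict(record)
    for field in fields:
        value = cleansed.get(field)
        if isinstance(value, str):
            value = standardize_name(value)
            cleansed[field] = apply_case(value, mode) if mode else value
    return cleansed

result = cleanse({"Name": "IBM Corporation", "City": "san mateo"}, ["Name"])
```

Under these assumptions, the Name value becomes IBM Corp., as in the standardization example above, while the City field is left unchanged because it is not in the configured field list.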

Data Matching

Data matching is the identification of potential duplicates for account, contact, and prospect records. Potential duplicate records are displayed in the Siebel application so that you can manually merge them into a single record.

Data matching is supported for the Account, Contact, and List Mgmt Prospective Contact business components. For each business component, a set of fields is used for comparisons in the data matching process. The set of fields is configurable, and you can also specify other matching preferences such as the degree of matching required for records to be identified as potential duplicates.

Tip: The term deduplication is often used as a synonym for data matching, particularly in the names of system parameters.
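To make the idea of a configurable field set and a required degree of matching concrete, the following Python sketch scores record pairs and reports those at or above a threshold. It uses the standard library class difflib.SequenceMatcher as a stand-in similarity measure; the real matching engines use their own algorithms, and the field names, the Id field, and the threshold value of 90 are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def score(a, b, fields):
    """Average similarity (0 to 100) over the configured comparison fields."""
    ratios = [
        SequenceMatcher(
            None, str(a.get(f, "")).lower(), str(b.get(f, "")).lower()
        ).ratio()
        for f in fields
    ]
    return 100 * sum(ratios) / len(ratios)

def potential_duplicates(records, fields, threshold=90):
    """Return Id pairs whose score meets the required degree of matching."""
    return [
        (a["Id"], b["Id"])
        for a, b in combinations(records, 2)
        if score(a, b, fields) >= threshold
    ]

CONTACTS = [
    {"Id": "1", "Name": "John Smith", "City": "San Mateo"},
    {"Id": "2", "Name": "Jon Smith", "City": "San Mateo"},
    {"Id": "3", "Name": "Mary Jones", "City": "Boston"},
]
pairs = potential_duplicates(CONTACTS, ["Name", "City"])
```

In this sketch, raising the threshold reports fewer pairs and lowering it reports more. The pairs are only flagged as potential duplicates; merging remains a manual decision.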

In data quality you can enable and use both data cleansing and data matching at the same time, or you can use data cleansing and data matching on their own.

Data Quality Products for Data Matching and Data Cleansing

The data quality products available for performing data quality functions within Siebel CRM enterprise and Oracle Customer Hub are divided into two categories:

  • Data quality products that are embedded into Siebel CRM enterprise and Oracle Customer Hub

  • Data quality products that use an open connector to connect to third-party data quality vendors

Embedded Data Quality Products

The data quality products that are embedded into Siebel CRM and Oracle Customer Hub for data matching and cleansing are:

  • Oracle Data Quality Matching Server. Provides real-time and batch data matching functionality using licensed third-party Informatica Identity Resolution software. For more information, see Oracle Data Quality Matching Server.

  • Oracle Data Quality Address Validation Server. Provides address validation and standardization functionality using licensed third-party Informatica Identity Resolution software. For more information, see Oracle Data Quality Address Validation Server.

Open Connector to Third-Party Data Quality Vendors

The Universal Connector provides real-time and batch data matching functionality and data cleansing functionality, as long as the associated third-party software also supports data cleansing.

Note: In previous releases, Universal Connector was known as SDQ Universal Connector.

If using a third-party data quality vendor for data matching, then Siebel Data Quality is mandatory (since Siebel Data Quality has the underlying infrastructure for enabling data quality). Integration between the Siebel application and the third-party data quality vendor is not possible without Siebel Data Quality.

Siebel Data Quality is a user-based license that contains the underlying infrastructure and business services for enabling data quality. All Siebel CRM data quality users must license data quality at the user level using Siebel Data Quality.

Related Topic

Siebel Data Quality

Oracle Data Quality Matching Server

The Oracle Data Quality Matching Server provides real-time and batch data matching functionality using licensed third-party Informatica Identity Resolution (IIR) software.

The Oracle Data Quality Matching Server is an identity search application that searches your identity data, finds duplicates in it, and matches any duplicates found to other identity data. Running as an application server or suite of servers, Oracle Data Quality Matching Server does the following:

  • Reads identity data from your databases, using specified instructions and permissions.

  • Does not change your data but instead keeps a copy of it, thereby ensuring data consistency.

  • Builds the SSA_NAME3 fuzzy indexes, thereby enabling the correct identity data to be found.

  • Provides several simple search client procedures, including single search, batch search, and duplicate finder.

About Using the Oracle Data Quality Matching Server

You can use the Oracle Data Quality Matching Server to do the following:

  • Perform real-time search for people, companies, contacts, addresses, and households.

  • Discover duplicates and establish relationships in real time.

  • Build relationship link tables.

  • Match external files and databases.

The Oracle Data Quality Matching Server connector uses the Universal Connector in a mode where match candidate acquisition takes place within the Oracle Data Quality Matching Server, not within Siebel CRM. Because the match keys are generated and stored within the Oracle Data Quality Matching Server, key generation and key refresh operations are eliminated within Siebel CRM. This integration, in which match candidate acquisition takes place within the Oracle Data Quality Matching Server, cannot be used by other third-party data quality matching engines.
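The general idea of match keys and match candidate acquisition can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not the behavior of the Oracle Data Quality Matching Server: the match_key function is a toy key (it is not the SSA_NAME3 algorithm), and the KeyIndex class is a hypothetical stand-in for an index that is maintained outside Siebel CRM.

```python
from collections import defaultdict

def match_key(name):
    """Toy match key: first four alphanumeric characters, uppercased."""
    return "".join(ch for ch in name.upper() if ch.isalnum())[:4]

class KeyIndex:
    """Hypothetical index that generates and stores match keys itself."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, record_id, name):
        # The key is generated and stored here, so the caller never
        # needs to generate or refresh keys.
        self._index[match_key(name)].add(record_id)

    def candidates(self, name):
        """Return the Ids of stored records that share the same match key."""
        return sorted(self._index.get(match_key(name), ()))

index = KeyIndex()
index.add("1", "Acme Corp")
index.add("2", "ACME Corporation")
index.add("3", "Globex")
found = index.candidates("acme inc")
```

Only the records returned as candidates need to be scored in detail, which is why candidate acquisition is usually a separate step from scoring.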

For more information about Oracle Data Quality Matching Server installation and configuration, see Process of Installing the Oracle Data Quality Matching Server and Configuring Data Quality for Oracle Data Quality Matching Server.

Oracle Data Quality Address Validation Server

The Oracle Data Quality Address Validation Server is an address standardization application that provides capabilities to parse, standardize, transliterate, duplicate, and validate address data, resulting in improved address data quality. The validation capability requires the licensing of appropriate postal directories for the countries where address validation is required.

The Oracle Data Quality Address Validation Server uses a licensed version of the third-party software, IIR, for data cleansing.

Features of Oracle Data Quality Address Validation Server are:

  • Integrated single API supporting all countries:

    Oracle Data Quality Address Validation Server lets you use a single API for all countries, so that you can start working immediately and add countries without the need for additional programming. The API is compatible with all major programming languages.

  • Advanced validation, and correction of worldwide postal addresses, including address coverage for more than 240 countries:

    Oracle Data Quality Address Validation Server matches and corrects all address data, filters out superfluous information, assesses deliverability, and generates a detailed report with suggestions for possible sources of address problems.

  • Parsing and standardization:

    Oracle Data Quality Address Validation Server parses both structured and unstructured data, identifies residues, and formats and standardizes the data (without the need for payment of special data license fees).

  • Convenient updating:

    Postal reference tables in many countries change frequently. Oracle Data Quality Address Validation Server has arrangements with many local postal organizations (including Informatica Address Doctor) that allow you to receive monthly, quarterly, or biannual updates. Reference tables for each country are provided in a separate, operating system-independent database that is easy to update from a CD or DVD, or by downloading over the Internet.

The Universal Connector is integrated with the Oracle Data Quality Address Validation Server for data cleansing.

About Using the Oracle Data Quality Address Validation Server

You can use the Oracle Data Quality Address Validation Server to cleanse account, contact, and prospect data from the UI in your Siebel application, or by running a batch job in Siebel CRM. You can also cleanse the data in EAI mode by sending the address data in Simple Object Access Protocol (SOAP) format.

When you enter a new address using the contact or account screen in your Siebel application, all address data is validated, cleansed, and standardized before being committed to the Siebel database. If the address cannot be validated, then the address is standardized by converting it to upper, lower, or camel case (depending on Oracle Data Quality Address Validation Server configuration). In addition, the account name, contact name, and other attributes are standardized.

When new contacts, accounts, or addresses are entered into Siebel CRM through a batch job, address standardization is applied before committing any records to the Siebel database.

In all cases:

  • The Oracle Data Quality Address Validation Server evaluates and modifies the record according to configuration.

    Oracle Data Quality Address Validation Server returns an address validation flag and the validation status.

  • The Siebel database is then updated with the cleansed data, which has been formatted and standardized with address validation.

  • In the Siebel application, the updated cleansed record is displayed on the UI.
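The validate-or-fall-back behavior described in this topic can be sketched in Python as follows. This is a minimal illustration under assumptions: the validate callable, the stub_validator function with its one-entry lookup table, the status strings, and the use of str.title to approximate camel case are all hypothetical and are not the Oracle Data Quality Address Validation Server API.

```python
def cleanse_address(address, validate, case_mode="upper"):
    """Validate an address, falling back to case standardization.

    validate is a hypothetical callable that returns a corrected address
    dictionary, or None when the address cannot be validated.
    """
    corrected = validate(address)
    if corrected is not None:
        return {"address": corrected, "validated": True, "status": "VALIDATED"}
    # Fallback: standardize the case of every field according to configuration.
    convert = {"upper": str.upper, "lower": str.lower, "camel": str.title}[case_mode]
    fallback = {field: convert(value) for field, value in address.items()}
    return {"address": fallback, "validated": False, "status": "NOT_VALIDATED"}

def stub_validator(address):
    """Stand-in for a postal directory lookup (one known ZIP Code only)."""
    known = {"94401": "94401-3256"}
    zip4 = known.get(address.get("Postal Code"))
    if zip4 is None:
        return None
    return {**address, "Postal Code": zip4}

good = cleanse_address(
    {"Street": "100 S. Main St.", "Postal Code": "94401"}, stub_validator
)
bad = cleanse_address(
    {"Street": "1 unknown rd", "Postal Code": "00000"}, stub_validator
)
```

In both cases the caller receives a cleansed record together with a flag and a status, which it would then write to the database and display.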

For more information about Oracle Data Quality Address Validation Server installation and configuration, see Process of Installing the Oracle Data Quality Address Validation Server and Configuring Siebel CRM for the Oracle Data Quality Address Validation Server.

Universal Connector

Note: In previous releases, Universal Connector was known as SDQ Universal Connector.

The Universal Connector is a connector to third-party software that allows Siebel CRM to use the capabilities of a third-party application for data matching, data cleansing, or both data matching and data cleansing on account, contact, and prospect data within the Siebel application.

The Universal Connector supports data cleansing on account, contact, and prospect data in real-time and batch processing modes. The Universal Connector works across various languages and operating systems, though the support offered by particular third-party software for data matching or data cleansing might not cover all of the languages supported by Siebel CRM. For more information about supported platforms, see the Certifications tab on My Oracle Support.

Note: For information about the Certifications application, see article 1492194.1 (Article ID) on My Oracle Support.

To use the Universal Connector, you must obtain, license, and install third-party software in addition to obtaining Siebel Data Quality product licensing. The data matching and data cleansing capabilities of the Universal Connector are driven by the capabilities and configuration options of the third-party software.

Note: Certain third-party software from data quality vendors is certified by Oracle. For information about third-party solutions and about products that are certified for the Universal Connector, visit the Alliances section and the Partners section on the Oracle and Siebel Web site: http://www.oracle.com/siebel/index.html.

The Universal Connector can be used in two different modes:

  • The Oracle Data Quality Matching Server connector uses the Universal Connector in a mode where match candidate acquisition takes place within the Oracle Data Quality Matching Server. This mode applies only to the Oracle Data Quality Matching Server.

  • Third-party data quality vendors use the Universal Connector in a mode where match candidate acquisition takes place within Siebel CRM.

You can configure the Universal Connector to specify which fields are used for data cleansing and data matching and their mapping to external application field names.
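Conceptually, such a configuration is a per-business-component table that maps Siebel field names to external field names. The following Python sketch illustrates the idea only: the FIELD_MAP structure, the external names such as OrgName and GivenName, and the two helper functions are hypothetical, and the actual configuration is done in Siebel CRM rather than in code like this.

```python
# Hypothetical mapping from Siebel field names to external field names.
FIELD_MAP = {
    "Account": {"Name": "OrgName", "Location": "OrgSite"},
    "Contact": {"First Name": "GivenName", "Last Name": "FamilyName"},
}

def to_external(business_component, record, field_map=FIELD_MAP):
    """Keep only the mapped fields and rename them for the external application."""
    mapping = field_map[business_component]
    return {
        external: record[siebel]
        for siebel, external in mapping.items()
        if siebel in record
    }

def to_siebel(business_component, payload, field_map=FIELD_MAP):
    """Rename external fields in a returned payload back to Siebel field names."""
    reverse = {
        external: siebel
        for siebel, external in field_map[business_component].items()
    }
    return {reverse[name]: value for name, value in payload.items() if name in reverse}

contact = {"First Name": "Ann", "Last Name": "Lee", "Email": "ann@example.com"}
outbound = to_external("Contact", contact)
inbound = to_siebel("Contact", outbound)
```

Fields that are not mapped, such as Email in this sketch, are never sent to the external application.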

Note: The Oracle Data Quality License is valid only for use with Oracle Master Data Management and Oracle CRM deployments.

How Data Quality Relates to Other Entities in Siebel CRM

The data quality products integrate into the overall Siebel CRM environment as shown in the following image and as described in the following list:

  • In real-time mode, the Universal Connector is called by interactive object managers such as the Call Center object manager.

  • In batch mode, the Universal Connector is called by the preconfigured server component, Data Quality Manager (DQMgr), either from the Siebel application user interface, or by starting tasks with the Siebel Server Manager command-line interface, the srvrmgr program. For more information, see Siebel System Administration Guide on Siebel Bookshelf.

    Note: The Siebel Bookshelf is available on Oracle Technology Network (http://www.oracle.com/technetwork/indexes/documentation/index.html) and Oracle Software Delivery Cloud. It might also be installed locally on your intranet or on a network location.

  • The Universal Connector obtains account, contact, and prospect field data from the Siebel database using the Deduplication business service for data matching, and the Data Cleansing business service for data cleansing. Like other business services, these are reusable modules containing a set of methods. Using data quality functionality, business services simplify the task of moving data and converting data formats between the Siebel application and external applications. The business services can also be accessed by Siebel VB or Siebel eScript code or directly from a workflow process.

  • The fields used in data cleansing and data matching are sent to the appropriate cleansing or matching engine.

  • Data matching and data cleansing can also be enabled for the Enterprise Application Integration (EAI) adapter and Oracle’s Siebel Universal Customer Master (UCM) products.

For more information about business services and enabling data quality when using EAI, see Integration Platform Technologies: Siebel Enterprise Application Integration.

Data Quality Architecture: This image is described in the surrounding text.