Oracle® Enterprise Data Quality

Customer Data Services Pack Release Notes

Release 11g R1 (11.1.1.7)

E49691-02

May 2014

This document contains the release information for Oracle Enterprise Data Quality (EDQ) Customer Data Services Pack Release 11g R1 (11.1.1.7).

Oracle recommends that you review its contents before installing or working with the EDQ-CDS product.

1 New Features and Improvements

The following sections describe the new features and improvements introduced in each release:

Section 1.1, "Release 11g R1 (11.1.1.7.4)"
Section 1.2, "Release 11g R1 (11.1.1.7.3)"

1.1 Release 11g R1 (11.1.1.7.4)

This section addresses Release 11g R1 (11.1.1.7.4).

1.1.1 Bug 17861632: Improve Matching of Company Names with Initials and Acronyms

EDQ-CDS does not find a match for some entities where the entity name is an acronym, standardization or abbreviation. The following new entity match rule groups have been created:

[E230] Standardized full name acronym exact

[E240] Full name without suffixes acronym exact

[E250] Full name without suffixes acronym contains

See Oracle Enterprise Data Quality Customer Data Services Pack Matching Guide for more details about these new rules.
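
The general idea behind acronym-based matching can be illustrated with a minimal sketch. It assumes that an acronym is formed from the first letter of each name token after trailing suffixes are removed. The suffix list, the function names, and the interpretation of "exact" and "contains" are hypothetical and are not taken from the EDQ-CDS rule definitions; the Matching Guide remains the authority on how [E230], [E240], and [E250] behave.

```python
# Hypothetical suffix list for illustration only; not the EDQ-CDS reference data.
SUFFIXES = {"INC", "LTD", "LLC", "CORP", "PLC"}


def tokens_without_suffixes(name: str) -> list:
    """Upper-case the name, drop periods, split on whitespace, remove trailing suffixes."""
    tokens = name.upper().replace(".", "").split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return tokens


def acronym(name: str) -> str:
    """Join the first letter of each token that remains after suffix removal."""
    return "".join(token[0] for token in tokens_without_suffixes(name))


def _joined(name: str) -> str:
    """Concatenate the remaining tokens so that 'I B M' and 'IBM' compare equal."""
    return "".join(tokens_without_suffixes(name))


def acronym_exact(name_a: str, name_b: str) -> bool:
    """True when one name equals the acronym of the other (assumed meaning of 'exact')."""
    a, b = _joined(name_a), _joined(name_b)
    if not a or not b:
        return False
    return a == acronym(name_b) or b == acronym(name_a)


def acronym_contains(name_a: str, name_b: str) -> bool:
    """True when one name contains the other's acronym (assumed meaning of 'contains')."""
    a, b = _joined(name_a), _joined(name_b)
    acr_a, acr_b = acronym(name_a), acronym(name_b)
    # Ignore one-letter acronyms, which would match almost any name.
    return (len(acr_b) >= 2 and acr_b in a) or (len(acr_a) >= 2 and acr_a in b)
```

Under these assumptions, "IBM Corp" and "International Business Machines Inc" satisfy the exact comparison, while "IBM Global" satisfies only the contains comparison.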

1.1.2 Bug 18048606: Improve Matching Involving Stronger Address Matches

EDQ-CDS does not find a match for some entities where there is a strong address match and only a weak name match. The following new entity match rule groups have been created:

[E260] Entity name without suffixes loose typos

[E270] Full name without suffixes first token

The following new entity match rule has been created:

[E900AA] Full Address Exact

1.2 Release 11g R1 (11.1.1.7.3)

This section addresses Release 11g R1 (11.1.1.7.3).

1.2.1 Issue # SIE-271: Generic Batch Jobs

New generic batch jobs for clustering and matching have been added to the product. These jobs allow an external data source to easily be connected to the EDQ-CDS services with minimal configuration effort. The jobs can also be used as the basis for customization of the product.

1.2.2 Issue # SIE-272, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, and 293: Matching Improvements

The following Matching improvements are delivered:

A new Elimination Rule for entities and individuals, which eliminates Driver-Driver matches, has been added. It is disabled by default and can be enabled in your Run Profile using the externalized property. For example:
```
phase.*.process.Match\ -\ Entity.[ELIM010C]\ ELIMINATE\ MATCHES\ BETWEEN\ TWO\ DRIVER\ RECORDS.entity_match_rules_enabled = true
```

The entity and individual cluster limits are now configurable in your Run Profile using the externalized properties. For example:

```
phase.*.process.Match\ -\ Entity.*.entity_match_cluster_comparison_limit = 15000
phase.*.process.Match\ -\ Entity.*.entity_match_cluster_group_limit = 0
```

The full address match rules no longer use the value of the subadminarea attribute in their comparisons.

The matching services can strip leading zeroes from numeric postalcodes. This guards against cases where an external program has reinterpreted a numeric postalcode as a number and removed its leading zeroes; for example, Excel may reformat numeric postalcodes in this way. Leading zeroes are not stripped if the postalcode contains any alpha characters. This option is not enabled by default in the Run Profile.

The address conflict entity rules, Name[...]Address Conflict, can be disabled using the externalized properties. For example:

```
phase.*.process.Match\ -\ Entity.[E040V]\ Script\ full\ name\ without\ suffixes\ exact\;\ address\ conflict.entity_match_rules_enabled = false
```

New entity match rules, Name[...]Address 1 Typo, City, Country, have been added to the existing match rule groups. These rules ensure that the matching weight is placed on address line 1, city, and country rather than on the postalcode. This means that two records can match when address1 is very similar and the city and country are exact, regardless of how the postalcodes differ.

The entity name standardization processes and processors have been changed to ensure that whitespace surrounding ampersands (&) in name fields is correctly normalized.

A new entity cluster with the prefix NMA (Full Name Metaphone, Address No Numbers) has been added.

A new entity match rule group, [E125] Full name all words shorter with typos, and its rules have been added. These rules allow typo-tolerant matching on full entity names where the shorter name is fully contained within the longer name.

The Individual Cluster group with the prefix LMC has been changed from level 3 to level 2.
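
The leading-zero handling for postalcodes described above can be sketched as follows. This is a minimal illustration, not the EDQ-CDS implementation: the function names are hypothetical, and two details are assumptions not stated in the release notes, namely that a postalcode containing any non-digit character (including spaces or hyphens) is left unchanged, and that an all-zero postalcode is reduced to a single zero.

```python
def strip_leading_zeroes(postalcode: str) -> str:
    """Strip leading zeroes only when the postalcode consists entirely of ASCII digits."""
    value = postalcode.strip()
    if value.isascii() and value.isdigit():
        # Assumption: keep one zero when the whole value is zeroes.
        return value.lstrip("0") or "0"
    # Alpha characters (or any other non-digit) present: leave the value as it is.
    return value


def postalcodes_match(left: str, right: str) -> bool:
    """Compare two postalcodes after normalizing leading zeroes on numeric values."""
    return strip_leading_zeroes(left) == strip_leading_zeroes(right)
```

With this normalization, a postalcode stored as 01234 and the same value reformatted to 1234 by a spreadsheet compare as equal, while an alphanumeric value such as 0A1 is never altered.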

2 Bugs and Issues Resolved

The following sections describe the bugs and issues that are resolved in each release:

Section 2.1, "Release 11g R1 (11.1.1.7.4)"
Section 2.2, "Release 11g R1 (11.1.1.7.3)"

2.1 Release 11g R1 (11.1.1.7.4)

This section addresses Release 11g R1 (11.1.1.7.4).

2.1.1 Bug 18237187: EDQ-CDS Is Not Matching Some Company Names with Abbreviations

EDQ-CDS does not find a match for some entities where the entity name is an acronym and some or all of the acronym appears in the entity phrase strip list data.

All 2, 3, and 4 letter space-separated acronyms have been removed from the Entity Full Latin Phrase Strip List by deactivating these entries in the initialization project and removing these rows from the .jmp files.
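
The kind of entry that was removed can be identified with a simple pattern. The sketch below is illustrative only: it assumes that a "space-separated acronym" is a run of two to four single letters separated by single spaces, and the sample entries in the usage note are hypothetical rather than taken from the strip list.

```python
import re

# One letter followed by one to three " <letter>" groups: 2, 3, or 4 letters in total.
SPACED_ACRONYM = re.compile(r"[A-Za-z]( [A-Za-z]){1,3}")


def is_spaced_acronym(entry: str) -> bool:
    """True for entries such as 'S A' or 'L L C' (2 to 4 space-separated letters)."""
    return SPACED_ACRONYM.fullmatch(entry.strip()) is not None


def filter_strip_list(entries: list) -> list:
    """Return the strip list entries that are not 2, 3, or 4 letter spaced acronyms."""
    return [entry for entry in entries if not is_spaced_acronym(entry)]
```

For example, filtering the hypothetical entries S A, L L C, LIMITED, and AND CO would keep only LIMITED and AND CO.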

2.2 Release 11g R1 (11.1.1.7.3)

This section addresses Release 11g R1 (11.1.1.7.3).

2.2.1 BugDB # 16969621: Use Provided Country Value in Preference to Default Code in Address Cleaning Service

The Address Cleaning service has been changed so that it will always use the country code you provide rather than the default country code in the Run Profile. If you do not provide a country code or it cannot be mapped to the standard country codes in EDQ-CDS, then EDQ-CDS attempts to derive country from the address data provided. When this is not possible, the default country code in the Run Profile is used.
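
The order of precedence described above can be summarized in a short sketch. The function name, parameters, and the way the country map and derivation step are represented are hypothetical and chosen for illustration; they do not reflect the internal design of the Address Cleaning service.

```python
from typing import Callable, Mapping, Optional


def resolve_country(
    provided: Optional[str],
    address: str,
    country_map: Mapping[str, str],
    derive_country: Callable[[str], Optional[str]],
    default_country: str,
) -> str:
    """Pick a country code: provided value first, then derived, then the default."""
    # 1. Use the provided value when it maps to a standard country code.
    if provided:
        mapped = country_map.get(provided.strip().upper())
        if mapped:
            return mapped
    # 2. Otherwise attempt to derive the country from the address data.
    derived = derive_country(address)
    if derived:
        return derived
    # 3. Fall back to the default country code from the Run Profile.
    return default_country
```

The default is consulted only as a last resort, which is the change this fix introduces: a provided and mappable country value always takes priority.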

2.2.2 BugDB # 16743518: Entity Script to Latin Maps and Strip Lists Have Duplicates

The reference data entities with duplicates have been corrected as follows:

Ent Name - International Script Phrase Strip List: Removed the duplicate for Thailand and changed the entries that had multiple data sources so that only one entry exists with pipe-delimited data source values rather than multiple rows in the data.
Ent Name - International Script Token Strip List: Changed entries that contained multiple data sources to have only one entry with pipe-delimited data source values rather than multiple rows in the data.
Ent Name - International Script to Latin Phrase Map: De-duplicated the Arabic script value that maps to both General People's Congress (GPC) and GP by only including the GPC entry (GPC is from Saudi Arabia and GP is from Iran).
Ent Name - International Script to Latin Token Map: Removed the duplicate entry, mapping ЗАТ to 'PLC', in the Russian data source.
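
The consolidation of rows into a single entry with pipe-delimited data source values can be sketched as follows. The row layout (an entry paired with a data source) and the function name are assumptions made for illustration; the actual reference data has its own column structure.

```python
def merge_data_sources(rows: list) -> list:
    """Collapse (entry, data_source) rows into one row per entry.

    Data sources for the same entry are joined with '|' in first-seen order,
    and exact duplicate rows are dropped.
    """
    merged = {}
    for entry, source in rows:
        sources = merged.setdefault(entry, [])
        if source not in sources:
            sources.append(source)
    return [(entry, "|".join(sources)) for entry, sources in merged.items()]
```

Each entry then appears once, so a lookup keyed on the entry value no longer encounters duplicates.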

2.2.3 BugDB # 16622146: Russian Character Replace Map Should Not Contain Both Upper and Lower Case

The Russian character replace map in the Transliterate processor had both the upper and lower case versions of the Cyrillic characters. When this was used with the 'ignore case' option in the replace text, it caused EDQ to display a duplicate keys error. The Transliterate processor has been changed to map the three Cyrillic characters that had different versions for the upper and lower case to the character given for the lower case versions.
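
The cause of the duplicate keys error, and the shape of the fix, can be illustrated with a small sketch. The map contents in the usage note and the function names are hypothetical; the release notes do not identify the three characters involved.

```python
def find_case_collisions(replace_map: dict) -> dict:
    """Group keys that become identical once case is ignored."""
    groups = {}
    for key in replace_map:
        groups.setdefault(key.lower(), []).append(key)
    return {folded: keys for folded, keys in groups.items() if len(keys) > 1}


def collapse_to_lower(replace_map: dict) -> dict:
    """Keep one entry per case-insensitive key, preferring the lower case entry's value."""
    collapsed = {}
    for key, value in replace_map.items():
        folded = key.lower()
        # Skip an upper case key when a lower case counterpart exists in the map.
        if key == folded or folded not in replace_map:
            collapsed[folded] = value
    return collapsed
```

For a hypothetical map containing both Ж and ж, find_case_collisions reports the pair as a collision, and collapse_to_lower keeps only the value given for ж.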

3 Related Documents

For more information, see the following documents in the Oracle Enterprise Data Quality documentation set:

Oracle Enterprise Data Quality Installation Guide
Oracle Enterprise Data Quality Architecture Guide
Oracle Enterprise Data Quality Customer Data Services Siebel Integration Guide
Oracle Enterprise Data Quality Customer Data Services Siebel Connector Installation Guide

See the latest version of this document and all other documents on the Oracle Enterprise Data Quality Documentation website at

http://download.oracle.com/docs/cd/E48549_01/index.htm

4 Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate failsafe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.