
SDQ Matching Server Libraries


Character and name patterns differ substantially between languages, so the matching rules for the SDQ Matching Server are compiled into a set of shared libraries adapted to individual languages or language families. All of these shared libraries have the same file name (for example, n3sqsb.dll on Windows) but are installed in language-specific subdirectories, as shown in Table 9.

Each interactive object manager loads the language-specific library from the \bin\<language> folder on Windows or the /lib/<language> folder on UNIX. Every key it generates is stored in the key table with a LANG_ALGRTHM_CD value that reflects the library's population and code page. Only records with the same LANG_ALGRTHM_CD value are considered for matching against each other.
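As a rough illustration of the last rule, the following Python sketch buckets key-table rows by LANG_ALGRTHM_CD and forms candidate pairs only inside each bucket. `KeyRow` and `comparable_pairs` are hypothetical names, not part of any Siebel API. The sketch models only the LANG_ALGRTHM_CD filter and ignores how the real server uses the generated keys to narrow candidates.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class KeyRow:
    """Hypothetical, simplified stand-in for one row of the key table."""
    record_id: str
    lang_algrthm_cd: str  # population + code page of the library that made the key


def comparable_pairs(rows: Iterable[KeyRow]) -> Iterator[Tuple[str, str]]:
    """Yield record-ID pairs that are allowed to be compared.

    Rows are bucketed by LANG_ALGRTHM_CD and pairs are formed only inside
    a bucket, so keys produced by different libraries never meet.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row.lang_algrthm_cd].append(row.record_id)
    for record_ids in buckets.values():
        yield from combinations(sorted(record_ids), 2)
```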

The library in the ENU folder is the most generic library because it uses the Default (International) population, which can be used to deduplicate records in all Latin languages, that is, the languages predominant in the Americas, Western Europe, Australia, and New Zealand.

NOTE:  The international library intentionally ignores certain language-specific words and abbreviations, because they can have a different meaning in other languages. Examples include GmbH (German), Oy (Finnish), and other abbreviations for corporate structures.

In addition, the Siebel CRM installation media includes matching libraries for other languages and code pages. You can retrieve these additional shared libraries by installing the other language packs on the Siebel Server. Table 10 lists the languages supported.

For real-time matching, the object manager always uses the n3sqsb library of its own language. Batch tasks behave differently: by default, the DQMgr component runs with Language set to ENU and therefore uses the international population library.
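The selection rule for real-time and batch matching can be summarized in a small sketch. The function names, the `siebsrvr_root` argument, and the assumption that the language folders sit directly under the Siebel Server root are illustrative only; folder names may also differ in case on your installation.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Optional

DEFAULT_BATCH_LANGUAGE = "ENU"  # DQMgr default, per the text above


def matching_language(task_type: str, om_language: str,
                      dqmgr_language: Optional[str] = None) -> str:
    """Return the language whose n3sqsb library a task loads (hypothetical helper).

    Real-time matching always uses the object manager's own language.
    Batch tasks use the Language parameter of the DQMgr component that runs
    them, which stays ENU unless the component was cloned and changed.
    """
    if task_type == "realtime":
        return om_language
    if task_type == "batch":
        return dqmgr_language or DEFAULT_BATCH_LANGUAGE
    raise ValueError(f"unknown task type: {task_type!r}")


def library_folder(siebsrvr_root: str, language: str, windows: bool) -> PurePath:
    """Return the bin folder (Windows) or lib folder (UNIX) for a language.

    Assumption: both folders sit directly under siebsrvr_root.
    """
    if windows:
        return PureWindowsPath(siebsrvr_root, "bin", language)
    return PurePosixPath(siebsrvr_root, "lib", language)
```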

To use a population or matching library other than ENU for batch deduplication, you must clone the DQMgr component and set its Language parameter to the language of the library that you want to use. For Latin languages, cloning is optional. For example, the DEU, FRA, and ITA libraries each produce slightly better matches than the international ENU library when used through a cloned DQMgr, but at the added cost of creating a separate DQMgr for each language and running separate batch tasks, each with an object WHERE clause that processes only records in that language.
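The trade-off can be made concrete with a sketch that routes records to one batch run per component: each cloned language gets its own run, and everything else stays with the default DQMgr. The `DQMgr_<LANG>` naming follows the DQMgr_ARA example later in this section; `plan_batch_runs` is hypothetical and only stands in for the object WHERE clauses you would actually configure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def plan_batch_runs(records: Iterable[Tuple[str, str]],
                    cloned_languages: Set[str]) -> Dict[str, List[str]]:
    """Assign each (record_id, language_code) pair to one batch run.

    A language with its own cloned component (DQMgr_<LANG>) is processed by
    that clone; every other record stays with the default DQMgr, which uses
    the international ENU library.
    """
    runs: Dict[str, List[str]] = defaultdict(list)
    for record_id, language in records:
        component = f"DQMgr_{language}" if language in cloned_languages else "DQMgr"
        runs[component].append(record_id)
    return dict(runs)
```

Each key in the result is one more set of batch tasks to schedule, which is the added cost described above.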

If you use only Western (Latin) languages for both real-time and batch deduplication, you can copy the ENU n3sqsb library into the \bin\<language> folder (Windows) or /lib/<language> folder (UNIX) of each other Western language. All keys generated by real-time and batch data matching then have the same LANG_ALGRTHM_CD value, DefaultLatin_1_Mixed.
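A minimal sketch of the copy step, assuming the language folders live under one common library root (the \bin or /lib directory of the Siebel Server) and that you pass the platform's library file name, such as n3sqsb.dll on Windows (the UNIX file name is not given in this section). The function name, the uppercase folder names, and the `.orig` backup convention are assumptions of this sketch. Because the library is loaded on first access, it is presumably safest to run this while no component has the library loaded.

```python
import shutil
from pathlib import Path
from typing import List, Sequence

# Latin_1_Mixed language folders from Table 10, excluding ENU itself.
WESTERN_LANGUAGES = ("DAN", "DEU", "ESN", "FIN", "FRA", "ITA", "NLD", "PTB", "PTG", "SVE")


def copy_enu_library(lib_root: Path, library_file: str,
                     languages: Sequence[str] = WESTERN_LANGUAGES) -> List[Path]:
    """Copy lib_root/ENU/<library_file> into each installed Western language folder.

    An existing library is first copied to <name>.orig (once) so that the
    change can be reverted. Folders that do not exist are skipped, because
    the corresponding language pack is not installed.
    """
    source = lib_root / "ENU" / library_file
    if not source.is_file():
        raise FileNotFoundError(source)
    copied = []
    for language in languages:
        target_dir = lib_root / language
        if not target_dir.is_dir():
            continue  # language pack not installed; nothing to replace
        target = target_dir / library_file
        if target.exists():
            backup = target.with_name(target.name + ".orig")
            if not backup.exists():
                shutil.copy2(target, backup)  # keep the original library
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```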

For non-Latin languages (such as ARA, JPN, KOR, or THA), you must create a separate DQMgr component with its Language parameter set in the component definition, because the library is loaded on first access and the language cannot be specified dynamically for batch tasks. For example:

  • Create a DQMgr_ARA component with its Language parameter set to ARA (Language=ARA).

    NOTE:  In the cloned component, the Application Repository File parameter defaults to siebel.srf. If you use a Siebel SIA application, you must change this parameter to the repository file that your SIA application uses.

  • Run the batch tasks (Key Generate and DeDuplication) with an object WHERE clause that retrieves only records containing Arabic data, for example, by filtering on the [Country] or [Language Code] field.

NOTE:  Make sure that any field you use in the object WHERE clause or in a rule's search spec is always populated. Enforce this through configuration in Siebel Tools, for example, by setting a predefault value, by exposing the field in the UI and making it required, or both.
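The steps above can be sketched as a small plan builder. The component name follows the DQMgr_ARA example; the `[Field] = 'value'` form joined with OR is an assumption about the object WHERE clause syntax, so check the expression in your own environment before using it. `object_where_clause` and `non_latin_batch_plan` are hypothetical names.

```python
from typing import Dict, Sequence


def object_where_clause(field_name: str, values: Sequence[str]) -> str:
    """Build an expression such as [Language Code] = 'ARA' (assumed syntax).

    Values containing single quotes are rejected rather than escaped,
    because this sketch does not model Siebel's quoting rules.
    """
    if not values:
        raise ValueError("at least one value is required")
    for value in values:
        if "'" in value:
            raise ValueError(f"unsupported quote in value: {value!r}")
    return " OR ".join(f"[{field_name}] = '{value}'" for value in values)


def non_latin_batch_plan(language: str, where_field: str,
                         where_values: Sequence[str],
                         repository_file: str) -> Dict[str, object]:
    """Describe the cloned DQMgr and its batch tasks for one non-Latin language."""
    return {
        "component": f"DQMgr_{language}",
        "parameters": {
            "Language": language,
            # Defaults to siebel.srf after cloning; pass the SIA file for SIA apps.
            "Application Repository File": repository_file,
        },
        "tasks": ["Key Generate", "DeDuplication"],
        "object_where_clause": object_where_clause(where_field, where_values),
    }
```

A record whose [Language Code] is empty satisfies none of these clauses and is never processed, which is why the note above asks for such fields to be always populated.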

Table 10. SDQ Matching Libraries and Supported Languages
Matching Library and Code Page | Languages Supported and Language Code | Syntax (Name and Value Setting)
International ("Default"), Latin_1_Mixed | DAN - Danish | SSA Population-Codepage DAN "Denmark", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | DEU - German | SSA Population-Codepage DEU "Germany", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | ENU - U.S. English | SSA Population-Codepage ENU "Default", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | ESN - Spanish | SSA Population-Codepage ESN "Spain", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | FIN - Finnish | SSA Population-Codepage FIN "Finland", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | FRA - French | SSA Population-Codepage FRA "French", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | ITA - Italian | SSA Population-Codepage ITA "Italy", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | NLD - Dutch | SSA Population-Codepage NLD "Netherlands", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | PTB - Brazilian Portuguese | SSA Population-Codepage PTB "Brazil", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | PTG - Portuguese | SSA Population-Codepage PTG "Portugal", "Latin_1_Mixed"
International ("Default"), Latin_1_Mixed | SVE - Swedish | SSA Population-Codepage SVE "Sweden", "Latin_1_Mixed"

NOTE:  The following non-ENU n3sqsb libraries are single-language libraries that match only within the specified population and code page.

Arabic | ARA - Arabic | SSA Population-Codepage ARA "Arabic", "Arabic"
Chinese (Simplified) | CHS - Simplified Chinese | SSA Population-Codepage CHS "China", "Chinese_Simp"
Chinese (Traditional) | CHT - Traditional Chinese | SSA Population-Codepage CHT "China", "Chinese_Trad"
Czech | CSY - Czech | SSA Population-Codepage CSY "Czech", "Latin_2_1250"
Greek | ELL - Greek | SSA Population-Codepage ELL "Greece", "Greek"
Hebrew | HEB - Hebrew | SSA Population-Codepage HEB "Israel", "Hebrew"
Japanese | JPN - Japanese | SSA Population-Codepage JPN "Japan", "Japanese"
Korean | KOR - Korean | SSA Population-Codepage KOR "South_Korea", "Korean"
Polish | PLK - Polish | SSA Population-Codepage PLK "Poland", "Latin_2_1250"
Thai | THA - Thai | SSA Population-Codepage THA "Thailand", "Thai"

NOTE:  The SDQ Matching Server cannot find matches across languages that the installed library does not support. For example, English and French data can be compared using the international library, but Chinese and Spanish data cannot, because Chinese requires a separate library.
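Table 10 can be expressed as data to show why records keyed by different libraries never meet. Forming LANG_ALGRTHM_CD by concatenating population and code page is an assumption extrapolated from the one documented value, DefaultLatin_1_Mixed; `SDQ_LIBRARIES`, `lang_algrthm_cd`, and `can_be_matched` are hypothetical names.

```python
from typing import Dict, Tuple

# Table 10 as data: library language code -> (population, code page).
SDQ_LIBRARIES: Dict[str, Tuple[str, str]] = {
    "DAN": ("Denmark", "Latin_1_Mixed"),
    "DEU": ("Germany", "Latin_1_Mixed"),
    "ENU": ("Default", "Latin_1_Mixed"),
    "ESN": ("Spain", "Latin_1_Mixed"),
    "FIN": ("Finland", "Latin_1_Mixed"),
    "FRA": ("French", "Latin_1_Mixed"),
    "ITA": ("Italy", "Latin_1_Mixed"),
    "NLD": ("Netherlands", "Latin_1_Mixed"),
    "PTB": ("Brazil", "Latin_1_Mixed"),
    "PTG": ("Portugal", "Latin_1_Mixed"),
    "SVE": ("Sweden", "Latin_1_Mixed"),
    "ARA": ("Arabic", "Arabic"),
    "CHS": ("China", "Chinese_Simp"),
    "CHT": ("China", "Chinese_Trad"),
    "CSY": ("Czech", "Latin_2_1250"),
    "ELL": ("Greece", "Greek"),
    "HEB": ("Israel", "Hebrew"),
    "JPN": ("Japan", "Japanese"),
    "KOR": ("South_Korea", "Korean"),
    "PLK": ("Poland", "Latin_2_1250"),
    "THA": ("Thailand", "Thai"),
}


def lang_algrthm_cd(library_language: str) -> str:
    """Return the assumed LANG_ALGRTHM_CD for keys made by a library."""
    population, code_page = SDQ_LIBRARIES[library_language]
    return population + code_page  # assumed format, e.g. DefaultLatin_1_Mixed


def can_be_matched(library_a: str, library_b: str) -> bool:
    """Keys are compared only when both libraries yield the same code."""
    return lang_algrthm_cd(library_a) == lang_algrthm_cd(library_b)
```

Under this assumption, English and French records keyed by the same ENU library share one code, while keys from the CHS and ESN libraries never do.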

Siebel Data Quality Administration Guide Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Legal Notices.