
Data Quality Matching Libraries for Multiple Languages


The matching rules for the Data Quality Matching Server are compiled into a set of dynamic-link libraries (DLLs) for various languages. Because character and name patterns differ substantially between languages, the rules are typically tuned for each language or language family.

The Siebel Data Quality Matching Server includes a set of matching libraries that cover a variety of languages and code pages. By default, the installation uses a generalized international library built to support a set of Latin1-based languages (the languages predominant in the Americas, Western Europe, Australia, and New Zealand). The Siebel installation CD-ROMs also include reference libraries for other regions and code pages. Table 6 lists the supported matching libraries and the languages each covers. For information about code pages, see Global Deployment Guide.

NOTE:  The international library intentionally ignores certain words and abbreviations because they may have a different meaning in non-Latin1 languages.

Table 6.  Supported Matching Libraries and Languages

International Library          Other Libraries
DAN - Danish                   CHS - Simplified Chinese
DEU - German                   CHT - Traditional Chinese
ESN - Spanish                  CSY - Czech
ENU - English                  ELL - Greek
FIN - Finnish                  HEB - Hebrew
FRA - French                   JPN - Japanese
ITA - Italian                  KOR - Korean
NLD - Dutch                    PLK - Polish
PTG - Portuguese
PTB - Brazilian Portuguese
SVE - Swedish

You can view the settings for the matching libraries using Siebel Tools (DeDuplication Business Service > Business Service User Prop > SSA Population Codepage*). For more information about Siebel Tools, see Siebel Tools Reference.

NOTE:  The matching rules for each language or combination of languages are delivered in the form of DLLs. You can retrieve additional DLLs by installing other language packs on the Siebel Server. For more information about DLLs, see Universal Connector Architecture.

A region-specific library may produce better matches for data from that region. However, if the data is not limited to one region, the international library is the better choice, because such a dataset can include a heterogeneous mixture of international names. To install a region-specific library for a Latin-based language, an administrator replaces the library file on the Siebel Server with the language-specific version of the file. For example, for Windows ENU, the library is placed in C:\Siebel\SiebSrvr\bin\enu; for UNIX ENU, it is placed in /export/home/siebel/siebsrvr/lib/enu.
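The path pattern in the ENU examples can be expressed as a small helper. This is a minimal sketch, assuming the install roots from those two examples and assuming other languages follow the same pattern of a lowercase language-code directory under bin (Windows) or lib (UNIX); the function name library_dir is hypothetical and is not part of any Siebel tool.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Install roots copied from the ENU examples; an actual server may be
# installed elsewhere (assumption).
WINDOWS_ROOT = PureWindowsPath(r"C:\Siebel\SiebSrvr")
UNIX_ROOT = PurePosixPath("/export/home/siebel/siebsrvr")


def library_dir(platform: str, lang: str):
    """Return the directory that should hold the matching library file.

    platform: "windows" or "unix"
    lang: three-letter Siebel language code such as "ENU".
    Assumes every language uses a lowercase code directory, as ENU does.
    """
    code = lang.lower()
    if platform == "windows":
        return WINDOWS_ROOT / "bin" / code
    if platform == "unix":
        return UNIX_ROOT / "lib" / code
    raise ValueError(f"unknown platform: {platform!r}")


print(library_dir("windows", "ENU"))  # C:\Siebel\SiebSrvr\bin\enu
print(library_dir("unix", "ENU"))     # /export/home/siebel/siebsrvr/lib/enu
```

Pure path classes are used so the sketch builds both path styles on any operating system without touching the file system.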

The library file installed on each Siebel Server should be in sync with the data that is processed from that machine. For example, if the Japanese library is installed, a batch component request for key generation or deduplication should be constrained to Japanese data.
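One way to picture "constrained to Japanese data" is to split a batch before submitting it, keeping only the records the installed library covers and setting the rest aside for a server that has the right library. This is a hypothetical sketch: the Record type, its lang field, and split_batch are illustrative names, not Siebel objects, and the sketch assumes each record carries a language code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    row_id: str
    lang: str  # hypothetical: three-letter language code of the record's data


def split_batch(records, installed_langs):
    """Split a batch into records this server's library covers and the rest."""
    local, elsewhere = [], []
    for rec in records:
        if rec.lang in installed_langs:
            local.append(rec)      # safe to process on this server
        else:
            elsewhere.append(rec)  # route to a server with a matching library
    return local, elsewhere


batch = [Record("1-A", "JPN"), Record("1-B", "ENU"), Record("1-C", "JPN")]
local, elsewhere = split_batch(batch, installed_langs={"JPN"})
print([r.row_id for r in local])      # ['1-A', '1-C']
print([r.row_id for r in elsewhere])  # ['1-B']
```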

If the Siebel Server is running in Japanese, it loads and references the Japanese version of the matching libraries while the other Siebel Servers (running in a different language) load and reference other matching library files on their own server file systems. The match keys table in the database stores keys generated from different libraries on different Siebel Servers, together with indicators for code page and population (matching library). When a match request is executed, the list of possible match candidates is built based on the match keys from the same code page and population.
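The candidate-selection rule can be sketched as a filter over stored match keys: a key is only considered if its code page and population indicators equal those of the record being matched. This is a minimal sketch, not the server's actual search; it simplifies by treating a candidate as a record whose key string equals the probe's key, and the MatchKey fields, the key strings, and the code page and population labels ("1252", "932", "INTL", "JPN") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchKey:
    key: str         # match key string produced by a matching library
    row_id: str      # record the key was generated for
    codepage: str    # code page indicator stored with the key
    population: str  # population (matching library) indicator


def candidates(keys, probe):
    """Row IDs of other records with the same key, code page, and population."""
    return sorted(
        {
            k.row_id
            for k in keys
            if k.codepage == probe.codepage
            and k.population == probe.population
            and k.key == probe.key
            and k.row_id != probe.row_id
        }
    )


keys = [
    MatchKey("SMT", "1-A", "1252", "INTL"),
    MatchKey("SMT", "1-B", "1252", "INTL"),
    MatchKey("SMT", "1-C", "932", "JPN"),   # same key, other library: ignored
    MatchKey("JNS", "1-D", "1252", "INTL"),
]
print(candidates(keys, keys[0]))  # ['1-B']
```

Record 1-C shows why the indicators matter: its key string happens to equal the probe's, but it was generated by a different library, so it is excluded.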

NOTE:  The Siebel Data Quality Matching Server cannot find matches across languages that the installed library file does not cover. For example, English and French data can be compared using the international library, but Chinese and Spanish data cannot, because the matching rules for Chinese require a separate library.
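Using the language codes in Table 6, the rule in this note can be sketched as a check that two languages map to the same library. This sketch assumes the international library is installed for the Latin1-based languages and assumes each language in the Other Libraries column has its own library, named here by its language code; library_for and can_compare are hypothetical helpers.

```python
# Table 6, International Library column.
INTERNATIONAL = {"DAN", "DEU", "ESN", "ENU", "FIN", "FRA",
                 "ITA", "NLD", "PTG", "PTB", "SVE"}
# Table 6, Other Libraries column (assumed: one library per language).
OTHER = {"CHS", "CHT", "CSY", "ELL", "HEB", "JPN", "KOR", "PLK"}


def library_for(lang: str) -> str:
    """Name of the library that holds the matching rules for a language."""
    if lang in INTERNATIONAL:
        return "International"
    if lang in OTHER:
        return lang
    raise ValueError(f"no matching library listed for {lang!r}")


def can_compare(lang_a: str, lang_b: str) -> bool:
    """True if both languages are handled by the same library."""
    return library_for(lang_a) == library_for(lang_b)


print(can_compare("ENU", "FRA"))  # True
print(can_compare("CHS", "ESN"))  # False
```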


 Siebel Data Quality Administration Guide 
 Published: 15 May 2003