Understanding the Sun Match Engine

The Address Internal Constants File (addressInternalConstants*.cfg)

The address internal constants file defines and configures the tokens and array sizes used by the address standardizer. The standardization engine uses this file internally, and most of its parameters should not be modified.

One parameter you might need to modify is spCh, which defines any special characters that should not be removed from addresses during standardization. By default, the standardization process keeps hyphens (-), pound signs (#), forward slashes (/), ampersands (&), and pipes (|). Any other special characters found in an address are removed unless they are listed in the spCh parameter. Separate the special characters in the list with spaces, as shown below.

spCh = & < >

Characters that are not included in the standard ISO 8859-1 (Latin-1) character set must be preceded by a backslash (\) and represented by their Unicode code point. For example, use the following to retain left and right single quotes (‘ ’) in addresses:


spCh = \u2018 \u2019
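
To work out the escape for a given character, a small script can help. The following Python code is a minimal sketch and is not part of the Sun Match Engine; the helper names to_spch_entry and build_spch_line are hypothetical. It assumes that the escape is always the four-hex-digit \uXXXX form used in the example above, and that lowercase hexadecimal digits are accepted.

```python
def to_spch_entry(ch: str) -> str:
    """Return ch unchanged if it fits in Latin-1, else as a \\uXXXX escape.

    Assumption: code points 0x00-0xFF can be written literally in the file,
    and everything else uses a four-hex-digit escape as in the example above.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)
    if code <= 0xFF:
        return ch
    if code > 0xFFFF:
        # The four-digit form cannot express characters beyond U+FFFF.
        raise ValueError("character does not fit a four-digit escape")
    return "\\u{:04x}".format(code)


def build_spch_line(chars) -> str:
    """Join the entries with single spaces, as the spCh list requires."""
    return "spCh = " + " ".join(to_spch_entry(c) for c in chars)


if __name__ == "__main__":
    # Left and right single quotation marks (U+2018, U+2019).
    print(build_spch_line(["\u2018", "\u2019"]))
```

Run as a script, this prints an spCh line for the two single quotation marks that can be compared against the example above.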

Note –

Periods (.) and commas (,) are always removed from addresses, even if they are added to the spCh list.
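
The rules described in this section can be modeled with a short Python sketch. It is only an illustration of the documented behavior, not the engine's implementation, and the names parse_spch and strip_special are hypothetical. It assumes that a "special character" is anything other than a letter, digit, or whitespace, that removed characters are deleted rather than replaced, and that spCh entries are added to the default set of retained characters.

```python
DEFAULT_KEPT = set("-#/&|")   # retained by default, per the text above
ALWAYS_REMOVED = set(".,")    # removed even if listed in spCh


def parse_spch(value: str) -> set:
    """Parse the right-hand side of an spCh line into a set of characters."""
    chars = set()
    for token in value.split():
        if token.startswith("\\u") and len(token) == 6:
            # Unicode escape such as \u2018
            chars.add(chr(int(token[2:], 16)))
        else:
            chars.update(token)
    return chars


def strip_special(address: str, spch_value: str = "") -> str:
    """Drop special characters that are neither default nor listed in spCh."""
    kept = (DEFAULT_KEPT | parse_spch(spch_value)) - ALWAYS_REMOVED
    return "".join(
        c for c in address
        if c.isalnum() or c.isspace() or c in kept
    )


if __name__ == "__main__":
    sample = "12 Main St., Apt. #4 <rear>"
    print(strip_special(sample))            # angle brackets are removed
    print(strip_special(sample, "& < >"))   # angle brackets are retained
```

In both calls the periods and the comma are dropped, which mirrors the note above; only the handling of the angle brackets changes with the spCh value.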