Understanding the Sun Match Engine

The Address Patterns File (addressPatterns*.dat)

The address patterns file defines the expected input patterns of each individual street address field being standardized so the Sun Match Engine can recognize and process these values. Tokens are used to indicate the type of address component in the input and output fields. This file contains two rows for each pattern. The first row defines the input pattern for each address field and provides an example. The second row defines the output pattern for each address field, the pattern type, the relative importance of the pattern compared to other patterns, and usage flags (as shown below).

AU A1 TY                01 Oak B Street
NA NA ST                T* 75                TX

When an address is parsed, each line of the address is delimited by a pipe (|) and sent to the parser separately. The output tokens for each line are then concatenated, and the resulting output pattern is looked up in the addressOutPatterns*.dat file. If the pattern is found, the output pattern is modified as indicated in the addressOutPatterns*.dat file to resolve any ambiguities that might arise when two lines of address information contain common elements.

The relative importance determines which pattern to use when the format of the input field matches more than one pattern. This file should only be modified by personnel with a thorough understanding of address patterns and tokens.
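The pipe-delimited flow described above can be sketched as follows. This is only an illustration, not the Sun Match Engine's implementation: the function names are hypothetical, the per-line parser is a stand-in lookup that reuses token assignments from the examples in this section, and joining the output tokens with single spaces is an assumption. The lookup against addressOutPatterns*.dat is not modeled.

```python
from typing import Callable

def split_address_lines(address: str) -> list[str]:
    """Split a multi-line address on the pipe delimiter, dropping empty parts."""
    return [part.strip() for part in address.split("|") if part.strip()]

def combined_output_pattern(address: str,
                            parse_line: Callable[[str], list[str]]) -> str:
    """Parse each address line separately, then concatenate the output tokens."""
    tokens: list[str] = []
    for line in split_address_lines(address):
        tokens.extend(parse_line(line))
    # Assumption: the concatenated pattern is written as space-separated tokens.
    return " ".join(tokens)

# Hypothetical stand-in for the real per-line parser. The token assignments
# are copied from the sample patterns shown in this section.
KNOWN_LINES = {
    "123 Avenida Oak B": ["HN", "PT", "NA", "NA"],
    "Oak B Street": ["NA", "NA", "ST"],
}

def toy_parse_line(line: str) -> list[str]:
    return KNOWN_LINES[line]

pattern = combined_output_pattern("123 Avenida Oak B|Oak B Street", toy_parse_line)
print(pattern)  # HN PT NA NA NA NA ST
```

In the engine itself, the concatenated pattern would next be checked against addressOutPatterns*.dat, which may rewrite it to resolve ambiguities between the two lines.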

The syntax of this file is:

input-pattern example output-pattern pattern-class pattern-modifier priority usage-flag exclude-flag

You can modify or add entries in this table as needed. Table 18 describes the columns in the addressPatterns*.dat file.

Table 18 Address Patterns File

Column             Description

input-pattern      Tokens that represent a possible input pattern from an individual unparsed street address field. Each token represents one component. For more information about address tokens, see Address Type Tokens.

example            An example of a street address that fits the specified pattern. This file element is optional.

output-pattern     Tokens that represent the output pattern for the specified input pattern. Each token represents one component of the output of the Sun Match Engine. For more information about address tokens, see Address Type Tokens.

pattern-class      An indicator of the type of address component represented by the pattern. Possible pattern types are listed in Pattern Classes.

pattern-modifier   An indicator of whether the priority of the pattern is averaged against other patterns that match the input. Pattern modifiers are listed in Pattern Modifiers.

priority           The priority weight to use for the pattern when the pattern is a sub-pattern of a larger input pattern. For more information, see Priority Indicators.

usage-flag         A flag indicating how the term is used (for more information, see Pattern Classes). This file element is optional.

exclude-flag       This file element is optional.

Following is an excerpt from the addressPatternsUS.dat file.

NU DR TY A1 AU                     01   123 South Avenida B Oak
HN PD PT NA NA                     H* 70

NU DR TY NU DR                     01   123 South Avenida 1 West
HN PD PT NA SD                     H* 70

NU A1 TY AU TY                     01   123 C circle hill drive
HN HS NA NA ST                     H* 70

NU A1 AM A1 TY                     01   123 M & M road
HN NA NA NA ST                     H* 65

NU TY AU A1                        01   123 Avenida Oak B
HN PT NA NA                        H* 60

NU TY NU A1                        01   123 Avenida 1 B
HN PT NA NA                        H* 60
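To make the two-row layout concrete, here is a minimal sketch that reads entries like those in the excerpt into records. It rests on several assumptions that the text above does not confirm: the token list is separated from the remaining fields by two or more spaces, the pattern class and pattern modifier are written together (as in `H*`), and the `01` field on the first row, which Table 18 does not describe, can be skipped. The `PatternEntry` class and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class PatternEntry:
    input_tokens: tuple[str, ...]
    example: str
    output_tokens: tuple[str, ...]
    pattern_class: str
    modifier: str
    priority: int
    extra: tuple[str, ...] = ()  # any trailing fields (usage/exclude flags), kept uninterpreted

def _head_and_rest(row: str) -> tuple[str, str]:
    """Split a row into its token list and whatever follows the first wide gap."""
    parts = re.split(r"\s{2,}", row.strip(), maxsplit=1)
    return parts[0], (parts[1] if len(parts) > 1 else "")

def parse_entry(row1: str, row2: str) -> PatternEntry:
    in_part, rest1 = _head_and_rest(row1)
    # rest1 looks like "01   123 South Avenida B Oak"; the leading field is
    # skipped because its meaning is not described here. The example is optional.
    pieces = rest1.split(None, 1)
    example = pieces[1] if len(pieces) > 1 else ""
    out_part, rest2 = _head_and_rest(row2)
    fields = rest2.split()
    class_and_modifier, priority = fields[0], fields[1]
    return PatternEntry(
        input_tokens=tuple(in_part.split()),
        example=example,
        output_tokens=tuple(out_part.split()),
        pattern_class=class_and_modifier[0],
        modifier=class_and_modifier[1:],
        priority=int(priority),
        extra=tuple(fields[2:]),
    )

def parse_patterns(text: str) -> list[PatternEntry]:
    """Pair up consecutive non-blank rows and parse each pair as one entry."""
    rows = [line for line in text.splitlines() if line.strip()]
    return [parse_entry(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1, 2)]

EXCERPT = """\
NU DR TY A1 AU                     01   123 South Avenida B Oak
HN PD PT NA NA                     H* 70

NU A1 AM A1 TY                     01   123 M & M road
HN NA NA NA ST                     H* 65
"""

entries = parse_patterns(EXCERPT)
for entry in entries:
    print(entry.input_tokens, "->", entry.output_tokens, entry.priority)
```

Keeping trailing fields in `extra` rather than naming them avoids guessing how usage and exclude flags are laid out in a given file.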