Understanding the Sun Match Engine

Address Pattern File Components

The address patterns files use pattern type tokens, pattern classes, pattern modifiers, and priority indicators to process and parse address data. Before modifying any of the patterns files, make sure you have a good understanding of these file components.

Address Type Tokens

The address pattern and clues files use tokens to denote the different components of a street address, such as the street type, house number, and street name. These files use one set of tokens for input fields and another set for output fields. You can use only the predefined tokens to represent address components; the Sun Match Engine does not recognize custom tokens.

Table 20 lists and describes each input token.

Table 20 Input Address Pattern Type Tokens

Token   Description
A1      Alphabetic value, one character in length
AM      Ampersand
AU      Generic word
BP      Building property
BU      Building unit
BX      Post office box
DA      Dash (as a starting character)
DR      Street direction
EI      Extra information
EX      Extension
FC      Numeric fraction
HR      Highway route
MP      Mile posts
NL      Common words, such as “of”, “the”, and so on
NU      Numeric value
OT      Ordinal type
PT      Prefix type
RR      Rural route
SA      State abbreviation
TY      Street type
WD      Descriptor within the structure
WI      Identifier within the structure

Table 21 lists and describes each output token.

Table 21 Output Address Pattern Tokens

Token   Description
1P      Building number prefix
2P      Second building number prefix
BD      Property or building directional suffix
BI      Structure (building) identifier
BN      Property or building name
BS      Building number suffix
BT      Property or building type suffix
BX      Post office box descriptor
BY      Structure (building) descriptor
DB      Property or building directional prefix
EI      Extra information
EX      Extension index
H1      First house number (the actual number)
H2      Second house number (house number suffix)
HN      House number
HS      House number suffix
N2      Second street name
NA      Street name
NB      Building number
NL      Conjunctions that connect words or phrases in one component type (usually the street name)
P1      House number prefix
P2      Second house number prefix
PD      Directional prefix to the street name
PT      Street type prefix to the street name
RR      Rural route descriptor
RN      Rural route identifier
S2      Street type suffix to the second street name
SD      Directional suffix to the street name
ST      Street type suffix to the street name
TB      Property or building type prefix
WI      Identifier within the structure
WD      Descriptor within the structure
XN      Post office box identifier
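
Because the Sun Match Engine recognizes only the predefined tokens, it can help to check a pattern for unknown tokens before adding it to a patterns file. The following Python sketch is illustrative only and is not part of the product: the token sets are copied from Table 20 and Table 21, while the function name and the sample token sequences are hypothetical. It does not parse the actual patterns file, whose layout is not described in this section.

```python
# Predefined tokens copied from Table 20 (input) and Table 21 (output).
INPUT_TOKENS = frozenset(
    "A1 AM AU BP BU BX DA DR EI EX FC HR MP NL NU OT PT RR SA TY WD WI".split()
)
OUTPUT_TOKENS = frozenset(
    "1P 2P BD BI BN BS BT BX BY DB EI EX H1 H2 HN HS N2 NA NB NL "
    "P1 P2 PD PT RR RN S2 SD ST TB WI WD XN".split()
)


def unknown_tokens(tokens, allowed):
    """Return the tokens that are not in the allowed set, in original order."""
    return [token for token in tokens if token not in allowed]


# Hypothetical token sequences, used here only to exercise the check.
sample_input = ["NU", "DR", "AU", "TY"]
sample_output = ["HN", "PD", "NA", "ZZ"]  # "ZZ" is a deliberately invalid token

print(unknown_tokens(sample_input, INPUT_TOKENS))
print(unknown_tokens(sample_output, OUTPUT_TOKENS))
```

Note that some tokens, such as BX, EI, EX, NL, PT, RR, WD, and WI, appear in both tables with different meanings, so the input and output sets are kept separate rather than merged.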

Pattern Classes

Each pattern defined in the address patterns file must have an associated pattern class. The pattern class indicates a portion of the input pattern or the type of address data that is represented by the pattern. You can specify any of the following pattern classes.

These classes are also specified as usage flags in the patterns file and the master clues file.

Pattern Modifiers

Each pattern type must be followed by a pattern modifier that indicates how to handle cases where one or more defined patterns are found to be sub-patterns of a larger input pattern. In these cases, the Sun Match Engine must know how to prioritize each defined pattern that is part of the larger pattern. There are two pattern modifiers.

Priority Indicators

The priority indicator is a numeric value that follows the pattern modifier and indicates the priority weight of the pattern. These values work best when defined as a multiple of five from 35 to 95, inclusive. If a pattern is assigned a priority of 90 or 95 and the pattern matches the input pattern, or is a sub-pattern of it, the match engine stops searching for additional matching patterns and uses the high-priority matching pattern.
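
The priority rules above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the engine's actual implementation: the function names and the (name, priority) pairs are hypothetical, and the rule that the highest priority wins among matches below 90 is an assumption, since this section describes only the early stop at 90 or 95.

```python
def is_recommended_priority(value):
    """Return True if value is a multiple of five from 35 to 95, inclusive."""
    return 35 <= value <= 95 and value % 5 == 0


def choose_pattern(matches):
    """Choose one pattern from an iterable of (name, priority) pairs.

    Each pair stands for a defined pattern that already matched the input.
    The search stops at the first pattern with a priority of 90 or 95.
    Otherwise the highest-priority match seen is returned (an assumption),
    or None if there were no matches.
    """
    best = None
    for name, priority in matches:
        if priority in (90, 95):
            return (name, priority)  # high priority: stop searching
        if best is None or priority > best[1]:
            best = (name, priority)
    return best


# Hypothetical matches, listed in the order they are examined.
candidates = [("pattern_a", 40), ("pattern_b", 90), ("pattern_c", 95)]
print(choose_pattern(candidates))
print(is_recommended_priority(42))
```

In this sketch, pattern_c is never examined because pattern_b already carries a priority of 90, which mirrors the early stop described above.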