The address patterns files use pattern type tokens, pattern classes, pattern modifies, and priority indicators to process and parse address data. Before modifying any of the patterns files, you must have a good understanding of these file components.
The address pattern and clues files use tokens to denote different components in a street address, such as street type, house number, street names, and so on. These files use one set of tokens for input fields and another set for output fields. You can use only the predefined tokens to represent address components; the Sun Match Engine does not recognize custom tokens.
Table 20 lists and describes each input token; Table 21 lists and describes each output token.
Table 20 Input Address Pattern Type Tokens
Token |
Description |
---|---|
Alphabetic value, one character in length |
|
Ampersand |
|
Generic word |
|
Building property |
|
Building unit |
|
Post office box |
|
Dash (as a starting character) |
|
Street direction |
|
Extra information |
|
Extension |
|
Numeric fraction |
|
Highway route |
|
Mile posts |
|
Common words, such as “of”, “the”, and so on |
|
Numeric value |
|
Ordinal type |
|
Prefix type |
|
Rural route |
|
State abbreviation |
|
Street type |
|
Descriptor within the structure |
|
Identifier within the structure |
Table 21 lists and describes each output token.
Table 21 Output Address Pattern Tokens
Token |
Description |
---|---|
Building number prefix |
|
Second building number prefix |
|
Property or building directional suffix |
|
Structure (building) identifier |
|
Property or building name |
|
Building number suffix |
|
Property or building type suffix |
|
Post office box descriptor |
|
Structure (building) descriptor |
|
Property or building directional prefix |
|
Extra information |
|
Extension index |
|
First house number (the actual number) |
|
Second house number (house number suffix) |
|
House number |
|
House number suffix |
|
Second street name |
|
Street name |
|
Building number |
|
Conjunctions that connect words or phrases in one component type (usually the street name) |
|
House number prefix |
|
Second house number prefix |
|
Directional prefix to the street name |
|
Street type prefix to the street name |
|
Rural route descriptor |
|
Rural route identifier |
|
Street type suffix to the second street name |
|
Directional suffix to the street name |
|
Street type suffix to the street name |
|
Property or building type prefix |
|
Identifier within the structure |
|
Descriptor within the structure |
|
Post office box identifier |
Each pattern defined in the address patterns file must have an associated pattern class. The pattern class indicates a portion of the input pattern or the type of address data that is represented by the pattern. You can specify any of the following pattern classes.
W - the address pattern represents a unit within a structure, such as an apartment or suite number
T - the address pattern represents a street type or direction
These classes are also specified as usage flags in the patterns file and the master clues file.
Each pattern type must be followed by a pattern modifier that indicates how to handle cases where one or more defined patterns is found to be a sub-pattern of a larger input pattern. In this case, the Sun Match Engine must know how to prioritize each defined pattern that is a part of the larger pattern. There are two pattern modifiers.
* - An asterisk indicates that the priority weight for the matching pattern is averaged down equally with the other matching sub-patterns.
+ - A plus sign indicates that the priority weight for the matching pattern is not averaged down equally with the other matching sub-patterns.
The priority indicator is a numeric value following the pattern modifier that indicates the priority weight of the pattern. These values work best when defined as a multiple of five between and including 35 and 95. If a pattern is assigned a priority of 90 or 95 and the pattern matches, or is a sub-pattern of, the input pattern, the match engine stops searching for additional matching patterns and uses the high-priority matching pattern.