The business patterns file (bizpatterns.dat) defines multiple formats expected from the business name input fields along with the standardized output of each format. The patterns and output appear in two-row pairs in this file, as shown below.
    4 PNT AST SEP-GLC ORT
    PNT AST DEL ORT
The first line describes the input pattern and the second describes the output pattern using tokens to denote each component. The supported tokens are described in Business Name Tokens. A number at the beginning of the first line indicates the number of components in the given business name format. You can modify this file using the following syntax.
    length input-pattern
    output-pattern
The following table lists and describes the components in the above syntax.
Table 20 Business Patterns File Components
Component | Description |
---|---|
length | The number of business name components in the input field. |
input-pattern | Tokens that represent a possible input pattern from the unparsed business name fields. Each token represents one component. For more information about business name tokens, see Business Name Tokens. |
output-pattern | Tokens that represent the output pattern for the specified input pattern. Each token represents one component. For more information about business name tokens, see Business Name Tokens. |
Below is an excerpt from the business patterns file.
    4 PNT AST SEP-GLC ORT
    PNT AST DEL ORT
    4 NFG AJT SEP-GLC ORT
    PNT PNT DEL ORT
    4 NF AJT SEP-GLC ORT
    PNT PNT DEL ORT
    4 CST IDT NF ORT
    PNT PNT PNT ORT
    4 PNT AJT SEP-GLC ORT
    PNT PNT DEL ORT
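The two-row layout can be checked mechanically before the file is deployed. The following Python sketch is not part of the standardization engine; the names `PatternPair` and `parse_pattern_pairs` are hypothetical, and the sketch assumes that tokens are separated by whitespace and that blank lines can be ignored. It reads pattern pairs and confirms that the leading number matches the number of input tokens.

```python
from typing import List, NamedTuple, Tuple


class PatternPair(NamedTuple):
    """One input/output pattern pair (hypothetical helper type)."""
    length: int
    input_tokens: Tuple[str, ...]
    output_tokens: Tuple[str, ...]


def parse_pattern_pairs(text: str) -> List[PatternPair]:
    """Parse two-row pattern pairs, ignoring blank lines (an assumption)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of non-blank rows")
    pairs = []
    # Rows alternate: input pattern (with leading length), then output pattern.
    for first, second in zip(rows[0::2], rows[1::2]):
        length = int(first[0])
        input_tokens = tuple(first[1:])
        if len(input_tokens) != length:
            raise ValueError(
                f"length {length} does not match {len(input_tokens)} input tokens"
            )
        pairs.append(PatternPair(length, input_tokens, tuple(second)))
    return pairs


# The first two pairs from the excerpt above.
EXCERPT = """\
4 PNT AST SEP-GLC ORT
PNT AST DEL ORT
4 NFG AJT SEP-GLC ORT
PNT PNT DEL ORT
"""

pairs = parse_pattern_pairs(EXCERPT)
for pair in pairs:
    print(pair.length, pair.input_tokens, "->", pair.output_tokens)
```

The sketch validates only the input row against the length. Every output row in the excerpt also has as many tokens as its input row, but that is an observation about the excerpt rather than a documented rule, so the sketch does not enforce it.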
The business patterns file uses tokens to denote different components in a business name, such as the primary name, alias type key, URL, and so on. The file uses one set of tokens for input fields and another set for output fields. The tokens indicate the type key files to use to determine the appropriate values for each output field. You can use only the predefined tokens to represent business name components; the standardization engine does not recognize custom tokens.
Table 21 lists and describes each input token; Table 22 lists and describes each output token.
Table 21 Business Name Input Pattern Tokens
Pattern Identifier | Description |
---|---|
 | A connector token |
 | A primary name of a business |
 | A hyphenated primary name of a business |
 | A common business term |
 | The URL of the business's website |
 | A business alias type key (usually an acronym) |
 | A country name |
 | A nationality |
 | A city or state type key |
 | An industry type key |
 | Both an industry and an adjective type key |
 | An adjective type key |
 | An association type key |
 | An organization type key |
 | A separator key |
 | Generic term, not recognized as a specific business name component, with an internal hyphen |
 | Generic term, not recognized as a specific business name component |
 | A single character, not recognized as a specific business name component |
 | A joining comma (a glue type separator) |
 | A joining hyphen (a glue type separator) |
 | The text "and" |
 | A glue type key, such as a forward slash, connecting two parts of a business name component |
 | A business primary name followed by a hyphen and a generic term that is not recognized as a specific business name component |
 | A generic term that is not recognized as a specific business name component, followed by a hyphen and a recognized business primary name |
 | Two generic terms, not recognized as specific business name components and separated by a hyphen |
Table 22 lists and describes each output token.
Table 22 Business Name Output Pattern Tokens
Pattern Identifier | Description |
---|---|
 | The primary name of the business |
 | The URL of the business |
 | The alias type key of the business (usually an acronym) |
 | The industry type key of the business |
 | The association type key of the business |
 | The organization type key of the business |
 | A generic term not recognized as a business name component |
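
Because the standardization engine does not recognize custom tokens, it can help to scan a pattern for unknown tokens before adding it to the file. The Python sketch below is illustrative only. The function `unknown_tokens` and both token sets are hypothetical, and the sets contain only the tokens that appear in the excerpt earlier in this section (including `DEL`, which appears in the excerpt's output rows). A real check would list every identifier from Table 21 and Table 22.

```python
# Illustrative subsets only (an assumption): the tokens visible in the
# excerpt from the business patterns file, not the full predefined lists.
KNOWN_INPUT_TOKENS = {
    "PNT", "AST", "SEP-GLC", "ORT", "NFG", "AJT", "NF", "CST", "IDT",
}
KNOWN_OUTPUT_TOKENS = {"PNT", "AST", "DEL", "ORT"}


def unknown_tokens(pattern, allowed):
    """Return the tokens in a whitespace-separated pattern not in `allowed`."""
    return [token for token in pattern.split() if token not in allowed]


# A pattern taken from the excerpt passes the check.
print(unknown_tokens("PNT AST SEP-GLC ORT", KNOWN_INPUT_TOKENS))

# "XYZ" is a made-up custom token, so it is reported.
print(unknown_tokens("PNT XYZ ORT", KNOWN_INPUT_TOKENS))
```

Keeping separate sets for input and output rows mirrors the file's use of one set of tokens for input fields and another for output fields.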