Understanding the Sun Match Engine

Configuring the Standardization Structure for Address Data (Repository)

The standardization structure is configured in the StandardizationConfig section of the Match Field file, which is described in detail in Match Field Configuration (Repository) in Understanding Sun Master Index Configuration Options (Repository). To configure the required fields for standardization and phonetic encoding, modify the standardization and phonetic encoding structures. The following sections provide additional guidelines and samples specific to standardizing address data.


Note –

In the default configuration, the rules defined for the address data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.


Address Standardization Structures

For address fields, the source fields in the standardization structure must include the fields predefined for parsing and normalization. This includes any fields containing street address information, which are parsed into the street address fields listed in Address Data Object Structure (excluding the phonetic street name field). The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules (Repository) in Configuring Sun Master Indexes (Repository) to define fields for normalization. For the standardization-type attribute, enter Address (for more information, see Sun Match Engine Match and Standardization Types). For a list of field IDs to use in the standardized-object-field-id element, see Table 3.

A sample standardization structure for address data is shown below. This structure parses the first two lines of the street address into the standard street address fields. Only the United States domain is defined in this structure.


<free-form-texts-to-standardize>
   <group standardization-type="ADDRESS"
    domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Person.Address[*].Address1    
         </unstandardized-source-field-name>
         <unstandardized-source-field-name>Person.Address[*].Address2
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>HouseNumber
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>RuralRouteIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>BoxIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>MatchStreetName
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>RuralRouteDescript
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>BoxDescript
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>PropDesPrefDirection
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>PropDesSufDirection
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>StreetNameSufType
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>StreetNamePrefType
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>

Address Phonetic Encoding

When you match or standardize on street address fields, the street name should be specified for phonetic conversion (this is done by default). Follow the instructions under Defining Phonetic Encoding for the Master Index (Repository) in Configuring Sun Master Indexes (Repository) to define fields for phonetic encoding.

A sample of the phoneticize-fields element is shown below. This sample converts only the address street name. You can define additional fields for phonetic encoding.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Address[*].StreetName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Address[*].StreetName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
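
As a minimal sketch of defining an additional field, the fragment below adds a second phoneticize-field entry alongside the street name. The Person.LastName and Person.LastName_Phon field names are hypothetical and would need to exist in your object structure. NYSIIS is reused here only because it is the encoding type shown in the sample above, not because it is necessarily the best choice for the added field.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Address[*].StreetName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Address[*].StreetName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <!-- Hypothetical additional field; substitute field names from your own object structure -->
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.LastName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.LastName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>

Each phoneticize-field entry pairs one source field with one target field, so every additional field to be encoded needs its own entry.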