Understanding the Master Index Standardization Engine

Address Standardization and Sun Master Index

Master index applications rely on the Master Index Standardization Engine to process address data. To ensure that address information is processed correctly, you need to customize the Matching Service for the master index application according to the rules defined for the standardization engine. This includes modifying mefa.xml to define parsing and phonetic encoding of the appropriate fields. You can use the Master Index Configuration Editor to modify mefa.xml.

Standardization is defined in the StandardizationConfig section of mefa.xml, which is described in detail in Match Field Configuration in Understanding Sun Master Index Configuration Options. To configure the required fields for parsing and normalization, modify the standardization structure in mefa.xml. To configure phonetic encoding, modify the phonetic encoding structure. You can perform all of these tasks using the Master Index Configuration Editor.

Generally, the address data type handles data that must be parsed before it can be processed further. You should not need to configure address fields for normalization only. The following topics provide information about the fields used in processing address data and how to configure address data standardization for a master index application. The information provided in these topics is based on the default configuration.

Address Data Processing Fields

When standardizing address data, not all fields in a record need to be processed by the Master Index Standardization Engine. The standardization engine only needs to process address fields that must be parsed, normalized, or phonetically converted. For a master index application, these fields are defined in mefa.xml and processing logic for each field is defined in the Standardization Engine node configuration files.

Address Standardized Fields

The Master Index Standardization Engine expects that street address data will be provided in a free-form text field containing several components that must be parsed. By default, the standardization engine is configured to parse these components and to normalize and phonetically encode the street name. You can specify additional fields for phonetic encoding.
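
To make the idea of parsing a free-form field into components more concrete, the following Python sketch splits a simple street address into a house number, direction, street name, and street type. It is a toy illustration only and does not reproduce the logic of the Master Index Standardization Engine. The function name parse_street_address and the two lookup tables are hypothetical, and the output keys merely mirror the target field names used in the default configuration.

```python
import re

# Hypothetical, heavily simplified lookup tables. The real engine relies on
# its own configuration files, which are not reproduced here.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "AVE", "RD", "BLVD", "DR", "LN"}


def parse_street_address(text):
    """Split a free-form street address into illustrative components."""
    # Drop periods and commas, uppercase, and split on whitespace.
    tokens = re.sub(r"[.,]", "", text).upper().split()
    parsed = {
        "HouseNumber": None,
        "StreetDir": None,
        "StreetName": None,
        "StreetType": None,
    }

    # A leading numeric token is treated as the house number.
    if tokens and tokens[0].isdigit():
        parsed["HouseNumber"] = tokens.pop(0)
    # A directional prefix, if present, comes next.
    if tokens and tokens[0] in DIRECTIONS:
        parsed["StreetDir"] = tokens.pop(0)
    # A trailing street type, if present, is taken from the end.
    if tokens and tokens[-1] in STREET_TYPES:
        parsed["StreetType"] = tokens.pop()
    # Whatever remains is treated as the street name.
    if tokens:
        parsed["StreetName"] = " ".join(tokens)
    return parsed


result = parse_street_address("123 N. Main St")
print(result)
```

A production parser must also handle post office boxes, rural routes, directional suffixes, and street type prefixes, which is why the default standardization structure maps several parsed components to each target field.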

If you specify the Address match type for any field in the wizard, a standardization structure for that field is defined in mefa.xml. The fields listed under Address Object Structure are automatically defined as the target fields. Each of these fields has several entries in the standardization structure. This is because different parsed components can be stored in the same field. For example, the house number, post office box number, and rural route identifier are all stored in the house number field. If you do not specify address fields for matching in the wizard but want to standardize the fields, you can create a standardization structure in mefa.xml using the Master Index Configuration Editor.

Address Object Structure

The address fields specified for standardization are parsed into several additional fields. If you specify the Address match type in the wizard, the following fields are automatically added to the object structure and database creation script.

You can add these fields manually if you do not specify a match type in the wizard.

Configuring a Standardization Structure for Address Data

For free-form address fields, the source fields you define for parsing should include the standardization components that are predefined for parsing and normalization. For example, fields containing address information can include any of the field components listed in Address Data Standardization Components. The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules in Configuring Sun Master Indexes to define fields for standardization. For the standardization-type element, enter Address. For a list of field IDs to use in the standardized-object-field-id element, see Address Data Standardization Components.


Note –

In the default configuration, the rules defined for the address data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.


A sample standardization structure for address data is shown below. This structure parses the first two lines of the street address into the standard street address fields. Only the United States variant is defined in this structure.


<free-form-texts-to-standardize>
   <group standardization-type="ADDRESS"
    domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Person.Address[*].Address1    
         </unstandardized-source-field-name>
         <unstandardized-source-field-name>Person.Address[*].Address2
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>HouseNumber
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>RuralRouteIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>BoxIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>MatchStreetName
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>RuralRouteDescript
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>BoxDescript
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>PropDesPrefDirection
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>PropDesSufDirection
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>StreetNameSufType
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>StreetNamePrefType
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
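
When reviewing a standardization structure, it can help to see which parsed components share a target field. The following Python sketch is a hypothetical helper, not a tool shipped with the product. It reads an abbreviated copy of the sample above (three of the ten target mappings) and groups the field IDs by target field. The function name group_targets is an assumption made for illustration. Element text is stripped because the sample wraps each closing tag onto its own line.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the sample structure: one source field and the three
# mappings that share the HouseNumber target field.
ADDRESS_SAMPLE = """
<free-form-texts-to-standardize>
   <group standardization-type="ADDRESS"
    domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Person.Address[*].Address1
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>HouseNumber
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>RuralRouteIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>BoxIdentif
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
"""


def group_targets(xml_text):
    """Return a dict mapping each target field to its list of field IDs."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for mapping in root.iter("target-mapping"):
        # Strip the surrounding whitespace and line breaks from element text.
        field_id = mapping.findtext("standardized-object-field-id").strip()
        target = mapping.findtext("standardized-target-field-name").strip()
        grouped[target].append(field_id)
    return dict(grouped)


for target, field_ids in group_targets(ADDRESS_SAMPLE).items():
    print(target, "<-", ", ".join(field_ids))
```

Run against the abbreviated sample, the helper reports that the house number, rural route identifier, and box identifier components are all stored in Person.Address[*].HouseNumber.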

Configuring Phonetic Encoding for Address Data

When you match or standardize on street address fields, the street name should be specified for phonetic conversion (this is done by default in a master index application). Follow the instructions under Defining Phonetic Encoding for the Master Index in Configuring Sun Master Indexes to define fields for phonetic encoding.

A sample of the phoneticize-fields element is shown below. This sample converts only the address street name. You can define additional fields for phonetic encoding.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Address[*].StreetName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Address[*].StreetName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
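
As a quick check of a phonetic encoding configuration, the following Python sketch lists the source field, target field, and encoding type for each phoneticize-field entry. It is a hypothetical helper written for illustration and is not part of the master index tooling. The function name list_phonetic_fields is an assumption, and the embedded XML repeats the sample above.

```python
import xml.etree.ElementTree as ET

# Copy of the sample phoneticize-fields element.
PHONETIC_SAMPLE = """
<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Address[*].StreetName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Address[*].StreetName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
"""


def list_phonetic_fields(xml_text):
    """Return (source field, target field, encoding type) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for field in root.iter("phoneticize-field"):
        # Element text is stripped because closing tags sit on their own lines.
        source = field.findtext("unphoneticized-source-field-name").strip()
        target = field.findtext("phoneticized-target-field-name").strip()
        encoding = field.findtext("encoding-type").strip()
        rows.append((source, target, encoding))
    return rows


for source, target, encoding in list_phonetic_fields(PHONETIC_SAMPLE):
    print(source, "->", target, "using", encoding)
```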