Understanding the Sun Match Engine

Configuring the Standardization Structure for Business Names (Repository)

The standardization structure is configured in the StandardizationConfig section of the Match Field file, which is described in detail in Match Field Configuration (Repository) in Understanding Sun Master Index Configuration Options (Repository). To configure the required fields for standardization and phonetic encoding, modify the standardization and phonetic encoding structures. The following sections provide additional guidelines and samples specific to standardizing business names.


Note –

In the default configuration, the rules defined for the business data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.


Business Name Standardization Structures

For business name fields, the source fields in the standardization structure must include the fields predefined for parsing and normalization. This includes any fields containing business name information, which are parsed into the business name fields listed in Business Name Object Structure (excluding the phonetic business name field). The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules (Repository) in Configuring Sun Master Indexes (Repository) to define fields for normalization. For the standardization-type element, enter BusinessName (for more information, see Sun Match Engine Match and Standardization Types). For a list of field IDs to use in the standardized-object-field-id element, see Table 3.

A sample standardization structure for business name data is shown below. This structure parses a business name field into the standard business name fields. No domain selector is specified. Normally the structure would then default to the United States domain, but business names are not domain dependent, so the selector is irrelevant here.


<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Company.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>OrgTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_OrgType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AssocTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_AssocType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustrySectorList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Sector
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustryTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Industry
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AliasList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Alias
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>Url
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_URL
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
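
Because the target fields can include any of the parsed fields, a structure does not have to map all of them. The following sketch, which reuses field IDs and field names from the sample above, maps only two parsed fields. It is an illustration under that assumption, not a recommended configuration; which targets to keep depends on the fields the index actually matches on.

```xml
<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Company.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <!-- Illustrative subset: only the primary name and the alias list are mapped -->
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AliasList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Alias
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
```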

Business Name Phonetic Encoding

When you match on business name fields, the name field should be specified for phonetic conversion (by default, the wizard defines this for you). Follow the instructions under Defining Phonetic Encoding for the Master Index (Repository) in Configuring Sun Master Indexes (Repository) to define fields for phonetic encoding.

A sample of the phoneticize-fields element is shown below. This sample converts only the business name field, but you can define additional fields for phonetic encoding.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
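
One way to define an additional field for phonetic encoding is sketched below. The second phoneticize-field entry is hypothetical: it assumes that Company.Name_Alias is populated by the standardization structure and that a field named Company.Name_AliasPhon has been added to the object structure to hold the encoded value. Neither assumption is part of the sample above, so adjust the names to match your own object structure.

```xml
<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <!-- Hypothetical second entry: Company.Name_AliasPhon is an assumed field name -->
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Alias
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_AliasPhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
```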