Understanding the Sun Match Engine

Configuring the Matching Service for Business Names (Repository)

To process business names correctly, you must customize the Matching Service. This means modifying the Match Field file so that it standardizes the appropriate fields, defines the fields on which you want to match, and specifies the Sun Match Engine as the match and standardization engine. The Sun Match Engine is specified by default, so that setting normally does not need to be changed. Perform the following tasks to configure the Matching Service.

When configuring the Matching Service, keep in mind the information presented in Configuring the Master Index Matching Service (Repository).

Configuring the Standardization Structure for Business Names (Repository)

The standardization structure is configured in the StandardizationConfig section of the Match Field file, which is described in detail in Match Field Configuration (Repository) in Understanding Sun Master Index Configuration Options (Repository). To configure the required fields for standardization and phonetic encoding, modify the standardization and phonetic encoding structures. The following sections provide additional guidelines and samples specific to standardizing business names.


Note –

In the default configuration, the rules defined for the business data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.


Business Name Standardization Structures

For business name fields, the source fields in the standardization structure must include the fields predefined for parsing and normalization. This includes any fields containing business name information, which are parsed into the business name fields listed in Business Name Object Structure (excluding the phonetic business name field). The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules (Repository) in Configuring Sun Master Indexes (Repository) to define fields for normalization. For the standardization-type element, enter BusinessName (for more information, see Sun Match Engine Match and Standardization Types). For a list of field IDs to use in the standardized-object-field-id element, see Table 3.

A sample standardization structure for business name data is shown below. This structure parses a business name field into the standard business name fields. No domain selector is specified. An unspecified domain normally defaults to the United States, but because business names are not domain dependent, the domain is irrelevant here.


<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Company.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>OrgTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_OrgType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AssocTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_AssocType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustrySectorList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Sector
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustryTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Industry
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AliasList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Alias
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>Url
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_URL
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
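
Because the target fields can include any of the parsed fields, a reduced structure is also possible. The following is a minimal sketch, not a recommended configuration: it reuses the element names and field names from the sample above and assumes that only the primary name and organization type are needed in the Company object.


<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Company.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <!-- Only two of the parsed fields are mapped in this sketch. -->
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>OrgTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_OrgType
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>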

Business Name Phonetic Encoding

When you match on business name fields, the name field should be specified for phonetic conversion (by default, the wizard defines this for you). Follow the instructions under Defining Phonetic Encoding for the Master Index (Repository) in Configuring Sun Master Indexes (Repository) to define fields for phonetic encoding.

A sample of the phoneticize-fields element is shown below. This sample converts only the business name. You can define additional fields for phonetic encoding.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
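
To illustrate defining an additional field for phonetic encoding, the sketch below adds a second phoneticize-field entry for the alias field. The source field Company.Name_Alias comes from the standardization sample earlier in this section. The target field Company.Name_AliasPhon is a hypothetical name chosen for this example, and it is assumed that such a field has been added to the object definition.


<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <!-- Additional field; Company.Name_AliasPhon is a hypothetical target. -->
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Alias
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_AliasPhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>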

Configuring the Match String for Business Names (Repository)

When matching on business name fields, make sure the match string you specify in the MatchingConfig section of the Match Field file contains all, or a subset, of the fields that hold the standardized data. Unparsed business names are typically too inconsistent for reliable matching. You can include additional fields in the match string if required.

To configure the match string, follow the instructions under Defining the Master Index Match String (Repository) in Configuring Sun Master Indexes (Repository). For the Sun Match Engine, each data type has a different match type (specified by the match-type element). The PrimaryName, OrgTypeKeyword, AssocTypeKeyword, IndustrySectorList, IndustryTypeKeyword, and Url match types are specific to business name matching. You can also specify any of the other match types defined in the match configuration file. For more information, see Sun Match Engine Match and Standardization Types.

A sample match string for business name matching is shown below. This sample matches on the company name, the organization type, and the sector.


<match-system-object>
   <object-name>Company</object-name>
   <match-columns>
      <match-column>
         <column-name>Enterprise.SystemSBR.Company.Name_Name
         </column-name>
         <match-type>PrimaryName</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Company.Name_OrgType
         </column-name>
         <match-type>OrgTypeKeyword</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Company.Name_Sector
         </column-name>
         <match-type>IndustrySectorList</match-type>
      </match-column>
   </match-columns>
</match-system-object>
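
Because the match string can contain a subset of the standardized fields, a smaller match string is also possible. The sketch below matches on the company name and the URL only. It assumes the target field names defined in the standardization sample earlier in this section (Company.Name_Name and Company.Name_URL) and uses the PrimaryName and Url match types listed above. Whether two columns give adequate match quality depends on your data.


<match-system-object>
   <object-name>Company</object-name>
   <match-columns>
      <!-- Parsed business name, matched with the PrimaryName match type. -->
      <match-column>
         <column-name>Enterprise.SystemSBR.Company.Name_Name
         </column-name>
         <match-type>PrimaryName</match-type>
      </match-column>
      <!-- Parsed URL, matched with the Url match type. -->
      <match-column>
         <column-name>Enterprise.SystemSBR.Company.Name_URL
         </column-name>
         <match-type>Url</match-type>
      </match-column>
   </match-columns>
</match-system-object>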