Configuring Sun Master Indexes

Defining Master Index Standardization Rules

If any of the fields used for searching or matching are entered as free-form text, those fields must be standardized before they are sent to the match engine. Standardization reformats, or parses, the input field into its components and then normalizes some of the parsed components to a standard value. For example, a street address can be parsed into the house number, street name, street type, and so on. The street name and type can then be normalized to their commonly used values: “Ave” might be normalized to “Avenue”, “St.” to “Street”, and so on.

Standardization is defined in mefa.xml. You can define standardization by either using the Configuration Editor or modifying the XML file directly. The changes you make on the Standardization page of the Configuration Editor are reflected in the standardization structures of mefa.xml. The Configuration Editor provides a simplified way of defining standardization.
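
For orientation, the nesting of the standardization structures in mefa.xml can be sketched as follows. This is a skeleton only, assembled from the element names used in this section. Attributes on StandardizationConfig and any other child elements that a real file contains are omitted, the single domain selector is used as an assumption, and the comments stand in for real content.

```xml
<StandardizationConfig>
   <!-- attributes and other child elements are omitted from this sketch -->
   <free-form-texts-to-standardize>
      <!-- one group element per standardization structure -->
      <group standardization-type="Address" domain-selector=
       "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
         <unstandardized-source-fields>
            <!-- one unstandardized-source-field-name element per source field -->
         </unstandardized-source-fields>
         <standardization-targets>
            <!-- one target-mapping element per destination field -->
         </standardization-targets>
      </group>
   </free-form-texts-to-standardize>
</StandardizationConfig>
```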

The following sections describe the tasks you can perform to define standardization.

Defining Master Index Fields to be Standardized

When you define fields for standardization, you can specify the type of standardization to perform on each field or group of fields, the nationality of the data, and a field that indicates which nationality to use (if you specify more than one). You also specify which fields contain the data that needs to be parsed and normalized, and which fields contain the parsed and normalized data. For each standardization structure, you can specify more than one source field, but they must use the same standardization type. The source fields in one standardization structure are concatenated before being parsed.

A sample standardization structure for the XML file is included at the end of these instructions.

To Define Fields to be Standardized (Configuration Editor)

  1. In the Projects window, right-click the Configuration node in the project you want to modify, and then click Edit.

    The Configuration Editor appears.

  2. In the object structure in the left pane, create the fields that will contain the parsed components of the new field to be standardized.

    For more information, see Adding a Field to the Master Index Object Structure.

  3. Click the Standardization tab.

    The Standardization page appears.

  4. Click Add.

    The Standardization Type dialog box appears.

  5. Enter values for the Data Type, Domain Selector, and Variant Field Name fields (these are described in Master Index Normalization and Standardization Structure Properties).

  6. To define a variant for the standardization engine to use, do the following:

    1. In the Variants section, click Add.

    2. On the Variant dialog box, enter values in the fields described in Master Index Variants Properties.

    3. Click OK.

      If you selected the multiple domain selector, you can add multiple variants; otherwise, you can define one default variant and one defined variant.

  7. Under Source Fields to be Standardized, click Add.

    The Select Source Field(s) dialog box appears.

  8. In the left panel, select the field that contains the data that needs to be parsed and normalized, and then click the right arrow.


    Note –

    If the data is contained in more than one field, select all fields that contain the data. For example, a street address might be contained in two fields, such as Street Address and Unit. Both fields should be selected for standardization; they will be concatenated during the standardization process.


  9. If you add a field in error, select the field in the Selected Source Field(s) list, and then click the left arrow.

  10. Click OK.

  11. For each field in which the parsed and normalized data will be stored, do the following:

    1. On the Standardized Fields dialog box, click Add under Target Mappings.

      The Target Mapping dialog box appears.

    2. In the Select Target field, select the name of a field that will contain standardized data.

    3. In the Available Standardization Components list, select the ID associated with the field, and then click Add between the left and right panels.

    4. To change the priority of a component in the Selected Standardization Components list, select the component and then click Move Up or Move Down.

    5. If you add a component in error, select the component in the Selected Standardization Components list, and then click Remove.

    6. Click OK.


      Note –

      For more information about standardization components and the fields to which they pertain, see Understanding the Master Index Standardization Engine.


  12. Click OK on the Standardization Type dialog box.

    The new standardization definition appears in the list.

  13. On the Configuration Editor toolbar, click Save.

To Define Fields to be Standardized (XML Editor)

Before You Begin

In object.xml, create the fields that will contain the parsed components of the field to be standardized. For more information, see Adding a Field to the Master Index Object Structure.

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click mefa.xml.

    The file opens in the NetBeans XML editor.

  2. Scroll to the free-form-texts-to-standardize element in the StandardizationConfig element.

  3. Create a new group element in the free-form-texts-to-standardize element, and then define the standardization-type and domain-selector attributes (these are described in Master Index Normalization and Standardization Structure Properties).

    Make sure the new element falls within the free-form-texts-to-standardize element, but outside any existing group tags.

  4. If you specified the multiple domain selector for the domain-selector attribute, do the following:

    1. In the group element, create a locale-field-name element and a locale-maps element.

    2. Define the elements described in Master Index Variants Properties.

  5. To specify the source fields to standardize, do the following:

    1. If it does not currently exist, create an unstandardized-source-fields element in the appropriate group element (each group element can only include one unstandardized-source-fields element).

    2. For each field standardized by the specified standardization type, create and name a new unstandardized-source-field-name element in the new unstandardized-source-fields element.


      Note –

      If more than one source field is defined, the fields are concatenated prior to standardization (with a pipe (|) between them for the Master Index Standardization Engine). If you want the fields to be processed separately, you need to create two standardization structures. Source fields are designated by their ePaths.


  6. To specify the destination fields for the standardized data, do the following:

    1. In the group element for which destination fields need to be defined, create a standardization-targets element after the unstandardized-source-fields element.

    2. In the new element, create a target-mapping element for each destination field, and then define the last two elements described in Master Index Standardization Source and Target Field Elements.

  7. Save and close the file.
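
The note in step 5 says that source fields that must be processed separately need their own standardization structures. A minimal sketch of that arrangement follows, assuming two address lines that should each be parsed on their own with the single domain selector. The source ePaths and the HouseNumber component ID come from Example 3. The target field Person.Address[*].HouseNumber2 is hypothetical and would first have to be created in object.xml.

```xml
<free-form-texts-to-standardize>
   <!-- Structure 1: AddressLine1 is parsed on its own -->
   <group standardization-type="Address" domain-selector=
    "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Person.Address[*].AddressLine1
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>HouseNumber
            </standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
   <!-- Structure 2: AddressLine2 is parsed on its own -->
   <group standardization-type="Address" domain-selector=
    "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Person.Address[*].AddressLine2
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>HouseNumber
            </standardized-object-field-id>
            <!-- hypothetical target field, not part of the original example -->
            <standardized-target-field-name>Person.Address[*].HouseNumber2
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
```

Because each group has a single source field, nothing is concatenated before parsing.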


Example 3 Address Standardization Structure


<group standardization-type="Address" domain-selector=
 "com.sun.mdm.index.matching.impl.MultiDomainSelector">
   <locale-field-name>Person.Address[*].CountryCode
   </locale-field-name>
   <locale-maps>
      <locale-codes>
         <value>GB</value>
         <locale>UK</locale>
      </locale-codes>
      <locale-codes>
         <value>UNST</value>
         <locale>US</locale>
      </locale-codes>
      <locale-codes>
         <value>AU</value>
         <locale>AU</locale>
      </locale-codes>
      <locale-codes>
         <value>Default</value>
         <locale>AU</locale>
      </locale-codes>
   </locale-maps>
   <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].AddressLine1
      </unstandardized-source-field-name>
      <unstandardized-source-field-name>Person.Address[*].AddressLine2
      </unstandardized-source-field-name>
   </unstandardized-source-fields>
   <standardization-targets>
      <target-mapping>
         <standardized-object-field-id>HouseNumber
         </standardized-object-field-id>
         <standardized-target-field-name>Person.Address[*].HouseNumber
         </standardized-target-field-name>
      </target-mapping>
      <target-mapping>
         <standardized-object-field-id>MatchStreetName
         </standardized-object-field-id>
         <standardized-target-field-name>Person.Address[*].StreetName
         </standardized-target-field-name>
      </target-mapping>
   </standardization-targets>
</group>



Master Index Standardization Source and Target Field Elements

The following table lists and describes the XML elements that define the source and target fields for standardization. The data from the source fields is standardized, and the standardized values are stored in the target fields.

XML File Element or Attribute 

Description 

unstandardized-source-field-name 

The field or fields that contain the data to be standardized. The field is designated by its ePath (for example, Person.FirstName). 

standardized-object-field-id 

A code that identifies the standardized component from the source field to store in the target field. This is specific to the standardization engine in use and must correspond to a standardization component defined by that engine. For more information, see Understanding the Master Index Standardization Engine.

standardized-target-field-name 

The field that stores the standardized data. You can have multiple target fields, depending on how much of the standardized data you want to store. The fields are designated by their ePaths (for example, Person.Alias[*].StdLastName). 

Modifying a Master Index Standardization Definition

You can modify an existing standardization definition. Use caution when modifying standardization after a system is in production because it can cause inconsistent matching results.

To Modify a Standardization Definition (Configuration Editor)

  1. In the Projects window, right-click the Configuration node in the project you want to modify, and then click Edit.

    The Configuration Editor appears.

  2. Click the Standardization tab.

    The Standardization page appears.

  3. In the Standardization Types list, select the definition you want to modify, and then click Edit.

    The Standardization Type dialog box appears.

  4. Do any of the following:

    • Modify any of the fields or perform any of the functions described in Defining Master Index Fields to be Standardized.

    • To modify a variant, select the code under Variants, and then click Edit. Modify either field on the dialog box that appears.

    • To remove a variant, select the code under Variants, and then click Remove. Click Yes on the dialog box that appears.

    • To remove a source field, select the field under Source Fields to be Standardized, and then click Remove. Click Yes on the dialog box that appears.


      Note –

      There must be at least one field in this list.


    • To edit a target field, select the field in the Specifying Target Mappings list and then click Edit.


      Note –

      You can select new components, move selected components up and down in priority, and remove components.


    • To delete a target field, select the field in the Specifying Target Mappings list, and then click Remove. Click Yes on the dialog box that appears.

  5. When you are done making changes, click OK on the Standardization Type dialog box.

  6. On the Configuration Editor toolbar, click Save.

To Modify a Standardization Definition (XML Editor)

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click mefa.xml.

    The file opens in the NetBeans XML editor.

  2. Scroll to the free-form-texts-to-standardize element in the StandardizationConfig element, and locate the group element that defines the structure you want to modify.

  3. To modify the standardization type, change the value of the standardization-type attribute.

  4. To change the variant, change the value of the domain-selector attribute (described in Master Index Normalization and Standardization Structure Properties).

  5. To modify an existing source field, scroll to the appropriate group element, and then change the value of the unstandardized-source-field-name element to the ePath of the new field.

  6. To modify an existing destination field, scroll to the target-mapping element in the standardization-targets section, and then change the value of either target mapping element (these are the last two elements described in Master Index Standardization Source and Target Field Elements).

  7. To remove an existing source field, delete all text between and including the unstandardized-source-field-name tags that define the field.


    Note –

    If no fields require standardization in a defined standardization structure, delete the entire structure as described in Deleting a Master Index Standardization Definition.


  8. To remove an existing destination field, delete all text between and including the target-mapping tags that define the field.


    Note –

    Each standardization structure must have at least one destination field defined for standardized data. If a structure does not contain any fields that need to be standardized, you can delete the entire structure, as described in Deleting a Master Index Standardization Definition.


  9. Save and close the file.
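
As an illustration of steps 7 and 8, the following sketch shows how an Address structure like the one in Example 3 might look after the AddressLine2 source field and the MatchStreetName target mapping have been removed. The single domain selector is assumed here to keep the fragment short, so no locale elements appear. One source field and one target mapping remain, which satisfies the requirement that each structure keep at least one destination field.

```xml
<group standardization-type="Address" domain-selector=
 "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
   <unstandardized-source-fields>
      <!-- the AddressLine2 element was deleted, tags included -->
      <unstandardized-source-field-name>Person.Address[*].AddressLine1
      </unstandardized-source-field-name>
   </unstandardized-source-fields>
   <standardization-targets>
      <!-- the MatchStreetName target-mapping was deleted, tags included -->
      <target-mapping>
         <standardized-object-field-id>HouseNumber
         </standardized-object-field-id>
         <standardized-target-field-name>Person.Address[*].HouseNumber
         </standardized-target-field-name>
      </target-mapping>
   </standardization-targets>
</group>
```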

Deleting a Master Index Standardization Definition

You can delete an existing standardization definition. Avoid deleting a standardization definition after a system is in production, because doing so can cause inconsistent matching results.

To Delete a Standardization Definition (Configuration Editor)

  1. In the Projects window, right-click the Configuration node in the project you want to modify, and then click Edit.

    The Configuration Editor appears.

  2. Click the Standardization tab.

    The Standardization page appears.

  3. In the Standardization Types list, select the definition you want to delete.

  4. Click Remove.

  5. Click Yes on the dialog box that appears.

  6. On the Configuration Editor toolbar, click Save.

To Delete a Standardization Definition (XML Editor)

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click mefa.xml.

    The file opens in the NetBeans XML editor.

  2. Scroll to the free-form-texts-to-standardize element in the StandardizationConfig element.

  3. Do either of the following:

    • To delete an existing standardization structure, delete all text between and including the group tags that define the structure.

      In the example below, to delete the Address structure, delete the second group element, from its opening group tag through its closing group tag.


      <free-form-texts-to-standardize>
         <group standardization-type="BusinessName" domain-selector=
          "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
            ...
         </group>
         <group standardization-type="Address" domain-selector=
          "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
            ...
         </group>
      </free-form-texts-to-standardize>

    • To specify that no fields require standardization, delete all text between, but not including, the free-form-texts-to-standardize tags.

      This deletes all standardization structures.

  4. Save and close the file.
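
The two outcomes of step 3 can be sketched as follows, based on the example shown in that step. The two fragments are alternatives, not one file, and the elided content (...) is left as in the original example.

```xml
<!-- Outcome 1: the Address structure was deleted; BusinessName remains -->
<free-form-texts-to-standardize>
   <group standardization-type="BusinessName" domain-selector=
    "com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
      ...
   </group>
</free-form-texts-to-standardize>

<!-- Outcome 2: all structures were deleted; only the enclosing tags remain -->
<free-form-texts-to-standardize>
</free-form-texts-to-standardize>
```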