Configuring Sun Master Indexes (Repository)

Defining Master Index Fields to be Standardized (Repository)

When you define fields for standardization, you can specify the type of standardization to perform on each field or group of fields, the nationality of the data, and a field that indicates which nationality to use (if you specify more than one). You also specify which fields contain the data that needs to be parsed and normalized, and which fields contain the parsed and normalized data. For each standardization structure, you can specify more than one source field, but they must use the same standardization type. The source fields in one standardization structure are concatenated before being parsed.

A sample standardization structure for the XML file is included at the end of these instructions.

To Define Fields to be Standardized (Configuration Editor)

  1. In the Projects window, right-click the master index application you want to modify, and then click Open.

  2. If the Configuration Editor dialog box appears, click Edit to check out the listed files.

    The Configuration Editor appears.

  3. In the object structure in the left pane, create the fields that will contain the parsed components of the new field to be standardized.

    For more information, see Adding a Field to the Master Index Object Structure (Repository).

  4. Click the Standardization tab.

    The Standardization page appears.

  5. Click Add.

    The Standardization Type dialog box appears.

  6. Enter values for the Type, Domain Selector, and Locale Field Name fields (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  7. To define a national domain for the standardization engine to use, do the following:

    1. In the Locale Codes section, click Add.

    2. On the Locale Codes dialog box, enter values in the fields described in Master Index Locale Codes Properties (Repository).

    3. Click OK.

      If you selected the multiple domain selector, you can add multiple national domains; otherwise, you can define one default national domain and one defined national domain.

  8. Under Source Fields to be Standardized, click Add.

    The Select Source Field(s) dialog box appears.

  9. In the left panel, select the field that contains the data that needs to be parsed and normalized, and then click the right arrow.


    Note –

    If the data is contained in more than one field, select all fields that contain the data. For example, a street address might be contained in two fields, such as Street Address and Unit. Both fields should be selected for standardization; they will be concatenated during the standardization process.


  10. If you add a field in error, select the field in the Selected Source Field(s) list, and then click the left arrow.

  11. Click OK.

  12. For each field in which the parsed and normalized data will be stored, do the following:

    1. On the Standardized Fields dialog box, click Add under Target Mappings.

      The Target Mapping dialog box appears.

    2. In the Select Target field, select the name of a field that will contain standardized data.

    3. In the Available Standardization Components list, select the ID associated with the field, and then click Add between the left and right panels.

    4. To change the priority of a component in the Selected Standardization Components list, select the component and then click Move Up or Move Down.

    5. If you add a component in error, select the component in the Selected Standardization Components list, and then click Remove.

    6. Click OK.


      Note –

      For more information about standardization components and the fields to which they pertain, see Understanding the Sun Match Engine.


  13. Click OK on the Standardization Type dialog box.

    The new standardization definition appears in the list.

  14. On the Configuration Editor toolbar, click Save.

To Define Fields to be Standardized (XML Editor)

Before You Begin

In the Object Definition file, create the fields that will contain the parsed components of the field to be standardized. For more information, see Adding a Field to the Master Index Object Structure (Repository).

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click the Match Field file.

    The file opens in the NetBeans XML editor.

  2. Scroll to the free-form-texts-to-standardize element in the StandardizationConfig element.

  3. Create a new group element in the free-form-texts-to-standardize element, and then define the standardization-type and domain-selector attributes (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

    Make sure the new element falls within the free-form-texts-to-standardize element, but outside any existing group tags.

  4. If you specified the multiple domain selector for the domain-selector attribute, do the following:

    1. In the group element, create a locale-field-name element and a locale-maps element.

    2. Define the elements described in Master Index Locale Codes Properties (Repository).

  5. To specify the source fields to standardize, do the following:

    1. If it does not already exist, create an unstandardized-source-fields element in the appropriate group element (each group element can include only one unstandardized-source-fields element).

    2. For each field to be standardized by the specified standardization type, create an unstandardized-source-field-name element in the unstandardized-source-fields element, and set its value to the ePath of the field.


      Note –

      If more than one source field is defined, the fields are concatenated prior to standardization (with a pipe (|) between them for the Sun Match Engine). If you want the fields to be processed separately, you need to create two standardization structures. Source fields are designated by their ePaths.


  6. To specify the destination fields for the standardized data, do the following:

    1. In the group element for which destination fields need to be defined, create a standardization-targets element after the unstandardized-source-fields element.

    2. In the new element, create a target-mapping element for each destination field, and then define its standardized-object-field-id and standardized-target-field-name elements (the last two elements described in Master Index Standardization Source and Target Field Elements (Repository)).

  7. Save and close the file.
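
The placement described in steps 2 through 6 can be sketched as follows. This is a minimal outline rather than a complete Match Field file. It assumes a single-domain selector, so the locale-field-name and locale-maps elements from step 4 are omitted. It reuses the selector class and field names from the address example in this topic, and it leaves out any attributes and other child elements that the StandardizationConfig element carries in a real file.

```xml
<StandardizationConfig>
   <!-- Attributes and other existing child elements are omitted in this sketch -->
   <free-form-texts-to-standardize>
      <!-- Existing group elements, if any, stay as they are -->

      <!-- Step 3: the new group, placed outside any existing group -->
      <group standardization-type="Address"
             domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
         <!-- Step 5: one unstandardized-source-fields element per group -->
         <unstandardized-source-fields>
            <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
         </unstandardized-source-fields>
         <!-- Step 6: the targets follow the source fields -->
         <standardization-targets>
            <target-mapping>
               <standardized-object-field-id>HouseNumber</standardized-object-field-id>
               <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
            </target-mapping>
         </standardization-targets>
      </group>
   </free-form-texts-to-standardize>
</StandardizationConfig>
```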


Example 3 Address Standardization Structure


<group standardization-type="Address"
       domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
   <locale-field-name>Person.Address[*].CountryCode</locale-field-name>
   <locale-maps>
      <locale-codes>
         <value>GB</value>
         <locale>UK</locale>
      </locale-codes>
      <locale-codes>
         <value>UNST</value>
         <locale>US</locale>
      </locale-codes>
      <locale-codes>
         <value>AU</value>
         <locale>AU</locale>
      </locale-codes>
      <locale-codes>
         <value>Default</value>
         <locale>AU</locale>
      </locale-codes>
   </locale-maps>
   <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
      <unstandardized-source-field-name>Person.Address[*].AddressLine2</unstandardized-source-field-name>
   </unstandardized-source-fields>
   <standardization-targets>
      <target-mapping>
         <standardized-object-field-id>HouseNumber</standardized-object-field-id>
         <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
         <standardized-object-field-id>MatchStreetName</standardized-object-field-id>
         <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
      </target-mapping>
   </standardization-targets>
</group>
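
The example above lists AddressLine1 and AddressLine2 in one structure, so the two fields are concatenated before they are parsed. If the fields should be parsed separately instead, each one needs its own standardization structure, as noted in the XML Editor procedure. The following is only a sketch of that variant. The locale elements are left out for brevity, and the target mappings are shown as placeholder comments because the appropriate target fields depend on the fields you have added to the object structure.

```xml
<!-- First structure: parses AddressLine1 on its own -->
<group standardization-type="Address"
       domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
   <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
   </unstandardized-source-fields>
   <standardization-targets>
      <!-- target-mapping elements for the parsed AddressLine1 components go here -->
   </standardization-targets>
</group>

<!-- Second structure: parses AddressLine2 on its own -->
<group standardization-type="Address"
       domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
   <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].AddressLine2</unstandardized-source-field-name>
   </unstandardized-source-fields>
   <standardization-targets>
      <!-- target-mapping elements for the parsed AddressLine2 components go here -->
   </standardization-targets>
</group>
```

Both group elements sit side by side in the free-form-texts-to-standardize element, and each keeps its single unstandardized-source-fields element.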