Configuring Sun Master Indexes (Repository)

Defining Master Index Normalization Rules (Repository)

Normalization is the part of the standardization process that changes nonstandard values to a common, standard value. For example, the first name a person uses might be a nickname rather than their given name. To ensure that first names are matched properly, nicknames are normalized based on a configurable list, so that “Liz” and “Elizabeth” both normalize to the common value “Elizabeth”.
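The idea can be illustrated with a minimal Python sketch of a nickname lookup. This is not the standardization engine's implementation. The table contents (other than “Liz”) and the function name normalize_first_name are assumptions made for illustration only.

```python
# Hypothetical nickname table. In the product the list is configurable;
# only the "Liz" -> "Elizabeth" pair comes from the text above.
NICKNAMES = {
    "liz": "Elizabeth",
    "beth": "Elizabeth",   # illustrative entry
    "bill": "William",     # illustrative entry
}


def normalize_first_name(name: str) -> str:
    """Return the common form of a first name, or the trimmed name if it is not listed."""
    trimmed = name.strip()
    return NICKNAMES.get(trimmed.lower(), trimmed)


print(normalize_first_name("Liz"))
print(normalize_first_name("Elizabeth"))
```

Both calls yield the same value, which is what allows two records that use different forms of the name to be compared on equal terms.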

Normalization is defined in the Match Field file. You can define normalization either by using the Configuration Editor, which provides a simplified interface, or by modifying the XML file directly. Changes you make on the Normalization page of the Configuration Editor are reflected in the normalization structures of the Match Field file.

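Perform any of the following tasks to define normalization:

  • Defining a Master Index Field to be Normalized (Repository)

  • Modifying a Master Index Normalization Definition (Repository)

  • Deleting a Master Index Normalization Definition (Repository)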

Defining a Master Index Field to be Normalized (Repository)

When you define a field for normalization, you define which field contains the data that needs to be normalized and which field will contain the normalized data. You can also specify one or more national domains to use for normalization. A sample normalization structure for the XML file appears at the end of these instructions.

Procedure: To Define a Field to be Normalized (Configuration Editor)

  1. In the Projects window, right-click the master index application you want to modify, and then click Open.

  2. If the Configuration Editor dialog box appears, click Edit to check out the listed files.

    The Configuration Editor appears.

  3. In the object structure in the left pane, add the field that will contain the normalized value.

    For more information, see Adding a Field to the Master Index Object Structure (Repository).

  4. Click the Normalization tab.

    The Normalization page appears.

  5. Click Add.

    The Normalized Field dialog box appears.

  6. Enter or select a value for each of the fields described in Master Index Normalization and Standardization Structure Properties (Repository).

  7. To specify a national domain for the type of data being standardized, do the following:

    1. In the Locale Field Name field, select the field whose value in incoming records will indicate which variant to use.

    2. In the Locale Codes section, click Add.

    3. On the dialog box that appears, enter values in the fields described in Master Index Locale Codes Properties (Repository).

    4. Click OK.

      If you selected the multiple domain selector, you can add multiple national domains; otherwise, you can add one default national domain and one field-defined national domain.

  8. On the Normalized Field dialog box, click OK.

    The new normalization definition appears in the list.

  9. On the Configuration Editor toolbar, click Save.

Procedure: To Define a Field to be Normalized (XML Editor)

Before You Begin

In the Object Definition file, create the field that will contain the new normalized value. For more information, see Adding a Field to the Master Index Object Structure (Repository).

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click the Match Field file.

    The file opens in the NetBeans XML editor.

  2. In the structures-to-normalize element, create and name a new group element.

    Make sure the new element falls within the structures-to-normalize element, but outside any existing group tags.

  3. In the new group element, define the standardization-type and domain-selector attributes (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  4. If you specified the multiple domain selector for the domain-selector attribute, do the following:

    1. In the group element, create a locale-field-name element and a locale-maps element (described in Master Index Normalization and Standardization Structure Properties (Repository) and Master Index Locale Codes Properties (Repository)).

    2. For each national domain you want to use, define a locale-codes element in the locale-maps element, and define a value element and a locale element within it (described in Master Index Locale Codes Properties (Repository)).

  5. To specify the source fields to normalize, do the following:

    1. Create a new unnormalized-source-fields element in the group element.

    2. Create a source-mapping element in the new unnormalized-source-fields element.

    3. Define the unnormalized-source-field-name and standardized-object-field-id elements (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  6. To map the normalized data to destination fields, do the following:

    1. In the group element, create a new normalization-targets element after the unnormalized-source-fields element that defines the field to map.

    2. Create a target-mapping element in the new normalization-targets element.

    3. Define the standardized-object-field-id and standardized-target-field-name elements (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  7. Save and close the file.


Example 2 First and Last Name Normalization


<structures-to-normalize>
   <group standardization-type="PersonName"
          domain-selector="com.stc.eindex.matching.impl.MultiDomainSelector">
      <locale-field-name>Person.PobCountry</locale-field-name>
      <locale-maps>
         <locale-codes>
            <value>GB</value>
            <locale>UK</locale>
         </locale-codes>
         <locale-codes>
            <value>UNST</value>
            <locale>US</locale>
         </locale-codes>
         <locale-codes>
            <value>Default</value>
            <locale>US</locale>
         </locale-codes>
      </locale-maps>
      <unnormalized-source-fields>
         <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].FirstName</unnormalized-source-field-name>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
         </source-mapping>
         <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].LastName</unnormalized-source-field-name>
            <standardized-object-field-id>LastName</standardized-object-field-id>
         </source-mapping>
      </unnormalized-source-fields>
      <normalization-targets>
         <target-mapping>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdFirstName</standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>LastName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdLastName</standardized-target-field-name>
         </target-mapping>
      </normalization-targets>
   </group>
</structures-to-normalize>
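
For readers who prefer to generate such a structure rather than type it by hand, the XML editor steps can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration. The helper name build_group and its (field ID, ePath) argument layout are assumptions, while the element names, attribute names, and field paths are taken from this section.

```python
import xml.etree.ElementTree as ET


def build_group(std_type, selector, sources, targets):
    """Build a group element; sources and targets are lists of (field_id, epath) pairs."""
    group = ET.Element("group", {
        "standardization-type": std_type,
        "domain-selector": selector,
    })
    # Source fields: where the unnormalized data comes from.
    source_fields = ET.SubElement(group, "unnormalized-source-fields")
    for field_id, epath in sources:
        mapping = ET.SubElement(source_fields, "source-mapping")
        ET.SubElement(mapping, "unnormalized-source-field-name").text = epath
        ET.SubElement(mapping, "standardized-object-field-id").text = field_id
    # Target fields: where the normalized data is stored.
    target_fields = ET.SubElement(group, "normalization-targets")
    for field_id, epath in targets:
        mapping = ET.SubElement(target_fields, "target-mapping")
        ET.SubElement(mapping, "standardized-object-field-id").text = field_id
        ET.SubElement(mapping, "standardized-target-field-name").text = epath
    return group


root = ET.Element("structures-to-normalize")
root.append(build_group(
    "PersonName",
    "com.stc.eindex.matching.impl.SingleDomainSelectorUK",
    [("FirstName", "Person.Alias[*].FirstName")],
    [("FirstName", "Person.Alias[*].StdFirstName")],
))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because a single domain selector is used here, no locale-field-name or locale-maps element is generated. The output would still need to be pasted into the Match Field file and checked against the property tables that follow.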

Master Index Normalization and Standardization Structure Properties (Repository)

The following table lists and describes the Configuration Editor fields and their corresponding XML elements that define the fields to be normalized or standardized in the master index application.

You can specify one or more national domains for data to be standardized. For a single national domain, you need to specify the domain only if the data to standardize is not from the United States. If you are standardizing data from multiple countries, use the multiple domain selector. This requires that one field in the object structure identify which national domain to use for each field that will be standardized. For example, the value of the Country field in a system record could be used to tell the standardization engine which national domain to use for a particular set of data. If you specified the multiple domain selector in the domain-selector attribute, you must also define the identifying field and then map the values that can be populated into that field to their corresponding national domain.

The following rules apply to the multiple domain selector:

For more information about the fields and elements described in the following table, see Understanding the Sun Match Engine.

Configuration Editor Field

XML File Element or Attribute 

Description 

Type

standardization-type 

The type of standardization to perform on the source fields. This is specific to the type of data being processed. 

Domain Selector 

domain-selector 

The Java class used by the standardization engine to determine the national domain of the data being processed. If no selector is specified, the default is the United States domain. The Sun Match Engine supports Australian, French, United Kingdom, and United States national domains. Possible values for this field with the Sun Match Engine are:

  • com.stc.eindex.matching.impl.SingleDomainSelectorAU

  • com.stc.eindex.matching.impl.SingleDomainSelectorFR

  • com.stc.eindex.matching.impl.SingleDomainSelectorUK

  • com.stc.eindex.matching.impl.SingleDomainSelectorUS

  • com.stc.eindex.matching.impl.MultiDomainSelector

Locale Field Name 

locale-field-name 

The ePath to an identifying field in the object structure that identifies which of the defined national domains (element locale-codes) to use. If no field is specified for the Sun Match Engine, the standardization engine defaults to the United States, regardless of whether any national domains are defined. This field must be contained in the object that contains the fields defined for normalization in this structure.

Unnormalized Source

unnormalized-source-field-name

The field that contains the data to be normalized. The field is designated by its ePath (for example, Person.FirstName). 

Unnormalized Standardization Component

standardized-object-field-id

An ID that tells the standardization engine which field to normalize. This ID is specific to the standardization engine and must correspond to a standardization component defined by that engine.

Normalized Standardization Component

standardized-object-field-id

An ID that tells the standardization engine which field contains the normalized data. This ID is specific to the standardization engine in use and must correspond to a standardization component defined by that engine.

Normalized Target

standardized-target-field-name

The field that will store the normalized data. The field is designated by its ePath (for example, Person.Alias[*].StdLastName). 

Master Index Locale Codes Properties (Repository)

The following table lists and describes the Configuration Editor fields and XML elements that define a national domain for normalization or standardization. In the XML file, each value and locale pair is defined within a locale-codes element. A list of locale-codes elements can be defined in the locale-maps element.

Configuration Editor Field 

XML File Element or Attribute 

Description 

Value 

value 

A value that indicates to the standardization engine which national domain to use to standardize the data. When the value is contained in the Locale Field Name field (or the locale-field-name element), the standardization engine uses the corresponding Locale field (or locale element) to determine the national domain. To specify a default national domain, enter “Default”.

Locale

locale 

A code indicating which national domain to use to standardize data when the identifying field value in a transaction matches the corresponding Value field or element. Select one of the following codes. 

  • AU - Australia

  • FR - France

  • UK - United Kingdom

  • US - United States
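
The way Value and Locale pairs select a national domain can be sketched in a few lines of Python. This is a simplified model, not the engine's code. The function name resolve_domain is hypothetical, exact string matching is assumed, and falling back to US when no “Default” entry exists is an assumption of this sketch rather than documented behavior.

```python
def resolve_domain(locale_codes, field_value):
    """Pick a national domain code for one record.

    locale_codes maps each Value entry to its Locale code, mirroring the
    locale-codes elements. field_value is the content of the identifying
    field (the Locale Field Name field) in the incoming record.
    """
    if field_value in locale_codes:
        return locale_codes[field_value]
    # Assumption: use the "Default" entry if present, otherwise US.
    return locale_codes.get("Default", "US")


# Value/Locale pairs taken from the sample structure in this section.
codes = {"GB": "UK", "UNST": "US", "Default": "US"}

print(resolve_domain(codes, "GB"))
print(resolve_domain(codes, "FR"))
```

With these pairs, a record whose identifying field contains GB is handled by the UK domain, and a record with any unlisted value falls through to the entry marked Default.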

Modifying a Master Index Normalization Definition (Repository)

Once you create a normalization definition, you can modify it as needed. Use caution when modifying normalization definitions once a system is in production, because changes can cause inconsistent match results.

Procedure: To Modify a Normalization Definition (Configuration Editor)

  1. In the Projects window, right-click the master index application you want to modify, and then click Open.

  2. If the Configuration Editor dialog box appears, click Edit to check out the listed files.

    The Configuration Editor appears.

  3. Click the Normalization tab.

    The Normalization page appears.

  4. In the Normalization Mappings list, click the definition you want to modify.

  5. Click Edit.

  6. Do any of the following:

    • Modify any of the fields described in Master Index Normalization and Standardization Structure Properties (Repository).

    • To modify a national domain, select the national domain under Locale Codes, and then click Edit. Modify either field on the dialog box that appears.

    • To remove a national domain, select the national domain under Locale Codes, and then click Remove. Click Yes on the dialog box that appears.

  7. Click OK.

  8. On the Configuration Editor toolbar, click Save.

Procedure: To Modify a Normalization Structure (XML Editor)

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click the Match Field file.

    The file opens in the NetBeans XML editor.

  2. Scroll to the structures-to-normalize element in the StandardizationConfig element.

  3. To modify the normalization type, change the value of the standardization-type attribute.

  4. To change the national domain, change the value of the domain-selector attribute as described in Master Index Normalization and Standardization Structure Properties (Repository).

  5. To modify an existing source field, scroll to the unnormalized-source-fields element in the appropriate group element, and then change the value of any source field elements (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  6. To modify an existing destination field, scroll to the normalization-targets element in the appropriate group element, and then change the value of any target field elements (these are described in Master Index Normalization and Standardization Structure Properties (Repository)).

  7. Save and close the file.
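
As a rough illustration of the attribute change in the steps above, the following Python sketch edits a trimmed-down sample with the standard xml.etree.ElementTree module. The SAMPLE document is cut down to the elements named in this section and is not a complete Match Field file, and the helper name set_domain_selector is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample: only the elements named in this section are shown.
SAMPLE = """<StandardizationConfig>
  <structures-to-normalize>
    <group standardization-type="PersonName"
           domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
    </group>
  </structures-to-normalize>
</StandardizationConfig>"""


def set_domain_selector(root, std_type, selector):
    """Set domain-selector on each normalization group of the given type; return the count changed."""
    changed = 0
    for section in root.iter("structures-to-normalize"):
        for group in section.findall("group"):
            if group.get("standardization-type") == std_type:
                group.set("domain-selector", selector)
                changed += 1
    return changed


root = ET.fromstring(SAMPLE)
changed = set_domain_selector(
    root, "PersonName", "com.stc.eindex.matching.impl.SingleDomainSelectorUK")
print(changed, ET.tostring(root, encoding="unicode"))
```

Only group elements directly inside structures-to-normalize are touched, so group elements elsewhere in a file are left alone. ElementTree discards XML comments when parsing with its default settings, so a script like this is best run against a copy of the file.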

Deleting a Master Index Normalization Definition (Repository)

If a defined normalization structure is not needed, you can delete the normalization structure from the standardization configuration. If no data requires normalization, you can delete all normalization structures. It is not recommended that you delete a normalization definition once a system is in production, because doing so can cause inconsistent match results.

Procedure: To Delete a Normalization Definition (Configuration Editor)

  1. In the Projects window, right-click the master index application you want to modify, and then click Open.

  2. If the Configuration Editor dialog box appears, click Edit to check out the listed files.

    The Configuration Editor appears.

  3. Click the Normalization tab.

    The Normalization page appears.

  4. In the Normalization Mappings list, click the definition you want to delete.

  5. Click Remove.

  6. On the Configuration Editor toolbar, click Save.

Procedure: To Delete a Normalization Structure (XML Editor)

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click the Match Field file.

    The file opens in the NetBeans XML editor.

  2. Scroll to the structures-to-normalize element in the StandardizationConfig element.

  3. Do either of the following:

    • To delete an existing normalization structure, delete the group element that defines the structure, including its opening and closing tags and all text between them.

    • To specify that no objects require normalization, delete all text between the opening and closing structures-to-normalize tags, but do not delete the tags themselves.

  4. Save and close the file.
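
The two deletion options can be sketched in Python with the standard xml.etree.ElementTree module. The SAMPLE document is a trimmed illustration rather than a complete Match Field file, the helper name delete_groups is hypothetical, and the "Address" standardization type is used only as an illustrative second value.

```python
import xml.etree.ElementTree as ET

# Trimmed sample with two normalization structures.
SAMPLE = """<StandardizationConfig>
  <structures-to-normalize>
    <group standardization-type="PersonName"
           domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
    </group>
    <group standardization-type="Address"
           domain-selector="com.stc.eindex.matching.impl.SingleDomainSelectorUS">
    </group>
  </structures-to-normalize>
</StandardizationConfig>"""


def delete_groups(root, std_type=None):
    """Remove normalization groups and return how many were removed.

    With std_type, only groups of that type are removed. With None, every
    group is removed but the structures-to-normalize element itself is kept.
    """
    removed = 0
    # Collect the sections first so the tree is not modified while iterating it.
    for section in list(root.iter("structures-to-normalize")):
        for group in list(section.findall("group")):
            if std_type is None or group.get("standardization-type") == std_type:
                section.remove(group)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
print(delete_groups(root, "PersonName"))  # removes one structure
print(delete_groups(root))                # removes whatever remains
```

After the second call the structures-to-normalize element is still present but has no group children, which corresponds to the case where no objects require normalization.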