Understanding Sun Master Index Configuration Options

Survivor Strategy Configuration

The Update Manager contains the logic used to generate the single best record (SBR) for a given object. The SBR is defined by a mapping of fields from external systems to the SBR, allowing you to define the fields from each system that are kept in the SBR. For each field in the SBR, an ePath denotes the location in the external system records from which the value is retrieved. Since there can be many external systems, you can optionally specify a strategy to select the SBR field from the list of external values. You can also specify any additional fields that might be required by the selection strategy to determine which external system contains the best data (by default, the record’s update date and time is always taken into account). The Update Manager also specifies any custom Java classes to be used for different types of update transactions, such as merges, unmerges, changes to existing records, and new record inserts.

The Update Manager is configured in update.xml. The following topics describe the Update Manager and update.xml.

The Survivor Calculator and the SBR

The survivor calculator generates and updates the SBR for each record. The SBR for an enterprise object is created from what is considered to be the most reliable information contained in each system record for a particular object. The information used from each local system to populate the SBR is determined by the survivor calculator defined in the Update Manager. The fields defined in the survivor calculator are also the fields contained in the SBR. You can configure the survivor calculator to determine the best fields for the SBR from a combination of all the source system records. The survivor calculator can consider factors such as the relative reliability of a system, how recent the data is, and whether data entered from the MIDM overwrites data entered from any other system.

The survivor calculator consists of the rules defined for the survivor helper and the weighted calculator.


Note –

Phonetic and standardized fields do not need to be defined in update.xml since their field values are determined by the standardization engine for the SBR.


Update Manager Components

The logic that determines how the fields in the SBR are populated and how certain updates are performed is highly configurable in a master index application, allowing you to design and develop the match strategy that best suits your processing requirements.

Configuring the Update Manager consists of customizing the following components:

Survivor Helper

The survivor helper defines a list of fields on which survivor calculation is performed, and thus the list of fields included in the SBR. Each field is called a candidate field. For each candidate field, you specify whether to use the default survivor calculation strategy or a custom strategy. The survivor helper must list each field contained in the SBR; any fields that are not listed here will not be populated in the SBR.

For each field, you can specify system fields to be taken into consideration as well as a specific survivorship strategy. There are three basic strategies provided by Sun Master Index to determine survivorship for each field. You can define and implement custom strategies.

You can further configure the strategy for each field by filtering out unwanted or invalid values from the SBR. For more information, see SBR, Matching, and Blocking Filter Configuration.

Survivor Helper Default Strategy

This strategy maps fields directly from the local system records to the SBR. When you specify the default survivor strategy for a field, you must also specify the parameter that defines the source system. For example, if you specify the default survivor calculator for the field “Person.LastName” and define the preferred system as “SystemA”, the last name field in the SBR is always taken from SystemA (unless the value is overridden in the MIDM).

The default survivor strategy is com.sun.mdm.index.survivor.impl.DefaultSurvivorStrategy.

Survivor Helper Weighted Strategy

This strategy is the most complex survivor strategy, and uses a combination of weighted calculations to determine the most reliable source of data for each field. This strategy is highly customizable and you can define which calculation or set of calculations to use for each field. The calculations can be based on the update date of the data, system reliability, and agreement between systems. In the default configuration of the file, the calculations are defined in the WeightedCalculator section of the file.

The weighted survivor strategy is com.sun.mdm.index.survivor.impl.WeightedSurvivorStrategy. You can define general weighted calculations to be performed by default for each field, and you can define specialized calculations to be performed for specific fields.

Survivor Helper Union Strategy

This strategy combines the data from all source systems to populate the fields in the SBR for which this strategy is specified. For example, if you store aliases for person names in the database, you want to store all possible alias records and not just the “best” alias information. In order to do this, specify the union strategy for the alias object. This means that all alias information from all source systems is stored in the SBR.

The union strategy is applied to entire objects rather than to fields. This strategy combines all child objects from an enterprise objects source systems to populate the SBR. If the source systems contain two or more instances of a child object with the same unique key (such as two home telephone numbers), the union strategy only populates the most current child object in the SBR. For example, if the union strategy is assigned to the address object and each address object is identified by a unique key (such as the address type), the SBR only contains the most current address record of each address type (for example, one home address, one office address, and so on).

The union strategy is com.sun.mdm.index.survivor.impl.UnionSurvivorStrategy.

Weighted Calculator

By default, the weighted calculator implements the weighted strategy defined above. Use the WeightedCalculator section to define conditions and weights that determine the best information with which to populate the SBR. The weighted calculator selects a single value for the SBR from a set of system fields. The selection process is based on the different qualities defined for each field.

The weighted calculator defines two sets of rules. The default rules apply to all fields in a record except those fields for which rules are specifically defined. The candidate rules only apply to those fields for which they are specifically defined. If you modify the default rules, the changes will apply to all fields except the fields for which candidate rules are defined.

You can define several strategies to help the weighted calculator determine the best information to populate into each field of the SBR. Each of these strategies is defined by a quality, a preference, and a utility. The quality defines the type of weighted calculation to perform, the preference indicates the source being rated, and the utility indicates the reliability. You can define multiple strategies for each field, and a linear summation on the utility score of each strategy determines the best value to populate in the SBR field.

The weighted calculator strategies include:

Weighted Calculator SourceSystem Strategy

This strategy indicates the best source system for a field, and is used when the quality of the field in question depends on its origin. For example, to indicate that the data from SystemA for a specific field is of a higher quality than SystemB, define a SourceSystem quality for “SystemA” and one for “SystemB”. Then assign SystemA a higher utility value (85.0, for example) and SystemB a lower utility value (30.0, for example). This indicates that SystemA is a more reliable source for the field. If both SystemA and SystemB contain the specified field, the value from SystemA is populated into the SBR. If the field is empty in SystemA but the field in SystemB contains a value, then the value from SystemB is used.

Weighted Calculator SystemAgreement Strategy

This strategy prorates the utility score based on the number of systems whose values for the specified field are in agreement. For example, if the first name field for SystemA is “John”, for SystemB is “John”, and for SystemC is “Jon”, SystemA and SystemB together receive two-thirds of the utility score, while SystemC only receives one-third. The value populated into the SBR is “John”. You do not need to define a preference for the SystemAgreement strategy, but you must define source systems.

Weighted Calculator MostRecentModified Strategy

This strategy ranks the field values from the source systems in descending order according to the time that the object was last modified. The value populated in the SBR comes from the most recently modified object. You do not need to define a preference for the MostRecentModified strategy, but you must define a utility.

Update Manager Policies

The Update Manager policies specify custom Java classes that provide additional processing logic for each type of update transaction. By default, this additional processing is not defined in a standard master index application. You can define custom update policies by creating the custom classes in the Source Packages node of the EJB project associated with the main master index project. NetBeans also provides the ability to build and compile the custom Java code, and Sun Master Index automatically incorporates the classes when you generate the application. The Java classes defining the update policies are specified for the master index application in the UpdateManagerConfig element of update.xml.

Update Manager Update Policies

There are seven types of update policies defined in the Update Manager.

Update Manager Update Policy Flag

The update policy section includes a flag that can prevent the update policies from being carried out if no changes were made to the existing record. When set to “true”, the SkipUpdateIfNoChange flag prevents the update policies from being performed when no changes are made to an existing record. Setting the flag to true helps increase performance when processing a large number of updates.

The update.xml File

The properties for the update process are defined in update.xml. Some of the information entered into the default configuration file is based on the fields defined in the wizard and some is standard across all implementations. For most implementations, this file will require customization.

The following topics provide information about working with update.xml:

Modifying update.xml

You can customize the configuration of the Update Manager by modifying update.xml. This file cannot be modified using the Configuration Editor; you need to modify the file directly. You can modify this file at any time, but it is not recommended after moving into production. The configuration controls how the SBR for each object is created, and modifying the file can cause discrepancies in how SBRs are formed before and after the modifications. It might also cause discrepancies in match results, since matching is performed against the SBR. You must regenerate the application and redeploy the project after modifying this file. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.

The update.xml File Structure

This topic describes the structure of the XML file, general requirements, and constraints. It also provides a sample implementation.

update.xmlFile Description

Table 12 lists each element in update.xml and provides a description of each element along with any requirements or constraints for each element.

Table 12 update.xml File Structure

Element/Attribute 

Description 

SurvivorHelperConfig

The configuration of the overall survivor strategy. The SurvivorHelperConfig attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed.

helper-class

The Java class that determines how to retrieve values from system records and to set them in the SBR. The default class uses the ePath notation to retrieve and set the values. 

default-survivor-strategy

The configuration information for the survivor strategy to use as the default. You can define multiple survivor strategies and use different strategy combinations for the candidate fields. Any field that is not assigned a specific survivor strategy in the candidate-definitions list uses the default survivor strategy specified here.

strategy-class

The Java class name of the default strategy. 

parameters

A list of optional parameters for the default survivor strategy. 

parameter

A parameter for the default survivor strategy. The default parameter points to another section in update.xml that configures the default class. There can be multiple parameter elements.

description

An optional element that briefly describes the parameter. 

parameter-name

The name of the parameter. 

  • For the DefaultSurvivorStrategy, this value is preferredSystem.

  • For the WeightedSurvivorStrategy, this value is ConfigurationModuleName.

  • For the UnionSurvivorStrategy. this is not used.

parameter-type

The Java data type for the parameter value. For both the DefaultSurvivorStrategy and the WeightedSurvivorStrategy, this value is java.lang.String.

parameter-value

The value of the named parameter. 

  • For the DefaultSurvivorStrategy, this is the processing code of the source system from which the SBR field value is retrieved.

  • For the WeightedSurvivorStrategy, this is the name of the module-name element that defines the weighted calculator to use as the default strategy (by default, WeightedSurvivorCalculator).

candidate-definitions

The configuration information for the fields to be included in the SBR. For any field that does not use the default survivor strategy, an alternate strategy is defined. All field that are included in the SBR must be listed here. 

candidate-field/name

The qualified field name for a field in the SBR (for more information about field notations, see Master Index Field Notations).

description

A short description of the candidate field (this element is optional). 

system-fields

A field (other than the candidate field) that is evaluated to determine the value for the SBR. One example of this would be to evaluate the last update date of the system records to determine which value is most recent. This element is not currently used by any of the standard survivor strategies provided with the master index application, but might be useful when defining custom strategies. 

field-name

The name of the field to use to determine the value for the SBR. 

survivor-strategy

An alternate survivor strategy to use for the given field in place of the default strategy defined in the default-survivor-strategy element. If a strategy is not specifically defined for a field, the default strategy is used for that field.

strategy-class

The name of the Java class to use for the alternate survivor strategy. 

WeightedCalculator

The configuration of the weighted calculator. By default, this is the strategy specified as the default strategy in the default-survivor-strategy element. The WeightedCalculator attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed unless you create a custom class.

default-parameters

The configuration information for the default weighted calculator logic for all fields except those whose logic is defined in the candidate-field element below.

parameter

A parameter for the default logic of the weighted calculator. 

quality

The type of weighted calculation to perform. You can specify any of the following: 

  • SourceSystem

  • SystemAgreement

  • MostRecentModified

For more information about these qualities, see Weighted Calculator.

preference

The preferred value for the specified quality. This element is only used for the SourceSystem quality and must be a source system code.

utility

A value that indicates the reliability of the specified quality for determining the best field value for the SBR. You define the scale for the utility values. 

candidate-field

A field for which you want to use custom logic for the weighted calculator. The logic you specify here overrides the logic defined in the default-parameters section, but only for the fields specified. Each candidate field is identified by a name attribute and defines the survivor strategies for one field.

candidate-field/name

The name of the candidate field for which you want to define override logic. 

parameter

A parameter configuring the weighted calculator for the candidate field. You can define multiple parameters for each candidate field. 

quality

The type of weighted calculation to perform. You can specify any of the following: 

  • SourceSystem

  • SystemAgreement

  • MostRecentModified

    For more information about these qualities, see Weighted Calculator.

preference

The preferred value for the specified quality. This element is only used for the SourceSystem quality and the preference must be a source system code.

utility

A value that indicates the reliability of the specified quality for determining the best field value for the SBR. You define the scale for the utility values. 

UpdateManagerConfig

The configuration information for the Update Manager. This section defines a list of Java classes to manage custom processing for different types of transactions. You can create the custom classes in the Source Packages folder of the EJB project and then specify those classes here. The UpdateManagerConfig attributes (module-name and parser-class) define the module name and Java class, and their default values should not be changed.

EnterpriseMergePolicy

A class that defines additional processing to perform when two enterprise objects are merged. 

EnterpriseUnmergePolicy

A class that defines additional processing to perform when two enterprise objects are unmerged. 

EnterpriseUpdatePolicy

A class that defines additional processing to perform when a record is updated. 

EnterpriseCreatePolicy

A class that defines additional processing to perform when a new record is created. 

SystemMergePolicy

A class that defines additional processing to perform when two system objects are merged. 

SystemUnmergePolicy

A class that defines additional processing to perform when system objects are unmerged. 

UndoAssumeMatchPolicy

A class that defines additional processing to perform when an assumed match transaction is reversed. 

SkipUpdateIfNoChange

An indicator of whether the update policies are carried out if no changes are made to the existing record. Specify “true” to prevent the update policies from being performed when no changes are made to an existing record. 

update.xml Example

Below is a sample of update.xml using a very small object structure based on person data. Note that standardized and phonetic fields are included in the candidate fields to ensure that they are also included in the SBR. In this sample, all fields use the default strategy except those included in the Alias object, which uses the union strategy. The value that is populated in the LastName field of the SBR is dependent on the SSN field of the system objects. In addition, custom logic is defined only for the SSN field; the remaining fields use the default logic defined in the default-parameters element.


<SurvivorHelperConfig module-name="SurvivorHelper" 
 parser-class="com.sun.mdm.index.configurator.impl.SurvivorHelperConfig">
   <helper-class>com.sun.mdm.index.survivor.impl.DefaultSurvivorHelper
   </helper-class>
   <default-survivor-strategy>
      <strategy-class>
       com.sun.mdm.index.survivor.impl.WeightedSurvivorStrategy
      </strategy-class>
      <parameters>
         <parameter>
            <parameter-name>ConfigurationModuleName</parameter-name>
            <parameter-type>java.lang.String</parameter-type>
            <parameter-value>WeightedSurvivorCalculator
            </parameter-value>
         </parameter>
      </parameters>
   </default-survivor-strategy>
   <candidate-definitions>
      <candidate-field name="Person.LastName">
         <system-fields>
            <field-name>Person.SSN</field-name>
         </system-fields>
      </candidate-field>
      <candidate-field name="Person.FirstName"/>
      <candidate-field name="Person.MiddleName"/>
      <candidate-field name="Person.DOB"/>
      <candidate-field name="Person.Gender"/>
      <candidate-field name="Person.SSN"/>
      <candidate-field name="Person.FnamePhoneticCode"/>
      <candidate-field name="Person.LnamePhoneticCode"/>
      <candidate-field name="Person.StdFirstName"/>
      <candidate-field name="Person.StdLastName"/>
      <candidate-field name="Person.Alias[*].*">
         <survivor-strategy>
            <strategy-class>
             com.sun.mdm.index.survivor.impl.UnionSurvivorStrategy
            </strategy-class>
         </survivor-strategy>
      </candidate-field>
   </candidate-definitions>
</SurvivorHelperConfig>
<WeightedCalculator module-name="WeightedSurvivorCalculator"
 parser-class="com.sun.mdm.index.configurator.impl.WeightedCalculatorConfig">
   <candidate-field name="Person.SSN">
      <parameter>
         <quality>SourceSystem</quality>
         <preference>SBYN</preference>
         <utility>100.0</utility>
      </parameter>
      <parameter>
         <quality>MostRecentModified</quality>
         <utility>75.0</utility>
      </parameter>
   </candidate-field>
   <default-parameters>
      <parameter>
         <quality>MostRecentModified</quality>
         <utility>80.0</utility>
      </parameter>
      <parameter>
         <quality>SourceSystem</quality>
         <preference>SBYN</preference>
         <utility>100.0</utility>
      </parameter>
   </default-parameters>
</WeightedCalculator>
<UpdateManagerConfig module-name="UpdateManager" 
 parser-class="com.sun.mdm.index.configurator.impl.UpdateManagerConfig">
   <EnterpriseMergePolicy>com.sun.mdm.index.user.CustomMergePolicy
   </EnterpriseMergePolicy>
   <EnterpriseUnmergePolicy>com.sun.mdm.index.user.CustomUnmergePolicy
   </EnterpriseUnmergePolicy>
   <EnterpriseUpdatePolicy>com.sun.mdm.index.user.CustomUpdatePolicy
   </EnterpriseUpdatePolicy>
   <EnterpriseCreatePolicy>com.sun.mdm.index.user.CustomCreatePolicy
   </EnterpriseCreatePolicy>
   <SystemMergePolicy>com.sun.mdm.index.user.CustomSystemMergePolicy
   </SystemMergePolicy>
   <SystemUnmergePolicy>com.sun.mdm.index.user.CustomSystemUnmergePolicy
   </SystemUnmergePolicy>
   <UndoAssumeMatchPolicy>com.sun.mdm.index.user.CustomUndoMatchPolicy
   </UndoAssumeMatchPolicy>
   <SkipUpdateIfNoChange>true</SkipUpdateIfNoChange>
</UpdateManagerConfig>

Weighted Calculator Logic

The following sample illustrates how the weighted calculator uses the parameters you define to determine which field values to use in the SBR. Using this sample, if there is a value in only one of the system records but not in the other, that value is used in the SBR regardless of update date. If there is a value in both system records and they were updated at the same time, the SAP field value is used (80.0>30.0). If there is a value in both system records, but CDW was the most recently modified, the value from CDW is populated into the SBR ((30.0+70.0)>80.0)


<default-parameters>
   <parameter>
      <quality>SourceSystem</quality>
      <preference>SAP</preference>
      <utility>80.0</utility>
   </parameter>
   <parameter>
      <quality>MostRecentModified</quality>
      <utility>70.0</utility>
   </parameter>
   <parameter>
      <quality>SourceSystem</quality>
      <preference>CDW</preference>
      <utility>30.0</utility>
   </parameter>
</default-parameters>