Understanding Sun Master Index Configuration Options

Manager Service Configuration

In master.xml, you define certain system parameters for the Manager Service, such as matching thresholds, EUID properties, and the blocking query to use for match processing. The Manager Service is the main interface of the indexing system. This interface coordinates all components of the master index application, including the database, master index project, Master Index Data Manager, runtime environment, and match engine. The main interface is a stateless session bean, though some methods return objects that have handles to stateful beans.

The following topics describe the Manager Service and master.xml.

Manager Service Components

In master.xml, you define certain properties of the match process, such as duplicate and match thresholds, the query to use for matching, logic for automatic merges, and properties of the EUIDs assigned by the master index application (such as their length and whether a checksum value is used). This file also defines the update mode (optimistic or pessimistic) and whether merged records can be updated.

master.xml configures the following Manager Service components: the Master Controller, the Decision Maker, and the EUID generator.

Master Controller Configuration

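The MasterControllerConfig element of master.xml controls several aspects of the matching and update process: custom logic classes, the update mode, merged record updates, the blocking query, and transactional support.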

Custom Logic Classes in master.xml

Custom logic classes specify any custom plug-ins created for the master index project that define custom processing for the execute match methods. If no classes are specified, execute match processing is carried out using the default logic (described in Understanding Sun Master Index Processing).
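A minimal sketch of how the custom logic classes might be declared inside MasterControllerConfig, using the element names from the file structure table. CustomMatchLogic and CustomMatchLogicMIDM are the placeholder names used in the sample configuration in this document, not classes shipped with the product. Both elements are optional.

```xml
<!-- Custom plug-in for execute match calls made from client applications -->
<logic-class>CustomMatchLogic</logic-class>
<!-- Custom plug-in for the execute match call made from the Master Index Data Manager -->
<logic-class-gui>CustomMatchLogicMIDM</logic-class-gui>
```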

Update Mode in master.xml

The update mode specifies whether a record’s potential duplicate list is reevaluated when key fields are updated in the record. Performing the reevaluation helps keep the potential duplicate list current, but requires more system resources.

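There are two update modes:

  • Pessimistic – Potential duplicates are recalculated when a record is updated.

  • Optimistic – Potential duplicates are not recalculated when a record is updated.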
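As an illustration, the update mode is a single element inside MasterControllerConfig. The value shown is the one used in the sample configuration in this document; treat it as an example rather than a recommendation.

```xml
<!-- Pessimistic: recalculate potential duplicates when a record is updated.
     Optimistic: skip the recalculation on updates. -->
<update-mode>Pessimistic</update-mode>
```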

Merged Record Updates in master.xml

The merge update status determines whether changes can be made to records that have a status of “merged”. These are the EUID records that are not retained after a merge. For example, when an incoming record is an assumed match with an SBR that has a status of “merged”, the master index application checks the value of the merged-record-update element. If the element is set to “Enabled”, the merged SBR is updated with the new information. If the element is set to “Disabled”, an exception is thrown and the update is not performed. Typically, it is recommended that merged records not be updated.
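A sketch of the corresponding setting, using the value from the sample configuration in this document:

```xml
<!-- Disabled: an update to a record with a status of "merged" throws an
     exception and is not performed.
     Enabled: the merged record is updated with the new information. -->
<merged-record-update>Disabled</merged-record-update>
```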

Blocking Query in master.xml

The blocking query, specified by the query-builder element, identifies one of the queries defined in query.xml as the query to use for match processing. This query is used by the master index application when searching for a candidate pool of possible matches to an incoming record. If the query takes any parameters, they are defined using the option element.
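A sketch of a blocking query reference, assuming that option is nested inside query-builder and that key and value are written as attributes, as the file structure table suggests. BLOCKER-SEARCH comes from the sample configuration in this document. The option name and value are hypothetical, since no predefined blocking query takes parameters.

```xml
<execute-match>
   <!-- name must match a query defined in query.xml -->
   <query-builder name="BLOCKER-SEARCH">
      <!-- Hypothetical parameter, shown only to illustrate the key/value form -->
      <option key="exampleOption" value="exampleValue"/>
   </query-builder>
</execute-match>
```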

Transactional Support

Sun Master Index supports local and distributed transaction processing. You can configure the master index application to distribute transactions across applications, to distribute transactions only within the master index application, or to not use distributed transactions at all. This is defined in the transaction element.
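A sketch of the transaction setting, assuming the mode is given as the text content of the transaction element inside MasterControllerConfig. The three values are the ones listed in the file structure table; the choice of CONTAINER here is only an example.

```xml
<!-- LOCAL: transactions are not distributed.
     CONTAINER: transactions are distributed across applications.
     BEAN: transactions are distributed within the master index application. -->
<transaction>CONTAINER</transaction>
```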

Decision Maker

The DecisionMakerConfig element of master.xml allows you to specify how the Manager Service evaluates query results. For the default Decision Maker, you can configure four parameters: OneExactMatch, SameSystemMatch, DuplicateThreshold, and MatchThreshold.

When the master index application processes an incoming record, it compares the new record against existing records in the database and assigns a matching weight to each possible match. The master index application uses the values that you specify in this section to determine how to handle records that fall within certain matching weight ranges. Records with a matching weight at or above the duplicate threshold but below the match threshold are treated as potential duplicates. Records with a matching weight at or above the match threshold are treated as potential duplicates or assumed matches, depending on the value of the OneExactMatch parameter and the number of records with a matching weight at or above the match threshold.

OneExactMatch

This parameter specifies logic for assumed matches. If OneExactMatch is set to true and there is more than one record above the match threshold, then none of the records are considered an assumed match and all are flagged as potential duplicates. If OneExactMatch is set to false and there is more than one record above the match threshold, then the record with the highest matching weight is considered an assumed match and the rest are flagged as potential duplicates.
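To make the rule concrete, here is a sketch of the OneExactMatch parameter as it might appear in the Decision Maker parameters list, with a worked scenario in the comment. The match threshold of 29.0 is borrowed from the sample configuration in this document, and the two candidate weights are invented for illustration.

```xml
<!-- Scenario: MatchThreshold is 29.0 and two candidates score 31.5 and 30.0.
     true:  neither is an assumed match; both are flagged as potential
            duplicates.
     false: the 31.5 candidate is the assumed match; the 30.0 candidate is
            flagged as a potential duplicate. -->
<parameter>
   <parameter-name>OneExactMatch</parameter-name>
   <parameter-type>java.lang.Boolean</parameter-type>
   <parameter-value>true</parameter-value>
</parameter>
```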

SameSystemMatch

This parameter indicates whether the master index application will match two records that originated from the same system whose matching weight falls above the match threshold. If SameSystemMatch is set to true, no assumed matches are made between records associated with the same system. If SameSystemMatch is set to false, assumed matches can be made between records associated with the same system.

DuplicateThreshold

The duplicate threshold specifies the matching probability weight at or above which two records are considered to potentially represent the same object. Records with matching weights between the duplicate and match thresholds are always flagged as potential duplicates. A thorough data analysis combined with testing will help determine the best value for the duplicate and match thresholds.

MatchThreshold

The match threshold specifies the matching probability weight at or above which two records are assumed to be a match and are automatically merged in the master index database.
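Together, the two thresholds divide matching weights into three ranges. The sketch below uses the threshold values from the sample configuration in this document; the candidate weights in the comment are invented for illustration, and suitable thresholds depend on analysis of your own data.

```xml
<!-- With these values:
     weight 30.0 (at or above 29.0): candidate for an assumed match, subject
                                     to OneExactMatch and SameSystemMatch
     weight 12.0 (7.25 up to 29.0):  flagged as a potential duplicate
     weight  5.0 (below 7.25):       not treated as a potential duplicate -->
<parameters>
   <parameter>
      <parameter-name>DuplicateThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>7.25</parameter-value>
   </parameter>
   <parameter>
      <parameter-name>MatchThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>29.0</parameter-value>
   </parameter>
</parameters>
```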

EUID Generator

The EUID generator controls how EUIDs are created for each unique record in the master index database. For the default EUID generator, you can define three parameters.

IdLength

This parameter defines the length of the EUIDs created by the master index application. By default, the length of the EUID columns in the master index database is 20. If you choose an ID length larger than 20, make sure to manually modify the length of the EUID columns in the database creation scripts.

ChecksumLength

The ChecksumLength parameter allows you to specify the length of a checksum value. Checksum values help validate EUIDs to ensure accurate identification of records as they are transmitted throughout the system. The checksum process attaches a number, generated through an algorithm, to the end of a new EUID. When a host system receives this number, it strips off the checksum digits to obtain the EUID, and then recalculates the checksum using the same algorithm. If the checksum values agree, the host system knows the EUID number is correct. Specify “0” (zero) if you do not want to use the checksum function.

Using a checksum value adds to the EUID length set by the IdLength parameter. If you specify a checksum length greater than 0, the EUID generator creates sequential EUIDs based on the sbyn_seq_table table, and then appends the checksum value to the end of the EUID to determine the final EUID number. For example, if you set IdLength to 8 and ChecksumLength to 2, then the EUIDs assigned by the master index application will be 10 characters long. If the next sequence number is 10908000, the EUID assigned to the next record is 10908000 plus the checksum (it might be 1090800034, for example). The next EUID would be 10908001 plus the checksum (1090800125, for example). The first eight digits are sequential, but the last two digits are seemingly arbitrary.

If you use a checksum value, make sure to take into consideration the total length of the EUIDs (IdLength plus ChecksumLength) when determining the length of the EUID columns in the database.
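The length arithmetic can be sketched with the values from the example above: an IdLength of 8 plus a ChecksumLength of 2 gives 10-character EUIDs, which fits within the default EUID column length of 20. The parameter names follow the sample configuration in this document.

```xml
<parameters>
   <!-- 8 sequential digits, for example 10908000 -->
   <parameter>
      <parameter-name>IdLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>8</parameter-value>
   </parameter>
   <!-- 2 checksum digits appended, giving a 10-character EUID
        such as 1090800034 -->
   <parameter>
      <parameter-name>ChecksumLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>2</parameter-value>
   </parameter>
</parameters>
```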

ChunkSize

For efficiency, the default EUID generator does not need to query the sbyn_seq_table table in the database each time a new EUID is created. Instead, you can specify a number of EUIDs to be allocated in chunks to the EUID generator. For example, if you specify a chunk size of 1000, EUIDs are allocated to the generator 1000 ID numbers at a time. The generator can process up to 1000 new records and assign all 1000 numbers without needing to query sbyn_seq_table. When all 1000 EUIDs are used, another 1000 are allocated. If the server running the master index application is reset before all 1000 numbers are used, the unused numbers are discarded and never used, meaning that EUIDs might not always be assigned sequentially.

Specifying a chunk size affects the numbering of the EUID column in the sbyn_seq_table. If you specify a chunk size of 1, then each time a new EUID is assigned, the value of the EUID column increases by one. If you specify a larger chunk size, then the value of the EUID column increases by the value of the chunk size each time the allocated EUIDs are used. For example, if you specify a chunk size of 1000, the beginning EUID sequence number is 1000, even though EUIDs are assigned beginning with 0001, then 0002, and so on. When the first 1000 EUIDs are assigned, another 1000 EUID numbers are allocated to the generator and the EUID column changes from 1000 to 2000.
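A sketch of the chunk size setting, using the value of 1000 from the example above. The comment restates the behavior described in this section.

```xml
<!-- With a chunk size of 1000, the generator queries sbyn_seq_table once per
     1000 EUIDs, and the EUID column in that table advances by 1000 each time
     (1000, then 2000, and so on). If the server is reset, any unused numbers
     in the current chunk are discarded. -->
<parameter>
   <parameter-name>ChunkSize</parameter-name>
   <parameter-type>java.lang.Integer</parameter-type>
   <parameter-value>1000</parameter-value>
</parameter>
```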

The master.xml File

The properties of the Manager Service are defined in master.xml. The default configuration file contains the same standard settings for every implementation, so you will need to customize it for your project.

The following topics provide information about working with master.xml:

Modifying master.xml

You can modify master.xml at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Use caution when updating this file after moving into production, since changing certain properties, such as the blocking query, can cause unexpected matching and weighting results. Most of the configuration options in this file cannot be modified using the Configuration Editor. The exceptions are the match and duplicate thresholds. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.

The master.xml File Structure

This topic describes the structure of the XML file, general requirements, and constraints. It also provides a sample implementation.

master.xml File Description

Table 10 lists each element in master.xml, with a description and any requirements or constraints that apply.

Table 10 master.xml File Structure

Element/Attribute 

Description 

MasterControllerConfig

The configuration class for the Manager Service. The attributes define the module name and Java class. The default values should not be changed. 

logic-class

A custom plug-in that defines custom processing logic for the execute match functions that can be called from client applications. This element is optional. 

logic-class-gui

A custom plug-in that defines custom processing logic for the execute match function that is called from the Master Index Data Manager (MIDM). This element is optional. 

update-mode

An indicator of whether to recalculate potential duplicates when a record is updated. Specify Pessimistic to recalculate potential duplicates; specify Optimistic to prevent potential duplicate recalculation on updates.

merged-record-update

An indicator of whether records with a status of Merged can be updated. Specify Enabled to allow updates of merged records; specify Disabled to ensure that records with a Merged status are not updated.

execute-match

Specifies the blocking query to use for match processing. 

query-builder

The name of the blocking query to use for match processing. The name must match a query defined in query.xml. 

option

Optional parameters for the blocking query. Currently, none of the predefined blocking queries use parameters.

option/key

A parameter for the blocking query. 

option/value

The value of the parameter named by the corresponding key attribute.

transaction

The transaction mode for the master index application. Specify one of the following values: 

  • LOCAL – Transactions are not distributed.

  • CONTAINER – Transactions are distributed across applications.

  • BEAN – Transactions are distributed within the master index application.

DecisionMakerConfig

The configuration class for the Decision Maker. The attributes define the module name and Java class. The default values should not be changed. 

decision-maker-class

The Java class that contains the methods used by the Decision Maker class. The default value, com.sun.mdm.index.decision.impl.DefaultDecisionMaker, should not need to be changed, but you can implement a custom Decision Maker class. The default class accepts the parameters described below.

parameters

A list of parameters for the Decision Maker class. 

parameter

A definition of a Decision Maker parameter. The parameters element can contain multiple parameter elements, each defining one parameter.

description

A brief description of the parameter. This element is optional. 

parameter-name

The name of the parameter. The default Decision Maker class takes the following parameters (see Decision Maker for more information about these parameters).

  • OneExactMatch - A Boolean indicator of whether an assumed match is made when more than one record is above the match threshold.

  • SameSystemMatch - A Boolean indicator of whether an assumed match can be made between two records that originate from the same external system.

  • DuplicateThreshold - The lowest match weight at which two records are considered to be potential duplicates.

  • MatchThreshold - The lowest match weight at which two records are assumed to be a match of one another.

parameter-type

The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.

parameter-value

The value of the parameter. For OneExactMatch and SameSystemMatch, this must be a Boolean value. For MatchThreshold and DuplicateThreshold, this must be a Float value.

EuidGeneratorConfig

The configuration class for the EUID Generator. The attributes define the module name and Java class. The default values should not be changed. 

euid-generator-class

The Java class used by the master index application to generate new EUIDs. The default class is com.sun.mdm.index.idgen.impl.DefaultEuidGenerator, which assigns sequential EUIDs based on the three parameters described below.

parameters

A list of parameters for the EUID Generator class. 

parameter

A parameter definition. The parameters element can contain multiple parameter elements, each defining one parameter.

description

A brief description of the parameter. This element is optional. 

parameter-name

The name of the parameter. The default EUID Generator class takes the following parameters (see EUID Generator for more information about these parameters).

  • IdLength - The length of the EUIDs generated by the master index application.

  • ChecksumLength - The length of the checksum value used to validate EUIDs.

  • ChunkSize - The number of EUIDs allocated to the server at one time.

parameter-type

The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.

parameter-value

The value of the parameter. For the default parameters, the values are all integers. 

master.xml Example

Below is a sample master.xml configuration.


<MasterControllerConfig module-name="MasterController" parser-class=
 "com.sun.mdm.index.configurator.impl.master.MasterControllerConfiguration">
   <logic-class>CustomMatchLogic</logic-class>
   <logic-class-gui>CustomMatchLogicMIDM</logic-class-gui>
   <update-mode>Pessimistic</update-mode>
   <merged-record-update>Disabled</merged-record-update>
   <execute-match>
      <query-builder name="BLOCKER-SEARCH"></query-builder>
   </execute-match>
</MasterControllerConfig>
<DecisionMakerConfig module-name="DecisionMaker" parser-class=
 "com.sun.mdm.index.configurator.impl.decision.DecisionMakerConfiguration">
   <decision-maker-class>
      com.sun.mdm.index.decision.impl.DefaultDecisionMaker
   </decision-maker-class>
   <parameters>
      <parameter>
         <parameter-name>OneExactMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>false</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>SameSystemMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>true</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>DuplicateThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>7.25</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>MatchThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>29.0</parameter-value>
      </parameter>            
   </parameters>
</DecisionMakerConfig>
<EuidGeneratorConfig module-name="EuidGenerator" parser-class=
"com.sun.mdm.index.configurator.impl.idgen.EuidGeneratorConfiguration">
   <euid-generator-class>
      com.sun.mdm.index.idgen.impl.DefaultEuidGenerator
   </euid-generator-class>
   <parameters>
      <parameter>
         <parameter-name>IdLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>10</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChecksumLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>0</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChunkSize</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>1000</parameter-value>
      </parameter>
   </parameters>
</EuidGeneratorConfig>