Oracle Java CAPS Master Index Configuration Reference     Java CAPS Documentation

Manager Service Configuration

In master.xml, you define certain system parameters for the Manager Service, such as matching thresholds, EUID properties, and the blocking query to use for match processing. The Manager Service is the main interface of the indexing system. This interface coordinates all components of the master index application, including the database, master index project, Master Index Data Manager, runtime environment, and match engine. The main interface is a stateless session bean, though some methods return objects that have handles to stateful beans.

The following topics describe the Manager Service and master.xml.

Manager Service Components

In master.xml, you define certain properties of the match process, such as duplicate and match thresholds, the query to use for matching, logic for automatic merges, and properties of the EUIDs assigned by the master index application (such as their length and whether a checksum value is used). This file is also used to define the update mode (optimistic or pessimistic) and merged record updates.

The following Manager Service components are configured by master.xml: the Master Controller, the Decision Maker, and the EUID Generator.

Master Controller Configuration

The MasterControllerConfig element of master.xml controls four components of the matching and update process.

Custom Logic Classes in master.xml

Custom logic classes specify any custom plug-ins created for the master index project that define custom processing for the execute match methods. If no classes are specified, execute match processing is carried out using the default logic (this is described in Oracle Java CAPS Master Index Processing Reference).

Update Mode in master.xml

The update mode specifies whether a record’s potential duplicate list is reevaluated when key fields are updated in the record. Performing the reevaluation helps keep the potential duplicate list current, but requires more system resources.

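There are two update modes. In Pessimistic mode, potential duplicates are recalculated when a record is updated; in Optimistic mode, they are not.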
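As an illustration, an implementation that favors lower resource use over an always-current potential duplicate list might set the mode as shown in this fragment of the MasterControllerConfig element. This is a sketch of one possible setting, not a recommendation.

```xml
<!-- Optimistic: potential duplicates are not recalculated when a record is updated.
     Pessimistic: potential duplicates are recalculated on update. -->
<update-mode>Optimistic</update-mode>
```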

Merged Record Updates in master.xml

The merge update status determines whether changes can be made to records that have a status of “merged”. These are the EUID records that are not retained after a merge. For example, when an incoming record is an assumed match with an SBR that has a status of “merged”, the master index application checks the value of the merged-record-update element. If the element is set to “Enabled”, the merged SBR is updated with the new information. If the element is set to “Disabled”, an exception is thrown and the update is not performed. Typically, it is recommended that merged records not be updated.

Blocking Query in master.xml

The blocking query, specified by the query-builder element, identifies one of the queries defined in query.xml as the query to use for match processing. This query is used by the master index application when searching for a candidate pool of possible matches to an incoming record. If the query takes any parameters, they are defined using the option element.
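The fragment below sketches how a query parameter might be supplied. The key ExampleOption and its value are hypothetical placeholders, and nesting the option element inside query-builder is an assumption; none of the predefined blocking queries use parameters.

```xml
<execute-match>
   <!-- The name must match a query defined in query.xml. -->
   <query-builder name="BLOCKER-SEARCH">
      <!-- Hypothetical key and value, shown only to illustrate the two attributes. -->
      <option key="ExampleOption" value="true"/>
   </query-builder>
</execute-match>
```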

Transactional Support

Oracle Java CAPS Master Index supports local and distributed transaction processing. You can configure the master index application to distribute transactions across applications, to distribute transactions only within the master index application, or to not use distributed transactions at all. This is defined in the transaction element.
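For example, an application that does not need distributed transactions might use a setting like the one below. The placement of the element as a child of MasterControllerConfig is an assumption made for this sketch.

```xml
<!-- LOCAL: transactions are not distributed.
     CONTAINER: transactions are distributed across applications.
     BEAN: transactions are distributed within the master index application. -->
<transaction>LOCAL</transaction>
```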

Decision Maker

The DecisionMakerConfig element of master.xml allows you to specify how the Manager Service evaluates query results. For the default Decision Maker, you can configure these parameters: OneExactMatch, SameSystemMatch, DuplicateThreshold, and MatchThreshold.

When the master index application processes an incoming record, it compares the new record against existing records in the database and assigns a matching weight to each possible match. The master index application uses the values that you specify in this section to determine how to handle records that fall within certain matching weight ranges. Records with a matching weight at or above the duplicate threshold but below the match threshold are treated as potential duplicates. Records with a matching weight at or above the match threshold are treated as potential duplicates or assumed matches, depending on the value of the OneExactMatch parameter and the number of records with a matching weight at or above the match threshold.

OneExactMatch

This parameter specifies logic for assumed matches. If OneExactMatch is set to true and there is more than one record above the match threshold, then none of the records are considered an assumed match and all are flagged as potential duplicates. If OneExactMatch is set to false and there is more than one record above the match threshold, then the record with the highest matching weight is considered an assumed match and the rest are flagged as potential duplicates.

SameSystemMatch

This parameter indicates whether the master index application will match two records that originated from the same system whose matching weight falls above the match threshold. If SameSystemMatch is set to true, no assumed matches are made between records associated with the same system. If SameSystemMatch is set to false, assumed matches can be made between records associated with the same system.

DuplicateThreshold

The duplicate threshold specifies the matching probability weight at or above which two records are considered to potentially represent the same object. Records with matching weights between the duplicate and match thresholds are always flagged as potential duplicates. A thorough data analysis combined with testing will help determine the best value for the duplicate and match thresholds.

MatchThreshold

The match threshold specifies the matching probability weight at or above which two records are assumed to be a match and are automatically merged in the master index database.
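To make the interaction of these parameters concrete, the following sketch reuses the threshold values from the sample configuration (7.25 and 29.0) and sets OneExactMatch to true. The candidate weights mentioned in the comments are invented for illustration only.

```xml
<parameters>
   <!-- Two candidates weighted 30.0 and 31.5 are both at or above MatchThreshold.
        With OneExactMatch true, neither is an assumed match; both are potential duplicates.
        With OneExactMatch false, the 31.5 candidate would be the assumed match and
        the 30.0 candidate a potential duplicate. -->
   <parameter>
      <parameter-name>OneExactMatch</parameter-name>
      <parameter-type>java.lang.Boolean</parameter-type>
      <parameter-value>true</parameter-value>
   </parameter>
   <!-- A candidate weighted 12.0 falls between the two thresholds and is flagged
        as a potential duplicate. A candidate weighted 5.0 is below this threshold
        and is not treated as a potential duplicate. -->
   <parameter>
      <parameter-name>DuplicateThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>7.25</parameter-value>
   </parameter>
   <parameter>
      <parameter-name>MatchThreshold</parameter-name>
      <parameter-type>java.lang.Float</parameter-type>
      <parameter-value>29.0</parameter-value>
   </parameter>
</parameters>
```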

EUID Generator

The EUID generator controls how EUIDs are created for each unique record in the master index database. For the default EUID generator, you can define three parameters.

IdLength

This parameter defines the length of the EUIDs created by the master index application. By default, the length of the EUID columns in the master index database is 20. If you choose an ID length larger than 20, make sure to manually modify the length of the EUID columns in the database creation scripts.

ChecksumLength

The ChecksumLength parameter allows you to specify the length of a checksum value. Checksum values help validate EUIDs to ensure accurate identification of records as they are transmitted throughout the system. The checksum process attaches a number, generated through an algorithm, to the end of a new EUID. When a host system receives this number, it strips off the checksum digits to obtain the EUID, and then recalculates the checksum using the same algorithm process. If the checksum values agree, the host system knows the EUID number is correct. Specify “0” (zero) if you do not want to use the checksum function.

Using a checksum value affects the IdLength parameter. If you specify a checksum length greater than 0, the EUID generator creates sequential EUIDs based on the sbyn_seq_table table, and then appends the checksum value to the end of the EUID to determine the final EUID number. For example, if you set IdLength to 8 and ChecksumLength to 2, then the EUIDs assigned by the master index application will be 10 characters long. If the next sequence number is 10908000, the EUID assigned to the next record is 10908000 plus the checksum (it might be 1090800034, for example). The next EUID would be 10908001 plus the checksum (1090800125, for example). The first eight digits are sequential, but the last two digits are seemingly arbitrary.

If you use a checksum value, make sure to take into consideration the total length of the EUIDs (IdLength plus ChecksumLength) when determining the length of the EUID columns in the database.
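The length arithmetic described above can be sketched as the following parameter fragment, which uses the values from the example (IdLength of 8 and ChecksumLength of 2). It is an illustration of the sizing rule, not a recommended configuration.

```xml
<parameters>
   <!-- Sequential portion of the EUID: 8 characters (for example, 10908000). -->
   <parameter>
      <parameter-name>IdLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>8</parameter-value>
   </parameter>
   <!-- Checksum portion: 2 characters appended to the sequential portion.
        Total EUID length is 8 + 2 = 10, so the EUID columns in the database
        must hold at least 10 characters. -->
   <parameter>
      <parameter-name>ChecksumLength</parameter-name>
      <parameter-type>java.lang.Integer</parameter-type>
      <parameter-value>2</parameter-value>
   </parameter>
</parameters>
```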

ChunkSize

For efficiency, the default EUID generator does not need to query the sbyn_seq_table table in the database each time a new EUID is created. Instead, you can specify a number of EUIDs to be allocated in chunks to the EUID generator. For example, if you specify a chunk size of 1000, EUIDs are allocated to the generator 1000 ID numbers at a time. The generator can process up to 1000 new records and assign all 1000 numbers without needing to query sbyn_seq_table. When all 1000 EUIDs are used, another 1000 are allocated. If the server running the master index application is reset before all 1000 numbers are used, the unused numbers are discarded and never used, meaning that EUIDs might not always be assigned sequentially.

Specifying a chunk size affects the numbering of the EUID column in the sbyn_seq_table. If you specify a chunk size of 1, then each time a new EUID is assigned, the value of the EUID column increases by one. If you specify a larger chunk size, then the value of the EUID column increases by the value of the chunk size each time the allocated EUIDs are used. For example, if you specify a chunk size of 1000, the beginning EUID sequence number is 1000, even though EUIDs are assigned beginning with 0001, then 0002, and so on. When the first 1000 EUIDs are assigned, another 1000 EUID numbers are allocated to the generator and the EUID column changes from 1000 to 2000.
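As a contrast to a chunk size of 1000, the fragment below sketches the smallest setting. The trade-off noted in the comment is inferred from the description above rather than stated as measured behavior.

```xml
<!-- With a chunk size of 1, the EUID column in sbyn_seq_table increases by one
     each time a new EUID is assigned, so no block of unused numbers is discarded
     on a server reset. The likely cost is that the efficiency gained by
     allocating EUIDs in larger chunks is lost. -->
<parameter>
   <parameter-name>ChunkSize</parameter-name>
   <parameter-type>java.lang.Integer</parameter-type>
   <parameter-value>1</parameter-value>
</parameter>
```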

The master.xml File

The properties of the Manager Service are defined in master.xml. The information entered into the default configuration file is standard across all implementations, so the file will require some customization.

The following topics provide information about working with master.xml:

Modifying master.xml

You can modify master.xml at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Use caution when updating this file after moving into production, since changing certain properties, such as the blocking query, can cause unexpected matching and weighting results. Most of the configuration options in this file cannot be modified using the Configuration Editor. The exceptions are the match and duplicate thresholds. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.

The master.xml File Structure

This topic describes the structure of the XML file, general requirements, and constraints. It also provides a sample implementation.

master.xml File Description

Table 10 lists each element in master.xml and provides a description of each element along with any requirements or constraints for each element.

Table 10 master.xml File Structure

Element/Attribute
Description
MasterControllerConfig
The configuration class for the Manager Service. The attributes define the module name and Java class. The default values should not be changed.
logic-class
A custom plug-in that defines custom processing logic for the execute match functions that can be called from client applications. This element is optional.
logic-class-gui
A custom plug-in that defines custom processing logic for the execute match function that is called from the Master Index Data Manager (MIDM). This element is optional.
update-mode
An indicator of whether to recalculate potential duplicates when a record is updated. Specify Pessimistic to recalculate potential duplicates; specify Optimistic to prevent potential duplicate recalculation on updates.
merged-record-update
An indicator of whether records with a status of Merged can be updated. Specify Enabled to allow updates of merged records; specify Disabled to ensure that records with a Merged status are not updated.
execute-match
Specifies the blocking query to use for match processing.
query-builder
The name of the blocking query to use for match processing. The name must match a query defined in query.xml.
option
Optional parameters for the blocking query. Currently parameters are not used by any predefined blocking queries.
option/key
A parameter for the blocking query.
option/value
The value of the key specified by the corresponding key attribute.
transaction
The transaction mode for the master index application. Specify one of the following values:
  • LOCAL – Transactions are not distributed.

  • CONTAINER – Transactions are distributed across applications.

  • BEAN – Transactions are distributed within the master index application.

DecisionMakerConfig
The configuration class for the Decision Maker. The attributes define the module name and Java class. The default values should not be changed.
decision-maker-class
The Java class that contains the methods used by the Decision Maker class. The default value, com.sun.mdm.index.decision.impl.DefaultDecisionMaker, should not need to be changed, but you can implement a custom Decision Maker class. The default class accepts the parameters described below.
parameters
A list of parameters for the Decision Maker class.
parameter
A definition of a Decision Maker parameter. The parameters element can contain multiple parameter elements, each defining one parameter.
description
A brief description of the parameter. This element is optional.
parameter-name
The name of the parameter. The default Decision Maker class takes the following parameters (see Decision Maker for more information about these parameters).
  • OneExactMatch - A Boolean indicator of whether an assumed match is made when there is more than one record above the match threshold.

  • SameSystemMatch - A Boolean indicator of whether an assumed match can be made between two records that originate from the same external system.

  • DuplicateThreshold - The lowest match weight at which two records are considered to be potential duplicates.

  • MatchThreshold - The lowest match weight at which two records are assumed to be a match of one another.

parameter-type
The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.
parameter-value
The value of the parameter. For OneExactMatch and SameSystemMatch, this must be a Boolean value. For MatchThreshold and DuplicateThreshold, this must be a Float value.
EuidGeneratorConfig
The configuration class for the EUID Generator. The attributes define the module name and Java class. The default values should not be changed.
euid-generator-class
The Java class used by the master index application to generate new EUIDs. The default class is com.sun.mdm.index.idgen.impl.DefaultEuidGenerator, which assigns sequential EUIDs based on the three parameters described below.
parameters
A list of parameters for the EUID Generator class.
parameter
A parameter definition. The parameters element can contain multiple parameter elements, each defining one parameter.
description
A brief description of the parameter. This element is optional.
parameter-name
The name of the parameter. The default EUID Generator class takes the following parameters (see EUID Generator for more information about these parameters).
  • IdLength - The length of the EUIDs generated by the master index application.

  • ChecksumLength - The length of the checksum value used to validate EUIDs.

  • ChunkSize - The number of EUIDs allocated to the server at one time.

parameter-type
The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.
parameter-value
The value of the parameter. For the default parameters, the values are all integers.
master.xml Example

The following is a sample master.xml configuration.

<MasterControllerConfig module-name="MasterController" parser-class=
 "com.sun.mdm.index.configurator.impl.master.MasterControllerConfiguration">
   <logic-class>CustomMatchLogic</logic-class>
   <logic-class-gui>CustomMatchLogicMIDM</logic-class-gui>
   <update-mode>Pessimistic</update-mode>
   <merged-record-update>Disabled</merged-record-update>
   <execute-match>
      <query-builder name="BLOCKER-SEARCH"></query-builder>
   </execute-match>
</MasterControllerConfig>
<DecisionMakerConfig module-name="DecisionMaker" parser-class=
 "com.sun.mdm.index.configurator.impl.decision.DecisionMakerConfiguration">
   <decision-maker-class>
      com.sun.mdm.index.decision.impl.DefaultDecisionMaker
   </decision-maker-class>
   <parameters>
      <parameter>
         <parameter-name>OneExactMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>false</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>SameSystemMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>true</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>DuplicateThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>7.25</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>MatchThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>29.0</parameter-value>
      </parameter>            
   </parameters>
</DecisionMakerConfig>
<EuidGeneratorConfig module-name="EuidGenerator" parser-class=
"com.sun.mdm.index.configurator.impl.idgen.EuidGeneratorConfiguration">
   <euid-generator-class>
      com.sun.mdm.index.idgen.impl.DefaultEuidGenerator
   </euid-generator-class>
   <parameters>
      <parameter>
         <parameter-name>IdLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>10</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChecksumLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>0</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChunkSize</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>1000</parameter-value>
      </parameter>
   </parameters>
</EuidGeneratorConfig>