Understanding Oracle Java CAPS Master Index Configuration Options (Repository) - Java CAPS Documentation

Threshold Configuration (Repository)

In the Threshold file, you define certain system parameters for the Manager Service, such as matching thresholds, EUID properties, and the blocking query to use for match processing. The Manager Service is the main interface of the indexing system. This interface coordinates all components of the master index application, including the database, master index project, Enterprise Data Manager, runtime environment, and match engine. The main interface is a stateless session bean, though some methods return objects that have handles to stateful beans.

The following topics describe the Manager Service and the Threshold file.

Manager Service Components (Repository)

In the Threshold file, you define certain properties of the match process, such as duplicate and match thresholds, the query to use for matching, logic for automatic merges, and properties of the EUIDs assigned by the master index application (such as their length and whether a checksum value is used). This file also defines the update mode (optimistic or pessimistic) and whether records with a merged status can be updated.

The following topics describe the configurable components in the Threshold file:

Custom Logic Classes

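Custom logic classes specify any custom plug-ins created for the master index project that define custom processing for the execute match methods. The logic-class element applies to execute match calls from Collaborations and Business Processes, and the logic-class-gui element applies to execute match calls from the Enterprise Data Manager (EDM). If no classes are specified, execute match processing uses the default logic, which is described in Understanding Oracle Java CAPS Master Index Processing (Repository).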

Update Mode

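The update mode specifies whether a record’s potential duplicate list is reevaluated when key fields in the record are updated. Reevaluation keeps the potential duplicate list current, but requires more system resources.

There are two update modes:
  • Pessimistic - Potential duplicates are recalculated when key fields in a record are updated.

  • Optimistic - Potential duplicates are not recalculated when a record is updated.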

Merged Record Updates

The merged record update setting determines whether changes can be made to records that have a status of “merged”, that is, the EUID records that are not retained after a merge. For example, when an incoming record is an assumed match with an SBR that has a status of “merged”, the master index application checks the value of the merged-record-update element. If the element is set to “Enabled”, the merged SBR is updated with the new information. If it is set to “Disabled”, an exception is thrown and the update is not performed. In most cases, merged records should not be updated.
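The following Java sketch illustrates how the update-mode and merged-record-update settings might gate an update. It is not the master index application's code: the class UpdateGate, its methods, the use of IllegalStateException, and the caller-supplied set of key fields are all hypothetical.

```java
import java.util.Set;

/**
 * Hypothetical sketch (not product code): how the update-mode and
 * merged-record-update settings from the Threshold file might gate an update.
 */
class UpdateGate {
    enum UpdateMode { PESSIMISTIC, OPTIMISTIC }

    private final UpdateMode updateMode;
    private final boolean mergedRecordUpdateEnabled;

    UpdateGate(UpdateMode updateMode, boolean mergedRecordUpdateEnabled) {
        this.updateMode = updateMode;
        this.mergedRecordUpdateEnabled = mergedRecordUpdateEnabled;
    }

    /** Rejects an update to a merged record when merged-record-update is Disabled. */
    void checkCanUpdate(String recordStatus) {
        if ("merged".equalsIgnoreCase(recordStatus) && !mergedRecordUpdateEnabled) {
            throw new IllegalStateException("Updates to merged records are disabled");
        }
    }

    /**
     * Pessimistic mode reevaluates potential duplicates when any key field changes;
     * Optimistic mode never does. Which fields count as key fields is an assumption
     * supplied by the caller.
     */
    boolean shouldReevaluateDuplicates(Set<String> changedFields, Set<String> keyFields) {
        if (updateMode == UpdateMode.OPTIMISTIC) {
            return false;
        }
        for (String field : changedFields) {
            if (keyFields.contains(field)) {
                return true;
            }
        }
        return false;
    }
}
```

The two checks are kept separate because the settings are independent: a record can be rejected as merged regardless of the update mode, and the duplicate reevaluation only matters once an update is allowed.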

Blocking Query

The blocking query, specified by the query-builder element, identifies one of the queries defined in the Candidate Select file as the query to use for match processing. This query is used by the master index application when searching for a candidate pool of possible matches to an incoming record. If the query takes any parameters, they are defined using the option element.

Decision Maker

The DecisionMakerConfig element of the Threshold file specifies how the Manager Service evaluates query results. When the master index application processes an incoming record, it compares the new record against existing records in the database and assigns a matching weight to each possible match. The master index application uses the values that you specify in this element to determine how to handle records whose matching weights fall within certain ranges. Records with a matching weight at or above the duplicate threshold, but below the match threshold, are treated as potential duplicates. Records with a matching weight at or above the match threshold are treated as either potential duplicates or assumed matches, depending on how many records have a matching weight above the match threshold and on the values of the OneExactMatch and SameSystemMatch parameters.

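For the default Decision Maker, you can configure four parameters: OneExactMatch, SameSystemMatch, DuplicateThreshold, and MatchThreshold. Each is described in Table 10 in Threshold File Description.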
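To make the threshold logic concrete, here is a minimal Java sketch that classifies candidate records by matching weight. It assumes one reading of the parameters: when OneExactMatch is true and more than one eligible candidate meets the match threshold, no assumed match is made and all such candidates become potential duplicates; when it is false, the highest-weighted eligible candidate becomes the assumed match. It also assumes that same-system candidates cannot be assumed matches when SameSystemMatch is false. DecisionSketch and its types are hypothetical and are not the DefaultDecisionMaker API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of threshold-based match classification; not DefaultDecisionMaker. */
class DecisionSketch {
    record Candidate(String euid, double weight, boolean sameSystem) {}

    enum Outcome { ASSUMED_MATCH, POTENTIAL_DUPLICATE, NO_MATCH }

    record Decision(String euid, Outcome outcome) {}

    // A candidate may become an assumed match only if it meets MatchThreshold and,
    // when SameSystemMatch is false, does not come from the same external system.
    private static boolean eligible(Candidate c, double matchThreshold, boolean sameSystemMatch) {
        return c.weight() >= matchThreshold && (sameSystemMatch || !c.sameSystem());
    }

    static List<Decision> decide(List<Candidate> candidates, double duplicateThreshold,
                                 double matchThreshold, boolean oneExactMatch,
                                 boolean sameSystemMatch) {
        // Strongest candidate first, so it is the one chosen as the assumed match.
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Candidate::weight).reversed());

        long eligibleCount = sorted.stream()
                .filter(c -> eligible(c, matchThreshold, sameSystemMatch))
                .count();
        // Assumption: OneExactMatch=true blocks assumed matches when several are eligible.
        boolean allowAssumedMatch = eligibleCount == 1 || (eligibleCount > 1 && !oneExactMatch);

        List<Decision> result = new ArrayList<>();
        boolean assumedMatchMade = false;
        for (Candidate c : sorted) {
            Outcome outcome;
            if (allowAssumedMatch && !assumedMatchMade && eligible(c, matchThreshold, sameSystemMatch)) {
                outcome = Outcome.ASSUMED_MATCH;
                assumedMatchMade = true;
            } else if (c.weight() >= duplicateThreshold) {
                outcome = Outcome.POTENTIAL_DUPLICATE;
            } else {
                outcome = Outcome.NO_MATCH;
            }
            result.add(new Decision(c.euid(), outcome));
        }
        return result;
    }
}
```

With the thresholds from the example file (DuplicateThreshold 7.25, MatchThreshold 29.0), this sketch classifies a lone candidate weighted 35.0 as an assumed match, one weighted 10.0 as a potential duplicate, and one weighted 3.0 as no match.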

EUID Generator

The EUID generator controls how EUIDs are created for each unique record in the master index database. For the default EUID generator, you can define three parameters.

IdLength

This parameter defines the length of the EUIDs created by the master index application. By default, the length of the EUID columns in the master index database is 20. If you choose an ID length larger than 20, make sure to manually modify the length of the EUID columns in the database creation scripts.

ChecksumLength

The ChecksumLength parameter specifies the length of a checksum value. Checksum values help validate EUIDs to ensure accurate identification of records as they are transmitted throughout the system. The checksum process attaches a number, generated by an algorithm, to the end of a new EUID. When a host system receives the EUID, it strips off the checksum digits to obtain the base EUID, and then recalculates the checksum using the same algorithm. If the checksum values agree, the host system knows the EUID is correct. Specify “0” (zero) if you do not want to use the checksum function.

Using a checksum value affects the total length of the EUIDs. If you specify a checksum length greater than 0, the EUID generator creates sequential EUIDs of IdLength digits based on the sbyn_seq_table table, and then appends the checksum value to the end to form the final EUID. For example, if you set IdLength to 8 and ChecksumLength to 2, the EUIDs assigned by the master index application are 10 characters long. If the next sequence number is 10908000, the EUID assigned to the next record is 10908000 followed by its checksum (for example, 1090800034). The next EUID would be 10908001 followed by its checksum (for example, 1090800125). The first eight digits are sequential, but the last two digits appear arbitrary.

If you use a checksum value, make sure to take into consideration the total length of the EUIDs (IdLength plus ChecksumLength) when determining the length of the EUID columns in the database.
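The length arithmetic above can be sketched in Java. The checksum formula in this example (a position-weighted digit sum reduced to ChecksumLength digits) is invented for illustration only, because this document does not specify the algorithm that DefaultEuidGenerator uses; EuidChecksumSketch is a hypothetical class.

```java
/**
 * Hypothetical sketch: builds an EUID from a zero-padded sequence number plus a
 * checksum, and validates it the way a receiving host system would. The checksum
 * formula is invented for illustration and is not the product's algorithm.
 */
class EuidChecksumSketch {
    private final int idLength;
    private final int checksumLength;

    EuidChecksumSketch(int idLength, int checksumLength) {
        this.idLength = idLength;
        this.checksumLength = checksumLength;
    }

    /** Minimum width of the EUID database columns: IdLength plus ChecksumLength. */
    int totalLength() {
        return idLength + checksumLength;
    }

    /** Sequence number padded to IdLength digits, followed by the checksum digits. */
    String format(long sequence) {
        String base = String.format("%0" + idLength + "d", sequence);
        if (base.length() > idLength) {
            throw new IllegalArgumentException("sequence exceeds IdLength: " + sequence);
        }
        return checksumLength == 0 ? base : base + checksum(base);
    }

    /** Receiver side: strip the checksum, recompute it, and compare. */
    boolean isValid(String euid) {
        if (euid.length() != totalLength()) {
            return false;
        }
        if (checksumLength == 0) {
            return true;
        }
        String base = euid.substring(0, idLength);
        return checksum(base).equals(euid.substring(idLength));
    }

    // Illustrative only: position-weighted digit sum, reduced to checksumLength digits.
    private String checksum(String digits) {
        long sum = 0;
        for (int i = 0; i < digits.length(); i++) {
            sum += (long) (digits.charAt(i) - '0') * (i + 1);
        }
        long modulus = (long) Math.pow(10, checksumLength);
        return String.format("%0" + checksumLength + "d", sum % modulus);
    }
}
```

For instance, with IdLength 8 and ChecksumLength 2, format(10908000) returns a 10-character EUID that begins with 10908000, and totalLength() returns 10, the minimum width for the EUID columns.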

ChunkSize

For efficiency, the default EUID generator does not need to query the sbyn_seq_table table in the database each time a new EUID is created. Instead, you can specify a number of EUIDs to be allocated in chunks to the EUID generator. For example, if you specify a chunk size of 1000, EUIDs are allocated to the generator 1000 ID numbers at a time. The generator can process up to 1000 new records and assign all 1000 numbers without needing to query sbyn_seq_table. When all 1000 EUIDs are used, another 1000 are allocated. If the server running the master index application is reset before all 1000 numbers are used, the unused numbers are discarded and never used, meaning that EUIDs might not always be assigned sequentially.

Specifying a chunk size affects the numbering of the EUID column in the sbyn_seq_table table. If you specify a chunk size of 1, the value of the EUID column increases by one each time a new EUID is assigned. If you specify a larger chunk size, the value of the EUID column increases by the chunk size each time a new chunk of EUIDs is allocated. For example, with a chunk size of 1000, the EUID column starts at 1000, even though EUIDs are assigned beginning with 0001, then 0002, and so on. When the first 1000 EUIDs have been assigned, another 1000 EUID numbers are allocated to the generator and the EUID column changes from 1000 to 2000.
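The chunk allocation described above can be simulated with a small Java class. The seqTableValue field stands in for the EUID column of sbyn_seq_table and no database access is shown; ChunkedSequenceSketch is hypothetical, not the DefaultEuidGenerator implementation.

```java
/**
 * Hypothetical simulation of chunked EUID allocation. seqTableValue stands in for
 * the EUID column of sbyn_seq_table; each allocateChunk() call represents one
 * database round trip.
 */
class ChunkedSequenceSketch {
    private long seqTableValue;   // simulated EUID column value in sbyn_seq_table
    private final int chunkSize;
    private long last;            // last number handed out from the current chunk
    private long chunkEnd;        // highest number reserved in the current chunk

    ChunkedSequenceSketch(long initialSeqTableValue, int chunkSize) {
        this.seqTableValue = initialSeqTableValue;
        this.chunkSize = chunkSize;
        this.last = 0;
        this.chunkEnd = 0;        // empty chunk forces an allocation on first use
    }

    long nextId() {
        if (last >= chunkEnd) {
            allocateChunk();
        }
        return ++last;
    }

    // Reserve numbers seqTableValue+1 .. seqTableValue+chunkSize and advance the column.
    private void allocateChunk() {
        last = seqTableValue;
        seqTableValue += chunkSize;
        chunkEnd = seqTableValue;
    }

    long seqTableValue() {
        return seqTableValue;
    }

    /** Simulates a server restart: any unused numbers in the current chunk are lost. */
    ChunkedSequenceSketch restart() {
        return new ChunkedSequenceSketch(seqTableValue, chunkSize);
    }
}
```

With a chunk size of 1000 and a starting value of 0, the first call returns 1 while the simulated column already reads 1000; the 1001st call triggers a new allocation and the column reads 2000. Calling restart() partway through a chunk shows how the remaining numbers are skipped.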

The Threshold File (Repository)

The properties of the Manager Service are defined in the Threshold file in XML format. The default configuration file contains generic settings that are the same for every implementation, so you will likely need to customize it for your project.

The following topics provide information about working with the Threshold file:

Modifying the Threshold File

You can modify the Threshold file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Use caution when updating this file after moving into production, since changing certain properties, such as the blocking query, can cause unexpected matching and weighting results. Most of the configuration options in this file cannot be modified using the Configuration Editor. The exceptions are the match and duplicate thresholds. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
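Because the schema definition restricts what the file may contain, one way to check an edited file is the standard javax.xml.validation API. This is a minimal sketch: it assumes you have the Threshold file's XSD available locally, and the file paths and the ConfigValidator class are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical helper: validates an edited configuration file against an XSD. */
class ConfigValidator {
    /**
     * Returns true if xmlFile conforms to xsdFile. A SAXException, whether from an
     * invalid schema or an invalid document, is reported and treated as failure.
     */
    static boolean isValid(Path xmlFile, Path xsdFile) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(xsdFile.toFile());
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(xmlFile.toFile()));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation failed: " + e.getMessage());
            return false;
        }
    }
}
```

A call such as ConfigValidator.isValid(Path.of("Threshold.xml"), Path.of("Threshold.xsd")) (hypothetical file names) could be run before regenerating the application.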

Threshold File Description

Table 10 lists each element in the Threshold file, describes it, and notes any requirements or constraints.

Table 10 Threshold File Structure

Element/Attribute
Description
MasterControllerConfig
The configuration class for the Manager Service. The attributes define the module name and Java class. The default values should not be changed.
logic-class
A custom plug-in that defines custom processing logic for the execute match functions that can be called from Collaborations and Business Processes. This element is optional.
logic-class-gui
A custom plug-in that defines custom processing logic for the execute match function that is called from the Enterprise Data Manager (EDM). This element is optional.
update-mode
An indicator of whether to recalculate potential duplicates when a record is updated. Specify Pessimistic to recalculate potential duplicates; specify Optimistic to prevent potential duplicate recalculation on updates.
merged-record-update
An indicator of whether records with a status of Merged can be updated. Specify Enabled to allow updates of merged records; specify Disabled to ensure that records with a Merged status are not updated.
execute-match
Specifies the blocking query to use for match processing.
query-builder
The name of the blocking query to use for match processing. The name must match a query defined in the Candidate Select file.
option
Optional parameters for the blocking query. Currently, none of the predefined blocking queries use parameters.
option/key
A parameter for the blocking query.
option/value
The value of the key specified by the corresponding key attribute.
DecisionMakerConfig
The configuration class for the Decision Maker. The attributes define the module name and Java class. The default values should not be changed.
decision-maker-class
The Java class that contains the methods used by the Decision Maker class. The default value, com.stc.eindex.decision.impl.DefaultDecisionMaker, should not need to be changed, but you can implement a custom Decision Maker class. The default class accepts the parameters described below.
parameters
A list of parameters for the Decision Maker class.
parameter
A definition of a Decision Maker parameter. The parameters element can contain multiple parameter elements, each defining one parameter.
description
A brief description of the parameter. This element is optional.
parameter-name
The name of the parameter. The default Decision Maker class takes the following parameters (see Decision Maker for more information about these parameters).
  • OneExactMatch - A Boolean indicator of whether an assumed match is made when more than one record is above the match threshold.

  • SameSystemMatch - A Boolean indicator of whether an assumed match can be made between two records that originate from the same external system.

  • DuplicateThreshold - The lowest match weight at which two records are considered to be potential duplicates.

  • MatchThreshold - The lowest match weight at which two records are assumed to be a match of one another.

parameter-type
The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.
parameter-value
The value of the parameter. For OneExactMatch and SameSystemMatch, this must be a Boolean value. For MatchThreshold and DuplicateThreshold, this must be a Float value.
EuidGeneratorConfig
The configuration class for the EUID Generator. The attributes define the module name and Java class. The default values should not be changed.
euid-generator-class
The Java class used by the master index application to generate new EUIDs. The default class is com.stc.eindex.idgen.impl.DefaultEuidGenerator, which assigns sequential EUIDs based on the three parameters described below.
parameters
A list of parameters for the EUID Generator class.
parameter
A parameter definition. The parameters element can contain multiple parameter elements, each defining one parameter.
description
A brief description of the parameter. This element is optional.
parameter-name
The name of the parameter. The default EUID Generator class takes the following parameters (see EUID Generator for more information about these parameters).
  • IdLength - The length of the EUIDs generated by the master index application.

  • ChecksumLength - The length of the checksum value used to validate EUIDs.

  • ChunkSize - The number of EUIDs allocated to the server at one time.

parameter-type
The type of parameter. Valid values are java.lang.Long, java.lang.Short, java.lang.Byte, java.lang.String, java.lang.Integer, java.lang.Boolean, java.lang.Double, or java.lang.Float.
parameter-value
The value of the parameter. For the default parameters, the values are all integers.
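To show how parameter-type and parameter-value work together, the following Java sketch reads the parameter elements from a parameters fragment and converts each value to the named java.lang type. ParameterReader is hypothetical and is not the product's configuration parser; note that Boolean.valueOf treats any value other than "true" (ignoring case) as false.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: reads <parameter> entries from a <parameters> fragment and
 * converts each parameter-value to the type named in parameter-type.
 */
class ParameterReader {
    static Map<String, Object> read(String parametersXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(parametersXml)));
        Map<String, Object> result = new LinkedHashMap<>();
        NodeList params = doc.getElementsByTagName("parameter");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            result.put(text(p, "parameter-name"),
                       convert(text(p, "parameter-type"), text(p, "parameter-value")));
        }
        return result;
    }

    // Returns the trimmed text of the first child element with the given tag.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Maps the eight parameter-type values listed in Table 10 to Java objects.
    static Object convert(String type, String value) {
        switch (type) {
            case "java.lang.Long":    return Long.valueOf(value);
            case "java.lang.Short":   return Short.valueOf(value);
            case "java.lang.Byte":    return Byte.valueOf(value);
            case "java.lang.String":  return value;
            case "java.lang.Integer": return Integer.valueOf(value);
            case "java.lang.Boolean": return Boolean.valueOf(value);
            case "java.lang.Double":  return Double.valueOf(value);
            case "java.lang.Float":   return Float.valueOf(value);
            default: throw new IllegalArgumentException("unsupported parameter-type: " + type);
        }
    }
}
```

Reading the DecisionMakerConfig parameters this way yields a Boolean for OneExactMatch and a Float for MatchThreshold, which is why those parameter-type values must agree with the kind of value supplied.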

Threshold File Example

Below is a sample of the Threshold file configuration.

```xml
<!-- Manager Service settings -->
<MasterControllerConfig module-name="MasterController"
   parser-class="com.stc.eindex.configurator.impl.master.MasterControllerConfiguration">
   <!-- Optional custom execute match logic: logic-class for Collaborations and
        Business Processes, logic-class-gui for the EDM -->
   <logic-class>CustomMatchLogic</logic-class>
   <logic-class-gui>CustomMatchLogicEDM</logic-class-gui>
   <!-- Pessimistic: recalculate potential duplicates when key fields change -->
   <update-mode>Pessimistic</update-mode>
   <!-- Disabled: records with a Merged status are not updated -->
   <merged-record-update>Disabled</merged-record-update>
   <execute-match>
      <!-- Must match a query name defined in the Candidate Select file -->
      <query-builder name="BLOCKER-SEARCH"></query-builder>
   </execute-match>
</MasterControllerConfig>
<!-- Decision Maker settings: how matching weights are evaluated -->
<DecisionMakerConfig module-name="DecisionMaker"
   parser-class="com.stc.eindex.configurator.impl.decision.DecisionMakerConfiguration">
   <decision-maker-class>
      com.stc.eindex.decision.impl.DefaultDecisionMaker
   </decision-maker-class>
   <parameters>
      <parameter>
         <parameter-name>OneExactMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>false</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>SameSystemMatch</parameter-name>
         <parameter-type>java.lang.Boolean</parameter-type>
         <parameter-value>true</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>DuplicateThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>7.25</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>MatchThreshold</parameter-name>
         <parameter-type>java.lang.Float</parameter-type>
         <parameter-value>29.0</parameter-value>
      </parameter>
   </parameters>
</DecisionMakerConfig>
<!-- EUID Generator settings: 10-character EUIDs, no checksum,
     EUIDs allocated 1000 at a time -->
<EuidGeneratorConfig module-name="EuidGenerator"
   parser-class="com.stc.eindex.configurator.impl.idgen.EuidGeneratorConfiguration">
   <euid-generator-class>
      com.stc.eindex.idgen.impl.DefaultEuidGenerator
   </euid-generator-class>
   <parameters>
      <parameter>
         <parameter-name>IdLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>10</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChecksumLength</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>0</parameter-value>
      </parameter>
      <parameter>
         <parameter-name>ChunkSize</parameter-name>
         <parameter-type>java.lang.Integer</parameter-type>
         <parameter-value>1000</parameter-value>
      </parameter>
   </parameters>
</EuidGeneratorConfig>
```