Understanding Oracle Java CAPS Master Index Configuration Options (Repository)
In the Threshold file, you define certain system parameters for the Manager Service, such as matching thresholds, EUID properties, and the blocking query to use for match processing. The Manager Service is the main interface of the indexing system. This interface coordinates all components of the master index application, including the database, master index project, Enterprise Data Manager, runtime environment, and match engine. The main interface is a stateless session bean, though some methods return objects that have handles to stateful beans.
The following topics describe the Manager Service and the Threshold file.
In the Threshold file, you define certain properties of the match process, such as duplicate and match thresholds, the query to use for matching, logic for automatic merges, and properties of the EUIDs assigned by the master index application (such as their length and whether a checksum value is used). This file is also used to define the update mode (optimistic or pessimistic) and merged record updates.
The following topics describe the configurable components in the Threshold file:
Custom logic classes specify any custom plug-ins created for the master index project that define custom processing for the execute match methods. If no classes are specified, execute match processing is carried out using the default logic (this is described in Understanding Oracle Java CAPS Master Index Processing (Repository)).
The update mode specifies whether a record’s potential duplicate list is reevaluated when key fields are updated in the record. Performing the reevaluation helps keep the potential duplicate list current, but requires more system resources.
There are two update modes.
Pessimistic – In this mode, a record’s potential duplicates are reevaluated whenever updates are made to the record’s key fields. Key fields are fields involved in blocking and matching.
Optimistic – In this mode, potential duplicates are not reevaluated when key fields are updated in a record. After an update, the potential duplicate list for a record remains the same as before the update occurred.
The merge update status determines whether changes can be made to records that have a status of “merged”. These are the EUID records that are not retained after a merge. For example, when an incoming record is an assumed match with an SBR that has a status of “merged”, the master index application checks the value of the merged-record-update element. If the element is set to “Enabled”, the merged SBR is updated with the new information. If the element is set to “Disabled”, an exception is thrown and the update is not performed. Typically, it is recommended that merged records not be updated.
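The update mode and the merge update status can be pictured as two simple guards around a record update. The following Java sketch is illustrative only: the class, enum, and method names are hypothetical and are not part of the master index API, and IllegalStateException stands in for whatever exception the application actually throws.

```java
final class UpdatePolicySketch {

    /** Mirrors the two values of the update-mode element. */
    enum UpdateMode { PESSIMISTIC, OPTIMISTIC }

    /** Potential duplicates are reevaluated only in pessimistic mode, and only when a key field changed. */
    static boolean shouldReevaluateDuplicates(UpdateMode mode, boolean keyFieldChanged) {
        return mode == UpdateMode.PESSIMISTIC && keyFieldChanged;
    }

    /** Rejects an update to a merged record unless merged-record-update is "Enabled". */
    static void checkMergedUpdate(boolean recordIsMerged, boolean mergedRecordUpdateEnabled) {
        if (recordIsMerged && !mergedRecordUpdateEnabled) {
            // Hypothetical exception type, used here only for illustration.
            throw new IllegalStateException("Updates to merged records are disabled");
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldReevaluateDuplicates(UpdateMode.PESSIMISTIC, true)); // true
        System.out.println(shouldReevaluateDuplicates(UpdateMode.OPTIMISTIC, true));  // false
        checkMergedUpdate(true, true);   // allowed: merged-record-update is Enabled
        checkMergedUpdate(false, false); // allowed: the record is not merged
    }
}
```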
The blocking query, specified by the query-builder element, identifies one of the queries defined in the Candidate Select file as the query to use for match processing. This query is used by the master index application when searching for a candidate pool of possible matches to an incoming record. If the query takes any parameters, they are defined using the option element.
The DecisionMakerConfig element of the Threshold file specifies how the Manager Service evaluates query results. When the master index application processes an incoming record, it compares the record against existing records in the database and assigns a matching weight to each possible match. The values that you specify in this section determine how records in each matching weight range are handled. Records with a matching weight at or above the duplicate threshold but below the match threshold are treated as potential duplicates; records with a matching weight at or above the match threshold are treated as potential duplicates or assumed matches, depending on the value of the OneExactMatch parameter and the number of records at or above the match threshold.
For the default Decision Maker, you can configure the parameters described below.
OneExactMatch
This parameter specifies logic for assumed matches. If OneExactMatch is set to true and there is more than one record above the match threshold, then none of the records are considered an assumed match and all are flagged as potential duplicates. If OneExactMatch is set to false and there is more than one record above the match threshold, then the record with the highest matching weight is considered an assumed match and the rest are flagged as potential duplicates.
SameSystemMatch
This parameter indicates whether the master index application will match two records that originated from the same system whose matching weight falls above the match threshold. If SameSystemMatch is set to true, no assumed matches are made between records associated with the same system. If SameSystemMatch is set to false, assumed matches can be made between records associated with the same system.
DuplicateThreshold
The duplicate threshold specifies the matching probability weight at or above which two records are considered to potentially represent the same object. Records with matching weights between the duplicate and match thresholds are always flagged as potential duplicates. A thorough data analysis combined with testing will help determine the best value for the duplicate and match thresholds.
MatchThreshold
The match threshold specifies the matching probability weight at or above which two records are assumed to be a match and are automatically merged in the master index database.
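As a rough illustration of how the two thresholds and OneExactMatch interact, the following Java sketch classifies a list of candidate matching weights. It is not the DefaultDecisionMaker implementation. The class and method names are hypothetical, SameSystemMatch is left out, the match threshold is assumed to be no lower than the duplicate threshold, and breaking ties in favor of the first highest weight is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ThresholdDecisionSketch {

    enum Outcome { ASSUMED_MATCH, POTENTIAL_DUPLICATE, NO_MATCH }

    /** Returns one outcome per candidate weight, in the same order as the input. */
    static List<Outcome> classify(double[] weights, double duplicateThreshold,
                                  double matchThreshold, boolean oneExactMatch) {
        int aboveMatch = 0; // number of candidates at or above the match threshold
        int best = -1;      // index of the highest weight at or above the match threshold
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] >= matchThreshold) {
                aboveMatch++;
                if (best < 0 || weights[i] > weights[best]) {
                    best = i;
                }
            }
        }
        // A single candidate above the match threshold is always an assumed match.
        // With several, only OneExactMatch=false promotes the highest one.
        boolean assumeBest = aboveMatch == 1 || (aboveMatch > 1 && !oneExactMatch);

        List<Outcome> outcomes = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            if (assumeBest && i == best) {
                outcomes.add(Outcome.ASSUMED_MATCH);
            } else if (weights[i] >= duplicateThreshold) {
                outcomes.add(Outcome.POTENTIAL_DUPLICATE);
            } else {
                outcomes.add(Outcome.NO_MATCH);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        double[] weights = {31.5, 30.0, 12.0, 3.0};
        // Thresholds taken from the sample Threshold file: 7.25 and 29.0.
        System.out.println(classify(weights, 7.25, 29.0, true));
        // [POTENTIAL_DUPLICATE, POTENTIAL_DUPLICATE, POTENTIAL_DUPLICATE, NO_MATCH]
        System.out.println(classify(weights, 7.25, 29.0, false));
        // [ASSUMED_MATCH, POTENTIAL_DUPLICATE, POTENTIAL_DUPLICATE, NO_MATCH]
    }
}
```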
The EUID generator controls how EUIDs are created for each unique record in the master index database. For the default EUID generator, you can define three parameters.
IdLength
This parameter defines the length of the EUIDs created by the master index application. By default, the length of the EUID columns in the master index database is 20. If you choose an ID length larger than 20, make sure to manually modify the length of the EUID columns in the database creation scripts.
ChecksumLength
The ChecksumLength parameter specifies the length of a checksum value. Checksum values help validate EUIDs to ensure accurate identification of records as they are transmitted throughout the system. The checksum process attaches a number, generated through an algorithm, to the end of a new EUID. When a host system receives this number, it strips off the checksum digits to obtain the EUID, and then recalculates the checksum using the same algorithm. If the checksum values agree, the host system knows the EUID number is correct. Specify “0” (zero) if you do not want to use the checksum function.
Using a checksum value affects the IdLength parameter. If you specify a checksum length greater than 0, the EUID generator creates sequential EUIDs based on the sbyn_seq_table table, and then appends the checksum value to the end of the EUID to determine the final EUID number. For example, if you set IdLength to 8 and ChecksumLength to 2, then the EUIDs assigned by the master index application will be 10 characters long. If the next sequence number is 10908000, the EUID assigned to the next record is 10908000 plus the checksum (it might be 1090800034, for example). The next EUID would be 10908001 plus the checksum (1090800125, for example). The first eight digits are sequential, but the last two digits are seemingly arbitrary.
If you use a checksum value, make sure to take into consideration the total length of the EUIDs (IdLength plus ChecksumLength) when determining the length of the EUID columns in the database.
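The length arithmetic and the strip-and-recalculate validation described above can be sketched in Java as follows. The checksum algorithm used by the master index application is not described here, so the weighted digit sum below is only a placeholder and produces different checksum digits than the product would. All class and method names are hypothetical, and the sketch assumes the sequence number fits within IdLength digits.

```java
final class EuidChecksumSketch {

    /** Placeholder checksum (weighted digit sum); NOT the algorithm used by the product. */
    static String checksum(String id, int checksumLength) {
        if (checksumLength == 0) {
            return ""; // ChecksumLength 0 disables the checksum
        }
        long modulus = 1;
        for (int i = 0; i < checksumLength; i++) {
            modulus *= 10;
        }
        long sum = 0;
        for (int i = 0; i < id.length(); i++) {
            sum += (long) (i + 1) * (id.charAt(i) - '0');
        }
        return String.format("%0" + checksumLength + "d", sum % modulus);
    }

    /** Zero-pads the sequence number to idLength digits and appends the checksum. */
    static String buildEuid(long sequence, int idLength, int checksumLength) {
        String id = String.format("%0" + idLength + "d", sequence);
        return id + checksum(id, checksumLength);
    }

    /** Strips the checksum digits, recalculates the checksum, and compares the two. */
    static boolean isValid(String euid, int idLength, int checksumLength) {
        if (euid.length() != idLength + checksumLength) {
            return false;
        }
        String id = euid.substring(0, idLength);
        return euid.substring(idLength).equals(checksum(id, checksumLength));
    }

    /** The EUID columns must hold IdLength plus ChecksumLength characters. */
    static boolean fitsColumn(int idLength, int checksumLength, int columnLength) {
        return idLength + checksumLength <= columnLength;
    }

    public static void main(String[] args) {
        String euid = buildEuid(10908000L, 8, 2);
        System.out.println(euid.length());                 // 10
        System.out.println(isValid(euid, 8, 2));           // true
        System.out.println(fitsColumn(8, 2, 20));          // true
        System.out.println(fitsColumn(19, 2, 20));         // false: widen the EUID columns
    }
}
```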
ChunkSize
For efficiency, the default EUID generator does not need to query the sbyn_seq_table table in the database each time a new EUID is created. Instead, you can specify a number of EUIDs to be allocated in chunks to the EUID generator. For example, if you specify a chunk size of 1000, EUIDs are allocated to the generator 1000 ID numbers at a time. The generator can process up to 1000 new records and assign all 1000 numbers without needing to query sbyn_seq_table. When all 1000 EUIDs are used, another 1000 are allocated. If the server running the master index application is reset before all 1000 numbers are used, the unused numbers are discarded and never used, meaning that EUIDs might not always be assigned sequentially.
Specifying a chunk size affects the numbering of the EUID column in the sbyn_seq_table. If you specify a chunk size of 1, then each time a new EUID is assigned, the value of the EUID column increases by one. If you specify a larger chunk size, then the value of the EUID column increases by the value of the chunk size each time the allocated EUIDs are used. For example, if you specify a chunk size of 1000, the beginning EUID sequence number is 1000, even though EUIDs are assigned beginning with 0001, then 0002, and so on. When the first 1000 EUIDs are assigned, another 1000 EUID numbers are allocated to the generator and the EUID column changes from 1000 to 2000.
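The chunk allocation behavior can be simulated with a small Java sketch. It uses an in-memory stand-in for the EUID row of sbyn_seq_table rather than a database, and the class and method names are hypothetical. Starting the counter at 0 and reserving a chunk lazily on the first request are assumptions made for the illustration.

```java
final class ChunkedSequenceSketch {

    /** In-memory stand-in for the EUID row of sbyn_seq_table. */
    static final class SequenceTable {
        private long value = 0;  // upper bound of the numbers reserved so far
        private int accesses = 0; // how many times the "table" was touched

        /** Reserves chunkSize numbers and returns the first one. */
        long reserve(int chunkSize) {
            accesses++;
            long first = value + 1;
            value += chunkSize;
            return first;
        }

        long value() { return value; }
        int accesses() { return accesses; }
    }

    /** Hands out sequence numbers, going back to the table only when a chunk is used up. */
    static final class Generator {
        private final SequenceTable table;
        private final int chunkSize;
        private long next = 1;
        private long lastInChunk = 0; // empty chunk forces a reservation on first use

        Generator(SequenceTable table, int chunkSize) {
            this.table = table;
            this.chunkSize = chunkSize;
        }

        long nextSequence() {
            if (next > lastInChunk) {
                next = table.reserve(chunkSize);
                lastInChunk = next + chunkSize - 1;
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        SequenceTable table = new SequenceTable();
        Generator generator = new Generator(table, 1000);
        System.out.println(generator.nextSequence()); // 1
        System.out.println(generator.nextSequence()); // 2
        System.out.println(table.value());            // 1000
        System.out.println(table.accesses());         // 1

        // Simulated server reset: numbers 3 through 1000 are discarded.
        Generator restarted = new Generator(table, 1000);
        System.out.println(restarted.nextSequence()); // 1001
        System.out.println(table.value());            // 2000
    }
}
```

A larger chunk size reduces the number of table accesses, at the cost of larger gaps in the assigned EUIDs after a reset.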
The properties of the Manager Service are defined in the Threshold file in XML format. The default configuration file contains generic values that are the same for every implementation, so the file typically requires some customization.
The following topics provide information about working with the Threshold file:
You can modify the Threshold file at any time, but you must regenerate the application and redeploy the project after making any changes to the file. Use caution when updating this file after moving into production, since changing certain properties, such as the blocking query, can cause unexpected matching and weighting results. Most of the configuration options in this file cannot be modified using the Configuration Editor. The exceptions are the match and duplicate thresholds. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 10 lists each element in the Threshold file and provides a description of each element along with any requirements or constraints for each element.
Table 10 Threshold File Structure
Below is a sample of the Threshold file configuration.
<MasterControllerConfig module-name="MasterController"
    parser-class="com.stc.eindex.configurator.impl.master.MasterControllerConfiguration">
    <logic-class>CustomMatchLogic</logic-class>
    <logic-class-gui>CustomMatchLogicEDM</logic-class-gui>
    <update-mode>Pessimistic</update-mode>
    <merged-record-update>Disabled</merged-record-update>
    <execute-match>
        <query-builder name="BLOCKER-SEARCH"></query-builder>
    </execute-match>
</MasterControllerConfig>
<DecisionMakerConfig module-name="DecisionMaker"
    parser-class="com.stc.eindex.configurator.impl.decision.DecisionMakerConfiguration">
    <decision-maker-class>
        com.stc.eindex.decision.impl.DefaultDecisionMaker
    </decision-maker-class>
    <parameters>
        <parameter>
            <parameter-name>OneExactMatch</parameter-name>
            <parameter-type>java.lang.Boolean</parameter-type>
            <parameter-value>false</parameter-value>
        </parameter>
        <parameter>
            <parameter-name>SameSystemMatch</parameter-name>
            <parameter-type>java.lang.Boolean</parameter-type>
            <parameter-value>true</parameter-value>
        </parameter>
        <parameter>
            <parameter-name>DuplicateThreshold</parameter-name>
            <parameter-type>java.lang.Float</parameter-type>
            <parameter-value>7.25</parameter-value>
        </parameter>
        <parameter>
            <parameter-name>MatchThreshold</parameter-name>
            <parameter-type>java.lang.Float</parameter-type>
            <parameter-value>29.0</parameter-value>
        </parameter>
    </parameters>
</DecisionMakerConfig>
<EuidGeneratorConfig module-name="EuidGenerator"
    parser-class="com.stc.eindex.configurator.impl.idgen.EuidGeneratorConfiguration">
    <euid-generator-class>
        com.stc.eindex.idgen.impl.DefaultEuidGenerator
    </euid-generator-class>
    <parameters>
        <parameter>
            <parameter-name>IdLength</parameter-name>
            <parameter-type>java.lang.Integer</parameter-type>
            <parameter-value>10</parameter-value>
        </parameter>
        <parameter>
            <parameter-name>ChecksumLength</parameter-name>
            <parameter-type>java.lang.Integer</parameter-type>
            <parameter-value>0</parameter-value>
        </parameter>
        <parameter>
            <parameter-name>ChunkSize</parameter-name>
            <parameter-type>java.lang.Integer</parameter-type>
            <parameter-value>1000</parameter-value>
        </parameter>
    </parameters>
</EuidGeneratorConfig>