Understanding Oracle Java CAPS Master Index Configuration Options (Repository)
The Update Manager contains the logic used to generate the single best record (SBR) for a given object. The SBR is defined by a mapping of fields from external systems to the SBR, allowing you to define the fields from each system that are kept in the SBR. For each field in the SBR, an ePath denotes the location in the external system records from which the value is retrieved. Since there can be many external systems, you can optionally specify a strategy to select the SBR field from the list of external values. You can also specify any additional fields that might be required by the selection strategy to determine which external system contains the best data (by default, the record’s update date and time is always taken into account). The Update Manager also specifies any custom Java classes to be used for different types of update transactions, such as merges, unmerges, changes to existing records, and new record inserts.
The Update Manager is configured in the Best Record file. The following topics describe the Update Manager and the Best Record file.
The survivor calculator generates and updates the SBR for each record. The SBR for an enterprise object is created from what is considered to be the most reliable information contained in each system record for a particular object. The information used from each local system to populate the SBR is determined by the survivor calculator defined in the Update Manager. The fields defined in the survivor calculator are also the fields contained in the SBR. You can configure the survivor calculator to determine the best fields for the SBR from a combination of all the source system records. The survivor calculator can consider factors such as the relative reliability of a system, how recent the data is, and whether data entered from the EDM overwrites data entered from any other system.
The survivor calculator consists of the rules defined for the survivor helper and the weighted calculator.
Note - Phonetic and standardized fields do not need to be defined in the Best Record file since their field values are determined by the standardization engine for the SBR.
The logic that determines how the fields in the SBR are populated and how certain updates are performed is highly configurable in a master index application, allowing you to design and develop the match strategy that best suits your processing requirements.
The survivor helper defines a list of fields on which survivor calculation is performed, and thus the list of fields included in the SBR. Each field is called a candidate field. For each candidate field, you specify whether to use the default survivor calculation strategy or a custom strategy. The survivor helper must list each field contained in the SBR; any fields that are not listed here will not be populated in the SBR.
For each field, you can specify system fields to be taken into consideration as well as a specific survivorship strategy. Oracle Java CAPS Master Index provides three basic strategies to determine survivorship for each field; you can also define and implement custom strategies.
Default Strategy
Weighted Strategy
Union Strategy
The default strategy maps fields directly from the local system records to the SBR. When you specify the default survivor strategy for a field, you must also specify the parameter that defines the source system. For example, if you specify the default survivor strategy for the field “Person.LastName” and define the preferred system as “SystemA”, the last name field in the SBR is always taken from SystemA (unless the value is overridden in the EDM).
The default survivor strategy is com.stc.eindex.survivor.impl.DefaultSurvivorStrategy.
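The direct mapping described above can be pictured with a minimal Python sketch. It is an illustration only, not the product's Java implementation: the function name and the dictionary layout are hypothetical, and returning no value when the preferred system has no record is an assumption, since that case is not described here.

```python
from typing import Dict, Optional


def default_strategy(system_values: Dict[str, Optional[str]],
                     preferred_system: str) -> Optional[str]:
    """Return the field value held by the preferred system.

    Only the preferred system is consulted; values from other
    systems are ignored. Returning None when the preferred system
    is absent is an assumption made for this sketch.
    """
    return system_values.get(preferred_system)


# Person.LastName as held by two source systems.
last_names = {"SystemA": "Smith", "SystemB": "Smyth"}
sbr_last_name = default_strategy(last_names, "SystemA")
```

With “SystemA” as the preferred system, the SBR last name is taken from SystemA even though SystemB holds a different value.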
The weighted strategy is the most complex survivor strategy and uses a combination of weighted calculations to determine the most reliable source of data for each field. It is highly customizable, and you can define which calculation or set of calculations to use for each field. The calculations can be based on the update date of the data, system reliability, and agreement between systems. In the default configuration of the file, the calculations are defined in the WeightedCalculator section of the file.
The weighted survivor strategy is com.stc.eindex.survivor.impl.WeightedSurvivorStrategy. You can define general weighted calculations to be performed by default for each field, and you can define specialized calculations to be performed for specific fields.
The union strategy combines the data from all source systems to populate the fields in the SBR for which this strategy is specified. For example, if you store aliases for person names in the database, you want to store all possible alias records and not just the “best” alias information. To do this, specify the union strategy for the alias object. This means that all alias information from all source systems is stored in the SBR.
The union strategy is applied to entire objects rather than to fields. This strategy combines all child objects from an enterprise object's source systems to populate the SBR. If the source systems contain two or more instances of a child object with the same unique key (such as two home telephone numbers), the union strategy only populates the most current child object in the SBR. For example, if the union strategy is assigned to the address object and each address object is identified by a unique key (such as the address type), the SBR only contains the most current address record of each address type (for example, one home address, one office address, and so on).
The union strategy is com.stc.eindex.survivor.impl.UnionSurvivorStrategy.
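The union behavior for keyed child objects can be sketched as follows. This is a hedged illustration in Python rather than the product's Java code: the function name, the record layout, and the "modified" timestamp field are all hypothetical names chosen for the example.

```python
from datetime import datetime


def union_strategy(system_children, key_field):
    """Combine child objects from every source system.

    When two child objects share the same unique key, only the
    most recently modified one is kept.
    """
    best = {}
    for children in system_children.values():
        for child in children:
            key = child[key_field]
            if key not in best or child["modified"] > best[key]["modified"]:
                best[key] = child
    return list(best.values())


# Address child objects keyed by address type (hypothetical data).
addresses = {
    "SystemA": [
        {"type": "Home", "line1": "1 Elm St", "modified": datetime(2010, 1, 5)},
        {"type": "Office", "line1": "200 Main St", "modified": datetime(2010, 2, 1)},
    ],
    "SystemB": [
        {"type": "Home", "line1": "9 Oak Ave", "modified": datetime(2011, 3, 2)},
    ],
}
sbr_addresses = union_strategy(addresses, "type")
```

Here the SBR receives one home address (the more recent one from SystemB) and one office address, matching the address example above.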
By default, the weighted calculator implements the weighted strategy defined above. Use the WeightedCalculator section to define conditions and weights that determine the best information with which to populate the SBR. The weighted calculator selects a single value for the SBR from a set of system fields. The selection process is based on the different qualities defined for each field.
The weighted calculator defines two sets of rules. The default rules apply to all fields in a record except those fields for which rules are specifically defined. The candidate rules only apply to those fields for which they are specifically defined. If you modify the default rules, the changes will apply to all fields except the fields for which candidate rules are defined.
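The relationship between the two rule sets amounts to a lookup with a fallback, sketched here in Python. The utilities and the Person.SSN field are borrowed from the sample Best Record file in this section; the function name and data layout are hypothetical and do not reflect how the product stores its configuration.

```python
# Rules applied to every field that has no rules of its own.
DEFAULT_RULES = [
    {"quality": "MostRecentModified", "utility": 80.0},
    {"quality": "SourceSystem", "preference": "SBYN", "utility": 100.0},
]

# Rules that apply only to the fields they are defined for.
CANDIDATE_RULES = {
    "Person.SSN": [
        {"quality": "SourceSystem", "preference": "SBYN", "utility": 100.0},
        {"quality": "MostRecentModified", "utility": 75.0},
    ],
}


def rules_for(field, candidate_rules=CANDIDATE_RULES, default_rules=DEFAULT_RULES):
    """Return the candidate rules for a field, or the default rules."""
    return candidate_rules.get(field, default_rules)


ssn_rules = rules_for("Person.SSN")          # field-specific rules
name_rules = rules_for("Person.FirstName")   # falls back to the defaults
```

Changing DEFAULT_RULES in this sketch affects Person.FirstName but not Person.SSN, which mirrors the behavior described above.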
You can define several strategies to help the weighted calculator determine the best information to populate into each field of the SBR. Each of these strategies is defined by a quality, a preference, and a utility. The quality defines the type of weighted calculation to perform, the preference indicates the source being rated, and the utility indicates the reliability. You can define multiple strategies for each field, and a linear summation on the utility score of each strategy determines the best value to populate in the SBR field.
The weighted calculator strategies include:
SourceSystem
SystemAgreement
MostRecentModified
The SourceSystem strategy indicates the best source system for a field, and is used when the quality of the field in question depends on its origin. For example, to indicate that the data from SystemA for a specific field is of a higher quality than SystemB, define a SourceSystem quality for “SystemA” and one for “SystemB”. Then assign SystemA a higher utility value (85.0, for example) and SystemB a lower utility value (30.0, for example). This indicates that SystemA is a more reliable source for the field. If both SystemA and SystemB contain the specified field, the value from SystemA is populated into the SBR. If the field is empty in SystemA but the field in SystemB contains a value, then the value from SystemB is used.
The SystemAgreement strategy prorates the utility score based on the number of systems whose values for the specified field are in agreement. For example, if the first name field for SystemA is “John”, for SystemB is “John”, and for SystemC is “Jon”, SystemA and SystemB together receive two-thirds of the utility score, while SystemC only receives one-third. The value populated into the SBR is “John”. You do not need to define a preference for the SystemAgreement strategy, but you must define source systems.
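The proration in the “John”/“Jon” example can be worked through with a short Python sketch. The utility of 90.0 is an arbitrary value chosen for illustration, and the function is a hypothetical stand-in rather than the product's calculation code.

```python
from collections import Counter


def system_agreement_scores(values, utility):
    """Split the utility among distinct values in proportion to the
    number of systems that agree on each value."""
    present = {system: v for system, v in values.items() if v}
    counts = Counter(present.values())
    total = len(present)
    return {value: utility * n / total for value, n in counts.items()}


first_names = {"SystemA": "John", "SystemB": "John", "SystemC": "Jon"}
scores = system_agreement_scores(first_names, utility=90.0)
best = max(scores, key=scores.get)
```

Two of the three systems agree on “John”, so that value receives two-thirds of the utility and is the one selected.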
The MostRecentModified strategy ranks the field values from the source systems in descending order according to the time that the object was last modified. The value populated in the SBR comes from the most recently modified object. You do not need to define a preference for the MostRecentModified strategy, but you must define a utility.
The Update Manager policies specify custom Java classes that provide additional processing logic for each type of update transaction. By default, this additional processing is not defined in a standard master index application. You can define custom update policies using the Custom Plug-ins function in the master index project, which appears after the project is generated. The Custom Plug-ins function also lets you build and compile the custom Java code, and Oracle Java CAPS Master Index automatically incorporates the classes when you generate the application. The Java classes defining the update policies are specified for the master index application in the UpdateManagerConfig element of the Best Record file.
There are seven types of update policies defined in the Update Manager.
Enterprise Merge Policy – The enterprise merge policy defines additional processing to perform when two enterprise objects are merged. This policy is defined by the EnterpriseMergePolicy element.
Enterprise Unmerge Policy – The enterprise unmerge policy defines additional processing to perform when an unmerge transaction occurs. This policy is defined by the EnterpriseUnmergePolicy element.
Enterprise Update Policy – The enterprise update policy defines additional processing to perform when a record is updated. This policy is defined by the EnterpriseUpdatePolicy element.
Enterprise Create Policy – The enterprise create policy defines additional processing to perform when a new record is inserted into the master index database. This policy is defined by the EnterpriseCreatePolicy element.
System Merge Policy – The system merge policy defines additional processing to perform when two system objects are merged. This policy is defined by the SystemMergePolicy element.
System Unmerge Policy – The system unmerge policy defines additional processing to perform when system objects are unmerged. This policy is defined by the SystemUnmergePolicy element.
Undo Assume Match Policy – The undo assume match policy defines additional processing to perform when an assumed match transaction is reversed. This policy is defined by the UndoAssumeMatchPolicy element.
The update policy section includes the SkipUpdateIfNoChange flag. When set to “true”, this flag prevents the update policies from being performed when no changes are made to an existing record, which helps increase performance when processing a large number of updates.
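The effect of the flag can be pictured with a small Python sketch. It is an assumption-laden illustration: the function, the policy callback, and the use of a simple equality check to detect "no change" are all hypothetical and are not taken from the product.

```python
def process_update(existing, incoming, update_policy, skip_update_if_no_change):
    """Run the update policy unless the flag is set and nothing changed."""
    if skip_update_if_no_change and incoming == existing:
        return existing  # nothing changed, so the policy is skipped
    return update_policy(incoming)


calls = []


def audit_policy(record):
    """Hypothetical custom policy that records each invocation."""
    calls.append(record)
    return record


record = {"Person.LastName": "Smith"}
# Unchanged record with the flag set: the policy is not invoked.
process_update(record, dict(record), audit_policy, skip_update_if_no_change=True)
# Changed record: the policy runs.
process_update(record, {"Person.LastName": "Smyth"}, audit_policy,
               skip_update_if_no_change=True)
```

Skipping the policy for unchanged records is what saves work when a large batch contains many updates that do not alter existing data.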
The properties for the update process are defined in the Best Record file in XML format. Some of the information entered into the default configuration file is based on the fields defined in the wizard and some is standard across all implementations. For most implementations, this file will require customization.
You can customize the configuration of the Update Manager by modifying the Best Record file. This file cannot be modified using the Configuration Editor; you need to modify the file directly. You can modify this file at any time, but it is not recommended after moving into production. The configuration controls how the SBR for each object is created, and modifying the file can cause discrepancies in how SBRs are formed before and after the modifications. It might also cause discrepancies in match results, since matching is performed against the SBR. You must regenerate the application and redeploy the project after modifying this file. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
Table 12 lists each element in the Best Record file and provides a description of each element along with any requirements or constraints for each element.
Table 12 Best Record File Structure
Below is a sample of the Best Record file using a very small object structure based on person data. Note that standardized and phonetic fields are included in the candidate fields to ensure that they are also included in the SBR. In this sample, all fields use the strategy named in the default-survivor-strategy element (here, the weighted strategy) except those in the Alias object, which use the union strategy. The value populated in the LastName field of the SBR depends on the SSN field of the system objects. In addition, custom weighted calculator logic is defined only for the SSN field; the remaining fields use the logic defined in the default-parameters element.
<SurvivorHelperConfig module-name="SurvivorHelper"
    parser-class="com.stc.eindex.configurator.impl.SurvivorHelperConfig">
  <helper-class>com.stc.eindex.survivor.impl.DefaultSurvivorHelper</helper-class>
  <default-survivor-strategy>
    <strategy-class>com.stc.eindex.survivor.impl.WeightedSurvivorStrategy</strategy-class>
    <parameters>
      <parameter>
        <parameter-name>ConfigurationModuleName</parameter-name>
        <parameter-type>java.lang.String</parameter-type>
        <parameter-value>WeightedSurvivorCalculator</parameter-value>
      </parameter>
    </parameters>
  </default-survivor-strategy>
  <candidate-definitions>
    <candidate-field name="Person.LastName">
      <system-fields>
        <field-name>Person.SSN</field-name>
      </system-fields>
    </candidate-field>
    <candidate-field name="Person.FirstName"/>
    <candidate-field name="Person.MiddleName"/>
    <candidate-field name="Person.DOB"/>
    <candidate-field name="Person.Gender"/>
    <candidate-field name="Person.SSN"/>
    <candidate-field name="Person.FnamePhoneticCode"/>
    <candidate-field name="Person.LnamePhoneticCode"/>
    <candidate-field name="Person.StdFirstName"/>
    <candidate-field name="Person.StdLastName"/>
    <candidate-field name="Person.Alias[*].*">
      <survivor-strategy>
        <strategy-class>com.stc.eindex.survivor.impl.UnionSurvivorStrategy</strategy-class>
      </survivor-strategy>
    </candidate-field>
  </candidate-definitions>
</SurvivorHelperConfig>
<WeightedCalculator module-name="WeightedSurvivorCalculator"
    parser-class="com.stc.eindex.configurator.impl.WeightedCalculatorConfig">
  <candidate-field name="Person.SSN">
    <parameter>
      <quality>SourceSystem</quality>
      <preference>SBYN</preference>
      <utility>100.0</utility>
    </parameter>
    <parameter>
      <quality>MostRecentModified</quality>
      <utility>75.0</utility>
    </parameter>
  </candidate-field>
  <default-parameters>
    <parameter>
      <quality>MostRecentModified</quality>
      <utility>80.0</utility>
    </parameter>
    <parameter>
      <quality>SourceSystem</quality>
      <preference>SBYN</preference>
      <utility>100.0</utility>
    </parameter>
  </default-parameters>
</WeightedCalculator>
<UpdateManagerConfig module-name="UpdateManager"
    parser-class="com.stc.eindex.configurator.impl.UpdateManagerConfig">
  <EnterpriseMergePolicy>com.stc.eindex.user.CustomMergePolicy</EnterpriseMergePolicy>
  <EnterpriseUnmergePolicy>com.stc.eindex.user.CustomUnmergePolicy</EnterpriseUnmergePolicy>
  <EnterpriseUpdatePolicy>com.stc.eindex.user.CustomUpdatePolicy</EnterpriseUpdatePolicy>
  <EnterpriseCreatePolicy>com.stc.eindex.user.CustomCreatePolicy</EnterpriseCreatePolicy>
  <SystemMergePolicy>com.stc.eindex.user.CustomSystemMergePolicy</SystemMergePolicy>
  <SystemUnmergePolicy>com.stc.eindex.user.CustomSystemUnmergePolicy</SystemUnmergePolicy>
  <UndoAssumeMatchPolicy>com.stc.eindex.user.CustomUndoMatchPolicy</UndoAssumeMatchPolicy>
  <SkipUpdateIfNoChange>true</SkipUpdateIfNoChange>
</UpdateManagerConfig>
The following sample illustrates how the weighted calculator uses the parameters you define to determine which field values to use in the SBR. The sample assumes two source systems, SAP and CDW. If only one of the two system records contains a value, that value is used in the SBR regardless of update date. If both system records contain a value and they were updated at the same time, the SAP field value is used (80.0 > 30.0). If both system records contain a value but the CDW record was modified more recently, the value from CDW is populated into the SBR ((30.0 + 70.0) > 80.0).
<default-parameters>
  <parameter>
    <quality>SourceSystem</quality>
    <preference>SAP</preference>
    <utility>80.0</utility>
  </parameter>
  <parameter>
    <quality>MostRecentModified</quality>
    <utility>70.0</utility>
  </parameter>
  <parameter>
    <quality>SourceSystem</quality>
    <preference>CDW</preference>
    <utility>30.0</utility>
  </parameter>
</default-parameters>
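The three outcomes described for this sample can be reproduced with a minimal Python sketch of the linear summation. This is not the product's Java implementation: the function names and record layout are hypothetical. It also assumes that records with an empty value are excluded before scoring, and that the MostRecentModified utility goes in full to the most recently modified record or records.

```python
from datetime import datetime

# The three parameters from the sample above.
PARAMETERS = [
    {"quality": "SourceSystem", "preference": "SAP", "utility": 80.0},
    {"quality": "MostRecentModified", "utility": 70.0},
    {"quality": "SourceSystem", "preference": "CDW", "utility": 30.0},
]


def score_records(records, parameters):
    """Sum the utilities earned by each system that holds a value."""
    candidates = {s: r for s, r in records.items() if r["value"]}
    if not candidates:
        return {}
    latest = max(r["modified"] for r in candidates.values())
    scores = {}
    for system, record in candidates.items():
        total = 0.0
        for p in parameters:
            if p["quality"] == "SourceSystem" and p.get("preference") == system:
                total += p["utility"]
            elif p["quality"] == "MostRecentModified" and record["modified"] == latest:
                total += p["utility"]
        scores[system] = total
    return scores


def pick_value(records, parameters):
    """Return the value from the highest-scoring system, if any."""
    scores = score_records(records, parameters)
    if not scores:
        return None
    return records[max(scores, key=scores.get)]["value"]


older, newer = datetime(2011, 1, 1), datetime(2011, 6, 1)
only_cdw = pick_value({"SAP": {"value": "", "modified": newer},
                       "CDW": {"value": "C", "modified": older}}, PARAMETERS)
same_time = pick_value({"SAP": {"value": "S", "modified": older},
                        "CDW": {"value": "C", "modified": older}}, PARAMETERS)
cdw_newer = pick_value({"SAP": {"value": "S", "modified": older},
                        "CDW": {"value": "C", "modified": newer}}, PARAMETERS)
```

In the last case SAP scores 80.0 while CDW scores 30.0 + 70.0 = 100.0, so the CDW value is chosen, in line with the comparison given above.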