Oracle Java CAPS Master Index Configuration Reference
The Matching Service, configured in mefa.xml, contains the matching and standardization engines used in the match process, as well as the phonetic encoders used for phonetically encoding data. You can configure the match and standardization engines for the master index application in mefa.xml, and also specify special standardization, matching, and weighting logic used by the engines. This file also defines the strategy for identifying unique records and finding the best matches in the master index database. For optimization, the Match Field components are configurable, allowing you to choose the strategy that best fits your requirements or to implement your own custom components.
The following topics describe the components of the Matching Service and the structure of mefa.xml:
The Matching Service is configured by mefa.xml, which defines the configurable properties for standardizing data and matching records. These processes are highly configurable for the master index application, allowing you to design and develop the match strategy that best suits your processing requirements.
The following components make up the Matching Service:
Standardization of incoming data applies three functions to the data processed by the master index application: reformatting (or parsing), normalization, and phonetic encoding. These functions help prepare data for matching and searching. Some fields might require all three steps, some just normalization and phonetic conversion, and other data might only need phonetic encoding. You can specify which fields require any of these steps in the standardization configuration section of mefa.xml. In addition, you can specify the nationality of the data being standardized by the Master Index Standardization Engine.
If incoming records contain data that is not formatted properly, that data must be reformatted before it can be normalized. Free-form text address fields are a good example. If you are matching or searching on street addresses that are contained in one or more free-form text fields (for example, the street address in one field and the apartment number in another), those fields must be parsed into their individual components (house number, street name, street type, and so on) before the data can be normalized.
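The parsing step can be pictured with a small sketch. This is a toy regular expression written for illustration, not the logic of the Master Index Standardization Engine; the pattern, the function name `parse_street_address`, and the component names are all assumptions.

```python
import re

# Toy pattern (assumption): house number, optional direction, street name, street type.
ADDRESS = re.compile(
    r"^(?P<house>\d+)\s+"
    r"(?:(?P<direction>NE|NW|SE|SW|N|S|E|W)\s+)?"
    r"(?P<name>.+?)\s+"
    r"(?P<type>[A-Z]+)\.?$"
)

def parse_street_address(line):
    """Split a free-form address line into components, or return None if it does not fit."""
    match = ADDRESS.match(line.strip().upper())
    return match.groupdict() if match else None

print(parse_street_address("123 N Main St."))
```

A real parser has to handle far more variation (unit numbers, rural routes, missing street types), which is why the engine relies on configurable, domain-specific rules rather than a single pattern.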
When you normalize data, the data is converted into a standard form. A common use for normalization is to convert nicknames into their standard names, such as converting “Rich” to “Richard” or “Meg” to “Margaret”. Another example is normalizing street address components. For example, “Dr.” or “Drv” in a street address might be normalized to “Drive”. Normalized values are obtained from lookup tables.
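As a rough illustration of lookup-based normalization, the sketch below maps the examples from the paragraph above to their standard forms. The table contents and the `normalize` helper are hypothetical; the actual lookup tables are supplied with the standardization engine.

```python
# Hypothetical lookup tables, for illustration only.
NICKNAMES = {"RICH": "RICHARD", "MEG": "MARGARET"}
STREET_TYPES = {"DR": "DRIVE", "DRV": "DRIVE"}

def normalize(value, table):
    """Return the standard form of value, or the cleaned value if the table has no entry."""
    key = value.strip().rstrip(".").upper()
    return table.get(key, key)

print(normalize("Rich", NICKNAMES))    # RICHARD
print(normalize("Dr.", STREET_TYPES))  # DRIVE
```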
Once data has gone through any necessary reformatting and normalization, it can be phonetically encoded. Phonetic values are generally used in blocking queries in order to obtain all possible matches to an incoming record. They are also used to perform searches from the MIDM that allow for misspellings and typographic errors. Typically, first names use Soundex encoding and last names and street names use NYSIIS encoding.
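To show why phonetic codes tolerate misspellings, here is a minimal sketch of the standard American Soundex algorithm. It is assumed, not confirmed, that the Soundex encoder shipped with the product behaves the same way; the function below is an independent illustration.

```python
# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Encode name as one letter followed by three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Because "Robert" and "Rupert" share a code, a blocking query on the phonetic field retrieves both as candidates for the match engine to score.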
The MatchingConfig section of mefa.xml allows you to define the data fields that are sent to the match engine (called the match string). Probabilistic weighting is performed only against the fields you specify as the match columns. You can specify any field in the object structure as a match column as long as the match engine is configured to use all fields specified. You must specify at least one match field. You can further configure the match string by removing known default or invalid values from the matching process. For more information, see SBR, Matching, and Blocking Filter Configuration.
The configuration of this section of mefa.xml is specific to the match engine you are using and the types of fields on which you are matching. For more information about how the matching should be configured for the Master Index Match Engine, see Oracle Java CAPS Master Index Match Engine Reference.
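The idea of a match string with filtered values can be sketched as follows. The column names echo the sample configuration later in this topic, but the excluded values and the `build_match_string` helper are hypothetical and only show the concept of withholding known default values from matching.

```python
# Hypothetical match columns and filter values, for illustration only.
MATCH_COLUMNS = ["StdFirstName", "StdLastName", "DOB"]
EXCLUDED_VALUES = {"DOB": {"01/01/1900"}, "StdLastName": {"UNKNOWN"}}

def build_match_string(record):
    """Return (column, value) pairs for the match engine; filtered values become None."""
    pairs = []
    for column in MATCH_COLUMNS:
        value = record.get(column)
        if value in EXCLUDED_VALUES.get(column, set()):
            value = None  # treat a known default value as missing
        pairs.append((column, value))
    return pairs

record = {"StdFirstName": "RICHARD", "StdLastName": "UNKNOWN", "DOB": "04/12/1975", "Gender": "M"}
print(build_match_string(record))
```

Fields that are not match columns (here, `Gender`) never reach the match engine, and the placeholder last name is not allowed to contribute to the weight.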
The MEFAConfig section specifies the Java classes to be used by components of the Matching Service, including the match and standardization engines, block picker, and pass controller. The match and standardization engines control the processes of standardizing data and generating matching probability weights between records. The block picker and pass controller define how the blocking query is executed during the match process.
Oracle Java CAPS Master Index provides the ability to use the standardization and match engines that best suit your indexing requirements. You can configure the master index application to use the Master Index Match Engine and the Master Index Standardization Engine, or you can configure the index to use a customized engine of your choice.
These engines perform two functions:
Standardize data to a common format
Calculate the likelihood that two objects match
The engines are called during match processing, when the master index application retrieves the best matches during a weighted search from the MIDM or when the master index application checks for duplicate records during an insert or update from the MIDM or an external system.
By default, the matching process is executed in multiple stages. Each configured block that defines query criteria is executed and evaluated separately (each query block execution and evaluation is referred to as a match pass). After a block is evaluated, the pass controller determines whether the results found are sufficient or whether matching should continue with another match pass.
The block picker chooses the block definition to use for each match pass. Block definitions define the criteria for each query that checks the database for a subset of the records to be used for matching. The block picker has access to the match results from previous match passes, as well as lists of applicable block definitions that have been executed and of those that have not been executed.
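The interplay between the block picker and the pass controller can be sketched as a loop. The policies below (take the next unexecuted block; stop once a candidate reaches a `good_enough` weight) are hypothetical and are not the behavior of any class shipped with the product; `query` and `score` stand in for the blocking query and the match engine.

```python
def pick_next_block(not_executed, executed, results):
    """Block picker (hypothetical policy): take the next unexecuted definition."""
    return not_executed[0] if not_executed else None

def should_continue(results, good_enough=0.9):
    """Pass controller (hypothetical policy): stop once any weight reaches good_enough."""
    return not any(weight >= good_enough for _, weight in results)

def run_match_passes(record, block_definitions, query, score):
    """Run match passes until the picker has no block left or the controller stops."""
    not_executed, executed, results = list(block_definitions), [], []
    while True:
        block = pick_next_block(not_executed, executed, results)
        if block is None:
            break
        not_executed.remove(block)
        executed.append(block)
        for candidate in query(block, record):  # one match pass
            results.append((candidate, score(record, candidate)))
        if not should_continue(results):
            break
    return results, executed
```

Note that the picker receives the executed and unexecuted block lists together with the results so far, mirroring the information the paragraph above says it has access to.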
Oracle Java CAPS Master Index provides extensible phonetic encoding capabilities, which are typically used to retrieve records with similar field values from the database for matching. By default, several phonetic encoders are defined to be used in the master index application. Typically, Soundex is used to encode first names (or SoundexFR for first names in the France national domain) and NYSIIS to encode last names. When using the Master Index Standardization Engine, you can specify different types of phonetic encoders, such as Metaphone, Double Metaphone, and Refined Soundex. When you specify the fields in the standardization configuration to be phonetically encoded, you can select one of the encoders defined in the phonetic encoders section.
The following steps illustrate one possible processing sequence that occurs when data is received from an external system and processed by the master index application.
A record is received from an external system.
The local ID does not yet exist in the master index application; initiate the standardization and matching process.
Standardize the record to a common format.
Standardize free-form text.
Normalize fields that need to be converted to a common format.
Phonetically encode fields that are commonly misspelled or spelled in different ways.
Match the record against entries in the database.
Use the selected blocking query (specified in master.xml) to retrieve a block of records that might match the new record.
Build and execute the query according to the input record.
Calculate match scores comparing the incoming record against existing records (this is done by the match engine).
Determine whether to repeat the matching process with another block of records, based on the MEFAConfig element in mefa.xml.
Return match scores for further processing.
Determine whether to add the system record to an existing EUID record or to insert the system record as a new EUID record (based on the parameters defined in the DecisionMaker element of master.xml).
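The sequence above can be condensed into a small end-to-end sketch. Every step is a stand-in: standardization is reduced to trimming and uppercasing, the block key is the first letter of the last name plus the date of birth, the score is the fraction of equal fields, and `match_threshold` is a hypothetical parameter representing the decision settings in master.xml.

```python
def process_incoming(record, index, match_threshold=0.8):
    """Decide whether a new system record joins an existing EUID or creates a new one."""
    # Standardize (stub): trim and uppercase every field.
    std = {field: value.strip().upper() for field, value in record.items()}
    # Block (stub): candidates share the last-name initial and the date of birth.
    block_key = (std["last"][:1], std["dob"])
    candidates = [(euid, rec) for euid, rec in index.items()
                  if (rec["last"][:1], rec["dob"]) == block_key]
    # Score (stub): fraction of fields that agree exactly.
    scored = [(euid, sum(std[f] == rec[f] for f in std) / len(std))
              for euid, rec in candidates]
    best = max(scored, key=lambda pair: pair[1], default=None)
    # Decide: compare the best score with the hypothetical threshold.
    if best is not None and best[1] >= match_threshold:
        return ("add_to_existing", best[0])
    return ("insert_new", None)

index = {"E1": {"first": "JOHN", "last": "SMITH", "dob": "1980-01-01"}}
print(process_incoming({"first": " john", "last": "Smith", "dob": "1980-01-01"}, index))
```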
The properties for the match and standardization process are defined in mefa.xml. Some of the information entered into the default configuration file is taken from the wizard, but the file might require additional customization in order to meet your data processing needs.
The following topics provide information about working with mefa.xml:
You can modify mefa.xml at any time, but modifying the file is not recommended once you move to production because this file defines how records are processed and data integrity is maintained. You must regenerate the application and redeploy the project after making any changes to this file. Modifying this file once you are in production might cause weighting and standardization to be handled differently, causing unexpected match weight results.
Most of the components configured by this file can be modified using the Configuration Editor. The editor provides a graphical interface that simplifies defining normalization, standardization, matching, and phonetic encoding. It also maintains referential integrity between files in cases where standardization, normalization, or phonetic encoding requires additional fields to be added to the object structure. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes.
This topic describes the structure of the XML file, general requirements, and constraints. It also provides a sample implementation.
Table 11 lists each element in mefa.xml and provides a description of each element along with any requirements or constraints for each element.
Table 11 mefa.xml File Structure
Below is a short sample of mefa.xml based on a master index application processing person data. This sample covers the basic elements of mefa.xml, but a production environment would contain several more fields to standardize as well as several additional match string fields.
<StandardizationConfig module-name="Standardization"
    parser-class="com.sun.mdm.index.configurator.impl.standardization.StandardizationConfiguration">
  <standardize-system-object>
    <system-object-name>Person</system-object-name>
    <structures-to-normalize>
      <group standardization-type="PersonName"
          domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
        <unnormalized-source-fields>
          <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].FirstName</unnormalized-source-field-name>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
          </source-mapping>
          <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].LastName</unnormalized-source-field-name>
            <standardized-object-field-id>LastName</standardized-object-field-id>
          </source-mapping>
        </unnormalized-source-fields>
        <normalization-targets>
          <target-mapping>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdFirstName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>LastName</standardized-object-field-id>
            <standardized-target-field-name>Person.Alias[*].StdLastName</standardized-target-field-name>
          </target-mapping>
        </normalization-targets>
      </group>
      <group standardization-type="PersonName"
          domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
        <unnormalized-source-fields>
          <source-mapping>
            <unnormalized-source-field-name>Person.FirstName</unnormalized-source-field-name>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
          </source-mapping>
          <source-mapping>
            <unnormalized-source-field-name>Person.LastName</unnormalized-source-field-name>
            <standardized-object-field-id>LastName</standardized-object-field-id>
          </source-mapping>
        </unnormalized-source-fields>
        <normalization-targets>
          <target-mapping>
            <standardized-object-field-id>FirstName</standardized-object-field-id>
            <standardized-target-field-name>Person.StdFirstName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>LastName</standardized-object-field-id>
            <standardized-target-field-name>Person.StdLastName</standardized-target-field-name>
          </target-mapping>
        </normalization-targets>
      </group>
    </structures-to-normalize>
    <free-form-texts-to-standardize>
      <group standardization-type="Address"
          domain-selector="com.sun.mdm.index.matching.impl.MultiDomainSelector">
        <locale-field-name>Person.Country</locale-field-name>
        <locale-maps>
          <locale-codes>
            <value>Default</value>
            <locale>US</locale>
          </locale-codes>
        </locale-maps>
        <unstandardized-source-fields>
          <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
          <unstandardized-source-field-name>Person.Address[*].AddressLine2</unstandardized-source-field-name>
        </unstandardized-source-fields>
        <standardization-targets>
          <target-mapping>
            <standardized-object-field-id>HouseNumber</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>MatchStreetName</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>StreetNamePrefDirection</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetDir</standardized-target-field-name>
          </target-mapping>
          <target-mapping>
            <standardized-object-field-id>StreetNameSufType</standardized-object-field-id>
            <standardized-target-field-name>Person.Address[*].StreetType</standardized-target-field-name>
          </target-mapping>
        </standardization-targets>
      </group>
    </free-form-texts-to-standardize>
    <phoneticize-fields>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.FirstName_Std</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.FirstName_Phon</phoneticized-target-field-name>
        <encoding-type>Soundex</encoding-type>
      </phoneticize-field>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.LastName_Std</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.LastName_Phon</phoneticized-target-field-name>
        <encoding-type>NYSIIS</encoding-type>
      </phoneticize-field>
      <phoneticize-field>
        <unphoneticized-source-field-name>Person.Address[*].StreetName</unphoneticized-source-field-name>
        <phoneticized-target-field-name>Person.Address[*].StreetNamePhoneticCode</phoneticized-target-field-name>
        <encoding-type>NYSIIS</encoding-type>
      </phoneticize-field>
    </phoneticize-fields>
  </standardize-system-object>
</StandardizationConfig>
<MatchingConfig module-name="Matching"
    parser-class="com.sun.mdm.index.configurator.impl.matching.MatchingConfiguration">
  <match-system-object>
    <object-name>Person</object-name>
    <match-columns>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.StdFirstName</column-name>
        <match-type>FirstName</match-type>
      </match-column>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.StdLastName</column-name>
        <match-type>LastName</match-type>
      </match-column>
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.DOB</column-name>
        <match-type>DOB</match-type>
      </match-column>
    </match-columns>
  </match-system-object>
</MatchingConfig>
<MEFAConfig module-name="MEFA"
    parser-class="com.sun.mdm.index.configurator.impl.MEFAConfiguration">
  <block-picker>
    <class-name>com.sun.mdm.index.matching.impl.PickAllBlocksAtOnce</class-name>
  </block-picker>
  <pass-controller>
    <class-name>com.sun.mdm.index.matching.impl.PassAllBlocks</class-name>
  </pass-controller>
  <standardizer-api>
    <class-name>com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapter</class-name>
  </standardizer-api>
  <standardizer-config>
    <class-name>com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapterConfig</class-name>
  </standardizer-config>
  <matcher-api>
    <class-name>com.sun.mdm.index.matching.adapter.SbmeMatcherAdapter</class-name>
  </matcher-api>
  <matcher-config>
    <class-name>com.sun.mdm.index.matching.adapter.SbmeMatcherAdapterConfig</class-name>
  </matcher-config>
</MEFAConfig>
<PhoneticEncodersConfig module-name="PhoneticEncoders"
    parser-class="com.sun.mdm.index.configurator.impl.PhoneticEncodersConfig">
  <encoder>
    <encoding-type>NYSIIS</encoding-type>
    <encoder-implementation-class>com.sun.mdm.index.phonetic.impl.Nysiis</encoder-implementation-class>
  </encoder>
  <encoder>
    <encoding-type>Soundex</encoding-type>
    <encoder-implementation-class>com.sun.mdm.index.phonetic.impl.Soundex</encoder-implementation-class>
  </encoder>
</PhoneticEncodersConfig>