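To configure a master index application for specific data types and for the Sun Match Engine, you must customize the Matching Service by modifying the Match Field file in the master index project. Configuring the Matching Service consists of four tasks: configuring standardization (the StandardizationConfig section), configuring the match string (the MatchingConfig section), defining the match and standardization engines (the MEFAConfig section), and configuring the phonetic encoders (the PhoneticEncodersConfig section).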
The StandardizationConfig section of the Match Field file determines which fields are normalized, parsed, or phonetically encoded and defines the nationality of the data being processed. The standardization section includes the following structures.
The StandardizationConfig section defines fields that will be normalized, fields that will be parsed and normalized, and fields that will be phonetically encoded. The standardization types you specify in this section correspond to the match configuration file; the field IDs you can specify are listed in Table 3.
The normalization structure defines fields that are already parsed, but need to be normalized. It also tells the Sun Match Engine where to place the normalized data in the object structure. Matching on any of these fields is determined by the match string and the logic is defined in the match configuration file.
Of the three data types processed by the Sun Match Engine, only the person name data type is expected to provide information in fields that are already parsed; that is, the first, last, and middle names appear in separate fields, as do the suffix, title, and so on. The person standardization files define logic for normalizing person name fields. By default, only the names you specify for matching in the wizard are defined for normalization. You can define normalization for additional name fields, such as maiden name, spouse’s name, and so on. For each normalization structure, you must specify the national domains for the data you are processing.
Defining New Fields for Normalization
The fields you define for normalization in the Match Field file can include any name fields. If you define normalization for fields that are not currently defined for normalization in the Match Field file, make the following additional changes.
In the Match Field file, define the normalization structure, using the appropriate standardization type (PersonName), domain selector, and field IDs (FirstName, MiddleName, or LastName).
Add the new fields that will store the normalized field value to the appropriate objects in the Object Definition file.
If any of the normalized fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.
Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound Object Type Definition (OTD), and the method OTD.
To specify that the new normalized fields be used for matching, do the following:
Determine the match type or the match comparison function you want to use to match the normalized data, and modify the match configuration file (matchConfigFile.cfg) if needed.
Add the new normalized field to the match-columns element of the MatchingConfig section of the Match Field file, making sure to use the appropriate match type from the match configuration file.
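To illustrate what normalization does to an already-parsed name field, here is a minimal Java sketch. It is not the Sun Match Engine's own logic: the class `NameNormalizerSketch`, its nickname table, and the `FirstName_Std` target key are all hypothetical, and the real rules come from the person standardization files for the selected national domain.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Sun Match Engine API.
class NameNormalizerSketch {
    // Assumed nickname table, for illustration only.
    private static final Map<String, String> NICKNAMES = new HashMap<>();
    static {
        NICKNAMES.put("BILL", "WILLIAM");
        NICKNAMES.put("LIZ", "ELIZABETH");
        NICKNAMES.put("BOB", "ROBERT");
    }

    // Normalizes one already-parsed name value: trim, uppercase, map nickname.
    static String normalize(String raw) {
        String cleaned = raw.trim().toUpperCase(Locale.ROOT);
        return NICKNAMES.getOrDefault(cleaned, cleaned);
    }

    // Writes the normalized value to a separate target field and leaves the
    // source field untouched, mirroring the source/target split described above.
    static Map<String, String> applyTo(Map<String, String> record,
                                       String sourceField, String targetField) {
        Map<String, String> result = new HashMap<>(record);
        String value = record.get(sourceField);
        if (value != null) {
            result.put(targetField, normalize(value));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> person = new HashMap<>();
        person.put("FirstName", " Bill ");
        System.out.println(applyTo(person, "FirstName", "FirstName_Std"));
    }
}
```

Keeping the normalized value in its own field is what makes the later steps necessary: the new field has to exist in the Object Definition file before it can be used for blocking or matching.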
The fields that must be parsed, and possibly normalized, are defined in a standardization structure in the StandardizationConfig section of the Match Field file. The standardization structure tells the Sun Match Engine where to place the standardized information extracted from the parsed fields. The target fields you specify for standardization facilitate searching by the parsed values. Matching on any of these fields is determined by the match string and the logic is defined in the match configuration file.
The Sun Match Engine expects business names and street address information in free-form text fields that must be parsed and normalized prior to matching. The logic for parsing and normalizing street address information is contained in the address standardization files; the logic for parsing and normalizing business names is contained in the business standardization files. You can customize the standardization of these data types by modifying the appropriate patterns file. For each standardization structure, you must specify the national domains for the data being processed.
Defining New Fields for Standardization
The fields you define for standardization in the Match Field file can include any street address or business name field. Perform the following steps if you need to define one of these field types for standardization.
If necessary, modify the patterns file for the type of data you are standardizing.
You can define new input and output patterns or modify existing ones.
Define the standardization structure, using the appropriate standardization type (BusinessName or Address), domain selector, and field IDs (described in Table 3).
Add the new fields that will store the parsed or normalized data to the appropriate objects in the Object Definition file.
If any of the parsed or normalized fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.
Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound Object Type Definition (OTD), and the method OTD.
To specify that the new standardized fields be used for matching, do the following:
Determine the match type or the match comparison function you want to use to match the parsed data, and modify the match configuration file (matchConfigFile.cfg) if needed.
Add the new standardized field to the match-columns element of the MatchingConfig section of the Match Field file, making sure to use the appropriate match type from the match configuration file.
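The difference between normalizing and standardizing a free-form field can be sketched as follows. This is a deliberately simplified Java illustration, assuming a trivial "number, name, type" street address layout; the class `AddressParserSketch`, its street-type table, and the output keys are hypothetical and far cruder than the pattern-driven logic in the address standardization files.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the Sun Match Engine parser or its patterns file.
class AddressParserSketch {
    // Assumed street-type normalization table, for illustration only.
    private static final Map<String, String> STREET_TYPES = new HashMap<>();
    static {
        STREET_TYPES.put("STREET", "St");
        STREET_TYPES.put("ST", "St");
        STREET_TYPES.put("AVENUE", "Ave");
        STREET_TYPES.put("AVE", "Ave");
        STREET_TYPES.put("ROAD", "Rd");
        STREET_TYPES.put("RD", "Rd");
    }

    // Splits a free-form address into parsed components and normalizes the
    // street type. Keys in the returned map are illustrative names only.
    static Map<String, String> parse(String freeForm) {
        Map<String, String> parts = new LinkedHashMap<>();
        String trimmed = freeForm.trim();
        if (trimmed.isEmpty()) {
            return parts;
        }
        String[] tokens = trimmed.split("\\s+");
        int start = 0;
        int end = tokens.length;
        // A leading all-digit token is treated as the house number.
        if (tokens[0].matches("\\d+")) {
            parts.put("HouseNumber", tokens[0]);
            start = 1;
        }
        // A trailing known street type is normalized, provided at least one
        // token remains for the street name.
        if (end - start > 1) {
            String last = tokens[end - 1].replace(".", "").toUpperCase(Locale.ROOT);
            String type = STREET_TYPES.get(last);
            if (type != null) {
                parts.put("StreetType", type);
                end--;
            }
        }
        if (end > start) {
            parts.put("StreetName",
                    String.join(" ", Arrays.copyOfRange(tokens, start, end)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("123 Main Street"));
    }
}
```

Each key in the returned map plays the role of a target field: one free-form input produces several parsed values, and each value you want to search or match on needs its own field in the object structure.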
The fields to be phonetically encoded are defined in a phonetic encoding structure in the StandardizationConfig section of the Match Field file. The phonetic encoding structure tells the Sun Match Engine where to place the phonetic data created from the fields that are encoded. You can define any field in the object structure for phonetic encoding.
Defining New Fields for Phonetic Encoding
The fields you define for phonetic encoding in the Match Field file can include any field.
Determine the type of phonetic encoder to use to convert the field.
You can use any of the encoders described in Table 7.
Define the phonetic encoding structure, using the appropriate encoders.
Add the new fields that will store the phonetic values to the appropriate objects in the Object Definition file.
If any of the phonetic fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.
Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound OTD, and the method OTD.
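To show the kind of value a phonetic field ends up holding, the following is a minimal Java sketch of the classic American Soundex algorithm. It is a textbook implementation written for illustration; the class `SoundexSketch` is hypothetical, and the encoder shipped with the Sun Match Engine may differ in details.

```java
import java.util.Locale;

// Hypothetical sketch of classic American Soundex; not the engine's encoder.
class SoundexSketch {
    // Digit codes for A..Z; '0' marks letters that produce no digit
    // (A, E, I, O, U, Y, H, W).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // skip anything that is not an ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);      // keep the first letter as-is
                lastCode = code;
            } else if (c == 'H' || c == 'W') {
                continue;           // H and W do not separate equal codes
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                    if (out.length() == 4) {
                        break;
                    }
                }
                lastCode = code;    // a vowel resets the last code
            }
        }
        // Pad to one letter plus three digits.
        while (out.length() > 0 && out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert")); // prints R163
        System.out.println(encode("Rupert")); // prints R163
    }
}
```

Because similar-sounding names collapse to the same short code, phonetic fields are a natural choice for blocking queries, which is why the Candidate Select file is part of the steps above.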
The MatchingConfig section of the Match Field file determines which fields are passed to the Sun Match Engine for matching (the match string). If you are matching on fields parsed from a free-form text field, define each individual parsed field you want to use for matching. The default fields listed in the MatchingConfig section depend on the fields you specified for matching in the wizard (for Sun Master Patient Index, the default fields are FirstName, LastName, DOB, Gender, and SSN).
The match types you can use for each field in this section are defined in the first column of the match configuration file. Make sure the match type you specify has the correct matching logic defined in the match configuration file.
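A quick consistency check between the match string and the match configuration file might look like the sketch below. It assumes only what is stated above, that the match type is the first column of each row in the match configuration file. The class `MatchTypeCheckSketch`, the `#` comment convention, the sample rows, and the field names are hypothetical, and the real file layout may include rows this sketch does not anticipate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a sanity check, not part of the master index tooling.
class MatchTypeCheckSketch {
    // Collects the first whitespace-delimited token of every non-blank line.
    // Treating '#' as a comment marker is an assumption of this sketch.
    static Set<String> definedMatchTypes(List<String> configLines) {
        Set<String> types = new LinkedHashSet<>();
        for (String line : configLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            types.add(trimmed.split("\\s+")[0]);
        }
        return types;
    }

    // Returns the match-string fields whose match type has no row in the file.
    static List<String> undefinedColumns(Map<String, String> matchColumns,
                                         Set<String> definedTypes) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> column : matchColumns.entrySet()) {
            if (!definedTypes.contains(column.getValue())) {
                missing.add(column.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up rows: only the first column matters to this sketch.
        List<String> config = Arrays.asList(
                "FirstName  15 0 ...",
                "LastName   15 0 ...",
                "DOB        20 0 ...");
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("Person.FirstName_Std", "FirstName");
        columns.put("Person.MaidenName_Std", "MaidenName");
        System.out.println(undefinedColumns(columns, definedMatchTypes(config)));
    }
}
```

A non-empty result points at a field in the match string whose match type still needs matching logic defined in the match configuration file.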
The MEFAConfig section of the Match Field file defines which standardization and match engines will be used by the master index application. By default, the master index application is already configured to use the Sun Match Engine for matching and standardization. For more information, see Understanding Sun Master Index Configuration Options (Repository).
Table 6 lists the elements in the Match Field file that define the match and standardization engine, along with the appropriate values for the Sun Match Engine.
Table 6 Sun Match Engine Standardization and Match Classes
Match Field File Element | Sun Match Engine Value |
---|---|
standardizer-api | com.stc.eindex.matching.adapter.SbmeStandardizerAdapter |
standardizer-config | com.stc.eindex.matching.adapter.SbmeStandardizerAdapterConfig |
matcher-api | com.stc.eindex.matching.adapter.SbmeMatcherAdapter |
matcher-config | com.stc.eindex.matching.adapter.SbmeMatcherAdapterConfig |
The Sun Match Engine supports several phonetic encoders, which are defined in the PhoneticEncodersConfig section of the Match Field file. Any encoders specified in the phonetic encoding structures (see Phonetic Encoding Structures) must also be defined in the PhoneticEncodersConfig section. The classes for the encoders are listed in Table 7.
Soundex - This algorithm is an industry standard for phonetically encoding first names.
French Soundex - This algorithm is based on the Soundex algorithm, but is customized for French characters and names.
Refined Soundex - This algorithm is similar to the Soundex algorithm, but is optimized for spell checking.
NYSIIS - This algorithm is an industry standard for phonetically encoding last names.
Metaphone - This algorithm is similar to the Soundex algorithm, but is better at identifying words that sound similar. This encoder is limited to encoding a single word in ASCII format containing only characters in the A - Z range. No punctuation or numbers can be in the input string.
Double Metaphone - This algorithm is an improvement on the Metaphone algorithm, at times returning two encodings for a word that could have multiple pronunciations.
Encoder | Java Class |
---|---|
Soundex | com.stc.eindex.phonetic.impl.Soundex |
NYSIIS | com.stc.eindex.phonetic.impl.NYSIIS |
Metaphone | com.stc.eindex.phonetic.impl.Metaphone |
Double Metaphone | com.stc.eindex.phonetic.impl.DoubleMetaphone |
Refined Soundex | com.stc.eindex.phonetic.impl.RefinedSoundex |
French Soundex | com.stc.eindex.phonetic.impl.SoundexFR |
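The rule that every encoder used in a phonetic encoding structure must also be defined in the PhoneticEncodersConfig section can be expressed as a simple lookup. The Java sketch below is illustrative only: the class `EncoderRegistrySketch` is hypothetical, the map keys are the display names from the encoder table above, and the exact encoder-name strings used inside the Match Field file are not shown in this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: models the encoder table as a name-to-class lookup.
class EncoderRegistrySketch {
    // Encoder display names mapped to the Java classes listed in the table.
    static Map<String, String> knownEncoders() {
        Map<String, String> encoders = new LinkedHashMap<>();
        encoders.put("Soundex", "com.stc.eindex.phonetic.impl.Soundex");
        encoders.put("NYSIIS", "com.stc.eindex.phonetic.impl.NYSIIS");
        encoders.put("Metaphone", "com.stc.eindex.phonetic.impl.Metaphone");
        encoders.put("Double Metaphone", "com.stc.eindex.phonetic.impl.DoubleMetaphone");
        encoders.put("Refined Soundex", "com.stc.eindex.phonetic.impl.RefinedSoundex");
        encoders.put("French Soundex", "com.stc.eindex.phonetic.impl.SoundexFR");
        return encoders;
    }

    // Returns the encoders that are used somewhere but not defined.
    static List<String> undefinedEncoders(Collection<String> used,
                                          Map<String, String> defined) {
        List<String> missing = new ArrayList<>();
        for (String name : used) {
            if (!defined.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "Caverphone" is a made-up entry standing in for an undefined encoder.
        List<String> used = Arrays.asList("Soundex", "NYSIIS", "Caverphone");
        System.out.println(undefinedEncoders(used, knownEncoders())); // prints [Caverphone]
    }
}
```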