The Matching Service is configured by the Match Field file, which defines the configurable properties for standardizing data and matching records. These processes are highly configurable for the master index application, allowing you to design and develop the match strategy that best suits your processing requirements.
The following components make up the Matching Service:
Standardization of incoming data applies three functions to the data processed by the master index application: reformatting (or parsing), normalization, and phonetic encoding. These functions help prepare data for matching and searching. Some fields might require all three steps, some just normalization and phonetic conversion, and other data might only need phonetic encoding. You can specify which fields require any of these steps in the standardization configuration section of the Match Field file. In addition, you can specify the nationality of the data being standardized by the Sun Match Engine.
The three stages of standardization include the following:
Data ReformattingIf incoming records contain data that is not formatted properly, it must be reformatted before it can be normalized. One good example of this is free-form text address fields. If you are matching or searching on street addresses that are contained in one or more free-form text fields (that is, the street address is contained in one field, apartment number in another, and so on), that field must be parsed into its individual components (house number, street name, street type, and so on) before the data can be normalized.
Data NormalizationWhen you normalize data, the data is converted into a standard form. A common use for normalization is to convert nicknames into their standard names, such as converting “Rich” to “Richard” or “Meg” to “Margaret”. Another example is normalizing street address components. For example, “Dr.” or “Drv” in a street address might be normalized to “Drive”. Normalized values are obtained from lookup tables.
Phonetic EncodingOnce data has gone through any necessary reformatting and normalization, it can be phonetically encoded. Phonetic values are generally used in blocking queries in order to obtain all possible matches to an incoming record. They are also used to perform searches from the EDM that allow for misspellings and typographic errors. Typically, first names use Soundex encoding and last names and street names use NYSIIS encoding.
The MatchingConfig section of the Match Field file allows you to define the data fields that are sent to the match engine (called the match string). Probabilistic weighting is performed only against the fields you specify as the match columns. You can specify any field in the object structure as a match column as long as the is configured to use all fields specified. You must specify at least one match field.
The configuration of this section of the Match Field file is specific to the you are using and the types of fields on which you are matching. For more information about how the matching should be configured for the Sun Match Engine, see Understanding the Sun Match Engine.
The match and standardization engines control the processes of standardizing data and generating matching probability weights between records. Sun Master Index provides the ability to use the standardization and match engines that best suit your indexing requirements. You can configure the master index application to use the Sun Match Engine, or you can configure the index to use a customized engine of your choice.
These engines perform two functions:
Standardize data to a common format
Calculate the likelihood that two objects match
The engines are called during match processing, when the master index application retrieves the best matches during a weighted search from the EDM or when the master index application checks for duplicate records during an insert or update from the EDM or an external system.
The block picker and pass controller define how the blocking query is executed during the match process. By default, the matching process is executed in multiple stages. Each configured block that defines query criteria is executed and evaluated separately (each query block execution and evaluation is referred to as a match pass). After a block is evaluated, the pass controller determines whether the results found are sufficient or matching should continue by performing another match pass.
The block picker chooses the block definition to use for each match pass. Block definitions define the criteria for each query that checks the database for a subset of the records to be used for matching. The block picker has access to the match results from previous match passes, as well as lists of applicable block definitions that have been executed and of those that have not been executed.
Sun Master Index provides extensible phonetic encoding capabilities, which are typically used to retrieve records with similar field values from the database for matching. By default, several phonetic encoders are defined to be used in the master index application. Typically, Soundex is used to encode first names (or SoundexFR for first names in the France national domain) and NYSIIS to encode last names. When using the Sun Match Engine, you can specify different types of phonetic encoders, such as Metaphone, Double Metaphone, and Refined Soundex. When you specify the fields in the standardization configuration to be phonetically encoded, you can select one of the encoders defined in the phonetic encoders section.