The Sun Match Engine compares two records and returns a match weight indicating the likelihood of a match between the two records. The three primary components of the Sun Match Engine are the configuration files, the standardization engine, and the match engine.
Configuration Files - The Sun Match Engine includes several sets of files that define standardization and matching logic for all supported data types. One set of standardization files is common to all national domains, and one additional set is provided for the following national domains: Australia, France, Great Britain, and the United States. You can customize these files to adapt the standardization and matching logic to your specific needs. The matching configuration file defines the configuration of the matching comparator functions.
Standardization Engine - Standardization involves converting nonstandard data into a standardized form for more accurate and efficient processing. Standardization consists of any one or more of the following actions:
Parsing - Separating a free-text field into its individual components, such as street address information or a business name.
Normalization - Changing the value of a field to a standard version, such as changing a nickname to a common name.
Phonetic Encoding - Changing the value of a field to its phonetic version. The field to be converted can be the original field, a parsed field, a normalized field, or a parsed and normalized field.
Using the person data type, for example, first names such as “Bill” and “Will” are normalized to “William”, which is then phonetically converted. Using the street address data type, street addresses are parsed into their component parts, such as house numbers, street names, and so on. The street name is then phonetically converted. Standardization logic is defined in the standardization engine configuration files and in the StandardizationConfig section of the Match Field file, and is performed prior to assigning match weights.
Match Engine– Matching involves comparing two standardized records and returning a weight that indicates the likelihood of a match between the two records. A higher weight indicates a greater likelihood of a match. Matching criteria and logic are defined in the match engine configuration file. The data fields that are sent to the Sun Match Engine for matching, known as the match string, are defined in the MatchingConfig section of the Match Field file. The match engine configuration files define how the match string is standardized and which matching rules to use to process each match field.