Understanding the Master Index Match Engine

How the Master Index Match Engine Works

The Master Index Match Engine compares records containing similar data types by calculating how closely certain fields in the records match. The resulting comparison weight is either a positive or negative numeric value that represents the degree to which the two sets of data are similar. The match engine relies on probabilistic algorithms to compare data of a given type using a comparison function specific to the type of data being compared. The comparison functions for each matching field are defined in a match configuration file that you can customize for the type of data you are indexing. You can also define custom comparison functions to plug in to the match engine. The formula used to determine the matching weight is based on either matching and unmatching probabilities or on agreement and disagreement weight ranges (described in Probabilities and Direct Weights).

The following topics provide additional information about how the Master Index Match Engine works:

Master Index Match Engine Structure

The Master Index Match Engine was designed to be very flexible and generic, allowing you to customize existing matching rules and to define additional rules using Java. The match engine framework allows you to create and plug in custom matching comparison functions, or comparators, to the match engine to enable matching against any type of data. The Master Index Match Engine framework includes two main modules. The real-time module stores the predefined and user–defined Java classes that define the matching comparator logic. The design-time modules stores the configuration and validation classes for the comparators.

The Master Index Match Engine provides a wide variety of customizable comparators for you to choose from. You can also create comparators in the real-time module, and create new validation and configuration rules in the design-time module. The structure of the design-time module supports validations, weighting curves, and class dependencies. There is also an option that allows you load information from a data file and use that information to calculate a matching weight.

Master Index Match Engine Configuration Files

The Master Index Match Engine compares two records and returns a match weight indicating the likelihood of a match between the two records based on information provided in configuration files. In a master index application, the match engine is configured by these two files in the Match Engine node of the master index project: the matching configuration file (matchConfigFile.cfg) and the comparators list (comparatorsList.xml). The matching configuration file defines the configuration and parameters for the matching comparator functions and the comparators list defines each comparator available to the match engine.

Matching criteria and logic are defined in the match configuration file in the master index project (matchConfigFile.cfg). The data fields that are sent to the Master Index Match Engine for matching, known as the match string, are defined in the MatchingConfig section of mefa.xml in the master index project. The match engine configuration files define which matching rules to use to process each match field. The match engine provides a comprehensive set of comparator functions, and you can create custom comparators if needed.

Master Index Match Engine Matching Weight Formulation

The Master Index Match Engine determines the matching weight between two records by comparing the match string fields between the two records using the rules defined in the match configuration file and taking into account the matching logic specified for each field. The Master Index Match Engine can use either matching (m) and unmatching (u) conditional probabilities or agreement and disagreement weight ranges to fine-tune the match process. It uses the underlying algorithm to arrive at a match weight for each match string field. The weight generated for each field in the match string indicates the level of match between each field. The weights assigned to each field are then summed together for a total, composite matching weight between the two records. Agreement and disagreement weight ranges or m-probabilities and u-probabilities are defined in the match configuration file.

m-probabilities and u-probabilities are expressed as double values between one and zero (excluding one and zero) and can have up to 16 decimal points. Agreement and disagreement weights are expressed as double values and can have up to 16 decimal points. When using agreement and disagreement weights, the Master Index Match Engine assigns a matching weight to each field that falls between the agreement and disagreement weights specified for the field. Thus, the maximum agreement weight between two records is the sum of the defined agreement weights for each field. The minimum disagreement weight is the sum of the defined disagreement weights for each field. For more information about weight calculation, see Determining the Weight Range.

Master Index Match Engine Data Types

The Master Index Match Engine is built on a flexible framework that allows you to customize and create matching rules for various types of data. The match engine provides an extensive set of comparison functions for matching on various types of fields, such as numbers, dates, single characters, and so on. The match engine also provides more specialized comparison functions for searching on specific types of data, such as person names, address fields, social security numbers, genders. You can define custom comparison functions and custom standardization logic for different data types or variants on data types. These customizations are easily incorporated into a master index application, allowing you to completely customize the match and standardization process for your specific data format.

The Master Index Match Engine and the Master Index Standardization Engine

The Master Index Match Engine works with the Master Index Standardization Engine to provide an accurate comparison of two records. The standardization engine reads input data and determines how to parse, normalize, and phonetically encode the data in order to create a standard set of values to use for match comparison. The standardization engine can standardize free-form text fields, such as street address fields or business names, and separate them into their individual parts, such as house numbers, street names, and so on, allowing the match engine to generate a more accurate weight for free-form data.