Understanding the Master Index Standardization Engine

Rules-Based Framework

In the rules-based framework, the standardization process is defined in the underlying Java code. You can configure several aspects of the standardization process, such as the detectable patterns for each data type, how values are normalized, and how the input string is cleansed and parsed. You can define custom rules-based data types and variants by creating custom Java packages that define the processing logic.

About the Rules-Based Framework

In the rules-based framework, individual field components are recognized by the patterns defined for each data type and by information provided in configurable files about how to preprocess, match, and postprocess each field component. The rules-based framework processes data in the following stages.

With the street address data type, for example, street addresses are parsed into their component parts, such as house numbers, street names, and so on. Certain fields are then normalized, such as street name, street type, and street directions. Finally, the street name is phonetically converted. Standardization logic is defined in the standardization engine configuration files and in the Master Index Configuration Editor or mefa.xml in a master index project.
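The parse, normalize, and phonetic conversion stages described above can be illustrated with a minimal Java sketch. This is not the standardization engine's code or API. The class name, the single address pattern, the lookup tables, and the use of Soundex as the phonetic encoder are all assumptions made for illustration. In a master index project, the equivalent patterns and tables come from the standardization engine configuration files.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Master Index API.
final class AddressStandardizerSketch {

    // Assumed normalization tables; the real engine reads these from configuration files.
    private static final Map<String, String> STREET_TYPES = Map.of(
            "STREET", "St", "ST", "St",
            "AVENUE", "Ave", "AVE", "Ave",
            "ROAD", "Rd", "RD", "Rd");
    private static final Map<String, String> DIRECTIONS = Map.of(
            "NORTH", "N", "N", "N", "SOUTH", "S", "S", "S",
            "EAST", "E", "E", "E", "WEST", "W", "W", "W");

    // One assumed pattern: house number, optional direction, street name, street type.
    private static final Pattern ADDRESS = Pattern.compile(
            "^(\\d+)\\s+(?:(NORTH|SOUTH|EAST|WEST|N|S|E|W)\\s+)?(.+?)\\s+(\\w+)$");

    static Map<String, String> standardize(String input) {
        // Cleanse: drop periods and commas, collapse whitespace, uppercase.
        String cleansed = input.trim()
                .replaceAll("[.,]", "")
                .replaceAll("\\s+", " ")
                .toUpperCase(Locale.ROOT);

        // Parse: split the string into its component parts.
        Matcher m = ADDRESS.matcher(cleansed);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized address pattern: " + input);
        }

        // Normalize direction and street type, then phonetically encode the street name.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("houseNumber", m.group(1));
        fields.put("direction", m.group(2) == null ? "" : DIRECTIONS.get(m.group(2)));
        fields.put("streetName", m.group(3));
        fields.put("streetType", STREET_TYPES.getOrDefault(m.group(4), m.group(4)));
        fields.put("streetNamePhonetic", soundex(m.group(3)));
        return fields;
    }

    // American Soundex, used here only as a stand-in phonetic encoder.
    static String soundex(String s) {
        String codes = "01230120022455012623010202"; // digit for each letter A..Z
        StringBuilder sb = new StringBuilder();
        char prev = '0';
        for (char c : s.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') continue;        // ignore spaces and digits
            char code = codes.charAt(c - 'A');
            if (sb.length() == 0) {
                sb.append(c);                        // keep the first letter as-is
                prev = code;
                continue;
            }
            if (c == 'H' || c == 'W') continue;      // H and W do not separate equal codes
            if (code != '0' && code != prev) sb.append(code);
            prev = code;
            if (sb.length() == 4) break;
        }
        while (sb.length() < 4) sb.append('0');      // pad to four characters
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(standardize("123 North Main Street."));
    }
}
```

For the input "123 North Main Street.", this sketch yields house number 123, direction N, street name MAIN, street type St, and phonetic key M500. A single regular expression keeps the example short. A production rule set would recognize many more patterns and would typically preprocess and postprocess each component separately.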

Rules-Based Configuration

The rules-based standardization configuration files are stored in the master index project and appear as nodes in the Standardization Engine node of the project. These files are separated into groups based on the primary data types and variants being processed. Rules-based data types and variants, such as the default Address and Business Name types, use the following configuration file types.