Understanding the Master Index Standardization Engine

Finite State Machine Framework

A finite state machine (FSM) is composed of one or more states and the transitions between those states. The Master Index Standardization Engine FSM framework is highly configurable and can be extended without any Java coding. The following topics describe the FSM framework and the configuration files that define FSM-based standardization.

About the Finite State Machine Framework

In an FSM framework, the standardization process is defined as one or more states. In a given state, only the input symbols defined for that state are recognized. When one of those symbols is recognized, the next action or transition is determined by configurable processing rules. For example, when an input symbol is recognized, it might be preprocessed by removing punctuation, matched against a list of tokens, and then postprocessed by normalizing the input value. After all input symbols have been processed this way, the standardization engine determines which token is the most likely match.
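The state-based processing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the state names, the token types (`TITLE`, `FIRST_NAME`), the token lists, and the nickname table are all hypothetical stand-ins for what the engine reads from its configuration files, and the engine's choice of the most likely match is simplified here to taking the first token type, in the state's order, that matches.

```python
import string
from dataclasses import dataclass

# Hypothetical token lists; the real engine loads these from configuration files.
TOKEN_LISTS = {
    "TITLE": {"MR", "MRS", "DR"},
    "FIRST_NAME": {"BILL", "WILL", "WILLIAM", "JOHN"},
}

# Hypothetical normalization table used by the postprocessing step.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM"}


def preprocess(symbol: str) -> str:
    """Remove punctuation and surrounding whitespace, then upper-case."""
    return symbol.translate(str.maketrans("", "", string.punctuation)).strip().upper()


def postprocess(value: str) -> str:
    """Normalize a matched value, e.g. a nickname to its standard form."""
    return NICKNAMES.get(value, value)


@dataclass
class State:
    name: str
    accepts: list[str]          # token types recognized in this state, in priority order
    next_state: dict[str, str]  # token type -> name of the state to transition to


# Hypothetical machine: an optional title followed by a first name.
STATES = {
    "start": State("start", ["TITLE", "FIRST_NAME"],
                   {"TITLE": "after_title", "FIRST_NAME": "end"}),
    "after_title": State("after_title", ["FIRST_NAME"], {"FIRST_NAME": "end"}),
    "end": State("end", [], {}),
}


def standardize(symbols: list[str]) -> list[tuple[str, str]]:
    """Run each input symbol through preprocess -> match -> postprocess -> transition."""
    state = STATES["start"]
    output = []
    for raw in symbols:
        value = preprocess(raw)
        # Only the token types defined for the current state are considered.
        matched = next((t for t in state.accepts if value in TOKEN_LISTS[t]), None)
        if matched is None:
            raise ValueError(f"{raw!r} is not recognized in state {state.name!r}")
        output.append((matched, postprocess(value)))
        state = STATES[state.next_state[matched]]
    return output
```

For example, `standardize(["Mr.", "Bill"])` returns `[("TITLE", "MR"), ("FIRST_NAME", "WILLIAM")]`, while `standardize(["Mr.", "Dr."])` raises `ValueError` because `after_title` recognizes only first names. Keeping the states, token lists, and transitions as data rather than code mirrors the idea that the framework is changed through configuration instead of programming.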

FSM-based processing includes the following steps:

For example, with the person data type, first names such as “Bill” and “Will” are normalized to “William”, which is then converted to its phonetic equivalent. Standardization logic is defined in the standardization engine configuration files and, within a master index project, in the Master Index Configuration Editor or mefa.xml.
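The normalize-then-encode sequence in the person-name example can be illustrated with a short Python sketch. The text does not say which phonetic encoder the engine uses, so American Soundex is swapped in here purely for illustration, and the `NICKNAMES` table and `normalize_and_encode` function are hypothetical.

```python
# Hypothetical normalization table standing in for the engine's configured lists.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM"}

# American Soundex digit classes; letters not listed (vowels, H, W, Y) have no code.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the American Soundex code: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    prev = SOUNDEX_CODES.get(first, "")
    digits = []
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        # H and W do not separate letters with the same code; vowels do.
        if c not in "HW":
            prev = code
    return (first + "".join(digits) + "000")[:4]


def normalize_and_encode(first_name: str) -> tuple[str, str]:
    """Normalize a first name, then compute the phonetic code of the normalized form."""
    key = first_name.strip().upper()
    normalized = NICKNAMES.get(key, key)
    return normalized, soundex(normalized)
```

With this sketch, both `normalize_and_encode("Bill")` and `normalize_and_encode("Will")` return `("WILLIAM", "W450")`. Because normalization runs before encoding, nickname variants share one phonetic key and can be compared as equivalent.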

FSM-Based Configuration

The FSM-based standardization configuration files are stored in the master index project and appear in the Standardization Engine node of the project. These files are separated into groups based on the primary data types being processed. Data type groups have further subsets of configuration files based on the variants for each data type. FSM-based data types and variants, such as PersonName and PhoneNumber, include the following configuration file types.