Understanding the Master Index Standardization Engine

Master Index Standardization Engine Data Types and Variants

A data type is the primary kind of data you are processing, such as person names, addresses, business names, automotive parts, and so on. A variant is a subset of a data type that is designed to standardize a specific kind of data. For example, for addresses and names, the variants typically define rules for the different countries in which the data originates. For automotive parts, the variants might be different manufacturers. Each data type and variant uses its own configuration files to define how fields in incoming records are parsed, standardized, and classified for processing. Data types are sometimes referred to as standardization types.
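To make the relationship between data types and variants concrete, the following is a minimal sketch of one way to model it. It assumes only what the paragraph above states: a data type groups several variants, and each variant points at its own configuration files. The class, record, and method names, the variant names, and the file names are all hypothetical placeholders, not part of the Master Index Standardization Engine API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the data type / variant relationship.
// None of these names come from the Master Index Standardization Engine API.
public class StandardizationTypes {

    // A variant narrows a data type to one specific kind of data (for example,
    // addresses from one country) and has its own configuration files.
    record Variant(String name, List<String> configFiles) {}

    // A data type (also called a standardization type) groups its variants.
    record DataType(String name, Map<String, Variant> variants) {

        // Returns the variant whose configuration should be used for a record.
        Variant variant(String variantName) {
            Variant v = variants.get(variantName);
            if (v == null) {
                throw new IllegalArgumentException(
                    "No variant '" + variantName + "' for data type " + name);
            }
            return v;
        }
    }

    public static void main(String[] args) {
        // Illustrative only: variant names and file names are placeholders.
        DataType address = new DataType("Address", Map.of(
            "US", new Variant("US", List.of("us-address-rules.cfg")),
            "UK", new Variant("UK", List.of("uk-address-rules.cfg"))));
        System.out.println(address.variant("UK").configFiles());
    }
}
```

Keying variants by name within each data type mirrors the idea that a variant only has meaning relative to its data type; an unknown variant is reported as an error rather than silently falling back to another variant's rules.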

In the default implementation with a master index application, the engine can standardize the following types of data:

Person Information (described in FSM-Based Person Name Configuration)
Telephone Numbers (described in FSM-Based Telephone Number Configuration)
Street Addresses (described in Rules-Based Address Data Configuration)
Business Names (described in Rules-Based Business Name Configuration)

In the default configuration, the standardization engine expects street addresses and business names to be in free-form text fields that must be parsed before normalization and phonetic encoding. Person and telephone information can also be contained in free-form text fields, but these types of information can be processed as well when the data is already parsed into its individual components. Each data type requires specific customization of mefa.xml in the master index project. You can make these changes by modifying the file directly or by using the Master Index Configuration Editor.
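The input expectations described above can be summarized in a small lookup. The following is a minimal sketch, assuming only the default behavior stated in this section: street addresses and business names arrive as free-form text, while person and telephone data may arrive either as free-form text or already parsed into components. The class, enum, and method names are hypothetical and do not reflect the engine's API or the structure of mefa.xml.

```java
// Hypothetical sketch of the default input expectations per data type.
// Names are illustrative only; this is not the engine's API.
public class DefaultDataTypes {

    // How a value arrives at the standardization engine.
    enum InputForm { FREE_FORM, PRE_PARSED }

    enum DataType {
        PERSON_NAME(true),       // FSM-based; free-form or pre-parsed
        TELEPHONE_NUMBER(true),  // FSM-based; free-form or pre-parsed
        STREET_ADDRESS(false),   // rules-based; free-form expected
        BUSINESS_NAME(false);    // rules-based; free-form expected

        private final boolean acceptsPreParsed;

        DataType(boolean acceptsPreParsed) {
            this.acceptsPreParsed = acceptsPreParsed;
        }

        // Every type accepts free-form text; only some also accept
        // data already split into individual components.
        boolean accepts(InputForm form) {
            return form == InputForm.FREE_FORM || acceptsPreParsed;
        }
    }

    // Free-form text must be parsed before normalization and phonetic encoding.
    static boolean requiresParsing(InputForm form) {
        return form == InputForm.FREE_FORM;
    }

    public static void main(String[] args) {
        for (DataType t : DataType.values()) {
            System.out.println(t + " accepts pre-parsed input: "
                + t.accepts(InputForm.PRE_PARSED));
        }
    }
}
```

Separating "which input forms a type accepts" from "whether a given input needs parsing" keeps the second rule independent of the data type: any free-form value goes through parsing, regardless of which of the four types it belongs to.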