Understanding Sun Master Index Configuration Options

Standardization Configuration

Standardization of incoming data applies three functions to the data processed by the master index application: reformatting (or parsing), normalization, and phonetic encoding. These functions help prepare data for matching and searching. Some fields might require all three steps, some just normalization and phonetic conversion, and other data might only need phonetic encoding. You can specify which fields require any of these steps in the standardization configuration section of mefa.xml. In addition, you can specify the nationality of the data being standardized by the Master Index Standardization Engine.

Data Reformatting

If incoming records contain data that is not formatted properly, it must be reformatted before it can be normalized. One good example of this is free-form text address fields. If you are matching or searching on street addresses that are contained in one or more free-form text fields (that is, the street address is contained in one field, apartment number in another, and so on), that field must be parsed into its individual components (house number, street name, street type, and so on) before the data can be normalized.

Data Normalization

When you normalize data, the data is converted into a standard form. A common use for normalization is to convert nicknames into their standard names, such as converting “Rich” to “Richard” or “Meg” to “Margaret”. Another example is normalizing street address components. For example, “Dr.” or “Drv” in a street address might be normalized to “Drive”. Normalized values are obtained from lookup tables.

Phonetic Encoding

Once data has gone through any necessary reformatting and normalization, it can be phonetically encoded. Phonetic values are generally used in blocking queries in order to obtain all possible matches to an incoming record. They are also used to perform searches from the MIDM that allow for misspellings and typographic errors. Typically, first names use Soundex encoding and last names and street names use NYSIIS encoding.