Oracle® Healthcare Master Person Index Standardization Engine Reference
Release 2.0.3

E25248-03

1 Master Person Index Standardization Engine Reference

This chapter provides an overview of the Oracle Healthcare Master Person Index (OHMPI) Standardization Engine and introduces you to standardization concepts.

This chapter includes the following sections:

Introducing the OHMPI Standardization Engine
Understanding Standardization Concepts

Introducing the OHMPI Standardization Engine

The OHMPI Standardization Engine works together with the OHMPI Match Engine to provide data parsing, data standardization, phonetic encoding, and record matching capabilities for external applications, such as master person index applications. Before records can be compared to evaluate the possibility of a match, the data contained in those records must be normalized and in certain cases standardized and phonetically encoded. Once the data is conditioned, the match engine determines a match weight for each field defined for matching. The standardization engine is built on a flexible framework that allows you to customize the standardization process and extend standardization rules.

The Master Person Index Standardization Engine is designed to work with the master person index applications created by Oracle Healthcare Master Person Index. The standardization engine can also be called from other applications, web services, web applications, and so on. It is highly configurable in the Oracle Healthcare Master Person Index environment and can be used to standardize various types of data. The OHMPI Standardization Engine works in conjunction with the OHMPI Match Engine to improve the quality of your data.

Understanding Standardization Concepts

Data standardization transforms input data into common representations of values to give you a single, consistent view of the data stored in and across organizations. Standardizing the data stored in disparate systems provides a common representation of the data so you can easily and accurately compare data between systems.

Data standardization applies multiple transformations to the data: parsing into individual components, cleansing, normalization, and data typing. These steps help cleanse the data and prepare it for matching and searching. Some fields might require all of these steps, some only normalization, and others might need only phonetic encoding, which is performed in tandem with standardization. Typically, data is first parsed, then normalized, and then typed using pattern analysis, though some cleansing might be needed before parsing.

Standardization can include the following phases:

Data Parsing or Reformatting
Data Normalization

Phonetic Encoding can also be included to complete the data preparation process for matching.
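
The following sketch shows these phases applied in order to a single free-form address field. It is a hypothetical, minimal illustration only: the AddressParseSketch, NormalizationSketch, and SoundexSketch classes it calls are the illustrative sketches given in the subsections that follow, not OHMPI APIs, and the sample address is invented.

    import java.util.Map;

    // Hypothetical end-to-end sketch: parse a free-form address, normalize one
    // of the resulting components, then phonetically encode a value used for
    // matching. The helper classes are the sketches shown in the subsections
    // that follow, not OHMPI classes.
    public final class StandardizationFlowSketch {
        public static void main(String[] args) {
            // 1. Parsing: separate the free-form field into its tokens.
            Map<String, String> tokens = AddressParseSketch.parse("123 N Main Drv Apt 4B");

            // 2. Normalization: convert a component to its standard form ("Drv" -> "DRIVE").
            String streetType = NormalizationSketch.normalize(
                    NormalizationSketch.STREET_TYPES, tokens.get("StreetType"));

            // 3. Phonetic encoding: encode the street name for blocking and searching.
            String streetPhonetic = SoundexSketch.encode(tokens.get("StreetName"));

            System.out.println(streetType + " / " + streetPhonetic); // DRIVE / M500
        }
    }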

Data Parsing or Reformatting

If incoming records contain data that is not formatted properly, it must be reformatted before it can be normalized. This process identifies and separates each component of a free-form text field that contains multiple pieces of information. Reformatting can also include removing characters or strings from a field that are not relevant to the data. A good example is standardizing free-form text address fields. If you are comparing or searching on street addresses contained in one or more free-form text fields (for example, the street address in one field, the apartment number in another, and so on), those fields need to be parsed into their individual components, such as house number, street name, street type, and street direction. Certain components of the address, such as the street name and type, can then be normalized. Field components are also known as tokens, and the process of separating data into its tokens is known as tokenization.
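
As a hypothetical illustration of tokenization, the following sketch splits a simple free-form address into its components with a single regular expression. The OHMPI engine's actual parsing is driven by configurable patterns and tables; the regular expression, field names, and street types below are assumptions made only for this example.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Illustrative tokenizer for simple addresses such as "123 N Main St Apt 4B".
    // It only demonstrates the kind of output (tokens) that parsing produces.
    public final class AddressParseSketch {

        private static final Pattern SIMPLE_ADDRESS = Pattern.compile(
                "^(?<houseNumber>\\d+)\\s+"
              + "(?:(?<direction>[NSEW])\\s+)?"
              + "(?<streetName>.+?)\\s+"
              + "(?<streetType>St|Ave|Dr|Drv|Blvd|Rd)\\.?"
              + "(?:\\s+Apt\\s+(?<unit>\\S+))?$",
                Pattern.CASE_INSENSITIVE);

        public static Map<String, String> parse(String freeForm) {
            Map<String, String> tokens = new LinkedHashMap<>();
            Matcher m = SIMPLE_ADDRESS.matcher(freeForm.trim());
            if (m.matches()) {
                tokens.put("HouseNumber", m.group("houseNumber"));
                if (m.group("direction") != null) {
                    tokens.put("Direction", m.group("direction"));
                }
                tokens.put("StreetName", m.group("streetName"));
                tokens.put("StreetType", m.group("streetType"));
                if (m.group("unit") != null) {
                    tokens.put("Unit", m.group("unit"));
                }
            }
            return tokens;
        }

        public static void main(String[] args) {
            // {HouseNumber=123, Direction=N, StreetName=Main, StreetType=St, Unit=4B}
            System.out.println(parse("123 N Main St Apt 4B"));
        }
    }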

Data Normalization

Normalizing data converts it into a standard or common form. A common use for normalization is to convert nicknames into their standard names, such as converting "Rich" to "Richard" or "Meg" to "Margaret." Another example is normalizing street address components. For example, either "Dr." or "Drv" in a street address might be normalized to "Drive." Normalized values are obtained from lookup tables. Once a field value is normalized, that value can be compared more accurately against values in other records to determine whether they are a match.
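
A minimal sketch of lookup-table normalization follows. In the actual engine the normalized values come from configurable lookup tables, so the table entries, class name, and method here are assumptions made for illustration only.

    import java.util.Map;

    // Minimal lookup-table normalization sketch. The entries below are
    // illustrative; the real engine reads its normalization values from
    // configurable lookup tables.
    public final class NormalizationSketch {

        static final Map<String, String> NICKNAMES = Map.of(
                "RICH", "RICHARD",
                "MEG", "MARGARET");

        static final Map<String, String> STREET_TYPES = Map.of(
                "DR", "DRIVE",
                "DRV", "DRIVE",
                "ST", "STREET");

        // Returns the standard form from the lookup table, or the cleaned
        // input value when no entry exists.
        static String normalize(Map<String, String> table, String value) {
            String key = value.toUpperCase().replace(".", "").trim();
            return table.getOrDefault(key, key);
        }

        public static void main(String[] args) {
            System.out.println(normalize(NICKNAMES, "Rich"));   // RICHARD
            System.out.println(normalize(STREET_TYPES, "Dr.")); // DRIVE
        }
    }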

Phonetic Encoding

Once data has gone through any necessary reformatting and normalization, it can be phonetically encoded. In a master person index application, phonetic values are generally used in blocking queries in order to obtain all possible matches to an incoming record. They are also used to perform searches from the Master Index Data Manager (MIDM) that allow for misspellings and typographic errors. Typically, first names use Soundex encoding and last names and street names use NYSIIS encoding, but the OHMPI Standardization Engine supports several additional phonetic encoders as well.
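
To illustrate what phonetic encoding produces, the following sketch implements the classic American Soundex rules (keep the first letter, code the remaining consonants, drop duplicate codes, and pad to four characters). It is not the OHMPI engine's own encoder, and NYSIIS follows a different, more elaborate rule set; the class and method names are assumptions for this example.

    // Minimal sketch of classic American Soundex encoding; illustrative only.
    public final class SoundexSketch {

        // Digit codes for A..Z. '0' marks letters that carry no code (vowels
        // plus H, W, and Y); H and W are also skipped below so that they do
        // not separate duplicate consonant codes.
        private static final char[] CODES = "01230120022455012623010202".toCharArray();

        public static String encode(String name) {
            String s = name == null ? "" : name.toUpperCase().replaceAll("[^A-Z]", "");
            if (s.isEmpty()) {
                return "";
            }
            StringBuilder out = new StringBuilder();
            out.append(s.charAt(0));                      // keep the first letter
            char lastCode = CODES[s.charAt(0) - 'A'];
            for (int i = 1; i < s.length() && out.length() < 4; i++) {
                char c = s.charAt(i);
                if (c == 'H' || c == 'W') {
                    continue;                             // H and W are ignored entirely
                }
                char code = CODES[c - 'A'];
                if (code != '0' && code != lastCode) {
                    out.append(code);                     // record a new consonant code
                }
                lastCode = code;                          // vowels reset the duplicate check
            }
            while (out.length() < 4) {
                out.append('0');                          // pad to four characters
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(encode("Robert"));         // R163
            System.out.println(encode("Rupert"));         // R163
            System.out.println(encode("Ashcraft"));       // A261
        }
    }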