Data Normalization Definitions (Understanding the Master Index Standardization Engine)

If the data you are standardizing does not need to be parsed but does require normalization, you can define data normalization rules to use instead of the state model defined earlier in the process definition file. These rules apply, for example, to person names whose components are already stored in separate fields and do not need to be parsed. In this case, the standardization engine processes one field at a time according to the rules defined in the normalizer section of standardizer.xml. In this section, you can also define preprocessing rules to apply to the fields before normalization.

Below is an excerpt from the PersonName data type. These rules convert the input string to all uppercase, then process the FirstName and MiddleName fields based on the givenName input symbol and the LastName field based on the surname input symbol.

<normalizer>
   <preProcessing>
      <uppercase/>
   </preProcessing>
   <for field="FirstName" use="givenName"/>
   <for field="MiddleName" use="givenName"/>
   <for field="LastName" use="surname"/>
</normalizer>
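
To make the field-by-field flow concrete, the following Python sketch reads the excerpt above and applies it to one record. It is an illustration only, not the engine's implementation: the `SYMBOL_LOOKUPS` tables, their entries, and the `load_rules` and `normalize_record` helpers are hypothetical stand-ins for the processing logic that the process definition file attaches to each input symbol.

```python
import xml.etree.ElementTree as ET

# The normalizer excerpt from the PersonName data type, as shown above.
NORMALIZER_XML = """
<normalizer>
   <preProcessing>
      <uppercase/>
   </preProcessing>
   <for field="FirstName" use="givenName"/>
   <for field="MiddleName" use="givenName"/>
   <for field="LastName" use="surname"/>
</normalizer>
"""

# ASSUMPTION: hypothetical lookup tables standing in for the processing
# logic defined for each input symbol. The entries are invented.
SYMBOL_LOOKUPS = {
    "givenName": {"BILL": "WILLIAM", "LIZ": "ELIZABETH"},
    "surname": {"MC DONALD": "MCDONALD"},
}


def load_rules(xml_text):
    """Return the preprocessing step names and the field-to-symbol map."""
    root = ET.fromstring(xml_text)
    pre = root.find("preProcessing")
    pre_steps = [child.tag for child in pre] if pre is not None else []
    field_symbols = {el.get("field"): el.get("use") for el in root.findall("for")}
    return pre_steps, field_symbols


def normalize_record(record, pre_steps, field_symbols):
    """Process one field at a time: preprocess, then apply the symbol's logic."""
    result = dict(record)
    for field, symbol in field_symbols.items():
        if field not in record:
            continue
        value = record[field]
        if "uppercase" in pre_steps:
            value = value.upper()
        # Fall back to the preprocessed value when no lookup entry matches.
        result[field] = SYMBOL_LOOKUPS.get(symbol, {}).get(value, value)
    return result


pre_steps, field_symbols = load_rules(NORMALIZER_XML)
record = {"FirstName": "Bill", "MiddleName": "liz", "LastName": "Mc Donald"}
print(normalize_record(record, pre_steps, field_symbols))
# → {'FirstName': 'WILLIAM', 'MiddleName': 'ELIZABETH', 'LastName': 'MCDONALD'}
```

Because FirstName and MiddleName both map to givenName, they share one lookup table in this sketch, which mirrors how a single input symbol can be reused for several fields.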

The following table lists and describes the XML elements and attributes for the normalization definitions.

Element	Attribute	Description
normalizer		A container element for the normalization rules to use when field components do not require parsing, but do require normalization.
preProcessing		A container element for any preprocessing rules to apply to the input strings prior to normalization. For more information about preprocessing rules, see Standardization Processing Rules Reference.
for		Associates a field with the input symbol used to normalize it. The field and the symbol are specified by the following attributes.
	field	The name of a field to be normalized.
	use	The name of the input symbol to associate with the field. The processing logic defined for the input symbol earlier in the file is used to normalize the data contained in that field.