Normalization files list the nonstandard values for a field along with the normalized value for each. The standardization engine uses these files to convert nonstandard values into a standard form. The process definition file references these files when it defines normalization rules. Each normalization file is located in the resource folder for the data type or variant that references it.
The most common example of normalization is a nickname file that provides a list of nicknames along with the standard version of each name. For example, “Beth” and “Liz” might both be standardized to “Elizabeth”. Each row in the file contains a nickname and its corresponding standardized version, separated by a pipe character (|). You can modify these files to suit your data processing needs, or you can create new normalization files to reference from the process definition file.
Below is an excerpt of the given names normalization file:
BEV|BEVERLY
BIANCA|BLANCHE
BILLIE|WILLIAM
BILLYE|WILLIAM
BILLY|WILLIAM
BILL|WILLIAM
BIRGIT|BRIDGET
BLANCA|BLANCHE
BLANCH|BLANCHE
BOBBIE|ROBERT
BOBBI|ROBERT
BOBBYE|ROBERT
BOBBY|ROBERT
BOB|ROBERT
BONNY|BONNIE
BRADLY|BRADLEY
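The row format described above (a nonstandard value, a pipe character, then the standardized value) can be illustrated with a short Python sketch. This is not the standardization engine's own code. The function names load_normalization and normalize are hypothetical, and two behaviors are assumptions made for the example: lookups ignore letter case, and values with no matching row are returned unchanged.

```python
from typing import Dict, Iterable


def load_normalization(lines: Iterable[str]) -> Dict[str, str]:
    """Build a lookup table from pipe-delimited normalization rows."""
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank rows and rows without a pipe separator.
        if not line or "|" not in line:
            continue
        nonstandard, standard = line.split("|", 1)
        nonstandard = nonstandard.strip()
        standard = standard.strip()
        # Skip rows where either side of the pipe is empty.
        if nonstandard and standard:
            # Assumption: keys are upper-cased so lookups ignore case.
            mapping[nonstandard.upper()] = standard
    return mapping


def normalize(value: str, mapping: Dict[str, str]) -> str:
    """Return the standardized value, or the input unchanged if no row matches."""
    # Assumption: unmatched values pass through as-is.
    return mapping.get(value.strip().upper(), value)


# A few rows taken from the given names excerpt above.
sample_rows = ["BEV|BEVERLY", "BILL|WILLIAM", "BOB|ROBERT"]
names = load_normalization(sample_rows)

print(normalize("Bob", names))
print(normalize("Alice", names))
```

With the sample rows, the first call yields ROBERT and the second returns Alice unchanged, because no row lists it. To read rows from an actual file, pass an open file object to load_normalization in place of the list.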