Lexicon files list the possible values for a specific field; the standardization engine uses these values to recognize input data. You can define a lexicon file for each field on which standardization is performed. The process definition file references these files when defining matching or processing rules. Lexicon files are located in the resource folder of the data type or variant that references them.
Lexicon files are plain text files with a single column that lists the possible field values, one value per line. They are typically named after the token type, or standardization component, that they define. For example, the lexicon files for first and last names are givenNames.txt and surnames.txt. You can modify these files to suit your data requirements, and you can create new lexicon files to reference from the process definition file.
Below is an excerpt of the given names lexicon file:
ALIA
ALICA
ALICAI
ALICE
ALICEMARIE
ALICEN
ALICIA
ALICJA
ALID
ALIDA
ALIHAN
ALINA
ALINE
ALIS
ALISA
ALISE
ALISHA
ALISHIA
ALISIA
ALISON
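To illustrate how a single-column lexicon can be used for lookups, the following Python sketch reads a lexicon file into a set and checks whether an input token is a known value. It is not the standardization engine's own implementation: the function names load_lexicon and is_known_value are hypothetical, the sample file holds only a few values taken from the excerpt, and the case-insensitive comparison is an assumption made for this example.

```python
import tempfile
from pathlib import Path


def load_lexicon(path):
    """Read a single-column lexicon file into a set of uppercase values."""
    values = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            value = line.strip()
            if value:  # skip blank lines
                values.add(value.upper())
    return values


def is_known_value(token, lexicon):
    """Return True if the token is in the lexicon.

    Case-insensitive matching is an assumption of this sketch,
    not documented engine behavior.
    """
    return token.strip().upper() in lexicon


# Write a small sample file so the sketch runs on its own.
sample_dir = Path(tempfile.mkdtemp())
sample_path = sample_dir / "givenNames.txt"
sample_path.write_text("ALICE\nALICIA\nALINA\nALISON\n", encoding="utf-8")

given_names = load_lexicon(sample_path)
print(is_known_value("Alice", given_names))
print(is_known_value("Bob", given_names))
```

Storing the values in a set keeps each lookup fast even for a lexicon with many thousands of entries, and adding a value to the file requires no change to the code.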