Understanding the Master Index Standardization Engine

To Define the State Model and Processing Rules

  1. In /WorkingDirectory/resource, create a new XML file named standardizer.xml.


    Tip –

    You can copy the file from an existing variant in the data type to which you are adding the custom variant. Then you can modify the file for the new variant.


  2. If the data you are processing needs only to be normalized and not parsed, define the normalization rules in the normalizer section of the file.

    For more information, see Data Normalization Definitions and Standardization Processing Rules Reference.

  3. If the data you are processing needs to be parsed and normalized, define the state model in the upper portion of the file.

    For information about the state model and the elements that define it, see Standardization State Definitions.


    Note –

    The next several steps use the processing rules described in Standardization Processing Rules Reference. Some of these rules might require that you create normalization and lexicon files.


  4. In the inputSymbols section of the file, define each input symbol along with any processing rules.

    For more information, see Input Symbol Definitions.

  5. In the outputSymbols section of the file, define each output symbol along with any processing rules.

    For more information, see Output Symbol Definitions.

  6. In the cleanser section of the file, define any cleansing rules to apply to the data before tokenization.

    For more information, see Data Cleansing Definitions.

  7. If you created any rules that reference normalization or lexicon files, continue to Creating Normalization and Lexicon Files.
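
As a quick sanity check after completing these steps, the following minimal Python sketch reports which of the sections named in the procedure (normalizer, inputSymbols, outputSymbols, cleanser) appear in a standardizer.xml file. It is not part of the product. The root element name `standardizer` and the empty sample content are assumptions made for illustration only, and the sketch does not check the state model because this procedure does not name its element.

```python
import xml.etree.ElementTree as ET

# Section names taken from the procedure above. Their exact position in the
# real file is not assumed: the whole tree is searched.
SECTIONS = ("normalizer", "inputSymbols", "outputSymbols", "cleanser")


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}name' to 'name'."""
    return tag.rsplit("}", 1)[-1]


def find_sections(xml_text):
    """Map each section name to True if an element with that name exists."""
    root = ET.fromstring(xml_text)
    present = {local_name(element.tag) for element in root.iter()}
    return {name: name in present for name in SECTIONS}


# Hypothetical, deliberately empty skeleton. The root element name is an
# assumption, and real files contain rules inside each section.
SAMPLE = """<standardizer>
  <inputSymbols/>
  <outputSymbols/>
  <cleanser/>
</standardizer>"""

if __name__ == "__main__":
    for name, found in find_sections(SAMPLE).items():
        print(f"{name}: {'present' if found else 'missing'}")
```

To check a real file, read its contents (for example with `open(path, encoding="utf-8").read()`) and pass the text to `find_sections`. A missing normalizer section is expected when the data is both parsed and normalized through the state model, so treat the report as a checklist rather than a validation result.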