Understanding the Sun Match Engine

Normalization Structures

The normalization structure defines fields that are already parsed but still need to be normalized. It also tells the Sun Match Engine where to place the normalized data in the object structure. Matching on any of these fields is determined by the match string, and the matching logic is defined in the match configuration file.

Of the three data types processed by the Sun Match Engine, only the person name data type is expected to provide information in fields that are already parsed; that is, the first, last, and middle names appear in separate fields, as do the suffix, title, and so on. The person standardization files define logic for normalizing person name fields. By default, only the names you specify for matching in the wizard are defined for normalization. You can define normalization for additional name fields, such as maiden name, spouse’s name, and so on. For each normalization structure, you must specify the national domains for the data you are processing.
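As an illustration, a normalization structure for a maiden name field might look like the following fragment of the Match Field file. This is a sketch only. The element names and the domain selector class are assumptions based on the typical layout of the Match Field file, so verify them against the file generated for your application. The field names Person.MaidenName and Person.MaidenName_Std are hypothetical.

```xml
<!-- Sketch only: element names are assumed from the typical Match Field file layout. -->
<structures-to-normalize>
  <!-- PersonName standardization for a single national domain (assumed selector class) -->
  <group standardization-type="PersonName"
         domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
    <!-- The already-parsed source field that needs to be normalized (hypothetical field) -->
    <unnormalized-source-fields>
      <source-mapping>
        <unnormalized-source-field-name>Person.MaidenName</unnormalized-source-field-name>
        <standardized-object-field-id>LastName</standardized-object-field-id>
      </source-mapping>
    </unnormalized-source-fields>
    <!-- Where the normalized value is placed in the object structure (hypothetical field) -->
    <normalization-targets>
      <target-mapping>
        <standardized-object-field-id>LastName</standardized-object-field-id>
        <standardized-target-field-name>Person.MaidenName_Std</standardized-target-field-name>
      </target-mapping>
    </normalization-targets>
  </group>
</structures-to-normalize>
```

In this sketch the maiden name reuses the LastName field ID, because a maiden name is normalized the same way as a last name.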

Defining New Fields for Normalization

You can define normalization for any name field in the Match Field file. If you define normalization for fields that are not currently defined for normalization in the Match Field file, complete the following steps.

  1. In the Match Field file, define the normalization structure, using the appropriate standardization type (PersonName), domain selector, and field IDs (FirstName, MiddleName, or LastName).

  2. Add the new fields that will store the normalized field value to the appropriate objects in the Object Definition file.

  3. If any of the normalized fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.

  4. Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound Object Type Definition (OTD), and the method OTD.

  5. To specify that the new normalized fields be used for matching, do the following:

    1. Determine the match type or the match comparison function you want to use to match the normalized data, and modify the match configuration file (matchConfigFile.cfg) if needed.

    2. Add the new normalized field to the match-columns element of the MatchingConfig section of the Match Field file, making sure to use the appropriate match type from the match configuration file.
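The last step can be sketched as the following fragment of the MatchingConfig section. It is an example under stated assumptions, not text from a generated project. The element names and the fully qualified column name format are assumed from the typical layout of the Match Field file, Person.MaidenName_Std is a hypothetical normalized field, and LastName is assumed to be a match type defined in matchConfigFile.cfg.

```xml
<!-- Sketch only: element names and the column name format are assumptions. -->
<MatchingConfig>
  <match-system-object>
    <object-name>Person</object-name>
    <match-columns>
      <!-- Existing match columns remain unchanged. -->
      <!-- New normalized field added for matching (hypothetical field name) -->
      <match-column>
        <column-name>Enterprise.SystemSBR.Person.MaidenName_Std</column-name>
        <!-- Must correspond to a match type in the match configuration file -->
        <match-type>LastName</match-type>
      </match-column>
    </match-columns>
  </match-system-object>
</MatchingConfig>
```

The field named in the match column must also exist in the Object Definition file (step 2). Otherwise the regenerated application will not contain a place to store the normalized value.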