Configuring Sun Master Indexes

Configuring the Standardization Engine

You can configure the standardization engine by specifying which engine to use, configuring the files that define data standardization, and plugging in custom standardization and matching rules. You need to specify the standardization engine only if you are using an engine other than the Master Index Standardization Engine.

To configure the standardization engine, perform any of the tasks described in the following sections.

Specifying a Standardization Engine for the Master Index

Sun Master Index can support standardization engines from different vendors, depending on the adapter configured to communicate with the engine. Default classes are provided for using the Master Index Standardization Engine. You can also implement a custom standardization engine along with customized adapters. The standardization engine configuration is defined by the standardizer-api and standardizer-config elements.


Note –

The default adapters for the Master Index Standardization Engine are com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapter and com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapterConfig.


To Specify the Standardization Engine

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click mefa.xml.

    The file opens in the NetBeans XML editor.

  2. Scroll to the standardizer-api element in the MatchingConfig section.

  3. Specify the Java class for the standardization adapter to use, using the fully qualified class name as shown below.


    <standardizer-api>
       <class-name>
         com.sun.mdm.index.matching.adapter.MyStandardizerAdapter
       </class-name>
    </standardizer-api>

  4. In the standardizer-config element, specify the Java class for the configuration of the standardization adapter, using the fully qualified class name as shown below.


    <standardizer-config>
       <class-name>
         com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapterConfig
       </class-name>
    </standardizer-config>

  5. Save and close the file.
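
After editing, it can be useful to confirm which adapter classes the file actually names. The following is a minimal Python sketch, not part of Sun Master Index: it parses a trimmed-down, hypothetical stand-in for the MatchingConfig section and prints the class configured in each element. The function name adapter_classes and the sample XML are assumptions made for illustration, the sample uses the default class names from the note above, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the MatchingConfig section of mefa.xml.
# A real file contains more elements; only the two discussed here are shown.
SAMPLE = """
<MatchingConfig>
    <standardizer-api>
       <class-name>
         com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapter
       </class-name>
    </standardizer-api>
    <standardizer-config>
       <class-name>
         com.sun.mdm.index.matching.adapter.SbmeStandardizerAdapterConfig
       </class-name>
    </standardizer-config>
</MatchingConfig>
"""


def adapter_classes(xml_text):
    """Return the class name found in each standardizer element, or None if absent."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in ("standardizer-api", "standardizer-config"):
        # Search the whole tree, since the surrounding structure is not assumed.
        element = next(root.iter(tag), None)
        class_name = element.findtext("class-name") if element is not None else None
        # The class name is usually wrapped onto its own line, so trim whitespace.
        found[tag] = class_name.strip() if class_name else None
    return found


classes = adapter_classes(SAMPLE)
for tag, name in classes.items():
    print(f"{tag}: {name}")
```

To check a real project, read the text of its mefa.xml and pass it to adapter_classes in place of SAMPLE. A value of None means the element or its class-name child was not found.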

Modifying Master Index Standardization Files

You can fine-tune the standardization process by modifying the standardization files. For example, you can insert additional names or terms into the normalization or lexicon files, such as givenNames.txt and givenNameNormalization.txt. Depending on your data requirements, you might need to modify additional standardization files. Some of the patterns files (most notably the address patterns files) are very complex and should only be modified by personnel who thoroughly understand the defined patterns and tokens. If you modify standardization files, make sure you modify them for each variant specified in mefa.xml.

You can modify the data configuration files (lexicon and normalization files), and you can also modify the process configuration files that define the data types, variants, and how data is standardized. The process files are more complex and should only be modified by someone who is familiar with standardization concepts and with the Master Index Standardization Engine. Instructions for modifying these files are not included here. For information about these files, see Understanding the Master Index Standardization Engine.

To Modify Standardization Data Configuration Files

  1. In the Projects window, expand the master index project to configure and then expand Standardization Engine.

  2. Expand instance, expand the variant to modify, and then expand resources.

  3. Open the file you want to modify in the NetBeans text editor.

  4. Modify the file in accordance with the information presented for each data type in Understanding the Master Index Standardization Engine.

  5. Save and close the file.
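
When the same terms must be added to a lexicon file in every variant, the edit can be scripted rather than repeated by hand. The following Python sketch is illustrative only. It assumes a lexicon file holds one term per line and that terms are compared without regard to case; neither assumption is confirmed by this section, so check the file format described in Understanding the Master Index Standardization Engine before adapting it. The function name add_terms and the variant folder names variantA and variantB are hypothetical.

```python
import tempfile
from pathlib import Path


def add_terms(lexicon_path, new_terms):
    """Append terms that are not already in the file; return the terms added."""
    path = Path(lexicon_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Assumption: one term per line, compared case-insensitively.
    seen = {line.strip().upper() for line in text.splitlines() if line.strip()}
    added = []
    for term in new_terms:
        cleaned = term.strip()
        if cleaned and cleaned.upper() not in seen:
            seen.add(cleaned.upper())
            added.append(cleaned)
    if added:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(added) + "\n")
    return added


# Demonstration against throwaway folders that mimic variant/resources.
with tempfile.TemporaryDirectory() as work:
    results = {}
    for variant in ("variantA", "variantB"):  # hypothetical variant names
        resources = Path(work) / variant / "resources"
        resources.mkdir(parents=True)
        lexicon = resources / "givenNames.txt"
        lexicon.write_text("ALICE\nROBERT", encoding="utf-8")
        results[variant] = add_terms(lexicon, ["Robert", "CARLOS", "carlos "])
        final_lines = lexicon.read_text(encoding="utf-8").splitlines()
    print(results)
```

Running the same call against each variant keeps the variants consistent, which is the requirement stated above. This approach suits simple term lists only; the patterns files should still be edited by hand by someone who understands them.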

Importing Standardization Data Types and Variants

The Master Index Standardization Engine is based on a very flexible framework that allows you to define new data types and variants so you can standardize any type of data in a custom manner. You can create new data types and variants based on the finite state machine (FSM) framework, and you can create new variants for the existing rules-based data types. After a data type or variant is created, you need to import its package into NetBeans. During the import, you choose whether to make it available to all master index applications or only to the current one.

This section describes only how to import custom data types and variants after they have been created. For information about creating a custom data type or variant, see Understanding the Master Index Standardization Engine.

To Import a Data Type or Variant

  1. In the Projects window, expand the main master index project.

  2. Right-click Standardization Engine, and select Import Standardization Plug-in.

  3. In the dialog box that appears, navigate to the location of the plug-in package.

  4. Select the file containing the plug-in, and then click Open.

  5. Do one of the following:

    • To import the plug-in and make it available to all future master index applications, click Yes.

    • To import the plug-in and make it available only to the current master index application, click No.

    The data type or variant is imported into the Standardization Engine node. Data types add folders just beneath the Standardization Engine node; variants add folders under the appropriate data type (as specified in the variant package).

  6. In the Standardization Engine node, navigate to the new data type or variant you added and verify that all of the required files are there.
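
The check in the final step can also be done from a script if you know which files the package should contain. The following Python sketch is an assumption-laden illustration: this section does not list the required files, so the expected file names below are hypothetical placeholders, as is the function name missing_files.

```python
import tempfile
from pathlib import Path


def missing_files(folder, expected_names):
    """Return the expected relative paths that are not present under folder."""
    root = Path(folder)
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(name for name in expected_names if name not in present)


# Demonstration with a throwaway folder standing in for an imported variant.
with tempfile.TemporaryDirectory() as work:
    variant = Path(work) / "MyVariant"  # hypothetical variant folder
    (variant / "resources").mkdir(parents=True)
    (variant / "resources" / "example-present.txt").write_text("", encoding="utf-8")
    expected = [  # hypothetical file names, not taken from the product
        "resources/example-present.txt",
        "resources/example-missing.txt",
    ]
    missing = missing_files(variant, expected)
    print(missing)
```

An empty result means every expected file was found. Replace the placeholder list with the files documented for your data type or variant.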

Deleting a Standardization Variant or Data Type

If you add a data type or variant to a master index application in error, you can remove it from the Standardization Engine node. You can also delete any of the existing data types or variants if they are not in use. Use caution when removing variants or data types; this action cannot be undone.

To Delete a Variant or Data Type

  1. Back up the source files for the data type or variant in case you need them at a later time.


    Note –

    The default data types and variants are stored in NetBeansHome/soa2/modules/ext/mdm/standardizer/deployment.


  2. In the Projects window, navigate to the Standardization Engine node in the master index project and then to the data type or variant you want to remove.

  3. Right-click the folder containing the files to remove, and then select Delete.

    A confirmation dialog appears.

  4. Click Yes.

    The data type or variant is removed from the project.
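
Because a deletion cannot be undone, the backup in step 1 is worth making routine. The following Python sketch shows one way to copy a data type or variant folder to a timestamped backup location before deleting it. The function name backup_folder and all paths in the demonstration are hypothetical; substitute the actual folder in your project.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_folder(source_dir, backup_root):
    """Copy source_dir into backup_root under a timestamped name; return the copy."""
    source = Path(source_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates the target (and missing parents) and fails if it already exists.
    shutil.copytree(source, target)
    return target


# Demonstration with throwaway folders standing in for a project and a backup area.
with tempfile.TemporaryDirectory() as work:
    variant = Path(work) / "project" / "MyVariant"  # hypothetical variant folder
    (variant / "resources").mkdir(parents=True)
    (variant / "resources" / "givenNames.txt").write_text("ALICE\n", encoding="utf-8")
    saved = backup_folder(variant, Path(work) / "backups")
    copied = sorted(p.relative_to(saved).as_posix() for p in saved.rglob("*") if p.is_file())
    print(saved.name, copied)
```

The timestamp in the folder name keeps successive backups from overwriting one another, so an earlier copy remains available if a later one turns out to be incomplete.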