Configuring Sun Master Indexes

Configuring the Match Engine

You can configure the match engine by specifying the match engine to use and configuring the predefined comparison functions. You can also plug in custom standardization and matching rules. You only need to specify the match engine to use if you are using an engine other than the Master Index Match Engine.

Perform any of the tasks described in the following sections to configure the match engine.

Specifying a Match Engine for the Master Index

Sun Master Index can support different match engines depending on the adapter configured to communicate with the engine. Default classes are provided for using the Master Index Match Engine. You can implement a custom match engine along with custom adapters. The match engine configuration is defined by the matcher-api and matcher-config elements.


Note –

The default adapters for the Master Index Match Engine are com.sun.mdm.index.matching.adapter.SbmeMatcherAdapter and com.sun.mdm.index.matching.adapter.SbmeMatcherAdapterConfig.


To Configure the Match Engine

  1. In the Projects window, expand the Configuration node in the project you want to modify, and then double-click mefa.xml.

    The file opens in the NetBeans XML editor.

  2. Scroll to the matcher-api element in the MatchingConfig section.

  3. Specify the Java class for the matching adapter to use, using the fully qualified class name as shown below.


    <matcher-api>
       <class-name>com.sun.mdm.index.matching.adapter.SbmeMatcherAdapter
       </class-name>
    </matcher-api>

  4. In the matcher-config element, specify the Java class for the configuration of the matching adapter, using the fully qualified class name as shown below.


    <matcher-config>
       <class-name>
        com.sun.mdm.index.matching.adapter.SbmeMatcherAdapterConfig
       </class-name>
    </matcher-config>

  5. Save and close the file.
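After editing the file, it can be useful to confirm which adapter classes are actually configured. The following Python sketch is one possible way to do that and is not part of the product. It assumes the two elements shown above sit somewhere under a single root element (here a MatchingConfig wrapper, used only for illustration) and it trims the whitespace and line breaks that surround the class names in the samples.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the two elements from the procedure, wrapped in
# an assumed MatchingConfig root. A real mefa.xml contains more than this.
FRAGMENT = """
<MatchingConfig>
    <matcher-api>
       <class-name>com.sun.mdm.index.matching.adapter.SbmeMatcherAdapter
       </class-name>
    </matcher-api>
    <matcher-config>
       <class-name>
        com.sun.mdm.index.matching.adapter.SbmeMatcherAdapterConfig
       </class-name>
    </matcher-config>
</MatchingConfig>
"""


def adapter_classes(xml_text):
    """Return the trimmed class names configured for the match engine adapter."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for tag in ("matcher-api", "matcher-config"):
        node = root.find(f".//{tag}/class-name")
        # Strip the line breaks and indentation around the class name.
        has_text = node is not None and node.text
        result[tag] = node.text.strip() if has_text else None
    return result


if __name__ == "__main__":
    for element, class_name in adapter_classes(FRAGMENT).items():
        print(element, "->", class_name)
```

To check a real project, read the contents of mefa.xml into a string and pass it to adapter_classes in place of FRAGMENT.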

Configuring the Comparison Functions for a Master Index Application

The match configuration file in the Match Engine node of the master index project lists and defines the configuration for each match type based on the predefined comparison functions for the Master Index Match Engine. These match types can be applied to each field in the match string. You can modify the configuration of the existing match types, add new match types, and specify whether the match engine should use agreement and disagreement weights or m-probabilities and u-probabilities.

For more information about the structure of the match configuration file and the comparison functions you can use, see Understanding the Master Index Match Engine.

To Configure the Comparison Functions (Configuration Editor)

  1. In the Projects window, right-click the Configuration node in the project you want to modify, and then click Edit.

    The Configuration Editor appears.

  2. Click the Matching tab.

    The Matching page appears with a list of fields defined for matching and a list of comparators that you can modify.

  3. In the Probability Type field, select one of the following:

    • Use Agree/Disagreement Weight Ranges – Uses agreement and disagreement weights for matching. If agreement and disagreement weights are used, the m-probability and u-probability fields are ignored and do not appear on the Matching page.

    • Use M-Probabilities/U-Probabilities – Uses m-probabilities and u-probabilities for matching. If m-probabilities and u-probabilities are used, the agreement and disagreement weight fields are ignored and do not appear on the page.

  4. To add a new matching rule:

    1. Click Add in the lower right portion of the window.

      The Edit Matching Rules dialog box appears.

    2. Fill in the fields described in Match Comparator Configuration Properties for Sun Master Index.

    3. Click OK.

  5. To edit an existing matching rule:

    1. Click Edit in the lower right portion of the window.

      The Edit Matching Rules dialog box appears.

    2. Change the value of any of the fields described in Match Comparator Configuration Properties for Sun Master Index.

    3. Click OK.

  6. To remove an existing matching rule:

    1. In the matching rule table, select the rule you want to delete.

    2. Click Remove.

  7. On the Configuration Editor toolbar, click Save.

To Configure the Comparison Functions (Text Editor)

  1. In the Projects window, expand the master index project, and then expand the master index application.

  2. In the Match Engine folder, double-click matchConfigFile.cfg.

  3. For the Probability Type, enter one of the following values:

    • 0 – Uses m-probabilities and u-probabilities for matching. If m-probabilities and u-probabilities are used, the agreement and disagreement weight fields are ignored.

    • 1 – Uses agreement and disagreement weights for matching. If agreement and disagreement weights are used, the m-probability and u-probability fields are ignored.

  4. For each comparison function you want to configure, modify the value of any of the columns described in Match Comparator Configuration Properties for Sun Master Index.

  5. Save and close the file.
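When editing the file by hand, a quick sanity check of a row can catch values that fall outside the documented ranges. The Python sketch below is illustrative only. It assumes each comparator row is whitespace-delimited and uses the column numbers listed in Match Comparator Configuration Properties for Sun Master Index; the sample row values are made up for the example, and the names Comparator, parse_row, and check_ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comparator:
    match_type: str             # column 1
    size: int                   # column 2
    null_field: str             # column 3
    function: str               # column 4
    m_prob: float               # column 5
    u_prob: float               # column 6
    agreement_weight: float     # column 7
    disagreement_weight: float  # column 8
    parameters: List[str]       # column 9 (zero or more values)


def parse_row(line: str) -> Comparator:
    """Split one row into the nine documented columns (delimiter is assumed)."""
    cols = line.split()
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, found {len(cols)}")
    return Comparator(cols[0], int(cols[1]), cols[2], cols[3],
                      float(cols[4]), float(cols[5]),
                      float(cols[6]), float(cols[7]), cols[8:])


def check_ranges(c: Comparator, probability_type: int) -> List[str]:
    """Check only the columns that the chosen Probability Type actually uses."""
    problems = []
    if probability_type == 0:    # m-probabilities and u-probabilities
        for name, value in (("m-prob", c.m_prob), ("u-prob", c.u_prob)):
            if not 0.0 <= value <= 1.0:
                problems.append(f"{name} {value} is outside 0..1")
    elif probability_type == 1:  # agreement and disagreement weights
        if not 0.0 <= c.agreement_weight <= 100.0:
            problems.append(f"agreement-weight {c.agreement_weight} is outside 0..100")
        if not -100.0 <= c.disagreement_weight <= 0.0:
            problems.append(f"disagreement-weight {c.disagreement_weight} is outside -100..0")
    else:
        raise ValueError("Probability Type must be 0 or 1")
    return problems


if __name__ == "__main__":
    row = parse_row("FirstName 15 0 u 0.99 0.001 10 -10")  # made-up sample row
    print(row)
    print(check_ranges(row, probability_type=1))
```

The check deliberately skips the columns that the selected Probability Type ignores, mirroring the behavior described in step 3.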

Match Comparator Configuration Properties for Sun Master Index

The following table lists and describes the Configuration Editor fields used to define the comparison functions. It also lists the corresponding column in the match configuration file if you want to modify the file directly.

Configuration Editor Field 

Match Configuration File Element and Column Number 

Description 

Match Type

match-type (column 1) 

A value that indicates to the Master Index Match Engine how each field should be weighted. Each field included in the match string (the MatchingConfig section of mefa.xml) must have a match type corresponding to a match type defined in this file.

Match Size

size (column 2) 

The number of characters on which matching is performed, beginning with the first character. For example, to match on only the first four characters in a 10-digit field, the value of this column should be “4”. 

Null Field

null-field (column 3) 

An index that specifies how to calculate the total weight for null fields or fields that contain only spaces. You can specify any of the following values. The Configuration Editor value is given first, followed by the match configuration file value in parentheses.

  • Zero weight (0) - If one or both fields are empty, the weight used for the field is 0 (zero).

  • Full Combination weight (1) - If both fields are empty, the agreement weight is used; if only one field is empty, the disagreement weight is used.

  • Full Agreement weight (a1) - Specifies to use the full agreement weight if both fields are null.

  • 1/x of the Agreement weight (ax) - Specifies to use a fraction of the agreement weight if both fields are empty. The agreement weight is multiplied by the fraction 1/x to obtain the match weight for that field. When modifying the match configuration file directly, the default is “2” if no number is specified. You can specify any number from 1 through 10.

  • Full disagreement weight (d1) - Specifies to use the full disagreement weight if only one field is empty.

  • 1/x of the disagreement weight (dx) - Specifies to use a fraction of the disagreement weight if only one field is empty. The disagreement weight is multiplied by the fraction 1/x to obtain the match weight for the field. When modifying the match configuration file directly, the default is “2” if no number is specified. You can specify any number from 1 through 10.

In the above descriptions, the agreement and disagreement weights are either specified in this file or calculated using a logarithmic formula based on the m-probabilities and u-probabilities (depending on the probability type).

Function

function (column 4) 

The type of comparison to perform when weighting the field. For information about the available comparison functions, see Master Index Match Engine Comparison Functions, in Understanding the Master Index Match Engine.

Agreement Weight

agreement-weight (column 7) 

The matching weight to be assigned to a field given that the fields match between two records; that is, the maximum match weight for a field. This number can be between 0 and 100 and can have up to 16 decimal places. Only set this value if the Probability Type is set to use agreement and disagreement weights.

Disagreement Weight

disagreement-weight (column 8) 

The matching weight to be assigned to a field given that the fields do not match between two records; that is, the minimum match weight for a field. This number can be between -100 and 0 and can have up to 16 decimal places. Only set this value if the Probability Type is set to use agreement and disagreement weights.

M-Probability

m-prob (column 5) 

The initial probability that the specified field in two records will match if the records match. The probability is a double value between 0 and 1, and can have up to 16 decimal places. Only set this value if the Probability Type is set to use probabilities.

U-Probability

u-prob (column 6) 

The initial probability that the specified field in two records will match if the records do not match. The probability is a double value between 0 and 1, and can have up to 16 decimal places. Only set this value if the Probability Type is set to use probabilities.

Extra Parameters

parameters (column 9) 

Parameters correspond to the comparison function specified in the Function field or column. Some comparison functions do not take any parameters and some take multiple parameters. For information about which functions take parameters and the parameters they take, see Master Index Match Engine Comparison Functions, in Understanding the Master Index Match Engine.
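The interaction between the Null Field options and the two weights in the table above can be easier to follow as code. The Python sketch below is a rough illustration and not the engine's implementation. It assumes the common Fellegi-Sunter base-2 logarithmic form for deriving weights from probabilities (the exact formula is not given in this section), it treats the a and d options as described for ax and dx, and it returns None for any case the table does not describe. All function names are hypothetical.

```python
import math


def weights_from_probabilities(m_prob, u_prob):
    """Assumed Fellegi-Sunter form: base-2 log ratios of the probabilities."""
    agreement = math.log2(m_prob / u_prob)
    disagreement = math.log2((1 - m_prob) / (1 - u_prob))
    return agreement, disagreement


def is_empty(value):
    """A field counts as empty when it is null or contains only spaces."""
    return value is None or value.strip() == ""


def null_field_weight(option, value_a, value_b, agreement, disagreement):
    """Weight for a field pair when at least one of the two values is empty."""
    if not option:
        raise ValueError("null-field option is required")
    empty_count = sum(is_empty(v) for v in (value_a, value_b))
    if empty_count == 0:
        return None  # both populated: the comparison function decides
    if option == "0":
        return 0.0
    if option == "1":
        return agreement if empty_count == 2 else disagreement
    kind, digits = option[0], option[1:]
    if kind not in ("a", "d"):
        raise ValueError(f"unknown null-field option: {option}")
    divisor = int(digits) if digits else 2  # default is 2 when no number is given
    if not 1 <= divisor <= 10:
        raise ValueError("x must be from 1 through 10")
    if kind == "a" and empty_count == 2:
        return agreement / divisor
    if kind == "d" and empty_count == 1:
        return disagreement / divisor
    return None  # combination not described in the table


if __name__ == "__main__":
    agree, disagree = weights_from_probabilities(0.9, 0.1)
    print(null_field_weight("a", "", " ", agree, disagree))
    print(null_field_weight("d4", "Smith", None, agree, disagree))
```

For example, with an agreement weight of 10 and the option a (no number), two empty fields receive a weight of 5, because the default divisor is 2.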

Importing Custom Comparison Functions

The Master Index Match Engine is based on a very flexible framework that allows you to define new algorithms in the form of comparison functions that compare field values between two records. You need to import the comparison function into NetBeans to make it available to all master index applications or only the current application.

This section describes only how to import custom comparison functions after they have been created. For information about creating a custom comparison function, see Creating Custom Comparators for the Master Index Match Engine, in Understanding the Master Index Match Engine.

To Import a Comparison Function

  1. In the Projects window, expand the main master index project.

  2. Right-click Match Engine, and select Import Comparator Plug-in.

  3. In the dialog box that appears, navigate to the location of the plug-in ZIP file.

  4. Select the file containing the plug-in, and then click Open.

  5. Do one of the following:

    • To import the plug-in and make it available to all future master index applications, click Yes.

    • To import the plug-in and make it only available to the current master index application, click No.

    The contents of the ZIP file are imported into the Match Engine node and the new comparators are added to the list of comparator definitions in comparatorsList.xml.

  6. In the Match Engine node, navigate to the /lib folder that was added and verify that all of the required files are there.

  7. Open comparatorsList.xml and verify the new comparator definitions are included.

Deleting a Custom Comparison Function

If you add a custom comparison function to a master index application in error, you can remove it from the Match Engine node. Alternatively, you can make the comparison function unavailable by removing it from the comparators list in comparatorsList.xml.

To Delete a Custom Comparison Function

  1. Back up the source files for the function in case you need them at a later time.

  2. In the Projects window, navigate to the Match Engine node in the master index project and then to the /lib folder.

  3. Right-click the folder containing the files to remove, and then select Delete.

    A confirmation dialog appears.

  4. Click Yes.

    The source files are removed from the project.

  5. Open comparatorsList.xml and remove the comparator definitions from the list.

  6. Save and close the file.