Oracle Java CAPS Master Index Match Engine Reference Java CAPS Documentation
Master Index Match Engine Reference
Creating Custom Comparators for the Master Index Match Engine
The following topics provide instructions for each step of creating custom comparators. You might need to create multiple Java files and Java packages for the comparator, depending on the validations, data sources, dependency classes, and curve adjustments you use. Create them in the same directory structure because you will need to package them up into a ZIP file when you are through.
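Step 1: Create the Custom Comparator Java Class
Step 2: Register the Comparator in the Comparators List
Step 3: Define Parameter Validations (Optional)
Step 4: Define Data Source Handling (Optional)
Step 5: Define Curve Adjustment or Linear Fitting (Optional)
Step 6: Compile and Package the Comparator
Step 7: Import the Comparator Package Into Oracle Java CAPS Master Index
Step 8: Configure the Comparator in the Match Configuration File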
Before you create your custom comparators, take into account the following requirements for the comparators.
Determine how many comparators you need to create and whether each will require a different Java class or some can use the same Java class.
Determine what parameters, if any, you need to define for each comparator.
Determine what validations, if any, need to be created.
Determine whether you need to use a data source.
Decide if the comparators you create will have a dependency on any other comparator classes.
Decide whether you will use curve adjustment, linear fitting, or neither.
The first step to creating custom comparators is defining the matching logic in custom comparator Java classes that are stored in the real-time module of the Master Index Match Engine. Follow these guidelines when creating the class:
Create a working directory that will contain all the Java packages and the comparators list file for the new comparators.
The Java classes must implement the com.sun.mdm.matcher.comparators.MatchComparator interface, located in Matcher.jar. This interface includes the methods described below.
Once you create the Java classes, continue to Step 2: Register the Comparator in the Comparators List.
The initialize method initializes the values for the parameters, data sources, and dependency class used for each custom comparator. It provides the necessary information to access the comparator's configuration in the match configuration file and the comparators list file.
Syntax: void initialize(Map<String, Map> params, Map<String, Map> dataSources, Map<String, Map> dependClassList)
Returns: None.
Throws: None.
The compareFields method contains all the comparison logic needed to compare two field values and calculate a matching weight that shows how similar the values are.
Syntax: double compareFields(String recordA, String recordB, Map context)
Returns: A number between zero and one that indicates how closely two field values match.
Throws: MatchComparatorException
The setRTParameters method sets runtime parameters for the comparator, which lets you customize each call to the comparator.
Syntax: void setRTParameters(String key, String value)
Returns: None.
Throws: None.
The stop method closes any related connections to the data sources used by the comparator.
Syntax: void stop()
Parameters: None.
Returns: None.
Throws: None.
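The four methods described above can be sketched in a single class. This is a minimal illustration, not the engine's own code: the MatchComparator interface and MatchComparatorException declared here are local stand-ins that mirror the signatures listed in this section so that the sketch compiles without Matcher.jar. The class name PrefixComparator, the runtime parameter name length, and the prefix-based scoring are all hypothetical. How values are laid out inside the params, dataSources, and dependClassList maps is not described here, so the sketch only stores the params reference.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for types shipped in Matcher.jar, declared only so this sketch
// compiles on its own. A real comparator would implement
// com.sun.mdm.matcher.comparators.MatchComparator instead.
class MatchComparatorException extends Exception {
    MatchComparatorException(String message) { super(message); }
}

interface MatchComparator {
    void initialize(Map<String, Map> params, Map<String, Map> dataSources,
                    Map<String, Map> dependClassList);
    double compareFields(String recordA, String recordB, Map context)
            throws MatchComparatorException;
    void setRTParameters(String key, String value);
    void stop();
}

// Hypothetical comparator: scores two values by the length of their shared
// prefix, looking at no more than "length" leading characters.
class PrefixComparator implements MatchComparator {
    private Map<String, Map> params = new HashMap<>();
    private int length = 4; // assumed default for this illustration

    @Override
    public void initialize(Map<String, Map> params, Map<String, Map> dataSources,
                           Map<String, Map> dependClassList) {
        // Keep the static configuration; its internal layout is not assumed.
        if (params != null) {
            this.params = params;
        }
    }

    @Override
    public double compareFields(String recordA, String recordB, Map context) {
        if (recordA == null || recordB == null) {
            return 0.0;
        }
        // Number of characters that take part in the comparison.
        int cap = Math.min(length, Math.max(recordA.length(), recordB.length()));
        if (cap == 0) {
            return 0.0; // both values are empty, nothing to compare
        }
        int limit = Math.min(cap, Math.min(recordA.length(), recordB.length()));
        int shared = 0;
        while (shared < limit && recordA.charAt(shared) == recordB.charAt(shared)) {
            shared++;
        }
        return (double) shared / cap; // always between 0.0 and 1.0
    }

    @Override
    public void setRTParameters(String key, String value) {
        // "length" is a hypothetical runtime parameter name.
        if ("length".equals(key)) {
            int parsed = Integer.parseInt(value);
            if (parsed > 0) {
                length = parsed;
            }
        }
    }

    @Override
    public void stop() {
        // No connections are held; just drop the configuration reference.
        params = new HashMap<>();
    }
}
```

With the default length of 4, comparing smith to smyth in this sketch yields 0.5, because two of the four leading characters agree.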
In order to include new comparators in a master index application, you need to create a comparators list file defining the configuration of the comparators. When you import the comparator package into the master index application, this file is read and the entries are added to the comparators list for the project.
Below is a sample comparators list file. Note that the first comparator includes all possible configurations (parameters, dependency classes, data sources, and curve adjust). Most comparators will not be that complex. The second comparator class defines two comparators, Approx and Adjust.
<?xml version="1.0" encoding="UTF-8"?>
<comparators-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="comparatorsList.xsd">
  <group description="New group of comparators"
      path="com.mycomparators.matchcomparators">
    <comparator description="New Exact Comparator">
      <className>NewExactComparator</className>
      <codes>
        <code description="New Exact Comparator" name="Exact" />
      </codes>
      <params>
        <param description="Fixed length" name="length" type="java.lang.Integer" />
        <param description="Data type" name="dataType" type="java.lang.String" />
      </params>
      <data-sources>
        <datasource description="Serial numbers" type="java.io.File" />
      </data-sources>
      <dependency-classes>
        <dependency-class matchfield="Serial" name="com.genericcomparaotrs.StringComparator" />
      </dependency-classes>
      <curve-adjust status="true" />
    </comparator>
    <comparator description="New Approximate Comparator">
      <className>NewApproxComparator</className>
      <codes>
        <code description="New approximate comparator" name="Approx" />
        <code description="New adjustable comparator" name="Adjust" />
      </codes>
    </comparator>
  </group>
</comparators-list>
Tip - The comparators list file needs to be in the same working directory you created for the custom comparator Java classes.
<?xml version="1.0" encoding="UTF-8"?>
<comparators-list xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="comparatorsList.xsd">
  ...
</comparators-list>
The group description and Java package for the group.
A description for each comparator.
The Java class name for each comparator or comparator subgroup.
The unique identifying name for each comparator.
A list of static parameters for each comparator or comparator subgroup (optional). If you define parameters, you must also perform the steps under Step 3: Define Parameter Validations (Optional).
A list of data sources for each comparator or comparator subgroup (optional). If you define data sources, you must also perform the steps under Step 4: Define Data Source Handling (Optional).
A list of dependency classes for each comparator or comparator subgroup (optional).
Whether to use curve adjustment for each comparator or comparator subgroup (optional). If you set curve adjustment to true, you must perform the steps under Step 5: Define Curve Adjustment or Linear Fitting (Optional).
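Because each comparator needs a unique identifying name, it can help to check the list file before you package it. The following is a rough sketch of such a check, assuming that uniqueness is meant across the whole file and that the names live in the name attribute of each code element, as in the sample above. The class and method names are hypothetical, and this check is not part of the import process.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports comparator code names that occur more than
// once in the text of a comparators list file.
class ComparatorsListCheck {

    static List<String> duplicateCodeNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the comparators list", e);
        }
        // Every <code> element carries one comparator name.
        NodeList codes = doc.getElementsByTagName("code");
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < codes.getLength(); i++) {
            String name = ((Element) codes.item(i)).getAttribute("name");
            // Set.add returns false when the name was already present.
            if (!seen.add(name) && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

An empty result means that no code name is repeated. To check a file on disk, read its contents into a string first, for example with java.nio.file.Files.readAllBytes.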
If your custom comparators take parameters, you should create a Java class that validates the parameter properties. You need to perform this step if you defined parameters for the comparator in comparatorsList.xml. You do not need to create this file in the same package as the Java comparator class, but for packaging purposes, create it in the same working folder.
For example, if the comparator is defined by a class named ExactComparator, the parameter validation class would be ExactComparatorParamsValidator.
The method contained in this class is described below.
The ParametersValidator class contains one method, validateComparatorsParameters, that allows you to validate parameter types, ranges, and other properties. For logging purposes, you can use net.java.hulp.i18n, which is used within matcher.jar, or you can use your own logger.
Syntax: void validateComparatorsParameters(Map<String, Object> params)
Returns: None.
Throws: MatcherException
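As an illustration, a validator for the two parameters in the sample comparators list file (length and dataType) might look like the sketch below. It rests on several assumptions: MatcherException is a local stand-in for the type in Matcher.jar, the params map is assumed to be keyed by parameter name, and the rules themselves (a positive length, a non-empty dataType) are invented for the example. How the class is tied to the ParametersValidator type is not shown, because that relationship is not described here.

```java
import java.util.Map;

// Stand-in for the MatcherException type in Matcher.jar, declared only so
// this sketch compiles on its own.
class MatcherException extends Exception {
    MatcherException(String message) { super(message); }
}

// Hypothetical validator for the NewExactComparator sample. The class name
// follows the <ComparatorClass>ParamsValidator pattern used in this step.
class NewExactComparatorParamsValidator {

    public void validateComparatorsParameters(Map<String, Object> params)
            throws MatcherException {
        if (params == null) {
            return; // nothing configured, nothing to validate
        }

        // Assumed rule: "length", when present, is a positive Integer.
        Object length = params.get("length");
        if (length != null) {
            if (!(length instanceof Integer)) {
                throw new MatcherException("length must be a java.lang.Integer");
            }
            if ((Integer) length <= 0) {
                throw new MatcherException("length must be greater than zero");
            }
        }

        // Assumed rule: "dataType", when present, is a non-empty String.
        Object dataType = params.get("dataType");
        if (dataType != null) {
            if (!(dataType instanceof String) || ((String) dataType).trim().isEmpty()) {
                throw new MatcherException("dataType must be a non-empty String");
            }
        }
    }
}
```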
If your custom comparators use external data sources to provide additional information for matching weight calculations, you need to create a Java class that lets you load the file to memory or have real-time access to the data file content. You can also define validations to perform. You do not need to create this file in the same package as the Java comparator class, but for packaging purposes, create it in the same working folder.
You need to perform this step if you defined lines similar to the following in comparatorsList.xml:
<data-sources>
  <datasource description="Serial numbers" type="java.io.File" />
</data-sources>
For example, if the comparator is defined by a class named ExactComparator, the data source handler class would be ExactComparatorSourcesHandler.
The method in this class is described below.
The DataSourcesHandler class contains one method, handleComparatorsDataSources, that allows you to define properties for the data source. This method takes one parameter that is a DataSourcesProperties object. This class and its methods are described in DataSourcesProperties Class.
Syntax: Object handleComparatorsDataSources(DataSourcesProperties dataSources)
Returns: Object
Throws: MatcherException, IOException
The DataSourcesProperties interface is used as a parameter to the handleComparatorsDataSources method described in Step 4: Define Data Source Handling (Optional). The methods in the interface are listed and described below.
The getDataSourcesList method returns the comparator's list of associated data source paths.
Syntax: List getDataSourcesList(String codeName)
Returns: A list of paths and filenames as specified in comparatorsList.xml.
Throws: None.
The isDataSourceLoaded method checks whether a specific file has already been loaded or opened.
Syntax: boolean isDataSourceLoaded(String sourcePath)
Returns: A boolean indicator of whether the specified file has already been loaded or opened.
Throws: None.
The setDataSourceLoaded method sets the loading status of a data source.
Syntax: void setDataSourceLoaded(String sourcePath, boolean status)
Returns: None.
Throws: None.
The getDataSourceObject method returns the file located at the specified source path.
Syntax: Object getDataSourceObject(String sourcePath)
Returns: An object containing the data source information.
Throws: None.
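The following sketch shows one way a handler could use these methods to load data source files into memory. It is only an illustration under stated assumptions: DataSourcesProperties and MatcherException are local stand-ins that mirror the signatures above, the comparator code name Exact comes from the sample comparators list file, and the choice to read each file as one value per line and return a map of path to value set is invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-ins for types in Matcher.jar, declared only so this sketch compiles.
class MatcherException extends Exception {
    MatcherException(String message) { super(message); }
}

interface DataSourcesProperties {
    List getDataSourcesList(String codeName);
    boolean isDataSourceLoaded(String sourcePath);
    void setDataSourceLoaded(String sourcePath, boolean status);
    Object getDataSourceObject(String sourcePath);
}

// Hypothetical handler for the NewExactComparator sample.
class NewExactComparatorSourcesHandler {

    public Object handleComparatorsDataSources(DataSourcesProperties dataSources)
            throws MatcherException, IOException {
        // "Exact" is the comparator code name from the sample list file.
        List paths = dataSources.getDataSourcesList("Exact");
        if (paths == null) {
            throw new MatcherException("No data sources are registered for code Exact");
        }
        Map<String, Set<String>> loaded = new HashMap<>();
        for (Object entry : paths) {
            String path = String.valueOf(entry);
            if (dataSources.isDataSourceLoaded(path)) {
                continue; // already in memory, do not read the file again
            }
            // Assumed file layout: one value per line (UTF-8).
            Set<String> values = new HashSet<>(Files.readAllLines(Paths.get(path)));
            loaded.put(path, values);
            dataSources.setDataSourceLoaded(path, true);
        }
        return loaded;
    }
}
```

Checking isDataSourceLoaded before reading keeps the same file from being loaded twice when more than one comparator refers to it.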
If your custom comparators use curve adjustment or linear fitting to adjust matching weight calculations, you need to create a Java class that defines the curve. You do not need to create this file in the same package as the Java comparator class, but for packaging purposes, create it in the same working folder.
You need to perform this step if you defined the following line in comparatorsList.xml for the comparator:
<curve-adjust status="true" />
For example, if the comparator is defined by a class named ExactComparator, the curve adjustment class would be ExactComparatorCurveAdjustor.
The method in this class is described below.
The processCurveAdjustment method provides handling for curve adjustment within a specific match comparator.
Syntax: double[] processCurveAdjustment(String compar, double[] cap)
Returns: An array of curve adjustment values.
Throws: MatcherException
Before you perform these steps, make sure you have completed Step 1: Create the Custom Comparator Java Class through Step 5: Define Curve Adjustment or Linear Fitting (Optional).
When you are finished defining all the Java classes for the comparators and have registered each comparator in your comparators list file, you can compile the Java code and package the files into a ZIP file that you can then import into a master index application. Compile the classes using the compiler of your choice.
To package the files, create a temporary directory and copy the comparators list file to the directory. Copy all the class folders and files to the same directory. The top level of the temporary directory should include comparatorsList.xml and a com folder (which contains all the Java classes). Create a ZIP file of the directory. For more information about the ZIP package, see About the Comparator Package.
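Any ZIP tool can create the package. If you prefer to script the step, the sketch below zips a staging directory with the standard java.util.zip classes so that comparatorsList.xml and the com folder sit at the top level of the archive. The class name ComparatorPackager and the up-front check for comparatorsList.xml are illustrative choices, not requirements of the import.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper that packages a staging directory into a ZIP file.
class ComparatorPackager {

    // Entry names are stored relative to stagingDir, so the archive root
    // contains comparatorsList.xml and com/ rather than the directory itself.
    static void zipDirectory(Path stagingDir, Path zipFile) throws IOException {
        if (!Files.isRegularFile(stagingDir.resolve("comparatorsList.xml"))) {
            throw new IOException("comparatorsList.xml is missing from " + stagingDir);
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out);
             Stream<Path> files = Files.walk(stagingDir)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                // ZIP entry names use forward slashes on every platform.
                String name = stagingDir.relativize(file).toString().replace('\\', '/');
                try {
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(file, zip);
                    zip.closeEntry();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Write the ZIP file to a location outside the staging directory. Otherwise the directory walk would pick up the partially written archive as one of its own entries.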
After you compile and package the comparator, continue to Step 7: Import the Comparator Package Into Oracle Java CAPS Master Index.
Import your new comparators into NetBeans to make them available either to all master index applications or only to the current application.
The contents of the ZIP file are imported into the Match Engine node and the new comparators are added to the list of comparator definitions in comparatorsList.xml.
After you import custom comparators, you need to add them to the match configuration file (matchConfigFile.cfg) and define the matching configuration. This makes the comparator available for use in the master index match string. For information about this file, see The Master Index Match Engine Match Configuration File. For instructions on modifying the file, see Configuring the Comparison Functions for a Master Index Application in Oracle Java CAPS Master Index Configuration Guide.