Understanding the Oracle Java CAPS Match Engine

Oracle Java CAPS Match Engine and Master Index Applications

Implementing the Oracle Java CAPS Match Engine with a master index application requires some customization to the Match Field file in the master index project. You can also customize the Oracle Java CAPS Match Engine configuration files to better suit your data standardization and matching requirements.

The following topics provide information about the required customization and how the Match Field file corresponds to the configuration files.

Master Index Components and the Oracle Java CAPS Match Engine

Master index applications use the Oracle Java CAPS Match Engine specifically for standardization and probabilistic weighting, while the master index application determines survivorship. This process relies on the logic specified in the configuration files of the master index project and of the Oracle Java CAPS Match Engine.

The following topics provide information about how the Oracle Java CAPS Match Engine works with master index applications to standardize data and formulate matching weights.

Searching and Matching in Master Index Applications (Repository)

When a new record is passed to the master index database, the master index application selects a subset of possible matches from the database. The master index application then uses the Oracle Java CAPS Match Engine matching algorithm to assign a matching probability weight for each record in this subset (known as the candidate selection pool). To create the candidate selection pool, the master index application makes a series of query passes of the existing data, searching for matches on specific combinations of data. These combinations are defined by the blocking query, which is defined in the Candidate Select file and specified in the Threshold file.

Matching is performed on the fields included in the match string defined in the Match Field file. Each field is assigned a matching weight. The weights for each field are summed to determine the matching probability weight for the entire record (known as the composite weight). Before matching on some fields, such as the first name, the index might standardize the field based on information in the standardization files. You can customize how each field is weighted by modifying the match configuration file.
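
As a rough illustration of how a composite weight is formed, the following sketch sums per-field weights for a single candidate record. The field names and weight values are hypothetical; in practice the weights come from the comparison functions and probabilities defined in the match configuration file.

```python
# Hypothetical per-field matching weights for one candidate record.
# Real values are produced by the match engine, not hard-coded.
field_weights = {
    "FirstName": 8.5,
    "LastName": 9.0,
    "DateDays": 4.25,
    "SSN": -3.0,  # a disagreeing field contributes a negative weight
}


def composite_weight(weights):
    """Sum the individual field weights into one record-level weight."""
    return sum(weights.values())


total = composite_weight(field_weights)
print(total)
```

Because the composite weight is a plain sum, changing how one field is weighted shifts the total for every record that includes that field.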

Standardization and Matching Process in Master Index Applications (Repository)

The standardization and matching processes use logic that is defined by a combination of Oracle Java CAPS Match Engine configuration files and master index configuration files. During the standardization and match processes, the following occurs.

  1. The Oracle Java CAPS Match Engine receives an incoming record.

  2. The Oracle Java CAPS Match Engine standardizes the fields specified for parsing, normalization, and phonetic encoding. These fields are defined in the StandardizationConfig section of the Match Field file and the rules for standardization are defined in the Oracle Java CAPS Match Engine standardization configuration files.

  3. The master index application queries the database for a candidate selection pool (records that are possible matches) using the blocking query specified in the Threshold file. If the blocking query uses standardized or phonetic fields, the criteria values are obtained from the database.

  4. For each possible match, the master index application creates a match string (based on the match columns in the Match Field file) and sends the string to the Oracle Java CAPS Match Engine.

  5. The Oracle Java CAPS Match Engine checks the incoming record against each possible match, producing a matching weight for each. Matching is performed using the weighting rules defined in the match configuration file.
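
The steps above can be sketched as a small pipeline. This is only a toy model under stated assumptions: the standardization rule (trim and uppercase), the blocking rule (equality on any blocking field), and the fixed agreement and disagreement weights are all stand-ins invented for illustration, not the engine's actual logic.

```python
def standardize(record):
    """Stand-in for engine standardization: trim and uppercase each value."""
    return {name: value.strip().upper() for name, value in record.items()}


def candidate_pool(incoming, database, block_fields):
    """Stand-in for a blocking query: keep records equal on any block field."""
    return [
        rec for rec in database
        if any(rec[f] == incoming[f] for f in block_fields)
    ]


def match_string(record, match_columns):
    """Build the ordered list of values sent to the engine for matching."""
    return [record[col] for col in match_columns]


def score(incoming_values, candidate_values, agree=5.0, disagree=-2.0):
    """Toy weighting: a fixed agreement or disagreement weight per field."""
    return sum(
        agree if a == b else disagree
        for a, b in zip(incoming_values, candidate_values)
    )


# Hypothetical, already standardized records in the database.
database = [
    {"FirstName": "ANN", "LastName": "SMITH"},
    {"FirstName": "ANNE", "LastName": "SMITH"},
    {"FirstName": "BOB", "LastName": "JONES"},
]
incoming = standardize({"FirstName": " ann ", "LastName": "Smith"})
columns = ["FirstName", "LastName"]

pool = candidate_pool(incoming, database, block_fields=["LastName"])
weights = [
    score(match_string(incoming, columns), match_string(rec, columns))
    for rec in pool
]
print(weights)
```

Only records that pass the blocking step are scored, which is why the choice of blocking fields limits which matches can be found at all.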

The Master Index Match String (Repository)

The data string that is passed to the Oracle Java CAPS Match Engine for match processing is called the match string and is defined in the MatchingConfig section of the Match Field file. The Oracle Java CAPS Match Engine configuration files, the blocking query, and the matching configuration are closely linked in the search and matching processes. The blocking query defines the select statements for creating the candidate selection pool during the matching process. The matching configuration defines the match string that is passed to the Oracle Java CAPS Match Engine from the records in the candidate selection pool. Finally, the Oracle Java CAPS Match Engine configuration files define how the match string is processed.

The Oracle Java CAPS Match Engine configuration files depend on the match string. When you modify the match string, make sure that the match type you specify for each field corresponds to the correct row in the match configuration file (matchConfigFile.cfg). For example, if you are using person matching and add “MaritalStatus” as a match field, you must specify a match type for the MaritalStatus field that is listed in the first column of the match configuration file. You must also make sure that the matching logic defined in the corresponding row of the match configuration file is appropriate for matching on the MaritalStatus field.
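
A consistency check of this kind can be sketched as follows. The only detail taken from this document is that match types appear in the first column of the match configuration file; the row layout, the `#` comment convention, the sample contents, and the qualified field names are assumptions made for illustration.

```python
def config_match_types(config_text):
    """Collect the first column of each non-blank, non-comment row."""
    types = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        types.add(line.split()[0])
    return types


def unknown_match_types(match_columns, config_text):
    """Return (field, match type) pairs whose type has no configuration row."""
    known = config_match_types(config_text)
    return [(field, mtype) for field, mtype in match_columns if mtype not in known]


# Hypothetical configuration rows; the remaining columns are elided.
sample_config = """
# match type, then the remaining columns
FirstName ...
LastName ...
String ...
"""

# Hypothetical match string: (field name, match type) pairs.
match_columns = [
    ("Person.FirstName", "FirstName"),
    ("Person.LastName", "LastName"),
    ("Person.MaritalStatus", "MaritalStatus"),
]

problems = unknown_match_types(match_columns, sample_config)
print(problems)
```

Here the MaritalStatus field is reported because its match type has no row in the sample configuration; assigning it an existing type such as String, or adding a row for a new type, would clear the report.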

Oracle Java CAPS Match Engine Field Identifiers

The Oracle Java CAPS Match Engine breaks down fields into various components. For example, it breaks addresses into floor number, street number, street name, street direction, and so on. Some of these components are similar and are typically stored in the same field in the database. In the default configuration, for example, when the standardization engine finds a house number, rural route number, or PO box number, the value is stored in the HouseNumber database field. You can customize this as needed, as long as any field you specify to store a component is also included in the object structure defined for the master index application.

The Oracle Java CAPS Match Engine uses field identifiers to determine how to process fields that are defined for normalization or parsing. The IDs are defined internally in the match engine and are referenced in the Match Field file. The field IDs you specify for each field in the Match Field file determine how that field is processed by the standardization engine. The field IDs for person names determine how each name is normalized. The field IDs for business names specify which business type key file to use for standardization. The field IDs for addresses determine which database fields store each field component and how each component is standardized.

Table 3 lists each field component generated by the Oracle Java CAPS Match Engine along with its corresponding field ID. You can specify only the predefined field IDs that are listed in this table.

Table 3 Standardization Field Identifiers

Person Name Standardization Field Identifiers

| Field ID | Description |
|---|---|
| FirstName | Specifies a first name field for normalization. |
| LastName | Specifies a last name field for normalization. |

Address Standardization Field Identifiers

| Field ID | Description |
|---|---|
| HouseNumber | Specifies the parsed house number from a standardized address field. By default, this is stored in the field_name_HouseNo field (or the HouseNumber field for Oracle Java CAPS Master Patient Index). |
| RuralRouteIdentif | Specifies the parsed rural route identifier from a standardized address field. By default, this is stored in the field_name_HouseNo field (or the HouseNumber field for Oracle Java CAPS Master Patient Index). |
| BoxIdentif | Specifies the parsed PO box number from a standardized address field. By default, this is stored in the field_name_HouseNo field (or the HouseNumber field for Oracle Java CAPS Master Patient Index). |
| MatchStreetName | Specifies the parsed and standardized street name from a standardized address field and is used internally by the match engine. If you want to store the standardized street name in the database (recommended), map this field to the street name field in the database. By default, this is stored in the field_name_StName field (or the StreetName field for Oracle Java CAPS Master Patient Index). |
| OrigStreetName | Specifies the parsed street name from an address field. If you want to store the original street name in the database, map this field to the street name field in the database. This address component is not included in the default standardization structure, but you can add it if needed. |
| RuralRouteDescript | Specifies the parsed rural route description from a standardized address field. By default, this is stored in the field_name_StName field (or the StreetName field for Oracle Java CAPS Master Patient Index). |
| BoxDescript | Specifies the PO box type from a standardized address field. By default, this is stored in the field_name_StName field (or the StreetName field for Oracle Java CAPS Master Patient Index). |
| PropDesPrefDirection | Specifies the parsed property direction from a standardized address field. This field ID handles cases where the direction is a prefix to the property description. By default, this is stored in the field_name_StDir field (or the StreetDir field for Oracle Java CAPS Master Patient Index). |
| PropDesSufDirection | Specifies the parsed property direction from a standardized address field. This field ID handles cases where the direction is a suffix to the property description. By default, this is stored in the field_name_StDir field (or the StreetDir field for Oracle Java CAPS Master Patient Index). |
| StreetNamePrefDirection | Specifies the parsed street direction from a standardized address field. This field ID handles cases where the direction is a prefix to the street name. By default, this is stored in the field_name_StDir field (or the StreetDir field for Oracle Java CAPS Master Patient Index). |
| StreetNameSufDirection | Specifies the parsed street direction from a standardized address field. This field ID handles cases where the direction is a suffix to the street name. By default, this is stored in the field_name_StDir field (or the StreetDir field for Oracle Java CAPS Master Patient Index). |
| StreetNameSufType | Specifies the parsed street type from a standardized address field. This field ID handles cases where the street type is a suffix to the street name. By default, this is stored in the field_name_StType field (or the StreetType field for Oracle Java CAPS Master Patient Index). |
| StreetNamePrefType | Specifies the parsed street type from a standardized address field. This field ID handles cases where the street type is a prefix to the street name. By default, this is stored in the field_name_StType field (or the StreetType field for Oracle Java CAPS Master Patient Index). |
| PropDesSufType | Specifies the parsed property type from a standardized address field. This field ID handles cases where the property type is a suffix to the property description. By default, this is stored in the field_name_StType field (or the StreetType field for Oracle Java CAPS Master Patient Index). |
| PropDesPrefType | Specifies the parsed property type from a standardized address field. This field ID handles cases where the property type is a prefix to the property description. By default, this is stored in the field_name_StType field (or the StreetType field for Oracle Java CAPS Master Patient Index). |
| HouseNumPrefix | Specifies the parsed house number prefix from a standardized address field (such as the “A” in “A 1587 4th Street”). This address component is not included in the default standardization structure, but you can add it if needed. |
| SecondHouseNumberPrefix | Specifies the parsed second house number prefix from a standardized address field (such as “25” in “25 319 10th Ave.”). This address component is not included in the default standardization structure, but you can add it if needed. |
| SecondHouseNumber | Specifies the parsed second house number from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| HouseNumSuffix | Specifies the parsed house number suffix from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| OrigSecondStreetName | Specifies the parsed second street name from a standardized address field (for example, an address might include a cross-street or a thoroughfare and dependent thoroughfare). This address component is not included in the default standardization structure, but you can add it if needed. |
| SecondStreetNameSufDirection | Specifies the parsed second street direction from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| SecondStreetNameSufType | Specifies the parsed second street type from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| StreetNameExtensionIndex | Specifies the parsed street name extension from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| WithinStructDescript | Specifies the parsed internal descriptor (such as “Floor”) from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| WithinStructIdentif | Specifies the parsed internal identifier (such as a floor number) from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| OrigPropertyName | Specifies the parsed original property name (such as the name of a complex or business park) from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| MatchPropertyName | Specifies the parsed match property name from a standardized address field and is used internally by the match engine for blocking and phonetic encoding. This address component is not included in the default standardization structure, but you can add it if needed. |
| CenterDescript | Specifies the parsed structure description from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| CenterIdentif | Specifies the parsed structure identifier from a standardized address field. This address component is not included in the default standardization structure, but you can add it if needed. |
| ExtraInfo | Specifies any extra information that was not included in any of the other parsed components. This address component is not included in the default standardization structure, but you can add it if needed. |

Business Name Standardization Field Identifiers

| Field ID | Description |
|---|---|
| PrimaryName | Specifies the field containing the parsed name in a free-form text business name field. |
| OrgTypeKeyword | Specifies the field containing the parsed organization type in a free-form text business name field. |
| AssocTypeKeyword | Specifies the field containing the parsed association type in a free-form text business name field. |
| IndustrySectorList | Specifies the field containing the parsed industry sector in a free-form text business name field. |
| IndustryTypeKeyword | Specifies the field containing the parsed industry type in a free-form text business name field (industry type is a subset of the sector). |
| AliasList | Specifies the field containing the parsed alias in a free-form text business name field. |
| Url | Specifies the field containing the parsed URL in a free-form text business name field. |
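
To make the default storage rules in Table 3 easier to see, the sketch below groups the address field IDs by the target field suffix that the table gives for each of them. The suffixes and field IDs are taken from Table 3; the source field name `AddressLine1` and the helper functions are hypothetical and are not part of the match engine.

```python
# Default target suffix for each address field ID, as listed in Table 3.
DEFAULT_TARGET_SUFFIX = {
    "HouseNumber": "_HouseNo",
    "RuralRouteIdentif": "_HouseNo",
    "BoxIdentif": "_HouseNo",
    "MatchStreetName": "_StName",
    "RuralRouteDescript": "_StName",
    "BoxDescript": "_StName",
    "PropDesPrefDirection": "_StDir",
    "PropDesSufDirection": "_StDir",
    "StreetNamePrefDirection": "_StDir",
    "StreetNameSufDirection": "_StDir",
    "StreetNameSufType": "_StType",
    "StreetNamePrefType": "_StType",
    "PropDesSufType": "_StType",
    "PropDesPrefType": "_StType",
}


def target_field(source_field, field_id):
    """Name of the default target field for a parsed address component."""
    return source_field + DEFAULT_TARGET_SUFFIX[field_id]


def ids_sharing_target(suffix):
    """All field IDs whose parsed value lands in the same target field."""
    return sorted(fid for fid, s in DEFAULT_TARGET_SUFFIX.items() if s == suffix)


print(target_field("AddressLine1", "BoxIdentif"))
print(ids_sharing_target("_HouseNo"))
```

Grouping the IDs this way shows why a house number, a rural route number, and a PO box number all end up in the same database field in the default configuration.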

Oracle Java CAPS Match Engine Match and Standardization Types

Indicators are used in the Match Field file to reference the type of matching and standardization to perform on each field. You must specify one of these indicators, called match types and standardization types, for the fields you define for standardization or matching. The match types correspond to the match types listed in the first column of the match configuration file (matchConfigFile.cfg). The standardization types are defined internally in the match engine. The Oracle Java CAPS Match Engine uses these types to determine how to process each field.

Table 4 lists the default standardization types; Table 5 lists the default match types. You can modify the match type names but not the standardization type names. For more information about match and standardization types, see Master Index Match Types and Field Names (Repository) in Understanding Oracle Java CAPS Master Index Processing (Repository). Note that the match types you can specify in the Match Field file (listed in Table 5) are not the same values you specify for the Match Type field drop-down list in the wizard when you create the master index application.

Table 4 Standardization Types

| This indicator ... | processes this data type ... |
|---|---|
| Address | Free-form street address fields. |
| PersonName | Pre-parsed name fields (including any first, middle, last, or alias names). |
| BusinessName | Free-form business names. |

The standardization types listed above correspond to the three categories of match types listed below. You can also specify miscellaneous match types, which do not correspond to any standardization types.

Table 5 Match Types

Business Name Match Types

| This indicator ... | processes this data type ... |
|---|---|
| PrimaryName | The parsed name field of a business name. |
| OrgTypeKeyword | The parsed organization type field of a business name. |
| AssocTypeKeyword | The parsed association type field of a business name. |
| AliasList | The parsed alias type field of a business name. |
| IndustrySectorList | The parsed industry sector field of a business name. |
| IndustryTypeKeyword | The parsed industry type field of a business name. |
| Url | The parsed URL field of a business name. |

Address Match Types

| This indicator ... | processes this data type ... |
|---|---|
| StreetName | The parsed street name field of a street address. |
| HouseNumber | The parsed house number field of a street address. |
| StreetDir | The parsed street direction field of a street address. |
| StreetType | The parsed street type field of a street address. |

Person Name Match Types

| This indicator ... | processes this data type ... |
|---|---|
| FirstName | A first name field, including middle name, alias first name, and alias middle name fields. |
| LastName | A last name field, including alias last name fields. |

Date Match Types

| This indicator ... | processes this data type ... |
|---|---|
| DateDays | The day, month, and year of a date field. |
| DateMonths | The month and year of a date field. |
| DateHours | The hour, day, month, and year of a date field. |
| DateMinutes | The minute, hour, day, month, and year of a date field. |
| DateSeconds | The seconds, minute, hour, day, month, and year of a date field. |

Miscellaneous Match Types

| This indicator ... | processes this data type ... |
|---|---|
| String | A generic string field. |
| Numeric | A numeric field. |
| Integer | A field containing integers. |
| Real | A field containing real numbers. |
| SSN | A field containing a social security number. |
| Char | A field containing a single character. |
| pro | Any field on which you want the Oracle Java CAPS Match Engine to use prorated weights. |
| Exac | Any field you want the Oracle Java CAPS Match Engine to match character for character. |

Oracle Java CAPS Match Engine Configuration File Modifications

The Oracle Java CAPS Match Engine configuration files are designed to perform very specific functions in the standardization and match processes. These files should be modified only by personnel who understand both the Oracle Java CAPS Match Engine and the data integrity requirements of your organization. Make modifications to the master index configuration files and the Oracle Java CAPS Match Engine configuration files while the master index application is still in the preproduction stages. Modifying the files after the master index application has moved into production might cause variances in matching weights and data processing.

The most common modifications to the Oracle Java CAPS Match Engine configuration files are generally in the match configuration file, where you can fine-tune the weighting process. This file defines probabilities used by the algorithm to determine a matching probability weight for each match field. You can use the match comparison functions provided by the Oracle Java CAPS Match Engine to fine-tune the matching logic in this file. Another common modification is inserting additional names or terms into category files, such as the first name category file (personFirstName*.dat).
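
To show how probabilities can translate into weights, the sketch below uses the standard Fellegi-Sunter log-ratio formulation with base-2 logarithms. This section does not give the engine's exact formula, so treat the functions as an assumption about the general approach; the probability values are hypothetical.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Weight added when a field agrees: log2 of m-probability over u-probability."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Weight added when a field disagrees: log2 of (1 - m) over (1 - u)."""
    return math.log2((1 - m_prob) / (1 - u_prob))


# Hypothetical probabilities for one match field.
m_prob = 0.9  # chance the field agrees when two records truly match
u_prob = 0.1  # chance the field agrees when two records do not match

print(round(agreement_weight(m_prob, u_prob), 2))
print(round(disagreement_weight(m_prob, u_prob), 2))
```

Under this formulation, raising the matching probability or lowering the unmatching probability for a field widens the gap between its agreement and disagreement weights, so the field has more influence on the composite weight.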

Depending on your data requirements, you might need to modify additional standardization files. Some of the patterns files (most notably the address patterns files) are very complex and should only be modified by personnel who thoroughly understand the defined patterns and tokens. If you modify standardization files, make sure you modify them for each national domain specified in the Match Field file.

Configuring the Master Index Matching Service (Repository)

To configure a master index application for specific data types and for the Oracle Java CAPS Match Engine, you must customize the Matching Service by modifying the Match Field file in the master index project. Configuring the Matching Service consists of four tasks: configuring standardization, configuring the match string, configuring the match and standardization engines, and configuring the phonetic encoders.

Master Index Standardization Configuration (Repository)

The StandardizationConfig section of the Match Field file determines which fields are normalized, parsed, or phonetically encoded and defines the nationality of the data being processed. The standardization section includes three kinds of structures: normalization structures, standardization structures (parsing and normalization), and phonetic encoding structures.

The StandardizationConfig section defines fields that will be normalized, fields that will be parsed and normalized, and fields that will be phonetically encoded. The standardization types you specify in this section correspond to the match configuration file; the field IDs you can specify are listed in Table 3.

Normalization Structures

The normalization structure defines fields that are already parsed, but need to be normalized. It also tells the Oracle Java CAPS Match Engine where to place the normalized data in the object structure. Matching on any of these fields is determined by the match string and the logic is defined in the match configuration file.

Of the three data types processed by the Oracle Java CAPS Match Engine, only the person name data type is expected to provide information in fields that are already parsed; that is, the first, last, and middle names appear in separate fields, as do the suffix, title, and so on. The person standardization files define logic for normalizing person name fields. By default, only the names you specify for matching in the wizard are defined for normalization. You can define normalization for additional name fields, such as maiden name, spouse’s name, and so on. For each normalization structure, you must specify the national domains for the data you are processing.

Defining New Fields for Normalization

The fields you define for normalization in the Match Field file can include any name fields. If you define normalization for fields that are not currently defined for normalization in the Match Field file, make the following additional changes.

  1. In the Match Field file, define the normalization structure, using the appropriate standardization type (PersonName), domain selector, and field IDs (FirstName, MiddleName, or LastName).

  2. Add the new fields that will store the normalized field value to the appropriate objects in the Object Definition file.

  3. If any of the normalized fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.

  4. Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound Object Type Definition (OTD), and the method OTD.

  5. To specify that the new normalized fields be used for matching, do the following:

    1. Determine the match type or the match comparison function you want to use to match the normalized data, and modify the match configuration file (matchConfigFile.cfg) if needed.

    2. Add the new normalized field to the match-columns element of the MatchingConfig section of the Match Field file, making sure to use the appropriate match type from the match configuration file.

Standardization Structures (Parsing and Normalization)

The fields that must be parsed, and possibly normalized, are defined in a standardization structure in the StandardizationConfig section of the Match Field file. The standardization structure tells the Oracle Java CAPS Match Engine where to place the standardized information extracted from the parsed fields. The target fields you specify for standardization facilitate searching by the parsed values. Matching on any of these fields is determined by the match string and the logic is defined in the match configuration file.

The Oracle Java CAPS Match Engine expects business names and street address information in free-form text fields that must be parsed and normalized prior to matching. The logic for parsing and normalizing street address information is contained in the address standardization files; the logic for parsing and normalizing business names is contained in the business standardization files. You can customize the standardization of these data types by modifying the appropriate patterns file. For each standardization structure, you must specify the national domains for the data being processed.
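The difference between a free-form field and its parsed components can be sketched in Java as follows. This is a toy example under stated assumptions, not the engine's parser: the class name AddressParseSketch, the single regular expression, and the output keys HouseNumber, StreetName, and StreetType are hypothetical stand-ins for the patterns files and the field IDs in Table 3.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AddressParseSketch {
    // Toy pattern: a house number, a street name, and a one-word street type.
    private static final Pattern SIMPLE =
            Pattern.compile("^\\s*(\\d+)\\s+(.+?)\\s+(\\S+)\\s*$");

    /**
     * Splits a free-form street address into components keyed by
     * hypothetical target field names. Returns an empty map when the
     * input does not fit the toy pattern.
     */
    static Map<String, String> parse(String freeForm) {
        Map<String, String> parsed = new LinkedHashMap<>();
        Matcher m = SIMPLE.matcher(freeForm);
        if (m.matches()) {
            parsed.put("HouseNumber", m.group(1));
            parsed.put("StreetName", m.group(2));
            parsed.put("StreetType", m.group(3));
        }
        return parsed;
    }
}
```

Each key in the returned map plays the role of a target field: once the components are stored separately, they can be searched on, used for blocking, or added to the match string.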

Defining New Fields for Standardization

The fields you define for standardization in the Match Field file can include any street address or business name field. Perform the following steps if you need to define one of these field types for standardization.

  1. If necessary, modify the patterns file for the type of data you are standardizing.

    You can define new input and output patterns or modify existing ones.

  2. Define the standardization structure, using the appropriate standardization type (BusinessName or Address), domain selector, and field IDs (described in Table 3).

  3. Add the new fields that will store the parsed or normalized data to the appropriate objects in the Object Definition file.

  4. If any of the parsed or normalized fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.

  5. Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound Object Type Definition (OTD), and the method OTD.

  6. To specify that the new standardized fields be used for matching, do the following:

    1. Determine the match type or the match comparison function you want to use to match the parsed data, and modify the match configuration file (matchConfigFile.cfg) if needed.

    2. Add the new standardized field to the match-columns element of the MatchingConfig section of the Match Field file, making sure to use the appropriate match type from the match configuration file.

Phonetic Encoding Structures

The fields to be phonetically encoded are defined in a phonetic encoding structure in the StandardizationConfig section of the Match Field file. The phonetic encoding structure tells the Oracle Java CAPS Match Engine where to place the phonetic data created from the fields that are encoded. You can define any field in the object structure for phonetic encoding.

Defining New Fields for Phonetic Encoding

The fields you define for phonetic encoding in the Match Field file can include any field. Perform the following steps to define a field for phonetic encoding.

  1. Determine the type of phonetic encoder to use to convert the field.

    You can use any of the encoders described in Table 7.

  2. Define the phonetic encoding structure, using the appropriate encoders.

  3. Add the new fields that will store the phonetic values to the appropriate objects in the Object Definition file.

  4. If any of the phonetic fields are to be used for blocking, modify the Candidate Select file by adding the new fields to the blocking query.

  5. Regenerate the master index application in NetBeans to include the new fields in the database creation script, the outbound OTD, and the method OTD.
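To make the idea of a phonetic value concrete, the following is a minimal, self-contained Java sketch of the classic American Soundex algorithm. It is offered only as an illustration: it is not the com.stc.eindex.phonetic.impl.Soundex class, whose exact behavior may differ, and the class name SoundexSketch is hypothetical.

```java
final class SoundexSketch {
    // Code for each letter A..Z: digits for consonants, '0' for vowels and Y,
    // '-' for H and W (which are skipped without separating equal codes).
    private static final String CODES = "0123012-02245501262301-202";

    /** Returns the four-character Soundex code, or "" if the input has no letters A-Z. */
    static String encode(String input) {
        StringBuilder out = new StringBuilder(4);
        char lastCode = '0';
        for (int i = 0; i < input.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(input.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as-is
                lastCode = code; // its code still suppresses an adjacent duplicate
                continue;
            }
            if (code == '-') {
                continue; // H and W do not reset the previous code
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code; // a vowel resets this to '0', so a repeated code is kept
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Because similar-sounding names such as "Robert" and "Rupert" encode to the same value, storing the code in its own field lets a blocking query retrieve candidates despite spelling differences.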

Master Index Match String Configuration (Repository)

The MatchingConfig section of the Match Field file determines which fields are passed to the Oracle Java CAPS Match Engine for matching (the match string). If you are matching on fields parsed from a free-form text field, define each individual parsed field you want to use for matching. The default fields listed in the MatchingConfig section depend on the fields you specified for matching in the wizard (for Oracle Java CAPS Master Patient Index, the default fields are FirstName, LastName, DOB, Gender, and SSN).

The match types you can use for each field in this section are defined in the first column of the match configuration file. Make sure the match type you specify has the correct matching logic defined in the match configuration file.
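One way to check that every match type used in the match string is defined in the match configuration file is sketched below in Java. It relies only on the statement above that match types appear in the first column of that file. The class name MatchTypeCheck is hypothetical, and treating a line that begins with ProbabilityType as a header rather than a match type is an assumption to verify against your own matchConfigFile.cfg.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MatchTypeCheck {
    /** Collects the first whitespace-delimited token of each non-blank line. */
    static Set<String> definedMatchTypes(List<String> configLines) {
        Set<String> types = new LinkedHashSet<>();
        for (String line : configLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String first = trimmed.split("\\s+")[0];
            // Assumption: a leading "ProbabilityType" line is a header, not a match type.
            if (first.equals("ProbabilityType")) {
                continue;
            }
            types.add(first);
        }
        return types;
    }

    /** Returns the match types used in the match string that the file does not define. */
    static List<String> undefinedMatchTypes(List<String> usedTypes, Set<String> defined) {
        List<String> missing = new ArrayList<>();
        for (String type : usedTypes) {
            if (!defined.contains(type)) {
                missing.add(type);
            }
        }
        return missing;
    }
}
```

In practice the configuration lines would be read from matchConfigFile.cfg and the used types collected from the MatchingConfig section; an empty result from undefinedMatchTypes means every match type in the match string has a rule defined.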

Match and Standardization Engine Configuration

The MEFAConfig section of the Match Field file defines which standardization and match engines will be used by the master index application. By default, the master index application is already configured to use the Oracle Java CAPS Match Engine for matching and standardization. For more information, see Understanding Oracle Java CAPS Master Index Configuration Options (Repository).

Table 6 lists the elements in the Match Field file that define the match and standardization engine, along with the appropriate values for the Oracle Java CAPS Match Engine.

Table 6 Oracle Java CAPS Match Engine Standardization and Match Classes

Match Field File Element    Oracle Java CAPS Match Engine Value
standardizer-api            com.stc.eindex.matching.adapter.SbmeStandardizerAdapter
standardizer-config         com.stc.eindex.matching.adapter.SbmeStandardizerAdapterConfig
matcher-api                 com.stc.eindex.matching.adapter.SbmeMatcherAdapter
matcher-config              com.stc.eindex.matching.adapter.SbmeMatcherAdapterConfig

Master Index Phonetic Encoder Configuration (Repository)

The Oracle Java CAPS Match Engine supports several phonetic encoders, which are defined in the PhoneticEncodersConfig section of the Match Field file. Any encoders specified in the phonetic encoding structures (see Phonetic Encoding Structures) must also be defined in the PhoneticEncodersConfig section. The classes for the encoders are listed in Table 7.

Table 7 Phonetic Encoder Classes for the Oracle Java CAPS Match Engine

Encoder             Java Class
Soundex             com.stc.eindex.phonetic.impl.Soundex
NYSIIS              com.stc.eindex.phonetic.impl.NYSIIS
Metaphone           com.stc.eindex.phonetic.impl.Metaphone
Double Metaphone    com.stc.eindex.phonetic.impl.DoubleMetaphone
Refined Soundex     com.stc.eindex.phonetic.impl.RefinedSoundex
French Soundex      com.stc.eindex.phonetic.impl.SoundexFR
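
The rule that every encoder used in a phonetic encoding structure must also be defined in the PhoneticEncodersConfig section can be checked with a small Java sketch such as the one below. The class name EncoderCheck is hypothetical, and the map keys are the encoder names as printed in Table 7; the identifiers actually used in your Match Field file may be spelled differently, so treat the keys as an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EncoderCheck {
    /** Encoder names and classes copied from Table 7. */
    static final Map<String, String> TABLE_7 = new LinkedHashMap<>();
    static {
        TABLE_7.put("Soundex", "com.stc.eindex.phonetic.impl.Soundex");
        TABLE_7.put("NYSIIS", "com.stc.eindex.phonetic.impl.NYSIIS");
        TABLE_7.put("Metaphone", "com.stc.eindex.phonetic.impl.Metaphone");
        TABLE_7.put("Double Metaphone", "com.stc.eindex.phonetic.impl.DoubleMetaphone");
        TABLE_7.put("Refined Soundex", "com.stc.eindex.phonetic.impl.RefinedSoundex");
        TABLE_7.put("French Soundex", "com.stc.eindex.phonetic.impl.SoundexFR");
    }

    /**
     * Returns one message per encoder that is used in a phonetic encoding
     * structure but absent from the list of configured encoders.
     */
    static List<String> missing(Collection<String> used, Collection<String> configured) {
        List<String> problems = new ArrayList<>();
        for (String name : used) {
            if (configured.contains(name)) {
                continue;
            }
            String cls = TABLE_7.get(name);
            problems.add(cls == null
                    ? name + ": not defined, and not listed in Table 7"
                    : name + ": not defined; Table 7 class is " + cls);
        }
        return problems;
    }
}
```

An empty list means the phonetic encoding structures and the PhoneticEncodersConfig section are consistent; otherwise each message names the encoder to add and, where known, its class.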