Understanding the Oracle Java CAPS Match Engine     Java CAPS Documentation

Oracle Java CAPS Match Engine Person Data Type Configuration

Processing person data involves normalizing and phonetically encoding certain fields prior to matching. The following topics describe the default configuration files that define person processing logic and provide instructions for modifying the Match Field file for processing person data. The information presented in these topics is especially pertinent to the Oracle Java CAPS Master Patient Index application.

Oracle Java CAPS Match Engine Person Matching Overview

Matching on the person data type includes standardizing and matching a person’s demographic information. The Oracle Java CAPS Match Engine can create normalized and phonetic values for person data. Several configuration files designed specifically to handle person data are included to provide additional logic for the standardization and phonetic encoding process. The Oracle Java CAPS Match Engine can phonetically encode any field you specify, with some modification to the standardization files. It can also match on any field, as long as the match type for the field is defined in the match configuration file (matchConfigFile.cfg).

In addition, when storing person information, you might want to standardize addresses to enable searching against address information. This requires working with the address configuration files described in Oracle Java CAPS Match Engine Address Data Type Configuration.

The following topic provides information about the fields used in person data matching and the fields added to the object structure.

Oracle Java CAPS Match Engine Person Data Processing Fields

When matching on person data, not all fields in a record need to be processed by the Oracle Java CAPS Match Engine. The match engine only needs to process fields that must be parsed, normalized, or phonetically converted, and the fields against which matching is performed. These fields are defined in the Match Field file and processing logic for each field is defined in the Oracle Java CAPS Match Engine standardization and matching configuration files.

Person Data Match String Fields

The match string processed by the Oracle Java CAPS Match Engine is defined by the match fields specified in the Match Field file. The match engine can process any combination of fields you specify for matching. By default, the match configuration file (matchConfigFile.cfg) includes rows specifically for matching on first name, last name, social security number, and dates (such as a date of birth). It also includes a row for matching on a single-character field, such as a gender field. You can use any of the existing rows for matching or you can add rows for the fields you want to match. Any field for which you specify a match type in the wizard is added to the match string.

Person Data Standardized Fields

The Oracle Java CAPS Match Engine expects person data to be provided in separate fields within a single record, meaning that no parsing is required of the name fields prior to normalization. Typically, only first and last names are normalized and phonetically encoded when standardizing person data, but the match engine can normalize and phonetically encode any field you choose.

Person Data Object Structure

The fields you specify for person name matching in the wizard are automatically defined for standardization and phonetic encoding. If you specify the appropriate match types in the wizard, the following fields are automatically added to the object structure and database creation script.

  • field_name_Std
  • field_name_Phon

where field_name is the name of the field for which you specified person name matching. For example, if you specify the PersonFirstName match type for the FirstName field, two fields, FirstName_Std and FirstName_Phon, are automatically added to the structure. You can also add these fields manually if you do not specify match types in the wizard. If you store additional names in the database, such as alias names, maiden names, parent names, and so on, you can modify the phonetic structure to phonetically encode those names as well.


Note - The object structure for Oracle Java CAPS Master Patient Index uses a slightly different naming convention.


Oracle Java CAPS Match Engine Match Configuration for Person Data

The default match configuration file, matchConfigFile.cfg, defines several match types for the kinds of data typically included in a person master index application. You can customize the existing match types or create new match types for the data being processed. The following match types are typical for matching on person data.

  • FirstName
  • Real
  • LastName
  • SSN
  • String
  • Gender
  • Date
  • pro
  • Numeric
  • Exac
  • Integer

This file appears under the Match Engine node of the master index project. For more information about the comparison functions used for each match type and how the weights are tuned, see Customizing the Match Configuration and Match Configuration Comparison Functions for Oracle Java CAPS Match Engine (Repository).

Oracle Java CAPS Match Engine Person Data Standardization Files

Several configuration files are used to define standardization logic for the Oracle Java CAPS Match Engine. You can customize any of the configuration files described in this section to fit your processing and standardization requirements for person data. There are two types of standardization files for person data: common and domain-specific. The common files appear under the Standardization Engine node of the master index project and are used for all national domains; the domain-specific files appear within sub-folders of the Standardization Engine node and each corresponds to a specific national domain.

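The following topics provide information about each type of standardization file:

  • Oracle Java CAPS Match Engine Common Standardization Files for Person Data
  • Oracle Java CAPS Match Engine Domain-Specific Standardization Files for Person Data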

Oracle Java CAPS Match Engine Common Standardization Files for Person Data

The standardization files described in this section are common to all national domains. These files define special characters to remove from name fields and define hyphenated first names. A patterns file is also common, but is not currently used.

The Hyphenated Name Category File (personFirstNameDash.dat)

The hyphenated name category file defines first names that include hyphens (such as Anne-Marie) to help the Oracle Java CAPS Match Engine recognize and process these values as first names. The file also classifies each name into a gender category. This file is used to standardize all domains except Australia, which uses the personFirstNameDashAU.dat file located in the Australia folder, and France, which uses the personFirstNameDashFR.dat file located in the France folder.

The hyphenated name category files use the following syntax:

name gender-class

You can modify or add entries in this table as needed. Table 8 describes the columns in the personFirstNameDash.dat file.

Table 8 Hyphenated Name Category File

Column          Description
name            A hyphenated first name.
gender-class    An indicator of the gender with which the first name corresponds. The possible values are:
                  • N - The name is neutral, and can be applied to male or female first names.
                  • F - The name is used for females.
                  • M - The name is used for males.

Following is an excerpt from the personFirstNameDash.dat file.

ANNE-MARIE          F
JEAN-NOEL           M
JEAN-MARIE          M
JEAN-BAPTISTE       M
JEAN-PIERRE         M
JEAN-YVES           M
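
The two-column syntax above can be read with a few lines of code. The following is a minimal Python sketch, not the match engine's own loader: the function name parse_hyphenated_names is hypothetical, and it assumes that columns are separated by whitespace and that a hyphenated name contains no spaces.

```python
from typing import Dict

# Gender classes documented for the hyphenated name category file.
GENDER_CLASSES = {"N", "F", "M"}


def parse_hyphenated_names(text: str) -> Dict[str, str]:
    """Map each hyphenated first name to its gender class (N, F, or M)."""
    names: Dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'name gender-class'")
        name, gender = parts
        if gender not in GENDER_CLASSES:
            raise ValueError(f"line {line_no}: unknown gender class {gender!r}")
        names[name] = gender
    return names


# A few rows copied from the excerpt above.
sample = """\
ANNE-MARIE          F
JEAN-NOEL           M
JEAN-YVES           M
"""
names = parse_hyphenated_names(sample)
print(names["ANNE-MARIE"])  # F
```

A check like this can be run on an edited file before deployment to catch rows with a missing or misspelled gender class.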

The Person Name Patterns File (personNamePatt.dat)

The person name patterns file is not currently used, but is designed to standardize free-form text name fields.

The Special Characters Reference File (personRemoveSpecChars.dat)

The special characters reference file lists characters that might appear in person data, but that should be ignored. The Oracle Java CAPS Match Engine removes these characters from a field before making any comparisons or before normalizing data. You can define additional characters to remove from person data by simply adding the character to the list.

An excerpt from the personRemoveSpecChars.dat file appears below.

[
]
{
}
<
>
/
?
*
^
#
!
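
The effect of this file can be illustrated with a short Python sketch. It is only an approximation of the behavior described above: the function names are hypothetical, the file is assumed to hold one character per line, and the characters are deleted outright rather than replaced.

```python
def load_special_chars(text: str) -> str:
    """Collect the characters listed one per line in the reference file."""
    return "".join(line.strip() for line in text.splitlines() if line.strip())


def strip_special_chars(value: str, chars: str) -> str:
    """Delete every listed character from a field value."""
    return value.translate(str.maketrans("", "", chars))


# Characters copied from the excerpt above, one per line.
special_chars = load_special_chars("[\n]\n{\n}\n<\n>\n/\n?\n*\n^\n#\n!\n")

print(strip_special_chars("<ANNE-MARIE>?", special_chars))  # ANNE-MARIE
```

The hyphen is not in the excerpt, so hyphenated names such as ANNE-MARIE pass through unchanged in this sketch.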

Oracle Java CAPS Match Engine Domain-Specific Standardization Files for Person Data

Most standardization files for person data are specific to each national domain. Each domain node within the Standardization node of the project includes the files defined in this section. The domain corresponding to each file is indicated at the end of the file name; for example, personConstantsUK.cfg and personConstantsFR.cfg. These domain abbreviations are indicated by an asterisk (*) in the descriptions.


Note - You can customize these files to add entries of other nationalities or languages, including those containing diacritical marks.


The Conjunction Reference File (personConjon*.dat)

The conjunction reference file is not currently used, but is designed to work with the person name patterns file during standardization.

The Person Constants File (personConstants*.cfg)

The person constants file defines certain information about the standardization files used for processing person data, primarily the number of lines contained in each file. The number of lines specified here must be equal to or greater than the number of lines actually contained in each file. The constants file for United States data is in the Standardization node of the project and is named personConstants.cfg; the person constants file for the other domains is located under the domain name node.

Table 9 lists and describes each parameter in the constants file. The files referenced by these parameters are described elsewhere in this section.

Table 9 Person Constants File Parameters

Parameter    Description
words        The maximum number of words in a given free-form text field containing a person name. This parameter is not currently used.
conjmax      The maximum number of lines in the person conjunction reference file (personConjon*.dat).
jrsrmax      The maximum number of lines in the generational suffix category file (personGenSuffix*.dat).
nickmax      The maximum number of lines in the first name category file (personFirstName*.dat).
lastmax      The maximum number of lines in the last name category file (personLastName*.dat).
premax       The maximum number of lines in the last name prefix category file (personLastNamePrefix*.dat).
titlmax      The maximum number of lines in the title category file (personTitle*.dat).
sufmax       The maximum number of lines in the occupational suffix category file (personOccupSuffix*.dat).
skpmax       The maximum number of lines in the business name reference file (businessOrRelated*.dat).
ptrnmax1     The maximum number of lines in the person patterns file (personNamePatt.dat).
twomax       The maximum number of lines in the two-character reference file for occupational suffixes (personTwo*.dat).
thremax      The maximum number of lines in the three-character reference file for occupational suffixes (personThree*.dat).
blnkmax      The maximum number of lines in the special characters reference file (personRemoveSpecChars.dat).
dashSize     The maximum number of lines in the hyphenated name category file (personFirstNameDash.dat).
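
Because each parameter must be equal to or greater than the number of lines in its file, a quick consistency check is useful after editing a standardization file. The Python sketch below assumes the limits have already been read into a dictionary, since the syntax of the constants file is not shown in this section. The function names and the sample values are hypothetical.

```python
from typing import Dict, List


def count_lines(text: str) -> int:
    """Count every line in a standardization file, including blank ones."""
    return len(text.splitlines())


def find_undersized_limits(limits: Dict[str, int],
                           line_counts: Dict[str, int]) -> List[str]:
    """Return parameters whose configured maximum is below the actual line count."""
    return sorted(
        name for name, count in line_counts.items()
        if limits.get(name, 0) < count
    )


# Hypothetical values, not taken from a shipped personConstants*.cfg file.
limits = {"nickmax": 3, "titlmax": 2}
line_counts = {
    "nickmax": count_lines("STEPHEN 0 M\nSTEVE STEPHEN M\n"),
    "titlmax": count_lines("DR 0 N\nDOC DR N\nMISS 0 F\n"),
}
print(find_undersized_limits(limits, line_counts))  # ['titlmax']
```

Here titlmax is reported because the title data has three lines but the limit is two. Counting blank lines as well errs on the side of a larger limit.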

The First Name Category File (personFirstName*.dat)

The first name category file defines standardized versions of first names and assigns a gender classification for each name. This file is used to standardize first names when comparing person names. The gender classification helps to further clarify the match. The Oracle Java CAPS Match Engine uses this file when a first name field is defined for normalization or standardization in the Match Field file.

The syntax of this file is:

original-value standardized-form gender-class

You can modify or add entries in this table as needed. Table 10 describes the columns in the personFirstName*.dat file.

Table 10 First Name Category File

Column               Description
original-value       The original value of the first name.
standardized-form    The standardized version of the original value. A zero (0) in this field indicates that the original value is already in its standardized form.
                     If this column contains a name instead of a zero, that name must also be listed in a different entry as an original value with a standardized form of “0”.
gender-class         An indicator of the gender with which the first name corresponds. The possible values are:
                       • N – The name is neutral, and can be applied to male or female first names.
                       • F – The name is used for females.
                       • M – The name is used for males.

Following is an excerpt from the personFirstNameUS.dat file. Certain rows contain a zero (0) for the standardized form, indicating that the name is already standard (for example, Stephen, Sterling, and Summer).

STEPHEN         0               M
STEPHENIE       STEPHANIE       F
STEPHIE         STEPHANIE       F
STEPHINE        STEPHANIE       F
STEPHNIE        STEPHANIE       F
STERLING        0               M
STEVE           STEPHEN         M
STEVEN          STEPHEN         M
STEVIE          STEPHEN         N
STEW            STUART          M
STEWART         STUART          M
STU             STUART          M
STUART          0               M
SU              SUSAN           F
SUE             SUSAN           F
SUHANTO         0               M
SULLIVAN        0               F
SULLY           SULLIVAN        F
SUMMER          0               F
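
The lookup that this file supports can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's implementation: the function names are hypothetical, input is uppercased before lookup because the file entries are uppercase, and a name that is not listed is returned unchanged apart from the uppercasing.

```python
from typing import Dict, Tuple


def parse_first_names(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each original value to (standardized form, gender class)."""
    table: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # this sketch ignores blank or malformed lines
        original, standard, gender = parts
        # A zero means the original value is already the standardized form.
        table[original] = (original if standard == "0" else standard, gender)
    return table


def normalize_first_name(name: str, table: Dict[str, Tuple[str, str]]) -> str:
    """Return the standardized form, or the uppercased input if unlisted."""
    key = name.strip().upper()
    return table.get(key, (key, ""))[0]


# Rows copied from the excerpt above.
table = parse_first_names("""\
STEPHEN         0               M
STEVE           STEPHEN         M
STEVIE          STEPHEN         N
SUE             SUSAN           F
""")
print(normalize_first_name("Steve", table))    # STEPHEN
print(normalize_first_name("Stephen", table))  # STEPHEN
```

With this table, "Steve" and "Stephen" both normalize to STEPHEN, which is what allows the two spellings to be compared as the same name.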

The Generational Suffix Category File (personGenSuffix*.dat)

The generational suffix category file defines standardized versions of generational suffixes, such as Jr., III, and so on. This file is used to compare standard versions of the suffix field. You can define additional suffixes and their standardized form following the syntax below.

field-value standard-form

Table 11 describes each column of the personGenSuffix*.dat file.

Table 11 Generational Suffix Category File

Column           Description
field-value      The original value of the generational suffix in the record being processed.
standard-form    The standard form of the generational suffix. A zero (0) in this column indicates that the value listed in column one is already in its standardized form.
                 If this column contains a suffix instead of a zero, that suffix must also be listed in a different entry as an original value with a standard form of “0”.

An excerpt from the personGenSuffixUS.dat file appears below. In this excerpt, certain suffixes, such as 2ND, 3RD and JR, are already in their standardized form.

11          2ND
111         3RD
1V          4TH
2ND         0
3RD         0
4TH         0
FOURTH      4TH
II          2ND
III         3RD
IV          4TH
JR          0
JUNIOR      JR
SECOND      2ND
SENIOR      SR
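
The rule that every standard form must also appear as its own entry with a standard form of “0” is easy to break when adding rows. The following Python sketch, with a hypothetical function name and whitespace-separated columns assumed, lists the standard forms that have no such entry.

```python
from typing import Dict, List


def find_dangling_standard_forms(text: str) -> List[str]:
    """Return standard forms that are never listed with a standard form of 0."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            entries[parts[0]] = parts[1]
    # Values that the file declares to be already standardized.
    base_forms = {value for value, standard in entries.items() if standard == "0"}
    return sorted(
        {standard for standard in entries.values()
         if standard != "0" and standard not in base_forms}
    )


# A subset of the rows in the excerpt above.
sample = """\
2ND         0
II          2ND
JR          0
JUNIOR      JR
SENIOR      SR
"""
print(find_dangling_standard_forms(sample))  # ['SR']
```

SR is reported only because the excerpt does not include its row. The complete file presumably contains an SR entry with a standard form of 0.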

Last Name Prefix Category File (personLastNamePrefix*.dat)

The last name prefix category file defines standardized versions of last name prefixes, such as “Van” or “Le”. This file is used to standardize these prefixes prior to standardizing the last name when comparing person names. The Oracle Java CAPS Match Engine uses this file when a last name field is defined for normalization or standardization in the Match Field file.

The syntax of this file is:

original-value standardized-form

You can modify or add entries in this table as needed. Table 12 describes the columns in the personLastNamePrefix*.dat file.

Table 12 Last Name Prefix Category File

Column               Description
original-value       The original value of the last name prefix.
standardized-form    The standardized version of the original value. A zero (0) in this field indicates that the original value is already in its standardized form.
                     If this column contains a prefix instead of a zero, that prefix must also be listed in a different entry as an original value with a standardized form of “0”.

Following is an excerpt from the personLastNamePrefixUS.dat file. Some of these prefixes are already in their standardized form, such as “Los” and “Mac”.

LOS                 0
MAC                 0
MC                  MAC
SAINT               0
ST                  SAINT
VAN                 0
VAN DER             0
VANDE               VAN DER
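
Entries such as VAN DER contain a space, so a tool that reads this file cannot simply split each row on single spaces. The Python sketch below assumes that columns are separated by two or more whitespace characters, as they are in the excerpt. How the match engine itself delimits columns is not described here, and the function name is hypothetical.

```python
import re
from typing import Dict

# Assumption: a run of two or more whitespace characters separates the columns.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_prefix_table(text: str) -> Dict[str, str]:
    """Map each last name prefix to its standardized form."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        columns = COLUMN_GAP.split(line)
        if len(columns) != 2:
            raise ValueError(f"expected two columns, got {line!r}")
        original, standard = columns
        # A zero means the prefix is already in its standardized form.
        table[original] = original if standard == "0" else standard
    return table


# Rows copied from the excerpt above.
table = parse_prefix_table("""\
MAC                 0
MC                  MAC
VAN DER             0
VANDE               VAN DER
""")
print(table["VANDE"])  # VAN DER
```

A row whose columns are separated by a single space is rejected by this sketch rather than guessed at.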

The Last Name Category File (personLastName*.dat)

The last name category file defines standardized versions of last names. This file is used to standardize last names when comparing person names. The Oracle Java CAPS Match Engine uses this file when a last name field is defined for normalization or standardization in the Match Field file.

The syntax of this file is:

original-value standardized-form

You can modify or add entries in this table as needed. Table 13 describes the columns in the personLastName*.dat file.

Table 13 Last Name Category File

Column               Description
original-value       The original value of the last name.
standardized-form    The standardized version of the original value. A zero (0) in this field indicates that the original value is already in its standardized form.
                     If this column contains a name instead of a zero, that name must also be listed in a different entry as an original value with a standardized form of “0”.

Following is an excerpt from the personLastNameUS.dat file.

FINK                          0
PHINQUE                       FINK

The Occupational Suffix Category File (personOccupSuffix*.dat)

The occupational suffix category file is not currently used, but is designed to work with the person name patterns file during standardization.

The Three-Character Suffix File (personThree*.dat)

This reference file is not currently used, but is designed to work with the person name patterns file during standardization.

The Title Category File (personTitle*.dat)

The title category file defines standard forms for titles and classifies each title into a gender category. For example, “Mister” is standardized to “MR” and is classified as male; “Doctor” is standardized to “DR” and is classified as gender neutral. You can add, modify, or delete entries in this file as needed. Use the following syntax.

original-value standardized-form gender-class

Table 14 describes each column of the personTitle*.dat file.

Table 14 Person Title Category File

Column               Description
original-value       The original value of the title in the person name field.
standardized-form    The standardized version of the original value. A zero (0) in this field indicates that the original value is already in its standardized form.
                     If this column contains a title instead of a zero, that title must also be listed in a different entry as an original value with a standardized form of “0”.
gender-class         An indicator of the gender with which the title corresponds. The default values are:
                       • N – The title is neither male nor female.
                       • F – The title is used for females.
                       • M – The title is used for males.

An excerpt from the personTitleUS.dat file appears below. In this excerpt, certain titles, such as DR, GEN, and MISS, are already in their standardized form.

CTO                        0          N
DEAN                       0          N
DIR                        DIRECTOR   N
DIRECTOR                   0          N
DOC                        DR         N
DOCTOR                     DR         N
DR                         0          N
DRS                        0          N
EMERITUS                   0          N
FOUNDER                    0          N
GEN                        0          N
GENERAL                    GEN        N
MANAGER                    0          N
MGR                        MANAGER    N
MISS                       0          F
MISSUS                     MRS        F

The Two-Character Suffix File (personTwo*.dat)

This reference file is not currently used, but is designed to work with the person name patterns file during standardization.

The Business-Related Category File (businessOrRelated*.dat)

The business-related category file is used to identify business terms in person name information. This can occur, for example, when you index both person and business names or when business information is included within a person object structure. The Oracle Java CAPS Match Engine removes these terms for person matching. This file contains a list of common business terms that might be found in person data. You can modify this file by adding, changing, or deleting terms.

An excerpt from the businessOrRelatedUS.dat file appears below.

ACCOUNTANT
ACCT
ACDY
ACRE
ACREAGE
ACRES
ACS
ACT
AD
ADATU
ADM
ADMIN
ADMINISTRATIO
ADMINISTRATION
ADMINISTRATOR 
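
The removal of business terms can be illustrated with a short Python sketch. It assumes that terms are matched as whole, space-separated tokens and without regard to case. The function name is hypothetical, and the engine's actual matching rules may differ.

```python
from typing import Set

# A few terms copied from the excerpt above.
BUSINESS_TERMS = {"ACCOUNTANT", "ACCT", "ADMIN", "ADMINISTRATOR"}


def remove_business_terms(value: str, terms: Set[str] = BUSINESS_TERMS) -> str:
    """Drop whole tokens that appear in the business-related term list."""
    kept = [token for token in value.split() if token.upper() not in terms]
    return " ".join(kept)


print(remove_business_terms("JOHN SMITH ACCOUNTANT"))  # JOHN SMITH
```

Matching whole tokens matters because short entries such as AD or ACT could otherwise strip letters out of ordinary names.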

Configuring the Oracle Java CAPS Match Engine Standardization Files for Person Data

To customize the Oracle Java CAPS Match Engine configuration files for processing person data, you can modify any of the person data standardization files using the text editor provided in NetBeans. Before modifying the match configuration file, review the information provided in Oracle Java CAPS Match Engine Standardization Configuration and Match Configuration Comparison Functions for Oracle Java CAPS Match Engine (Repository). Make sure a thorough data analysis has been performed to determine the best fields for matching and the best comparison functions to use for each field.

Updating most standardization files is a straightforward process. Make sure to follow the syntax guidelines provided in Oracle Java CAPS Match Engine Person Data Standardization Files. If you add any lines to any of the standardization configuration files, be sure to adjust the corresponding parameter in the person constants file (personConstants*.cfg).

Configuring the Master Index Matching Service for Person Data (Repository)

To ensure correct processing of person information, you must customize the Matching Service. This includes modifying the Match Field file to define the fields on which you want to match, to standardize the appropriate fields, and to specify the Oracle Java CAPS Match Engine as the match and standardization engine. (By default, the Oracle Java CAPS Match Engine is already specified, so this setting does not need to be changed.) Perform the following tasks to configure the Matching Service.

  • Configuring the Standardization Structure for Person Data (Repository)
  • Configuring the Match String for Person Data (Repository)

When configuring the Matching Service, keep in mind the information presented in Configuring the Master Index Matching Service (Repository).

Configuring the Standardization Structure for Person Data (Repository)

The standardization structure is configured in the StandardizationConfig section of the Match Field file, which is described in detail in Understanding Oracle Java CAPS Master Index Configuration Options (Repository). To configure the required fields for normalization and phonetic encoding, modify the normalization and phonetic encoding structures in the Match Field file. The following sections provide additional guidelines and samples specific to standardizing person data.


Note - In the current configuration, the rules defined for the person data type assume that incoming data is already parsed before processing. Therefore, you do not need to configure fields to parse unless you want to search on address information. In that case, you must configure the address fields to parse and normalize.


Person Data Normalization Structures

The fields defined for normalization for the person data type can include any name fields. By default this includes first, middle, and last name fields. Follow the instructions under Defining Master Index Normalization Rules (Repository) in Configuring Oracle Java CAPS Master Indexes (Repository) to define fields for normalization. For the standardization-type element, enter PersonName (for more information, see Oracle Java CAPS Match Engine Match and Standardization Types). For a list of field IDs to use in the standardized-object-field-id element, see Table 3.

A sample normalization structure for person data is shown below. This sample specifies that the PersonName standardization type is used to normalize the first name, alias first name, last name, and alias last name fields. For all name fields, both United States and United Kingdom domains are defined for standardization.

<structures-to-normalize>
   <group standardization-type="PersonName"
    domain-selector="com.stc.eindex.matching.impl.MultiDomainSelector">
      <locale-field-name>Person.PobCountry</locale-field-name>
      <locale-maps>
         <locale-codes>
            <value>UNST</value>
            <locale>US</locale>
         </locale-codes>
         <locale-codes>
            <value>GB</value>
            <locale>UK</locale>
         </locale-codes>
      </locale-maps>
      <unnormalized-source-fields>
         <source-mapping>
            <unnormalized-source-field-name>Person.FirstName
            </unnormalized-source-field-name>
            <standardized-object-field-id>FirstName
            </standardized-object-field-id>
         </source-mapping>
         <source-mapping>
            <unnormalized-source-field-name>Person.LastName
            </unnormalized-source-field-name>
            <standardized-object-field-id>LastName
            </standardized-object-field-id>
         </source-mapping>
      </unnormalized-source-fields>
      <normalization-targets>
         <target-mapping>
            <standardized-object-field-id>FirstName
            </standardized-object-field-id>
            <standardized-target-field-name>Person.FirstName_Std
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>LastName
            </standardized-object-field-id>
            <standardized-target-field-name>Person.LastName_Std
            </standardized-target-field-name>
         </target-mapping>
      </normalization-targets>
   </group>
   <group standardization-type="PersonName" domain-selector=
     "com.stc.eindex.matching.impl.MultiDomainSelector">
      <locale-field-name>Person.PobCountry</locale-field-name>
      <locale-maps>
         <locale-codes>
            <value>UNST</value>
            <locale>US</locale>
         </locale-codes>
         <locale-codes>
            <value>GB</value>
            <locale>UK</locale>
         </locale-codes>
      </locale-maps>
      <unnormalized-source-fields>
         <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].FirstName
            </unnormalized-source-field-name>
            <standardized-object-field-id>FirstName
            </standardized-object-field-id>
         </source-mapping>
         <source-mapping>
            <unnormalized-source-field-name>Person.Alias[*].LastName
            </unnormalized-source-field-name>
            <standardized-object-field-id>LastName
            </standardized-object-field-id>
         </source-mapping>
      </unnormalized-source-fields>
      <normalization-targets>
         <target-mapping>
            <standardized-object-field-id>FirstName
            </standardized-object-field-id>
            <standardized-target-field-name>
            Person.Alias[*].FirstName_Std
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>LastName
            </standardized-object-field-id>
            <standardized-target-field-name>
            Person.Alias[*].LastName_Std
            </standardized-target-field-name>
         </target-mapping>
      </normalization-targets>
   </group>
</structures-to-normalize>
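
The sample above normalizes only first and last names, although middle names are also normalized by default. As a minimal sketch, a middle name could be added to the first group with one more source mapping and one more target mapping. The field ID MiddleName and the field names Person.MiddleName and Person.MiddleName_Std are assumptions used for illustration; confirm the field ID against Table 3 and the field names against your own object structure.

```xml
<!-- Hypothetical additions to the first group. MiddleName is an
     assumed field ID; verify it against Table 3 before using it. -->

<!-- Add inside <unnormalized-source-fields> -->
<source-mapping>
   <unnormalized-source-field-name>Person.MiddleName
   </unnormalized-source-field-name>
   <standardized-object-field-id>MiddleName
   </standardized-object-field-id>
</source-mapping>

<!-- Add inside <normalization-targets> -->
<target-mapping>
   <standardized-object-field-id>MiddleName
   </standardized-object-field-id>
   <standardized-target-field-name>Person.MiddleName_Std
   </standardized-target-field-name>
</target-mapping>
```

Each source mapping needs a target mapping with the same standardized-object-field-id, because that ID is what links the unnormalized field to the field that stores the normalized value.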

Person Data Phonetic Encoding

When you specify name fields for person name matching in the wizard, those fields are automatically defined for phonetic encoding. You can define additional names, such as maiden names or alias names, for phonetic encoding as well. Follow the instructions under Defining Phonetic Encoding for the Master Index (Repository) in Configuring Oracle Java CAPS Master Indexes (Repository) to define fields for phonetic encoding.

A sample of fields defined for phonetic encoding is shown below. This sample converts name and alias name fields, as well as the street name.

<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.FirstName_Std
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.FirstName_Phon
      </phoneticized-target-field-name>
      <encoding-type>Soundex</encoding-type>
   </phoneticize-field>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.LastName_Std
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.LastName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Alias[*].FirstName_Std
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Alias[*].FirstName_Phon
      </phoneticized-target-field-name>
      <encoding-type>Soundex</encoding-type>
   </phoneticize-field>
   <phoneticize-field>
      <unphoneticized-source-field-name>Person.Alias[*].LastName_Std
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Person.Alias[*].LastName_Phon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <phoneticize-field>
      <unphoneticized-source-field-name>
        Person.Address[*].AddressLine1_StName
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>
        Person.Address[*].AddressLine1_StPhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
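
To phonetically encode an additional name, such as a maiden name, you would add one more phoneticize-field element inside phoneticize-fields. The sketch below is illustrative only: the field names Person.MaidenName and Person.MaidenName_Phon are hypothetical, and NYSIIS is chosen simply because the sample above uses it for last names.

```xml
<!-- Hypothetical entry; Person.MaidenName and Person.MaidenName_Phon
     are assumed field names that must exist in your object structure. -->
<phoneticize-field>
   <unphoneticized-source-field-name>Person.MaidenName
   </unphoneticized-source-field-name>
   <phoneticized-target-field-name>Person.MaidenName_Phon
   </phoneticized-target-field-name>
   <encoding-type>NYSIIS</encoding-type>
</phoneticize-field>
```

If the maiden name is also normalized, the source field would more likely be the normalized field, following the _Std pattern used for the other name fields in the sample.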

Configuring the Match String for Person Data (Repository)

You can include any fields on which you want to match in the match string. The match string is defined by the match-column elements in the MatchingConfig section of the Match Field file. If you specify a match type for a field in the wizard, that field (or any fields parsed from that field) is automatically defined in the match string.

To configure the match string, follow the instructions under Defining the Master Index Match String (Repository) in Configuring Oracle Java CAPS Master Indexes (Repository). For the Oracle Java CAPS Match Engine, each data type has a different match type (specified by the match-type element). The FirstName and LastName match types are specific to person matching. You can specify any of the other match types defined in the match configuration file as well. For more information, see Oracle Java CAPS Match Engine Match and Standardization Types.

A sample match string for person matching is shown below. This sample matches on first and last names, social security number, date of birth, gender, and the street name of the address.

<match-system-object>
   <object-name>Person</object-name>
   <match-columns>
      <match-column>
         <column-name>
            Enterprise.SystemSBR.Person.FirstName_Std
         </column-name>
         <match-type>FirstName</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Person.LastName_Std
         </column-name>
         <match-type>LastName</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Person.SSN
         </column-name>
         <match-type>SSN</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Person.DOB
         </column-name>
         <match-type>DateDays</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Person.Gender
         </column-name>
         <match-type>Char</match-type>
      </match-column>
      <match-column>
         <column-name>Enterprise.SystemSBR.Person.Address.StreetName
         </column-name>
         <match-type>StreetName</match-type>
      </match-column>
   </match-columns>
</match-system-object>
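
To match on a further field, you would add another match-column element inside match-columns. The following sketch assumes a hypothetical MaritalStatus field that holds a single-character code. It reuses the Char match type that the sample applies to Gender; the field name is not part of the sample and is shown only to illustrate the pattern.

```xml
<!-- Hypothetical entry; Person.MaritalStatus is an assumed field name.
     Char is the match type the sample above uses for Gender. -->
<match-column>
   <column-name>Enterprise.SystemSBR.Person.MaritalStatus
   </column-name>
   <match-type>Char</match-type>
</match-column>
```

Whichever match type you choose must be defined in the match configuration file, as described in Oracle Java CAPS Match Engine Match and Standardization Types.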