
Understanding the Oracle Java CAPS Match Engine


Oracle Java CAPS Match Engine Standardization Configuration

The standardization configuration files define additional logic used by the Oracle Java CAPS Match Engine to standardize specific data types. This logic helps define how fields in incoming records are parsed, standardized, and classified for processing. Standardization files include data patterns files, category files, clues files, key type tables, constants files, and reference files.

The standardization configuration files are stored in the master index project and appear as nodes under the Standardization Engine node of the project. Several standardization files are common to all implementations of the Oracle Java CAPS Match Engine, but each national domain also uses its own set of unique files. The common files are listed directly under the Standardization Engine node of the master index project; the files unique to each national domain are listed in individual sub-folders under the Standardization Engine node.

The standardization configuration files for the Oracle Java CAPS Match Engine must follow certain rules for formatting and interdependencies. The following topics provide an overview of the types of configuration files provided for standardization.

Oracle Java CAPS Match Engine Standardization File Types

Several different types of configuration files are included with the Oracle Java CAPS Match Engine, each providing specific information to help the engine standardize and match data according to requirements. Several of these files are common to all supported nationalities, but a small subset is specific to each.

Oracle Java CAPS Match Engine Internationalization

By default, the Oracle Java CAPS Match Engine supports addresses and names originating from Australia, France, Great Britain, and the United States. Each national domain uses a set of common standardization files and a smaller set of unique, domain-specific files to account for international differences in address formats, names, and so on. You can process your data using the standardization files for a single domain, or you can use multiple domains, depending on how the Match Field file is configured.
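
The split between common and domain-specific files can be illustrated with a short sketch. The following Java code is not part of the Oracle Java CAPS Match Engine API; it is a hypothetical helper that assumes domain-specific files follow a name pattern such as personFirstName*.dat, where the asterisk stands for a domain suffix. The suffixes AU, FR, UK, and US used here are assumptions made for the example, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not an engine class. */
public class StandardizationFileSketch {

    /** Hypothetical suffixes for the four default national domains. */
    public enum Domain { AU, FR, UK, US }

    /** Expands a pattern such as "personFirstName*.dat" for one domain. */
    public static String domainFile(String pattern, Domain domain) {
        return pattern.replace("*", domain.name());
    }

    /**
     * Builds the full list of files for a domain: the common files first,
     * followed by the domain-specific files expanded from their patterns.
     */
    public static List<String> filesFor(Domain domain,
                                        List<String> commonFiles,
                                        List<String> domainPatterns) {
        List<String> files = new ArrayList<>(commonFiles);
        for (String pattern : domainPatterns) {
            files.add(domainFile(pattern, domain));
        }
        return files;
    }

    public static void main(String[] args) {
        // Common files are shared by every domain and carry no suffix.
        List<String> common = List.of("personNamePatt.dat", "personRemoveSpecChars.dat");
        // Domain-specific files are given as patterns with a "*" placeholder.
        List<String> patterns = List.of("personFirstName*.dat", "personConstants*.cfg");

        for (Domain domain : Domain.values()) {
            System.out.println(domain + ": " + filesFor(domain, common, patterns));
        }
    }
}
```

Running the sketch prints, for each assumed domain, the shared files followed by that domain's own files, which mirrors how the common files sit directly under the Standardization Engine node while the domain-specific files sit in per-domain sub-folders.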