Oracle Java CAPS Master Index Standardization Engine Reference     Java CAPS Documentation

Business Name Standardization and Oracle Java CAPS Master Index

Master index applications rely on the Master Index Standardization Engine to process business data. To ensure that business information is processed correctly, you need to customize the Matching Service for the master index application according to the rules defined for the standardization engine. This includes modifying mefa.xml to define parsing and phonetic encoding of the appropriate fields. You can modify mefa.xml using the Master Index Configuration Editor.

Standardization is defined in the StandardizationConfig section of mefa.xml, which is described in detail in Match Field Configuration in Oracle Java CAPS Master Index Configuration Reference. To configure the required fields for parsing and normalization, modify the standardization structure in mefa.xml. To configure phonetic encoding, modify the phonetic encoding structure.

Generally, the BusinessName data type processes data that requires parsing prior to processing. You should not need to configure fields to normalize for business names. The following topics provide information about the fields used in processing business names and how to configure standardization for a master index application. The information provided in these topics is based on the default configuration.

Business Name Processing Fields

When standardizing free-form business names, not all fields in a record need to be processed by the Master Index Standardization Engine. The standardization engine only needs to process fields that must be parsed, normalized, or phonetically converted. For a master index application, these fields are defined in mefa.xml, and processing logic for each field is defined in the Standardization Engine node configuration files.

Business Name Standardized Fields

The Master Index Standardization Engine expects that business name data will be provided in a free-form text field containing several components that must be parsed. By default, the match engine is configured to parse these components, and to normalize and phonetically encode the business name. You can specify additional fields for phonetic encoding.

If you specify the BusinessName match type for any field in the wizard, a standardization structure for that field is defined in mefa.xml. The fields listed under Business Name Object Structure are automatically defined as the target fields. If you do not specify business name fields for matching in the wizard but want to standardize the fields, you can create a standardization structure in mefa.xml.

Business Name Object Structure

For the default configuration of the BusinessName data type, the name field specified for standardization is parsed into several additional fields, one of which is also normalized. If you specify the BusinessName match type in the wizard, the following fields are automatically added to the object structure and database creation script.

You can add these fields manually if you do not specify a match type in the wizard.

Configuring a Standardization Structure for Business Names

For free-form business name fields, the source fields you define for parsing should include the standardization components that are predefined for parsing and normalization. For example, fields containing business information can include any of the field components listed in Business Name Standardization Components. The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules in Oracle Java CAPS Master Index Configuration Guide to define fields for standardization. For the standardization-type element, enter BusinessName. For a list of field IDs to use in the standardized-object-field-id element, see Business Name Standardization Components.


Note - In the default configuration, the rules defined for the business name data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.


A sample standardization structure for business names is shown below. This structure parses a business name field into these standard business name fields: name, organization type, association type, sector, industry, and URL. No domain selector is specified. Normally the domain would then default to the United States, but because business names are not variant dependent, the domain is irrelevant here.

<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Company.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>OrgTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_OrgType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>AssocTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_AssocType
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustrySectorList
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Sector
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>IndustryTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_Industry
            </standardized-target-field-name>
         </target-mapping>
         <target-mapping>
            <standardized-object-field-id>Url
            </standardized-object-field-id>
            <standardized-target-field-name>Company.Name_URL
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
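
Because the target fields can include any of the parsed fields, a structure does not have to map every component. The fragment below is a minimal sketch rather than a default configuration. It assumes a hypothetical Vendor.Name source field and maps only the primary name and the industry. The Vendor.Name_Name and Vendor.Name_Industry target fields are likewise hypothetical and would need to exist in the object structure of your application.

```xml
<free-form-texts-to-standardize>
   <group standardization-type="BusinessName">
      <!-- Hypothetical source field; replace with a free-form business name field from your object structure -->
      <unstandardized-source-fields>
         <unstandardized-source-field-name>Vendor.Name
         </unstandardized-source-field-name>
      </unstandardized-source-fields>
      <standardization-targets>
         <!-- Parsed business name -->
         <target-mapping>
            <standardized-object-field-id>PrimaryName
            </standardized-object-field-id>
            <standardized-target-field-name>Vendor.Name_Name
            </standardized-target-field-name>
         </target-mapping>
         <!-- Parsed industry keyword -->
         <target-mapping>
            <standardized-object-field-id>IndustryTypeKeyword
            </standardized-object-field-id>
            <standardized-target-field-name>Vendor.Name_Industry
            </standardized-target-field-name>
         </target-mapping>
      </standardization-targets>
   </group>
</free-form-texts-to-standardize>
```

Components that are not mapped to a target field, such as the organization type or URL, are not stored in the record under this sketch.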

Configuring Phonetic Encoding for Business Names

When you match or standardize on business name fields, the business name field should be specified for phonetic conversion (by default, the wizard defines this for you). Follow the instructions under Defining Phonetic Encoding for the Master Index in Oracle Java CAPS Master Index Configuration Guide to define fields for phonetic encoding.

A sample of the phoneticize-fields element is shown below. This sample converts only the business name. You can define additional fields for phonetic encoding.

<phoneticize-fields>
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
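
To illustrate defining an additional field for phonetic encoding, the fragment below adds a second phoneticize-field entry alongside the business name. This is a sketch under assumptions: it reuses the Company.Name_Industry target field from the sample standardization structure as the source, and Company.Name_IndustryPhon is a hypothetical target field that would have to be added to the object structure before it could be used. The NYSIIS encoding type is carried over from the sample above.

```xml
<phoneticize-fields>
   <!-- Business name, as in the default sample -->
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Name
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_NamePhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
   <!-- Additional field; Company.Name_IndustryPhon is a hypothetical target -->
   <phoneticize-field>
      <unphoneticized-source-field-name>Company.Name_Industry
      </unphoneticized-source-field-name>
      <phoneticized-target-field-name>Company.Name_IndustryPhon
      </phoneticized-target-field-name>
      <encoding-type>NYSIIS</encoding-type>
   </phoneticize-field>
</phoneticize-fields>
```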