Oracle Java CAPS Master Index Standardization Engine Reference     Java CAPS Documentation

Creating Custom FSM-Based Data Types

You can define new data types and their corresponding variants using the flexible FSM framework of the standardization engine. Data types are easily incorporated into a master index project and can be made globally available to all projects. Perform the following steps to define a custom data type for the standardization engine.

Creating the Working Directory

The working directory for custom data types requires a specific structure. At a minimum, the working directory will look similar to the following:

/WorkingDir
   serviceType.xml
   /lib
   /instance
      /Generic
         serviceInstance.xml
         /resource
            standardizer.xml

If the data type has several variants, the directory structure does not include the Generic folder. Instead, it contains one folder for each variant, named after that variant. Each variant folder must have the same structure as the Generic folder shown above. The resource directory might also contain several normalization and lexicon files.

To Create the Working Directory

  1. Create a working directory and add a lib and an instance directory at the top level.
  2. Copy the files standardizer-api.jar and standardizer-impl.jar from /NetBeans_Home/soa2/modules/ext/mdm/standardizer/lib to the lib directory.
  3. Do one of the following:
    • If the data type has only one variant, create the following directory structure in the instance directory:

      /Generic/resource/

    • If the data type has several variants, create the following directory structure in the instance directory for each variant:

      /VariantName/resource/

  4. Continue to Defining the Service Type.
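
The directory layout from the steps above can also be created with a short script. The following is a minimal sketch, not part of the product: the function name create_working_dir and its parameters are hypothetical, and the JAR source directory must be supplied by you because it depends on your NetBeans installation.

```python
import shutil
from pathlib import Path

# The two JAR files named in step 2 of the procedure.
STANDARDIZER_JARS = ("standardizer-api.jar", "standardizer-impl.jar")


def create_working_dir(root, variants=("Generic",), jar_source=None):
    """Create lib/ and instance/<Variant>/resource/ under root.

    variants   -- one folder per variant; use ("Generic",) for a single variant
    jar_source -- optional directory holding the standardizer JARs (assumption:
                  you pass the lib directory of your own installation)
    """
    root = Path(root)
    lib_dir = root / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)

    for variant in variants:
        (root / "instance" / variant / "resource").mkdir(parents=True, exist_ok=True)

    # Copy the JARs only when a source directory was given.
    if jar_source is not None:
        for jar in STANDARDIZER_JARS:
            shutil.copy2(Path(jar_source) / jar, lib_dir / jar)

    return root
```

For a data type with a single variant, create_working_dir("WorkingDir") produces the Generic layout. Passing a tuple of variant names produces one folder per variant instead.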

Defining the Service Type

The serviceType.xml file defines information about the data type, and is a required file for each data type.

To Define the Service Type

  1. Create a file named serviceType.xml in your working directory.

    Tip - You can copy the service type file from an existing data type and modify it for your use.


  2. Enter text similar to the following, where description is the name of the data type and the value elements list the tokens, or standardization components, of the data type.
    <serviceType configurationResource="standardizer.xml">
      <description>My Data Type Standardization</description>
      <parameter name="fields">
        <list>
          <value>Data Field1</value>
          <value>Data Field2</value>
          ...
        </list>
      </parameter>
    </serviceType>

    Note - For more information about the elements in this file, see Service Type Definition File.


  3. Save and close the file.
  4. Continue to Defining the Variants.
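
If you prefer to generate the file rather than type it, the sample above can be produced with the Python standard library. This is a sketch under the assumption that the file needs only the elements shown in the sample. The function names are hypothetical, and the field names are the placeholders from the sample, not real component names.

```python
import xml.etree.ElementTree as ET


def build_service_type(description, fields, config_resource="standardizer.xml"):
    """Build a serviceType element tree matching the sample structure."""
    root = ET.Element("serviceType", {"configurationResource": config_resource})
    ET.SubElement(root, "description").text = description

    # The sample uses a single parameter named "fields" holding a list of values.
    parameter = ET.SubElement(root, "parameter", {"name": "fields"})
    value_list = ET.SubElement(parameter, "list")
    for field in fields:
        ET.SubElement(value_list, "value").text = field

    return ET.ElementTree(root)


def write_service_type(path, description, fields):
    """Write serviceType.xml to path, indented when the runtime supports it."""
    tree = build_service_type(description, fields)
    if hasattr(ET, "indent"):  # ET.indent is available in Python 3.9 and later
        ET.indent(tree, space="  ")
    tree.write(path, encoding="utf-8")
```

Calling write_service_type("WorkingDir/serviceType.xml", "My Data Type Standardization", ["Data Field1", "Data Field2"]) writes a file equivalent to the sample, without the elided entries.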

Defining the Variants

For each data type you create, you need to create one or more variants that define the logic for processing a specific type of data.

To Define the Variants

Perform the following steps for each variant that will be used for the data type you are creating.

  1. Define the service instance, as described in Defining the Service Instance.

    Create the serviceInstance.xml file in /WorkingDir/instance/VariantName.

  2. Define the state model and processing logic, as described in Defining the State Model and Processing Rules.

    Create the standardizer.xml file in /WorkingDir/instance/VariantName/resource.

  3. If needed, create normalization and lexicon files, as described in Creating Normalization and Lexicon Files.

    Create the files in /WorkingDir/instance/VariantName/resource.

  4. Continue to Packaging and Importing the Data Type.
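
Before packaging, it can help to confirm that every variant folder contains the two files named in the steps above. The following check is a sketch only: the function name is hypothetical, and it tests for the presence of the files, not for valid content.

```python
from pathlib import Path


def missing_variant_files(working_dir):
    """Return the required files that are absent, as paths relative to working_dir."""
    working_dir = Path(working_dir)
    instance_dir = working_dir / "instance"

    variants = []
    if instance_dir.is_dir():
        variants = sorted(p for p in instance_dir.iterdir() if p.is_dir())
    if not variants:
        # At least one variant folder is needed for a data type.
        return ["instance/<VariantName>"]

    missing = []
    for variant in variants:
        required = (
            variant / "serviceInstance.xml",
            variant / "resource" / "standardizer.xml",
        )
        for path in required:
            if not path.is_file():
                missing.append(path.relative_to(working_dir).as_posix())
    return missing
```

An empty list means each variant has both a serviceInstance.xml file and a standardizer.xml file. Normalization and lexicon files are optional, so the check ignores them.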

Packaging and Importing the Data Type

Once you have created all the files for the data type, you need to package them into a ZIP file to be imported into a master index application.

To Package and Import the Data Type

  1. In the working directory, select the folders and files at the top level and add them to a ZIP file.
  2. Give the ZIP file the same name as the data type.

    The ZIP file structure should look similar to the following:


    Figure 1 Custom Data Type Zip File

    [Image: the ZIP file package for a custom data type]
  3. Import the file into a master index application as described in Importing Standardization Data Types and Variants in Oracle Java CAPS Master Index Configuration Guide.
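
Steps 1 and 2 can be scripted. The sketch below uses the Python zipfile module and stores every entry relative to the working directory, so that serviceType.xml, lib, and instance sit at the top level of the archive. The function name and the data type name MyDataType are hypothetical. Importing the archive still follows step 3.

```python
import zipfile
from pathlib import Path


def package_data_type(working_dir, data_type_name, output_dir=None):
    """Zip the contents of working_dir into <data_type_name>.zip.

    The archive is written next to the working directory by default, so it
    never ends up inside the tree that is being zipped.
    """
    working_dir = Path(working_dir)
    output_dir = Path(output_dir) if output_dir else working_dir.parent
    zip_path = output_dir / f"{data_type_name}.zip"

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(working_dir.rglob("*")):
            if path.is_file():
                # Entry names are relative to working_dir, using forward slashes.
                archive.write(path, path.relative_to(working_dir).as_posix())

    return zip_path
```

For example, package_data_type("WorkingDir", "MyDataType") writes MyDataType.zip beside the working directory. Only files are added, so an empty folder does not appear in the archive.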

Service Type Definition File

Each data type is configured by a service type definition file, serviceType.xml. Service type files define the fields to be standardized for a data type. The following table lists and describes the elements in the service type file.

Element        Attribute              Description
serviceType                           A description and any parameters for the data type.
               configurationResource  The name of the standardization process file that defines the states and processing for the data type.
description                           A brief description of the data type, such as “Address Standardization”.
parameter                             A parameter for the configuration resource. By default, the name of the parameter is “fields”, and it is populated with a list of standardized field component names.
               name                   The name of the parameter.
value                                 One or more values for the parameter.
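
To see how these elements fit together, the following sketch reads a service type definition with the Python standard library and returns its contents as a dictionary. It is an illustration only: the function name is hypothetical, and the sample text repeats the placeholder values used earlier in this section.

```python
import xml.etree.ElementTree as ET

# Placeholder content modeled on the sample serviceType.xml file.
SAMPLE_SERVICE_TYPE = """\
<serviceType configurationResource="standardizer.xml">
  <description>My Data Type Standardization</description>
  <parameter name="fields">
    <list>
      <value>Data Field1</value>
      <value>Data Field2</value>
    </list>
  </parameter>
</serviceType>
"""


def read_service_type(xml_text):
    """Extract the attribute, description, and parameter values from the XML."""
    root = ET.fromstring(xml_text)
    parameters = {}
    for parameter in root.findall("parameter"):
        # Collect every value element nested under this parameter.
        parameters[parameter.get("name")] = [v.text for v in parameter.iter("value")]
    return {
        "configurationResource": root.get("configurationResource"),
        "description": root.findtext("description"),
        "parameters": parameters,
    }
```

Calling read_service_type(SAMPLE_SERVICE_TYPE) returns the process file name, the description, and the list of field names stored under the "fields" parameter.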