Oracle Java CAPS Master Index Standardization Engine Reference
Custom FSM-Based Data Types and Variants
About Custom FSM-Based Data Types and Variants
About the Standardization Packages
Creating Custom FSM-Based Data Types
Creating the Working Directory
To Create the Working Directory
Packaging and Importing the Data Type
Creating Custom FSM-Based Variants
Creating the Working Directory
To Create the Working Directory
To Define the Service Instance
Defining the State Model and Processing Rules
To Define the State Model and Processing Rules
Creating Normalization and Lexicon Files
To Create Normalization and Lexicon Files
Packaging and Importing the Variant
You can define new data types and their corresponding variants using the flexible FSM framework of the standardization engine. Data types are easily incorporated into a master index project and can be made globally available to all projects. Perform the following steps to define a custom data type for the standardization engine.
The working directory for custom data types requires a specific structure. At a minimum, the working directory will look similar to the following:
/WorkingDir
    serviceType.xml
    /lib
    /instance
        /Generic
            serviceInstance.xml
            /resource
                standardizer.xml
If the data type has several variants, the directory structure does not include the Generic folder. Instead, it contains one folder per variant, each named after its variant. Each variant folder must have the same structure as the Generic folder shown above. The resource directory might also contain several normalization and lexicon files.
/Generic/resource/
/VariantName/resource/
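The directory layout described above can be scaffolded with a short script. The following is a minimal sketch, not part of the product: the function name `create_working_dir` and the variant names `US` and `UK` are hypothetical, and the files are created empty so that their content can be filled in afterwards.

```python
import tempfile
from pathlib import Path


def create_working_dir(root, variants=("Generic",)):
    """Create the minimal working directory skeleton for a custom data type.

    All files are created empty; their XML content must still be written.
    """
    root = Path(root)
    (root / "lib").mkdir(parents=True, exist_ok=True)
    (root / "serviceType.xml").touch()
    for variant in variants:
        # One folder per variant, each with its own resource subfolder.
        resource = root / "instance" / variant / "resource"
        resource.mkdir(parents=True, exist_ok=True)
        (root / "instance" / variant / "serviceInstance.xml").touch()
        (resource / "standardizer.xml").touch()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # "US" and "UK" are placeholder variant names for illustration only.
        root = create_working_dir(Path(tmp) / "WorkingDir", ["US", "UK"])
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root).as_posix())
```

Calling the function without a variant list produces the single Generic folder shown in the listing above.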
The serviceType.xml file defines information about the data type, and is a required file for each data type.
Tip - You can copy the service type file from an existing data type and modify it for your use.
<serviceType configurationResource="standardizer.xml">
    <description>My Data Type Standardization</description>
    <parameter name="fields">
        <list>
            <value>Data Field1</value>
            <value>Data Field2</value>
            ...
        </list>
    </parameter>
</serviceType>
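As a quick sanity check before packaging, a service type file can be read back to confirm that it is well-formed and lists the expected fields. The sketch below uses only the Python standard library and assumes the element layout shown in the sample above; the function name `read_service_type` is hypothetical, and the elided entries from the sample are left out so that the text parses.

```python
import xml.etree.ElementTree as ET

# Sample content taken from the listing above, without the elided entries.
SAMPLE = """<serviceType configurationResource="standardizer.xml">
  <description>My Data Type Standardization</description>
  <parameter name="fields">
    <list>
      <value>Data Field1</value>
      <value>Data Field2</value>
    </list>
  </parameter>
</serviceType>"""


def read_service_type(xml_text):
    """Return the configuration resource, description, and field names."""
    root = ET.fromstring(xml_text)
    values = root.findall("./parameter[@name='fields']/list/value")
    return {
        "configurationResource": root.get("configurationResource"),
        "description": (root.findtext("description") or "").strip(),
        "fields": [v.text.strip() for v in values if v.text],
    }


if __name__ == "__main__":
    info = read_service_type(SAMPLE)
    print(info["configurationResource"])
    print(info["description"])
    for field in info["fields"]:
        print(field)
```

A malformed file raises `xml.etree.ElementTree.ParseError`, which surfaces syntax mistakes before the package is imported.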
For each data type you create, you need to create one or more variants that define the logic for processing a specific type of data.
Perform the following steps for each variant that will be used for the data type you are creating.
Create the serviceInstance.xml file in /WorkingDir/instance/VariantName.
Create the standardizer.xml file in /WorkingDir/instance/VariantName/resource.
Create the normalization and lexicon files in /WorkingDir/instance/VariantName/resource.
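Once the variant files are in place, a small check can confirm that every variant folder contains the files named in the steps above. This is a sketch under the assumption that the working directory follows the layout described earlier; `find_missing_files` is a hypothetical helper, and it checks only for the presence of files, not their content.

```python
import tempfile
from pathlib import Path


def find_missing_files(working_dir):
    """Return relative paths of required files missing from the working directory."""
    root = Path(working_dir)
    missing = []
    if not (root / "serviceType.xml").is_file():
        missing.append("serviceType.xml")
    instance = root / "instance"
    variants = sorted(p for p in instance.iterdir() if p.is_dir()) if instance.is_dir() else []
    if not variants:
        # At least one variant folder (for example Generic) is expected.
        missing.append("instance/<variant>")
    for variant in variants:
        for rel in ("serviceInstance.xml", "resource/standardizer.xml"):
            if not (variant / rel).is_file():
                missing.append(f"instance/{variant.name}/{rel}")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "WorkingDir"
        (root / "instance" / "Generic" / "resource").mkdir(parents=True)
        (root / "serviceType.xml").touch()
        for item in find_missing_files(root):
            print("missing:", item)
```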
Once you have created all the files for the data type, you need to package them into a ZIP file to be imported into a master index application.
The ZIP file structure should look similar to the following:
Figure 1 Custom Data Type Zip File
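The packaging step can also be scripted. The sketch below assumes that the ZIP file mirrors the working directory, with entries stored relative to the working directory root; that layout is an assumption and should be checked against the figure above. The function name `package_data_type` and the archive name `MyDataType.zip` are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def package_data_type(working_dir, zip_path):
    """Zip every file under working_dir, using paths relative to working_dir.

    Empty directories (for example an empty lib folder) are not written,
    because only files are added to the archive.
    """
    root = Path(working_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip the archive itself in case it is created inside working_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, path.relative_to(root).as_posix())
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "WorkingDir"
        (root / "instance" / "Generic" / "resource").mkdir(parents=True)
        (root / "serviceType.xml").write_text("<serviceType/>")
        out = package_data_type(root, Path(tmp) / "MyDataType.zip")
        with zipfile.ZipFile(out) as archive:
            for name in archive.namelist():
                print(name)
```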
Each data type is configured by a service type definition file, serviceType.xml. Service type files define the fields to be standardized for a data type. The following table lists and describes the elements in the service type file.