Oracle Java CAPS Master Index Standardization Engine Reference
Master index applications rely on the Master Index Standardization Engine to process address data. To ensure correct processing of address information, you need to customize the Matching Service for the master index application according to the rules defined for the standardization engine. This includes modifying mefa.xml to define parsing and phonetic encoding of the appropriate fields. You can use the Master Index Configuration Editor to modify mefa.xml.
Standardization is defined in the StandardizationConfig section of mefa.xml, which is described in detail in Match Field Configuration in Oracle Java CAPS Master Index Configuration Reference. To configure the required fields for parsing and normalization, modify the standardization structure in mefa.xml. To configure phonetic encoding, modify the phonetic encoding structure. You can perform all of these tasks using the Master Index Configuration Editor.
Generally, the address data type processes data that requires parsing prior to processing. You should not need to configure fields to normalize for addresses. The following topics provide information about the fields used in processing address data and how to configure address data standardization for a master index application. The information provided in these topics is based on the default configuration.
When standardizing address data, not all fields in a record need to be processed by the Master Index Standardization Engine. The standardization engine only needs to process address fields that must be parsed, normalized, or phonetically converted. For a master index application, these fields are defined in mefa.xml and processing logic for each field is defined in the Standardization Engine node configuration files.
The Master Index Standardization Engine expects that street address data will be provided in a free-form text field containing several components that must be parsed. By default, the standardization engine is configured to parse these components and to normalize and phonetically encode the street name. You can specify additional fields for phonetic encoding.
If you specify the Address match type for any field in the wizard, a standardization structure for that field is defined in mefa.xml. The fields listed under Address Object Structure are automatically defined as the target fields. Each of these fields has several entries in the standardization structure. This is because different parsed components can be stored in the same field. For example, the house number, post office box number, and rural route identifier are all stored in the house number field. If you do not specify address fields for matching in the wizard but want to standardize the fields, you can create a standardization structure in mefa.xml using the Master Index Configuration Editor.
The address fields specified for standardization are parsed into several additional fields. If you specify the Address match type in the wizard, the following fields are automatically added to the object structure and database creation script.
field_name_HouseNo
field_name_StName
field_name_StDir
field_name_StType
field_name_StPhon
where field_name is the name of the field for which you specified address matching. For example, if you specify the Address match type for the AddressLine1 field, the following fields are automatically added to the structure: AddressLine1_HouseNo, AddressLine1_StName, AddressLine1_StDir, AddressLine1_StType, and AddressLine1_StPhon.
You can add these fields manually if you do not specify a match type in the wizard.
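As a rough sketch of how manually added fields might be wired into mefa.xml, the fragment below maps parsed components of an AddressLine1 source field to the generated target fields. It reuses the element names, field IDs, and domain selector from the default address structure. The Person.Address[*] object path is an assumption made for illustration, so substitute the path used in your own object structure. The AddressLine1_StPhon field is not listed because, in the default configuration, phonetic fields are populated by the phonetic encoding structure rather than by a standardization target.

```xml
<!-- Illustrative sketch only: the Person.Address[*] path is an assumption -->
<free-form-texts-to-standardize>
  <group standardization-type="ADDRESS"
         domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
    <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].AddressLine1</unstandardized-source-field-name>
    </unstandardized-source-fields>
    <standardization-targets>
      <!-- House number component goes to the generated _HouseNo field -->
      <target-mapping>
        <standardized-object-field-id>HouseNumber</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].AddressLine1_HouseNo</standardized-target-field-name>
      </target-mapping>
      <!-- Street name used for matching goes to the generated _StName field -->
      <target-mapping>
        <standardized-object-field-id>MatchStreetName</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].AddressLine1_StName</standardized-target-field-name>
      </target-mapping>
      <!-- Directional prefix goes to the generated _StDir field -->
      <target-mapping>
        <standardized-object-field-id>PropDesPrefDirection</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].AddressLine1_StDir</standardized-target-field-name>
      </target-mapping>
      <!-- Street type suffix goes to the generated _StType field -->
      <target-mapping>
        <standardized-object-field-id>StreetNameSufType</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].AddressLine1_StType</standardized-target-field-name>
      </target-mapping>
    </standardization-targets>
  </group>
</free-form-texts-to-standardize>
```

Only one field ID per target field is shown here. A complete structure would typically add further mappings (for example, BoxIdentif or RuralRouteIdentif to the house number field) so that every parsed component has a destination.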
For free-form address fields, the source fields you define for parsing should include the standardization components that are predefined for parsing and normalization. For example, fields containing address information can include any of the field components listed in Address Data Standardization Components. The target fields can include any of these parsed fields. Follow the instructions under Defining Master Index Standardization Rules in Oracle Java CAPS Master Index Configuration Guide to define fields for standardization. For the standardization-type element, enter Address. For a list of field IDs to use in the standardized-object-field-id element, see Address Data Standardization Components.
Note - In the default configuration, the rules defined for the address data type assume that all input fields must be parsed as well as normalized. Thus, there is no need to configure fields only for normalization.
A sample standardization structure for address data is shown below. This structure parses the first two lines of street address into the standard street address fields. Only the United States variant is defined in this structure.
<free-form-texts-to-standardize>
  <group standardization-type="ADDRESS" domain-selector="com.sun.mdm.index.matching.impl.SingleDomainSelectorUS">
    <unstandardized-source-fields>
      <unstandardized-source-field-name>Person.Address[*].Address1</unstandardized-source-field-name>
      <unstandardized-source-field-name>Person.Address[*].Address2</unstandardized-source-field-name>
    </unstandardized-source-fields>
    <standardization-targets>
      <target-mapping>
        <standardized-object-field-id>HouseNumber</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>RuralRouteIdentif</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>BoxIdentif</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].HouseNumber</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>MatchStreetName</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>RuralRouteDescript</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>BoxDescript</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetName</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>PropDesPrefDirection</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetDir</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>PropDesSufDirection</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetDir</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>StreetNameSufType</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetType</standardized-target-field-name>
      </target-mapping>
      <target-mapping>
        <standardized-object-field-id>StreetNamePrefType</standardized-object-field-id>
        <standardized-target-field-name>Person.Address[*].StreetType</standardized-target-field-name>
      </target-mapping>
    </standardization-targets>
  </group>
</free-form-texts-to-standardize>
When you match or standardize on street address fields, the street name should be specified for phonetic conversion (this is done by default in a master index application). Follow the instructions under Defining Phonetic Encoding for the Master Index in Oracle Java CAPS Master Index Configuration Guide to define fields for phonetic encoding.
A sample of the phoneticize-fields element is shown below. This sample only converts the address street name. You can define additional fields for phonetic encoding.
<phoneticize-fields>
  <phoneticize-field>
    <unphoneticized-source-field-name>Person.Address[*].StreetName</unphoneticized-source-field-name>
    <phoneticized-target-field-name>Person.Address[*].StreetName_Phon</phoneticized-target-field-name>
    <encoding-type>NYSIIS</encoding-type>
  </phoneticize-field>
</phoneticize-fields>
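To illustrate defining an additional field for phonetic encoding, the sketch below adds a second phoneticize-field entry alongside the street name. The City and City_Phon field names are hypothetical and would need to exist in your object structure. The Soundex encoding type is assumed to be one of the encoders defined for your application, so check the encoders available in your configuration before using it.

```xml
<!-- Illustrative sketch only: City and City_Phon are hypothetical fields -->
<phoneticize-fields>
  <!-- Default entry: street name encoded with NYSIIS -->
  <phoneticize-field>
    <unphoneticized-source-field-name>Person.Address[*].StreetName</unphoneticized-source-field-name>
    <phoneticized-target-field-name>Person.Address[*].StreetName_Phon</phoneticized-target-field-name>
    <encoding-type>NYSIIS</encoding-type>
  </phoneticize-field>
  <!-- Additional entry: city name, assuming a Soundex encoder is configured -->
  <phoneticize-field>
    <unphoneticized-source-field-name>Person.Address[*].City</unphoneticized-source-field-name>
    <phoneticized-target-field-name>Person.Address[*].City_Phon</phoneticized-target-field-name>
    <encoding-type>Soundex</encoding-type>
  </phoneticize-field>
</phoneticize-fields>
```

Each additional entry needs its own target field in the object structure and database to hold the phonetic code.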