Oracle Java CAPS Master Index Standardization Engine Reference
The flexible framework of the Master Index Standardization Engine allows you to define new FSM-based variants on existing FSM-based data types so you can standardize different categories of the same type of data. For example, you might need to standardize names from several different countries. Variants are easily incorporated into a master index project and can be made globally available to all projects. Perform the following steps to create a custom variant.
The working directory for custom variants requires a specific structure. At a minimum, the working directory will look similar to the following:
/WorkingDir
   serviceInstance.xml
   /resource
      standardizer.xml
The resource directory might also contain several normalization and lexicon files.
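If you build variants from a script, a small check can catch a missing file before packaging. The following is a minimal sketch, not part of Java CAPS: the class `VariantWorkingDir` and its methods are hypothetical, and it checks only the two files in the minimal layout above. It ignores optional normalization and lexicon files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Java CAPS. It encodes only the minimal
// layout described above: serviceInstance.xml plus resource/standardizer.xml.
class VariantWorkingDir {

    // Paths (relative to the working directory) that every variant needs.
    static final List<String> REQUIRED =
            List.of("serviceInstance.xml", "resource/standardizer.xml");

    // Returns the required files that are missing; an empty list means the layout is complete.
    static List<String> missingFiles(Path workingDir) {
        List<String> missing = new ArrayList<>();
        for (String rel : REQUIRED) {
            if (!Files.isRegularFile(workingDir.resolve(rel))) {
                missing.add(rel);
            }
        }
        return missing;
    }

    // Creates the minimal skeleton with empty placeholder files (existing files are left alone).
    static void scaffold(Path workingDir) {
        try {
            Files.createDirectories(workingDir.resolve("resource"));
            for (String rel : REQUIRED) {
                Path file = workingDir.resolve(rel);
                if (Files.notExists(file)) {
                    Files.createFile(file);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : "WorkingDir");
        List<String> missing = missingFiles(dir);
        System.out.println(missing.isEmpty() ? "Layout OK" : "Missing: " + missing);
    }
}
```

The placeholder files that `scaffold` creates are empty. You still need to copy or write the real service instance and standardizer content.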
The serviceInstance.xml file for each variant defines the name of the variant, the data type it modifies, and the Java factory class for the variant.
Tip - You can copy a service instance file from an existing variant in the data type to which you will add the new variant, and then modify it for the new variant.
This example defines a new Spanish variant for the PersonName data type.
<serviceInstance type="PersonName">
   <description>Person Name Standardization: Spain</description>
   <parameter name="dataType" value="PersonName" />
   <parameter name="variantType" value="SP" />
   <componentManagerFactory class="com.sun.inti.components.component.BeanComponentManagerFactory">
      <property name="stylesheetURL" value="classpath:/com/sun/mdm/standardizer/impl/standardizer.xsl"/>
      <property name="urlSource">
         <bean class="com.sun.inti.components.url.ResourceURLSource">
            <property name="resourceName" value="standardizer.xml"/>
         </bean>
      </property>
   </componentManagerFactory>
</serviceInstance>
Note - The value you enter for the variantType parameter must match the name you want the variant to display in the Standardization folder of the master index project.
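A typo in this file, such as an unclosed attribute quote, makes it invalid XML. One way to catch such errors and confirm the `dataType` and `variantType` values before packaging is to parse the file with the JDK's standard DOM parser (`javax.xml.parsers`). The class `ServiceInstanceCheck` is a hypothetical name. This is a minimal sketch that reads only the top-level `parameter` elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical check, not a Java CAPS API: parses a service instance file with the
// JDK's DOM parser and returns its top-level <parameter> name/value pairs.
class ServiceInstanceCheck {

    static Map<String, String> parameters(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> params = new LinkedHashMap<>();
            // Only direct children of <serviceInstance>; the nested <property>
            // elements inside <componentManagerFactory> are not inspected.
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node instanceof Element && "parameter".equals(node.getNodeName())) {
                    Element param = (Element) node;
                    params.put(param.getAttribute("name"), param.getAttribute("value"));
                }
            }
            return params;
        } catch (Exception e) {
            // Malformed XML (for example, an unclosed attribute quote) ends up here.
            throw new IllegalArgumentException("Cannot parse service instance: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = Files.readString(Path.of(args.length > 0 ? args[0] : "serviceInstance.xml"));
        Map<String, String> params = parameters(xml);
        // variantType should match the name you expect in the Standardization folder.
        System.out.println("dataType=" + params.get("dataType")
                + ", variantType=" + params.get("variantType"));
    }
}
```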
The state model defines how the data is read, tokenized, parsed, and modified during standardization. Both the state model and the processing rules are defined in the standardizer.xml file.
Before you begin this step, determine the different forms in which the data to be standardized can be presented and how it should be standardized for each form. For example, name data might be in the form “First Name, Last Name, Middle Initial” or in the form “First Name, Middle Name, Last Name”. You need to account for each possibility. Determine each state in the process, and the input and output symbols used by each state. It might be useful to create a finite state machine model, as shown below. The model shows each state, the transitions to and from each state, and the output symbol for each state.
Figure 2 Sample Finite State Machine Model
For more information about the FSM model, see FSM Framework Configuration Overview.
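To make states, input symbols, and output symbols concrete, the following Java sketch hard-codes a machine for the two name forms mentioned above. It is purely illustrative. The state names, the name/initial input symbols, and the output symbols (`FirstName`, `MiddleName`, `LastName`, `MiddleInitial`) are hypothetical. The engine itself takes its state model from standardizer.xml, not from Java code like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny deterministic state machine for the two
// name forms "First Last MiddleInitial" and "First Middle Last". The engine's real
// state model, symbols, and rules come from standardizer.xml, not from code like this.
class NameFormFsm {

    enum State { START, GOT_FIRST, GOT_SECOND, DONE }

    // Input symbol: a single letter, optionally followed by a period, is an initial;
    // any other token is treated as a name.
    static boolean isInitial(String token) {
        return token.matches("[A-Za-z]\\.?");
    }

    // Returns one output symbol per token, or null if the input fits neither form.
    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        State state = State.START;
        for (String token : input.split("[\\s,]+")) {
            if (token.isEmpty()) continue;              // leading separators yield an empty token
            boolean initial = isInitial(token);
            switch (state) {
                case START:
                    if (initial) return null;
                    out.add("FirstName");
                    state = State.GOT_FIRST;
                    break;
                case GOT_FIRST:
                    if (initial) return null;
                    out.add("PENDING");                 // middle or last: decided by the next token
                    state = State.GOT_SECOND;
                    break;
                case GOT_SECOND:
                    // The third input symbol resolves the pending second token.
                    out.set(1, initial ? "LastName" : "MiddleName");
                    out.add(initial ? "MiddleInitial" : "LastName");
                    state = State.DONE;
                    break;
                default:                                // DONE: no further tokens accepted
                    return null;
            }
        }
        if (state == State.GOT_SECOND) {
            out.set(1, "LastName");                     // two tokens: "First Last"
            return out;
        }
        return state == State.DONE ? out : null;
    }

    public static void main(String[] args) {
        System.out.println(classify("John Smith P"));
        System.out.println(classify("John, Paul, Smith"));
    }
}
```

The second token cannot be labeled until the third token is read. When you sketch your own model, look for this kind of ambiguity: it shows where one state needs more than one outgoing transition.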
Tip - You can copy the standardizer.xml file from an existing variant in the data type to which you are adding the custom variant, and then modify it for the new variant.
For information about the state model and the elements that define it, see Standardization State Definitions.
Note - The next several steps use the processing rules described in Standardization Processing Rules Reference. Some of these rules might require that you create normalization and lexicon files.
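Define the input symbols in standardizer.xml. For more information, see Input Symbol Definitions.
Define the output symbols in standardizer.xml. For more information, see Output Symbol Definitions.
Define any data cleansing rules in standardizer.xml. For more information, see Data Cleansing Definitions.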
Lexicon files list the possible values for a field so the standardization engine can quickly and accurately recognize different field components. Normalization files list the nonstandard values that might appear in a field, each paired with its standard version, so the standardization engine can present the data in a common form. Create one file for each lexicon or normalization file that standardizer.xml references.
Sample normalization file:
COR|COURT
CRT|COURT
CR.|COURT
CT|COURT
CT.|COURT
DR|DRIVE
DR.|DRIVE
DRV|DRIVE
...
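Each normalization entry pairs a nonstandard value with its standard form, separated by a vertical bar. The following sketch shows one way to apply such a file and test a normalization file before use. It is a hypothetical helper, not the engine's own loader. The class name `NormalizationTable` and its case-insensitive matching are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not the engine's own loader. It reads lines of the form
// NONSTANDARD|STANDARD, as in the sample above, and maps tokens to the standard
// form. Case-insensitive matching is an assumption made for this sketch.
class NormalizationTable {

    private final Map<String, String> entries = new HashMap<>();

    NormalizationTable(List<String> lines) {
        for (String line : lines) {
            int bar = line.indexOf('|');
            if (bar < 0) continue;                      // no separator: blank line, "...", etc.
            String from = line.substring(0, bar).trim().toUpperCase(Locale.ROOT);
            String to = line.substring(bar + 1).trim();
            if (!from.isEmpty() && !to.isEmpty()) {
                entries.put(from, to);
            }
        }
    }

    // Returns the standard form, or the token unchanged if there is no entry for it.
    String normalize(String token) {
        return entries.getOrDefault(token.trim().toUpperCase(Locale.ROOT), token);
    }
}
```

To try it against a real file, pass the result of `Files.readAllLines(path)` to the constructor.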
Sample lexicon file:
E
EAST
ET
N
NO
NORTH
NTH
S
SO
SOUTH
...
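A lexicon file lists one recognized value per line, so recognizing a token only requires a set lookup. The following minimal sketch shows that lookup. The class `Lexicon` is hypothetical and not the engine's own loader, and the case-insensitive matching is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the engine's own loader. A lexicon file lists one
// recognized value per line, so a set lookup is enough to recognize a token.
// Case-insensitive matching is an assumption made for this sketch.
class Lexicon {

    private final Set<String> values = new HashSet<>();

    Lexicon(List<String> lines) {
        for (String line : lines) {
            String value = line.trim();
            if (!value.isEmpty()) {                     // ignore blank lines
                values.add(value.toUpperCase(Locale.ROOT));
            }
        }
    }

    // True if the token is one of the listed values.
    boolean contains(String token) {
        return values.contains(token.trim().toUpperCase(Locale.ROOT));
    }
}
```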
After you create all the files for the variant, package them into a ZIP file that you can import into a master index application.
The ZIP file structure should be similar to the following. Note that this variant includes several normalization and lexicon files. Your variant might not contain any.
Figure 3 Custom Variant Zip File
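Any ZIP tool will do. If you prefer to package from code, the JDK's `java.util.zip.ZipOutputStream` is enough. The following minimal sketch assumes that the archive should mirror the working directory layout, with serviceInstance.xml at the root and the resource directory beneath it. Compare the result with Figure 3. The class `VariantZipper` and the default file names in `main` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical packaging helper, not a Java CAPS tool. It assumes the archive
// should mirror the working directory (serviceInstance.xml at the root).
class VariantZipper {

    // Zips every regular file under workingDir into zipFile and returns the entry names.
    // Keep zipFile outside workingDir, or the archive would try to include itself.
    static List<String> zip(Path workingDir, Path zipFile) {
        try (Stream<Path> walk = Files.walk(workingDir);
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            List<String> names = new ArrayList<>();
            for (Path file : files) {
                // Entry names are relative to workingDir and always use '/' separators.
                String name = workingDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos);
                zos.closeEntry();
                names.add(name);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : "WorkingDir");
        Path out = Path.of(args.length > 1 ? args[1] : "variant.zip");
        System.out.println("Packaged: " + zip(dir, out));
    }
}
```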
Data types and their variants are configured by service definition files. Service type files define the fields to be standardized for a data type, and service instance definition files define the variant and the Java factory class for that variant. Both files are in XML format.