You can define new data types and their variants using the finite state machine (FSM) framework of the standardization engine. After you import a data type into a master index project, you can make it available to all projects. Perform the following steps to define a custom data type for the standardization engine.
The working directory for custom data types requires a specific structure. At a minimum, the working directory will look similar to the following:
```
/WorkingDir
    serviceType.xml
    /lib
    /instance
        /Generic
            serviceInstance.xml
            /resource
                standardizer.xml
```
If the data type has several variants, the instance directory does not contain a Generic folder. Instead, it contains one folder for each variant, named after that variant. Each variant folder must have the same structure as the Generic folder shown above. The resource directory might also contain several normalization and lexicon files.
Create a working directory and add a lib and an instance directory at the top level.
Copy the files standardizer-api.jar and standardizer-impl.jar from /NetBeans_Home/soa2/modules/ext/mdm/standardizer/lib to the lib directory.
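Do one of the following:
If the data type uses a single, generic variant, create a Generic directory in the instance directory, and create a resource directory inside Generic.
If the data type has several variants, create one directory for each variant in the instance directory, named after the variant, and create a resource directory inside each variant directory.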
Continue to Defining the Service Type.
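The setup steps above can also be scripted. The following Python sketch creates the working directory, copies the two standardizer jars into lib, and adds one folder with a resource subdirectory for each variant, defaulting to a single Generic folder. The function name create_working_dir and its parameters are illustrative and not part of the standardization engine; netbeans_home stands in for /NetBeans_Home, and the sketch assumes the jars exist at the documented location.

```python
import shutil
from pathlib import Path

# Jar files to copy into the lib directory (names from the procedure above).
JARS = ("standardizer-api.jar", "standardizer-impl.jar")


def create_working_dir(root, netbeans_home, variants=None):
    """Create a data type working directory and copy the standardizer jars.

    Hypothetical helper. netbeans_home stands in for /NetBeans_Home. If
    variants is empty or None, a single Generic folder is created.
    """
    root = Path(root)
    lib = root / "lib"
    instance = root / "instance"
    lib.mkdir(parents=True, exist_ok=True)
    instance.mkdir(exist_ok=True)

    # Documented jar location: <NetBeans home>/soa2/modules/ext/mdm/standardizer/lib
    source = Path(netbeans_home, "soa2", "modules", "ext", "mdm", "standardizer", "lib")
    for jar in JARS:
        shutil.copy2(source / jar, lib / jar)

    # Each variant folder gets a resource subdirectory for standardizer.xml.
    for name in variants or ["Generic"]:
        (instance / name / "resource").mkdir(parents=True, exist_ok=True)
    return root
```

For example, create_working_dir("MyDataType", "/opt/netbeans", ["US", "UK"]) would create instance/US/resource and instance/UK/resource; both paths in that call are placeholders.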
The serviceType.xml file describes the data type and is required for every data type.
Create a file named serviceType.xml in your working directory.
You can copy the service type file from an existing data type and modify it for your use.
Enter text similar to the following, where the description element contains the name of the data type and the value elements list the tokens, or standardization components, of the data type.
```xml
<serviceType configurationResource="standardizer.xml">
    <description>My Data Type Standardization</description>
    <parameter name="fields">
        <list>
            <value>Data Field1</value>
            <value>Data Field2</value>
            ...
        </list>
    </parameter>
</serviceType>
```
For more information about the elements in this file, see Service Type Definition File.
Save and close the file.
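Instead of typing the file by hand, you can generate it. The following Python sketch uses the standard library's xml.etree.ElementTree to build a serviceType.xml with the same elements as the example above. The function names build_service_type and write_service_type are hypothetical, and omitting an XML declaration simply mirrors the example; compare the output with the service type file of an existing data type before relying on it. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_service_type(description, fields, resource="standardizer.xml"):
    """Return serviceType.xml text with one <value> per field name.

    Hypothetical helper; uses the "fields" parameter name from the example.
    """
    root = ET.Element("serviceType", configurationResource=resource)
    ET.SubElement(root, "description").text = description
    parameter = ET.SubElement(root, "parameter", name="fields")
    values = ET.SubElement(parameter, "list")
    for field in fields:
        ET.SubElement(values, "value").text = field
    ET.indent(root)  # pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


def write_service_type(path, description, fields):
    """Write the generated document to path (for example, WorkingDir/serviceType.xml)."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(build_service_type(description, fields) + "\n")
```

Calling build_service_type("My Data Type Standardization", ["Data Field1", "Data Field2"]) produces the example document above, minus the "..." placeholder.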
Continue to Defining the Variants.
For each data type you create, you need to create one or more variants that define the logic for processing a specific type of data.
Perform the following steps for each variant that will be used for the data type you are creating.
Define the service instance, as described in Defining the Service Instance.
Create the serviceInstance.xml file in /WorkingDir/instance/VariantName.
Define the state model and processing logic, as described in Defining the State Model and Processing Rules.
Create the standardizer.xml file in /WorkingDir/instance/VariantName/resource.
If needed, create normalization and lexicon files, as described in Creating Normalization and Lexicon Files.
Create the files in /WorkingDir/instance/VariantName/resource.
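Before packaging, a quick check can confirm that every variant folder holds the two required files named in the steps above. This Python sketch, using the hypothetical function name missing_variant_files, inspects each folder under instance for serviceInstance.xml and resource/standardizer.xml. It does not validate file contents or the optional normalization and lexicon files.

```python
from pathlib import Path

# Files every variant folder must contain, relative to the variant folder.
REQUIRED = ("serviceInstance.xml", "resource/standardizer.xml")


def missing_variant_files(working_dir):
    """Return the required variant files that are absent (hypothetical helper).

    Checks every folder under instance, whether Generic or a named variant.
    """
    instance = Path(working_dir) / "instance"
    if not instance.is_dir():
        return ["instance directory is missing"]
    variants = sorted(p for p in instance.iterdir() if p.is_dir())
    if not variants:
        return ["instance contains no variant folders"]
    return [
        f"instance/{variant.name}/{required}"
        for variant in variants
        for required in REQUIRED
        if not (variant / required).is_file()
    ]
```

An empty list means each variant folder has both files; for example, print(missing_variant_files("/path/to/WorkingDir")) with a placeholder path.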
Continue to Packaging and Importing the Data Type.
Once you have created all the files for the data type, you need to package them into a ZIP file to be imported into a master index application.
In the working directory, select the folders and files at the top level and add them to a ZIP file.
Give the ZIP file the same name as the data type.
The ZIP file should contain the top-level contents of the working directory (serviceType.xml and the lib and instance directories) at its root, without an enclosing working directory folder.
Import the file into a master index application as described in Importing Standardization Data Types and Variants in Configuring Sun Master Indexes.
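The packaging step can also be scripted with Python's zipfile module. In this sketch, package_data_type is an illustrative name. Entry names are stored relative to the working directory, so serviceType.xml, lib, and instance sit at the root of the archive; this is assumed to be equivalent to selecting the top-level items and zipping them by hand.

```python
import zipfile
from pathlib import Path


def package_data_type(working_dir, data_type, dest_dir):
    """Zip the contents of working_dir into dest_dir/<data_type>.zip.

    Hypothetical helper. Keep dest_dir outside working_dir so an earlier
    archive is not packaged into a new one. Empty directories are not stored.
    """
    working_dir = Path(working_dir)
    archive = Path(dest_dir) / f"{data_type}.zip"
    # List the files before creating the archive so it cannot include itself.
    files = sorted(p for p in working_dir.rglob("*") if p.is_file())
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to the working directory, with "/" separators.
            zf.write(path, path.relative_to(working_dir).as_posix())
    return archive
```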
Each data type is configured by a service type definition file, serviceType.xml. Service type files define the fields to be standardized for a data type. The following table lists and describes the elements in the service type file.
Element | Attribute | Description |
---|---|---|
serviceType | | A description and any parameters for the data type. |
serviceType | configurationResource | The name of the standardization process file that defines the states and processing for the data type. |
description | | A brief description of the data type, such as “Address Standardization”. |
parameter | | A parameter for the configuration resource. By default, the name of the parameter is “fields”, and it is populated with a list of standardized field component names. |
parameter | name | The name of the parameter. |
value | | One or more values for the parameter. |
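To show how the elements in the table fit together, the following Python sketch reads a service type file and returns its configuration resource, its description, and the values of each parameter. The function name read_service_type and the shape of its return value are illustrative only; this function is not part of the standardization engine.

```python
import xml.etree.ElementTree as ET


def read_service_type(xml_text):
    """Return the configuration resource, description, and parameters of a
    serviceType.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != "serviceType":
        raise ValueError(f"expected <serviceType>, found <{root.tag}>")
    # Map each parameter name (by default "fields") to its list of values.
    parameters = {
        param.get("name"): [(value.text or "").strip() for value in param.iter("value")]
        for param in root.findall("parameter")
    }
    return {
        "configurationResource": root.get("configurationResource"),
        "description": root.findtext("description"),
        "parameters": parameters,
    }
```

ET.fromstring accepts either a string or bytes, so the contents of an existing file can be passed in directly after reading it in binary mode.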