Understanding the Master Index Standardization Engine

Creating Custom FSM-Based Data Types

You can define new data types and their corresponding variants using the flexible FSM framework of the standardization engine. Data types are easily incorporated into a master index project and can be made globally available to all projects. Perform the following steps to define a custom data type for the standardization engine.

Creating the Working Directory

The working directory for custom data types requires a specific structure. At a minimum, the working directory will look similar to the following:


/WorkingDir
   serviceType.xml
   /lib
   /instance
      /Generic
         serviceInstance.xml
         /resource
            standardizer.xml

If the data type has several variants, the directory structure does not include the Generic folder. Instead, it contains one folder for each variant, named after that variant. Each variant folder must have the same structure as the Generic folder shown above. The resource directory might also contain several normalization and lexicon files.

To Create the Working Directory

  1. Create a working directory and add a lib and an instance directory at the top level.

  2. Copy the files standardizer-api.jar and standardizer-impl.jar from /NetBeans_Home/soa2/modules/ext/mdm/standardizer/lib to the lib directory.

  3. Do one of the following:

    • If the data type only has one variant, create the following directory structure in the instance directory:

      /Generic/resource/

    • If the data type has several variants, create the following directory structure in the instance directory for each variant:

      /VariantName/resource/

  4. Continue to Defining the Service Type.
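The steps above can also be scripted. The following is a minimal sketch in Python, not part of the product: the function name `create_working_dir` and its parameters are hypothetical, and the directory that holds the standardizer jars is assumed to be supplied by the caller because it depends on the installation.

```python
import shutil
from pathlib import Path

# Jar file names taken from step 2 of the procedure above.
JARS = ("standardizer-api.jar", "standardizer-impl.jar")


def create_working_dir(root, variants=("Generic",), jar_source=None):
    """Create lib/ and instance/<Variant>/resource/ under root.

    variants   -- folder names to create under instance/; use
                  ("Generic",) for a data type with a single variant.
    jar_source -- installation-specific directory holding the two
                  standardizer jars; when given, they are copied to lib/.
    """
    root = Path(root)
    (root / "lib").mkdir(parents=True, exist_ok=True)
    for variant in variants:
        (root / "instance" / variant / "resource").mkdir(parents=True, exist_ok=True)
    if jar_source is not None:
        for jar in JARS:
            shutil.copy2(Path(jar_source) / jar, root / "lib" / jar)
    return root
```

For a data type with two variants, a call such as `create_working_dir("WorkingDir", variants=("US", "UK"), jar_source=...)` would create one resource folder per variant, where the jar location is whatever applies to your installation.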

Defining the Service Type

The serviceType.xml file defines information about the data type, and is a required file for each data type.

To Define the Service Type

  1. Create a file named serviceType.xml in your working directory.


    Tip –

    You can copy the service type file from an existing data type and modify it for your use.


  2. Enter text similar to the following, where the description element names the data type and the value elements list the tokens, or standardization components, of the data type.


    <serviceType configurationResource="standardizer.xml">
      <description>My Data Type Standardization</description>
      <parameter name="fields">
        <list>
          <value>Data Field1</value>
          <value>Data Field2</value>
          ...
        </list>
      </parameter>
    </serviceType>

    Note –

    For more information about the elements in this file, see Service Type Definition File.


  3. Save and close the file.

  4. Continue to Defining the Variants.
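If you prefer to generate the file rather than type it, the sample above can be produced with Python's standard library. This is only a sketch: the helper names `build_service_type` and `write_service_type` are hypothetical, the output mirrors the sample shown in step 2 and nothing more, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_service_type(description, fields, config_resource="standardizer.xml"):
    """Return a serviceType element shaped like the sample in step 2."""
    root = ET.Element("serviceType", {"configurationResource": config_resource})
    ET.SubElement(root, "description").text = description
    param = ET.SubElement(root, "parameter", {"name": "fields"})
    value_list = ET.SubElement(param, "list")
    for field in fields:
        ET.SubElement(value_list, "value").text = field
    return root


def write_service_type(path, description, fields):
    """Write the element to path as indented UTF-8 text."""
    root = build_service_type(description, fields)
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    Path(path).write_text(ET.tostring(root, encoding="unicode"), encoding="utf-8")
```

For example, `write_service_type("WorkingDir/serviceType.xml", "My Data Type Standardization", ["Data Field1", "Data Field2"])` writes a file with the same elements as the sample.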

Defining the Variants

For each data type you create, you need to create one or more variants that define the logic for processing a specific type of data.

To Define the Variants

Perform the following steps for each variant that will be used for the data type you are creating.

  1. Define the service instance, as described in Defining the Service Instance.

    Create the serviceInstance.xml file in /WorkingDir/instance/VariantName.

  2. Define the state model and processing logic, as described in Defining the State Model and Processing Rules.

    Create the standardizer.xml file in /WorkingDir/instance/VariantName/resource.

  3. If needed, create normalization and lexicon files, as described in Creating Normalization and Lexicon Files.

    Create the files in /WorkingDir/instance/VariantName/resource.

  4. Continue to Packaging and Importing the Data Type.
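Before packaging, it can help to confirm that every variant folder contains the two files named in steps 1 and 2. The following sketch assumes the directory layout described in Creating the Working Directory; the function name `missing_variant_files` is hypothetical, and optional normalization and lexicon files are not checked.

```python
from pathlib import Path

# Files that steps 1 and 2 above place in each variant folder.
REQUIRED = ("serviceInstance.xml", "resource/standardizer.xml")


def missing_variant_files(working_dir):
    """Return {variant name: [missing relative paths]} for incomplete variants."""
    problems = {}
    instance_dir = Path(working_dir) / "instance"
    for variant in sorted(p for p in instance_dir.iterdir() if p.is_dir()):
        missing = [rel for rel in REQUIRED if not (variant / rel).is_file()]
        if missing:
            problems[variant.name] = missing
    return problems
```

An empty dictionary means that each variant folder has both files.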

Packaging and Importing the Data Type

Once you have created all the files for the data type, you need to package them into a ZIP file to be imported into a master index application.

To Package and Import the Data Type

  1. In the working directory, select the folders and files at the top level and add them to a ZIP file.

  2. Name the ZIP file the same name as the data type.

    The ZIP file structure should look similar to the following:

    Figure 1 Custom Data Type Zip File (the figure shows the ZIP file package for a custom data type)

  3. Import the file into a master index application as described in Importing Standardization Data Types and Variants in Configuring Sun Master Indexes.
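Steps 1 and 2 can be sketched with Python's `zipfile` module. The point the sketch illustrates is that the contents of the working directory, not the working directory folder itself, sit at the top level of the archive. The function name `package_data_type` is hypothetical, and the sketch makes no claim about what the import wizard accepts beyond the structure described above. Note that `zipfile` stores files only, so an empty folder (for example, a lib folder with no jars) would not appear in the archive.

```python
import zipfile
from pathlib import Path


def package_data_type(working_dir, data_type_name, out_dir="."):
    """Zip the contents of working_dir into <data_type_name>.zip.

    Entry names are relative to working_dir, so serviceType.xml, lib/
    and instance/ end up at the top level of the archive.
    """
    working_dir = Path(working_dir)
    zip_path = Path(out_dir) / f"{data_type_name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(working_dir.rglob("*")):
            # Skip directories, and skip the archive itself in case
            # out_dir is inside the working directory.
            if path.is_file() and path.resolve() != zip_path.resolve():
                zf.write(path, path.relative_to(working_dir).as_posix())
    return zip_path
```

Calling `package_data_type("WorkingDir", "MyDataType")` would write MyDataType.zip to the current directory, where "MyDataType" stands in for the name of your data type.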

Service Type Definition File

Each data type is configured by a service type definition file, serviceType.xml. Service type files define the fields to be standardized for a data type. The following table lists and describes the elements in the service type file.

Element       Attribute               Description
-----------   ---------------------   -----------
serviceType                           A description and any parameters for the data type.
              configurationResource   The name of the standardization process file that defines the states and processing for the data type.
description                           A brief description of the data type, such as “Address Standardization”.
parameter                             A parameter for the configuration resource. By default, the name of the parameter is “fields”, and it is populated with a list of standardized field component names.
              name                    The name of the parameter.
              value                   One or more values for the parameter.
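
To see how the elements in the table fit together, the following sketch reads a service type file with Python's standard library and returns the pieces the table describes. The function name `read_service_type` and the shape of the returned dictionary are hypothetical, and the sketch assumes a file structured like the sample in Defining the Service Type.

```python
import xml.etree.ElementTree as ET


def read_service_type(xml_text):
    """Extract the attribute, description, and parameters of a serviceType."""
    root = ET.fromstring(xml_text)
    parameters = {}
    for param in root.findall("parameter"):
        # Collect every value element nested under this parameter.
        parameters[param.get("name")] = [v.text for v in param.iter("value")]
    return {
        "configurationResource": root.get("configurationResource"),
        "description": root.findtext("description"),
        "parameters": parameters,
    }
```

For the sample file, the "fields" entry of the result would hold the field component names listed in the value elements.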