Configuring Sun Master Indexes

Filtering Default Values From Master Index Processes

You might want to prevent certain default values, such as placeholder names or dummy identifiers, from being used in the match process or from being populated into the single best record (SBR). In the blocking query, default values cause a large number of non-matching records to be returned as match candidates. In the match process, default values produce inaccurately high match weights. Finally, you do not want invalid values to appear in the SBR when it is possible to remove them.

The master index application provides a configuration file, filter.xml, in which you can define filtering rules. You can specify fields and values to filter directly in this file, and you can create exclusion lists in a flat file. Exclusion lists contain all the values to filter for a specific field.

Procedure: To Filter Default Values From the SBR, Blocking Query, or Match Process

  1. To use flat files to define values to filter, create a flat file for each field to be filtered.

  2. In each flat file, list every value to exclude for that field, and separate the values with a delimiter character.

  3. In the Projects window, expand the master index application and then expand Filter.

  4. Open filter.xml.

  5. For each field to be filtered, define the elements and attributes described in Filter Definition Elements.

  6. Save and close the file.


Example 5 Filter Definitions

The filter.xml file provides a simple format for you to define filters for survivor calculation, blocking queries, and matching. The following sample defines a rule that filters out the listed values from the FirstName field in the SBR only. It defines a second rule that filters out the values listed in SSN.txt from the SSN field in the SBR, the blocking query, and the match process.


<exclusion-List module-name="ExclusionFilter"
parser-class="com.sun.mdm.index.configurator.impl.ExclusionFilterConfig">
  <!-- Rule 1: values listed inline, filtered from the SBR only -->
  <field sbr="true" matching="false" blocking="false">
    <name>Person.FirstName</name>
    <value>
      <field-value>Baby</field-value>
      <field-value>Baby Boy</field-value>
      <field-value>Baby Girl</field-value>
    </value>
  </field>
  <!-- Rule 2: values read from a pipe-delimited flat file -->
  <field sbr="true" matching="true" blocking="true">
    <name>Person.SSN</name>
    <value>
      <file delimiter="|">
        <file-name>SSN.txt</file-name>
      </file>
    </value>
  </field>
</exclusion-List>

The filter values file for the SSN field in the above example would look similar to the following:


000000000|111111111|222222222|333333333|444444444|555555555|666666666|777777777|
888888888|999999999
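
The delimiter attribute of the file element must agree with the character actually used in the flat file. As a sketch only, assume a hypothetical exclusion file named PostalCodeFilters.txt whose values are separated by commas. The file name, the choice of field, and the listed values are assumptions made for illustration and are not part of any shipped configuration. The corresponding filter definition might look like this:

```xml
<!-- Hypothetical example: the file name, field, and values are assumptions. -->
<!-- Assumed contents of PostalCodeFilters.txt: 00000,99999 -->
<field sbr="false" matching="true" blocking="true">
  <name>Person.Address.PostalCode</name>
  <value>
    <!-- The delimiter here must be the same character used in the file. -->
    <file delimiter=",">
      <file-name>PostalCodeFilters.txt</file-name>
    </file>
  </value>
</field>
```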

Filter Definition Elements

The following table lists and describes the XML elements and attributes that define the filters to be used for the SBR, blocking query, or match process. You can either define the exclusion lists directly in filter.xml or in a flat file that is referenced from filter.xml.

| Element | Attribute | Description |
| --- | --- | --- |
| field | | A filter definition for one field. The definition includes the following elements and attributes. You can define multiple filter definitions, and each can define filters for the SBR, blocking, matching, or any combination of the three. |
| | sbr | An indicator of whether to apply the filter to the SBR. Specify true to apply the filter to the SBR; otherwise specify false. |
| | matching | An indicator of whether to apply the filter to the matching process. Specify true to apply the filter to the matching process; otherwise specify false. |
| | blocking | An indicator of whether to apply the filter to the blocking query. Specify true to apply the filter to the blocking query; otherwise specify false. |
| name | | The qualified name for the field; for example, Person.SSN or Person.Address.PostalCode. For more information about qualified field names, see Master Index Field Notations in Understanding Sun Master Index Configuration Options. |
| value | | A container for the values to filter, given either as a list of field-value elements or as a file element. |
| field-value | | A value to filter from the SBR, blocking query, or matching process. You can define multiple field values. To use values listed in a flat file, define a file element instead of a field-value element. |
| file | | A definition of the file that contains the list of values to filter. |
| | delimiter | The character that delimits the values listed in the exclusion list flat file. |
| file-name | | The path and name of a file that contains the list of values to filter. Be sure the values in this file are delimited by the character specified in the delimiter attribute of the file element. |
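
The sbr, matching, and blocking attributes are set independently, so a value can be excluded from one process and kept in another. The following is a minimal sketch rather than a shipped configuration: the placeholder value Unknown is an assumption made for illustration. It keeps the placeholder out of the blocking query and the match process while leaving the SBR untouched.

```xml
<!-- Illustrative sketch only: the value "Unknown" is an assumed placeholder. -->
<exclusion-List module-name="ExclusionFilter"
parser-class="com.sun.mdm.index.configurator.impl.ExclusionFilterConfig">
  <!-- Filter from the blocking query and match process, but not from the SBR. -->
  <field sbr="false" matching="true" blocking="true">
    <name>Person.FirstName</name>
    <value>
      <field-value>Unknown</field-value>
    </value>
  </field>
</exclusion-List>
```

Setting all three attributes to false would leave the definition in the file without applying it anywhere, which can be a convenient way to disable a rule temporarily while testing.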