Understanding Sun Master Index Configuration Options

SBR, Matching, and Blocking Filter Configuration

In filter.xml, you can define values to be excluded during the SBR calculation, during the matching process, and during the blocking query. The following topics describe the structure of filter.xml and provide information about defining filters.

Master Index Field Filters

Sun Master Index provides the ability to exclude unwanted values during key processes, such as blocking, matching, and SBR calculation. Data coming into a master index application frequently contains default values that are used when the actual value is unknown. One of the most common examples is using “999-99-9999” or “000-00-0000” for a social security number. Another example occurs in patient data when the name of a newborn baby is not yet known and the name is entered as “Baby”, “Baby Boy”, or “Baby Girl”. Retrieving all of these values for a blocking query and then matching on them wastes valuable computer resources. Removing invalid or overused values from these key processes can improve the performance of the master index application.

The following topics provide additional information about each type of filter:

SBR Filters

When the survivor calculator determines the values to populate in the SBR for a record, you want to eliminate any values that obviously do not represent the best value for the field. These are most likely default values that are used when the actual value of a field is unknown. When a filter is defined for a field and a system record contains an excluded value in that field, the survivor calculator ignores that value and uses a value from a different system record instead. If the enterprise record contains only one system record and that system record contains an excluded value, the excluded value is used for the SBR because there is no other value to use.

As an example, if you define an SBR filter for FirstName to exclude the value “Baby” and an enterprise record contains two system records, one with a FirstName of “Baby” and one with a FirstName of “Joel”, then the value populated into the SBR is “Joel” regardless of how the survivor calculator is defined. If you have the same filter definition with an enterprise record that contains only one system record and the value of the FirstName is “Baby”, then the value populated into the SBR is “Baby”.
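The following Python sketch illustrates the SBR filter behavior described above. It is not the Sun Master Index survivor calculator: the function choose_sbr_value and its survivor_rule parameter are hypothetical stand-ins for whatever survivorship strategy is configured. Falling back to the unfiltered values whenever every candidate is excluded generalizes the single-system-record case and is an assumption.

```python
from collections.abc import Callable, Sequence


def choose_sbr_value(
    system_values: Sequence[str],
    excluded: set[str],
    survivor_rule: Callable[[Sequence[str]], str],
) -> str:
    """Pick the SBR value for one field (hypothetical helper, not the product API)."""
    # Drop every value that appears in this field's SBR exclusion list.
    candidates = [v for v in system_values if v not in excluded]
    # If nothing survives the filter (for example, a single system record that
    # contains "Baby"), fall back to the unfiltered values. Applying the same
    # fallback to several all-excluded records is an assumption.
    if not candidates:
        candidates = list(system_values)
    # The configured survivorship strategy then chooses among the candidates.
    return survivor_rule(candidates)


def first(values: Sequence[str]) -> str:
    # Hypothetical survivorship rule: take the first candidate.
    return values[0]


def last(values: Sequence[str]) -> str:
    # Hypothetical survivorship rule: take the last candidate.
    return values[-1]


if __name__ == "__main__":
    print(choose_sbr_value(["Baby", "Joel"], {"Baby"}, first))  # Joel
    print(choose_sbr_value(["Baby", "Joel"], {"Baby"}, last))   # Joel
    print(choose_sbr_value(["Baby"], {"Baby"}, first))          # Baby
```

Because the excluded value is removed before the survivorship rule runs, “Joel” wins whether the rule picks the first or the last candidate.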

Blocking Query Filters

When a message comes into the master index application, values from the message are used as criteria for the blocking query used for matching. The number of queries created depends on the number of blocks that are defined. If the incoming message contains common default values, the query could return an inordinate number of possible matches from the master index database for the match process. You can reduce this overhead by excluding known invalid values from blocking query fields, thereby reducing the number of non-matching query results.

As an example, suppose a blocking filter for the Phone field excludes the value “9999999999” and the blocking query contains a block on the FirstName and Phone fields. If an incoming record contains “9999999999” in the Phone field, the blocking query returns no matching records for that specific block of the query. Note that records containing the excluded value might still be returned by other blocks in the query that do not include the Phone field.
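The Python sketch below models the blocking filter as a step that decides which query blocks to run for an incoming record. The names usable_blocks, blocks, and blocking_exclusions are hypothetical, and the product builds the actual database query itself; the sketch only shows how an excluded value disables the blocks that contain that field while leaving the other blocks in play.

```python
from collections.abc import Mapping, Sequence


def usable_blocks(
    record: Mapping[str, str],
    blocks: Sequence[Sequence[str]],
    blocking_exclusions: Mapping[str, set[str]],
) -> list[dict]:
    """Return search criteria for each block not disabled by a filter (hypothetical)."""
    criteria = []
    for block in blocks:
        # If any field in this block holds an excluded value, skip the block;
        # this mirrors the block returning no records.
        if any(record.get(f) in blocking_exclusions.get(f, set()) for f in block):
            continue
        criteria.append({f: record.get(f) for f in block})
    return criteria


if __name__ == "__main__":
    # Sample blocks and record for illustration only.
    blocks = [("FirstName", "Phone"), ("FirstName", "LastName")]
    record = {"FirstName": "Joel", "LastName": "Smith", "Phone": "9999999999"}
    print(usable_blocks(record, blocks, {"Phone": {"9999999999"}}))
    # [{'FirstName': 'Joel', 'LastName': 'Smith'}]
```

Only the FirstName/LastName block survives, so candidates can still be found through fields that do not carry the default value.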

Match String Filters

When a master index application matches incoming records against records that already exist in the master index database, you want to be sure the composite weights are not artificially inflated by matching on default values in certain fields. One of the most common problems in matching arises from the SSN (or other national identifier) in person data. This field should be one of the most reliable identifiers of a person, since the number is unique to each person and the field is typically required, so it should not be null. As a result, if the SSN of a person is unknown, the person entering the data must still enter some value, even one that is not a valid SSN. The numbers “999999999” or “000000000” are often used. If an incoming record contains one of these values, the match process assigns the full agreement weight for the SSN field when comparing it against other records that contain the same default value, even though that agreement says nothing about whether the records represent the same person.

You can reduce the number of inaccurate matches and potential matches by defining an exclusion list for specific fields in the match string. When a match filter is defined against a field and an incoming record contains an excluded value, that value is ignored in the match process and does not contribute to the composite match weight.
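To make the effect of a match filter concrete, the following Python sketch computes a simplified composite weight. The function composite_weight and the agreement/disagreement weights are hypothetical: exact equality stands in for the matching engine's comparison functions, which are not modeled here, and skipping missing values is a simplifying assumption.

```python
from collections.abc import Mapping


def composite_weight(
    incoming: Mapping[str, str],
    candidate: Mapping[str, str],
    weights: Mapping[str, tuple[float, float]],
    match_exclusions: Mapping[str, set[str]],
) -> float:
    """Sum per-field weights, skipping filtered values (simplified, hypothetical)."""
    total = 0.0
    for field, (agree, disagree) in weights.items():
        value = incoming.get(field)
        # A match filter hit means the field is ignored: it adds no weight.
        # Skipping missing values as well is a simplifying assumption.
        if value is None or value in match_exclusions.get(field, set()):
            continue
        # Simplification: exact equality replaces the real comparison functions.
        total += agree if value == candidate.get(field) else disagree
    return total


if __name__ == "__main__":
    weights = {"SSN": (10.0, -5.0), "LastName": (6.0, -3.0)}  # hypothetical weights
    incoming = {"SSN": "999999999", "LastName": "Smith"}
    candidate = {"SSN": "999999999", "LastName": "Jones"}
    print(composite_weight(incoming, candidate, weights, {}))  # 7.0
    print(composite_weight(incoming, candidate, weights, {"SSN": {"999999999", "000000000"}}))  # -3.0
```

Without the filter, the shared default SSN contributes the full agreement weight and lifts the score; with the filter, only the LastName disagreement counts.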

Exclusion Lists

An exclusion list defines all values to filter out or ignore for a specific field. You can define exclusion lists directly in the filter.xml file, or you can create exclusion lists in text files and reference those files from filter.xml. You should create an exclusion list file for each field for which filters are defined. You might also need separate files for a single field; for example, when the values excluded for SBR processing differ from those excluded for matching or blocking.

The filter.xml File

The filter.xml file provides a template from which you can define filters for the SBR, blocking query, or match process. The default version of the file does not define any exclusions, so you do not need to modify the file if you do not use the filter capability.

The following topics provide information about the filter.xml file.

Modifying filter.xml

You can modify filter.xml using the XML editor. The possible modifications to this file are restricted by the schema definition, so be sure to validate the file after making any changes. When you modify this file, you must regenerate the application and redeploy the project for the changes to take effect.

filter.xml File Structure

filter.xml consists primarily of a list of fields, each with its own filter definitions. Each field is defined within a field element, and its filters are defined within a value element. The following table describes the elements and attributes of filter.xml.

field
    A filter definition for one field. You can define multiple filter definitions, and each can define filters for the SBR, blocking, matching, or any combination of the three. The field element has the following attributes and contains the name and value elements.
    sbr (attribute): An indicator of whether to apply the filter to the SBR. Specify true to apply the filter to the SBR; otherwise specify false.
    matching (attribute): An indicator of whether to apply the filter to the matching process. Specify true to apply the filter to the matching process; otherwise specify false.
    blocking (attribute): An indicator of whether to apply the filter to the blocking query. Specify true to apply the filter to the blocking query; otherwise specify false.

name
    The qualified name for the field; for example, Person.SSN or Person.Address.PostalCode. For more information about qualified field names, see Qualified Field Name Notation.

value
    A container for the values to filter: either a list of field-value elements or a file element.

field-value
    A value to filter from the SBR, blocking query, or matching process. You can define multiple field values. To use values listed in a flat file, define a file element instead of a field-value element.

file
    A definition of the file that contains the list of values to filter.
    delimiter (attribute): The character that delimits the values listed in the exclusion list flat file.

file-name
    The path and name of a file that contains the list of values to filter. Be sure the values in this file are delimited by the character specified in the delimiter attribute of the file element.
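If you want to check which filters a filter.xml file defines, for example after editing it, a short script can read the element structure described in the table above. This Python sketch uses the standard xml.etree.ElementTree module; the FieldFilter class and parse_filters function are hypothetical, treating a missing sbr, matching, or blocking attribute as false is an assumption, and the sketch does not rely on any particular root element name.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class FieldFilter:
    """One <field> definition from filter.xml (hypothetical helper class)."""
    name: str
    sbr: bool
    matching: bool
    blocking: bool
    values: list[str] = field(default_factory=list)             # inline <field-value> entries
    files: list[tuple[str, str]] = field(default_factory=list)  # (file-name, delimiter) pairs


def parse_filters(xml_text: str) -> list[FieldFilter]:
    root = ET.fromstring(xml_text)
    filters = []
    # iter() also visits the root, so a whole file or a single <field> both work.
    for elem in root.iter("field"):
        # Assumption: a missing flag attribute means the filter is not applied.
        flags = {attr: elem.get(attr, "false").strip().lower() == "true"
                 for attr in ("sbr", "matching", "blocking")}
        item = FieldFilter(name=elem.findtext("name", "").strip(), **flags)
        for fv in elem.iterfind("value/field-value"):
            item.values.append((fv.text or "").strip())
        for f in elem.iterfind("value/file"):
            item.files.append((f.findtext("file-name", "").strip(), f.get("delimiter", "")))
        filters.append(item)
    return filters
```

For example, parsing a single field element that sets sbr="true" yields one FieldFilter whose sbr flag is True and whose matching and blocking flags are False.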

filter.xml Example

The following example defines a filter for the SSN field for the SBR only, filtering out the values “999-99-9999” and “000-00-0000”. When the survivor calculator determines that the field value for the SBR should be “999-99-9999” or “000-00-0000”, the survivor calculator ignores that value and either chooses a different value or ignores the field altogether, depending on how survivorship is defined.


<field sbr="true" matching="false" blocking="false">
  <name>Person.SSN</name>
  <value>
    <field-value>"999-99-9999"</field-value>
    <field-value>"000-00-0000"</field-value>
  </value>
</field>

The following example defines an exclusion list file for matching and blocking, but not for the SBR. When a blocking query executes a query block that includes DOB and the DOB in the incoming record matches one of the values in the exclusion list, that block returns no records. When match weights are generated, a DOB field that contains a value from the exclusion list is ignored and does not contribute to the composite match weight.


<field sbr="false" matching="true" blocking="true">
  <name>Person.DOB</name>
  <value>
    <file delimiter=";">
      <file-name>"./filters/DOB.txt"</file-name>
    </file>
  </value>
</field>

The exclusion list file for the above example would look similar to the following:


0000000000;2222222222;3333333333;...
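A file like this can be read by splitting its contents on the character given in the delimiter attribute. The Python sketch below is illustrative only: parse_exclusions and load_exclusions are hypothetical helpers, and trimming whitespace, ignoring empty tokens (such as one left by a trailing newline), and assuming UTF-8 encoding are all assumptions about the file format.

```python
from pathlib import Path


def parse_exclusions(text: str, delimiter: str) -> set[str]:
    # Split on the delimiter from the <file> element; trimming whitespace and
    # dropping empty tokens (for example, a trailing newline) is an assumption.
    return {token.strip() for token in text.split(delimiter) if token.strip()}


def load_exclusions(path: str, delimiter: str) -> set[str]:
    # Assumption: the exclusion list file is UTF-8 encoded.
    return parse_exclusions(Path(path).read_text(encoding="utf-8"), delimiter)
```

With the example definition above, the call would be load_exclusions("./filters/DOB.txt", ";"), and each incoming DOB value could then be checked for membership in the returned set.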