Sun Master Index lets you exclude unwanted values from key processes: blocking, matching, and single best record (SBR) calculation. Data coming into a master index application frequently contains default values that are used when the actual value is unknown. One of the most common examples is using “999-99-9999” or “000-00-0000” for a social security number. Another example occurs in patient data, when the name of a newborn baby is not yet known and is entered as “Baby”, “Baby Boy”, or “Baby Girl”. Retrieving all records that contain these values for a blocking query, and then performing matching on them, wastes computing resources. Removing invalid or overused values from these key processes can improve the performance of the master index application.
The following sections describe each type of filter (SBR filters, blocking filters, and match filters) and the exclusion lists they use.
When the survivor calculator determines the values to populate in the SBR for a record, you want to eliminate any values that obviously do not represent the best value for the field. These are most likely default values that are used when the actual value of a field is unknown. When a filter is defined for a field and a system object contains an excluded value in that field, the survivor calculator ignores that value and uses a value from a different system record to populate the SBR. If the enterprise record contains only one system record and that system record contains an excluded value, the excluded value is used for the SBR because there is no other value to use.
As an example, if you define an SBR filter for FirstName to exclude the value “Baby” and an enterprise record contains two system records, one with a FirstName of “Baby” and one with a FirstName of “Joel”, then the value populated into the SBR is “Joel” regardless of how the survivor calculator is defined. If you have the same filter definition with an enterprise record that contains only one system record and the value of the FirstName is “Baby”, then the value populated into the SBR is “Baby”.
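The SBR filter behavior described above can be sketched in a few lines of Python. This is an illustration only, not the survivor calculator's actual implementation. The function name `pick_sbr_value` is hypothetical, the candidate list is assumed to be already ordered by whatever survivor strategy is configured, and the fallback for several system records that all hold excluded values is an assumption, because the text only describes the single-record case.

```python
def pick_sbr_value(candidates, excluded):
    """Return the most preferred candidate value that is not excluded.

    `candidates` holds one field value per system record, most preferred
    first (assumed to be pre-sorted by the survivor strategy).
    """
    for value in candidates:
        if value not in excluded:
            return value
    # Every candidate is excluded (for example, a single system record):
    # fall back to the preferred value because there is nothing else to use.
    return candidates[0] if candidates else None


excluded_first_names = {"Baby", "Baby Boy", "Baby Girl"}

print(pick_sbr_value(["Baby", "Joel"], excluded_first_names))  # Joel
print(pick_sbr_value(["Baby"], excluded_first_names))          # Baby
```

The first call mirrors the two-record example, where “Joel” survives even though “Baby” is preferred by the strategy. The second mirrors the single-record example, where the excluded value is used anyway.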
When a message comes in to the master index application, values from the message are used as criteria for the blocking query used for matching. Several queries are created depending on the number of blocks that are defined. If the incoming message contains common default values, the query could result in an inordinate number of possible matches being returned from the master index database for the match process. You can reduce this overhead by excluding known invalid values from blocking query fields, thereby reducing the number of non-matching query results.
As an example, a blocking filter for the Phone field excludes the value “9999999999” and the blocking query contains a block on the FirstName and Phone fields. If an incoming record contains “9999999999” in the Phone field, the blocking query returns no matching records for that specific block of the query. Note that records containing the excluded value might be returned by other blocks in the query that do not include the Phone field.
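The following Python sketch models this example with in-memory records instead of database queries. All names (`block_is_usable`, `candidate_ids`, the sample records) are hypothetical, each block is assumed to require exact equality on all of its fields, and skipping a block when a field is missing from the incoming record is an added assumption. It is meant only to show how one block can be suppressed while the others still return candidates.

```python
def block_is_usable(block, incoming, excluded):
    """A block is skipped if any of its fields holds an excluded value."""
    for field in block:
        value = incoming.get(field)
        # Assumption: a missing value also disables the block.
        if value is None or value in excluded.get(field, set()):
            return False
    return True


def candidate_ids(incoming, stored, blocks, excluded):
    """Collect the IDs of stored records returned by any usable block."""
    matches = set()
    for block in blocks:
        if not block_is_usable(block, incoming, excluded):
            continue  # this block returns no records
        for record_id, record in stored.items():
            if all(record.get(f) == incoming[f] for f in block):
                matches.add(record_id)
    return matches


stored = {
    1: {"FirstName": "Ann", "LastName": "Lee", "Phone": "9999999999"},
    2: {"FirstName": "Ann", "LastName": "Ray", "Phone": "5551234567"},
    3: {"FirstName": "Ann", "LastName": "Kim", "Phone": "9999999999"},
}
blocks = [("FirstName", "Phone"), ("FirstName", "LastName")]
incoming = {"FirstName": "Ann", "LastName": "Lee", "Phone": "9999999999"}
phone_filter = {"Phone": {"9999999999"}}

print(sorted(candidate_ids(incoming, stored, blocks, phone_filter)))  # [1]
print(sorted(candidate_ids(incoming, stored, blocks, {})))            # [1, 3]
```

With the filter in place, the FirstName and Phone block contributes nothing, yet record 1 is still returned through the FirstName and LastName block even though it contains the excluded phone number.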
When a master index application matches incoming records against records that already exist in the master index database, you want to be sure the composite weights are not artificially inflated due to matching on default values in certain fields. One of the most common problems in matching arises from the SSN (or other national identifier) in person data. This field should be one of the most reliable identifiers of a person because the number is unique to each person. Because the field is typically required, it cannot be left null, so when the SSN of a person is unknown, the person entering the data must enter some value that is not a valid SSN. Often the numbers “999999999” or “000000000” are used. If an incoming record contains one of these values, the match process returns the full agreement weight for the SSN field against other records containing the same default data. This agreement is meaningless, because the records share only a placeholder and not an actual SSN.
You can reduce the number of inaccurate matches and potential matches by defining an exclusion list for specific fields in the match string. When a match filter is defined against a field and an incoming record contains an excluded value, that value is ignored in the match process and does not contribute to the composite match weight.
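To make the effect on the composite weight concrete, here is a deliberately simplified Python sketch. It assumes exact-match comparison and made-up agreement and disagreement weights per field, whereas the real match engine uses its own comparison functions and weights. The function name `composite_weight` is hypothetical.

```python
def composite_weight(incoming, candidate, weights, excluded):
    """Sum per-field weights, skipping fields whose incoming value is excluded.

    `weights` maps a field name to (agreement_weight, disagreement_weight).
    """
    total = 0.0
    for field, (agree, disagree) in weights.items():
        value = incoming.get(field)
        if value in excluded.get(field, set()):
            continue  # excluded value contributes nothing to the total
        total += agree if value == candidate.get(field) else disagree
    return total


# Hypothetical weights, chosen only for illustration.
weights = {"SSN": (10.0, -5.0), "LastName": (5.0, -3.0)}
ssn_filter = {"SSN": {"999999999", "000000000"}}

incoming = {"SSN": "999999999", "LastName": "Smith"}
candidate = {"SSN": "999999999", "LastName": "Jones"}

print(composite_weight(incoming, candidate, weights, ssn_filter))  # -3.0
print(composite_weight(incoming, candidate, weights, {}))          # 7.0
```

Without the filter, two unrelated people who share only the placeholder SSN score 7.0, which could be enough to flag them as potential matches. With the filter, only the LastName disagreement counts.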
An exclusion list defines all values to filter out or ignore for a specific field. You can define exclusion lists directly in the filter.xml file, or you can create exclusion lists in text files and reference those files from filter.xml. You should create an exclusion list file for each field for which filters are defined. You might need to create separate files for a single field if, for example, the values excluded for SBR processing differ from the values excluded for matching or blocking.
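One way to picture why a single field can need more than one list is to key the lists by both process and field. The Python sketch below is a conceptual model only. It does not reflect the syntax of filter.xml or the format of exclusion list files, and the process labels, field names, and the `is_excluded` helper are all hypothetical.

```python
# Conceptual model: one exclusion list per (process, field) pair.
exclusions = {
    ("sbr", "FirstName"): {"Baby", "Baby Boy", "Baby Girl"},
    ("match", "SSN"): {"999999999", "000000000"},
    ("block", "SSN"): {"999999999", "000000000"},
    ("block", "Phone"): {"9999999999"},
}


def is_excluded(process, field, value):
    """Return True if `value` is on the list for this process and field."""
    return value in exclusions.get((process, field), frozenset())


print(is_excluded("block", "Phone", "9999999999"))  # True
print(is_excluded("match", "Phone", "9999999999"))  # False
print(is_excluded("sbr", "FirstName", "Joel"))      # False
```

In this model the Phone placeholder is filtered out of blocking but still takes part in matching, which is the kind of difference that would call for separate lists for the same field.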