Analyzing and Cleansing Data for Sun Master Index

Data Transformation Rules

You can define rules that modify the values or the patterns of certain fields. Use these rules alone to modify a field across all records, or use them within conditional rules to modify the values or patterns only in fields that meet the criteria you define. You can define multiple rules, which are listed in a ruleList element. Each rule you define is contained within a rule element, and each rule element only defines one rule (though one rule can be complex and include conditional rules and operators).

The following transformation rules are predefined to help you cleanse data. These rules can be used in conjunction with conditional rules and operators, as well as with validation rules.

assign

This rule assigns a new value to the specified field. The syntax for assign is:


<assign fieldName="name" value"new_value"/>

The parameters for assign are:


Example 12 Sample assign Rule

The following rule checks whether the value of the gender field equals 2, FEM, or FML. If any of those values are found, they are changed to “F”. This standardizes the values that indicate a gender of female to the correct processing code for female. If the three listed values are not found in the gender field, the record is returned as good data.


<rule>
  <if>
    <condition>
      <or>
        <equals value1="Person.Gender" value2="2" exact="true"/>
        <equals value1="Person.Gender" value2="FEM" exact="true"/>
        <equals value1="Person.Gender" value2="FML" exact="true"/>
      </or>
    </condition>
    <then>
      <assign fieldName="Person.Gender" value="F"/>
    </then>
    <else>
      <return fieldName="Person.Gender"/>
    </else>
  </if>
</rule>

patternReplace

This rule checks the value of the specified field against a pattern. If the patterns match, this rule replaces the existing pattern with a new pattern. Use regular expressions to define the patterns (see the Javadoc for java.util.regex for more information). The syntax for patternReplace is:


<patternReplace fieldName="name" matchPattern"old_pattern" replace="new_pattern"/>

The parameters for patternReplace are:


Example 13 Sample patternReplace Rule

The following sample searches the SSN field for 9–digit values without hyphens. If it finds such values, it inserts hyphens at the appropriate places.


<rule>
  <patternReplace fieldName="Person.SSN" matchPattern="[0-9]{9}"
   replace="[0-9]{3}-[0-9]{2}-[0-9]{4}"/>
</rule>

replace

This rule checks the value of the specified field against a given string. If the field value or part of the field value matches the string you specify, this rule replaces the existing string with a new string. The syntax for replace is:


<replace fieldName="name" matchPattern="old_string" replace="new_string"/>

The parameters for replace are:


Example 14 Sample replace Rule

The following sample looks for the values 1, MALE, or MAL in the gender field. If those values are found, it replaces them with “M” in order to standardize them to the processing code for male. Note that the replacement for MALE is defined before the replacement for MAL. Since this rule looks for and replaces full field values or partial field values, performing the process on MAL first would replace the “MAL” in MALE with “M”, resulting in “ME” for the new field value.


<rule>
  <replace fieldName="Person.Gender" matchPattern="1" replace="M"/>
  <replace fieldName="Person.Gender" matchPattern="MALE" replace="M"/>
  <replace fieldName="Person.Gender" matchPattern="MAL" replace="M"/>
</rule>

truncate

This rule checks the value of the specified field against its length as defined in object.xml. If the field value exceeds the length, the rule truncates the field to its actual length. The syntax for truncate is:


<truncate fieldName="name"/>

The parameter for truncate is:


Example 15 Sample truncate Rule

The following sample checks the lengths defined for the LastName, MaidenName, and MotherMN fields in object.xml. If any of these field values are found to be longer than the defined length, the value is truncated to the exact length defined in object.xml.


<rule>
  <truncate fieldName="Person.LastName"/>
  <truncate fieldName="Person.MaidenName"/>
  <truncate fieldName="Person.MotherMN"/>
</rule>