Analyzing and Cleansing Data for Sun Master Index

Conditional Data Rules

You can define conditional rules to use in conjunction with the validation and transformation rules described in Data Validation Rules and Data Transformation Rules. Conditional rules return either true or false. They only define a condition and not an action, so they must be used with other types of rules.

Conditional rules use if, then, and else statements in the following format:


<rule>
  <if>
    <condition>
      ...
    </condition>
    <then>
      ...
    </then>
    <else>
      ...
    </else>
  </if>
</rule>

The following conditional rules are predefined:

In addition, you can use the conditional operators described in Conditional Operators.

dataLength

This rule checks the length of the value of the specified field against the length defined in object.xml. You can check for lengths greater than or equal to the defined length or less than the defined length. This rule returns “true” if the length matches the specified length range; otherwise it returns “false”. The syntax for dataLength is:


<dataLength fieldName="name" len="length" more="true/false"/>

The parameters for dataLength are:


Example 16 Sample dataLength Conditional Rule

The following sample checks postal code fields to verify they do not exceed the United States length of five characters. If a postal code containing more than five characters is found, the record is rejected an written to the bad data file. If a postal code contains five or fewer characters, the field is returned as good.


<rule>
  <if>
    <condition>
      <dataLength fieldName="Person.Address.PostalCode" len="6" more="true"/>
    </condition>
    <then>
      <reject fieldName="Person.Address.PostalCode"/>
    </then>
    <else>
      <return fieldName="Person.Address.PostalCode"/>
    </else>
  </if>
</rule>

equals

This rule checks whether a specific value is equal to or found within the value of the specified field. This rule returns true if the conditions are matched; otherwise it returns false. The syntax for equals is:


<equals fieldName="name" value2="value" exact="true/false"/>

The parameters for equals are:


Example 17 Sample equals Rule

The following rule checks whether the value of the gender field equals 2, FEM, or FEMALE. If any of those values are found, they are changed to “F”. This standardizes the values that indicate a gender of female to the correct processing code for female. If the three listed values are not found in the gender field, the record is returned as good data.


<rule>
  <if>
    <condition>
      <or>
        <equals value1="Person.Gender" value2="FEMALE" exact="true"/>
        <equals value1="Person.Gender" value2="2" exact="true"/>
        <equals value1="Person.Gender" value2="FEM" exact="true"/>
        <equals value1="Person.Gender" value2="FML" exact="true"/>
      </or>
    </condition>
    <then>
      <assign fieldName="Person.Gender" value="F"/>
    </then>
    <else>
      <return fieldName="Person.Gender"/>
    </else>
  </if>
</rule>

isnull

This rule checks the value of the specified field and returns true if the field value is null. The syntax for isnull is:


<isnull fieldName="name"/>

The parameter for isnull is:


Example 18 Sample isnull Rule

The following sample checks the first name field for null values. If any first names are null, the record is rejected and written to the bad data file. Otherwise, the field is returned as good data.


<rule>
  <if>
    <condition>
      <isnull fieldName="Person.FirstName"/>
    </condition>
    <then>
      <reject fieldName="Person.FirstName"/>
    </then>
    <else>
      <return fieldName="Person.LastName"/>
    </else>
  </if>
</rule>

matches

This rule checks the value of the specified field for a certain pattern. If the pattern is found, the rule returns true. Use regular expressions to define the pattern. For more information, see the Javadoc for java.util.regex. The syntax for matches is:


<matches fieldName="name" pattern="pattern"/>

The parameters for matches are:


Example 19 Sample matches Rule

The following sample checks for phone numbers of the format (ddd)ddd-dddd (where “d” is a digit). Phone fields that are in that format are returned as good data. Fields that are not in that format are rejected and the records are written to the bad data file.


<rule>
  <if>
    <condition>
      <matches fieldName="Person.Phone.Phone" pattern="([0-9]{3})[0-9]{3}-[0-9]{4}"/>
    </condition>
    <then>
      <return fieldName="Person.Phone.Phone"/>
    </then>
    <else>
      <reject fieldName="Person.Phone.Phone"/>
    </else>
  </if>
</rule>

Conditional Operators

You can use the following conditional operators with the conditional rules.Example 20 and Example 21 provide examples of conditional operator usage for multiple conditions.


Example 20 Sample for Nested Conditional Operators

The following sample checks for phone numbers of the format (ddd)ddd-dddd (where “d” is a digit). Phone fields that are in that format are returned as good data. Fields that are not in that format are rechecked to see if the field value is a string of 10 digits. Phone fields that return true for that validation are reformatted to include the parentheses and hyphen. Phone fields that return false for the final validation are rejected and the records are written to the bad data file.


<rule>
  <if>
    <condition>
      <matches fieldName="Person.Phone.Phone" pattern="([0-9]{3})[0-9]{3}-[0-9]{4}"/>
    </condition>
    <then>
      <return fieldName="Person.Phone.Phone"/>
    </then>
    <else>
      <if>
        <condition>
          <matches fieldName="Person.Phone.Phone" pattern="[0-9]{10}"/>
        </condition>
        <then>
          <patternReplace fieldname"Person.Phone.Phone" matchPattern="[0-9]{10}" 
           replace="([0-9]{3})[0-9]{3}-[0-9]{4}"/>
        </then>
        <else>
          <reject fieldName="Person.Phone.Phone"/>
        </else>
      </if>
    </else>
  </if>
</rule>


Example 21 Sample for Conditional Operations Using “and”

The following sample checks postal code fields to verify they do not exceed the United States length of five characters. If a postal code containing more than five characters is found, the record is rejected an written to the bad data file. If a postal code contains five or fewer characters, the field is then checked to see whether it contains nine characters (indicating the extension might be appended to the postal code). If it is nine characters, the value is truncated to five characters, leaving the postal code.


<rule>
  <if>
    <condition>
      <and>
        <dataLength fieldName="Person.Address.PostalCode" len="5" more="true"/>
        <datalength fieldName="Person.Address.PostalCode" len="6" more="false"/>
      </and>
    </condition>
    <then>
      <return fieldName="Person.Address.PostalCode"/>
    </then>
    <else>
      <if>
        <condition>
          <and>
            <dataLength fieldName="Person.Address.PostalCode" len="9" more="true"/>
            <datalength fieldName="Person.Address.PostalCode" len="10" more="false"/>
          </and>
        </condition>
        <then>
          <truncate fieldName="Person.Address.PostalCode"/>
        </then>
        <else>
            <reject fieldName="Person.Address.PostalCode"/>
        </else>
      </if>
    </else>
  </if>
</rule>