Analyzing and Cleansing Data for Sun Master Index

Pattern Frequency Analysis Report Samples

Pattern frequency analysis reports list how often each data pattern occurs in the values of the specified fields. Patterns are expressed as regular expressions. This topic includes sample reports based on the pattern frequency rules defined below for the social security number and date of birth fields.


<PatternFrequencyAnalysis>
  <topNpatterns ="5" showall="true"/>
  <fields>
    <field fieldName="Person.SSN"/>
  </fields>
</PatternFrequencyAnalysis>

<PatternFrequencyAnalysis>
  <topNpatterns ="5" increasing="true"/>
  <fields>
    <field fieldName="Person.DOB"/>
  </fields>
</PatternFrequencyAnalysis>

The above rules generate two reports, one for social security number patterns and one for date of birth patterns. Each report lists only the top five patterns. Below are sample outputs for each. The listed patterns make it easy to spot invalid values, such as a social security number that is missing a digit (NNN-NN-NNN).

PF_PROFILE_PATTERN_FRQ_1_1–10000.csv

PERSON.SSN     FREQUENCY
NNN/NN/NNNN           11
(blank)               51
NNN-NN-NNN            64
NNNNNNNNN             92
NNN-NN-NNNN         7614

PF_PROFILE_PATTERN_FRQ_2_1–10000.csv

PERSON.DOB     FREQUENCY
NNNN/NN/NN            14
(blank)               22
NNNNNNNN              62
NNN/NN/NN             84
NN/NN/NNNN          9766
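
To make the idea behind these reports concrete, the following Python sketch shows one way a pattern frequency count could be computed. It is not the profiler's implementation. It assumes that a pattern is formed by replacing each digit with N and leaving every other character unchanged, and that the increasing attribute sorts the listed patterns from least to most frequent, which matches the sample output above. The function names, the expected pattern, and the sample values are all hypothetical.

```python
import string
from collections import Counter


def to_pattern(value: str) -> str:
    """Replace each ASCII digit with 'N'; keep all other characters as they are.

    Assumption: this mirrors the N notation in the sample reports.
    """
    return "".join("N" if ch in string.digits else ch for ch in value)


def pattern_frequencies(values, top_n=5, increasing=True):
    """Count patterns, keep the top_n most frequent, then order them by frequency."""
    counts = Counter(to_pattern(v) for v in values)
    top = counts.most_common(top_n)
    # Assumption: increasing=True lists the least frequent pattern first.
    return sorted(top, key=lambda item: item[1], reverse=not increasing)


# Hypothetical sample values, not taken from any real data set.
ssns = ["123-45-6789", "987-65-4321", "123456789", "123/45/6789", "", "111-22-3333"]

report = pattern_frequencies(ssns)

# Any pattern other than the expected one points at values that need cleansing.
expected = "NNN-NN-NNNN"
suspect = [pattern for pattern, _ in report if pattern != expected]

for pattern, frequency in report:
    print(f"{pattern or '(blank)':<14}{frequency:>6}")
```

As in the sample reports, the dominant pattern ends up last in the list, and the low-frequency patterns before it (including blank values) are the candidates for cleansing.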