The following topics provide sample frequency configurations along with excerpts of the reports they produce.
Simple frequency analysis reports list the frequencies of various data values found in the specified fields without using any data verification, transformation, or conditional rules. You can specify a sort order for this type of report, and you can specify a frequency threshold. The sample frequency rule defined below analyzes first and last names, reporting the top 6 frequencies and only if the frequency is three or more.
<SimpleFrequencyAnalysis> <fields> <field fieldName="Person.FirstName"/> <field fieldName="Person.LastName"/> </fields> <sortOrder fieldName="Person.FirstName" increasing="false"/> <threshold value="3" more="true"/> <topNpatterns value="6" showall="false"/> </SimpleFrequencyAnalysis> |
This analysis generates a report similar to the following:
SF_PROFILE_SIMPLE_FRQ_1_1–0.csv
PERSON.LAWSTNAME |
PERSON.FIRSTNAME |
FREQUENCY |
SMITH |
ANN |
38 |
JONES |
SUSAN |
31 |
SMITH |
JOHN |
31 |
THOMPSON |
JAMES |
28 |
JOHNSON |
BETH |
26 |
MILLER |
FRANK |
25 |
The sample frequency rule defined below analyzes social security numbers and analyzes whether there are duplicates (two or more occurrences).
<SimpleFrequencyAnalysis> <fields> <field fieldName="Person.SSN"/> </fields> <sortOrder fieldName="Person.SSN" increasing="false"/> <threshold value="2" more="true"/> </SimpleFrequencyAnalysis> |
This analysis generates a report similar to the following excerpt:
SF_PROFILE_SIMPLE_FRQ_2_1–0.csv
PERSON.SSN |
FREQUENCY |
999999999 |
457 |
000000000 |
125 |
123456789 |
41 |
222423535 |
4 |
992203847 |
2 |
Constrained frequency analysis reports list the frequencies of various data values found in the specified fields based on defined rules. For example, you can define rules that will only include certain patterns or that will exclude certain values. This topic includes two constrained analysis definitions along with corresponding sample reports.
<ConstrainedFrequencyAnalysis> <fields> <field fieldName="Person.SSN"/> </fields> <ruleList> <rule> <dataLength fieldName="Person.SSN" len="10" more="false"/> </rule> </ruleList> </ConstrainedFrequencyAnalysis> |
The above analysis generates a report for social security numbers with less than 10 characters (which means the hyphens are likely missing). Below is a sample output.
CF_PROFILE_CONSTRAINED_FRQ_1_1–100000.csv
PERSON.SSN |
FREQUENCY |
300555444 |
1 |
299557777 |
1 |
822331111 |
2 |
999999999 |
98 |
000000000 |
115 |
The following analysis generates a report for dates of birth that are prior to 01/01/1899 (which means they likely contain typographical errors). Below is a sample output.
<ConstrainedFrequencyAnalysis> <fields> <field fieldName="Person.DOB"/> </fields> <ruleList> <rule> <dataRange fieldName="Person.DOB" min="01/01/0001" max="01/01/1899"/> </rule> </ruleList> </ConstrainedFrequencyAnalysis> |
CF_PROFILE_CONSTRAINED_FRQ_2_1–100000.csv
PERSON.DOB |
FREQUENCY |
07/08/53 |
1 |
10/04/51 |
1 |
09/16/1682 |
1 |
12/28/1680 |
2 |
05/09/1898 |
2 |
Pattern frequency analysis reports list the frequencies of various data patterns found in the values of the specified fields. Patterns are expressed as regular expressions. This topic includes sample reports based on the pattern frequencies defined below for social security number and date of birth patterns.
<PatternFrqueencyAnalysis> <topNpatterns ="5" showall="true"/> <fields> <field fieldName="Person.SSN"/> </fields> </PatternFrequencyAnalysis> |
<PatternFrequencyAnalysis> <topNpatterns ="5" increasing="true"/> <fields> <field fieldName="Person.DOB"/> </fields> </PatternFrequencyAnalysis> |
The above rules generate two reports, one for social security number patterns and one for date of birth patterns. The reports only lists the top 5 patterns. Below are sample outputs for each. You can easily determine invalid values based on the patterns listed.
PF_PROFILE_PATTERN_FRQ_1_1–10000.csv
PERSON.SSN |
FREQUENCY |
NNN/NN/NNNN |
11 |
|
51 |
NNN-NN-NNN |
64 |
NNNNNNNNN |
92 |
NNN-NN-NNNN |
7614 |
PF_PROFILE_PATTERN_FRQ_2_1–10000.csv
PERSON.DOB |
FREQUENCY |
NNNN/NN/NN |
14 |
|
22 |
NNNNNNNN |
62 |
NN/NN/NN |
84 |
NN/NN/NNNN |
9766 |