1.3.7.6 Frequency Profiler

The Frequency Profiler examines each input attribute and returns the values it contains, organized by their frequency of occurrence.

The Frequency Profiler is a vital profiling tool used to discover the common and uncommon values in the data. Use the results of frequency profiling to build reference lists of valid and invalid values for each data attribute, for use in validation.

The following table describes the configuration options:

Configuration     Description
----------------  --------------------------------------------------------------------
Inputs            Specify any attributes that you want to analyze for value frequency.
Options           None.
Outputs           Describes any data attribute or flag attribute outputs.
Data Attributes   None.
Flags             None.

The Frequency Profiler requires a batch of records to produce its statistics (for example, to determine how often values occur in each attribute analyzed). It must therefore run to completion before its results are available, and is not suitable for a process that requires a real-time response.

When executed against a batch of transactions from a real-time data source, it finishes processing when the commit point (transaction or time limit) configured on the Read Processor is reached.

The following table describes the statistics for each attribute the Frequency Profiler analyzes. Note that each attribute is shown in a separate tab in the Results Browser.

Statistic  Description
---------  -------------------------------------------------------------------
Value      The value found.
Count      The number of times the value occurs in the attribute.
%          The percentage of records analyzed with the value in the attribute.
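The three statistics above can be illustrated with a minimal Python sketch. This is not the product's implementation; the function name frequency_profile, the sample list, and the use of the string "[Null]" as a label for missing values are assumptions made for illustration only.

```python
from collections import Counter

def frequency_profile(values):
    """Return (value, count, percent) rows, most frequent value first.

    Hypothetical helper: missing values (None) are grouped under the
    label "[Null]", which would collide with a real value of that text.
    """
    counts = Counter("[Null]" if v is None else v for v in values)
    total = sum(counts.values())  # number of records analyzed
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common()
    ]

# Illustrative sample data, not taken from any real table.
titles = ["Mr", "Ms", "Mr", None, "Mrs", "Mr", "Dr", "Ms"]
for value, count, pct in frequency_profile(titles):
    print(f"{value:<8}{count:>5}{pct:>7.1f}")
```

The percentage is computed against the total number of records in the batch, which is why the whole batch has to be read before any row of the result is final.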

Example

In this example, the Frequency Profiler is run on the Title attribute in a table of Customer records. The following summary view is displayed:

Value    Count  %
-------  -----  ----
Mr       816    40.8
Ms       468    23.4
Mrs      309    15.4
Miss     251    12.5
[Null]   139    6.9
Dr       15     0.7
Prof.    1      <0.1
Col.     1      <0.1
Rev      1      <0.1

Sorting the view by the Count column lets you quickly see the most common and least common values for each attribute analyzed, which helps you construct Reference Data lists of valid and invalid values.
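As a rough sketch of that review step, the Python below sorts the example results by count and separates frequent values from rare ones. The function split_by_count and the threshold of 10 are hypothetical choices for illustration, not product settings. A rare value is only a candidate for review: a low count does not by itself make a value invalid.

```python
def split_by_count(profile, min_count):
    """Split (value, count) rows into frequent and rare values.

    Hypothetical helper: rows are sorted by count, highest first, and
    min_count is an arbitrary cut-off chosen by the reviewer.
    """
    ordered = sorted(profile, key=lambda row: row[1], reverse=True)
    frequent = [value for value, count in ordered if count >= min_count]
    rare = [value for value, count in ordered if count < min_count]
    return frequent, rare

# Counts copied from the Title example above.
title_profile = [
    ("Mr", 816), ("Ms", 468), ("Mrs", 309), ("Miss", 251), ("[Null]", 139),
    ("Dr", 15), ("Prof.", 1), ("Col.", 1), ("Rev", 1),
]

frequent, rare = split_by_count(title_profile, min_count=10)
print("Frequent:", frequent)
print("Rare (review manually):", rare)
```

In this example the rare group contains Prof., Col., and Rev, which are plausible titles, while the frequent group contains [Null], which may need handling of its own. The split is a starting point for building the lists, not a substitute for reviewing the values.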