1.3.7.5 Equal Attributes Profiler

The Equal Attributes Profiler searches records across a number of attributes for pairs of attributes where values are frequently equal - for example where FirstName and GivenName attributes are both stored, and normally the same. A threshold option is used to drive whether or not to relate pairs of attributes together, depending on the percentage of values in each attribute that have the same value.

Use the Equal Attributes Profiler to find possibly redundant attributes, or pairs of attributes where values are normally equal, but in some cases are not. The Equal Attributes Profiler can help find bad data where two values in related attributes do not relate to each other when they should.

The following table describes the configuration options:

Configuration Description

Inputs

Specify any attributes that you want to examine for equal attribute linkage.

Options

Describes options you can specify.

Equal attribute threshold

Controls the percentage of values that must be equal in two attributes for those two attributes to be considered as related, and to appear in the results. Specified as a percentage. Default is 80%. Note that the value must be between 50% and 100% inclusive.

Treat nulls as equal?

Controls whether or not pairs of Null values are considered to be equal, and therefore whether or not they will be considered when appraising the Equal attribute threshold (above). Specified as Yes or No. Default is Yes.

Outputs

Describes any data attribute or flag attribute outputs.

Data Attributes

None.

Flags

None.

The Equal Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.

When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.

The Equal Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of equal values. The following table describes the statistics for each pair of related (equal) attributes:

Statistic Description

Equal

.The number of records where the values for both the related attributes were the same.

Null pairs

The number of records where the values for both the related attributes were null.

Note: If the option to treat nulls as equal is selected, this will be zero, as the null pairs will be included in the Equal statistic.

Not equal

The number of records where the values for the related attributes were not the same.

Click on the Additional Data button to display the above statistics as percentages of the records analyzed.

Drill-down on the number of records where the pair of attributes matched exactly to see a breakdown of the frequency of occurrence of each matching value. Drill-down again to see the records.

Alternatively, drill-down on the number of records where the pair of attributes were not equal to see the records directly. If there should be a relationship between attributes, these will be the records where the relationship is broken.

Example

In this example, a Customer table is analyzed to see if any of its attributes are commonly equal to each other, using the default configuration. The Equal Attributes Profiler finds that the DT_PURCHASED and DT_ACC_OPEN attributes are normally equal:

Field 1 Field 2 Equal Null Pairs Not Equal

DT_PURCHASED

DT_ACC_OPEN

1983

16

11

By drilling down on the number of records where the two fields were equal, you can see a view of all the pairs of equal values:

DT_ACC_OPEN DT_PURCHASED Count

03/02/1997

03/02/1997

5

30/11/1993

30/11/1993

4

09/08/1996

09/08/1996

4

10/09/1993

10/09/1993

4

07/12/1992

07/12/1992

4

07/08/1996

07/08/1996

4

25/05/1993

25/05/1993

4

24/02/1994

24/02/1994

4

21/11/1996

21/11/1996

4

17/12/1996

17/12/1996

4

13/11/1992

13/11/1992

4

27/08/1992

27/08/1992

4

05/10/1992

05/10/1992

4

27/09/1992

27/09/1992

3