1.3.7.2 Contained Attributes Profiler

The Contained Attributes Profiler searches records across a number of attributes for pairs of attributes where one attribute value often contains the other attribute's value. A threshold option determines whether a pair of attributes is treated as related, based on the percentage of records in which one attribute value contains the other.
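The pairwise check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the processor's actual implementation: the function names `contains` and `related_pairs` are hypothetical, blank values are assumed never to match, and a match in either direction is assumed to count. The `City` values and the last record are made up for the example.

```python
from itertools import combinations


def contains(a, b, ignore_case=True):
    """Return True if either value contains the other.

    Assumption: blank or missing values never count as a match.
    """
    if not a or not b:
        return False
    if ignore_case:
        a, b = a.upper(), b.upper()
    return a in b or b in a


def related_pairs(records, attributes, threshold_pct=80.0, ignore_case=True):
    """Return (attr1, attr2, contained, not_contained) for each related pair."""
    results = []
    total = len(records)
    if total == 0:
        return results
    for left, right in combinations(attributes, 2):
        contained = sum(
            1 for rec in records
            if contains(rec.get(left), rec.get(right), ignore_case)
        )
        # A pair is reported only if enough records show the relationship.
        if 100.0 * contained / total >= threshold_pct:
            results.append((left, right, contained, total - contained))
    return results


# Illustrative records; City values and the last record are invented.
records = [
    {"FirstName": "LINDA", "EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "City": "LEEDS"},
    {"FirstName": "PAUL", "EmailAddress": "PAUL.MARKAR@DISCOUNT-FEVER.COM", "City": "YORK"},
    {"FirstName": "JOHN", "EmailAddress": "JOHN@DARWINS.COM", "City": "BATH"},
    {"FirstName": "NORMAN", "EmailAddress": "NORMAN.SCANLON@ECA.COM", "City": "HULL"},
    {"FirstName": "ALEX", "EmailAddress": "INFO@EXAMPLE.COM", "City": "DERBY"},
]

print(related_pairs(records, ["FirstName", "EmailAddress", "City"]))
# → [('FirstName', 'EmailAddress', 4, 1)]
```

With four of five records matching (80%), the FirstName and EmailAddress pair just meets the default threshold; raising the threshold to 90% would drop it from the results.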

Use the Contained Attributes Profiler to find attributes which are, or should be, related. Where there is strong attribute linkage, this may indicate a potentially redundant attribute.

Alternatively, attributes may be intended to be related, but the relationship may be broken in some records; for example, one column value may be blank when it could be derived from another column's value.

The following table describes the configuration options:

| Configuration | Description |
|---|---|
| Inputs | Specify any attributes that you want to examine for contained attribute linkage. |
| Options: Contained attribute threshold % | Controls the percentage of records in which one attribute value must contain the other, using Contains matching, for the two attributes to be considered related and to appear in the results. Specified as a percentage. Default value is 80%. The value must be between 50% and 100% inclusive. |
| Options: Ignore case? | Controls whether or not case is ignored when checking if one attribute value contains another. Specified as Yes or No. Default is Yes. |
| Outputs | Describes any data attribute or flag attribute outputs. |
| Data Attributes | None. |
| Flags | None. |
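As a rough illustration of the two options and their documented defaults and limits, the following Python sketch models them as a small validated settings object. The class name `ContainedAttributesOptions` and its field names are hypothetical and do not correspond to any product API; only the default values and the 50% to 100% range come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainedAttributesOptions:
    """Hypothetical container for the two documented options."""

    threshold_pct: float = 80.0  # "Contained attribute threshold %", default 80%
    ignore_case: bool = True     # "Ignore case?", default Yes

    def __post_init__(self):
        # The documented range is 50% to 100% inclusive.
        if not 50.0 <= self.threshold_pct <= 100.0:
            raise ValueError(
                f"threshold_pct must be between 50 and 100, got {self.threshold_pct}"
            )


defaults = ContainedAttributesOptions()
strict = ContainedAttributesOptions(threshold_pct=95.0, ignore_case=False)
```

Rejecting out-of-range thresholds at construction time mirrors the documented constraint; how the product itself reports an invalid value is not covered here.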

The Contained Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.

When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.

The Contained Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of related values, where one attribute value often contains the other. The top-level view shows the following statistics for each pair of related attributes:

| Statistic | Description |
|---|---|
| Contained | The number of records where one of the related attribute values contained the other. |
| Not contained | The number of records where neither of the related attribute values contained the other. |

Click on the Additional Data button to display the above statistics as percentages of the records analyzed.
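The percentage view is simple arithmetic over the two counts. The sketch below uses the counts from the example in this section (1829 contained, 172 not contained) and assumes that the number of records analyzed is the sum of the two statistics; the function name `as_percentages` is hypothetical.

```python
def as_percentages(contained, not_contained):
    """Express both counts as percentages of the records analyzed.

    Assumption: records analyzed = contained + not_contained.
    """
    total = contained + not_contained
    if total == 0:
        return 0.0, 0.0
    return 100.0 * contained / total, 100.0 * not_contained / total


contained_pct, not_contained_pct = as_percentages(1829, 172)
print(f"{contained_pct:.1f}% contained, {not_contained_pct:.1f}% not contained")
# → 91.4% contained, 8.6% not contained
```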

Drill down on the number of records where one attribute value contained the other to see a breakdown of the frequency of occurrence of each pair of values. Drill down again to see the records.

Alternatively, drill down on the number of records where neither attribute value contained the other to see the records directly. If there should be a relationship between the attributes, these are the records where the relationship is broken.
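The two drill-down views can be approximated as follows. This is a sketch under stated assumptions rather than a description of the product's behavior: the function name `drill_down` is hypothetical, blank values are assumed to fall into the "not contained" group, and the last sample record (blank FirstName) is invented to show a broken relationship.

```python
from collections import Counter


def drill_down(records, left, right, ignore_case=True):
    """Split records into a value-pair frequency list and the broken records."""
    matched = Counter()
    broken = []
    for rec in records:
        a = rec.get(left) or ""
        b = rec.get(right) or ""
        x, y = (a.upper(), b.upper()) if ignore_case else (a, b)
        if x and y and (x in y or y in x):
            matched[(a, b)] += 1   # frequency of each distinct pair of values
        else:
            broken.append(rec)     # candidates for a broken relationship
    return matched.most_common(), broken


sample = [
    {"EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "FirstName": "LINDA"},
    {"EmailAddress": "LINDA.COOKSON@M-AND-I.COM", "FirstName": "LINDA"},
    {"EmailAddress": "EILEEN_BEARD@WILSONS_PENARTH.COM", "FirstName": "EILEEN"},
    {"EmailAddress": "ALEX@EXAMPLE.COM", "FirstName": ""},  # invented record
]

pairs, broken = drill_down(sample, "EmailAddress", "FirstName")
for (email, first_name), count in pairs:
    print(email, first_name, count)
print(len(broken), "record(s) where the relationship is broken")
```

In the invented last record, the blank FirstName could plausibly be derived from the EmailAddress, which is the kind of case the "not contained" drill-down is meant to surface.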

Example

In this example, a number of attributes are checked for a Contains relationship. A relationship is found between the FirstName and EmailAddress attributes, where the FirstName is often contained in the EmailAddress. The summary data:

| Field 1 | Field 2 | Contained (desc) | Not Contained |
|---|---|---|---|
| EmailAddress | FirstName | 1829 | 172 |

Drilling down on the 1829 records where the EmailAddress contains the FirstName attribute reveals the following view of all the distinct pairs of values where the relationship was found:

| EmailAddress | FirstName | Count |
|---|---|---|
| LINDA.COOKSON@M-AND-I.COM | LINDA | 2 |
| PAUL.MARKAR@DISCOUNT-FEVER.COM | PAUL | 2 |
| SHEILA.ROBINSON@SUNRISE-HOLIDAYS.COM | SHEILA | 2 |
| NORMAN.SCANLON@ECA.COM | NORMAN | 2 |
| TONY.GIBSON@TOMBURN.COM | TONY | 2 |
| PAULINE.BEEDHAM@BLUEYONDER.CO.UK | PAULINE | 2 |
| ROWLAND.BROWN@BTINTERNET.COM | ROWLAND | 2 |
| JOHN@DARWINS.COM | JOHN | 2 |
| TEST@TEST.COM | TEST | 2 |
| EILEEN_BEARD@WILSONS_PENARTH.COM | EILEEN | 1 |
| BRIGETTE.WALLACE@UNIQUE-INTERIORS.COM | BRIGETTE | 1 |
| MICHAEL.CONNOLLY@GEMINI-VISUALS.COM | MICHAEL | 1 |
| JOYCE.AITKEN@RDM-ELECTRONICS.COM | JOYCE | 1 |
| JOANNA.TEMLETT@BTOPENWORLD.COM | JOANNA | 1 |
| MAHAJAN.DEBELLOTT@NTLWORLD.COM | MAHAJAN | 1 |