Association Rules Algorithm

You use the association rules algorithm to discover rules in a series of events.

About the Algorithm

The typical application for this algorithm is market basket analysis: determining which other items people buy when they buy particular items. For example, a market basket analysis might find that men who buy beer also buy diapers.

You define support and confidence parameters for the algorithm. The algorithm selects sufficiently frequent subsets from a predefined set of items. As input, it reads a sequence of item sets and looks for each item set (or subset) whose frequency in the whole sequence is greater than the support level. Each such item set is broken into antecedent-consequent pairs, called rules. The confidence of a rule is the ratio of the frequency of the item set to the frequency of its antecedent, both measured over all item sets in the sequence. Rules with confidence greater than the given confidence level are added to the list of confident rules.
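
The procedure described above can be sketched in a few lines of Python. This is a minimal brute-force illustration under stated assumptions, not the product's implementation: the function name find_rules, its parameters, and the sample baskets are hypothetical, and the sketch enumerates every item set instead of using any shortcut.

```python
from itertools import combinations

def find_rules(baskets, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    baskets = [frozenset(b) for b in baskets]
    total = len(baskets)

    def frequency(item_set):
        # Fraction of baskets that contain every item in item_set.
        return sum(1 for b in baskets if item_set <= b) / total

    items = sorted(set().union(*baskets))
    rules = []
    # Brute force: enumerate every item set of two or more items.
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            item_set = frozenset(combo)
            support = frequency(item_set)
            if support <= min_support:
                continue  # frequency is not greater than the support level
            # Break the item set into antecedent-consequent pairs.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    antecedent = frozenset(ante)
                    confidence = support / frequency(antecedent)
                    if confidence > min_confidence:
                        rules.append(
                            (antecedent, item_set - antecedent, support, confidence)
                        )
    return rules

# Hypothetical sample data: four baskets.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "chips"},
    {"milk", "diapers"},
]
rules = find_rules(baskets, min_support=0.4, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, round(confidence, 3))
```

With this sample data, only the item sets {beer, chips} and {beer, diapers} exceed the support level, and each yields two rules that exceed the confidence level.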

The algorithm uses logical shortcuts during computation, so it does not need to consider every combination of items, the number of which can be impractically large. Even so, its execution speed depends on the number of attributes to consider and on how frequently those attributes occur.
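
This document does not say which shortcuts the algorithm uses. One widely used shortcut, assumed here purely for illustration, is the level-wise pruning of the Apriori method: an item set can be frequent only if all of its subsets are frequent, so larger candidates are built only from smaller item sets that are already known to be frequent. The function name frequent_item_sets and the sample baskets are hypothetical.

```python
from itertools import combinations

def frequent_item_sets(baskets, min_support):
    """Level-wise search that only extends item sets already found frequent."""
    baskets = [frozenset(b) for b in baskets]
    total = len(baskets)

    def keep_frequent(candidates):
        # Score each candidate and keep those above the support level.
        scored = {
            c: sum(1 for b in baskets if c <= b) / total for c in candidates
        }
        return {c: f for c, f in scored.items() if f > min_support}

    # Level 1: single items.
    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    frequent = {}
    size = 1
    while current:
        frequent.update(current)
        size += 1
        # Join frequent item sets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune: every subset one item smaller must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = keep_frequent(candidates)
    return frequent

# Hypothetical sample data: four baskets.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "chips"},
    {"milk", "diapers"},
]
frequent = frequent_item_sets(baskets, min_support=0.4)
for item_set, support in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(item_set), support)
```

In this example the three-item candidate {beer, chips, diapers} is discarded without being counted, because its subset {chips, diapers} is not frequent. The cost still grows with the number of distinct items and with how often they occur together, since more frequent items produce more candidates at each level.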

Parameter Values

The following table shows the parameters for the association rules algorithm.

Parameter Name

Description

Support defines the minimal support level that a rule must meet to be returned by the algorithm.

Support is the frequency with which a particular association occurs in the database. For example, if 23 transactions out of one thousand contain both diapers and beer, the support for this association is 2.3%. Rules that do not meet the minimal support level are not returned by the algorithm.

Type: double

Range: 0 - 1, where 1 is equivalent to 100%

Example: Setting .023 means the support is 2.3%.

Default: 1. Support is very specific to the data being mined. Be certain to change the default value.

Confidence defines the minimal confidence level that an association must meet to be returned by the algorithm.

Confidence is the frequency with which an association occurs in the database relative to the frequency with which the antecedent in the association occurs. For example, if A is associated with B, confidence is the frequency of A and B both occurring divided by the frequency of A occurring. Associations that do not meet the minimal confidence level are not returned by the algorithm.

Type: double

Range: 0 - 1, where 1 is equivalent to 100%

Example: Setting .4 means the confidence is 40%.

Default: 1. Confidence is very specific to the type of data being mined, so be certain to change the default value.

MaxItem defines the maximum number of items in item sets to be considered by the algorithm.

This parameter limits the size of the item sets to be considered. An item set that contains more than the specified maximum is not considered by the algorithm. Setting too high a maximum can slow the operation of the algorithm. Setting too low a maximum can cause the algorithm to miss important associations.

Type: integer

Example: Setting 9 means that item sets of 10 or more items are not considered by the algorithm. Note that the number of items refers to distinct types of items. For example, an item set with apples, oranges, pears, cherries, and mangoes contains 5 items. The actual number of apples, oranges, pears, cherries, and mangoes does not matter for this parameter.

Default: 0
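
The arithmetic behind the three parameters can be checked with a short Python sketch. The support figures come from the Support example above; the count of transactions that contain diapers and the fruit quantities are hypothetical values chosen only for illustration.

```python
# Support: 23 of 1,000 transactions contain both diapers and beer.
transactions = 1000
diapers_and_beer = 23
support = diapers_and_beer / transactions      # 0.023, that is, 2.3%

# Confidence of "diapers -> beer": frequency of both items divided by the
# frequency of the antecedent. Hypothetical assumption: 50 transactions
# contain diapers.
diapers = 50
confidence = diapers_and_beer / diapers        # 0.46, that is, 46%

# MaxItem counts distinct item types, not quantities (hypothetical quantities).
basket = {"apples": 6, "oranges": 2, "pears": 1, "cherries": 40, "mangoes": 3}
item_count = len(basket)                       # 5 item types
max_item = 9
considered = item_count <= max_item            # True: 5 does not exceed 9

print(support, confidence, item_count, considered)
```

Under these assumptions, the association would be returned with Support set to .02 and Confidence set to .4, and the fruit item set is within a MaxItem setting of 9.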

Accessor Values

The following table shows the required and optional accessors for the association rules algorithm.

Expression

Sample Expression

Predictor.Basket specifies the member or member set to use for the predictor domain.

{[Hot_drinks].Children, [Wine].Children, [Beer].Children}

Predictor.Sequence defines the sequence to be traversed for the predictor, generally a time dimension range.

{[Jan 1].Level.Members}

Predictor.External (optional) defines the scope of the predictor.

Predictor (optional) specifies additional restrictions from other dimensions.

{([2001], [Actual], [Sales], [Phoenix])}

Model Data

This information will be made available in a later release.

Result Data

This information will be made available in a later release.
