Patterns

As you create a pattern filter for a transaction model, you can select among the following patterns.

  • Mean: This pattern calculates the mean for the values of an attribute. It also calculates means for subsets of those attribute values, and identifies subset means that are too far above or below the overall mean. For example, the pattern may calculate an average for the Amount attribute of the Expense Report Details business object. It may then calculate the average amount for each person who has submitted an expense report, and identify per-person averages that are outliers from the overall average. A sketch of this calculation follows the parameter list. Parameters include:

    • Greater Than and Less Than: Set percentages above and below the overall mean at which values are considered outliers.

    • Variance: Select an attribute that determines how records are grouped into subsets. In the current example, this would be the Person Identifier attribute of the Expense Report Details business object.
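
    A minimal sketch of this calculation, assuming illustrative expense records and threshold percentages (the field names are not the product's own):

    ```python
    from statistics import mean

    # Illustrative expense records: (person identifier, amount).
    records = [
        ("P1", 120.0), ("P1", 95.0), ("P2", 110.0),
        ("P2", 105.0), ("P3", 900.0), ("P3", 840.0),
    ]

    greater_than = 100  # percent above the overall mean
    less_than = 90      # percent below the overall mean

    overall = mean(amount for _, amount in records)

    # Group amounts by the Variance attribute (Person Identifier here).
    by_person = {}
    for person, amount in records:
        by_person.setdefault(person, []).append(amount)

    # Flag subset means that fall outside the configured thresholds.
    for person, amounts in by_person.items():
        subset_mean = mean(amounts)
        deviation = (subset_mean - overall) / overall * 100
        if deviation > greater_than or deviation < -less_than:
            print(f"{person}: mean {subset_mean:.2f} is an outlier "
                  f"({deviation:+.1f}% versus overall {overall:.2f})")
    ```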

  • Benford: Benford's Law states that even in widely varied sets of numeric data, the frequency distribution of leading digits is predictable. For example, approximately 30 percent of values begin with the digit 1 (if values are expressed in base 10).

    This pattern compares the distribution of leading digits in sets of numbers with the distribution predicted by Benford's Law, and identifies discrepancies. To define the data sets, specify one or more attributes that return number values. Discrepancies are actual frequencies that fall some percentage above or below the frequencies Benford's Law predicts. Set Greater Than and Less Than parameters to define these percentages.
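
    Benford's Law predicts that the leading digit d occurs with frequency log10(1 + 1/d), which is about 30.1 percent for the digit 1. A minimal sketch of the comparison, with illustrative values and thresholds:

    ```python
    import math
    from collections import Counter

    values = [1234.5, 1820.0, 102.0, 2750.0, 1410.0, 965.0, 310.0, 1185.0]

    greater_than = 40  # percent above the Benford frequency
    less_than = 40     # percent below the Benford frequency

    # Count the leading (first nonzero) digit of each value.
    leading = [int(str(abs(v)).lstrip("0.")[0]) for v in values]
    actual = Counter(leading)
    total = len(leading)

    for digit in range(1, 10):
        expected = math.log10(1 + 1 / digit)   # Benford frequency
        observed = actual.get(digit, 0) / total
        deviation = (observed - expected) / expected * 100
        if deviation > greater_than or deviation < -less_than:
            print(f"digit {digit}: observed {observed:.1%}, "
                  f"expected {expected:.1%} ({deviation:+.0f}%)")
    ```

    With a sample this small, most digits are flagged; a real analysis uses far larger data sets.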

  • Clustering: This pattern distributes data records into clusters. It applies K Means analysis to attribute values: it distributes values into a number of clusters (that number being expressed by the variable k) so that each value belongs to the cluster with the nearest mean. For best results, select attributes that return large data sets.

    The pattern determines how many clusters to create based on the number of records it evaluates. However, you can influence this number by setting a Resolution parameter, whose values are Very High, High, Medium, Low, and Very Low. The Very High value results in the most clusters, and the Very Low value in the fewest clusters.
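
    The product's clustering internals aren't documented here; as a minimal sketch of the underlying idea, this uses scikit-learn's KMeans with an illustrative k (a higher Resolution setting would correspond to a larger k):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Illustrative attribute values (for example, transaction amounts).
    amounts = np.array([[12.0], [14.5], [13.2], [98.0], [102.5], [99.9], [55.0]])

    k = 2  # illustrative cluster count

    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(amounts)

    # Each value belongs to the cluster with the nearest mean.
    for cluster in range(k):
        members = amounts[labels == cluster].ravel()
        print(f"cluster {cluster}: mean {members.mean():.1f}, "
              f"values {members.tolist()}")
    ```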

  • Anomaly Detection: This pattern calculates a normal distribution of values for a specified attribute, then compares it with the actual distribution of values. Pattern results appear in a graph, in which you can identify anomalies: actual values that stand out sharply from the expected (normal-distribution) values. For best results, specify attributes that return large data sets.
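
    A minimal sketch of the comparison, assuming a simple z-score test against a fitted normal distribution (the two-standard-deviation cutoff is illustrative):

    ```python
    from statistics import mean, stdev

    # Illustrative attribute values; most cluster near 100, one stands out.
    values = [98.0, 101.5, 99.2, 100.8, 97.5, 102.0, 100.1, 250.0]

    mu = mean(values)
    sigma = stdev(values)

    # Flag actual values that stand far from the expected
    # (normal-distribution) values.
    for v in values:
        z = (v - mu) / sigma
        if abs(z) > 2:
            print(f"{v} is anomalous (z-score {z:+.2f})")
    ```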

  • Absolute Deviation: This pattern calculates absolute deviations for values of an attribute. Absolute deviation is the difference (expressed as a positive number) between each value in a set of values and the average for all values within that set.

    The pattern defines multiple sets, and returns deviations for each set. To define sets, you select an attribute for an Aggregation Pivot parameter and another attribute for a Categorization parameter. The pattern then calculates absolute deviation per Categorization value within each Aggregation Pivot value.

    For example, suppose you want to apply the pattern to expenses incurred by employees within each business unit of a company. Begin by selecting the Amount attribute of the Expense Report Details business object. Select Person Identifier for the Categorization parameter and Business Unit for the Aggregation Pivot parameter.

    The result is a scatter plot. Its x axis represents Aggregation Pivot values (business units in the example), and its y axis represents absolute deviation values. Each point on the graph represents the count of records at a given combination of Aggregation Pivot value and absolute deviation value.

    Other parameters for this pattern include Scale and Sensitivity. In most cases, select Linear for the Scale parameter. When values are widely spread, however, you may choose one of the Logarithm options for better graphing. The Sensitivity parameter enables you to choose whether to plot all results or a subset ranging from normal to highly anomalous.
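
    A minimal sketch of one plausible reading of this calculation, using the expense example (per-person totals compared with the average total within each business unit):

    ```python
    from statistics import mean

    # Illustrative records: (business unit, person identifier, amount).
    records = [
        ("BU1", "P1", 100.0), ("BU1", "P2", 140.0), ("BU1", "P3", 90.0),
        ("BU2", "P4", 500.0), ("BU2", "P5", 220.0), ("BU2", "P6", 240.0),
    ]

    # Total amounts per Categorization value (person) within each
    # Aggregation Pivot value (business unit).
    totals = {}
    for unit, person, amount in records:
        totals.setdefault(unit, {}).setdefault(person, 0.0)
        totals[unit][person] += amount

    # Absolute deviation: the positive difference between each person's
    # total and the average of all totals within the same business unit.
    for unit, per_person in totals.items():
        unit_mean = mean(per_person.values())
        for person, total in per_person.items():
            deviation = abs(total - unit_mean)
            print(f"{unit}/{person}: absolute deviation {deviation:.1f}")
    ```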

  • Pareto: The Pareto Principle asserts that, for many events, roughly 80 percent of the effects come from 20 percent of the causes. This pattern uses the Pareto Principle to divide a set of records into ever-smaller groups.

    It sorts an initial set of records so that values of an attribute you select (or derivatives of those values) are in descending order. It selects the top 20 percent of those records. It performs repeated iterations, with each selecting the top 20 percent of the records selected in the previous iteration. The second iteration, for example, creates a group that consists of 4 percent of the original set (20 percent of the first 20 percent); the group created in the first iteration therefore retains 16 percent of the original data set. A sketch of this selection logic follows the Derivative options below.

    The pattern determines how many iterations to perform based on the number of records it evaluates. However, you can influence this number by setting a Resolution parameter, whose values are Very High, High, Medium, Low, and Very Low. The Very High value results in the largest number of iterations, and the Very Low value in the smallest number of iterations.

    You may also set a Derivative parameter, which determines whether the pattern works with attribute values or with derivatives of those values. Derivative options include:

    • None: The pattern sorts attribute values from high to low, then begins the process of selecting records for groups.

    • First Derivative: The pattern sorts attribute values from high to low, subtracts each value from the value immediately above it, sorts the resulting values, and then begins the process of selecting records for groups.

    • Second Derivative: The pattern uses first-derivative values to perform a second derivative calculation before selecting records for groups.
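
    Here is a minimal sketch of the selection logic with the Derivative parameter set to None, using an illustrative iteration count (the actual count depends on the record volume and the Resolution setting):

    ```python
    values = sorted(range(1, 101), reverse=True)  # 100 illustrative values

    iterations = 3  # a higher Resolution would mean more iterations

    groups = []
    selected = values
    for _ in range(iterations):
        cut = max(1, round(len(selected) * 0.20))
        selected = selected[:cut]  # top 20% of the previous selection
        groups.append(selected)

    # Each earlier group retains the records its successor did not take:
    # for 100 records, 20 retains 16, 4 retains 3, and 1 retains 1.
    for i, group in enumerate(groups):
        taken = len(groups[i + 1]) if i + 1 < len(groups) else 0
        print(f"iteration {i + 1}: selected {len(group)}, "
              f"retains {len(group) - taken}")
    ```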

  • Normalize: This pattern establishes a common scale for values measured initially on differing scales. It sorts input attribute values in ascending order, then assigns a normalized score to each value: the ratio of its individual rank to the maximum rank. The pattern then multiplies each normalized score by a user-specified multiplier. To use the pattern, select one or more attributes whose values are of the long, int, float, or double data type, and specify a multiplier value.
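
    A minimal sketch of the scoring, with illustrative values and multiplier (how the product breaks ties isn't addressed here):

    ```python
    values = [42.0, 7.5, 19.0, 88.0, 3.2]
    multiplier = 100  # user-specified multiplier

    # Sort ascending and score each value as rank / maximum rank.
    ranked = sorted(values)
    max_rank = len(ranked)
    for rank, value in enumerate(ranked, start=1):
        score = rank / max_rank * multiplier
        print(f"{value}: normalized score {score:.1f}")
    ```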

  • Lexical Tokenization: This pattern separates the values of a specified attribute into parts. It adds columns to the values returned by the filter that cites the pattern; each of these columns reports one of the parts that attribute values are separated into. Typically, a model that uses this pattern in one filter would contain at least one more filter that cites values in one of the columns that the Lexical Tokenization pattern creates.

    For example, the Address: Postal Code attribute of the Supplier Site Location business object may contain nine-digit postal codes, with the first five digits separated from the last four by a hyphen. You may want to work only with the first five digits. You can specify the hyphen as the delimiter for the Lexical Tokenization pattern; results would include one column reporting only the first five digits, and another column reporting only the last four digits, of each postal code. A sketch that applies this example follows the parameter list.

    Parameters include the following:

    • Delimiter determines the point where attribute values are separated. This may be a character (such as the hyphen in the postal code example) or a regular expression (whose use requires some knowledge of software coding languages and conventions).

    • Maximum Limit sets the maximum number of columns that attribute values can be separated into.

    • Prefix sets a text value that appears in the heading for each return column the pattern creates. (For each column, this prefix is followed by a sequential number that distinguishes it from other return columns.)

    • Type specifies whether return values should be formatted as text, number, or date.
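
    A minimal sketch of the postal code example, applying illustrative Delimiter, Maximum Limit, and Prefix settings:

    ```python
    import re

    postal_codes = ["94065-1234", "10019-5507", "60601"]

    delimiter = "-"   # Delimiter parameter
    max_limit = 2     # Maximum Limit parameter
    prefix = "token"  # Prefix parameter

    for code in postal_codes:
        # Split into at most max_limit columns on the delimiter.
        parts = re.split(re.escape(delimiter), code, maxsplit=max_limit - 1)
        columns = {f"{prefix}{i}": part for i, part in enumerate(parts, start=1)}
        print(columns)
    ```

    The first column of each result reports the five-digit code; where a hyphen and four more digits are present, a second column reports them.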