Improving Performance of Rule Patterns

Data miners should apply the following tips to improve the performance of rule patterns:

  • Limit the rows retrieved in the datasets that a pattern uses. If certain records cannot satisfy the pattern criteria, apply logic to filter them out up front. If certain detection conditions can be checked in the dataset, it is generally more efficient to let the database filter out those records than to apply this logic in the pattern. It is also helpful to determine the lookback period needed to identify the behaviors of interest, so that the patterns do not look through a larger volume of historical data than is necessary.
  • Apply the most restrictive pattern conditions in the primary rule. Rule patterns access data incrementally, retrieving records only for the entities that satisfy the conditions of the primary rule. Applying the most limiting conditions up-front helps to reduce the total number of records that are retrieved and evaluated in the detection process.
  • Use periodic checkpoints to terminate the search at the point when the conditions for a match can no longer be satisfied. Rule patterns include explicit checkpoints that must be satisfied in order for the detection process to move forward. If the conditions in a checkpoint are not met for a particular entity, the detection process does not continue to retrieve data for that entity. Developers should apply checkpoints in patterns wherever they can substantially reduce the population of records being considered for a match.
  • Use aggregate functions and conditional expressions to avoid evaluating the same data records more than once. The rule matcher has several built-in functions that perform aggregate analysis of the activity of a particular entity. These functions can be combined with conditional expressions to analyze a set of records in two different ways without retrieving the records twice. For example, if a scenario requires both the total monetary amount for all transactions and the total amount for the subset of transactions that are considered high-risk, you can compute both with two aggregate functions such as:

    TOT_TRANS_AM = Sum(trans_am)

    HR_TRANS_AM = Sum(cond(risk_lvl_nb > @high_risk_nb; trans_am; 0))

    Note:

    If you want to create a bound variable that compares these results (for example, to compute a high-risk transaction percentage), you can do so in the same rule as long as the new binding appears further down on the list than the bindings that it is comparing.
  • Use datasets that have a leading index on the focal entity attribute. All sub-rule datasets retrieve records for the subset of entities that have satisfied the conditions of the primary rule, as well as the conditions of any checkpoints that appear before the sub-rule. For example, if the scenario is account-focused, the sub-rule datasets retrieve data only for those ACCT_INTRL_IDs that are candidates for a match based on the constraints of the primary rule and any preceding checkpoints. The retrieval of data from the sub-rule datasets is much more efficient if it can access the records for this entity using an index (in the case of this example, an index on ACCT_INTRL_ID is required).
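
The aggregate-function tip above can be illustrated outside the rule matcher. The following Python sketch is not pattern syntax; it only mimics the idea of computing TOT_TRANS_AM and HR_TRANS_AM in a single pass over the same records, and then deriving a high-risk percentage from the two results. The Transaction class, the aggregate_once function, the sample values, and the use of a "greater than" comparison against the high-risk threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    trans_am: float   # monetary amount of the transaction
    risk_lvl_nb: int  # risk level assigned to the transaction


def aggregate_once(transactions, high_risk_nb):
    """Compute both totals in one pass instead of scanning the records twice."""
    tot_trans_am = 0.0
    hr_trans_am = 0.0
    for t in transactions:
        tot_trans_am += t.trans_am
        # Mimics cond(condition; trans_am; 0): add the amount only when the
        # record is high-risk (the ">" comparison is an assumption).
        hr_trans_am += t.trans_am if t.risk_lvl_nb > high_risk_nb else 0.0
    # Derived value computed after both totals exist, like a binding that
    # appears further down the list than the bindings it compares.
    hr_pct = hr_trans_am / tot_trans_am if tot_trans_am else 0.0
    return tot_trans_am, hr_trans_am, hr_pct


sample = [Transaction(100.0, 2), Transaction(250.0, 8), Transaction(50.0, 9)]
print(aggregate_once(sample, high_risk_nb=7))  # (400.0, 300.0, 0.75)
```

Each record is read once and contributes to both sums, which is the same saving the pattern gains by using two aggregate functions over one retrieved set of records.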