Improving Performance of Sequence Patterns

Data miners can apply the following tips to improve the performance of sequence patterns:

  • Choose an unusual occurrence to serve as the initial event in a pattern whenever possible. The sequence matcher creates a match state each time an event satisfies the initial condition for the pattern, and holds that state in memory while it checks for the remaining conditions. Selecting an unusual event to start the sequence reduces the number of match states that are created during a run.
  • Limit the rows retrieved in the datasets that a pattern uses. Specifically, if certain records cannot satisfy the pattern criteria, it is best to filter them out up front. If certain detection conditions can be checked in the dataset, it is generally more efficient to let the database filter out the records that fail them than to apply this logic in the pattern. It is also helpful to determine the lookback period that is necessary to identify the behaviors of interest, so that the patterns do not look through a larger volume of historical data than is necessary.
  • Use dataset sorting to the maximum extent possible. To optimize performance, sort records that are used together to form an alert into adjacent positions in the dataset. For example, if a pattern is looking for a behavior within one account, sort the datasets it uses so that the records are clustered together by account. Doing this minimizes the amount of data through which the pattern must search to find a match, and reduces the number of partial matches that are held in memory at any one time.
  • Apply criteria that terminate a search as soon as it becomes evident that a particular case cannot result in a match. Ending a search as early as possible prevents the pattern from performing unnecessary work or creating extra match states. For example, if the activity of interest in a particular scenario must occur in the same account and the same security within a specified time frame, the pattern should have explicit conditions that cause it to terminate when any of these conditions can no longer be satisfied.
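
The effect of these tips can be illustrated with a small toy model. The Python sketch below is not the product's sequence matcher; the Event record, the prepare and count_match_states functions, and the sample data are all hypothetical names invented for illustration. It filters and sorts a dataset up front, then runs a simplified two-step matcher that drops partial matches as soon as the account changes or the time window is exceeded, and reports how many match states were created. The two runs look for different sequences, so their match counts are not comparable; the point is only the difference in the number of states created and held.

```python
from dataclasses import dataclass


@dataclass
class Event:
    account: str
    kind: str
    time: int


def prepare(events, kinds, earliest_time):
    """Filter out rows that cannot match, then cluster by account and time.

    In practice this filtering and sorting would be pushed into the dataset
    query; it is done in Python here only to keep the sketch self-contained.
    """
    kept = [e for e in events if e.kind in kinds and e.time >= earliest_time]
    return sorted(kept, key=lambda e: (e.account, e.time))


def count_match_states(events, first_kind, second_kind, window):
    """Toy matcher for 'first_kind followed by second_kind' (illustrative only).

    Assumes events are sorted by account, then time.
    Returns (matches, states_created, peak_open_states).
    """
    open_states = []  # partial matches whose initial event has been seen
    matches = states_created = peak = 0
    for ev in events:
        # Early termination: discard states that can no longer match because
        # the account changed or the time window has been exceeded.
        open_states = [
            s for s in open_states
            if s.account == ev.account and ev.time - s.time <= window
        ]
        if ev.kind == second_kind:
            matches += len(open_states)  # every open state completes
            open_states = []
        elif ev.kind == first_kind:
            open_states.append(ev)  # a new match state is created
            states_created += 1
        peak = max(peak, len(open_states))
    return matches, states_created, peak


# Hypothetical sample data: trades are common, wires are rare.
raw = [
    Event("A2", "trade", 10), Event("A1", "trade", 3), Event("A2", "wire", 3),
    Event("A1", "deposit", 2), Event("A1", "trade", 1), Event("A2", "trade", 2),
    Event("A1", "wire", 4), Event("A2", "trade", -50), Event("A1", "trade", 2),
    Event("A2", "trade", 1),
]

# Drop irrelevant kinds and rows outside the lookback period, then sort.
events = prepare(raw, kinds={"trade", "wire"}, earliest_time=0)

# Pattern starting with the common event: a state is created for every trade.
common_first = count_match_states(events, "trade", "wire", window=5)

# Pattern starting with the rare event: a state is created only for each wire.
rare_first = count_match_states(events, "wire", "trade", window=5)

print(common_first)  # (5, 6, 3)
print(rare_first)    # (0, 2, 1)
```

In this toy run, starting on the common event creates six match states with up to three held at once, while starting on the rare event creates two with at most one held at a time. Because the data is sorted by account, a change of account is enough to discard every open state immediately.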