About Scenario Constructions to Avoid
We recommend avoiding several techniques in scenario design because they can lead to errors or performance problems. Scenario developers should refrain from using the following constructions:
- Jobs with looping entities: Setting up a job with a looping entity causes it to execute in multiple runs, one run for each unique value of the looping entity. The purpose is to create small job runs that include only the data for a single entity, so that the records for that entity can be evaluated quickly. However, looping often creates a large number of runs, which can result in an unacceptable completion time for the job. Instead of using looping, scenario developers should use dataset sorting (multi-dataset row ordering). This technique sorts the records for a particular entity into adjacent positions in the recordset, which is similar to the way they are arranged in a looping job. The dataset sorting approach is more efficient because it selects all of the records for the job at once, avoiding the overhead of running multiple queries to retrieve the records.
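The contrast between looping and dataset sorting can be illustrated outside the scenario tooling. The following Python sketch is an analogy only, not the product's implementation: the `records` list, the `evaluate` function, and both helper functions are hypothetical names introduced for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records (account, amount); not taken from any real dataset.
records = [("A2", 50), ("A1", 10), ("A2", 70), ("A1", 30), ("A3", 5)]


def evaluate(rows):
    """Hypothetical per-entity evaluation: total amount for one account."""
    return sum(amount for _, amount in rows)


def looping_style(all_records):
    # One "run" per unique entity; each run performs its own retrieval.
    results = {}
    for account in sorted({acct for acct, _ in all_records}):
        rows = [r for r in all_records if r[0] == account]  # separate scan per entity
        results[account] = evaluate(rows)
    return results


def sorted_style(all_records):
    # Select everything once, sort by entity, then walk the adjacent groups.
    ordered = sorted(all_records, key=itemgetter(0))
    return {
        account: evaluate(list(rows))
        for account, rows in groupby(ordered, key=itemgetter(0))
    }


looping_result = looping_style(records)
sorted_result = sorted_style(records)
```

Both functions produce the same per-account results, but the looping version retrieves the data once per entity, while the sorted version retrieves it once and relies on adjacency.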
- Failing to account for all possible terminating conditions in a sequence pattern: All sequence patterns use sorted datasets, and many patterns do not generate a match until they have evaluated all of the records that may be of interest based on the starting condition. For example, if a pattern is looking for a particular behavior within one account, the datasets that it uses are typically sorted by account, and the pattern may not check the final alert criteria until it encounters a record for a different account. This approach ensures that all of the records for one account are evaluated before a match is generated. If multiple datasets are involved, the pattern must account for every dataset in which the next account record may be found. Otherwise, the pattern terminates when it does not find the next account record in the expected dataset, which results in missed alerts.
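The effect of an unhandled terminating condition can be sketched in Python. This is a simplified analogy under stated assumptions, not the sequence matcher itself: the `wires` and `checks` datasets, the `THRESHOLD` criterion, and the `find_matches` function are all hypothetical, and end of input is assumed to terminate the last account.

```python
from heapq import merge
from operator import itemgetter

# Hypothetical datasets, each already sorted by account: (account, source, amount).
wires = [("A1", "wire", 600)]
checks = [("A1", "check", 500), ("A2", "check", 1200)]
THRESHOLD = 1000  # hypothetical alert criterion: total per account >= THRESHOLD


def find_matches(rows, terminating_sources):
    """Return accounts whose total reaches THRESHOLD.

    The final check for an account runs only when a record for a different
    account arrives from a dataset listed in `terminating_sources`.
    """
    matches = []
    current, total = None, 0
    for account, source, amount in rows:
        if account != current:
            # A different account ends the previous group, but the final
            # criteria are checked only if this dataset was anticipated.
            if current is not None and source in terminating_sources and total >= THRESHOLD:
                matches.append(current)
            current, total = account, 0
        total += amount
    # Assumption of this sketch: end of input also terminates the last group.
    if current is not None and total >= THRESHOLD:
        matches.append(current)
    return matches


rows = list(merge(wires, checks, key=itemgetter(0)))  # combined, sorted by account
complete = find_matches(rows, {"wire", "check"})
incomplete = find_matches(rows, {"wire"})  # next account expected only in wires
```

Here the record that follows account A1 comes from the `checks` dataset. When only `wires` is treated as a terminating source, A1 is never checked against the criterion and its alert is missed.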
- Looping-within-looping: The most common form of this error is to have:
    SEQ loop 1-
      OR
        ROW A loop 1-
        ROW B loop 1-
        ROW C loop 1-
The desired effect of this pattern is to find one or more As, Bs, or Cs. The pattern structure above is inefficient because if it sees an A followed by another A, the sequence matcher does not know whether to match the same A ROW again or to loop back through the SEQ to match the A. So the sequence matcher branches and does both. This can cause exponential growth in the number of states maintained in memory and slow down the algorithm significantly. A better structure is as follows:
    SEQ loop 1-
      OR
        ROW A loop
        ROW B loop
        ROW C loop
If an A is followed by an A, only one path can be followed.
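The growth in the number of paths can be estimated with a small counting exercise. The Python sketch below is a simplified model rather than the sequence matcher's actual algorithm: `count_paths` is a hypothetical function that counts the ways a run of identical consecutive records can be divided among passes through the outer SEQ loop. It assumes the nested structure lets one pass consume any number of records, and it models the single-path case described above as one record per pass.

```python
def count_paths(run_length, max_rows_per_pass):
    """Count the ways to consume `run_length` identical consecutive records
    when each pass through the outer loop consumes between 1 and
    `max_rows_per_pass` records."""
    ways = [1] + [0] * run_length  # ways[i] = number of ways to consume i records
    for i in range(1, run_length + 1):
        ways[i] = sum(
            ways[i - k] for k in range(1, min(i, max_rows_per_pass) + 1)
        )
    return ways[run_length]


run_length = 10
nested_paths = count_paths(run_length, max_rows_per_pass=run_length)  # inner loop unbounded
single_paths = count_paths(run_length, max_rows_per_pass=1)  # one record per pass
```

Under this model, ten consecutive As give 512 distinct paths in the nested form and exactly one path otherwise; each additional A doubles the nested count.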
- Misuse of bind or rebind for pattern variables: If a variable has already been bound once in a pattern and is bound again later in the pattern, use rebind. If the variable is bound again using bind, the sequence matcher does not produce an error message and also does not change the variable's value. This can have unintended results if the pattern writer thinks the variable's value is being changed. Some recommended practices are as follows:
- Bind all variables that you intend to use in a pattern in the very first row using bind, even if they have to be initialized to 0 or a dummy value. Then use rebind everywhere else in the pattern.
- Always use rebind within a looping construct.
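The silent failure described above can be mimicked in a few lines of Python. This is an illustrative model only: the `PatternVars` class and its methods are hypothetical and are not part of the pattern language. The behavior of rebind on a variable that was never bound is not described here, so the sketch simply assigns the value in that case.

```python
class PatternVars:
    """Hypothetical model of pattern variables with bind/rebind semantics."""

    def __init__(self):
        self._values = {}

    def bind(self, name, value):
        # As described above: binding an already-bound variable raises no
        # error and leaves the existing value unchanged.
        if name not in self._values:
            self._values[name] = value

    def rebind(self, name, value):
        # Assumption: rebind always assigns, even if the name is not yet bound.
        self._values[name] = value

    def get(self, name):
        return self._values[name]


amounts = [100, 250, 75]

# Misuse: bind inside the loop. The running total silently never changes.
wrong = PatternVars()
wrong.bind("total", 0)
for amount in amounts:
    wrong.bind("total", wrong.get("total") + amount)

# Recommended: bind once in the first row, then rebind inside the loop.
right = PatternVars()
right.bind("total", 0)
for amount in amounts:
    right.rebind("total", right.get("total") + amount)

wrong_total = wrong.get("total")
right_total = right.get("total")
```

The first version finishes with the initial value still in place and no error to signal the problem, which is why the practices above recommend rebind everywhere after the first row.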
- Alternation conditions that overlap or fail to cover all possible conditions: When a pattern contains an OR statement, it is usually (but not always) the case that the pattern writer intends the ROWs in the OR statement to be both mutually exclusive and to cover all possible cases. This is especially true when the OR statement is within a looping construct. Pattern writers often accidentally write OR statements that have gaps or overlaps. For example:
- An OR statement with a gap: OR: ROW x > 5, ROW x < 5. This has a gap because if x equals 5, it does not match either row and the match dies.
- An OR statement with overlap: OR: ROW x >= 5, ROW x <= 5. This has overlap because if x equals 5, it matches both rows and the match branches.
If a pattern is missing alerts that it should be catching, an OR statement may have a gap. You may need to add junk rows that match everything other than the entities in the other rows. The constraints in junk rows can be complex and are often where the gap is introduced. If a pattern that you are writing has a gap or overlap, be sure that is what you intend, and be sure to document it.
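One way to look for gaps and overlaps before deploying a pattern is to test the row conditions against boundary values. The following Python sketch assumes the conditions can be written as simple predicates; `check_alternation` and the sample values are hypothetical and are not part of any scenario tooling.

```python
def check_alternation(rows, samples):
    """Return (gaps, overlaps) for a list of row predicates.

    gaps:     sample values matched by no row (the match would die)
    overlaps: sample values matched by more than one row (the match would branch)
    """
    gaps, overlaps = [], []
    for x in samples:
        hits = sum(1 for row in rows if row(x))
        if hits == 0:
            gaps.append(x)
        elif hits > 1:
            overlaps.append(x)
    return gaps, overlaps


samples = [4, 5, 6]  # values around the boundary used in the examples above

with_gap = check_alternation([lambda x: x > 5, lambda x: x < 5], samples)
with_overlap = check_alternation([lambda x: x >= 5, lambda x: x <= 5], samples)
exclusive = check_alternation([lambda x: x >= 5, lambda x: x < 5], samples)
```

The first pair of conditions reports 5 as a gap, the second reports 5 as an overlap, and the third pair (x >= 5, x < 5) is both mutually exclusive and complete for the sampled values. Sampling can only reveal problems at the values tested, so boundary values should be chosen deliberately.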