H Appendix – RPASCE Rule Writing Tips

This appendix includes tips and information to help you write RPASCE rules that are as efficient as possible. In many cases, a good understanding of the internal workings of the calculation engine and RPASCE I/O fundamentals is required to create good RPASCE rules. This appendix also provides solutions to some general functional problems that you may encounter.

Basic RPASCE Rules Information

The following sections provide basic RPASCE rules information. RPASCE standard practice does not encourage using RPASCE security measures in custom rules. Instead, the security settings for measures at position, template, and user level are set through the Security Administration Workbook.

Full and Incremental Evaluation Modes

As a rule writer, you must know that there are two evaluation modes used in the RPASCE calculation engine: full and incremental.

Full evaluation mode is used in instances when the whole workbook is being calculated. These instances include the following:

  • Committing, loading, or refreshing workbooks

  • Using custom menus

  • Running mace in batch mode

  • During rule group transitions, such as between the load and calculate operations that are part of a workbook build

Incremental mode is used when calculating workbooks with the Calculate function. Incremental mode evaluates only the cells that the user has changed, as well as any cells that are affected by the user's changes. For instance, if a user changes one cell in a sales measure, the RPASCE calculation engine evaluates that cell and all cells associated with calculations that use that sales measure, such as variance measures, inventory measures, and so on. However, the RPASCE calculation engine does not evaluate measures that are not affected by that sales change, such as a receipt variance of last year's sales. In addition, the RPASCE calculation engine does not calculate any unchanged cells in that sales measure, meaning that if the user changed the sales in the Week1 cell, the calculation engine evaluates only the Week1 cell and not any other week's cell. In short, unaffected cells are not calculated.
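The difference between the two modes can be pictured with a small Python sketch. It is not RPASCE code: the measure names, the dependency table, and the change-tracking logic are hypothetical stand-ins, and the real engine tracks affected cells in its own way.

```python
# Hypothetical dependency table: calculated measure -> measures it reads.
DEPENDS_ON = {
    "variance": ["sales"],
    "inventory": ["sales"],
    "ly_receipt_var": ["ly_sales"],
}
WEEKS = ["Week1", "Week2", "Week3", "Week4"]


def full_evaluation():
    """Full mode: every calculated measure is visited at every week."""
    return {(measure, week) for measure in DEPENDS_ON for week in WEEKS}


def incremental_evaluation(changed_cells):
    """Incremental mode: visit only cells affected by the user's edits."""
    affected = set()
    frontier = list(changed_cells)
    while frontier:
        measure, week = frontier.pop()
        for target, sources in DEPENDS_ON.items():
            cell = (target, week)
            if measure in sources and cell not in affected:
                affected.add(cell)
                frontier.append(cell)  # follow knock-on effects
    return affected


full = full_evaluation()
incremental = incremental_evaluation([("sales", "Week1")])
print(len(full), sorted(incremental))
```

In this toy model, an edit to the Week1 sales cell touches only the Week1 cells of the two measures that read sales, while full evaluation visits every week of every calculated measure.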

Note:

As a rule writer, you can use techniques to optimize rules, but these techniques rarely apply when rules are run in incremental mode.

Rule Group Transitions

Rule group transitions occur when the active rule group is changed. There are automatic rule group transitions when transitioning from the load rule group to the calc rule group during a workbook build as well as when transitioning between rule groups for refreshing and committing. Rule group transitions may also occur through user-defined menu options.

Rule group transitions ensure that the integrity of the rule group being transitioned to is enforced. This is done by evaluating an expression from every rule (and knock-on effects) that is not already known to be correct. The only rules known to be correct are those that were active in the rule group being transitioned from. Rule group transitions are inherently very expensive because they must operate in full evaluation mode. Logically at least, every intersection is visited, although the iteration efficiencies described in NA Values and Iterators are employed. Rule group transitions must therefore be avoided wherever possible.

NA Values and Iterators

NA values (also known as navals) are designed to store values that occur often so that the measures can be iterated efficiently. Cell values in the internal RPASCE array are stored only when the cell value differs from the naval. Maximum efficiency is achieved if the naval is the value that is most often (logically) present in the data. This is typically zero for numeric data. The sparsity levels are likely to vary considerably from application to application and customer to customer. When data is sparse, efficient iteration patterns that visit only the populated cells are needed to support fast response times. RPASCE will change the naval of the array as necessary to allow it to use this technique. Thus, the naval of the array may differ from the naval of the measure (which does not change dynamically). Since the array naval may change dynamically and expressions use the array naval instead of the measure naval, rule writers must not write expressions that assume a specific naval.
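The storage idea can be sketched in Python as follows. The SparseMeasure class and its methods are hypothetical and only illustrate the principle that cells equal to the naval are never stored; they do not describe the actual RPASCE array format.

```python
class SparseMeasure:
    """Toy array that stores only cells whose value differs from the naval."""

    def __init__(self, naval=0):
        self.naval = naval
        self.cells = {}  # position -> value, populated cells only

    def set(self, position, value):
        if value == self.naval:
            self.cells.pop(position, None)  # naval values are never stored
        else:
            self.cells[position] = value

    def get(self, position):
        # Unpopulated cells logically hold the naval.
        return self.cells.get(position, self.naval)

    def populated_count(self):
        return len(self.cells)


sales = SparseMeasure(naval=0)
sales.set(("sku1", "wk1"), 12)
sales.set(("sku2", "wk1"), 0)  # equals the naval, so nothing is stored
print(sales.populated_count(), sales.get(("sku2", "wk1")))
```

The closer the naval is to the most common logical value, the fewer cells are stored and the fewer cells a populated-cell iterator has to visit.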

In full evaluation mode, the measure is logically recalculated at every cell location, including those where the value is the naval. RPASCE employs a collection of iteration optimizations that reduces the number of physical cell evaluations. The optimization decisions depend on the following: expression syntax, navals, logical cell counts, and populated (non-naval) cell counts. Therefore, it is vital that data only be added or changed through RPASCE processes that maintain this information. Otherwise, the wrong optimization decisions may be made, resulting in poor performance.

In full evaluation mode, there are two fundamentally different iteration approaches. The basic iterator passes the cells sequentially (in the order of the positions in the hierarchies). However, this can be very expensive if the data, and thus the set of cells that need to be evaluated, is sparse. There is also an iterator that iterates over just the cells that have a value different from the naval. These cells are referred to as populated. Processing is likely to be much faster when this iterator is used, especially where the data is sparse. For example, an expression such as a = b + c is evaluated by iterating over just those cells where b or c are populated.
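As a rough illustration of the populated-cell iterator, the following Python sketch evaluates a = b + c by visiting only the positions populated in b or c. The dictionaries stand in for sparse arrays with an assumed naval of 0, and the add_sparse function is a hypothetical helper, not an engine API.

```python
def add_sparse(b, c):
    """Evaluate a = b + c by visiting only cells populated in b or c.

    b and c are dicts of position -> value with an assumed naval of 0.
    """
    a = {}
    for position in b.keys() | c.keys():
        value = b.get(position, 0) + c.get(position, 0)
        if value != 0:  # results equal to the naval are not stored
            a[position] = value
    return a


b = {1: 5, 4: 2}
c = {4: -2, 9: 7}
a = add_sparse(b, c)
visited = len(b.keys() | c.keys())
print(a, visited)
```

Only three positions are visited here, regardless of how many logical cells the measures have.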

With an expression such as a = if(condition, b, 0) where the naval of a and b is zero, the fastest way to evaluate the expression is to remove all the data for a and then perform one of the following two options:

  • Iterate over the intersections where the condition is true, calculating a=b

  • Iterate over the positions where b is populated, setting a=b if the condition is true

The intersections that are not visited already have the correct value for a because they were effectively set to zero (the naval) when the data for a was removed. If the naval of a is not zero, RPASCE sets it to zero so that the efficient iteration pattern can be deployed. RPASCE automatically selects which of the two options to use: a runtime heuristic, based on the population density of the expressions along with other factors, estimates which option is more efficient.
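The two options can be sketched in Python as shown here. The selection rule used in the sketch (pick whichever input has fewer cells to visit) is only an assumed stand-in for the runtime heuristic, whose actual criteria are not documented in this appendix, and eval_if_sparse is a hypothetical name.

```python
def eval_if_sparse(condition, b):
    """Evaluate a = if(condition, b, 0) where the naval of a and b is 0.

    condition: set of positions where the condition is true.
    b: dict of position -> value holding only populated cells.
    Returns the populated cells of a and the number of cells visited.
    """
    a = {}  # removing all data for a sets every cell to the naval, 0
    if len(condition) <= len(b):
        visited = condition  # option 1: iterate where the condition is true
    else:
        visited = b.keys()   # option 2: iterate where b is populated
    for position in visited:
        if position in condition and position in b:
            a[position] = b[position]
    return a, len(visited)


condition = {1, 2, 3, 4, 5, 6}
b = {2: 10, 8: 4}
a, visits = eval_if_sparse(condition, b)
print(a, visits)
```

Both options produce the same result; they differ only in how many cells are visited.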

Principles for Writing Efficient Rules

The following sections describe the principles for writing efficient rules in RPASCE.

Expensive Functions, Modifiers and Procedures

As a rule writer, you should understand the relative cost of rule functions, modifiers, and procedures. Knowing these costs lets you use expensive functions only where necessary and avoid evaluating them unnecessarily.

Functions

You can assume that most basic functions are inexpensive. Functions that can be more expensive are those that require large amounts of data to be processed.

  • All of the time series functions (such as tssum) can potentially require large amounts of data (depending on how long the time series being used is) and are therefore potentially expensive.

  • The same is true of the normalization functions (such as norm, resize, resizenorm).

  • Cover and uncover functions can also use large time series, depending upon the data contents.

Modifiers

The use of aggregation type modifiers is not in itself expensive, although it does mean that extra aggregation effort is required to support them.

Procedures

In general, you should assume that all procedures are expensive.

Caching Intermediate Results

If an expression must be evaluated, then the whole expression is evaluated, and the results of intermediate phrases in the calculation of the expression are not cached. Many expressions are relatively short and simple, and therefore no issues arise. There are, however, two particular cases where you must be careful when writing expressions so that you avoid expensive, redundant calculation of phrases.

Case 1

The first case is where a phrase changes infrequently and is relatively expensive to evaluate. Consider an expression in the following format:

a = b + c + functionof(d, e, f)

Here, if functionof(d, e, f) is relatively expensive to evaluate, you should keep its evaluation to a minimum. If d, e, and f change infrequently, then functionof(d, e, f) will also change infrequently. However, if b and/or c change frequently, then every time b and/or c changes, the whole expression is evaluated, including the functionof(d, e, f) phrase. This portion of the calculation is usually redundant since d, e and f change infrequently. The result from evaluating the phrase is likely to be the same as from the previous evaluation. You can avoid this inefficiency by forcing the intermediate result of the phrase to be cached by writing:

x = functionof(d, e, f)

a = b + c + x

Note that the rule syntax forces the instantiation of the result of a procedure, since procedures must be the only phrase in an expression. Since procedures often are particularly expensive to evaluate, the technique of caching intermediate results is automatically applied.

It is important to note that this method does not always increase performance. In some cases, it may actually decrease performance. For instance, if the normal usage of the rule group is in full evaluation mode, the caching intermediate results approach may be less efficient. This is because the phrase may only be evaluated once, and there could be an unnecessary write (and subsequent read) of the measure. See the Expression Iteration Examples section to learn how the expression is iterated. This can help you determine if using this method is beneficial.
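The effect described in Case 1 can be illustrated with a Python sketch that counts how often the expensive phrase runs. The body of functionof, the input values, and the loop that simulates five changes to b are all hypothetical.

```python
calls = {"functionof": 0}


def functionof(d, e, f):
    """Stand-in for an expensive phrase; counts how often it runs."""
    calls["functionof"] += 1
    return d * e + f


d, e, f = 2, 3, 4
c = 1

# Uncached: a = b + c + functionof(d, e, f), re-evaluated on every change to b.
for b in range(5):
    a_uncached = b + c + functionof(d, e, f)
uncached_calls = calls["functionof"]

# Cached: x = functionof(d, e, f) once, then a = b + c + x on every change to b.
calls["functionof"] = 0
x = functionof(d, e, f)
for b in range(5):
    a_cached = b + c + x
cached_calls = calls["functionof"]

print(uncached_calls, cached_calls)
```

The cached form gives the same result with a single evaluation of the phrase, at the cost of storing and reading x. When the expression is evaluated only once, as in full evaluation mode, that extra write and read can outweigh the saving.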

Case 2

The second case when you should avoid expensive calculations is when the same phrase is used repeatedly in the same or multiple expressions. Since the phrase is not automatically cached, it can be evaluated multiple times. There are cases with nested If statements where, down some paths, the same condition can be evaluated three or four times. Using the cache intermediate results technique in these situations could lead to a significant performance increase.

Automatic Caching of Expression Phrases

The conditions that RPASCE automatically instantiates behind the scenes are single time-based conditions where the current keyword (which translates to the index number of the current position in the time dimension) is compared (using one of the comparison operators ==, !=, >, <, >=, <=) directly with one of the following keywords: first, last, elapsed, or today. For example, the expression

if(current > elapsed, x, y)

will have the condition automatically instantiated. However, the expression

if(current > elapsed + leadtime, x, y)

where leadtime is a measure, will not have the condition automatically instantiated, and the condition current > elapsed + leadtime may need to be instantiated into a measure for improved performance.

Similarly, an expression such as

if(current > elapsed && current < last, x, y)

is not automatically evaluated by instantiating the condition. Either of the sub-conditions would be evaluated by instantiation if they stood alone, but when they are combined, RPASCE cannot automatically evaluate them. The efficient way to write such conditions is to instantiate the result of each sub-condition as described in Caching Intermediate Results.

This specific expression could be written as

z = current > elapsed && current < last

if(z, x, y)

Tips

The following sections describe tips for writing efficient rules in RPASCE.

Rule Groups

The RPASCE engine recognizes rules that are the same in the rule group being transitioned from and to by rule name. Therefore, it is important to ensure that the same rule (and not a copy of the rule with a different name) is used in both rule groups.

Non-Materialized Measures

Non-materialized measures must be used whenever possible. These measures are not necessarily calculated as part of the calculation cycle, but rather they are calculated only if needed to evaluate another expression. Therefore, these non-materialized measures have the potential to significantly decrease the regular calculation time. Note that several different types of non-materialized measures are available, and their usage and performance profile can vary.

Display-Only Non-Materialized Measures

Display-only non-materialized measures are intended for display-only use. They cannot be manipulated and cannot be used in calculations. They must have an aggregation type of recalc, and they must be at the very end of a calculation. Since they cannot be used in calculations (other than the one used to calculate them), they do not need to be calculated by the calculation engine during a normal calculation. They are therefore ignored during the calculate operation. They are calculated only when required and only for the positions required at the time. Typically, the calculation is initiated during the fetch cycle when the measure is to be displayed.

Normally, all measures that are candidates to be display-only non-materialized measures should be defined that way. The exceptions to this rule are when the measure is likely to be viewed much more frequently than it would be calculated, such that calculating it normally through the calc cycle would provide an overall saving.

The If Statement

The following sections describe the If statement.

See Expression Iteration Examples for examples that show how the If statement is iterated.

Caching the If Condition Phrase

The simplest way for RPASCE to iterate over the cells where a condition is true is for that condition to be instantiated into a Boolean measure with a naval of false. This allows RPASCE to iterate over just the populated cells. Otherwise, the engine must pass all intersections to determine whether the condition is true, which may be a very expensive operation. There are some specific time-based conditions where RPASCE instantiates the condition behind the scenes (see Automatic Caching of Expression Phrases for these examples). Otherwise, you must instantiate the result of evaluating the condition to have efficient calculations. This can be done by setting a measure to the result of the condition. This is especially important if the same condition is used repeatedly in expressions or if the input measures used to calculate the condition change infrequently.

This is not necessary if the condition phrase meets the requirements to be automatically cached (see Automatic Caching of Expression Phrases for more information).

The Ignore Keyword

The If function supports an Ignore argument. This argument means that the calculation must not change the current value of the cell. Under some circumstances, expressions that use this construct in full evaluation mode can be particularly expensive to evaluate, so it should be used only when necessary. For instance, with an expression such as:

a = if(b, ignore, 1)

where the naval of a is 0, and the naval of b is false, the effective naval of the expression is 1. RPASCE cannot simply change the naval of a from 0 to 1 because of the ignore. Therefore, RPASCE must iterate over all logical cells of b. This problem happens when using ignore whenever the naval of the expression is different from the naval of the measure being calculated.
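The following Python sketch shows why every logical cell has to be visited in this case. It is a toy model with ten logical cells; the dictionary stands in for the sparse array of a, and eval_ignore is a hypothetical helper rather than engine behavior.

```python
LOGICAL_CELLS = range(10)


def eval_ignore(a, b_true):
    """a = if(b, ignore, 1): keep a where b is true, otherwise set a to 1.

    a: dict of populated cells with naval 0.
    b_true: positions where b is true (the naval of b is false).
    """
    result = dict(a)
    visits = 0
    for position in LOGICAL_CELLS:  # every logical cell must be checked
        visits += 1
        if position not in b_true:
            result[position] = 1  # differs from the naval 0, so it is stored
    return result, visits


a = {0: 7, 3: 9}
b_true = {0, 1}
result, visits = eval_ignore(a, b_true)
print(len(result), visits)
```

Cells 0 and 1 keep their old values (7 and the naval 0), so the naval of a cannot simply be switched to 1, and the remaining eight cells each have to be written explicitly.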

Note:

Avoid using multiple ignore keywords in if expressions when configuring expressions, because such expressions can cause performance issues.

Expression Iteration Examples

This section demonstrates how RPASCE iterates over various expressions. Use Table H-1 as a reference for evaluating the relative performance of an expression due to iteration. None of these expressions are necessarily bad. The relative populated/logical size and naval of the measures must be taken into consideration to determine if the performance can be improved.

The naval of all measures is 0 or false unless otherwise noted. Measures are 10% populated unless otherwise noted. The calendar dimension has 100 positions.

Table H-1 Evaluating the Relative Performance of an Expression Due to Iteration

| Sample Expression | Iteration Based On | Notes |
| --- | --- | --- |
| N1 = if(B1, 0, 1) | B1 | This is the simplest case, iteration-wise. |
| N1 = if(B1, 0, N2) | N2 | Navalue of N2 equals navalue of 0. N2 and 0 have a fill factor of 10%, while N2 combined with B1 has a combined fill factor of 19%. Therefore, iterate just N2. |
| N1 = if(B1, 0, N2) (same as previous, but navalue of N2 is 1) | B1, N2 | Navalues do not match. |
| N1 = if(B1, N2, N3) (N2 fill factor = 30%) | B1, N3 | N2 x N3 fill factor: 1 - (1 - 0.3) * (1 - 0.1) or 37%. B1 x N3 fill factor: 1 - (1 - 0.1) * (1 - 0.1) or 19%. Thus, iterate B1 and N3. |
| N1 = if(B1, N2, 0) | B1 (N2 could be chosen if it had a lower fill factor) | Navalue of if is 0. |
| N1 = prefer(10 / N2, N3) | N2, N3 | Navalue of first prefer subexpression causes error, so no optimization applies here. |
| N1 = prefer(10 / (N2 + 1), N3) | N2 | Navalue of first prefer subexpression does not cause error, so optimization applies here. |
| N1 = if(B1, ignore, 0) | B1, N1 | Expression navalue is 0. |
| N1 = if(B1, 0, ignore) | B1 | Expression navalue is ignore. |
| N1 = if(B1, ignore, N2) | B1, N2 | Because B1's navalue is false, the RHS navalue comes from N2. No optimization applies here. However, if the navalue of the RHS (in this case, N2) does not match the pre-existing navalue of N1, then "all cells are dirty" mode is invoked to avoid changing the navalue of N1 (and thus changing the value of unpopulated cells that should have been ignored). |
| N1 = if(B1, N2, ignore) | B1 | Expression navalue is ignore. |
| N1 = if(B1, N2, ignore) + if(B2, ignore, N2) | B1, B2, N2 | Expression navalue is ignore. Note: RPASCE does not support the syntax if() + if(). However, it still illustrates a useful point from an iteration point of view. |
| N1 = current | All logical cells | current is not used in a comparison so no optimization applies here. |
| B1 = (current == first) | first calendar position | current is used in a comparison so simple comparisons to current apply here. |
| N1 = if (current > elapsed, N2, N3) (elapsed = 25; N3 is 50% populated) | First 25 calendar positions and N2 | Fill factor of N2 x N3 is 55%. Fill factor of current > elapsed x N2 is 33%. Therefore, iterate current > elapsed and N2. |
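The combined fill factors quoted in Table H-1 follow from the formula 1 - (1 - p) * (1 - q). The Python sketch below reproduces that arithmetic; it assumes the inputs are populated independently of each other, and combined_fill_factor is a hypothetical helper name.

```python
def combined_fill_factor(*fill_factors):
    """Fraction of cells populated in at least one input.

    Assumes the inputs are populated independently of each other.
    """
    empty = 1.0
    for p in fill_factors:
        empty *= 1.0 - p  # probability that a cell is empty in every input
    return 1.0 - empty


n2_n3 = combined_fill_factor(0.3, 0.1)            # N2 at 30%, N3 at 10%
b1_n3 = combined_fill_factor(0.1, 0.1)            # B1 and N3 both at 10%
elapsed_n2 = combined_fill_factor(0.25, 0.1)      # 25 of 100 positions, N2 at 10%
print(round(n2_n3, 3), round(b1_n3, 3), round(elapsed_n2, 3))
```

The last value is 32.5%, which the table rounds to 33%. In each row, the pairing with the lower combined fill factor is the one chosen for iteration.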

Tips to Design Efficient RPASCE Expressions

Sometimes fairly simple expressions take a very long time to run. Most of the time, the problem lies in the way the expressions are designed and configured. The following examples show how to design efficient expressions.

  1. Consider the following nested if expression:

    L1 = if (R1 > 0, V, if (R2 > 0, V, if (R3 > 0, V, if( R4 > 0, V, if( R5 > 0, V, 0)))))

    Table H-2 Nested if Expression

    | Measure | Type | BaseInt |
    | --- | --- | --- |
    | L1 | Real | SKU/STR/DAY |
    | R1, R2, R3, R4, R5 | Real | DEPT/DAY |
    | V | Real | SKU/STR |

    The application is partitioned on DEPT where SKU rolls up to DEPT.

    This expression is not designed efficiently and may incur a performance hit because the left-hand side (LHS) measure L1 is at SKU/STR/DAY, the R1, R2, R3, R4, R5 measures are at DEPT/DAY, and the V measure is at SKU/STR. Here, the R1, R2, and so on, measures must first be spread from DEPT/DAY to SKU/STR/DAY and the V measure from SKU/STR to SKU/STR/DAY. Spreading is a time-consuming process, so any effort to reduce it within an expression should give a performance benefit.

    If you redesign the expression as follows, performance can be improved:

    1. Register a temporary measure, such as tmpmask, which is at SKU/DAY.

    2. Add an expression to generate the tmpmask measure as follows:

      tmpmask = R1 > 0 || R2 > 0 || R3 > 0 || R4 > 0 || R5 > 0

    3. Modify the previous expression as follows:

      L1 = if (tmpmask, V, 0)

    4. Those two expressions will only take a fraction of the time taken to run the original expression.

  2. Consider an expression which uses the RPASCE prefer function:

    L2 = prefer(A/B, 0)

    Table H-3 Prefer Expression

    | Measure | Type | BaseInt |
    | --- | --- | --- |
    | L2 | Real | SKU/STR/DAY |
    | A | Real | SKU/STR/DAY |
    | B | Real | SKU |

    Here also, there is a lot of spreading to be done for the B measure from SKU to SKU/STR/DAY.

    There are two approaches to optimize this expression:

    1. Use the RPASCE CalcEngine's division-by-zero support: the engine returns 0 when it encounters a division by 0. This effectively mimics the prefer behavior, and it evaluates faster than prefer.

      The expression can be revised as:

      L2 = A / B

    2. When the preferred value is not equal to 0, then use the following approach as the RPASCE CalcEngine only returns zero in division by 0 situations.

      If baseint of B is much higher than A, use a temporary intermediate measure.

      Since B is at SKU and A is at SKU/STR/DAY, use an intermediate measure C at either SKU/STR or SKU/STR/DAY and spread B to C using the expression:

      C = B

      Then, modify the prefer expression as follows:

      L2 = prefer(A/C, 5)

      The spreading is much less when C is used inside prefer compared to B and it should evaluate faster than:

      L2 = prefer(A/B, 5)
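Both redesigns in this section leave the results unchanged and only reduce the work done per cell. The Python sketch below checks that equivalence on toy data. It does not measure performance, and the helper names, the sample values, and the modeling of prefer as a fallback on a failed division are assumptions made for illustration.

```python
from itertools import product


def nested_if(r, v):
    """Original form: if(R1 > 0, V, if(R2 > 0, V, ... if(R5 > 0, V, 0)))."""
    r1, r2, r3, r4, r5 = r
    if r1 > 0:
        return v
    elif r2 > 0:
        return v
    elif r3 > 0:
        return v
    elif r4 > 0:
        return v
    elif r5 > 0:
        return v
    return 0


def masked_if(r, v):
    """Redesigned form: tmpmask = R1 > 0 || ... || R5 > 0; L1 = if(tmpmask, V, 0)."""
    tmpmask = any(value > 0 for value in r)
    return v if tmpmask else 0


# Check every combination of negative, zero, and positive R values.
same_result = all(
    nested_if(r, 7.0) == masked_if(r, 7.0)
    for r in product((-1.0, 0.0, 2.0), repeat=5)
)


def prefer_divide(a, b, fallback):
    """Assumed model of prefer(A/B, fallback): fall back if the division fails."""
    try:
        return a / b
    except ZeroDivisionError:
        return fallback


STORES = ["str1", "str2"]
DAYS = ["d1", "d2", "d3"]
B = {"sku1": 4.0, "sku2": 0.0}  # B is at SKU
A = {(sku, s, d): 8.0 for sku in B for s in STORES for d in DAYS}  # SKU/STR/DAY

# C = B: spread B once to SKU/STR before the prefer expression runs.
C = {(sku, s): B[sku] for sku in B for s in STORES}

L2_direct = {k: prefer_divide(v, B[k[0]], 5.0) for k, v in A.items()}
L2_via_c = {k: prefer_divide(v, C[(k[0], k[1])], 5.0) for k, v in A.items()}
print(same_result, L2_direct == L2_via_c)
```

Because the rewritten expressions are logically equivalent to the originals, they can be substituted without changing workbook results.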