This section contains miscellaneous hints on preparing efficient, easy-to-change, and clear specifications.
It is difficult to write rules with substantial actions and still have a readable specification file. The following are a few style hints.
Use all uppercase letters for token names and all lowercase letters for nonterminal names. This makes it easy to tell tokens from nonterminals when debugging.
Put grammar rules and actions on separate lines to make editing easier.
Put all rules with the same left-hand side together. Write the left-hand side only once and begin each following rule with a vertical bar.
Put a semicolon only after the last rule with a given left-hand side and put the semicolon on a separate line. This allows new rules to be easily added.
Indent rule bodies by one tab stop and action bodies by two tab stops.
Put complicated actions into subroutines defined in separate files.
Examples in this section are written following this style. The central problem is to make the rules visible through the maze of action code.
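As an illustration of the hints above, here is a small sketch of a specification laid out in this style. The grammar itself, the header actions.h, and the helpers store_value() and print_value() are hypothetical names chosen for the example; they are assumed to be defined in a separate C file.

```yacc
%{
/* Hypothetical header declaring store_value() and print_value(). */
#include "actions.h"
%}

%token NAME NUMBER	/* tokens in uppercase */

%%

stat	: NAME '=' expr
		{ store_value($1, $3); }	/* action on its own line */
	| expr
		{ print_value($1); }
	;

expr	: NUMBER
	| expr '+' NUMBER
		{ $$ = $1 + $3; }
	;
```

Each left-hand side appears once, alternatives start with a vertical bar, and the semicolon sits on its own line, so a new alternative can be added by inserting lines without touching the existing ones.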
The algorithm used by the yacc parser encourages so-called left-recursive grammar rules. Rules of the following form match this algorithm:
name	: name rest_of_rule
	;
Rules such as:
list	: item
	| list ',' item
	;
and:
seq	: item
	| seq item
	;
frequently arise when writing specifications of sequences and lists. In each of these cases, the first rule is reduced for the first item only, and the second rule is reduced for the second and all succeeding items.
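To make the order of reductions concrete, the following sketch attaches a counting action to each rule. The token name ITEM and the idea of using the value of seq as an item count are assumptions made for this example only.

```yacc
%token ITEM

%%

seq	: item
		{ $$ = 1; }		/* reduced once, for the first item */
	| seq item
		{ $$ = $1 + 1; }	/* reduced for each later item */
	;

item	: ITEM
	;
```

With this layout, the value of seq after the whole input has been read is the number of items in the sequence.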
With right-recursive rules, such as:
seq	: item
	| item seq
	;
the parser is somewhat larger, and the items are reduced from right to left: no reduction to seq can take place until the last item has been read, so every item stays on the parser's internal stack until then. More seriously, that stack is in danger of overflowing if an extremely long sequence is read (although yacc can now process very large stacks). Thus, you should use left recursion wherever reasonable.
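The difference in stack use can be seen by tracing a short input by hand. The comment in the sketch below follows the parse stack for a three-item input under both forms of the rule; the token name ITEM is an assumption, and each line of the trace shows the stack after the latest ITEM has been reduced to item.

```yacc
/*
 * Parse stack (bottom at left) for the input: ITEM ITEM ITEM
 *
 * Left recursion:	seq : item | seq item ;
 *	item			reduce by  seq : item
 *	seq
 *	seq item		reduce by  seq : seq item
 *	seq
 *	seq item		reduce by  seq : seq item
 *	seq			never more than two symbols deep
 *
 * Right recursion:	seq : item | item seq ;
 *	item
 *	item item
 *	item item item		end of input, reduce by  seq : item
 *	item item seq		reduce by  seq : item seq
 *	item seq		reduce by  seq : item seq
 *	seq			depth grew with the number of items
 */
%token ITEM

%%

seq	: item
	| seq item		/* the left-recursive form */
	;

item	: ITEM
	;
```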
Consider whether a sequence with zero elements has any meaning. If it does, write the sequence specification as:
seq	: /* empty */
	| seq item
	;
using an empty rule. Once again, the first rule is always reduced exactly once before the first item is read, and the second rule is reduced once for each item read. Permitting empty sequences often leads to increased generality. However, conflicts might arise if yacc is asked to decide which empty sequence it has seen when it hasn't seen enough to know.
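A minimal sketch of how such a conflict can arise is shown here. The nonterminals line, labels, and values and the token WORD are hypothetical names invented for the example.

```yacc
%token WORD

%%

line	: labels WORD ':'
	| values WORD '='
	;

labels	: /* empty */
	| labels WORD
	;

values	: /* empty */
	| values WORD
	;
```

Because both labels and values can be empty, yacc must reduce one of the two empty rules before anything has been shifted, with only the first WORD as lookahead. It cannot know which choice is right until it sees ':' or '=', so it should report a reduce/reduce conflict for this grammar.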