Programming Utilities Guide

Ambiguity and Conflicts

A set of grammar rules is ambiguous if some input string can be structured in two or more different ways. For example, the following grammar rule is a natural way of expressing the fact that one way of forming an arithmetic expression is to put two other expressions together with a minus sign between them:

expr : expr '-' expr

Notice that this grammar rule does not completely specify the way that all complex inputs should be structured. For example, if the input is the following:

expr - expr - expr

the rule allows this input to be structured as either:

( expr - expr ) - expr

or as:

expr - ( expr - expr )

The first is called left association, the second right association.

yacc detects such ambiguities when it is attempting to build the parser. Given that the input is as follows, consider the problem that confronts the parser:

expr - expr - expr

When the parser has read the second expr, the input seen is:

expr - expr

It matches the right side of the grammar rule above. The parser could reduce the input by applying this rule. After applying the rule, the input is reduced to expr (the left side of the rule). The parser then reads the final part of the input (as represented below) and again reduce:

- expr

The effect of this is to take the left associative interpretation.

Alternatively, if the parser sees the following:

expr - expr

it could defer the immediate application of the rule and continue reading the input until the following is seen:

expr - expr - expr

It could then apply the rule to the rightmost three symbols, reducing them to expr, which results in the following being left:

expr - expr

Now the rule can be reduced once more. The effect is to take the right associative interpretation. Thus, having read the following, the parser can do one of two legal things, shift or reduce:

expr - expr

It has no way of deciding between them. This is called a shift-reduce conflict. It might also happen that the parser has a choice of two legal reductions. This is called a reduce-reduce conflict. There are never any shift-shift conflicts.

When there are shift-reduce or reduce-reduce conflicts, yacc still produces a parser. It does this by selecting one of the valid steps wherever it has a choice. A rule describing the choice to make in a given situation is called a disambiguating rule.

yacc invokes two default disambiguating rules:

  1. In a shift-reduce conflict, the default is to shift.

  2. In a reduce-reduce conflict, the default is to reduce by the earlier grammar rule (in the yacc specification).

    Rule 1 implies that reductions are deferred in favor of shifts when there is a choice. Rule 2 gives the user rather crude control over the behavior of the parser in this situation, but reduce-reduce conflicts should be avoided when possible.

    Conflicts can arise because of mistakes in input or logic or because the grammar rules (while consistent) require a more complex parser than yacc can construct. The use of actions within rules can also cause conflicts if the action must be done before the parser can be sure which rule is being recognized.

    In these cases, the application of disambiguating rules is inappropriate and leads to an incorrect parser. For this reason, yacc always reports the number of shift-reduce and reduce-reduce conflicts resolved by rules 1 and 2 above.

    In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also possible to rewrite the grammar rules so that the same inputs are read but there are no conflicts. For this reason, most previous parser generators have considered conflicts to be fatal errors.

    This rewriting is somewhat unnatural and produces slower parsers. Thus, yacc produces parsers even in the presence of conflicts.

As an example of the power of disambiguating rules, consider:

stat	  	: IF '(' cond ')' stat 
      		| IF '(' cond ')' stat ELSE stat 
        ;

which is a fragment from a programming language involving an if-then-else statement. In these rules, IF and ELSE are tokens, cond is a nonterminal symbol describing conditional (logical) expressions, and stat is a nonterminal symbol describing statements. The first rule is called the simple if rule and the second the if-else rule.

These two rules form an ambiguous construction because input of the form:

IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be structured according to these rules in two ways:

IF ( C1 ) 
{ 
     IF ( C2 ) 
       		S1 
}
ELSE 
     S2

or:

IF ( C1 ) 
{ 
    IF ( C2 ) 
      		S1 
   	ELSE 
       	S2 
}

where the second interpretation is the one given in most programming languages having this construct; each ELSE is associated with the last preceding un-ELSE'd IF. In this example, consider the situation where the parser has seen the following and is looking at the ELSE:

IF ( C1 ) IF ( C2 ) S1

It can immediately reduce by the simple if rule to get:

IF ( C1 ) stat

and then read the remaining input:

ELSE S2

and reduce:

IF ( C1 ) stat ELSE S2 

by the if-else rule. This leads to the first of the above groupings of the input.

On the other hand, the ELSE may be shifted, S2 read, and then the right-hand portion of:

IF ( C1 ) IF ( C2 ) S1 ELSE S2

can be reduced by the if-else rule to get:

IF ( C1 ) stat

which can be reduced by the simple if rule.

This leads to the second of the above groupings of the input, which is usually the one desired.

Once again, the parser can do two valid things -- there is a shift-reduce conflict. The application of disambiguating rule 1 tells the parser to shift in this case, which leads to the desired grouping. This shift-reduce conflict arises only when there is a particular current input symbol, ELSE, and particular inputs, such as:

IF ( C1 ) IF ( C2 ) S1

have already been seen. In general, there can be many conflicts, and each one is associated with an input symbol and a set of previously read inputs. The previously read inputs are characterized by the state of the parser.

The conflict messages of yacc are best understood by examining the -v output. For example, the output corresponding to the above conflict state might be:

23: shift-reduce conflict (shift 45, reduce 18) on ELSE 

state 23 
     	stat :	  IF ( cond ) stat_ (18) 
     	stat :   IF ( cond ) stat_ELSE stat 
     
     	ELSE	   	shift 45 
           .	  reduce 18

where the first line describes the conflict -- giving the state and the input symbol. The ordinary state description gives the grammar rules active in the state and the parser actions. Recall that the underscore marks the portion of the grammar rules that has been seen.

Thus in the example, in state 23, the parser has seen input corresponding to:

IF ( cond ) stat

and the two grammar rules shown are active at this time. The parser can do two possible things. If the input symbol is ELSE, it is possible to shift into state 45.

State 45 has, as part of its description, the line:

stat : IF ( cond ) stat ELSE_stat

because the ELSE has been shifted in this state. In state 23, the alternative action (specified by .) is to be done if the input symbol is not mentioned explicitly in the actions. In this case, if the input symbol is not ELSE, the parser reduces to:

stat : IF '(' cond ')' stat

by grammar rule 18.

Once again, notice that the numbers following shift commands refer to other states, while the numbers following reduce commands refer to grammar rule numbers. In the y.output file, rule numbers are printed in parentheses after those rules that can be reduced. In most states, there is a reduce action possible, and reduce is the default command. If you encounter unexpected shift-reduce conflicts, look at the -v output to decide whether the default actions are appropriate.