Programming Utilities Guide

Error Handling

Error handling raises many semantic problems. When an error is found, for example, it might be necessary to reclaim parse tree storage, delete or alter symbol table entries, and, typically, set switches that suppress any further output.

It is seldom acceptable to stop all processing when an error is found. It is more useful to continue scanning the input to find further syntax errors. This leads to the problem of getting the parser restarted after an error. A general class of algorithms to do this involves discarding a number of tokens from the input string and attempting to adjust the parser so that input can continue.

To allow the user some control over this process, yacc provides the token name error. This name can be used in grammar rules. In effect, it suggests where errors are expected and recovery might take place.

The parser pops its stack until it enters a state where the token error is legal. It then behaves as if the token error were the current lookahead token and performs the action encountered. The lookahead token is then reset to the token that caused the error. If no special error rules have been specified, the processing halts when an error is detected.
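
The stack-popping step can be pictured with a small C model. This is only a sketch of the behavior described above, not the code that yacc generates: the state numbers, the accepts_error table, and the function name pop_to_error_state are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model: stack[0..depth-1] holds parser state numbers, with
 * the top of the stack at stack[depth - 1]. accepts_error[s] is nonzero
 * when the token error is legal in state s.
 *
 * Returns the new stack depth after popping, or 0 if no state on the
 * stack accepts error (the case in which processing halts).
 */
size_t pop_to_error_state(const int *stack, size_t depth,
                          const int *accepts_error)
{
    while (depth > 0 && !accepts_error[stack[depth - 1]])
        depth--;    /* discard the top state and try the one beneath it */
    return depth;
}
```

With a stack holding states 0, 3, 5, 7 and error legal only in state 3, the function pops twice and leaves states 0 and 3 on the stack. A result of 0 corresponds to a grammar with no applicable error rule.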

To prevent a cascade of error messages, the parser, after detecting an error, remains in error state until three tokens have been successfully read and shifted. If an error is detected when the parser is already in error state, no message is given, and the input token is deleted.
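
The three-token rule amounts to a small counter. The following C sketch models the paragraph above as written; it is a simplification, and real parser skeletons may differ in detail. The struct and function names are hypothetical.

```c
/* Hypothetical model of the parser's error state; not yacc's own source. */
struct err_state {
    int remaining;  /* clean shifts still needed; 0 means normal mode */
    int reported;   /* number of error messages issued */
    int discarded;  /* tokens silently deleted while in error state */
};

/* Call when a syntax error is detected. */
void on_error(struct err_state *s)
{
    if (s->remaining == 0)
        s->reported++;   /* normal mode: report the error */
    else
        s->discarded++;  /* already recovering: drop the token, no message */
    s->remaining = 3;    /* three successful shifts are required from here */
}

/* Call when a token is read and shifted successfully. */
void on_shift(struct err_state *s)
{
    if (s->remaining > 0)
        s->remaining--;
}
```

In this model, a second error that arrives after only two successful shifts produces no message, while an error that arrives after three successful shifts is reported normally.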

As an example, a rule of the form:

stat : error

means that on a syntax error the parser attempts to skip over the statement in which the error is seen. More precisely, the parser scans ahead, looking for three tokens that might legally follow a statement, and starts processing at the first of these. If the beginnings of statements are not sufficiently distinctive, it might make a false start in the middle of a statement and end up reporting a second error where there is, in fact, no error.

Actions can be used with these special error rules. These actions might attempt to reinitialize tables, reclaim symbol table space, and so forth. Error rules such as the above are very general but difficult to control.

Rules such as the following are somewhat easier:

stat : error ';'

Here, when there is an error, the parser attempts to skip over the statement but does so by skipping to the next semicolon. All tokens after the error and before the next semicolon cannot be shifted and are discarded. When the semicolon is seen, this rule is reduced and any cleanup action associated with it is performed.

Another form of error rule arises in interactive applications where it may be desirable to permit a line to be reentered after an error. The following example:

input   : error '\n'
                {
                        (void) printf("Reenter last line: ");
                }
          input
                {
                        $$ = $4;
                }
        ;

is one way to do this. There is one potential difficulty with this approach. The parser must correctly process three input tokens before it admits that it has correctly resynchronized after the error. If the reentered line contains an error in the first two tokens, the parser deletes the offending tokens and gives no message. This is unacceptable.

For this reason, there is a mechanism that can force the parser to believe that error recovery has been accomplished. The statement:

yyerrok ;

in an action resets the parser to its normal mode. The last example can be rewritten as:

input   : error '\n'
                {
                        yyerrok;
                        (void) printf("Reenter last line: ");
                }
          input
                {
                        $$ = $4;
                }
        ;
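
The difference that yyerrok makes can be seen in a simplified C model of the error state. This is a sketch under stated assumptions, not the generated parser: the counter and the model_ functions are hypothetical stand-ins for the parser's internal behavior.

```c
/* Hypothetical model of the error-state counter; 0 means normal mode. */
static int err_remaining;
static int messages;

static void model_error(void)
{
    if (err_remaining == 0)
        messages++;       /* errors are reported only in normal mode */
    err_remaining = 3;
}

static void model_shift(void)
{
    if (err_remaining > 0)
        err_remaining--;
}

static void model_yyerrok(void)
{
    err_remaining = 0;    /* force a return to normal mode */
}

/*
 * Simulate an error on one line followed by an error in the first token
 * of the reentered line. Returns the number of messages issued.
 */
int messages_for_two_bad_lines(int use_yyerrok)
{
    err_remaining = 0;
    messages = 0;
    model_error();        /* error in the original line */
    model_shift();        /* the '\n' that follows error is shifted */
    if (use_yyerrok)
        model_yyerrok();  /* the action in the rewritten rule */
    model_error();        /* error at the start of the reentered line */
    return messages;
}
```

In this model, the second error is silent when yyerrok is not used and is reported when it is.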

As previously mentioned, the token seen immediately after the error symbol is the input token at which the error was discovered. Sometimes this is inappropriate; for example, an error recovery action might take upon itself the job of finding the correct place to resume input. In this case, the previous lookahead token must be cleared. The statement:

yyclearin ;

in an action has this effect. For example, suppose the action after error were to call some sophisticated resynchronization routine (supplied by the user) that attempted to advance the input to the beginning of the next valid statement. After this routine is called, the next token returned by yylex() is presumably the first token in a legal statement. The old illegal token must be discarded and the error state reset. A rule similar to:

stat    : error
                {
                        resynch();
                        yyerrok;
                        yyclearin;
                }
        ;

could perform this.
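
What a resynchronization routine does depends entirely on the language being parsed. As a minimal sketch, assume that every statement ends with a semicolon and that the lexical analyzer reads from an in-memory buffer. The struct lex_input type and the name resynch_input are hypothetical; a routine called as resynch() from an action would typically operate on the lexical analyzer's own input state instead of taking an argument.

```c
#include <stddef.h>

/* Hypothetical lexer input: a NUL-terminated buffer plus a read position. */
struct lex_input {
    const char *text;
    size_t pos;
};

/*
 * Skip the remainder of the damaged statement, assuming (for this sketch
 * only) that every statement ends with ';'. On return, in->pos indexes the
 * first character after the ';', or the terminating NUL if none was found.
 * Returns 1 if a ';' was consumed, 0 if the end of input was reached.
 */
int resynch_input(struct lex_input *in)
{
    while (in->text[in->pos] != '\0') {
        if (in->text[in->pos++] == ';')
            return 1;
    }
    return 0;
}
```

Because the routine leaves the input positioned at the start of the next statement, the old lookahead token no longer matches the input, which is why the action also needs yyclearin.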

These mechanisms are admittedly crude, but they do allow a simple, fairly effective recovery of the parser from many errors. Moreover, the user can get control to deal with the error actions required by other portions of the program.