Programming Utilities Guide

Advanced Topics

This part discusses a number of advanced features of yacc.

Simulating error and accept in Actions

The parsing actions error and accept can be simulated in an action by using the macros YYACCEPT and YYERROR. YYACCEPT causes yyparse() to return the value 0. YYERROR causes the parser to behave as if the current input symbol had been a syntax error: yyerror() is called, and error recovery takes place.

These mechanisms can be used to simulate parsers with multiple end-markers or context-sensitive syntax checking.

Accessing Values in Enclosing Rules

An action can refer to values returned by actions to the left of the current rule. The mechanism is the same as for ordinary actions: a $ followed by a digit.

sent    : adj noun verb adj noun
        {
            /* look at the sentence ... */
        }
        ;
adj     : THE
        {
            $$ = THE;
        }
        | YOUNG
        {
            $$ = YOUNG;
        }
        ...
        ;
noun    : DOG
        {
            $$ = DOG;
        }
        | CRONE
        {
            if ( $0 == YOUNG )
            {
                (void) printf( "what?\n" );
            }
            $$ = CRONE;
        }
        ;
...

In this case, the digit can be 0 or negative. In the action following the word CRONE, a check is made that the preceding token shifted was not YOUNG. Notice, however, that this is only possible when a great deal is known about what might precede the symbol noun in the input. Nevertheless, at times this mechanism prevents a great deal of trouble, especially when a few combinations are to be excluded from an otherwise regular structure.
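The left-context reference can be pictured as plain indexing into the parser's value stack. The following C sketch is a simplified model, not yacc's generated code: the names ValueStack, push_value, and dollar are hypothetical, and the token codes are arbitrary. It assumes that, for a rule with k right-hand-side symbols, $k is the top of the stack, so $0 is the entry just below the rule's first symbol.

```c
#include <assert.h>

/* Hypothetical token codes; real values come from yacc's token numbering. */
enum { THE = 257, YOUNG, DOG, CRONE };

#define STACK_MAX 64

/* Simplified model of the parser's value stack (integer values only). */
typedef struct {
    int vals[STACK_MAX];
    int top;                /* number of values currently on the stack */
} ValueStack;

/* Push the value of a symbol that has just been shifted or reduced. */
void push_value(ValueStack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->vals[s->top++] = v;
}

/*
 * Model of $n inside the action of a rule with rule_len right-hand-side
 * symbols. $rule_len is the top of the stack; $0 and negative n reach
 * into values that were pushed before the rule began.
 */
int dollar(const ValueStack *s, int rule_len, int n)
{
    int idx = s->top - 1 - rule_len + n;
    assert(idx >= 0 && idx < s->top);
    return s->vals[idx];
}
```

In this model, once YOUNG has been pushed for adj and CRONE for the token, dollar(&s, 1, 0) inside the one-symbol rule for CRONE reads the value left by adj, which is the comparison the example performs.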

Support for Arbitrary Value Types

By default, the values returned by actions and the lexical analyzer are integers. yacc can also support values of other types including structures. In addition, yacc keeps track of the types and inserts appropriate union member names so that the resulting parser is strictly type checked. The yacc value stack is declared to be a union of the various types of values desired. You declare the union and associate union member names with each token and nonterminal symbol having a value. When the value is referenced through a $$ or $n construction, yacc automatically inserts the appropriate union name so that no unwanted conversions take place.

Three mechanisms provide for this typing. First, there is a way of defining the union. This must be done by the user since other subroutines, notably the lexical analyzer, must know about the union member names. Second, there is a way of associating a union member name with tokens and nonterminals. Finally, there is a mechanism for describing the type of those few values where yacc cannot easily determine the type.

To declare the union, you include:

%union 
{ 
     body of union 
}

in the declaration section. This declares the yacc value stack and the external variables yylval and yyval to have type equal to this union. If yacc was invoked with the -d option, the union declaration is copied into the y.tab.h file as YYSTYPE.
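As a rough picture of what the declaration amounts to on the C side, the sketch below writes out by hand a union of the kind %union produces, together with a lexer helper that fills in yylval. The member names intval and dval, the token code NUMBER, and the helper lex_number are assumptions made for illustration; in a real build the YYSTYPE definition and the token codes come from y.tab.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hand-written stand-in for the type yacc derives from %union. */
typedef union {
    int    intval;      /* hypothetical member for integer tokens */
    double dval;        /* hypothetical member for real-valued tokens */
} YYSTYPE;

/* The lexical analyzer passes token values to the parser in yylval. */
YYSTYPE yylval;

/* Hypothetical token code; normally defined in y.tab.h. */
#define NUMBER 257

/*
 * Hypothetical lexer helper: convert the digits in text, store the
 * result in the union member that matches the token, and return the
 * token code to the parser.
 */
int lex_number(const char *text)
{
    assert(text != NULL);
    yylval.intval = (int) strtol(text, NULL, 10);
    return NUMBER;
}
```

Because the lexical analyzer has to name the member it stores into, it needs the same union definition as the parser, which is why the declaration is left to the user.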

Once YYSTYPE is defined, the union member names must be associated with the various terminal and nonterminal names. The construction:

<name>

is used to indicate a union member name. If this follows one of the keywords %token, %left, %right, or %nonassoc, the union member name is associated with the tokens listed.

Thus, saying

%left <optype> '+' '-' 

causes any reference to values returned by these two tokens to be tagged with the union member name optype. Another keyword, %type, is used to associate union member names with nonterminals. You could use the declaration

%type <nodetype> expr stat

to associate the union member nodetype with the nonterminal symbols expr and stat.

There remain a couple of cases where these mechanisms are insufficient. If there is an action within a rule, the value returned by this action has no a priori type. Similarly, reference to left context values (such as $0) leaves yacc with no easy way of knowing the type. In this case, a type can be imposed on the reference by inserting a union member name between < and > immediately after the first $. The example below:

rule    : aaa
        {
            $<intval>$ = 3;
        }
        bbb
        {
            fun( $<intval>2, $<other>0 );
        }
        ;

shows this usage. This syntax has little to recommend it, but the situation arises occasionally.
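One way to read the tagged references is as explicit union member selections on the value stack. The sketch below models that reading in plain C; it is not yacc output. The members intval and other follow the example, but the type of other (a string), the stack layout, and the names vstack, vpush, run_rule, fun_first, and fun_second are assumptions. The embedded action is counted as the second component of the rule, so its value is $2 and bbb is $3.

```c
#include <assert.h>

typedef union {
    int         intval;
    const char *other;      /* assumed type for the <other> member */
} YYSTYPE;

#define DEPTH 8

YYSTYPE vstack[DEPTH];      /* simplified value stack */
int     vtop;               /* number of values on the stack */

void vpush(YYSTYPE v)
{
    assert(vtop < DEPTH);
    vstack[vtop++] = v;
}

/* Arguments most recently passed to fun(), recorded for inspection. */
int         fun_first;
const char *fun_second;

void fun(int a, const char *b)
{
    fun_first = a;
    fun_second = b;
}

/*
 * Model of:  rule : aaa { $<intval>$ = 3; } bbb { fun( ... ); }
 * left_context is whatever value sits on the stack just before aaa.
 */
void run_rule(YYSTYPE left_context, YYSTYPE aaa, YYSTYPE bbb)
{
    YYSTYPE  yyval;
    YYSTYPE *top;

    vtop = 0;
    vpush(left_context);        /* $0 */
    vpush(aaa);                 /* $1 */

    yyval.intval = 3;           /* embedded action: $<intval>$ = 3; */
    vpush(yyval);               /* its value becomes $2 */

    vpush(bbb);                 /* $3 */

    top = &vstack[vtop - 1];    /* points at $3 */
    fun(top[-1].intval, top[-3].other);   /* $<intval>2, $<other>0 */
}
```

The tag does nothing more than choose which member of the union is read or written; nothing in the model checks that the slot actually holds a value of that type, which is why the tags have to be supplied with care.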

The facilities in this subsection are not triggered until they are used. In particular, the use of %type turns on these mechanisms. When they are used, there is a fairly strict level of checking.

For example, use of $n or $$ to refer to something with no defined type is diagnosed. If these facilities are not triggered, the yacc value stack is used to hold ints.

yacc Input Syntax

This section describes the yacc input syntax as a yacc specification. Context dependencies and so forth are not considered. Although yacc accepts an LALR(1) grammar, the yacc input specification language is specified as an LR(2) grammar; the difficulty arises when an identifier is seen in a rule immediately following an action.

If this identifier is followed by a colon, it is the start of the next rule; otherwise, it is a continuation of the current rule, which just happens to have an action embedded in it. As implemented, the lexical analyzer looks ahead after seeing an identifier and figures out whether the next token (skipping blanks, newlines, comments, and so on) is a colon. If so, it returns the token C_IDENTIFIER; otherwise, it returns IDENTIFIER.

Literals (quoted strings) are also returned as IDENTIFIERs but never as part of C_IDENTIFIERs.

        /* grammar for the input to yacc */

        /* basic entries */
%token IDENTIFIER       /* includes identifiers and literals */

%token C_IDENTIFIER     /* identifier (but not literal) */
                        /* followed by a : */

%token NUMBER           /* [0-9]+ */

        /* reserved words: %type=>TYPE, %left=>LEFT, etc. */

%token LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION

%token MARK             /* the %% mark */

%token LCURL            /* the %{ mark */

%token RCURL            /* the %} mark */

        /* ASCII character literals stand for themselves */

%start spec

%% 

spec    : defs MARK rules tail
        ;
tail    : MARK
        {
            /* In this action, read in the rest of the file */
        }
        |       /* empty: the second MARK is optional */
        ;
defs    :       /* empty */
        | defs def
        ;
def     : START IDENTIFIER
        | UNION
        {
            /* Copy union definition to output */
        }
        | LCURL
        {
            /* Copy C code to output file */
        }
        RCURL
        | rword tag nlist
        ;
rword   : TOKEN
        | LEFT
        | RIGHT
        | NONASSOC
        | TYPE
        ;
tag     :       /* empty: union tag is optional */
        | '<' IDENTIFIER '>'
        ;
nlist   : nmno
        | nlist nmno
        | nlist ',' nmno
        ;
nmno    : IDENTIFIER            /* Note: literal illegal with %type */
        | IDENTIFIER NUMBER     /* Note: illegal with %type */
        ;

        /* rule section */
rules   : C_IDENTIFIER rbody prec
        | rules rule
        ;
rule    : C_IDENTIFIER rbody prec
        | '|' rbody prec
        ;
rbody   :       /* empty */
        | rbody IDENTIFIER
        | rbody act
        ;
act     : '{'
        {
            /* Copy action, translate $$, and so on */
        }
        '}'
        ;
prec    :       /* empty */
        | PREC IDENTIFIER
        | PREC IDENTIFIER act
        | prec ';'
        ;