Programming Utilities Guide

Examples

A Simple Example

The following code sample shows the complete yacc applications for a small desk calculator. The calculator has 26 registers labeled a through z and accepts arithmetic expressions made up of the operators +, -, *, /, %, &, |, and the assignment operators.

If an expression at the top level is an assignment, only the assignment is made; otherwise, the expression is printed. As in the C language, an integer that begins with 0 is assumed to be octal; otherwise, it is assumed to be decimal.

As an example of a yacc specification, the desk calculator shows how precedence and ambiguities are used and demonstrates simple recovery. The major oversimplifications are that the lexical analyzer is much simpler than for most applications, and the output is produced immediately, line by line.

Note the way that decimal and octal integers are read by grammar rules. This job can also be performed by the lexical analyzer.

%{ 
# include <stdio.h> 
# include <ctype.h> 

int regs[26]; 
int base; 
%} 

%start list 
%token DIGIT LETTER 
%left '|' 
%left '&' 
%left '+' '-' 
%left '*' '/' %' 
%left UMINUS		      /* supplies precedence for unary minus */ 

%%			         		/* beginning of rules section */ 

list	    	:		     	/* empty */ 
        		| list stat '\n' 
        		| list error '\n' 
        		{ 
           			yyerrok; 
        		}
        		; 
stat	    	: expr 
        		{ 
           			(void) printf( "%d\n", $1 ); 
        		} 
         	| LETTER '=' expr 
        		{ 
           			regs[$1] = $3; 
        		} 
        		; 
expr	    	: '(' expr ')' 
         	{ 
            		$$ = $2; 
        		} 
        		| expr '+' expr 
        		{ 
           			$$ = $1 + $3; 
        		} 
        		| expr '-' expr 
        		{ 
           			$$ = $1 - $3; 
        		{ 
        		| expr '*' expr 
        		{ 
           			$$ = $1 * $3; 
        		} 
         	| expr '/' expr 
        		{ 
           			$$ = $1 / $3; 
      		} 
         	| exp '%' expr 
         	{
           			$$ = $1 % $3; 
      		} 
         	| expr '&' expr 
         	{ 
            		$$ = $1 & $3; 
      		} 
         	| expr '|' expr 
         	{ 
            		$$ = $1 | $3; 
       	 	} 
         	| '-' expr %prec UMINUS 
         	{
           			$$ = -$2; 
         	} 
         	| LETTER 
       	 	{ 
             	$$ = reg[$1]; 
        		} 
       	 	| number 
       	 	; 
number   	: DIGIT 
       	 	{ 
             	$$ = $1; base = ($1= =0) ? 8 ; 10; 
       	 	} 
       	 	| number DIGIT 
       	 	{ 
           			$$ = base * $1 + $2; 
       	 	} 
       	 	; 

%% 		      	/* beginning of subroutines section */ 
int yylex( )		/* lexical analysis routine */ 
{				      /* return LETTER for lowercase letter, */ 
            		/* yylval = 0 through 25 returns DIGIT */ 
             		/* for digit, yylval = 0 through 9 */ 
        			   /* all other characters are returned immediately */ 
   int c;             					/*skip blanks*/ 
   while ((c = getchar()) = = ' ') 
     		; 
           					/* c is now nonblank */ 

   if (islower(c)) {
    		yylval = c - 'a'; 
    		return (LETTER); 
  	} 
   if (isdigit(c)) {
    		yylval = c - '0'; 
    		return (DIGIT); 
   } 
   return (c);  
}

An Advanced Example

This section gives an example of a grammar using some of the advanced features. The desk calculator in Example 1 is modified to provide a desk calculator that performs floating point interval arithmetic. The calculator understands floating point constants, and the arithmetic operations +, -, *, /, and unary -. It uses the registers a through z. Moreover, it understands intervals written

(X,Y)

where X is less than or equal to Y. There are 26 interval-valued variables A through Z that can also be used. The usage is similar to that in Example 1; assignments return no value and print nothing while expressions print the (floating or interval) value.

This example explores a number of features of yacc and C. Intervals are represented by a structure consisting of the left and right endpoint values stored as doubles. This structure is given a type name, INTERVAL, by means of typedef.

The yacc value stack can also contain floating point scalars and integers (used to index into the arrays holding the variable values). Notice that the entire strategy depends strongly on being able to assign structures and unions in C language. In fact, many of the actions call functions that return structures as well.

Note the use of YYERROR to handle error conditions -- division by an interval containing 0 and an interval presented in the wrong order. The error recovery mechanism of yacc is used to throw away the rest of the offending line.

In addition to the mixing of types on the value stack, this grammar also demonstrates a syntax to keep track of the type (for example, scalar or interval) of intermediate expressions. Notice that scalar value can be automatically promoted to an interval if the context demands an interval value. This causes a large number of conflicts when the grammar is run through yacc: 18 shift-reduce and 26 reduce-reduce.

The problem can be seen by looking at the two input lines:

2.5 + (3.5 - 4.)

and:

2.5 + (3.5, 4)

Notice that the 2.5 is to be used in an interval value expression in the second example, but this fact is not known until the comma is read. By this time, 2.5 is finished, and the parser cannot go back and do something else. More generally, it might be necessary to look ahead an arbitrary number of tokens to decide whether to convert a scalar to an interval.

This problem is evaded by having two rules for each binary interval valued operator -- one when the left operand is a scalar and one when the left operand is an interval. In the second case, the right operand must be an interval, so the conversion is applied automatically.

Despite this evasion, there are still many cases where the conversion can be applied or not, leading to the above conflicts. They are resolved by listing the rules that yield scalars first in the specification file; in this way, the conflict is resolved in the direction of keeping scalar-valued expressions scalar valued until they are forced to become intervals.

This way of handling multiple types is instructive. If there were many kinds of expression types instead of just two, the number of rules needed would increase dramatically and the conflicts even more dramatically. Thus, while this example is instructive, it is better practice in a more normal programming language environment to keep the type information as part of the value and not as part of the grammar.

Finally, a word about the lexical analysis. The only unusual feature is the treatment of floating-point constants. The C-language library routine atof() is used to do the actual conversion from a character string to a double-precision value. If the lexical analyzer detects an error, it responds by returning a token that is illegal in the grammar, provoking a syntax error in the parser and thence error recovery. The following code sample is a yacc specification.

%{ 
#include <stdio.h> 
#include <ctype.h> 

typedef struct interval { 
      double lo, hi; 
} INTERVAL; 

INTERVAL vmul(), vdiv(); 
double atof(); 
double dreg[26]; 
INTERVAL vreg[26]; 

%} 

%start lines 

%union { 
      	int ival; 
       	double dval; 
      	INTERVAL vval; 
} 

%token <ival> DREG VREG 		/* indices into dreg, vreg arrays */ 
%token <dval> CONST		    	/* floating point constant */ 
%type <dval> dexp		     		/* expression */ 
%type <vval> vexp							 	/* interval expression */ 

         /* precedence information about the operators */ 

%left '+' '/-' 
%left '*' '/' 

%%			     /* beginning of rules section */ 

lines	   	: /* empty */ 
            | lines line 
         	; 
line		   : dexp '\n' 
            { 
              (void)printf("%15.8f\n", $1); 
            } 
            | vexp '\n' 
            { 
            	(void)printf("(%15.8f, %15.8f)\n", $1.lo, $1.hi);
            } 
            | DREG '=' dexp '\n' 
            {
           		dreg[$1] = $3; 
            } 
            | VREG '=' vexp '\n' 
            { 
          		vreg[$1] = $3; 
            } 
            | error '\n' 
            { 
          		yyerrok; 
            } 
            ; 
dexp		    : CONST 
            | DREG 
            { 
          		$$ = dreg[$1]; 
            } 
            | dexp '+' dexp 
            { 
              $$ = $1 + $3; 
            } 
            | dexp '-' dexp 
            { 
         	   $$ = $1 - $3; 
            } 
            | dexp '*' dexp 
            { 
         	   $$ = $1 * $3; 
            } 
            | dexp '/' dexp 
            { 
         	   $$ = $1 / $3; 
            } 
            | '-' dexp 
            { 
          	   $$ = -$2; 
            } 
            | '(' dexp ')' 
            { 
         	   $$ = $2; 
            } 
            ; 
vexp	    	: dexp 
            { 
         		$$.hi = $$.lo = $1; 
            } 
            | '(' dexp ',' dexp ')' 
            { 
         	   $$.lo = $2; 
          	   $$.hi = $4; 
               if($$.lo > $$.hi) { 
            		  (void) printf("interval out of order\n"); 
            			YYERROR; 
         		 } 
            } 
            | VREG 
            { 
             	$$ = vreg[$1]; 
            } 
            | vexp '+' vexp 
            {
              $$.hi = $1.hi + $3.hi; 
              $$.lo = $1.lo + $3.lo; 
            } 
            | dexp '+' vexp 
            { 
              $$.hi = $1 + $3.hi; 
              $$.lo = $1 + $3.lo; 
            } 
            | vexp '-' vexp
            {
              $$.hi = $1.hi - $3.lo; 
              $$.lo = $1.lo - $3.hi; 
            } 
            | dexp '-' vexp 
            { 
              $$.hi = $1 - $3.lo; 
              $$.lo = $1 - $3.hi; 
            } 
            | vexp '*' vexp 
            { 
              $$ = vmul($1.lo, $1.hi, $3); 
            } 
            | dexp '*' vexp 
            {
              $$ = vmul($1, $1, $3); 
            } 
            | vexp '/' vexp 
            { 
              if (dcheck($3)) YYERROR; 
              $$ = vdiv($1.lo, $1.hi, $3); 
            } 
            | dexp '/' vexp 
            { 
              if (dcheck($3)) YYERROR; 
              $$ = vdiv($1, $1, $3); 
            } 
            | '-' vexp 
            {
              $$.hi = -$2.lo; 
              $$.lo = -$2.hi; 
            } 
            | '(' vexp ')' 
            { 
              $$ = $2; 
            } 
            ; 

%%					         /* beginning of subroutines section */ 

# define BSZ 50			/* buffer size for floating point number */ 

/* lexical analysis */ 

int yylex() 
{ 
       	register int c; 
                   		/* skip over blanks */ 
       	while ((c=getchar()) = = ' ') 
             			; 
       	if (isupper(c)) {
            	yylval.ival = c - 'A'; 
              return(VREG); 
       	} 
       	if (islower(c)) {
              yylval.ival = c - 'a'; 
            	return(DREG); 
       	} 

                   		/* digits, points, exponents */ 
       	if (isdigit(c) || c = = '.') {
            	char buf[BSZ + 1], *cp = buf; 
            	int dot = 0, exp = 0; 
               for (;(cp - buf) < BSZ; ++cp, c = getchar()) {
                *cp = c; 
                if (isdigit(c)) 
              		continue; 
                if (c = = '.') {
                  		if (dot++ || exp) 
                   	return('.');	/* will cause syntax error */
                    continue; 
               	} 
                if (c = = 'e') { 
                				if (exp++) 
               					return('e');	/* will cause syntax error */
                  		   	continue; 
                } 
                /* end of number */
                break; 
             } 
         *cp = '\0'; 
         if (cp - buf >= BSZ) 
            (void)printf("constant too long -- truncated\n");
           else
            ungetc(c, stdin);	/* push back last char read */
            yylval.dval = atof(buf); 
         return(CONST); 
      } 
      return(c); 
} 

INTERVAL 
hilo(a, b, c, d) 
double a, b, c, d; 
  	/* returns the smallest interval containing a, b, c, and d 
     	used by vmul, vdiv routines */ 

    	INTERVAL v; 

    	if (a > b)	{ 
      		v.hi = a; 
      		v.lo = b; 
    	} 
     else	{ 
      		v.hi = b; 
      		v.lo = a; 
    	} 
     if (c > d) { 
      		if (c > v.hi) 
        			v.hi = c;
         		if (d < v.lo)
           			v.lo = d;
     } 
     else { 
        if (d > v.hi) 
         		v.hi = d;
           if (c < v.lo)  
             	v.lo = c;:
     } 
     return(v); 
} 

INTERVAL
vmul(a, b, v) 
double a, b; 
INTERVAL v; 
{
      return(hilo(a * v.hi, a * v.lo, b * v.hi, b * v.lo)); 
}
dcheck(v) 
INTERVAL v; 
{ 
      if (v.hi >= 0.  && v.lo <= 0.) { 
       		(void) printf("divisor interval contains 0.\n"); 
         return(1); 
     	} 
return(0); 
} 

INTERVAL 
vdiv(a, b, v) 
double a, b; 
INTERVAL v; 
{ 
      	return(hilo(a / v.hi, a / v.lo, b / v.hi, b / v.lo)); 
}