Programming Utilities Guide

Lexical Analysis

You must supply a lexical analyzer to read the input stream and communicate tokens (with values, if desired) to the parser. The lexical analyzer is an integer-valued function called yylex(). The function returns an integer, the token number, representing the kind of token read. If a value is associated with that token, it should be assigned to the external variable yylval.

The parser and the lexical analyzer must agree on these token numbers in order for communication between them to take place. The numbers can be chosen by yacc or the user. In either case, the #define mechanism of C language is used to allow the lexical analyzer to return these numbers symbolically.

For example, suppose that the token name DIGIT has been defined in the declarations section of the yacc specification file. The relevant portion of the lexical analyzer might look like the following to return the appropriate token:

int yylex() 
{ 
     extern int yylval; 
     int c; 
      	...  
     c = getchar(); 
      	...  
     switch (c) 
     { 
        	...  
       case '0': 
       case '1': 
        	...  
       case '9': 
         		yylval = c - '0'; 
         		return (DIGIT); 
        	...  
      } 
        	...  
}

The intent is to return a token number of DIGIT and a value equal to the numerical value of the digit. You put the lexical analyzer code in the subroutines section and the declaration for DIGIT in the declarations section. Alternatively, you can put the lexical analyzer code in a separately compiled file, provided you: