Programming Utilities Guide

Using lex and yacc Together

If you work on a compiler project or develop a program to check the validity of an input language, you might want to use the system tool yacc (Chapter 3, yacc -- A Compiler Compiler). yacc generates parsers, programs that analyze input to ensure that it is syntactically correct.

lex and yacc often work well together for developing compilers.

As noted, a program uses the lex-generated scanner by repeatedly calling the function yylex(). This name is convenient because a yacc-generated parser calls its lexical analyzer with this name.

To use lex to create the lexical analyzer for a compiler, end each lex action with the statement return token, where token is a defined term with an integer value.

The integer value of the token returned indicates to the parser what the lexical analyzer has found. The parser, called yyparse() by yacc, then resumes control and makes another call to the lexical analyzer to get another token.
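The calling pattern can be sketched in plain C. The yylex() below is a hand-written stand-in for the scanner that lex would generate, and count_tokens() plays the part of the parser's loop. The token codes, the input pointer, and count_tokens() are assumptions made for illustration; they are not output of lex or yacc.

```c
#include <ctype.h>

/* Hypothetical token codes; in a real project yacc would assign these. */
#define INTEGER 258
#define ARITHOP 259

static const char *input = "";  /* text being scanned (assumed stand-in for the input stream) */

/* Stand-in for the lex-generated yylex(): returns one token code per
   call and 0 when the input is exhausted. */
int yylex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;
        return INTEGER;
    }
    if (*input == '+' || *input == '-') {
        input++;
        return ARITHOP;
    }
    /* Any other character is returned as its own character code. */
    return (unsigned char)*input++;
}

/* Stand-in for the parser's side of the exchange: call yylex()
   repeatedly until it reports end of input, counting the tokens. */
int count_tokens(const char *text)
{
    int n = 0;

    input = text;
    while (yylex() != 0)
        n++;
    return n;
}
```

For the text "12 + 345", count_tokens() calls yylex() four times: three calls return tokens and the fourth returns 0, which ends the loop.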

In a compiler, the different values of the token indicate what, if any, reserved word of the language has been found or whether an identifier, constant, arithmetic operator, or relational operator has been found. In the latter cases, the analyzer must also specify the exact value of the token: what the identifier is, whether the constant is, say, 9 or 888, whether the operator is + or *, and whether the relational operator is = or >.

Consider the following portion of lex source for a scanner that recognizes tokens in a "C-like" language:

Table 2-2 Sample lex Source Recognizing Tokens
begin                   return(BEGIN);
end                     return(END);
while                   return(WHILE);
if                      return(IF);
package                 return(PACKAGE);
reverse                 return(REVERSE);
loop                    return(LOOP);
[a-zA-Z][a-zA-Z0-9]*    { tokval = put_in_tabl();
                          return(IDENTIFIER); }
[0-9]+                  { tokval = put_in_tabl();
                          return(INTEGER); }
\+                      { tokval = PLUS;
                          return(ARITHOP); }
\-                      { tokval = MINUS;
                          return(ARITHOP); }
>                       { tokval = GREATER;
                          return(RELOP); }
>=                      { tokval = GREATEREQL;
                          return(RELOP); }

The tokens returned, and the values assigned to tokval, are integers. Good programming style suggests using informative names such as BEGIN, END, and WHILE to signify the integers the parser understands, rather than using the integers themselves.

You establish the association by using #define statements in the C routine that calls your parser. For example:

#define BEGIN 1 
#define END 2 
#define PLUS 7 

Then, to change the integer for some token type, change the #define statement in the parser rather than change every occurrence of the particular integer.

To use yacc to generate your parser, insert the following statement in the definitions section of your lex source:

#include "y.tab.h"

The file y.tab.h, which is created when yacc is invoked with the -d option, provides #define statements that associate token names such as BEGIN and END with the integers of significance to the generated parser.

To indicate the reserved words in Table 2-2, the returned integer values suffice. For the other token types, the returned integer gives only the class of the token, and a second integer that identifies the specific token is stored in the variable tokval.

This variable is globally defined so that the parser and the lexical analyzer can access it. yacc provides the variable yylval for the same purpose.
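As a rough illustration of how a shared global carries the extra information, the sketch below splits the work into a scanner-side function and a parser-side function. PLUS is 7 as in the #define example above; the values chosen for MINUS and ARITHOP, and the function names scan_arithop() and apply_arithop(), are assumptions made for this example.

```c
/* Token codes in the style of the #define statements shown earlier. */
#define PLUS     7
#define MINUS    8      /* assumed value */
#define ARITHOP  20     /* assumed value for the operator class */

int tokval;             /* global, visible to both scanner and parser */

/* Scanner side: the return value reports the class of the token, and
   tokval records which operator it was.  Returns 0 for a character
   that is not an arithmetic operator. */
int scan_arithop(int c)
{
    switch (c) {
    case '+':
        tokval = PLUS;
        return ARITHOP;
    case '-':
        tokval = MINUS;
        return ARITHOP;
    default:
        return 0;
    }
}

/* Parser side: having received ARITHOP, consult tokval to decide
   which operation to perform. */
int apply_arithop(int left, int right)
{
    return (tokval == PLUS) ? left + right : left - right;
}
```

After scan_arithop('+') returns ARITHOP, apply_arithop(5, 3) adds its operands; after scan_arithop('-'), the same call subtracts them. With yacc, yylval would take the place of tokval.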

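Note that Table 2-2 shows two ways to assign a value to tokval: by calling a function, as the actions for identifiers and integers do with put_in_tabl(), and by assigning a named constant directly, as the operator actions do with PLUS, MINUS, GREATER, and GREATEREQL.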
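The text does not show put_in_tabl() itself, so the following is only one plausible shape for it. It assumes that the function records the matched text in a small table and returns the entry's index. The table layout, the size limits, the yytext stand-in, and the match() helper are all hypothetical.

```c
#include <string.h>

#define MAX_SYMS 64     /* assumed table capacity */
#define MAX_LEN  32     /* assumed maximum token length, including the NUL */

char yytext[MAX_LEN];   /* stand-in for the matched text that lex provides */
int  tokval;            /* value passed from the scanner to the parser */

static char symtab[MAX_SYMS][MAX_LEN];
static int  nsyms;

/* Hypothetical put_in_tabl(): enter yytext in the table if it is not
   already there and return its index, or -1 if the table is full. */
int put_in_tabl(void)
{
    int i;

    for (i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], yytext) == 0)
            return i;
    if (nsyms == MAX_SYMS)
        return -1;
    strncpy(symtab[nsyms], yytext, MAX_LEN - 1);
    symtab[nsyms][MAX_LEN - 1] = '\0';
    return nsyms++;
}

/* Helper for trying the sketch: behave as if lex had just matched
   `text` and run the action  tokval = put_in_tabl(); */
int match(const char *text)
{
    strncpy(yytext, text, MAX_LEN - 1);
    yytext[MAX_LEN - 1] = '\0';
    tokval = put_in_tabl();
    return tokval;
}
```

Under these assumptions, matching the same identifier twice yields the same tokval, so the parser can tell that both occurrences refer to one table entry.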

As noted, the -d option creates the file y.tab.h, which contains the #define statements that associate the yacc-assigned integer token values with the user-defined token names. Now you can invoke lex with the following command:

$ lex lex.l

You can then compile and link the output files with the command:

$ cc lex.yy.c y.tab.c -ly -ll

The yacc library is loaded with the -ly option before the lex library with the -ll option to ensure that the supplied main() calls the yacc parser.

Also, to use yacc with CC, especially when routines such as yyback(), yywrap(), and yylook() in .l files are to be extern "C" functions, the command line must include the following:

$ CC -D__EXTERN_C__ ... filename