Generating C++ Scanners with flex

[An excerpt from the flex manual]
flex provides two different ways to generate scanners for use with C++. The first way is to simply compile a scanner generated by flex using a C++ compiler instead of a C compiler. You should not encounter any compilation errors (please report any you find to the email address given in the Author section below). You can then use C++ code in your rule actions instead of C code. Note that the default input source for your scanner remains yyin, and default echoing is still done to yyout. Both of these remain FILE * variables and not C++ streams.

You can also use flex to generate a C++ scanner class, using the -+ option (or, equivalently, %option c++), which is automatically specified if the name of the flex executable ends in a ‘+’, such as flex++. When using this option, flex defaults to generating the scanner to the file lex.yy.cc instead of lex.yy.c. The generated scanner includes the header file FlexLexer.h, which defines the interface to two C++ classes.

The first class, FlexLexer, provides an abstract base class defining the general scanner class interface. It provides the following member functions:

const char* YYText()
returns the text of the most recently matched token, the equivalent of yytext.

int YYLeng()
returns the length of the most recently matched token, the equivalent of yyleng.

int lineno() const
returns the current input line number (see %option yylineno), or 1 if %option yylineno was not used.

void set_debug( int flag )
sets the debugging flag for the scanner, equivalent to assigning to yy_flex_debug. Note that you must build the scanner using %option debug to include debugging information in it.

int debug() const
returns the current setting of the debugging flag.

The second class defined in FlexLexer.h is yyFlexLexer, which is derived from FlexLexer. It defines the following additional member functions:

yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
constructs a yyFlexLexer object using the given streams for input and output.  If not specified, the streams default to cin and cout, respectively.

virtual int yylex()
performs the same role as yylex() does for ordinary flex scanners: it scans the input stream, consuming tokens, until a rule’s action returns a value. If you derive a subclass S from yyFlexLexer and want to access the member functions and variables of S inside yylex(), then you need to use %option yyclass="S" to inform flex that you will be using that subclass instead of yyFlexLexer. In this case, rather than generating yyFlexLexer::yylex(), flex generates S::yylex() (and also generates a dummy yyFlexLexer::yylex() that calls yyFlexLexer::LexerError() if called).

virtual void switch_streams( istream* new_in = 0, ostream* new_out = 0 )
reassigns yyin to new_in (if non-nil) and yyout to new_out (ditto), deleting the previous input buffer if yyin is reassigned.

int yylex( istream* new_in, ostream* new_out = 0 )
first switches the input streams via switch_streams( new_in, new_out ) and then returns the value of yylex().
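
The member functions above are typically combined into a driver loop: call yylex() until it returns 0 (the default end-of-input value) and read the matched text through YYText() and YYLeng(). The following is only a sketch of that loop. collect_tokens and FakeWordLexer are hypothetical names; FakeWordLexer is a stand-in that imitates the three calls so the code compiles without flex, and with a generated scanner you would pass a yyFlexLexer object instead.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Calls yylex() until it returns 0 and copies each matched token.
// YYText() is paired with YYLeng() so that tokens containing internal
// NUL characters are copied in full.
template <typename Lexer>
std::vector<std::string> collect_tokens(Lexer& lexer) {
    std::vector<std::string> tokens;
    while (lexer.yylex() != 0) {
        assert(lexer.YYLeng() >= 0);
        tokens.emplace_back(lexer.YYText(),
                            static_cast<std::size_t>(lexer.YYLeng()));
    }
    return tokens;
}

// Hypothetical stand-in for a generated scanner: every whitespace-separated
// word is one token, and yylex() returns 1 per token and 0 at end-of-input.
class FakeWordLexer {
public:
    explicit FakeWordLexer(const std::string& text) : in_(text) {}

    int yylex() { return (in_ >> current_) ? 1 : 0; }
    const char* YYText() const { return current_.c_str(); }
    int YYLeng() const { return static_cast<int>(current_.size()); }

private:
    std::istringstream in_;
    std::string current_;
};
```

Because collect_tokens is a template, it relies only on the names yylex(), YYText() and YYLeng(), which is what lets the stand-in and a real scanner class share the same loop.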

In addition, yyFlexLexer defines the following protected virtual functions which you can redefine in derived classes to tailor the scanner:

virtual int LexerInput( char* buf, int max_size )
reads up to max_size characters into buf and returns the number of characters read. To indicate end-of-input, return 0 characters. Note that “interactive” scanners (see the -B and -I flags) define the macro YY_INTERACTIVE. If you redefine LexerInput() and need to take different actions depending on whether or not the scanner might be scanning an interactive input source, you can test for the presence of this name via #ifdef.

virtual void LexerOutput( const char* buf, int size )
writes out size characters from the buffer buf, which, while NUL-terminated, may also contain “internal” NUL’s if the scanner’s rules can match text with NUL’s in them.

virtual void LexerError( const char* msg )
reports a fatal error message.  The default version of this function writes the message to the stream cerr and exits.
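
As a rough illustration of redefining LexerInput(), the sketch below feeds a scanner from an in-memory string and returns 0 once the string is exhausted. StandInLexer, drain() and StringInputLexer are hypothetical names; StandInLexer only stands in for yyFlexLexer so that the code compiles without flex. In a real scanner you would derive from yyFlexLexer and keep just the override.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for yyFlexLexer so the sketch compiles without flex.
// It models only the LexerInput() hook.
class StandInLexer {
public:
    virtual ~StandInLexer() = default;

    // Pulls input through LexerInput() in chunks until it reports
    // end-of-input (0), and returns everything that was read.
    std::string drain(int chunk_size) {
        assert(chunk_size > 0);
        std::string all;
        std::string buf(static_cast<std::size_t>(chunk_size), '\0');
        int n = 0;
        while ((n = LexerInput(&buf[0], chunk_size)) > 0) {
            all.append(buf.data(), static_cast<std::size_t>(n));
        }
        return all;
    }

protected:
    virtual int LexerInput(char* buf, int max_size) = 0;
};

// Supplies input from a string instead of an istream.
class StringInputLexer : public StandInLexer {
public:
    explicit StringInputLexer(std::string text) : text_(std::move(text)) {}

protected:
    int LexerInput(char* buf, int max_size) override {
        const std::size_t remaining = text_.size() - pos_;
        const std::size_t n =
            std::min(remaining, static_cast<std::size_t>(max_size));
        std::memcpy(buf, text_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);  // 0 signals end-of-input
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};
```

The override never writes more than max_size characters into buf, which is the one constraint the description above places on the caller-supplied buffer.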

Note that a yyFlexLexer object contains its entire scanning state. Thus you can use such objects to create reentrant scanners. You can instantiate multiple instances of the same yyFlexLexer class, and you can also combine multiple C++ scanner classes together in the same program using the -P option discussed above.

Finally, note that the %array feature is not available to C++ scanner classes; you must use %pointer (the default).

An example is presented here.

