Class LexerBuffer

We'll next describe the class LexerBuffer and its subclasses.

Class LexerBuffer is the base class in the lexical buffer hierarchy. It is defined in the library include file <AD/automata/lexerbuf.h>. This class is responsible for implementing a string buffer for use during lexical analysis.

As it stands, it can be used directly when the lexer input comes from a string. The user is assumed to manage the buffer's memory.

The class LexerBuffer has three constructors. The default constructor initializes the string buffer to NULL. The other two constructors initialize the string buffer to a string supplied by the user; when the length is not supplied, the buffer is assumed to be '\0'-terminated. The two set_buffer methods can be used to set the current string buffer. Note that all lexical analysis operations are done in place, so the user should not alter the string buffer directly, but should use the interface provided by this class instead.

class LexerBuffer {
public:
   LexerBuffer();
   LexerBuffer(char *);
   LexerBuffer(char *, size_t);
   virtual ~LexerBuffer();
   virtual void set_buffer (char *, size_t);
           void set_buffer (char *);
};
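
As a rough illustration of the constructors and set_buffer methods above, the following sketch uses a hypothetical stand-in class, BufferSketch, rather than the real LexerBuffer, so that it compiles without the library header. Only the constructor and set_buffer signatures are taken from the declaration above; the members data() and size() are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in, NOT the library class: it only mirrors the
// documented constructor and set_buffer signatures.
class BufferSketch {
public:
    // Default constructor: no buffer attached yet.
    BufferSketch() : buf_(nullptr), len_(0) {}
    // No length given: the string is assumed to be '\0'-terminated.
    BufferSketch(char* s) : buf_(s), len_(s ? std::strlen(s) : 0) {}
    // Explicit length: no terminator is looked for.
    BufferSketch(char* s, std::size_t n) : buf_(s), len_(n) {}
    // The destructor does not free buf_: the caller owns the memory.
    virtual ~BufferSketch() {}

    virtual void set_buffer(char* s, std::size_t n) { buf_ = s; len_ = n; }
    void set_buffer(char* s) { set_buffer(s, s ? std::strlen(s) : 0); }

    // Invented accessors, used only to inspect the sketch.
    char* data() const { return buf_; }
    std::size_t size() const { return len_; }

private:
    char* buf_;
    std::size_t len_;
};
```

Since the buffer is scanned in place, the string handed to the constructor must be writable (a char array rather than a string literal) and must outlive the buffer object.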

The following methods are used to access the string buffer. Method capacity returns the size of the buffer. Method length returns the length of the currently matched token. The text methods return a pointer to the start of the currently matched token; the string returned is guaranteed to be '\0'-terminated. The operator [] methods return the ith character of the token. Finally, method lookahead returns the character code of the next character to be matched.

   int capacity () const;
   int length   () const;
   const char * text () const;
         char * text ();
   char  operator [] (int i) const;
   char& operator [] (int i);
   int lookahead () const;
   void push_back (int n);
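
One plausible way to reconcile the '\0'-terminated guarantee of text with in-place scanning is to overwrite the character just past the token with a terminator and put it back before the next match. The sketch below shows that technique with a hypothetical stand-in class, TokenWindowSketch. The methods match and restore are invented here, and whether LexerBuffer actually works this way is an assumption, not something stated in this manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in, NOT the library class. It assumes the buffer holds
// at least cap + 1 bytes, so that buf[cap] is addressable.
class TokenWindowSketch {
public:
    TokenWindowSketch(char* buf, std::size_t cap)
        : buf_(buf), cap_(cap), start_(0), len_(0),
          saved_('\0'), patched_(false) {}

    // Mark [start, start + len) as the current token and terminate it in place.
    void match(std::size_t start, std::size_t len) {
        assert(start + len <= cap_);
        restore();                       // undo the previous terminator first
        start_ = start;
        len_ = len;
        saved_ = buf_[start_ + len_];    // remember the overwritten character
        buf_[start_ + len_] = '\0';
        patched_ = true;
    }

    // Put the overwritten character back so the buffer reads as it did before.
    void restore() {
        if (patched_) {
            buf_[start_ + len_] = saved_;
            patched_ = false;
        }
    }

    int capacity() const { return static_cast<int>(cap_); }
    int length() const { return static_cast<int>(len_); }
    const char* text() const { return buf_ + start_; }
    char operator[](int i) const { return buf_[start_ + i]; }

    // The character just past the token is the next one to be matched.
    int lookahead() const {
        char c = patched_ ? saved_ : buf_[start_ + len_];
        return static_cast<unsigned char>(c);
    }

private:
    char* buf_;
    std::size_t cap_;
    std::size_t start_;
    std::size_t len_;
    char saved_;
    bool patched_;
};
```

Under this assumption it is easy to see why the buffer should not be altered directly: between a match and the following restore, the string seen from outside is temporarily cut short at the end of the token.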

In addition to the string buffer, the class LexerBuffer keeps track of two additional types of information: the current context of the DFA, and whether the next token starts at the beginning of the line, or in our terminology, whether it is anchored. These are manipulated with the following methods:

   int context    () const;
   void set_context (int c = 0);
   Bool is_anchored() const;
   void set_anchored(Bool a = true);
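
The sketch below illustrates how the context and the anchored flag might be tracked. ScanStateSketch is a hypothetical stand-in; the four accessor signatures follow the listing above (with plain bool in place of Bool), while the method advance, the rule it applies, and the initial values are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in, NOT the library class.
class ScanStateSketch {
public:
    // Assumption: scanning starts in context 0, at the beginning of a line.
    ScanStateSketch() : context_(0), anchored_(true) {}

    int context() const { return context_; }
    void set_context(int c = 0) { context_ = c; }
    bool is_anchored() const { return anchored_; }
    void set_anchored(bool a = true) { anchored_ = a; }

    // Invented helper. Assumption: the next token is anchored exactly when
    // the token just consumed ended with a newline.
    void advance(const char* token, std::size_t len) {
        if (len > 0) set_anchored(token[len - 1] == '\n');
    }

private:
    int context_;
    bool anchored_;
};
```

Note that the default arguments mean set_context() returns to context 0 and set_anchored() marks the next token as anchored.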

Finally, the following methods should be redefined by subclasses to alter the behavior of this class. By default, the class LexerBuffer calls fill_buffer() when it reaches the end of the string; subclasses can use this method to refill the buffer and return the number of characters read. Currently, fill_buffer is defined to do nothing and return 0. When the end of the file is reached (i.e. when fill_buffer() fails to refill the buffer and the scanning process finishes), method end_of_file is called. Currently, this is a no-op. Finally, the error handling routine error() is called with pointers to the beginning and the end of the portion of the buffer in which the error occurs. By default, this routine prints out the buffer.

protected:
   virtual size_t fill_buffer();
   virtual void   end_of_file();
   virtual void   error(const char * start, const char * stop);
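
The refill protocol described above can be sketched as follows. HookedBufferSketch is a hypothetical stand-in for the base class, and its read_all driver is invented to play the role of the scanner; ChunkedSourceSketch is a hypothetical subclass that overrides the two hooks. Only the fill_buffer and end_of_file signatures come from the listing above; the error hook is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT the library class.
class HookedBufferSketch {
public:
    virtual ~HookedBufferSketch() {}

    // Invented driver: consume input, asking fill_buffer() for more data
    // whenever the current data runs out, until a refill returns 0.
    std::string read_all() {
        std::string out;
        for (;;) {
            std::size_t n = fill_buffer();
            if (n == 0) {          // refill failed: scanning is finished
                end_of_file();
                break;
            }
            out.append(chunk_, 0, n);
        }
        return out;
    }

protected:
    virtual std::size_t fill_buffer() { return 0; }  // default: nothing to read
    virtual void end_of_file() {}                    // default: no-op
    std::string chunk_;                              // data from the last refill
};

// Hypothetical subclass that feeds the buffer from a list of chunks.
// An empty chunk would be indistinguishable from end of input here.
class ChunkedSourceSketch : public HookedBufferSketch {
public:
    explicit ChunkedSourceSketch(std::vector<std::string> chunks)
        : eof_calls(0), chunks_(std::move(chunks)), next_(0) {}

    int eof_calls;  // counts how often end_of_file() was invoked

protected:
    std::size_t fill_buffer() override {
        if (next_ == chunks_.size()) return 0;
        chunk_ = chunks_[next_++];
        return chunk_.size();
    }
    void end_of_file() override { ++eof_calls; }

private:
    std::vector<std::string> chunks_;
    std::size_t next_;
};
```

In this sketch a return value of 0 from fill_buffer is the only end-of-input signal, which matches the description of the default behavior above.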

Allen Leung
Mon Apr 7 14:33:55 EDT 1997