
Tokens

Lexical tokens are defined using the datatype declaration. The syntax is as follows.


Tokens_Decl ::= datatype Id :: lexeme =
                    Token_Spec | ... | Token_Spec ;
Token_Spec  ::= lexeme class Id        Include a lexeme class
             |  Id [ Regexp ]          Single token spec

A token datatype is defined by including one or more lexeme classes and by defining stand-alone tokens. Including a lexeme class includes all the tokens defined within that class. A C++ enum type with the same name as the token datatype is generated. If a token is given an identifier name, that name is used as the enum literal. If a string is used to denote a token instead, the token can be referred to by prefixing the string with a dot (.). For example, the token "=>" can be referenced as ."=>" within a program.

As an example, the following token datatype definition is used within the Prop translator. Here, the keywords are first partitioned into six different lexeme classes. In addition, the tokens ID_TOK, REGEXP_TOK, etc. are defined.

datatype PropToken :: lexeme =
    lexeme class MainKeywords
  | lexeme class Keywords
  | lexeme class SepKeywords
  | lexeme class Symbols
  | lexeme class Special
  | lexeme class Literals
  | ID_TOK       /{patvar}/
  | REGEXP_TOK   /{regexp}/
  | QUARK_TOK    /#{string}/
  | BIGINT_TOK   /#{sign}{integer}/
  | PUNCTUATIONS /[\<\>\,\.\;\&\|\^\!\~\+\-\*\/\%\?\=\:\\]/
  ;



Allen Leung
Mon Apr 7 14:33:55 EDT 1997