Tokens

Lexical tokens are defined using the datatype declaration. The syntax is as follows.

Tokens_Decl ::= datatype Id :: lexeme =
                    Token_Spec | ... | Token_Spec ;

Token_Spec  ::= lexeme class Id     Include a lexeme class
             |  Id [ Regexp ]       Single token spec

A token datatype is defined by including one or more lexeme classes and by defining stand-alone tokens. If a lexeme class is included, all the tokens defined within that lexeme class are included. A C++ enum type with the same name as the token datatype is generated. If a token is given an identifier name, the same name is used as the enum literal. If, on the other hand, a string is used to denote a token, it can be referred to by prefixing the string with a dot (.). For example, the token "=>" can be referenced as ."=>" within a program.

As an example, the following token datatype definition is used within the Prop translator. Here, the keywords are first partitioned into six lexeme classes. In addition, the tokens ID_TOK, REGEXP_TOK, etc. are defined.

datatype PropToken :: lexeme =
    lexeme class MainKeywords
  | lexeme class Keywords
  | lexeme class SepKeywords
  | lexeme class Symbols
  | lexeme class Special
  | lexeme class Literals
  | ID_TOK       /{patvar}/
  | REGEXP_TOK   /{regexp}/
  | QUARK_TOK    /#{string}/
  | BIGINT_TOK   /#{sign}{integer}/
  | PUNCTUATIONS /[\<\>\,\.\;\&\|\^\!\~\+\-\*\/\%\?\=\:\\]/
  ;
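
To illustrate the enum generation described earlier, the stand-alone tokens of PropToken might correspond to a C++ enum roughly like the one below. This is only a sketch under stated assumptions, not the actual output of the Prop translator: the literals contributed by the six lexeme classes are left out, the numeric values are whatever the compiler assigns, and the helper token_name and the function demo are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstring>

// Assumed approximation of the enum generated for PropToken.
// Only the stand-alone tokens are listed; literals coming from the
// included lexeme classes are omitted because the text does not list them.
enum PropToken {
    ID_TOK,
    REGEXP_TOK,
    QUARK_TOK,
    BIGINT_TOK,
    PUNCTUATIONS
};

// Hypothetical helper (not part of Prop): map a token to a printable name.
const char* token_name(PropToken t) {
    switch (t) {
        case ID_TOK:       return "ID_TOK";
        case REGEXP_TOK:   return "REGEXP_TOK";
        case QUARK_TOK:    return "QUARK_TOK";
        case BIGINT_TOK:   return "BIGINT_TOK";
        case PUNCTUATIONS: return "PUNCTUATIONS";
    }
    return "?";
}

// Usage sketch: a token given an identifier name is referred to by that
// same name, like any other enum literal.
void demo() {
    PropToken t = BIGINT_TOK;
    assert(std::strcmp(token_name(t), "BIGINT_TOK") == 0);
}
```

The point of the sketch is only that identifier-named tokens become ordinary enum literals; tokens denoted by strings would instead be written with the dot prefix, which has no plain C++ equivalent.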

Allen Leung
Mon Apr 7 14:33:55 EDT 1997
