Lexical tokens are defined using the datatype declaration.
The syntax is as follows.
    Tokens_Decl ::= datatype Id :: lexeme =
                        Token_Spec | ... | Token_Spec ;
    Token_Spec  ::= lexeme class Id      Include a lexeme class
                 |  Id [ Regexp ]        Single token spec
A token datatype is defined by including one or more lexeme classes
and by defining stand-alone tokens. If a lexeme class is included,
all the tokens defined within that lexeme class are included. A C++
enum type with the same name as the token datatype is generated.
If a token is given an identifier name, then that name is used
as the enum literal. If a string is used to denote a token instead,
then the token can be referred to by prefixing the string with
a dot (.). For example, the token "=>" can
be referenced as ."=>" within a program.
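The inclusion rule can be pictured with a small C++ model. This is only an illustrative sketch, not how the Prop translator is implemented: the names LexemeClasses, TokenDatatype, and all_tokens are hypothetical, and the order in which tokens are collected is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not Prop's implementation: a lexeme class is a named
// list of token spellings.
using LexemeClasses = std::map<std::string, std::vector<std::string>>;

// A token datatype lists the lexeme classes it includes plus its own
// stand-alone tokens.
struct TokenDatatype {
    std::vector<std::string> included_classes;   // "lexeme class Id" entries
    std::vector<std::string> standalone_tokens;  // "Id [ Regexp ]" entries
};

// Collect every token of the datatype: all tokens of each included class,
// followed by the stand-alone tokens. The ordering is an assumption.
std::vector<std::string> all_tokens(const LexemeClasses& classes,
                                    const TokenDatatype& dt) {
    std::vector<std::string> out;
    for (const std::string& name : dt.included_classes) {
        auto it = classes.find(name);
        // This sketch treats an unknown class name as a programming error.
        assert(it != classes.end());
        out.insert(out.end(), it->second.begin(), it->second.end());
    }
    out.insert(out.end(), dt.standalone_tokens.begin(),
               dt.standalone_tokens.end());
    return out;
}
```

With a hypothetical class Keywords containing "if" and "else", a class Symbols containing "=>", and one stand-alone token ID_TOK, all_tokens yields four tokens: the three class members followed by ID_TOK.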
As an example, the following token datatype definition is used within the Prop translator. Here, the keywords are first partitioned into six different lexeme classes. In addition, the tokens ID_TOK, REGEXP_TOK, etc. are defined.
    datatype PropToken :: lexeme =
          lexeme class MainKeywords
       |  lexeme class Keywords
       |  lexeme class SepKeywords
       |  lexeme class Symbols
       |  lexeme class Special
       |  lexeme class Literals
       |  ID_TOK        /{patvar}/
       |  REGEXP_TOK    /{regexp}/
       |  QUARK_TOK     /#{string}/
       |  BIGINT_TOK    /#{sign}{integer}/
       |  PUNCTUATIONS  /[\<\>\,\.\;\&\|\^\!\~\+\-\*\/\%\?\=\:\\]/
       ;
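To make the generated enum concrete, the following C++ sketch shows roughly what the identifier-named tokens of PropToken could correspond to. It is a hand-written stand-in under stated assumptions, not actual translator output: the enumerators contributed by the six lexeme classes are omitted, the numeric values are assumed, and token_name is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Hand-written stand-in for the enum generated from PropToken.
// Assumptions: only the identifier-named stand-alone tokens are listed,
// and the numeric values are whatever the compiler assigns here.
enum PropToken {
    ID_TOK,
    REGEXP_TOK,
    QUARK_TOK,
    BIGINT_TOK,
    PUNCTUATIONS
};

// Hypothetical helper: ordinary C++ code can dispatch on the enum literal.
std::string token_name(PropToken t) {
    switch (t) {
        case ID_TOK:       return "ID_TOK";
        case REGEXP_TOK:   return "REGEXP_TOK";
        case QUARK_TOK:    return "QUARK_TOK";
        case BIGINT_TOK:   return "BIGINT_TOK";
        case PUNCTUATIONS: return "PUNCTUATIONS";
    }
    assert(false && "not a PropToken enumerator");
    return "";
}
```

Tokens that are denoted only by a string have no identifier of their own, so they do not appear by name in this sketch; within a Prop program they are referenced with the dot form, as in ."=>".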